APC White Paper 225: Optimize Data Center Cooling with Effective Control Systems
Specifying cooling systems without considering their control methods leads to issues such as demand fighting, human error, shutdowns, high operating costs, and other costly outcomes. Understanding the different levels of cooling control provides a framework for rational discussions and specifications for data center cooling systems. This paper describes four cooling control levels, when they should be used, the benefits and limitations of each level, and provides examples of each.
Growing energy costs and environmental responsibility have placed the data center industry under increasing pressure to improve its energy efficiency. Of all data center energy consumption, the cooling system typically consumes the second largest portion (the first being IT equipment). For example, in a 1MW data center with a PUE of 1.91 at 50% IT load (see sidebar for more assumptions), the cooling system consumes about 36% of the energy used by the entire data center (including IT equipment) and about 75% of the energy used by the physical infrastructure (excluding IT equipment) to support the IT applications.
The calculation is based on the following data center:
- IT load: 1MW, 50% loaded
- Power density: 5kW/rack
- Air-cooled packaged chiller used
- Chiller capacity: 600kW
- No economizer in use
- Room-based cooling without group level control
- No air containment deployed
- High efficiency chilled water pumps
- High efficiency UPS
- High efficiency lighting
- Power supply: 2N
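The 36% and 75% figures quoted above follow directly from the stated load and PUE. As a quick check, the arithmetic can be sketched in a few lines of Python; only the paper's stated inputs (1MW capacity, 50% load, PUE 1.91, cooling at about 36% of total) appear below, and everything else is derived:

```python
# Reproducing the energy-share arithmetic from the paper's example data center.
IT_LOAD_KW = 1000 * 0.50          # 1MW data center operating at 50% IT load
PUE = 1.91                        # total facility power / IT power

total_kw = IT_LOAD_KW * PUE       # power drawn by the entire data center
infra_kw = total_kw - IT_LOAD_KW  # physical infrastructure (non-IT) power
cooling_kw = 0.36 * total_kw      # cooling at ~36% of total, per the paper

print(f"total facility power: {total_kw:.0f} kW")
print(f"infrastructure power: {infra_kw:.0f} kW")
print(f"cooling power:        {cooling_kw:.0f} kW")
print(f"cooling share of infrastructure: {cooling_kw / infra_kw:.0%}")
```

The last line comes out near the paper's "about 75%" figure, confirming the two percentages are consistent with each other.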
Given its large energy footprint, optimizing the cooling system provides a significant opportunity to reduce energy costs. There are three high-level tasks used to establish an efficient cooling system for a new data center design, which are discussed in the next section.
This paper focuses on only one of these three tasks - adopting effective cooling control systems. We investigate the challenges of data center cooling, why traditional cooling controls do not work, and what makes a cooling control system effective. Finally, we describe four cooling control levels, when they should be used, the benefits and limitations of each level, and provide examples of each.
Tasks to establish an efficient cooling system
In general, you can use the following three high-level tasks to establish an efficient cooling system for a new data center design:
- Select an appropriate cooling architecture
- Adopt effective cooling control systems
- Manage airflow in IT space
Select an appropriate cooling architecture
First, select an appropriate cooling architecture (i.e. heat rejection method, economizer mode, and indoor air distribution method) based on your key data center attributes like location, design capacity, average power density, and data center preferences and constraints. A few examples of preferences and constraints include whether chilled water or outside air is allowed in the IT space, or whether a raised floor is used for cold air supply or a drop ceiling for hot air return. Schneider Electric has developed a free TradeOff Tool, the Data Center Cooling Architecture Selector, that proposes optimal cooling architectures based on the inputs discussed above. It’s important to note that an economizer mode can eliminate much of the energy consumed by mechanical (compressor-based) cooling under favorable outdoor air conditions, especially in locations with a cool climate. White Paper 132, Economizer Modes of Data Center Cooling Systems, discusses the economizer modes available for different cooling architectures, and compares the economizer modes best suited for data centers.
Adopt effective cooling control systems
Selecting an appropriate cooling architecture is not enough to establish an efficient cooling system without effective cooling controls. For example, in many of our assessments we have found data centers whose cooling systems seldom operated in economizer mode. In every case the reason was that the system became unstable during periods of partial economizer mode due to cooling control issues. The operators would therefore engage economizer mode manually, and only late in the winter season, wasting a significant number of available economizer hours.
Manage airflow in IT space
The last task is to manage the airflow in the IT space and control the IT environment based on the latest ASHRAE thermal guidelines. A best practice for airflow management is to separate the hot and cold air streams by containing the aisle and/or the rack. Rack or room level airflow management not only achieves energy savings but also enhances data center availability by fixing hot spots. White Paper 135, Impact of Hot and Cold Aisle Containment on Data Center Temperature and Efficiency, discusses how much energy can be saved by deploying hot and cold air containment in a new data center design. White Paper 153, Implementing Hot and Cold Air Containment in Existing Data Centers, discusses how to select an appropriate air containment solution for an existing data center.
Why an effective control system is important
Data center cooling is full of challenges due to data center load dynamics and cooling system dynamics. The limitations or drawbacks of traditional control approaches make the situation worse. Selecting a cooling system with an effective control system is a best practice to solve these challenges. This section explains why an effective control system is important for data center cooling optimization in the following sequence:
- Variables influencing cooling performance
- Limitations of traditional control approaches
- Characteristics of an effective control system
- Classification of control systems
Variables influencing cooling performance
- Cooling system capacity is always oversized due to availability requirements (i.e., cooling capacity is larger than the actual IT load). To make matters worse, data centers typically operate under 50% load.
- Data centers are dynamic environments where the equipment population and layout change over time, and the heat load also changes constantly in response to computing traffic. Non-uniform rack layouts and rack densities in the IT space also lead to non-uniform cooling capacity requirements.
- The cooling system efficiency varies with data center load, outdoor air temperatures, cooling settings, IT room dew point, and control approaches.
- A cooling system is normally comprised of cooling devices from different vendors. Compatibility and coordination between these devices is a big challenge.
- Traditional control approaches limit how well the cooling system adapts to changes in the data center environment; we discuss these limitations in the next section.
- Manual adjustments: Cooling devices like CRAHs/CRACs are adjusted manually by data center operators who change the setpoints, or turn the devices on and off based on their knowledge or intuition. But sometimes the correct response is counterintuitive. For example, data center operators normally turn on more cooling units (usually redundant units that were turned off) when they encounter hot spots; however, this action may not eliminate the hot spots and may actually make the situation worse. In the case of fixed-speed CRAH fans, this action will lead to increased energy use. In fact, the correct response is to separate the hot and cold air streams and run fewer fixed-speed cooling units at higher load. In the case of variable speed CRAH fans, turning on more units actually reduces the energy up to a certain point. White Paper 199, How to Fix Hot Spots in the Data Center, describes the root cause of hot spots, recommends methods to identify them, reviews the typical actions taken, and provides the best practices to eliminate them.
- Cooling devices work independently: Adjacent cooling devices in the IT space work independently, based solely on their own return air temperature and humidity readings, which leads to demand fighting among these devices and wastes a significant amount of energy. Another example is in chilled water cooling systems, where indoor and outdoor cooling devices like the CRAHs and chillers work independently based on their own settings and load conditions. For example, chillers don’t typically change their chilled water setpoints to save energy even under very light heat loads.
- Control based on relative humidity (RH), not dew point temperature: Most CRAHs/CRACs measure the relative humidity of data center air as it returns to the units from the IT space, and use the data to control the operation of humidifiers if they are installed within the units. RH control easily leads to demand fighting (dehumidifying/humidifying) among adjacent units if no group level control is configured. Note that although this limitation can be addressed by group level control, a more effective and lower cost solution is to use a centralized air handling unit (AHU) with a humidifier controlled by dew point. This eliminates the individual humidifiers in each cooling unit.
- Only monitoring, no control: A large portion of traditional cooling control approaches solely focus on monitoring the operating status of the cooling system, and do not perform control functions like adjusting the speed of compressors, fans, or pumps to optimize performance. Another factor is that some cooling devices do not have variable frequency drives (VFDs) to change the speed.
- No visibility into the performance of the entire cooling system: Each cooling device in a traditional system is designed to optimize its own performance, regardless of the impact on total cooling system energy consumption. For example, increasing the chilled water setpoint can reduce the energy consumption of the chillers, but the indoor CRAHs will consume more energy due to the smaller delta-T across the cooling coils, which may offset the chiller energy savings. As the chilled water setpoint continues to rise, it is hard to say whether the energy consumption of the entire cooling system is reduced or not.
- Unreliable sensors or meters: Sensors or meters that are not calibrated or are of poor quality, make it very difficult to optimize the operation of the cooling system.
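The chilled water setpoint trade-off mentioned above can be made concrete with a toy model. All coefficients below are invented for illustration only (they are not vendor data, and the chiller sensitivity is exaggerated); the point is the shape of the curves, not the numbers: chiller power falls as the setpoint rises, while CRAH fan power climbs with the cube of the extra airflow a smaller coil delta-T demands, so the system-wide optimum is visible only when both terms are summed.

```python
# Toy model of total cooling power vs. chilled water (CHW) setpoint.
# Coefficients are illustrative assumptions, not measured data.

def chiller_kw(chw_setpoint_c):
    # Assumed: chiller compressor power drops as CHW temperature rises
    # (5% per degree C here, exaggerated to make the trade-off visible).
    return 120.0 * (1 - 0.05 * (chw_setpoint_c - 7.0))

def crah_fan_kw(chw_setpoint_c):
    # Assumed: higher water temperature shrinks the coil delta-T, so the
    # CRAHs need more airflow; fan power grows with the cube of airflow.
    delta_t = 12.0 - (chw_setpoint_c - 7.0)   # assumed coil delta-T model
    airflow_ratio = 12.0 / delta_t
    return 15.0 * airflow_ratio ** 3

best = min(range(7, 16), key=lambda t: chiller_kw(t) + crah_fan_kw(t))
for t in range(7, 16):
    total = chiller_kw(t) + crah_fan_kw(t)
    mark = "  <-- minimum total" if t == best else ""
    print(f"CHW setpoint {t:2d} C: total {total:6.1f} kW{mark}")
```

Under these assumed coefficients the minimum total power sits at an intermediate setpoint: neither device optimized in isolation finds it, which is exactly the visibility problem the bullet above describes.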
Characteristics of effective control systems
- Automatic control: The cooling system should shift between different operation modes like mechanical mode, partial economizer mode, and full economizer mode automatically based on outdoor air temperatures and IT load to optimize energy savings. It should do this without causing issues like variations in IT supply air temperatures, component stress, and downtime between these modes. Another example of automatic control is when the cooling output matches the cooling requirement dynamically, by balancing the airflow between the server fan demands and the cooling devices (i.e. CRAHs or CRACs) to save fan energy under light IT load without human intervention.
- Centralized control based on IT inlet: Indoor cooling devices (i.e. CRAHs or CRACs) should work in coordination with each other to prevent demand fighting. All indoor cooling devices should be controlled based on IT inlet air temperature and humidity to ensure the IT inlet parameters are maintained within targets according to the latest ASHRAE thermal guidelines.
- Centralized humidity control with dew point temperature: IT space humidity should be centrally controlled by maintaining dew point temperature at the IT intakes, which is more cost effective than maintaining relative humidity at the return of cooling units.
- Flexible controls: A good control system allows flexibility to change certain settings based on customer requirements. For example, a configurable control system allows changes to the number of cooling units in a group, or turning off evaporative cooling at a certain outdoor temperature.
- Simplifies maintenance: A cooling control system makes it easy to enter maintenance mode during maintenance intervals. The control system may even alert maintenance personnel during abnormal operation and indicate where the issue exists.
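The automatic mode transitions described above can be sketched as a small state machine that picks an operating mode from outdoor temperature, with a hysteresis deadband so the plant does not oscillate (the component stress the text warns about). The thresholds (10°C, 18°C) and the 1.5°C deadband below are illustrative assumptions, not values from the paper or any product:

```python
# Hedged sketch of automatic economizer-mode selection with hysteresis.
FULL_ECON_MAX_C = 10.0     # assumed: below this, outdoor air alone carries the load
PARTIAL_ECON_MAX_C = 18.0  # assumed: above this, compressors carry the full load
HYST = 1.5                 # deadband so we don't flap at a threshold

def next_mode(current, outdoor_c):
    """Switch mode only when outdoor temperature is clearly past a threshold."""
    if current != "full_economizer" and outdoor_c < FULL_ECON_MAX_C - HYST:
        return "full_economizer"
    if current == "full_economizer" and outdoor_c > FULL_ECON_MAX_C + HYST:
        return "partial_economizer"
    if current == "partial_economizer" and outdoor_c > PARTIAL_ECON_MAX_C + HYST:
        return "mechanical"
    if current == "mechanical" and outdoor_c < PARTIAL_ECON_MAX_C - HYST:
        return "partial_economizer"
    return current  # inside the deadband: hold the current mode

mode = "mechanical"
history = []
for outdoor in (25, 15, 9, 7, 9, 12, 20):  # a day of falling then rising temps
    mode = next_mode(mode, outdoor)
    history.append(mode)
    print(f"{outdoor:5.1f} C -> {mode}")
```

Note how at 9°C the controller holds its current mode both on the way down and on the way up: the deadband is what prevents the rapid mode cycling that stresses components.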
Classification of control systems
Device level control
- Less operator experience is required because the control program is embedded and factory verified. Data center operators need only adjust the setpoints per their environmental requirements.
- No extra capital cost required as the controls are built into the products.
- Significant energy savings can be achieved with device level control when cooling devices employ VFD fans, VFD compressors, VFD pumps, etc. which can be adjusted according to IT load.
- Device level control is the foundation of group level control, which is discussed in the next main section.
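A device-level control loop of the kind described above can be sketched as a simple proportional controller that trims a VFD fan speed toward a supply air setpoint. The setpoint, gain, and speed limits below are illustrative assumptions; real units embed factory-verified control loops that are more sophisticated than this:

```python
# Minimal sketch of device-level VFD fan control (assumed parameters).
SETPOINT_C = 20.0                  # supply air temperature target (assumed)
KP = 0.08                          # proportional gain: speed change per degree C
MIN_SPEED, MAX_SPEED = 0.4, 1.0    # VFD limits as fraction of full speed

def adjust_speed(speed, supply_air_c):
    """One control step: speed up when supply air is too warm,
    slow down when it is too cold, clamped to the VFD limits."""
    error = supply_air_c - SETPOINT_C
    return min(MAX_SPEED, max(MIN_SPEED, speed + KP * error))

speed = 0.7
for measured in (22.0, 21.0, 20.2, 19.5):  # supply air settling toward setpoint
    speed = adjust_speed(speed, measured)
    print(f"supply {measured:4.1f} C -> fan speed {speed:.2f}")
```

The clamp to MIN_SPEED/MAX_SPEED matters in practice: it keeps the loop from commanding speeds the drive cannot deliver, and the minimum speed preserves baseline airflow.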
Group level control
- The cooling devices in the same group are coordinated to avoid demand fighting, which can save energy, especially for CRACs with discrete humidifiers and heaters.
- Enhances data center cooling reliability in case any cooling unit fails. Where redundant cooling units are in “standby”, the control system will “wake them up” upon failure of a cooling unit. Or, if all units are on, including redundant units, the CRAH or CRAC fans will spin up to provide more cooling capacity if any single unit fails. Note that if units have VFD components, it is more efficient to run all units at once. Due to the cube law relationship between energy consumption and shaft speed, all units running at lower speeds reduce the energy losses compared to turning off redundant units.
- Figure 2 shows an example of energy savings with variable speed fans, using 600mm wide row-based CRAHs. By matching IT airflow demand with an effective group level cooling control, the power consumption decreases as fan speeds decrease in response to the cooling demand on the units. This can yield a tremendous energy savings over the life of a data center. Note that the savings will vary based on the design of the room, cooling unit redundancy, and percent IT load over the life of the data center.
- Some cooling devices (same type and same vendor) are designed to support group level control by connecting them together and changing settings, with no additional cost. However, for cooling devices of different types or from different vendors, a customized configuration is required.
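The cube-law argument above can be checked numerically. Under assumed figures (four identical CRAHs, one of them redundant, each fan drawing 3 kW at full speed, sharing a fixed airflow demand), running all four units at 75% speed uses far less fan power than running three at 100%:

```python
# Numeric illustration of the cube-law point (assumed per-fan figures).
FULL_SPEED_KW = 3.0   # assumed fan power at 100% speed
DEMAND = 3.0          # total airflow demand, in "full-speed fan" equivalents

def fan_power(n_running):
    """Total fan power when n identical units share the airflow demand.
    Fan power scales with the cube of shaft speed (affinity laws)."""
    speed = DEMAND / n_running          # each running fan's speed fraction
    return n_running * FULL_SPEED_KW * speed ** 3

three_units = fan_power(3)   # redundant unit off, duty fans at 100%
four_units = fan_power(4)    # all four units on, each at 75% speed

print(f"3 units at 100%: {three_units:.2f} kW")
print(f"4 units at  75%: {four_units:.2f} kW")
print(f"savings from running all units: {1 - four_units / three_units:.0%}")
```

Each fan slows to 75% speed but draws only 0.75³ ≈ 42% of full-speed power, so the fourth fan's presence is more than paid for, and the redundant unit is already spinning if another unit fails.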
System level control
- For chilled water systems, system level control looks at the cooling system holistically and comprehends its dynamics to minimize total cooling energy consumption.
- Moves between different modes of operation without human intervention, for example, transitioning between mechanical, partial economizer, and full economizer mode based on outdoor air temperatures and data center IT load to optimize energy savings. This is done without issues like variations in IT supply air temperatures, component stress, or downtime between modes.
- A prefabricated system level control is designed to address a wide range of operating conditions and fault scenarios that cannot all be tested during site commissioning or validation, which ensures the availability and reliability of the control system.
Facility level control
- Monitor the status and performance of power and cooling systems: A BMS can monitor the entire power and cooling trains from the utility connection and head of the mechanical plant down to the IT servers.
- Analyze the dependencies and interconnected nature of infrastructure components: In abnormal conditions, a BMS can help operators quickly assess the root cause and take action to keep power and cooling systems on line.
- Communicate effectively with data center operators: A BMS can provide the right context and actionable information to the right people at the right time to ensure appropriate resources are brought to bear as infrastructure status changes or incidents occur through event logging, notification, and alarms.
- Share information with other infrastructure management systems: A BMS can share and receive status and alarm information with other infrastructure management systems like electrical power monitoring system (EPMS) and data center infrastructure management system (DCIM), and also have the ability to transmit data via Application Programming Interfaces (APIs) to host information as databases, web services, and reports, which can help IT managers to be aware of what is going on with the electrical and mechanical plants.
- Pulling together separate BMS elements to provide control of the entire cooling system can be very challenging and costly. Typically, facility level control of the entire data center is achieved through a combination of device, group, and system level control.
- Can require specialized control and setup for a specific facility, demanding significant investment in time for programming and communications between individual devices.
- A cyber-attack on the overall facility can potentially put the data center at risk. To minimize this risk, the data center is typically disconnected from the rest of the facility.
Selection among four levels
About the author