APC White Paper 225: Optimize Data Center Cooling with Effective Control Systems

Revision 0
by Paul Lin

Executive summary

Specifying cooling systems without considering their control methods leads to issues such as demand fighting, human error, shutdowns, high operating costs, and other costly outcomes. Understanding the different levels of cooling control provides a framework for rational discussions and specifications for data center cooling systems. This paper describes four cooling control levels, when they should be used, the benefits and limitations of each level, and provides examples of each.

Introduction

Growing energy costs and environmental responsibility have placed the data center industry under increasing pressure to improve its energy efficiency. Of all data center energy consumption, the cooling system typically consumes the second largest portion (the first being IT equipment). For example, in a 1MW data center with a PUE of 1.91 at 50% IT load (see sidebar for more assumptions), the cooling system consumes about 36% of the energy used by the entire data center (including IT equipment) and about 75% of the energy used by the physical infrastructure (excluding IT equipment) that supports the IT applications.

Assumptions:
The calculation is based on the following data center:
    • IT load: 1MW, 50% loaded
    • Power density: 5kW/rack
    • Air-cooled packaged chiller used
    • Chiller capacity: 600kW
    • No economizer in use
    • Room-based cooling without group level control
    • No air containment deployed
    • High-efficiency chilled water pumps
    • High efficiency UPS
    • High efficiency lighting
    • Power supply: 2N
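The percentages above follow directly from the PUE definition. A minimal sketch of the arithmetic, taking the 36% cooling share as given from the paper and deriving the rest:

```python
# Energy breakdown for the example data center (figures from the paper).
it_load_kw = 1000 * 0.50          # 1 MW design, 50% loaded -> 500 kW IT load
pue = 1.91                        # PUE at 50% IT load

total_kw = it_load_kw * pue       # total facility power: 955 kW
infra_kw = total_kw - it_load_kw  # physical infrastructure: 455 kW

cooling_share_of_total = 0.36     # stated in the paper
cooling_kw = cooling_share_of_total * total_kw

# Cooling as a share of the physical infrastructure (about 75%)
cooling_share_of_infra = cooling_kw / infra_kw
print(f"Total: {total_kw:.0f} kW, cooling: {cooling_kw:.0f} kW "
      f"({cooling_share_of_infra:.0%} of infrastructure)")
```

The 75% figure in the text is this last ratio: roughly 344 kW of cooling out of 455 kW of total physical infrastructure power.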

Given its large energy footprint, optimizing the cooling system provides a significant opportunity to reduce energy costs. There are three high-level tasks used to establish an efficient cooling system for a new data center design, which are discussed in the next section.

This paper focuses on only one of these three tasks: adopting effective cooling control systems. We investigate the challenges of data center cooling, why traditional cooling controls do not work, and what constitutes an effective cooling control system. Finally, we describe four cooling control levels, when they should be used, the benefits and limitations of each level, and provide examples of each.

Tasks to establish an efficient cooling system

In general, you can use the following three high-level tasks to establish an efficient cooling system for a new data center design:

  • Select an appropriate cooling architecture
  • Adopt effective cooling control systems
  • Manage airflow in IT space

Select an appropriate cooling architecture

First, select an appropriate cooling architecture (i.e., heat rejection method, economizer mode, and indoor air distribution method) based on your key data center attributes like location, design capacity, average power density, and data center preferences and constraints. A few examples of preferences and constraints include whether chilled water or outside air is allowed in the IT space, or whether a raised floor is used for cold air supply or a drop ceiling for hot air return. Schneider Electric has developed a free TradeOff Tool, the Data Center Cooling Architecture Selector, that proposes optimal cooling architectures based on the various inputs discussed above. It’s important to note that an economizer mode can help data center cooling systems avoid a large amount of the energy consumed by mechanical (compressor-based) cooling under favorable outdoor air conditions, especially in locations with a cool climate. White Paper 132, Economizer Modes of Data Center Cooling Systems discusses all economizer modes based on different cooling architectures, and compares the economizer modes best suited for data centers.

Adopt effective cooling control systems

Selecting an appropriate cooling architecture is not enough to establish an efficient cooling system without effective cooling controls. For example, in many of our assessments, we have found data centers where the cooling system seldom operated in economizer mode. In all cases the reason was that the system became unstable during periods of partial economizer mode due to cooling control issues. Therefore, the operators would manually switch to economizer mode only late into the winter season, wasting a significant number of available economizer hours.

Another example of an inefficient cooling system due to a control issue is demand fighting, where some cooling units are cooling while others are heating (or some are dehumidifying while others are humidifying). This happens due to the lack of a group control system. Selecting a cooling system that includes group level control or system level control can minimize energy consumption while addressing the data center cooling challenges discussed later in this paper.

Manage airflow in IT space

The last task is to manage the airflow in the IT space and control the IT environment based on the latest ASHRAE thermal guidelines. A best practice for airflow management is to separate the hot and cold air streams by containing the aisle and/or the rack. Rack or room level airflow management not only achieves energy savings but also enhances data center availability by fixing hot spots. White Paper 135, Impact of Hot and Cold Aisle Containment on Data Center Temperature and Efficiency discusses how much energy can be saved by deploying hot and cold air containment in a new data center design. White Paper 153, Implementing Hot and Cold Air Containment in Existing Data Centers discusses how to select an appropriate air containment solution in an existing data center.

Why an effective control system is important

Data center cooling is full of challenges due to data center load dynamics and cooling system dynamics. The limitations or drawbacks of traditional control approaches make the situation worse. Selecting a cooling system with an effective control system is a best practice to solve these challenges. This section explains why an effective control system is important for data center cooling optimization in the following sequence:

  • Variables influencing cooling performance
  • Limitations of traditional control approaches
  • Characteristics of an effective control system
  • Classification of control systems

Variables influencing cooling performance

Cooling system dynamics are complex. Take an air-cooled packaged chiller design for example: When the IT temperature set point is increased (and the chilled water temperature is increased), the chiller energy decreases for two reasons; the data center can operate in economizer mode(s) for a larger portion of the year, and the chiller efficiency increases. However, if the CRAH supply air temperature (i.e. IT supply air) is not increased proportionally to the chilled water temperature the cooling capacity of the CRAH decreases and the CRAH fans need to spin up to compensate for this decrease, which means greater CRAH energy consumption.
The dry cooler (which operates in economizer mode instead of the chiller) energy increases because the number of economizer hours increases. As a result, it’s difficult to say how much energy savings you can achieve; furthermore, the total energy savings also depend on data center location, server fan behavior, and percent IT load. White Paper 221, The Unexpected Impact of Raising Data Center Temperatures provides a cost analysis (capex & energy) of a data center with this cooling architecture. Variables like this make it more difficult to save energy without effective controls. We explain some of these other variables below:
  • Cooling system capacity is always oversized due to availability requirements (i.e., cooling capacity is larger than the actual IT load). To make matters worse, data centers typically operate under 50% load.
  • Data centers are dynamic environments where the equipment population and layout change over time, and the heat load also changes constantly in response to computing traffic. Non-uniform rack layouts and rack densities in the IT space also lead to non-uniform cooling capacity requirements.
  • The cooling system efficiency varies with data center load, outdoor air temperatures, cooling settings, IT room dew point, and control approaches.
  • A cooling system is normally comprised of cooling devices from different vendors. Compatibility and coordination between these devices is a big challenge.
  • Traditional control approaches limit how well the cooling system adapts to changes in the data center environment; we discuss this in the next section.
Control practices that lead to poor performance
Traditional data center cooling systems were normally designed to handle a constant heat load and just monitor operation parameters like temperature, humidity, and pressure. As a result, cooling devices are normally controlled in a standalone and decentralized mode based on their return air temperature and humidity, or chilled water set points. Other limitations include:
  • Manual adjustments: Cooling devices like CRAHs/CRACs are adjusted manually by data center operators who change the setpoints, or turn the devices on and off based on their knowledge or intuition. But sometimes the correct response is counterintuitive. For example, data center operators normally turn on more cooling units (usually redundant units that were turned off) when they encounter hot spots; however, this action may not eliminate the hot spots and may actually make the situation worse. In the case of fixed-speed CRAH fans, this action will lead to increased energy use. In fact, the correct response is to separate the hot and cold air streams and run fewer fixed-speed cooling units at higher load. In the case of variable speed CRAH fans, turning on more units actually reduces the energy up to a certain point. White Paper 199, How to Fix Hot Spots in the Data Center describes the root causes of hot spots, recommends methods to identify them, reviews the typical actions taken, and provides best practices to eliminate hot spots.
  • Cooling devices work independently: The adjacent cooling devices in the IT space work independently just based on their own return air temperature and humidity readings, which leads to demand fighting among these devices and wastes a lot of energy. Another example is for chilled water cooling systems, where indoor and outdoor cooling devices like the CRAHs and chillers work independently based on their own settings and load conditions. For example, chillers don’t typically change their chilled water setpoints to save energy even under very light heat loads.
  • Control based on relative humidity (RH), not dew point temperature: Most CRAHs/CRACs measure the relative humidity of data center air as it returns into the units from the IT space, and use the data to control the operation of humidifiers if they are installed within the units. RH control more easily leads to demand fighting (dehumidifying/humidifying) among adjacent units if there is no group level control configured. Note that although this limitation can be addressed by group level control, a more effective and lower cost solution is to use a centralized air handling unit (AHU) with a humidifier controlled by dew point. This eliminates the individual humidifiers in each cooling unit.
  • Only monitoring, no control: A large portion of traditional cooling control approaches solely focus on monitoring the operating status of the cooling system, and do not perform control functions like adjusting the speed of compressors, fans, or pumps to optimize performance. Another factor is that some cooling devices do not have variable frequency drives (VFDs) to change the speed.
  • No visibility into the performance of the entire cooling system: Each cooling device in a traditional system is designed to optimize its own performance, regardless of the impact on the total cooling system energy consumption. For example, increasing chilled water set points can reduce the energy consumption of the chillers, but the indoor CRAHs will consume more energy due to the smaller delta T of the cooling coils, which may offset the chiller energy savings. As the chilled water set point continues to increase, it is hard to say whether the energy consumption of the entire cooling system is reduced or not.
  • Unreliable sensors or meters: Sensors or meters that are not calibrated or are of poor quality make it very difficult to optimize the operation of the cooling system.
Traditional control approaches are not effective at managing the complexities of data center cooling. In most cases this results in increased operating expense, human error, and lower data center availability. This drives the requirement for the effective control systems discussed in the next section.

Characteristics of effective control systems

An effective control system looks at the cooling system holistically and comprehends the dynamics of the system to achieve the lowest possible energy consumption. It also helps data center operators solve the challenges discussed above, while providing other benefits like improving thermal management and maximizing cooling capacity. The following lists the main characteristics of effective control systems:
 
  • Automatic control: The cooling system should shift between different operation modes like mechanical mode, partial economizer mode, and full economizer mode automatically based on outdoor air temperatures and IT load to optimize energy savings. It should do this without leading to issues like variations in IT supply air temperatures, component stress, and downtime between these modes. Another example of automatic control is when the cooling output matches the cooling requirement dynamically, by balancing the airflow between the server fan demands and the cooling devices (i.e. CRAHs or CRACs) to save fan energy under light IT load without human intervention.
  • Centralized control based on IT inlet: Indoor cooling devices (i.e. CRAHs or CRACs) should work in coordination with each other to prevent demand fighting. All indoor cooling devices should be controlled based on IT inlet air temperature and humidity to ensure the IT inlet parameters are maintained within targets according to the latest ASHRAE thermal guideline.
  • Centralized humidity control with dew point temperature: IT space humidity should be centrally controlled by maintaining dew point temperature at the IT intakes, which is more cost effective than maintaining relative humidity at the return of cooling units.
  • Flexible controls: A good control system allows flexibility to change certain settings based on customer requirements. For example, a configurable control system allows changes to the number of cooling units in a group, or turning off evaporative cooling at a certain outdoor temperature.
  • Simplifies maintenance: A cooling control system makes it easy to enter into maintenance mode during maintenance intervals. The control system may even alert maintenance personnel during abnormal operation, and indicate where the issue exists.
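To illustrate why dew point is a more stable control target than relative humidity, consider two cooling units sampling the same air mass at different return temperatures: their RH readings differ, but the moisture content (dew point) is essentially the same. A sketch using the standard Magnus approximation (the coefficients are common published approximations, not values from this paper):

```python
import math

def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (Celsius) via the Magnus formula."""
    a, b = 17.62, 243.12  # Magnus coefficients for water vapor over liquid
    gamma = (a * temp_c) / (b + temp_c) + math.log(rh_percent / 100.0)
    return (b * gamma) / (a - gamma)

# Two CRAHs see the same air mass at different return temperatures.
# RH readings differ widely, so RH-based units may fight (one humidifying,
# one dehumidifying), while the dew point is nearly identical.
print(dew_point_c(30.0, 40.0))  # warmer return air, lower RH: ~15 C dew point
print(dew_point_c(24.0, 57.0))  # cooler return air, higher RH: ~15 C dew point
```

Because both units would compute nearly the same dew point, dew-point-based control gives them a common target and avoids the humidify/dehumidify fighting described above.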

Classification of control systems

There are many kinds of cooling control systems on the market, but there is no official or industry standard description of the classification or hierarchy. Therefore, based on available cooling functions and architectures, we propose that control systems be categorized into the following hierarchy with four levels (from the simplest to the most complex):
  • Device level control: less advanced than, but the foundation of, group level control
  • Group level control: less advanced than, but the foundation of, system level control
  • System level control: beneficial to, but not necessary for, facility level control
  • Facility level control
 
It is possible to deploy cooling controls at any of these four levels. The following sections describe each control level (starting with the simplest “device level control”), when the level should be used, the benefits and limitations of each level, and provide examples of each.

Device level control

Each cooling device (this could be a CRAH, a CRAC, or a chiller) typically comes with its own built-in control which we call device level control. The main functions of device level control are to ensure predictable operation and reliability of the devices. Note that a device may have many different components inside (e.g. compressors, fans, pumps and valves) but because they are designed and manufactured as a single system, its control system is considered to be at the device level.

Benefits

Compared with other control levels discussed below, device level control can be regarded as the “brain” of the cooling device and has the following benefits:
  • Less operator experience is required because the control program is embedded and factory verified. Data center operators need only adjust the setpoints per their environmental requirements.
  • No extra capital cost required as the controls are built into the products.
  • Significant energy savings can be achieved with device level control when cooling devices employ VFD fans, VFD compressors, VFD pumps, etc. which can be adjusted according to IT load.
  • Device level control is the foundation of group level control, which is discussed in the next main section.

Limitations

For data centers with only device level control, the main limitation is that there is no communication between adjacent cooling devices, so the cooling devices cannot be coordinated to avoid issues like demand fighting. Device level control is only recommended for a small IT room where only one cooling unit (e.g. split air-cooled DX) is used. However, device level control is the foundation of an effective control system. We suggest selecting cooling models with device level controls that support group level control and system level control.

Examples

Figure 1 shows an example of a row-based CRAH with an advanced built-in device level control. The control system can adjust the fan speed and the chilled water valve to better match the IT load and inlet temperature requirement. This control system can support group level control and system level control. The CRAH units are also controlled based on rack inlet air temperatures to ensure the temperatures are maintained within the targets.
Figure 1: Example of a row-based CRAH with effective built-in device level control (Schneider InRow RC shown)

Group level control

Group level control, as its name implies, means a group of the same type (and vendor) of cooling devices is controlled by the same control algorithm. This could be a group of CRACs or CRAHs, a group of chillers, a group of cooling towers, or a group of pumps. Group level control is more advanced than device level control and can be applied in an IT room or in a chiller plant. Group level control can be customized to support devices from various vendors, but is likely to experience problems in non-standard configurations.
 

Benefits

Compared with device level control, group level control has the following benefits:
  • The cooling devices in the same group are coordinated to avoid demand fighting, which can save energy, especially for CRACs with discrete humidifiers and heaters.
  • Enhanced data center cooling reliability in case of a cooling unit failure. Where redundant cooling units are in standby, the control system will “wake up” a standby unit upon failure of an active unit. Or, if all units are on, including redundant units, the CRAH or CRAC fans will spin up to provide more cooling capacity if any single unit fails. Note that if units have VFD components, it is more efficient to run all units at once: due to the cube-law relationship between energy consumption and shaft speed, running all units at lower speeds reduces energy losses compared to turning off redundant units.
  • Figure 2 shows an example of energy savings with variable speed fans, using 600mm wide row-based CRAHs. By matching IT airflow demand with an effective group level cooling control, the power consumption decreases as fan speeds decrease in response to the cooling demand on the units. This can yield tremendous energy savings over the life of a data center. Note that the savings will vary based on the design of the room, cooling unit redundancy, and percent IT load over the life of the data center.
Figure 2: Fan power savings of a 600mm wide row-based CRAH with an effective group level control
  • Some cooling devices (same type and same vendor) are designed to support group level control by connecting them together and changing settings, with no additional cost. However, for cooling devices of different types or from dif- ferent vendors, a customized configuration is required.
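The cube-law point in the reliability bullet above can be checked with a short calculation. The unit count and rated fan power below are hypothetical values for illustration:

```python
def group_fan_power_kw(n_running: int, n_needed: int, rated_kw: float) -> float:
    """Total fan power when n_running identical VFD fans share the airflow
    that n_needed fans at full speed would deliver (fan affinity / cube law)."""
    speed_fraction = n_needed / n_running          # each fan slows down
    return n_running * rated_kw * speed_fraction ** 3

# Hypothetical group: airflow demand equal to 4 fans at full speed, 3 kW each.
needed, rated = 4, 3.0
print(group_fan_power_kw(4, needed, rated))  # 4 fans at 100% speed: 12.0 kW
print(group_fan_power_kw(5, needed, rated))  # 5 fans at 80% speed: 7.68 kW
```

Running the redundant fifth unit cuts total fan power by about a third, because each fan runs at 80% speed and power scales with the cube of speed. This is why group level control keeps all VFD units running at reduced speed rather than idling the redundant ones.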

Limitations

The overall limitation of group level control is that it only controls like devices (e.g. CRAHs, or chillers, or pumps) and most likely only devices from the same vendor. Group level control improves the overall communication within similar groups of equipment, but does not support overall system optimization between groups of different device types. As an example, for chilled water systems, group level control is not enough to fully optimize system efficiency because there is no direct communication between the CRAHs, the pumps, and the chillers. As a result, it’s challenging for data center operators to manually change the settings in these devices to minimize the overall cooling system energy.
For data centers with air-cooled DX systems, group level control is enough to achieve an efficient cooling system because the indoor and outdoor units are designed to work together as a single system. Note that cooling devices of the same type and vendor are normally required.

Examples

Figure 3 shows an example of a contained aisle with a delta-pressure control system, which can be regarded as group level control. Differential pressure control is an alternative to traditional delta-T control; it is more precise and responds more quickly to heat load changes. It monitors pressure inside and outside of the contained aisle and improves the balance of airflow between the cooling units and the IT equipment. This approach prevents the cooling devices from undercooling while saving energy and increasing availability by actively responding to pressure changes.
Figure 3: Example of a group level control application (Schneider Active Flow Control shown)
Another example of group level control is the control among a set of chillers, a set of pumps (condenser water and chilled water), and a set of cooling towers in a chiller plant. Direct Digital Control (DDC) is normally used to control the operation of the chillers, pumps, and cooling towers. DDC controls the start and stop of these cooling devices in sequence to ensure system reliability. DDC also collects operating data like chilled water supply and return temperatures, and mass flow rates, to calculate the actual cooling load and optimize chiller performance.
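A delta-pressure control loop like the one described for Figure 3 can be sketched as a simple proportional controller that trims group fan speed to hold the aisle-to-room pressure difference near zero. The gain, limits, and readings below are illustrative assumptions, not values from any product:

```python
def next_fan_speed(speed: float, dp_pa: float, setpoint_pa: float = 0.0,
                   gain: float = 0.02, lo: float = 0.3, hi: float = 1.0) -> float:
    """One step of a proportional controller on aisle differential pressure.

    dp_pa: measured pressure (contained aisle minus room), in pascals.
    Negative dp means the IT fans are pulling more air than the group supplies,
    so fan speed is increased; positive dp lets the fans slow down.
    """
    error = setpoint_pa - dp_pa               # positive error -> speed up
    new_speed = speed + gain * error
    return max(lo, min(hi, new_speed))        # clamp to a safe operating range

speed = 0.6                                   # group fan speed, fraction of max
for dp in [-5.0, -2.0, 0.5, 1.0]:             # simulated pressure readings (Pa)
    speed = next_fan_speed(speed, dp)
print(round(speed, 3))
```

A production controller would add integral action, sensor filtering, and failure handling; the point here is only that all fans in the group respond to one shared measurement instead of each unit's own return air temperature.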

System level control

System level control coordinates the operation of different cooling subsystems within a data center (e.g. a pump and a CRAH). This is different from group level control, which coordinates devices of the same type and vendor. For example, system level control coordinates the operation between packaged chiller(s) outdoors and the indoor air handlers to minimize energy use, whereas group level control only controls multiple chillers or multiple air handlers, with no communication between these two subsystem groups. Note that system level control is usually configured between subsystems from the same vendor. A system level control can be customized to support devices from various vendors, but is likely to experience problems in non-standard configurations.

Benefits

Compared with device level and group level controls, system level control has the following benefits:
  • For chilled water systems, it looks at the cooling system holistically and comprehends the dynamics to minimize total cooling energy consumption.
  • Move between different modes of operation without human intervention. For example, transitioning between mechanical, partial economizer, and full economizer mode based on outdoor air temperatures and data center IT load, to optimize energy savings. This is done without issues like variations in IT supply air temperatures, component stress, or downtime between modes.
  • A prefabricated system level control is designed to address a wide range of operating conditions and fault scenarios that cannot be fully tested during site commissioning or validation, helping to ensure the availability and reliability of the control system.

Limitations

System level control is normally recommended for dedicated data center facilities with cooling architectures including chilled water, direct air economizer, indirect air economizer, and glycol-cooled DX systems. These four cooling architectures are normally composed of different subsystems (i.e. device types). When a cooling system uses devices from multiple vendors, the overarching system level control will likely be customized and exhibit unique issues that must be resolved over the course of the year (depending on the variation between seasons). This is the main reason we suggest a cooling solution with system level control and subsystems from the same vendor to avoid compatibility and stability issues.

Examples

Figure 4 shows a configuration for system level control of a data center with air-cooled packaged chillers, row-based CRAHs, and room-based CRAHs, where the cooling units are located in different rooms or data halls. This control system can maximize efficiency through integrated communication between all the cooling resources on site.
Figure 4: Configuration for system level control of a data center with air-cooled packaged chillers, row-based CRAHs, and room-based CRAHs
Figure 5 shows the chiller efficiency improvement as a result of the system level control discussed in Figure 4. Energy savings can be achieved by actively resetting the chilled water supply temperature during light load conditions. Note that this kind of optimization will switch to regular control under any emergency condition to ensure cooling reliability over energy savings.

Figure 5: Diagram showing the chiller efficiency improvement with system level control
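The chilled water supply temperature reset described above can be sketched as a linear reset schedule. The setpoints and load breakpoints below are illustrative assumptions, not values from Figure 5:

```python
def chilled_water_setpoint_c(load_fraction: float,
                             t_design_c: float = 7.0,
                             t_max_c: float = 12.0,
                             full_reset_below: float = 0.3,
                             no_reset_above: float = 0.8) -> float:
    """Linearly reset the chilled water supply temperature upward at light load.

    At or above `no_reset_above` load the design setpoint is used; at or below
    `full_reset_below` the warmest allowed setpoint is used; in between, the
    setpoint is interpolated. A warmer setpoint improves chiller efficiency,
    at the cost of reduced CRAH coil capacity (hence the upper limit).
    """
    if load_fraction >= no_reset_above:
        return t_design_c
    if load_fraction <= full_reset_below:
        return t_max_c
    span = no_reset_above - full_reset_below
    frac = (no_reset_above - load_fraction) / span  # 0 at high load, 1 at low
    return t_design_c + frac * (t_max_c - t_design_c)

print(chilled_water_setpoint_c(0.9))   # heavy load: design setpoint, 7.0 C
print(chilled_water_setpoint_c(0.55))  # mid load: interpolated, 9.5 C
print(chilled_water_setpoint_c(0.2))   # light load: warmest setpoint, 12.0 C
```

A real system level control would compute the reset from measured cooling load and valve positions rather than a fixed schedule, and would revert to the design setpoint under emergency conditions, as the text notes.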
There is another example of system level control where all cooling components like compressors, fans, pumps, and heat exchangers are designed and manufactured as a single packaged system that is prefabricated and validated in the factory. Figure 6 shows an example of system level control for an indirect air economization system. This control system can also be considered device level control. A solution that includes both prefabricated system level control and subsystems saves a significant amount of time (programming and testing) and offers more predictable operation compared to customized solutions. A system level control system can also prevent condensation inside the air-to-air heat exchanger, maximize economizer hours, reduce operational cost, and ensure cooling availability under emergency conditions. White Paper 136, High Efficiency Indirect Air Economizer based Cooling for Data Centers discusses this technology in detail.
Figure 6: Example of an indirect air economizer system with prefabricated system level control

Facility level control

Facility level control integrates all functions of a building into a common network, which controls everything in the building from the HVAC (heating, ventilation and air conditioning), elevators, and lighting systems to the security, emergency power, and fire protection systems. Note that in facility level controls, the cooling devices for data centers are controlled with group level or system level controls. Therefore, from the cooling system perspective, we can’t say facility level control is more advanced than system level control or group level control.
Various other terms exist for “facility level control”; in this paper, the term building management system (BMS) is used to describe it. A BMS can control the building mechanical infrastructure (if no system level control is in place), provide real-time monitoring of facilities equipment, and actively manage cooling performance. It is able to react to changes in the cooling load or failures of equipment automatically by turning on additional equipment, opening valves, or increasing the airflow to maintain cooling.

Benefits

Compared with device level, group level, and system level control, facility level control has the following benefits:
  • Monitor the status and performance of power and cooling systems: A BMS can monitor the entire power and cooling trains from the utility connection and head of the mechanical plant down to the IT servers.
  • Analyze the dependencies and interconnected nature of infrastructure components: In abnormal conditions, a BMS can help operators quickly assess the root cause and take action to keep power and cooling systems on line.
  • Communicate effectively with data center operators: Through event logging, notifications, and alarms, a BMS can provide the right context and actionable information to the right people at the right time, ensuring appropriate resources are brought to bear as infrastructure status changes or incidents occur.
  • Share information with other infrastructure management systems: A BMS can share and receive status and alarm information with other infrastructure management systems like an electrical power monitoring system (EPMS) and a data center infrastructure management (DCIM) system. It can also transmit data via application programming interfaces (APIs) to host information as databases, web services, and reports, which helps IT managers stay aware of what is going on with the electrical and mechanical plants.

Limitations

When considering facility level control, or BMS control, for a data center there are many aspects of the equipment, monitoring, and control components that need to be taken into account.
  • Pulling together separate BMS elements to provide control of the entire cooling system can be very challenging and costly. Typically, leveraging a BMS or facility level control for the entire data center is managed through a combination of device, group, and system level control.
  • It can require specialized control and set-up for a specific facility, demanding significant investment of time in programming and in communication between individual devices.
  • A cyber-attack on the overall facility can potentially put the data center at risk. To minimize this risk, the data center is typically disconnected from the rest of the facility.

Examples

Figure 7 shows an example of a BMS architecture. White Paper 233, Selecting a Building Management System (BMS) for Sites with a Data Center or IT room provides more information on this topic.
Figure 7: Example of a BMS architecture (Schneider SmartStruxureTM shown)

Selection among four levels

In order to make effective decisions regarding the choices between device, group, system, and facility level controls for a new data center design, this section compares the four levels against various criteria commonly identified by data center designers and operators. Note that if a data center has system level control, it will also have group level and device level control because they are the foundation of system level control.
Table 1: Comparison between four control levels

Conclusion

Effective cooling controls can maximize cooling capacity, simplify cooling management, eliminate hot spots, ensure that temperature SLAs are met, reduce operations cost, and enhance data center availability. Specifying the right level(s) of control for your cooling system can provide these benefits. This paper describes four cooling control levels, when they should be used, the benefits and limitations of each level, and provides examples of each.

About the author

Paul Lin is a Senior Research Analyst at Schneider Electric's Data Center Science Center. He is responsible for data center design and operation research, and consults with clients on risk assessment and design practices to optimize the availability and efficiency of their data center environment. Before joining Schneider Electric, Paul worked as the R&D Project Leader in LG Electronics for several years. He is now designated as a “Data Center Certified Associate”, an internationally recognized validation of the knowledge and skills required for a data center professional. He is also a registered HVAC professional engineer. Paul holds a master’s degree in mechanical engineering from Jilin University with a background in HVAC and Thermodynamic Engineering.