
Cooling NVIDIA GPUs: Exploring Liquid Cooling Options and Waste Heat Recovery for Sustainable Data Centers

TEL: 1-608-238-6001 Email: greg@infinityturbine.com

As artificial intelligence (AI) and high-performance computing (HPC) workloads continue to scale, the power requirements for GPUs like NVIDIA’s A100, H100, and other advanced models have grown dramatically. These GPUs can draw hundreds of watts each (up to 400 W for the A100 SXM and up to 700 W for the H100 SXM), which creates a significant cooling challenge. Managing this heat effectively matters both for performance and for reducing the environmental impact of energy-intensive data centers.

This article explores the cooling options available for NVIDIA’s high-performance GPUs and highlights cutting-edge technologies for waste heat recovery, turning thermal energy into usable power or cooling solutions.

NVIDIA’s Cooling Solutions: Heat Sinks and Liquid Cooling

NVIDIA’s A100, H100, and newer GPUs are primarily designed for use in data centers, where heat dissipation is a critical concern. The default cooling options from NVIDIA include air cooling with heat sinks and fans, which is effective in many environments but may not be sufficient for all high-density or high-performance scenarios.

Here’s a look at the most common cooling solutions for these GPUs:

1. Air Cooling with Heat Sinks:

• Reference Design: NVIDIA’s data center GPUs typically ship with a heat sink for air cooling. PCIe cards such as the A100 and H100 are passively cooled, so they rely on server chassis fans and airflow through the data center to move heat away from the GPU.

• Pros: Air cooling is cost-effective and straightforward to implement in traditional data center environments.

• Cons: Air cooling has limitations in high-density deployments or in data centers where cooling capacity is already maxed out. As workloads intensify, air cooling may struggle to keep temperatures low, leading to reduced performance or the need for additional infrastructure.

2. Liquid Cooling Solutions:

• For more demanding environments, liquid cooling offers a much more efficient way to remove heat from GPUs.

• While NVIDIA does not provide integrated liquid cooling with the basic GPU packages, it does support third-party liquid cooling solutions that are widely adopted in high-performance settings. Companies like CoolIT Systems and Asetek offer tailored liquid cooling kits for NVIDIA’s A100 and H100 models.

• CoolIT Systems: Provides liquid cooling technology specifically designed for NVIDIA GPUs, which includes cold plates, pumps, and other components to build a custom liquid loop.

• Asetek: Another major player offering liquid cooling systems for enterprise GPUs, focusing on energy efficiency and modularity.

• Pros: Liquid cooling is far more efficient at dissipating heat compared to air cooling, allowing for higher GPU performance and reducing the risk of thermal throttling.

• Cons: Liquid cooling systems can be more expensive and complex to maintain, requiring special infrastructure for pumps, cold plates, and tubing.

3. NVIDIA Liquid-Cooled Versions:

• Recognizing the growing need for more efficient cooling, NVIDIA has begun offering liquid-cooled versions of some of its GPUs, including the A100 PCIe and the newer H100 models. These options are particularly useful for data centers that need to maximize compute density while managing heat efficiently.
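
The gap between air and liquid cooling described above can be made concrete with a simple energy balance, flow = P / (ρ · cp · ΔT). The sketch below compares the air and water flow needed to carry away the same heat load. The fluid properties, the eight-GPU 5,600 W load, and the coolant temperature rises are illustrative assumptions, not NVIDIA specifications.

```python
# Rough coolant-flow comparison: air versus water for the same heat load.
# All property values and the example load are assumptions for illustration.

AIR_DENSITY = 1.2       # kg/m^3, air near 20 C at sea level (assumed)
AIR_CP = 1005.0         # J/(kg*K), specific heat of air (assumed)
WATER_DENSITY = 1000.0  # kg/m^3 (assumed)
WATER_CP = 4186.0       # J/(kg*K), specific heat of water (assumed)


def volumetric_flow_m3_s(heat_w, density, cp, delta_t_k):
    """Flow in m^3/s needed to absorb heat_w watts with a delta_t_k coolant temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (density * cp * delta_t_k)


def air_flow_cfm(heat_w, delta_t_k):
    """Required airflow in cubic feet per minute (1 m^3/s = 2118.88 CFM)."""
    return volumetric_flow_m3_s(heat_w, AIR_DENSITY, AIR_CP, delta_t_k) * 2118.88


def water_flow_l_min(heat_w, delta_t_k):
    """Required water flow in litres per minute."""
    return volumetric_flow_m3_s(heat_w, WATER_DENSITY, WATER_CP, delta_t_k) * 1000 * 60


if __name__ == "__main__":
    load_w = 8 * 700  # hypothetical server: eight GPUs at 700 W each
    print(f"Air,   15 K rise: {air_flow_cfm(load_w, 15):.0f} CFM")
    print(f"Water, 10 K rise: {water_flow_l_min(load_w, 10):.1f} L/min")
```

Because water carries far more heat per unit volume than air, the same load needs a few litres per minute of water rather than hundreds of cubic feet per minute of air, which is why liquid cooling scales better in dense racks.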

Commercially Available Waste Heat Recovery Solutions

Data centers consume massive amounts of energy, and as GPUs like the NVIDIA A100 and H100 continue to ramp up their power consumption, finding ways to recover and reuse waste heat has become increasingly important. One emerging trend is the integration of waste heat recovery systems, which can transform excess heat into usable power or cooling.

Here are some commercially available solutions that can integrate with data center cooling systems:

1. Organic Rankine Cycle (ORC) Systems:

• ORC technology allows the conversion of low-grade waste heat (often from 40°C to 120°C) into electricity. This is particularly useful in data centers where GPUs generate significant heat. Companies like Infinity Turbine and Exergy offer ORC systems that can be adapted to high-performance data centers.

• Infinity Turbine: Specializes in small, scalable ORC systems that can recover waste heat from data centers and other industrial operations. Their systems convert waste heat into electrical power, helping to offset energy costs.

• Exergy: Offers ORC solutions that can generate electricity from waste heat in a variety of settings, including data centers, industrial plants, and renewable energy facilities.

• Pros: Waste heat recovery via ORC reduces the need for additional cooling infrastructure while producing usable electricity.

• Cons: ORC systems require an initial investment and may need to be tailored to specific heat loads and environmental conditions.

2. Absorption Chillers:

• Absorption chillers are another technology used to convert waste heat into cooling. These systems use heat (rather than electricity) to power a refrigeration cycle, making them ideal for data centers looking to reduce their reliance on traditional air conditioning systems.

• Yazaki and Thermax are leaders in absorption chiller technology, providing systems that can efficiently convert waste heat from GPUs into cooling for other areas of the data center or adjacent facilities.

• Pros: Absorption chillers are highly efficient in reducing cooling costs and can be integrated into existing heat management systems.

• Cons: The efficiency of absorption chillers depends on the quality and temperature of the waste heat, and they may not be suitable for all data center designs.

3. District Heating Networks:

• Some data centers are now connecting to district heating networks, where waste heat generated by GPUs and other equipment is captured and redistributed to provide heating for nearby buildings or industrial processes. This is a growing trend, particularly in regions with cold climates.

• Examples: Cities in Scandinavia and northern Europe are leading the way in using waste heat from data centers to heat homes, offices, and factories through district heating systems.

• Pros: Using waste heat for district heating helps data centers contribute to the local energy grid while reducing overall environmental impact.

• Cons: Implementing district heating requires proximity to buildings or facilities that can use the excess heat, which may not be practical for all data centers.
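
To compare the three recovery routes above on a common footing, the sketch below applies one conversion factor per route to the same stream of waste heat. The 5% ORC efficiency is taken from the low end of the efficiency ranges quoted in this article; the 0.7 chiller COP and the 80% district-heating capture fraction are assumptions chosen for illustration, and real values depend on the equipment and the temperature of the heat.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    orc_electric_kw: float     # electricity from an ORC generator
    chiller_cooling_kw: float  # cooling from an absorption chiller
    district_heat_kw: float    # heat delivered to a district network


def estimate_recovery(waste_heat_kw, orc_efficiency=0.05,
                      chiller_cop=0.7, capture_fraction=0.8):
    """Estimate useful output of each route for the same waste heat input.

    All three default factors are illustrative assumptions, not vendor data.
    """
    if waste_heat_kw < 0:
        raise ValueError("waste_heat_kw must be non-negative")
    return RecoveryEstimate(
        orc_electric_kw=waste_heat_kw * orc_efficiency,
        chiller_cooling_kw=waste_heat_kw * chiller_cop,
        district_heat_kw=waste_heat_kw * capture_fraction,
    )


if __name__ == "__main__":
    # Hypothetical facility rejecting 1 MW of heat.
    print(estimate_recovery(1000.0))
```

The routes are alternatives for the same heat, so the three outputs should be read as separate scenarios rather than added together.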

Conclusion: A Path Toward Efficient and Sustainable Data Centers

As the demand for high-performance GPUs like NVIDIA’s A100 and H100 continues to grow, so does the need for efficient cooling solutions and the recovery of waste heat. Air cooling with heat sinks is no longer sufficient for many advanced data centers, prompting the adoption of liquid cooling systems from companies like CoolIT and Asetek, as well as NVIDIA’s own liquid-cooled variants.

Beyond cooling, waste heat recovery technologies such as Organic Rankine Cycle systems, absorption chillers, and district heating networks are transforming how data centers manage energy consumption. These systems not only reduce cooling costs but also create opportunities to reuse waste heat for power generation or heating, contributing to a more sustainable future.

As we move forward, the integration of liquid cooling and waste heat recovery will become increasingly critical for data centers aiming to stay competitive, reduce their environmental impact, and maximize the performance of their GPU-intensive workloads.

Understanding Temperature Spread and Thermal Expansion in ORC and sCO2 Cycles

In both the Organic Rankine Cycle (ORC) and the supercritical CO2 (sCO2) cycle, the temperature spread between the heat source (input temperature) and the cooling temperature (condenser) directly influences thermal expansion of the working fluid, which is the key driving force for turbine power generation.

1. Role of Temperature Spread in Thermal Expansion:

The temperature spread determines the amount of thermal energy that can be transferred to the working fluid. This energy causes the working fluid (whether it's an organic refrigerant or supercritical CO2) to expand. The greater the temperature difference, the more the fluid expands, increasing the pressure drop across the turbine.

The turbine converts this thermal expansion and pressure drop into mechanical energy, which is then used to generate electricity via a generator.

Key Concept: The larger the temperature spread, the more potential energy the fluid has to expand, resulting in more mechanical energy that can be extracted by the turbine.
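
The link between temperature spread and extractable work can be bounded with the Carnot efficiency, a standard thermodynamic result that this article does not state explicitly. With absolute temperatures T_h for the heat source and T_c for the cooling side, no heat engine can exceed the limit below. The example temperature pairs are drawn from the ranges quoted in this article and are paired here only for illustration.

```latex
\[
\eta_{\text{Carnot}} = 1 - \frac{T_c}{T_h}
\]
\[
\begin{aligned}
60\,^{\circ}\mathrm{C}\ \text{source},\ 20\,^{\circ}\mathrm{C}\ \text{sink}: &\quad 1 - \frac{293.15}{333.15} \approx 0.12 \\
300\,^{\circ}\mathrm{C}\ \text{source},\ 20\,^{\circ}\mathrm{C}\ \text{sink}: &\quad 1 - \frac{293.15}{573.15} \approx 0.49 \\
700\,^{\circ}\mathrm{C}\ \text{source},\ 31\,^{\circ}\mathrm{C}\ \text{sink}: &\quad 1 - \frac{304.15}{973.15} \approx 0.69
\end{aligned}
\]
```

Real ORC and sCO2 systems reach only a fraction of these limits, but the trend matches the key concept: a larger spread raises the ceiling on how much heat can become work.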

---

2. Thermal Expansion in the Organic Rankine Cycle (ORC):

In an ORC, the working fluid is typically an organic substance with a low boiling point, such as R245fa or R134a. Here's how the temperature spread translates into thermal expansion:

Input Heat and Expansion:

• In the evaporator, the organic working fluid absorbs heat from the heat source, causing it to vaporize and expand. This vaporized fluid is then routed through the turbine.

• The degree of expansion depends on the temperature of the heat source. In typical ORC applications (such as waste heat recovery or geothermal energy), heat sources range from 70°C to 300°C (158°F to 572°F).

Cooling and Pressure Drop:

• After expanding through the turbine, the working fluid is condensed back to a liquid in the condenser (typically at 20°C to 50°C).

• The combination of high-temperature vaporization and low-temperature condensation results in a pressure drop across the turbine, which drives the turbine blades.

Impact of Temperature Spread on Thermal Expansion:

• In ORC systems, higher temperature spreads (greater difference between input heat and condenser cooling) lead to greater thermal expansion and a higher pressure drop across the turbine.

• A larger temperature spread enhances the ability of the working fluid to convert heat into mechanical energy, improving overall cycle efficiency.

---

3. Thermal Expansion in the Supercritical CO2 (sCO2) Cycle:

The supercritical CO2 cycle operates using CO2 at pressures and temperatures beyond its critical point (about 31°C and 7.38 MPa), where CO2 is a single phase that combines liquid-like density with gas-like flow behavior. These properties allow it to handle higher temperatures and pressures efficiently, making it suitable for high-temperature waste heat recovery or concentrated solar power.

Input Heat and Expansion:

• In the heater (or heat exchanger), the supercritical CO2 absorbs heat, causing it to expand significantly as it transitions from a dense supercritical state to a more expanded state. The input heat typically ranges from 200°C to 700°C (392°F to 1,292°F).

• Because CO2 is in its supercritical state, the expansion is more rapid and powerful than in typical ORC systems. This results in a high-pressure gas driving the turbine.

Cooling and Pressure Drop:

• After expanding through the turbine, the CO2 is cooled in a cooling system (such as an air or water cooler) to temperatures around 31°C to 50°C (88°F to 122°F).

• The pressure drop from the supercritical state back toward its critical point creates a significant pressure differential, which drives the turbine.

Impact of Temperature Spread on Thermal Expansion:

• In the sCO2 cycle, a larger temperature spread leads to more efficient thermal expansion because CO2 can achieve much higher pressure differences when exposed to high temperatures.

• The thermal expansion and the associated pressure drop across the turbine are maximized in high-temperature applications (e.g., 500°C to 700°C), where the working fluid can expand significantly, creating a powerful driving force for the turbine.

---

Comparing Thermal Expansion in ORC vs. sCO2 Cycles:

Organic Rankine Cycle (ORC):

• Moderate thermal expansion: The expansion of the organic fluid is moderate compared to sCO2, as ORC fluids vaporize at lower temperatures.

• Temperature spread: 50°C to 280°C (a difference of 90°F to 504°F).

• Efficiency: Typically 5-20%, depending on the temperature difference.

• Best for: Low- to medium-temperature waste heat (e.g., industrial waste heat, geothermal heat, and data center waste heat).

Supercritical CO2 (sCO2) Cycle:

• Significant thermal expansion: CO2 in its supercritical state expands significantly with a high pressure drop. The expansion is more powerful than in ORC systems, especially at higher temperatures.

• Temperature spread: 150°C to 670°C (a difference of 270°F to 1,206°F).

• Efficiency: Typically 30-50%, depending on temperature spread.

• Best for: High-temperature waste heat recovery (e.g., solar thermal, gas turbine exhaust, or industrial high-heat processes).
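
The temperature spreads listed above appear to follow from the operating ranges given earlier: spread is the input temperature minus the cooling temperature. The sketch below reproduces them and converts to Fahrenheit. A temperature difference converts with the 9/5 factor only, without the 32-degree offset used for absolute readings. The pairing of range endpoints is an assumption about how the spreads were derived.

```python
def c_to_f(temp_c):
    """Convert an absolute temperature reading from Celsius to Fahrenheit."""
    return temp_c * 9 / 5 + 32


def delta_c_to_f(delta_c):
    """Convert a temperature difference from Celsius degrees to Fahrenheit degrees."""
    return delta_c * 9 / 5


def spread_c(source_c, sink_c):
    """Temperature spread between the heat source and the cooling side, in Celsius degrees."""
    if source_c <= sink_c:
        raise ValueError("source must be hotter than sink")
    return source_c - sink_c


if __name__ == "__main__":
    # (source, sink) pairs assumed from the ranges quoted in this article.
    cases = {
        "ORC, smallest spread": (70, 20),
        "ORC, largest spread": (300, 20),
        "sCO2, smallest spread": (200, 50),
        "sCO2, largest spread": (700, 31),  # 669, which the article rounds to 670
    }
    for label, (source, sink) in cases.items():
        d = spread_c(source, sink)
        print(f"{label}: {d} C difference = {delta_c_to_f(d):.0f} F difference")
```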

---

Impact on Data Center Waste Heat (90°F to 140°F / 32°C to 60°C):

• ORC: With the lower-temperature waste heat produced by data centers (32°C to 60°C), ORC is the better match. Organic fluids can still vaporize and produce a usable pressure drop at these temperatures, but efficiency will be low. The Carnot limit for a 60°C source and a 20°C condenser is about 12%, so practical efficiency is likely to sit at or below the low end of the 5-10% range.

• sCO2: The temperature spread in data centers is likely too small for an sCO2 cycle to operate efficiently. The expansion of CO2 would be limited, and the cycle would not achieve its high efficiency without higher input temperatures.
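
To put numbers on the data center case, the sketch below sweeps source temperatures across the 32°C to 60°C range and reports the Carnot ceiling alongside a rough practical estimate. The 20°C condenser temperature and the assumption that a real cycle reaches 50% of the Carnot limit are illustrative choices, not measured values.

```python
def carnot_efficiency(source_c, sink_c):
    """Upper bound on heat-to-work efficiency between two temperatures given in Celsius."""
    t_hot = source_c + 273.15
    t_cold = sink_c + 273.15
    if t_hot <= t_cold:
        raise ValueError("source must be hotter than sink")
    return 1.0 - t_cold / t_hot


def practical_efficiency(source_c, sink_c, fraction_of_carnot=0.5):
    """Rough estimate: an assumed fixed fraction of the Carnot limit."""
    return carnot_efficiency(source_c, sink_c) * fraction_of_carnot


if __name__ == "__main__":
    sink_c = 20  # assumed condenser temperature
    for source_c in (32, 40, 50, 60):
        limit = carnot_efficiency(source_c, sink_c)
        estimate = practical_efficiency(source_c, sink_c)
        print(f"{source_c} C source: Carnot limit {limit:.1%}, rough estimate {estimate:.1%}")
```

Even at the top of the range the ceiling is modest, which is why the size of the heat stream, rather than conversion efficiency, tends to determine whether recovery is worthwhile.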

---

Conclusion:

• ORC systems excel in environments with low to moderate temperature spreads, such as data centers, where the waste heat temperature is 90°F to 140°F (32°C to 60°C). The thermal expansion of the organic fluid in ORC systems, though moderate, is enough to generate power from low-grade heat.

• sCO2 systems require a much larger temperature spread to maximize thermal expansion. These systems are more efficient for high-temperature applications, but for low-temperature waste heat like that from data centers, the thermal expansion would be limited, making sCO2 less ideal for such applications.

For data center waste heat recovery, ORC would be the preferred choice due to its ability to operate efficiently with smaller temperature spreads.

Differences Between Organic Rankine Cycle (ORC) and Supercritical CO2 (sCO2) Cycle

Both the Organic Rankine Cycle (ORC) and the Supercritical CO2 (sCO2) Cycle are used for converting heat into electricity, particularly for waste heat recovery. While they share similarities, they differ in working fluid, operating conditions, and applications.

---

Organic Rankine Cycle (ORC) Overview:

The ORC is a variant of the traditional Rankine cycle but uses organic fluids (such as R245fa, R134a, or isobutane) instead of water/steam. These organic fluids have lower boiling points, making ORC well-suited for low-temperature heat sources.

• Typical Operating Range:

• Input heat: 70°C to 300°C (158°F to 572°F)

• Condenser temperature: 20°C to 50°C (68°F to 122°F)

• Working Fluids: Organic fluids like R134a, R245fa, or isopentane.

Advantages of ORC:

1. Lower Temperature Input: ORC is ideal for low-grade heat sources, such as data center waste heat or geothermal sources. It can operate with input temperatures as low as 70°C (158°F).

2. Scalability: ORC systems are well-suited for smaller installations and can be customized for a wide range of applications, from industrial processes to small-scale energy recovery.

3. Low Condenser Cooling Temperature: ORC systems typically operate at lower condensation temperatures, which can make them more compatible with standard cooling solutions like air or water cooling.

4. Simple Design and Maintenance: ORC systems are mature technologies, with simpler designs compared to sCO2 systems, making them easier to maintain.

---

Supercritical CO2 (sCO2) Cycle Overview:

The sCO2 cycle uses carbon dioxide as the working fluid, operating above its critical point (supercritical phase), where CO2 exhibits unique thermodynamic properties. The cycle is highly efficient, especially for higher-temperature heat sources.

• Typical Operating Range:

• Input heat: 200°C to 700°C (392°F to 1,292°F)

• Cooling (heat rejection) temperature: 31°C to 50°C (88°F to 122°F). CO2 is supercritical only above about 31°C and 7.38 MPa.

• Working Fluid: Supercritical CO2 (CO2 in a state where it is neither liquid nor gas but exhibits properties of both).
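
The supercritical state depends on pressure as well as temperature: both must exceed the critical point. A minimal check, using commonly tabulated critical-point values for CO2 (about 30.98°C and 7.377 MPa), might look like the sketch below. The function name is hypothetical, and the check ignores the strong property variations that occur close to the critical point.

```python
CO2_CRITICAL_TEMP_C = 30.98        # critical temperature of CO2, degrees Celsius
CO2_CRITICAL_PRESSURE_MPA = 7.377  # critical pressure of CO2, megapascals


def is_supercritical_co2(temp_c, pressure_mpa):
    """Return True when both temperature and pressure exceed the CO2 critical point."""
    return temp_c > CO2_CRITICAL_TEMP_C and pressure_mpa > CO2_CRITICAL_PRESSURE_MPA


if __name__ == "__main__":
    # Hypothetical operating points for illustration.
    for temp_c, pressure_mpa in [(35.0, 8.0), (25.0, 8.0), (35.0, 5.0)]:
        state = "supercritical" if is_supercritical_co2(temp_c, pressure_mpa) else "not supercritical"
        print(f"{temp_c} C, {pressure_mpa} MPa: {state}")
```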

Advantages of sCO2:

1. Higher Efficiency: sCO2 systems are more efficient at converting heat to electricity, especially at higher temperatures (above 200°C), with efficiencies around 30-50% compared with 5-20% for typical ORC systems.

2. Compact Design: Because of the high density of supercritical CO2, sCO2 systems can be smaller and lighter than ORC systems, allowing for more compact installations.

3. High-Temperature Compatibility: sCO2 can handle much higher temperatures than ORC, making it more suitable for industrial waste heat or solar thermal applications.

4. Better Efficiency at Moderate Cooling Temperatures: sCO2 systems can operate effectively even with relatively high cooling temperatures (around 31°C (88°F)), making them useful for hot climates or applications where cooling water is limited.

---

Input Heat and Condenser Cooling Temperature Comparison:

• Input Heat:

• ORC: Effective for low to moderate heat sources, 70°C to 300°C (158°F to 572°F).

• sCO2: Requires higher temperatures, typically 200°C to 700°C (392°F to 1,292°F), to maximize efficiency.

• Condenser Cooling Temperature:

• ORC: Generally needs a low condenser temperature, around 20°C to 50°C (68°F to 122°F). ORC is sensitive to the cooling temperature because the condensation of the organic fluid needs to be efficiently managed to maintain performance.

• sCO2: Rejects heat at higher temperatures, because the CO2 must stay near or above its critical temperature of 31°C (88°F). This makes sCO2 more resilient to warmer cooling sources, like air cooling in hot climates.

---

Suitability for Data Center Waste Heat Cooling (90°F to 140°F):

Data centers typically generate waste heat in the range of 90°F to 140°F (32°C to 60°C). Both ORC and sCO2 systems could potentially recover this heat, but their effectiveness varies.

ORC Advantages for Data Center Waste Heat:

1. Better suited for low-grade heat: ORC systems can work with waste heat in the range of 32°C to 60°C (90°F to 140°F), although this is below the 70°C lower bound of the typical operating range, so output per unit of heat will be modest. This still makes ORC the more natural fit for data centers, where waste heat temperatures tend to be lower.

2. Cost-Effective and Scalable: ORC systems are cheaper to install and maintain, and they can be scaled for small- to medium-sized data centers, where low-grade heat is abundant.

sCO2 Advantages for Data Center Waste Heat:

1. Potential for High Efficiency: sCO2 systems are most efficient at higher temperatures. At around 60°C (140°F) the source sits less than 30°C above the 31°C critical temperature, so any efficiency advantage depends heavily on an optimized design. The high density of CO2 still allows for compact system designs, which can be a significant advantage in space-constrained environments like data centers.

2. Higher Operating Temperature Tolerance: sCO2 systems can operate efficiently with cooling temperatures closer to 31°C (88°F), which is helpful in hot climates or where water cooling is expensive.

Challenges for Each System:

• ORC: The efficiency of ORC at temperatures below 100°C (212°F) tends to be relatively low, generally around 5-15%.

• sCO2: The primary challenge is that sCO2 systems work best at higher temperatures, so they would be less efficient for data center heat recovery at the 90-140°F range unless optimized.

---

Conclusion: Which System is Better for Data Center Waste Heat?

• ORC is likely the better option for most data centers, as it can efficiently capture and convert low-grade heat, which is typical in these environments. ORC systems are more cost-effective, scalable, and better suited to the 90-140°F range.

• sCO2 systems are more efficient in higher-temperature applications and are more compact, but they may require modifications to function optimally with lower-temperature waste heat like that from data centers.

In summary, for data centers operating with waste heat between 90°F and 140°F, ORC would generally be the more efficient and practical solution. However, if higher heat inputs can be concentrated or if space-saving is a critical concern, an optimized sCO2 system could also be considered.
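
The selection logic in this summary can be condensed into a rule of thumb based on source temperature alone. The thresholds below come from the operating ranges quoted earlier in this article (ORC up to 300°C, sCO2 from 200°C); the function name and the decision to ignore cost, footprint, and cooling conditions are simplifying assumptions.

```python
SCO2_MIN_SOURCE_C = 200  # lower end of the sCO2 input range quoted in the article
ORC_MAX_SOURCE_C = 300   # upper end of the ORC input range quoted in the article


def suggest_cycle(source_c):
    """Suggest a heat-to-power cycle from the source temperature alone (a rough sketch)."""
    if source_c < SCO2_MIN_SOURCE_C:
        return "ORC"
    if source_c <= ORC_MAX_SOURCE_C:
        return "ORC or sCO2"  # the quoted ranges overlap here
    return "sCO2"


if __name__ == "__main__":
    # 60 C represents data center waste heat; the other values are hypothetical sources.
    for source_c in (60, 250, 500):
        print(f"{source_c} C source: {suggest_cycle(source_c)}")
```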
