Microsoft's GPU Power Crisis: The "Warm Shells" Bottleneck

Research Report | November 7, 2025 | Based on BG2 Pod Podcast (October 31, 2025)

Executive Summary

On October 31, 2025, Microsoft CEO Satya Nadella revealed a critical infrastructure bottleneck: Microsoft has billions of dollars' worth of AI GPUs "sitting in inventory" that cannot be deployed due to insufficient power infrastructure.

  • The Problem: Nadella stated, "you may actually have a bunch of chips sitting in inventory that I can't plug in. In fact, that is my problem today."
  • Technical Context: GPU power consumption tripled in 3-4 years (400W → 1,200W), while data center infrastructure requires 2-4 years to build.
  • Scale: Microsoft deployed 2 GW in 2025, operates 400+ facilities, yet still cannot deploy all procured GPUs.
  • Strategic Shift: Microsoft no longer wants to buy GPUs "beyond one generation"; procurement is now aligned with power availability (see the sketch below).
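
A minimal sketch of the power arithmetic behind that shift, using the report's own figures (roughly 1,200 W per current-generation GPU, 2 GW deployed in 2025); the PUE and non-GPU power share are illustrative assumptions, not reported numbers:

    # Rough sketch: how many GPUs can a fixed power budget actually energize?
    # 1,200 W per GPU and the 2 GW deployment figure come from the report;
    # PUE and the non-GPU share of rack power are illustrative assumptions.
    GPU_WATTS = 1_200        # per-GPU draw cited in the report
    PUE = 1.3                # assumed power usage effectiveness (cooling, conversion losses)
    NON_GPU_SHARE = 0.15     # assumed fraction of IT power used by CPUs, networking, storage

    def deployable_gpus(site_power_watts: float) -> int:
        """Estimate how many GPUs a total site power budget can keep energized."""
        it_power = site_power_watts / PUE            # subtract facility overhead
        gpu_power = it_power * (1 - NON_GPU_SHARE)   # subtract non-GPU IT gear
        return int(gpu_power // GPU_WATTS)

    # Microsoft's 2025 deployment of 2 GW, per the report:
    print(f"{deployable_gpus(2e9):,} GPUs")          # ~1.1 million under these assumptions

Under the same assumptions, each additional gigawatt of powered capacity translates into roughly half a million more GPUs that can actually be plugged in, which is why procurement is being re-tied to power availability.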

The BG2 Pod Revelation

On the BG2 Pod podcast hosted by Brad Gerstner and Bill Gurley, Satya Nadella (Microsoft CEO) and Sam Altman (OpenAI CEO) discussed the $3 trillion AI infrastructure buildout. Nadella's candid admission about idle GPUs marked a watershed moment in the AI industry.

"The biggest issue we are now having is not a compute glut, but it's power – it's sort of the ability to get the builds done fast enough close to power. So, if you can't do that, you may actually have a bunch of chips sitting in inventory that I can't plug in. In fact, that is my problem today."

— Satya Nadella, Microsoft CEO

"It's not a supply issue of chips; it's actually the fact that I don't have warm shells to plug into."

— Satya Nadella, Microsoft CEO

GPU Power Consumption Evolution (2021-2025)

GPU power requirements have tripled in just 3-4 years, creating an unprecedented infrastructure challenge.

Understanding "Warm Shells"

Nadella borrowed the term "warm shells" from commercial real estate, where it refers to a building ready for immediate occupancy with all basic infrastructure in place. For data centers, a "warm shell" includes:

  • Electrical Infrastructure: High-voltage power delivery capable of 120+ kW per rack (vs. legacy 10-20 kW)
  • Cooling Systems: Liquid cooling infrastructure (air cooling is inadequate above roughly 50 kW per rack)
  • Water Supply: 300,000+ gallons daily for cooling operations
  • Grid Connection: Multi-gigawatt connections to the electrical grid
  • Backup Power: Uninterruptible power supplies and generators

The Gap: Microsoft has procured tens of thousands of GPUs (estimated $3-10+ billion in inventory) but lacks "warm shell" facilities to house them. The timeline mismatch is stark:

  • GPU Procurement: 6-12 months from order to delivery
  • Power Infrastructure: 2-4 years for grid connections, approvals, and construction
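
A minimal sketch of that mismatch, using only the lead-time ranges quoted above; the idle-window arithmetic is the only addition:

    # Sketch of the mismatch between GPU delivery and "warm shell" readiness.
    # Lead-time ranges are the ones quoted above (6-12 months vs. 2-4 years).
    gpu_lead_months = (6, 12)      # order -> GPU delivery
    shell_lead_months = (24, 48)   # decision -> powered, cooled shell

    best_case_idle = shell_lead_months[0] - gpu_lead_months[1]    # fast shell, slow GPUs
    worst_case_idle = shell_lead_months[1] - gpu_lead_months[0]   # slow shell, fast GPUs

    print(f"GPUs ordered alongside a new shell can sit idle for roughly "
          f"{best_case_idle}-{worst_case_idle} months")           # ~12-42 months

With an estimated $3-10+ billion of hardware already in inventory, even a one- to three-year idle window represents substantial stranded capital, which is the gap Nadella is describing.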

Data Center Power Density Evolution

Year | Average Power/Rack | AI Workload Rack Power | Infrastructure Challenge
2023 | 36 kW | 50-80 kW | Air cooling strained
2025 | 40 kW | 120-142 kW | Liquid cooling mandatory
2027 | 50 kW | 240+ kW | Advanced liquid cooling essential
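
A minimal sketch of what those densities mean for a single facility; the per-rack figures come from the table above, while the 100 MW IT power envelope is an illustrative assumption:

    # Sketch: how many AI racks a fixed facility power envelope supports as density rises.
    # Per-rack power comes from the table above; the 100 MW envelope is an assumption.
    FACILITY_MW = 100            # hypothetical facility IT power envelope
    DENSITIES_KW = {
        "2023 AI rack": 80,      # upper end of the 50-80 kW range
        "2025 AI rack": 142,     # upper end of the 120-142 kW range
        "2027 AI rack": 240,
    }

    for label, kw in DENSITIES_KW.items():
        racks = int(FACILITY_MW * 1_000 // kw)
        print(f"{label}: {racks:,} racks within {FACILITY_MW} MW")

The same power envelope supports roughly a third as many 2027-class racks as 2023-class racks, which is why new shells must be engineered for far higher per-rack power and liquid cooling from day one.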

Global AI Data Center Power Demand Projections

Power demand is growing exponentially, while infrastructure buildout can only scale roughly linearly.

Microsoft's Infrastructure Scale

  • 2025 Power Deployment: 2 GW (enough to power ~1.5 million homes)
  • Total Facilities: 400+ data centers globally
  • Q1 2026 Spending: $11.1B in quarterly lease spending

Industry AI Infrastructure Spending (2025)

Major tech companies are investing unprecedented amounts, yet all face similar power constraints.

Energy Solutions and Timelines

Nuclear Power: Three Mile Island Deal

  • Capacity: 835 MW
  • Expected Online: 2028
  • Investment: $16 Billion
  • Contract Duration: 20 Years
  • Partner: Constellation Energy
  • Geographic coverage: PJM Interconnection (Pennsylvania, Chicago, Virginia, Ohio)
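
A rough cost check on those figures; the capacity, contract length, and $16 billion number are from the deal as reported, while the 90% capacity factor and the treatment of the investment as the all-in contract cost are assumptions:

    # Back-of-the-envelope check on the Three Mile Island deal figures above.
    # Capacity, contract length, and the $16B figure are as reported;
    # the 90% capacity factor and "investment = total contract cost" are assumptions.
    capacity_mw = 835
    years = 20
    investment_usd = 16e9
    capacity_factor = 0.90       # assumed, typical for a nuclear unit

    mwh_delivered = capacity_mw * 8_760 * years * capacity_factor
    print(f"~{mwh_delivered / 1e6:.0f} TWh over the contract")         # ~132 TWh
    print(f"implied cost ~${investment_usd / mwh_delivered:.0f}/MWh")  # ~$122/MWh

That implied figure sits well above typical wholesale power prices, which underscores how much a buyer whose binding constraint is power rather than chips will pay for 20 years of firm, carbon-free capacity.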

Natural Gas

  • Currently supplies 40%+ of U.S. data center electricity
  • Chevron + GE Vernova partnership: Up to 4 GW capacity by 2027
  • Constraint: Gas turbines sold out through 2030
  • Consumer Impact: PJM market saw $9.3B price increase; +$18/month (Maryland), +$16/month (Ohio)

Energy Source Timeline Comparison

Energy Source | Decision to Power | Key Constraints | Microsoft Status
Nuclear (Restart) | 3-4 years | Regulatory approval | Three Mile Island 2028
Natural Gas | 2-3 years | Turbine availability | Open to using
Behind-the-Meter Gas | 1-2 years | Local permits | Likely pursuing
Grid Connection | 2-4 years | Utility approval | Primary bottleneck
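
A minimal sketch of what those lead times imply, assuming (hypothetically) that a decision is made at the start of 2026; the ranges themselves come from the table above:

    # Sketch: earliest year each option could deliver power for a decision made in 2026.
    # Lead-time ranges come from the table above; the 2026 decision year is an assumption.
    DECISION_YEAR = 2026
    lead_times_years = {
        "Nuclear (restart)": (3, 4),
        "Natural gas": (2, 3),
        "Behind-the-meter gas": (1, 2),
        "Grid connection": (2, 4),
    }

    for source, (lo, hi) in lead_times_years.items():
        print(f"{source}: online {DECISION_YEAR + lo}-{DECISION_YEAR + hi}")

Behind-the-meter gas is the only option that can bridge the gap within a year or two, which helps explain why hyperscalers pursue on-site generation even as they sign longer-dated nuclear and grid deals.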

Implications for Nvidia and the GPU Market

Bear Case for Nvidia

  • Largest customers facing deployment constraints
  • Demand visibility reduced beyond current generation
  • Potential "compute glut" if infrastructure lags
  • Performance-per-watt becomes a key selection criterion

Bull Case for Nvidia

  • Long-term AI demand trajectory intact
  • Infrastructure will eventually catch up
  • Technological lead remains substantial
  • CUDA ecosystem creates switching costs

Winners from the Power Constraint

Energy Sector

  • Natural gas companies (EQT Corp)
  • Nuclear power (Constellation Energy)
  • Utilities with expansion capacity

Technology Sector

  • Liquid cooling providers
  • Energy-efficient chip designers
  • Firms with data center power expertise

Conclusion: The Infrastructure Race

Satya Nadella's October 31, 2025 podcast appearance marked a watershed moment in the AI infrastructure buildout. The constraint on AI deployment has fundamentally shifted from silicon availability to power infrastructure.

Microsoft and other hyperscalers have procured billions of dollars' worth of advanced AI GPUs that sit idle because data centers lack the electrical power and cooling infrastructure to deploy them. This represents a timeline mismatch: GPU power consumption tripled in 3-4 years (400W → 1,200W), while data center infrastructure requires 2-4 years minimum to build.

The Strategic Shift: Companies are changing procurement strategies to align GPU purchases with power availability rather than stockpiling hardware. Competition now centers on securing energy infrastructure—power purchase agreements, nuclear plant output, natural gas resources, and grid connections—not just chip procurement.

The remainder of the 2020s will be defined by this infrastructure buildout race. The AI revolution will be powered not by the semiconductor fab, but by the electrical grid.
