"The bottleneck is never where the hype says it is."

The AI infrastructure buildout is the largest capital deployment cycle in the history of computing. Hundreds of billions of dollars are moving through a supply chain that most investors understand only at the top layer — the model, the chip, the hyperscaler. The real constraints, the durable margins, and the overlooked risks live further down the stack.

This field guide maps the entire chain, names the dominant players at each layer, identifies where the real bottlenecks are in 2026, and gives you five signals worth watching.

Part I: The Stack

AI infrastructure runs on a vertical supply chain with seven identifiable layers. Each layer depends on the one below it. Disruption at any layer propagates upward.

Layer 7 — AI Applications & Models
The visible layer: ChatGPT, Gemini, Claude, Grok, Midjourney. These are the products consumers and enterprises interact with. This layer captures significant margin but owns almost none of the physical infrastructure.

Layer 6 — Cloud & Inference Services
The hyperscalers — AWS, Microsoft Azure, Google Cloud, Oracle — rent compute to everyone else. This is where most enterprise AI spending lands. Custom silicon programs (Graviton, Trainium, TPU, Maia) are designed here to reduce dependence on Layer 5.

Layer 5 — Accelerators & Networking
The GPU layer. NVIDIA dominates with ~80% revenue share of AI accelerator shipments (H100, H200, B200 Blackwell family). AMD is the credible alternative (MI300X, MI325X). Intel's Gaudi is a distant third. InfiniBand and Ethernet networking at 400G/800G speeds are equally critical — and equally scarce.

Layer 4 — Advanced Packaging & HBM
High-Bandwidth Memory (HBM) is stacked DRAM bonded directly to a GPU die. It is as critical as the GPU itself and nearly as scarce. SK Hynix, Micron, and Samsung are the only suppliers. TSMC's CoWoS advanced packaging is the bottleneck within the bottleneck.

Layer 3 — Semiconductor Fabrication
TSMC produces approximately 90% of all leading-edge logic chips (3nm, 5nm). ASML's EUV lithography machines — roughly 60 shipped globally per year — are the single-point dependency enabling everything at 7nm and below.

Layer 2 — Raw Materials & Substrate
Chipmaking requires ultra-pure silicon wafers, specialty gases, photoresists, and rare earth elements. This layer is unglamorous and chronically under-analyzed.

Layer 1 — Power & Physical Infrastructure
Data centers run on electricity. A hyperscale AI training cluster at 100 MW draws more power than a mid-sized American city. Everything above it is contingent on watts delivered reliably.
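The arithmetic behind that 100 MW figure is worth making explicit. The sketch below is a back-of-envelope estimate; the GPU count, TDP, overhead share, and PUE are illustrative assumptions, not measurements from any specific facility:

```python
# Back-of-envelope power draw for a hypothetical training cluster.
# Every figure below is an illustrative assumption.
GPU_COUNT = 100_000   # hypothetical cluster size
GPU_TDP_W = 700       # per-accelerator TDP, H100-class assumption
OVERHEAD = 0.10       # CPUs, networking, storage as a fraction of GPU power
PUE = 1.25            # assumed power usage effectiveness (cooling, conversion losses)

it_load_w = GPU_COUNT * GPU_TDP_W * (1 + OVERHEAD)
facility_w = it_load_w * PUE
print(f"IT load:       {it_load_w / 1e6:.0f} MW")   # ~77 MW
print(f"Facility draw: {facility_w / 1e6:.0f} MW")  # ~96 MW
```

Under these assumptions, a 100,000-GPU cluster lands right around the 100 MW mark — before accounting for redundancy or future expansion headroom.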

Part II: Key Players by Layer

Applications: OpenAI, Google DeepMind, Anthropic, Meta AI — win on data, talent, and distribution.

Cloud/Inference: AWS, Azure, GCP, Oracle — win on capital, customer lock-in, and custom silicon.

Accelerators: NVIDIA (~80%), AMD, Intel Gaudi — CUDA ecosystem moat; ROCm catching up.

HBM: SK Hynix (~50%), Samsung (~35%), Micron (~15%) — 3–5 year fab investment lead times.

Advanced Packaging: TSMC CoWoS (monopoly on leading-edge) — process IP, yield expertise.

Foundry: TSMC (~90% leading edge), Samsung — process node leadership, ASML EUV access.

EUV Equipment: ASML (global monopoly) — decades of optical physics IP.

Raw Materials: Shin-Etsu, Sumco (wafers); Linde, Air Products (gases) — scale, purity standards, long-term contracts.

Power & Grid: Vertiv, Eaton, ABB (power mgmt); Quanta, Fluor (construction) — hardware complexity and long lead times.

NVIDIA's structural position deserves elaboration. It is not simply the chip — it is the CUDA software stack, the developer ecosystem, and the NVLink interconnect that binds GPUs into a coherent cluster. A customer switching from NVIDIA to AMD does not just swap hardware; they rewrite libraries, retrain engineers, and accept higher software integration risk. This switching cost is the actual moat.

TSMC's CoWoS monopoly on advanced packaging is the least-discussed constraint in the supply chain. CoWoS capacity expansion requires 18–24 months of lead time. Every AI chip shipped is also a CoWoS allocation.
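The "allocation" logic is simple min-of-two-constraints arithmetic. All capacity and yield figures below are hypothetical assumptions chosen for illustration, not disclosed TSMC numbers:

```python
# Sketch of why packaging, not wafer starts, can be the binding constraint.
# All capacity and die-count figures are hypothetical assumptions.
cowos_wafers_per_month = 35_000   # assumed industry-wide CoWoS capacity
gpus_per_cowos_wafer = 16         # assumed packaged GPUs per CoWoS wafer
logic_wafers_per_month = 60_000   # assumed leading-edge logic wafer starts
gpus_per_logic_wafer = 24         # assumed good dies per logic wafer

packaging_limit = cowos_wafers_per_month * gpus_per_cowos_wafer
fab_limit = logic_wafers_per_month * gpus_per_logic_wafer
shipments = min(packaging_limit, fab_limit)
binding = "packaging" if packaging_limit < fab_limit else "fab"
print(f"Shippable GPUs/month: {shipments:,} (binding constraint: {binding})")
```

Whenever the packaging limit sits below the fab limit — as in this hypothetical — adding wafer capacity does nothing for shipments. That is the sense in which every AI chip is a CoWoS allocation.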

Part III: The Bottlenecks — Where the Constraints Actually Are in 2026

1. Electrical Transformers
The unsexy constraint nobody in VC talks about. Large power transformers for data center substations have lead times of 2–4 years. US manufacturing capacity is limited; much of the supply chain runs through Europe and Asia. Grid interconnection queues in Texas, Virginia, and the Pacific Northwest are measured in years, not months. No transformer delivery = no data center commissioning.

2. CoWoS Packaging Capacity
TSMC remains the only company with volume-production CoWoS capability. It is adding capacity, but new lines are absorbed by the NVIDIA Blackwell ramp and Apple silicon simultaneously. This constrains total AI chip shipments regardless of wafer availability.

3. HBM Supply
SK Hynix, Samsung, and Micron are all ramping HBM3E, but HBM production uses DRAM fab capacity that was previously serving commodity DRAM markets. HBM yields at leading-edge stack heights (12-hi, 16-hi) remain a manufacturing challenge. Any yield shortfall at SK Hynix cascades directly to NVIDIA GPU availability.

4. Skilled Labor in Data Center Construction
Electricians, ironworkers, and mechanical contractors specializing in data center build-outs are fully committed in every major US market. Northern Virginia — the world's largest data center cluster — has active projects totaling tens of billions of dollars competing for the same trade workforce. Schedule slippage is the norm, not the exception.

5. Water Rights & Cooling
AI training clusters run hot. With per-accelerator TDPs now exceeding 1,000W and rack densities climbing past 100 kW, liquid cooling is increasingly the only viable solution. Hyperscalers are deploying direct-to-chip liquid cooling and immersion systems at scale, but these require significant water resources in geographies (Arizona, Texas) facing long-term water stress. Water permitting is an emerging constraint that does not appear in most supply chain analyses.
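The water numbers fall out of basic thermodynamics. In the sketch below, the latent heat of vaporization is physics; the heat load and the share of heat rejected evaporatively are assumptions:

```python
# Rough evaporative-cooling water demand for a large AI facility.
# Heat load and evaporative fraction are assumptions; latent heat is physics.
HEAT_LOAD_W = 100e6            # assumed facility heat to reject (100 MW)
LATENT_HEAT_J_PER_KG = 2.26e6  # latent heat of vaporization of water
EVAP_FRACTION = 0.8            # assumed share of heat rejected by evaporation

kg_per_s = HEAT_LOAD_W * EVAP_FRACTION / LATENT_HEAT_J_PER_KG
liters_per_day = kg_per_s * 86_400  # 1 kg of water ~ 1 liter
print(f"~{liters_per_day / 1e6:.1f} million liters/day")  # ~3.1M L/day
```

Roughly three million liters a day, every day, for a single 100 MW campus — which is why siting decisions in water-stressed regions are becoming a permitting fight, not a formality.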

Part IV: What to Watch — 5 Forward-Looking Signals

Signal 1: TSMC CoWoS Capacity Utilization
TSMC doesn't publish CoWoS utilization directly, but quarterly earnings calls and supply chain analysts (SemiAnalysis, TechInsights) routinely surface lead time indicators. When CoWoS lead times compress, AI chip availability loosens. Watch TSMC's packaging revenue as a percentage of total revenue — it should grow materially through 2026–2027.

Signal 2: HBM Spot Pricing
Pricing signals emerge in quarterly earnings from SK Hynix, Samsung, and Micron. SK Hynix's HBM gross margins are a direct proxy for supply tightness. Margin compression = supply loosening. Margin expansion = still constrained.

Signal 3: Utility Grid Interconnection Queue Volume
The DOE's Lawrence Berkeley National Laboratory publishes periodic analyses of national grid interconnection queues (its "Queued Up" series). Virginia, Texas, and Georgia queues are the ones to watch — they're a leading indicator of where buildout activity is concentrated and how long commissioning delays will run.

Signal 4: NVIDIA Revenue Recognition vs. Booking Commentary
NVIDIA books revenue on shipment, not on order. When NVIDIA guides conservatively on next-quarter revenue despite strong demand commentary, the bottleneck is physical, not demand-side. That is a signal to investigate upstream.

Signal 5: Electrical Transformer Manufacturing Capacity
Hitachi Energy, ABB, Eaton, and SPX Transformer Solutions are the primary large transformer manufacturers serving North America. Any announced capacity expansions or M&A activity in this sector signals the grid infrastructure bottleneck is being taken seriously by capital markets. This is a 3–5 year signal — but it matters for long-duration infrastructure investors.

How to Use This Guide

This field guide is a foundation, not a forecast. The supply chain changes faster than any static document can track. What doesn't change is the analytical framework: follow the physical constraints, not the narrative. The chips get the headlines; the transformers, the CoWoS allocations, and the HBM yields determine the outcome.

Supply Side AI publishes every Tuesday (deep dive) and Friday (data roundup) with current intelligence on these layers.

Supply Side AI is an independent newsletter. All analysis is original and editorial. No investment advice is provided or implied.
