Google Built Two AI Chips — Here’s Why Nvidia Should Care

[Image: Google TPU 8t and 8i AI chips for the agentic era]
KEY TAKEAWAYS
  • Google unveiled TPU 8t (training) and TPU 8i (inference) at Cloud Next 2026 — the first time the company has split its chip line into two purpose-built architectures.
  • TPU 8t delivers 2.7x better performance-per-dollar than Ironwood for training and can scale to one million chips in a single cluster with 121 ExaFlops of compute.
  • TPU 8i packs 384 MB of on-chip SRAM (3x previous gen) and cuts communication latency by up to 5x, enabling nearly double the inference throughput at the same cost.
  • Both chips deliver 2x better performance-per-watt, but Google is supplementing — not replacing — Nvidia, even as it quietly builds the most vertically integrated AI stack in the industry.

For years, every major cloud provider has poured billions into custom silicon while continuing to write enormous checks to Nvidia. Google just made the boldest move yet. At Cloud Next 2026 on April 22, the company split its eighth-generation Tensor Processing Unit into two entirely separate chips — one optimized purely for training, the other for inference — signaling that the era of one-size-fits-all AI hardware is over. The timing is deliberate: as agentic AI workloads explode, the performance demands for training a model and serving it in real time are diverging so fast that a single chip architecture can no longer do both well.

The Split Strategy: Why Two Chips Beat One

TPU 8t: Built for Massive-Scale Training

The training-focused TPU 8t uses a 3D torus network topology and scales up to 9,600 chips in a single pod — up from 9,216 with Ironwood. But the real number that matters is the cluster ceiling: Google says TPU 8t supports near-linear scaling up to one million chips, delivering a combined 121 ExaFlops of compute. That is the kind of capacity needed to train frontier models, which now routinely occupy tens of thousands of accelerators for months at a time. The chip also introduces native four-bit floating point math and a SparseCore accelerator for irregular memory access patterns, doubling throughput and shrinking memory footprints while maintaining accuracy. Overall, Google claims a 2.7x performance-per-dollar improvement over Ironwood for large-scale training jobs.
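To make "near-linear scaling" concrete in software terms, here is a minimal JAX sketch of a training step sharded across a multi-axis device mesh, the same programming pattern TPU pods expose today. The mesh axes, toy model, and shapes are illustrative assumptions, not TPU 8t specifics.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay the available devices out as a (data, fsdp, model) grid; on real
# hardware each mesh axis would map onto one dimension of the pod's torus.
n = len(jax.devices())
mesh = Mesh(np.array(jax.devices()).reshape(n, 1, 1), ("data", "fsdp", "model"))

@jax.jit
def train_step(w, x, y):
    # Toy linear model standing in for a frontier-scale network.
    loss = lambda w: jnp.mean((x @ w - y) ** 2)
    return w - 0.01 * jax.grad(loss)(w)

# Shard the batch across "data" and the weights across "model"; the compiler
# inserts the cross-chip collectives (gradient all-reduce, etc.) on its own.
x = jax.device_put(jnp.ones((128, 64)), NamedSharding(mesh, P("data", None)))
y = jax.device_put(jnp.zeros((128, 8)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.zeros((64, 8)), NamedSharding(mesh, P(None, "model")))
w = train_step(w, x, y)
```

Scaling a job then means growing the mesh rather than rewriting the model, which is the property the one-million-chip claim depends on.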

TPU 8i: Purpose-Built for Inference at Scale

Inference is an entirely different animal. As mixture-of-experts models grow and chain-of-thought traces lengthen, the bottleneck shifts from raw compute to memory bandwidth and latency. TPU 8i addresses this head-on with 288 GB of high-bandwidth memory paired with 384 MB of on-chip SRAM — triple the previous generation. A new Collectives Acceleration Engine cuts communication latency by up to 5x, while a custom Boardfly ICI network topology connects up to 1,152 chips and halves all-to-all communication hops. The result: roughly 80% better performance-per-dollar versus Ironwood at low-latency targets, enabling Google to serve nearly twice the customer volume at the same cost.
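A back-of-the-envelope sketch shows why bandwidth, not raw compute, sets the ceiling: each decoded token must stream the active weights through memory once. Every figure below is an illustrative assumption, not a published TPU 8i number.

```python
# Autoregressive decoding reads the active weights once per token, so the
# per-chip throughput ceiling is roughly bandwidth / bytes touched per token.
hbm_bandwidth_gb_s = 7_400      # hypothetical HBM bandwidth, GB/s
active_params = 40e9            # active parameters per token (one MoE expert path)
bytes_per_param = 0.5           # 4-bit weights

bytes_per_token = active_params * bytes_per_param            # ~20 GB
tokens_per_sec = hbm_bandwidth_gb_s * 1e9 / bytes_per_token  # ~370 tokens/s

print(f"bandwidth ceiling: ~{tokens_per_sec:.0f} tokens/s per chip")
```

This is also why tripling on-chip SRAM matters: every byte served from SRAM instead of HBM raises that ceiling.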

Business Insight — Splitting training and inference into separate silicon is not just an engineering preference — it is an economic statement. Training is a capex-heavy, batch-oriented workload measured in throughput. Inference is an opex stream measured in latency and cost-per-query. By optimizing each chip for its economic model, Google can undercut competitors on both dimensions simultaneously.
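A toy unit-economics sketch makes the contrast concrete; every number in it is invented for illustration.

```python
# Amortize one chip's cost over its service life, then price the two
# workloads in their native units: tokens for training, queries for inference.
chip_cost_usd = 15_000.0
lifetime_s = 3 * 365 * 24 * 3600            # assume ~3 years of depreciation
dollars_per_s = chip_cost_usd / lifetime_s

train_tokens_per_s = 50_000                 # throughput is the training metric
queries_per_s = 25                          # latency-bound serving rate

print(f"${dollars_per_s / train_tokens_per_s * 1e6:.4f} per million training tokens")
print(f"${dollars_per_s / queries_per_s * 1e6:.2f} per million queries")
```

A chip that improves throughput lowers the first line; a chip that holds latency at higher query rates lowers the second. Optimizing one rarely optimizes the other.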


The Nvidia Question: Supplement, Not Replace

Despite the aggressive specs, Google is careful not to frame TPU 8 as an Nvidia killer. At the same event, the company confirmed it will offer Nvidia’s upcoming Vera Rubin chip to Cloud customers later in 2026. Google and Nvidia are also collaborating on Falcon, an open-source software-based networking technology contributed to the Open Compute Project. Analyst Patrick Moorhead pointed out that despite years of predictions about custom silicon displacing Nvidia, the GPU maker has only grown — now approaching a $5 trillion market cap.

The real dynamic is more nuanced. Google uses TPUs internally for its own models — Gemini, Veo, Imagen — while offering both TPUs and Nvidia GPUs to external customers. The split architecture lets Google optimize its own workloads aggressively while giving customers flexibility. For enterprises locked into Nvidia’s CUDA ecosystem, TPUs remain a harder sell. But for cost-sensitive inference workloads where Google controls the software stack, the 80% price-performance advantage is hard to ignore.

Business Insight — The “supplement not replace” narrative is strategic diplomacy. Google needs Nvidia’s ecosystem to attract enterprise customers, but every workload it moves onto TPUs improves its own margins. Watch the ratio of TPU-to-GPU usage in Google Cloud’s earnings reports — that is the real scoreboard.


The Agentic Era Demands Specialized Hardware

Google explicitly positioned TPU 8 for what it calls “the agentic era” — a world where AI systems do not just answer questions but plan, reason, and take multi-step actions autonomously. This matters because agentic workloads create fundamentally different hardware demands. A single user query to an agentic system might trigger dozens of inference calls as the model reasons through sub-tasks, checks its work, and coordinates with other agents. That means inference volume per user interaction could increase by an order of magnitude compared to simple chatbot queries.
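A rough sketch of that multiplier, with invented counts:

```python
# Compare inference tokens per user interaction: one chatbot reply versus an
# agentic plan/act/verify loop that fans out into many model calls.
chatbot_calls, chatbot_tokens = 1, 800
agent_steps, checks_per_step = 12, 2
agent_tokens_per_call = 600

agent_calls = agent_steps * (1 + checks_per_step)   # 36 model calls
multiplier = (agent_calls * agent_tokens_per_call) / (chatbot_calls * chatbot_tokens)
print(f"~{multiplier:.0f}x more inference tokens per interaction")   # ~27x
```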

TPU 8i’s Collectives Acceleration Engine is specifically designed for this pattern — autoregressive decoding and chain-of-thought reasoning where the chip must rapidly process sequential token generation while maintaining coherence across a distributed cluster. The 19.2 Tb/s interconnect bandwidth is tuned for mixture-of-experts models where different parts of the network activate on different chips. This is not a general-purpose improvement; it is a targeted bet on the architecture of the next generation of AI applications.
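The pattern being accelerated is at heart an all-to-all exchange: each chip hosts a few experts, and every token must reach whichever chip owns its expert. A minimal sketch of that dispatch, assuming jax.experimental.shard_map and toy shapes rather than anything TPU 8i specific:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

n = len(jax.devices())
mesh = Mesh(np.array(jax.devices()), ("expert",))

def dispatch(local_tokens):
    # local_tokens: (n, tokens_per_dest, d_model), axis 0 = destination chip.
    # all_to_all swaps blocks so that afterwards axis 0 indexes the source
    # chip, i.e. each chip now holds exactly the tokens bound for its experts.
    return jax.lax.all_to_all(local_tokens, "expert",
                              split_axis=0, concat_axis=0, tiled=True)

tokens = jnp.zeros((n * n, 4, 8))  # global: (chips * destinations, tokens, d_model)
routed = shard_map(dispatch, mesh=mesh,
                   in_specs=P("expert"), out_specs=P("expert"))(tokens)
```

Cutting the hops each exchange traverses, as the Boardfly claim describes, reduces exactly this step's latency.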

Business Insight — If agentic AI delivers on its promise, inference compute demand could grow 10-50x per user session compared to today’s chatbot interactions. The companies that own the cheapest inference silicon will capture the most margin in that future. Google is betting TPU 8i is that silicon.


What This Means for the AI Infrastructure Race

Google’s two-chip strategy arrives in a market where the infrastructure arms race shows no signs of slowing. Anthropic just committed $100 billion in cloud spending to AWS over the next decade. OpenAI’s $122 billion raise in Q1 2026 was partly justified by infrastructure needs. Microsoft is building custom chips (Maia) while maintaining its Nvidia partnership. Amazon has Trainium and Inferentia. Every hyperscaler is running the same playbook — build custom silicon for internal efficiency while offering Nvidia for external customers.

But Google may have the most vertically integrated stack of them all. It designs the chips, builds the models, controls the cloud platform, and operates the end-user products (Search, Workspace, Android) where those models are deployed. Both TPU 8 variants deliver 2x better performance-per-watt through integrated power management and liquid cooling — a detail that matters when data centers are increasingly constrained by power availability rather than floor space. With general availability expected later in 2026, the real test will be whether third-party developers adopt TPUs at scale or remain in Nvidia’s orbit.

Business Insight — The AI chip market is splitting into two tiers: companies that design silicon to run their own models (Google, Amazon, Meta) and companies that sell silicon to run everyone else's (Nvidia). Both tiers will grow, but the margin structures are fundamentally different — and the two-chip strategy suggests Google intends to be the low-cost producer in its tier.


Sources

  1. TechCrunch — Google Cloud launches two new AI chips to compete with Nvidia (April 22, 2026)
  2. Google Blog — Our eighth generation TPUs: two chips for the agentic era (April 22, 2026)
  3. SiliconANGLE — Google unveils new TPUs to power next wave of AI training and inference (April 22, 2026)
  4. Bloomberg — Google Cloud unveils new TPU chips to accelerate AI training and inference (April 22, 2026)
