August 17, 2025
Google announced Ironwood, its seventh-generation Tensor Processing Unit (TPU): a specialized high-performance chip built to push its advanced Gemini models to new levels of capability and to usher in what the company calls the “age of inference.” The chip represents a core transformation of Google’s AI infrastructure, designed to meet the heavy computational demands of operations such as simulated reasoning, which Google calls “thinking.” The company positions Ironwood as an essential piece of that infrastructure, working closely with its AI models to speed up inference and expand processing of extensive contextual data.
Google expects Ironwood to enable more robust agentic AI systems: systems that act on a user’s behalf by gathering and analyzing data to produce relevant results without step-by-step commands. The company envisions a future in which AI assistants act more helpfully and independently, with Ironwood supplying the computational power to make that vision possible.
Ironwood brings transformative performance improvements and architectural advances over previous Google TPUs. To meet the immense computational needs of next-generation AI, it operates in liquid-cooled clusters of up to 9,216 individual chips, which communicate with one another at extreme speeds over an upgraded Inter-Chip Interconnect (ICI). This interconnect is vital to efficient data transfer and low communication latency across large distributed computing systems.
Google has designed this scalable architecture for both its internal research teams and external developers on Google Cloud Platform. Ironwood will come in two distinct configurations: a 256-chip server for smaller deployments and research projects, and a full 9,216-chip cluster for demanding AI workloads and large-scale production systems.
A fully configured Ironwood pod delivers 42.5 exaflops of inference compute, processing power previously out of reach for complex AI workloads. Per Google’s specifications, each Ironwood chip peaks at 4,614 TFLOPs, a substantial step up from past TPU generations.
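The pod-level figure follows directly from the per-chip numbers quoted above. A quick sanity check (illustrative arithmetic only, using the figures Google has published):

```python
# Back-of-envelope check: 9,216 chips at 4,614 TFLOPs each should
# roughly match the quoted 42.5 exaflops per pod.
CHIPS_PER_POD = 9_216
TFLOPS_PER_CHIP = 4_614

pod_tflops = CHIPS_PER_POD * TFLOPS_PER_CHIP  # total TFLOPs in a pod
pod_exaflops = pod_tflops / 1_000_000         # 1 exaflop = 10^6 TFLOPs

print(f"{pod_exaflops:.1f} exaflops")  # ~42.5
```

The numbers line up, which suggests the 42.5-exaflop headline is simply the per-chip peak multiplied across the full cluster.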
Memory capacity has expanded substantially to match the new processing demands. Each Ironwood chip carries 192GB of high-bandwidth memory, six times the capacity of the earlier Trillium TPU. With more on-chip memory, the TPU can hold larger datasets and model parameters, reducing memory transfer overhead and boosting performance. Memory bandwidth reaches 7.2 Tbps, a 4.5-times increase, for faster data retrieval and processing.
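Working backwards from the ratios quoted above gives the implied Trillium figures. Note these Trillium numbers are inferred from the article’s ratios, not official specifications:

```python
# Implied Trillium figures, derived from the ratios quoted for Ironwood.
# (Inferred for illustration; not taken from an official Trillium spec sheet.)
IRONWOOD_HBM_GB = 192
IRONWOOD_BW_TBPS = 7.2

trillium_hbm_gb = IRONWOOD_HBM_GB / 6       # "six times more" -> 32 GB
trillium_bw_tbps = IRONWOOD_BW_TBPS / 4.5   # "4.5 times increase" -> 1.6 Tbps

print(trillium_hbm_gb, trillium_bw_tbps)
```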
Benchmarking and Contextualizing Ironwood’s Capabilities
Google provides context for Ironwood’s capabilities because comparing AI hardware directly is difficult: benchmarking methodologies differ. The company’s headline figures use FP8 precision. Its assertion that Ironwood “pods” are 24 times faster than comparable segments of leading supercomputers therefore warrants scrutiny, since those supercomputers may not natively support FP8, and comparing systems with different hardware capabilities can make direct performance figures inaccurate or irrelevant.
Notably, Google’s direct performance comparisons did not include its own TPU v6 (Trillium). Google does claim Ironwood doubles Trillium’s power efficiency, a substantial gain in performance per watt. According to a Google spokesperson, Ironwood is the direct successor to TPU v5p, while Trillium followed the less powerful TPU v5e. Trillium’s peak FP8 performance was approximately 918 TFLOPS.
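Although Google left Trillium out of its comparisons, the figures in this article allow a rough per-chip estimate. This is an illustrative back-of-envelope calculation at FP8 peak, not a benchmark result:

```python
# Rough per-chip FP8 peak comparison, using the figures quoted in the article.
IRONWOOD_TFLOPS = 4_614  # Ironwood peak per chip
TRILLIUM_TFLOPS = 918    # Trillium approximate FP8 peak per chip

speedup = IRONWOOD_TFLOPS / TRILLIUM_TFLOPS
print(f"{speedup:.1f}x")  # roughly 5x per chip
```

A roughly fivefold per-chip jump at peak precision would be consistent with Google positioning Ironwood as a generational leap rather than an incremental refresh.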
Setting New Standards for Agentic AI Development
Ironwood’s superior speed, expanded memory capacity, and better power efficiency will shape Google’s AI ecosystem while enabling more advanced applications. Ironwood builds on the infrastructure already running large language models and systems such as Gemini 2.5, expanding the potential for agentic AI.
These systems will gather and analyze data from multiple sources to produce suitable responses or actions with little to no direct guidance from users. Google sees Ironwood as the fundamental driver of this new era of advanced AI interaction, one it expects to yield breakthroughs in natural language processing and machine learning and to produce more capable AI agents.