This week at its Cloud Next conference, Google unveiled the latest generation of its AI accelerator chip, the Tensor Processing Unit (TPU).
The new chip, dubbed Ironwood, is Google's seventh-generation TPU and the first optimized for inference, the process of running trained AI models. Scheduled to launch later this year for Google Cloud customers, Ironwood will be available in two configurations: a 256-chip cluster and a 9,216-chip cluster.
“Ironwood is our most powerful, high-performance, and energy-efficient TPU yet,” wrote Google Cloud vice president Amin Vahdat in a blog post provided to TechCrunch. “And it’s specifically designed to run thinking, inferential AI models at scale.”
Ironwood comes at a time when competition in the AI accelerator market is heating up. Nvidia may be the leader, but tech giants including Amazon and Microsoft are pushing their own solutions. Amazon has its Trainium, Inferentia, and Graviton processors available through AWS, and Microsoft hosts Azure instances for its Cobalt 100 AI chip.
Ironwood can deliver 4,614 TFLOPs of processing power at peak, according to Google’s internal benchmarking. Each chip has 192 GB of dedicated RAM with bandwidth approaching 7.4 TB/s.
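For a rough sense of scale, that per-chip figure can be combined with the two cluster sizes above. The back-of-the-envelope sketch below (ordinary Python; the perfect-linear-scaling assumption is ours, and real workloads won’t hit theoretical peak) simply multiplies the stated per-chip peak by chip count:

```python
# Back-of-the-envelope aggregate compute for the two Ironwood
# configurations, naively assuming perfect linear scaling.
PEAK_TFLOPS_PER_CHIP = 4_614  # Google's stated per-chip peak

for chips in (256, 9_216):
    total_tflops = chips * PEAK_TFLOPS_PER_CHIP
    # 1 exaflop = 1,000,000 TFLOPs
    print(f"{chips:>5} chips ≈ {total_tflops / 1e6:.1f} exaflops peak")

# Output:
#   256 chips ≈ 1.2 exaflops peak
#  9216 chips ≈ 42.5 exaflops peak
```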
Ironwood also has an enhanced SparseCore, a specialized core for processing the types of data common in “advanced ranking” and “recommendation” workloads (e.g., an algorithm that suggests clothes you might like). The TPU’s architecture was designed to minimize data movement and on-chip latency, which translates into power savings, Google says.
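To illustrate the kind of workload SparseCore targets, here is a minimal, generic sketch of the sparse embedding lookups that underlie recommendation models (plain NumPy for illustration; the table sizes and function names are invented, and nothing here is Ironwood-specific):

```python
import numpy as np

# A recommendation model keeps a huge embedding table, but each
# request touches only a handful of rows -- the access pattern is sparse.
NUM_ITEMS, EMBED_DIM = 1_000_000, 64  # illustrative sizes
embedding_table = np.random.rand(NUM_ITEMS, EMBED_DIM).astype(np.float32)

def embed_user_history(item_ids: list[int]) -> np.ndarray:
    """Gather the few rows a request needs and pool them into one vector."""
    rows = embedding_table[item_ids]  # sparse gather: a few rows of many
    return rows.mean(axis=0)          # pooled representation of the user

# A user who interacted with three items out of a million:
user_vector = embed_user_history([42, 137, 999_001])
print(user_vector.shape)  # (64,)
```

The scattered memory accesses in that gather step, rather than dense matrix math, are what dedicated sparse-data hardware is built to accelerate.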
In the near future, Google plans to integrate Ironwood with AI Hypercomputer, its modular computing cluster in Google Cloud, Vahdat added.
“Ironwood represents a unique breakthrough in the age of inference,” Vahdat said, “with increased processing power, storage, […] networking advances, and reliability.”