Get ready, because the AI world is shifting! Google's latest move puts a spotlight on inference, and it could change everything. For years, Google has been quietly crafting its own custom AI accelerators, the Tensor Processing Units (TPUs). Now in their seventh generation, these aren't your run-of-the-mill, general-purpose chips like those from Nvidia. Google's TPUs are laser-focused, designed specifically for AI tasks.
Google's newest creation, the Ironwood TPU, is poised to make waves. Set to roll out to Google Cloud customers soon, it's also paired with the new Arm-based Axion virtual machine instances, promising a big boost in performance for every dollar spent. The goal? To make AI inference and agentic AI workloads more affordable.
But what exactly is the 'age of inference'? While the Ironwood TPU can still handle the heavy lifting of AI training – feeding massive amounts of data to teach AI models – it's built to excel at inference. Google's blog post highlights that Ironwood offers a 10x peak performance improvement over the TPU v5p, and more than 4x better performance per chip than its immediate predecessor, the TPU v6e (Trillium), for both training and inference.
Inference is the process of using a trained AI model to generate a response. It's less computationally demanding than training, but it demands quick responses and the ability to handle a huge volume of requests. Google believes the future lies in this shift, with organizations moving from training models to actually using them. Agentic AI, the current hot topic, is essentially a series of inference tasks. With AI becoming more and more integrated, Google anticipates a near-exponential surge in demand for computing power.
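To make the 'series of inference tasks' idea concrete, here's a minimal sketch of an agentic loop in Python. The `call_model` function and the fixed step count are hypothetical stand-ins for a real inference API; the point is simply that one agent task fans out into many inference requests.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real inference endpoint (for example,
    # one served from TPU-backed infrastructure). Each call is one
    # inference request: a trained model generating a response.
    return f"step for: {prompt}"

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    # An "agent" is essentially a loop of inference calls: each step
    # feeds the model the task plus prior outputs, so a single user
    # request turns into multiple inference requests.
    history: list[str] = []
    for _ in range(max_steps):
        response = call_model(task + " | " + " ; ".join(history))
        history.append(response)
    return history

steps = run_agent("summarize quarterly results")
print(len(steps))  # one agent task -> several inference calls
```

This is why Google expects inference demand to grow so sharply: even a modest agent multiplies every user interaction into several model calls.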
For AI companies, efficiency is key. Anthropic, for instance, has just inked a deal to expand its use of Google's TPUs for both training and inference. Under the agreement, Anthropic will have access to 1 million TPUs, supporting its goal of generating $70 billion in revenue and becoming cash-flow positive by 2028. The efficiency of Google's new TPUs was likely a major factor in the deal.
Here's where it gets interesting: Google Cloud is still playing catch-up in cloud computing, trailing both Microsoft Azure and Amazon Web Services in market share. But AI could be the game-changer. Microsoft and Amazon are also investing heavily in AI computing capacity and building their own custom AI chips. Google Cloud, though smaller, is rapidly expanding and gaining ground on AWS.
In the third quarter, Google Cloud saw a 34% year-over-year revenue increase, reaching $15.2 billion, and generated $3.6 billion in operating income, resulting in an operating margin of roughly 24%. Meanwhile, AWS grew revenue by 20% to $33 billion, while Azure and other Microsoft cloud services saw a 40% increase.
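As a quick sanity check on the margin figure above, operating margin is just operating income divided by revenue; the short Python sketch below reproduces the roughly 24% number from the reported figures.

```python
revenue = 15.2          # Google Cloud Q3 revenue, $ billions
operating_income = 3.6  # Google Cloud Q3 operating income, $ billions

# Operating margin = operating income / revenue
margin = operating_income / revenue
print(f"{margin:.1%}")  # prints 23.7%, i.e. roughly 24%
```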
As more companies transition from experimenting with AI to deploying real-world AI applications, Google, with its vast TPU resources, is well-positioned to benefit. Having worked on these chips for a decade, Google may have a significant advantage as the need for AI computing explodes.
What do you think? Will Google's focus on inference give it a winning edge in the AI race? Share your thoughts in the comments below!