Amazon Web Services (AWS), Amazon’s cloud computing division, has introduced its newest AI chips, designed specifically for model training and inference. The move is a response to escalating demand for generative AI and the ongoing GPU shortage.
Demand for generative AI, whose models are typically trained and run on GPUs, is rising, and it has created a GPU shortage: Nvidia’s top-performing chips are reportedly sold out until 2024, and the CEO of chipmaker TSMC recently suggested that the shortage of GPUs from Nvidia and its competitors could extend into 2025.
In response, companies that can afford it, such as the tech giants, are developing custom chips tailored to creating, iterating on, and productizing AI models. Amazon is one of them: at its annual re:Invent conference, it unveiled the latest generation of its AI chips for model training and inference.
The first of these, AWS Trainium2, is designed to deliver up to 4x better performance and 2x better energy efficiency than the first-generation Trainium, introduced in December 2020. Available in EC2 Trn2 instances in clusters of 16 chips in the AWS cloud, Trainium2 can scale up to 100,000 chips in AWS’ EC2 UltraCluster product.
A cluster of 100,000 Trainium2 chips can deliver 65 exaflops of compute, which works out to 650 teraflops per chip. There are likely complicating factors that make that back-of-the-napkin math imprecise, but even if a single Trainium2 chip delivers only around 200 teraflops in practice, that would put it well above the capacity of Google’s custom AI training chips circa 2017.
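To make that back-of-the-napkin math explicit, here is a minimal Python sketch. The 65-exaflop cluster figure and the 16-chips-per-instance count come from the announcement above; splitting the compute evenly across chips is an assumption that ignores interconnect and utilization overheads.

```python
# Back-of-the-napkin math for the Trainium2 figures quoted above.
# Assumption: the quoted cluster total divides evenly across chips,
# which ignores real-world interconnect and utilization overheads.

TERAFLOPS_PER_EXAFLOP = 1_000_000   # 10^18 FLOPS / 10^12 FLOPS

cluster_exaflops = 65               # AWS's figure for a full UltraCluster
num_chips = 100_000                 # chips in that cluster
chips_per_instance = 16             # Trainium2 chips per EC2 Trn2 instance

per_chip_teraflops = cluster_exaflops * TERAFLOPS_PER_EXAFLOP / num_chips
instances_in_cluster = num_chips // chips_per_instance

print(per_chip_teraflops)    # 650.0 teraflops per chip
print(instances_in_cluster)  # 6250 Trn2 instances
```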
Amazon claims that a cluster of 100,000 Trainium2 chips can train a 300-billion-parameter large language model in weeks rather than months. Parameters, which are learned from training data, essentially define a model’s skill at a problem, such as generating text or code. For scale, 300 billion parameters is roughly 1.7 times the 175 billion of OpenAI’s GPT-3, the predecessor of the text-generating GPT-4.
Amazon’s Custom AI Chips: Trainium and Inferentia
Amazon has custom-designed two AI chips, Trainium and Inferentia, to give AWS customers an alternative to GPUs for training and running their large language models. The chips are intended to offer a cost-effective, high-throughput way to train models and serve them in production.
Trainium, specifically, is a chip designed for training deep learning models. Amazon says it delivers up to 50% cost savings on training compared to comparable Amazon Elastic Compute Cloud (Amazon EC2) instances.
Amazon is striving to establish itself in generative AI. The company has been developing these chips in a nondescript office building in Austin, Texas, and Adam Selipsky, CEO of Amazon Web Services, has expressed confidence in Amazon’s ability to meet its customers’ collective capacity needs.
However, Amazon is accustomed to pioneering markets, not chasing them, and for the first time in a while it finds itself playing catch-up.
Alongside the launch of the new AI chips, Amazon announced a deeper collaboration with Anthropic. Anthropic, a leading provider of foundation models and an advocate for the responsible use of generative AI, will train and deploy its future foundation models on the AWS Cloud using Trainium and Inferentia chips.
This partnership is expected to give AWS customers access to future generations of Anthropic’s foundation models through Amazon Bedrock.
In the long run, Amazon’s custom silicon could give it an edge in generative AI. The company started making custom silicon back in 2013 with a piece of specialized hardware called Nitro, and with Trainium and Inferentia it is now poised to make a significant impact in AI and machine learning.
As demand for AI continues to grow, Amazon’s new AI chips represent a significant step toward powerful, cost-efficient, and energy-saving ways to train and run AI models.