Last Updated on November 29, 2023 by SPN Editor
In a significant development, Amazon Web Services (AWS), Amazon’s cloud computing division, has introduced its newest AI chips designed specifically for model training and inference. This strategic move is a response to the escalating demand for generative AI and the current GPU shortage.
The demand for generative AI, which typically relies on GPUs for training and operation, is on the rise. This has led to a shortage of GPUs, with Nvidia’s top-performing chips reportedly sold out until 2024. The CEO of chipmaker TSMC recently suggested that the GPU shortage from Nvidia and its competitors could extend into 2025.
In response, companies that can afford it, chiefly the tech giants, are developing custom chips specifically designed for the creation, iteration, and productization of AI models. Amazon is one such company; it unveiled the latest generation of its AI chips for model training and inference at its annual re:Invent conference.
The first of these, AWS Trainium2, is designed to offer up to 4x better performance and 2x better energy efficiency than the first-generation Trainium, which was introduced in December 2020. Available in EC2 Trn2 instances in clusters of 16 chips, Trainium2 can scale up to 100,000 chips in AWS’ EC2 UltraCluster product.
A cluster of 100,000 Trainium2 chips can deliver 65 exaflops of compute, which works out to roughly 650 teraflops per chip. Complicating factors may make that back-of-napkin estimate imprecise, but even a fraction of that figure would put a single Trainium2 well above the per-chip capacity of Google’s custom AI training chips of around 2017.
Amazon claims that a cluster of 100,000 Trainium2 chips can train a 300-billion-parameter large language model in weeks rather than months. Parameters, which are learned from training data, essentially define the skill of a model on a problem such as generating text or code. At 300 billion parameters, such a model would be roughly 1.7 times the size of OpenAI’s GPT-3 (175 billion parameters), the predecessor of the text-generating GPT-4.
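Those figures are easy to sanity-check. A minimal back-of-napkin sketch in Python, using the cluster numbers quoted above and GPT-3’s published 175-billion-parameter count:

```python
# Back-of-napkin math behind the cluster figures quoted above.

cluster_exaflops = 65      # claimed compute of a 100,000-chip UltraCluster
chips = 100_000

# 1 exaflop = 1,000,000 teraflops
per_chip_teraflops = cluster_exaflops * 1_000_000 / chips
print(f"Per-chip throughput: {per_chip_teraflops:.0f} teraflops")      # ~650

# Size of the claimed trainable model relative to GPT-3 (175B parameters)
model_params = 300e9
gpt3_params = 175e9
print(f"Relative size vs. GPT-3: {model_params / gpt3_params:.2f}x")   # ~1.71x
```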
Amazon’s Custom AI Chips: Trainium and Inferentia
Amazon has custom-designed two AI chip families, Trainium for training and Inferentia for inference, to provide AWS customers with an alternative to GPUs for working with large language models. These chips are expected to offer a cost-effective, high-throughput option for training and running models.
Trainium, specifically, is designed for training deep learning models. It promises up to 50% savings on training costs compared to similar Amazon Elastic Compute Cloud (Amazon EC2) instances.
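For readers who want to experiment, Trainium is exposed through the Trn1 EC2 instance family, which can be requested like any other instance type via the AWS SDK. A minimal sketch with boto3 (the AMI ID below is a placeholder; a real Deep Learning AMI ID for your region is required):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a single Trainium-backed instance; trn1.32xlarge carries
# 16 Trainium chips. The ImageId is a placeholder -- substitute a
# Deep Learning AMI ID valid in your region.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="trn1.32xlarge",
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched Trainium instance: {instance_id}")
```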
Amazon is striving to establish itself in the realm of generative AI. The company has been developing these chips in a nondescript office building in Austin, Texas. Adam Selipsky, the CEO of Amazon Web Services, expressed confidence in Amazon’s ability to meet the collective capacity needs of its customers.
However, Amazon is more accustomed to pioneering markets than chasing them, and for the first time in a while it finds itself playing catch-up.
Alongside the launch of the new AI chips, Amazon announced a deeper collaboration with Anthropic. Anthropic, a leading provider of foundation models and an advocate for the responsible use of generative AI, will train and deploy its future foundation models on the AWS Cloud using Trainium and Inferentia chips.
This partnership is expected to give AWS customers access to future generations of Anthropic’s foundation models through Amazon Bedrock.
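For AWS customers, that access goes through the Bedrock runtime API. A minimal sketch with boto3, using the Claude v2 model identifier available on Bedrock at the time of writing (request schemas differ per model and may change):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude models on Bedrock (late 2023) use the Human/Assistant prompt format.
body = json.dumps({
    "prompt": "\n\nHuman: Summarize AWS's Trainium2 announcement.\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    contentType="application/json",
    accept="application/json",
    body=body,
)

result = json.loads(response["body"].read())
print(result["completion"])
```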
In the long run, Amazon’s custom silicon could give it an edge in generative AI. The company began building custom silicon back in 2013 with a piece of specialized hardware called Nitro. Now, with the introduction of Trainium and Inferentia, Amazon is poised to make a significant impact in AI and machine learning.
As the demand for AI continues to grow, Amazon’s new AI chips represent a significant step forward in providing powerful, cost-efficient, and energy-saving solutions for training and running AI models.