Microsoft Debuts 1st AI Chip & CPU For Cloud Infrastructure
Microsoft has unveiled its first two custom silicon chips for its cloud infrastructure. Set to launch in 2024, the chips are built to handle large language model workloads and could, in theory, reduce the company’s dependence on a pricey partnership with Nvidia.
Demand for Nvidia’s H100 GPUs has been so extreme that the chips have sold for as much as $40,000 on eBay, which is a key reason Microsoft has developed its Azure Maia AI chip and Azure Cobalt CPU.
Both chips are custom-built and optimised for Microsoft’s own AI workloads, cutting out the middleman.
The Azure Cobalt CPU, named after the blue pigment, is a 128-core chip based on Arm’s Neoverse CSS design. It runs general cloud services on Azure and lets Microsoft control performance and power consumption per core and per virtual machine. Microsoft says it is 40% faster than the Arm servers it currently uses.
The Azure Maia AI chip is built for cloud AI workloads such as training and inference of large language models, and it supports sub-8-bit MX data types to speed up both. It will power some of Microsoft’s biggest AI workloads on Azure, including those from its partnership with OpenAI.
“Azure’s end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers,” says Sam Altman, CEO of OpenAI.
According to Rani Borkar, head of Azure hardware systems and infrastructure at Microsoft, Maia is manufactured on a 5-nanometer TSMC process and packs 105 billion transistors, roughly 30% fewer than the 153 billion in AMD’s MI300X, the AI GPU that competes with Nvidia’s.
Maia is the first chip to use MX data types, sub-8-bit formats that let hardware and software be designed together, making model training and inference faster, Borkar said.
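Microsoft hasn’t published how Maia implements MX, but the idea behind microscaling formats is simple: a small block of values shares a single power-of-two scale factor, so each individual value can be stored in far fewer bits while the block keeps a wide dynamic range. The NumPy sketch below is a rough, hypothetical simulation of that scheme, not Microsoft’s implementation; the mx_quantize helper is illustrative only, and the 32-element block size and 6-bit signed-integer element grid are assumptions (the OCP MX specification also defines floating-point element types such as FP8, FP6, and FP4).

```python
import numpy as np

def mx_quantize(x: np.ndarray, block: int = 32, elem_bits: int = 6) -> np.ndarray:
    """Simulate an MX-style block format (illustrative, not Maia's actual scheme):
    every `block` consecutive values share one power-of-two scale, and each
    element is rounded to a narrow signed-integer grid `elem_bits` wide."""
    pad = (-x.size) % block
    v = np.pad(x.astype(np.float32), (0, pad)).reshape(-1, block)

    qmax = 2 ** (elem_bits - 1) - 1                # largest code, e.g. 31 for 6 bits
    amax = np.abs(v).max(axis=1, keepdims=True)
    amax[amax == 0] = 1.0                          # avoid log2(0) on all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))   # shared power-of-two scale per block

    q = np.clip(np.round(v / scale), -qmax, qmax)  # quantize to the narrow grid
    return (q * scale).reshape(-1)[: x.size]       # dequantize back to float32

x = np.random.randn(1024).astype(np.float32)
print("mean abs error:", float(np.abs(x - mx_quantize(x)).mean()))
```

Because only one scale is stored per block of values, the per-value overhead is tiny, which is how sub-8-bit storage can still cover the range of values that turn up during training and inference.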
Microsoft is part of a group of companies, including AMD, Arm, Intel, Meta, Nvidia, and Qualcomm, that is standardising next-generation data formats for AI models.
Specifically, Microsoft is building on the Open Compute Project’s (OCP) work of adapting entire systems to the needs of AI.
“Maia is the first complete liquid-cooled server processor built by Microsoft,” reveals Borkar.
“The goal here was to enable higher density of servers at higher efficiencies. Because we’re reimagining the entire stack, we purposely think through every layer, so these systems are actually going to fit in our current data center footprint.”
Because Maia handles cloud AI workloads such as GPT-3.5 Turbo, the model behind ChatGPT, Bing AI, and GitHub Copilot, Microsoft built its AI servers around a custom rack and cooler that allow them to be deployed faster.
The tech giant designed an exclusive rack to house the Maia server boards, paired with a “sidekick” liquid chiller that acts like a radiator, comparable to the one in a car or a premium gaming PC, to cool the surface of the Maia chips.
Microsoft typically shares its MX data types and rack designs with partners, but not its Maia chip designs. Maia’s performance has not yet been disclosed, but Microsoft says it will continue working with Nvidia and AMD for Azure’s AI cloud.
“At the scale at which the cloud operates, it’s really important to optimize and integrate every layer of the stack, to maximize performance, to diversify the supply chain, and frankly to give our customers infrastructure choices,” says Borkar.
She also highlighted that Microsoft’s chips can lower AI costs for customers, as Nvidia’s AI chips are in high demand, and said the company will release more chips in the series.
The tech firm is staying mum about the cost of its new servers, but it has already quietly launched Copilot for Microsoft 365 at an extra $30 per user per month.
For now, Copilot for Microsoft 365 is only available to Microsoft’s biggest clients: enterprise customers must sign up at least 300 users, a floor of roughly $9,000 per month at that price, to access the new AI-powered Office assistant.
As Microsoft continues to add Copilot features this week and rebrands Bing Chat as Copilot, Maia could soon help meet demand for the AI chips that power these new features.