At the recent Hot Chips 2024 symposium, Microsoft revealed details about its first-generation custom AI accelerator, the Maia 100, designed for large-scale AI workloads on its Azure platform.
Unlike its rivals, Microsoft has opted for the older HBM2E memory technology, paired with the intriguing ability to "unlock new capabilities" via firmware updates. The decision appears to be a strategic trade-off between performance and cost efficiency.
The Maia 100 accelerator is a reticle-sized SoC, built on TSMC's N5 process with a CoWoS-S interposer. It includes four HBM2E memory dies, delivering 1.8TBps of bandwidth and 64GB of capacity, tailored for high-throughput AI workloads. The chip is designed to support up to 700W TDP but is provisioned at 500W, making it energy-efficient for its class.
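A quick sanity check of the memory figures quoted above: four HBM2E dies totalling 64GB and 1.8TBps works out to 16GB and roughly 450GBps per die. The per-die figures below are derived from the article's totals, not quoted by Microsoft.

```python
# Derive per-die figures from the Maia 100 totals reported at Hot Chips.
# Per-die numbers are an arithmetic inference, not an official spec.
num_dies = 4
total_capacity_gb = 64       # GB across all four HBM2E dies
total_bandwidth_tbps = 1.8   # TBps aggregate memory bandwidth

capacity_per_die_gb = total_capacity_gb / num_dies             # 16.0 GB
bandwidth_per_die_gbps = total_bandwidth_tbps * 1000 / num_dies  # 450.0 GBps
print(capacity_per_die_gb, bandwidth_per_die_gbps)  # 16.0 450.0
```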
“Not as capable as a Nvidia H100”
Microsoft’s approach with Maia 100 emphasizes a vertically integrated architecture, from custom server boards to specialized racks and a software stack designed to enhance AI capabilities. The architecture includes a high-speed tensor unit and a custom vector processor, supporting various data formats and optimized for machine learning needs.
Additionally, the Maia 100 supports Ethernet-based interconnects with up to 4800Gbps of all-gather and scatter-reduce bandwidth, using a custom RoCE-like protocol for reliable, secure data transmission.
Patrick Kennedy from ServeTheHome reported on Maia at Hot Chips, noting, "It was really interesting that this is a 500W/700W device with 64GB of HBM2E. One would expect it to be not as capable as a Nvidia H100 since it has less HBM capacity. At the same time, it is using a good amount of power. In today's power-constrained world, it feels like Microsoft must be able to make these a lot less expensive than Nvidia GPUs."
The Maia SDK simplifies deployment by allowing developers to port their models with minimal code changes, supporting both PyTorch and Triton programming models. This enables developers to optimize workload performance across different hardware backends without sacrificing efficiency.
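In PyTorch, "minimal code changes" for a backend port typically amounts to swapping the device string a model and its tensors are placed on. The sketch below illustrates that pattern; the `"maia"` device name is hypothetical, since the article does not document the identifier the Maia SDK actually exposes, so the example runs on CPU.

```python
# Minimal sketch of backend-portable PyTorch code. The idea: keep the
# device in one variable so retargeting hardware is a one-line change.
# "maia" is a hypothetical backend name, not a documented SDK identifier.
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    def forward(self, x):
        return self.net(x)

# One-line retarget point: "cpu" here, "cuda" for Nvidia GPUs,
# and in principle a Maia backend string would slot in the same way.
device = "cpu"

model = TinyModel().to(device)
x = torch.randn(8, 16, device=device)
out = model(x)
print(tuple(out.shape))  # (8, 4)
```

The same pattern applies to Triton kernels, where the compiler rather than the model code is responsible for targeting the hardware backend.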