NVIDIA Ethernet Networking Accelerates World’s Largest AI Supercomputer, Built by xAI

NVIDIA announced that xAI’s Colossus supercomputer cluster in Memphis, Tennessee, comprising 100,000 NVIDIA Hopper GPUs, achieved this massive scale by using the NVIDIA Spectrum-X Ethernet networking platform for its Remote Direct Memory Access (RDMA) network. Spectrum-X is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet.

Colossus, the world’s largest AI supercomputer, is being used to train xAI’s Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs.

The supporting facility and state-of-the-art supercomputer were built by xAI and NVIDIA in just 122 days, whereas systems of this size typically take many months to years. It took 19 days from the time the first rack rolled onto the floor until training began.

While training the extremely large Grok model, Colossus achieves unprecedented network performance. Across all three tiers of the network fabric, the system has experienced zero application latency degradation or packet loss due to flow collisions. It has maintained 95% data throughput enabled by Spectrum-X congestion control.

This level of performance cannot be achieved at scale with standard Ethernet, which creates thousands of flow collisions while delivering only 60% data throughput.
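To put the two throughput figures side by side, the short sketch below multiplies a nominal link rate by each quoted percentage. The 400 Gb/s link rate and the function name are assumptions chosen only for illustration; the article does not state the per-link rate used in Colossus, and the calculation says nothing about how the figures were measured.

```python
def effective_throughput_gbps(link_rate_gbps: float, utilization: float) -> float:
    """Return usable bandwidth for a nominal link rate and a throughput fraction."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return link_rate_gbps * utilization


# Hypothetical nominal link rate (assumption, not taken from the article).
LINK_RATE_GBPS = 400.0

# Throughput fractions as quoted in the article.
spectrum_x = effective_throughput_gbps(LINK_RATE_GBPS, 0.95)
standard = effective_throughput_gbps(LINK_RATE_GBPS, 0.60)

print(f"Spectrum-X:        {spectrum_x:.0f} Gb/s")
print(f"Standard Ethernet: {standard:.0f} Gb/s")
print(f"Ratio:             {spectrum_x / standard:.2f}x")
```

Whatever link rate is assumed, the ratio between the two cases depends only on the quoted percentages, so it comes out to roughly 1.58 times the usable bandwidth per link.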

“AI is becoming mission-critical and requires increased performance, security, scalability, and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. “The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis, and execution of AI workloads, and in turn accelerates the development, deployment, and time to market of AI solutions.”

“Colossus is the most powerful training system in the world,” said Elon Musk on X. “Nice work by xAI team, NVIDIA and our many partners/suppliers.”

“xAI has built the world’s largest, most-powerful supercomputer,” said a spokesperson for xAI. “NVIDIA’s Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive-scale, creating a super-accelerated and optimized AI factory based on the Ethernet standard.”

At the heart of the Spectrum-X platform is the Spectrum SN5600 Ethernet switch, which supports port speeds of up to 800Gb/s and is based on the Spectrum-4 switch ASIC. xAI chose to pair the Spectrum-X SN5600 switch with NVIDIA BlueField-3 SuperNICs for unprecedented performance.

Spectrum-X Ethernet networking for AI brings advanced features that deliver highly effective and scalable bandwidth with low latency and short tail latency, previously exclusive to InfiniBand. These features include adaptive routing with NVIDIA Direct Data Placement technology, congestion control, and enhanced AI fabric visibility and performance isolation, all key requirements for multi-tenant generative AI clouds and large enterprise environments.

News Desk
