Arista releases Etherlink platforms for AI networking focused on large clusters

This syndicated post originally appeared at Zeus Kerravala – SiliconANGLE.

This week marks the 10th anniversary of Arista Networks Inc. as a publicly traded company, and to celebrate, the company held a special event at the New York Stock Exchange, which is not only the exchange that trades ANET stock but also one of the early Arista customers.

However, the shindig at the NYSE wasn’t the only way the company marked the IPOiversary. Arista also announced some artificial intelligence-related news.

On Wednesday, Arista announced the launch of the Arista Etherlink AI platforms, which it engineered to enhance network performance for AI workloads such as training and inferencing. Equipped with new AI-optimized features in Arista EOS, the Arista Etherlink AI portfolio can support AI cluster sizes from thousands to hundreds of thousands of XPUs. It utilizes efficient one- and two-tier network topologies to deliver superior application performance compared to more complex multi-tier networks, and it offers advanced monitoring capabilities, including flow-level visibility.

Recently, I had a chance to talk about the products with Arista executives Martin Hull, vice president and general manager of cloud and AI platforms, and John Peach, senior director of product management. Hull opened with an obvious but important point. “AI is the only topic for investors, tech analysts, customers, you name it,” he said. “And there’s a lot of conversations around what is it about AI that is different.”

He sees a couple of distinct factors. “The scale of the networks being talked about is larger,” he said. “We’re also architecting the networks now to be fully non-blocking. And in many cases, you put more network I/O in the network’s core than at the edge. So, you have zero or negative oversubscription.”

Non-blocking is a term that many vendors use inaccurately. It refers to a switch where all ports can operate at full rate simultaneously. Many switch designs assume that situation will never arise, which is usually a safe bet, so vendors ship switches that partially block traffic and then retransmit to manage congestion.

A “blocking” switch would have bad marketing implications, so vendors have used descriptors such as “near non-blocking,” which can be misleading. With AI workloads, any level of blocking could cause performance problems, so it’s essential that network engineers purchase fully non-blocking products.
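To make the non-blocking distinction concrete: a leaf switch’s oversubscription ratio is its host-facing bandwidth divided by its fabric-facing bandwidth, and a fully non-blocking design keeps that ratio at 1:1 or better. A minimal sketch, using made-up port counts rather than any specific Arista SKU:

```python
def oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Host-facing bandwidth divided by fabric-facing bandwidth.

    1.0 means non-blocking (all ports can run at full rate);
    anything above 1.0 means traffic can block under load.
    """
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# A leaf with 32 x 400G host ports and 16 x 800G uplinks is non-blocking:
print(oversubscription_ratio(32, 400, 16, 800))  # 1.0

# The same leaf with only 8 x 800G uplinks is 2:1 oversubscribed:
print(oversubscription_ratio(32, 400, 8, 800))   # 2.0
```

The “zero or negative oversubscription” Hull describes corresponds to a ratio of 1.0 or below, i.e., at least as much bandwidth in the fabric core as at the edge.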

The updates to Arista’s Etherlink AI platforms

Hull shared the details of the company’s Etherlink platforms, aimed at handling the unique needs of AI networking: the 7060X6 AI Leaf, the 7800R4 AI Spine and the 7700R4 AI Distributed Etherlink Switch.

  • 7060X6 AI Leaf switch family, available now, utilizes Broadcom Tomahawk 5 silicon, offering a capacity of 51.2 terabits per second with support for 64 800G or 128 400G Ethernet ports.
  • 7800R4 AI Spine, which is in customer testing and will be available in the second half of the year, is the fourth generation of Arista’s 7800 modular systems that feature Broadcom Jericho3-AI processors with an AI-optimized packet pipeline. It delivers non-blocking throughput using virtual output queuing architecture, supporting up to 460 terabits per second in a single chassis, equivalent to 576 800G or 1152 400G Ethernet ports.
  • 7700R4 AI Distributed Etherlink Switch, also in customer testing and available in the year’s second half, was designed for the largest AI clusters. It offers massively parallel distributed scheduling and congestion-free traffic spraying based on the Jericho3-AI architecture. The 7700R4 represents a new series of ultra-scalable, intelligent distributed systems, ensuring the highest consistent throughput for very large AI clusters.
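The port counts above follow directly from the switching capacities quoted; a quick sanity check using only the figures in the list (capacity in Gbps equals port count times per-port speed):

```python
GBPS_PER_TBPS = 1000

# 7060X6 AI Leaf: 51.2 Tbps of capacity
assert 64 * 800 == 51.2 * GBPS_PER_TBPS    # 64 x 800G ports
assert 128 * 400 == 51.2 * GBPS_PER_TBPS   # 128 x 400G ports

# 7800R4 AI Spine: up to 460 Tbps per chassis (460.8 Tbps exactly,
# per the stated port counts)
assert 576 * 800 == 460_800                # 576 x 800G ports
assert 1152 * 400 == 460_800               # 1152 x 400G ports

print("capacity figures are self-consistent")
```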

The company said a single-tier network topology using Etherlink platforms will support more than 10,000 XPUs. A two-tier network with Etherlink supports more than 100,000 XPUs. Arista said reducing the number of network tiers optimizes AI application performance, decreases the number of optical transceivers, lowers costs and improves reliability. With AI, there are concerns about power consumption running amok, so this combination can partially help offset the increase.

All the Arista Etherlink switches are compatible with the emerging Ultra Ethernet Consortium standards, which should provide additional performance benefits when UEC NICs become available.

In addition, according to the company, Arista’s EOS and CloudVision suites are critical components of the new AI-focused networking platforms, including features for networking, security, segmentation, visibility and telemetry, providing robust and reliable support for high-value AI clusters and workloads.

Moving too fast for distributed controllers

As he thought about Arista’s approach, Hull told me: “I think most of the customers have shied away from controllers, subnet managers and open flow controllers. These networks are moving too fast to have distributed controllers in them. You can have management, so we have CloudVision — all the information from this network gets sent to the CloudVision System. But that’s not the same as running the control plane separately.”

Some final thoughts

Arista is putting much of its muscle behind its move to the AI world and has been cozying up to Nvidia Corp. The results are already clear: the company is showing the value of Ethernet, even though some have said that InfiniBand is the way forward for AI. It’s fitting that Arista held the event at the NYSE, as among all the network vendors, Arista has done the best job of tying its product vision and roadmap to the growth of AI.

Author: Zeus Kerravala

Zeus Kerravala is the founder and principal analyst with ZK Research. Kerravala provides a mix of tactical advice to help his clients in the current business climate and long-term strategic advice.