Stop using servers to do things they’re not good at: How DPUs can change the data game

This syndicated post originally appeared at Zeus Kerravala – SiliconANGLE.

If the definition of insanity is doing the same thing repeatedly and expecting a different result, it might be time to stop trying to get servers to do something they’re not designed to do.

Servers are effective at many things but were never designed for the rigors of software-defined networking, storage and security. Servers excel at processing massive amounts of data, but they should not be performing functions such as network address translation, telemetry, firewalling, storage-related services and, down the road, artificial intelligence functions such as AllReduce.
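For readers unfamiliar with AllReduce, it is a collective operation used in distributed AI training: every node contributes a vector, and every node ends up holding the combined result. The following is a minimal sketch in Python, assuming plain in-memory lists stand in for networked nodes. The function name `all_reduce_sum` is hypothetical, and production implementations typically use ring or tree algorithms over the network rather than this naive reduce-then-broadcast.

```python
from typing import List


def all_reduce_sum(node_values: List[List[float]]) -> List[List[float]]:
    """Naive AllReduce: each node receives the element-wise sum of all vectors.

    `node_values[i]` is the vector held by node i (an illustrative stand-in
    for data that would really live on separate machines).
    """
    if not node_values:
        return []
    length = len(node_values[0])
    if any(len(values) != length for values in node_values):
        raise ValueError("all nodes must hold vectors of the same length")

    # Reduce step: combine the contributions from every node.
    total = [sum(column) for column in zip(*node_values)]

    # Broadcast step: give every node its own copy of the result.
    return [list(total) for _ in node_values]


if __name__ == "__main__":
    gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(all_reduce_sum(gradients))
    # -> [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
```

Every element has to travel between nodes and be summed, which is why the operation is a candidate for offloading from server CPUs as clusters grow.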

This is where data processing units, or DPUs, can add significant value. DPUs disaggregate data processing from servers and SDNs, freeing servers to do what they are best at. There are currently many DPU providers, but they are not all equal. The AMD Pensando DPU is a fully programmable, Ethernet-based platform that delivers cloud, compute, network, storage and security services at scale.

Built on the same tech that cloud hyperscalers use, AMD’s DPUs have been shown to deliver 1.6 million connections per second — with minimal latency and jitter — which is some 10 times more than smart network interface cards. Since AMD acquired Pensando, it has stepped on the DPU gas and is well past the industry vision. The company has several marquee customers like Goldman Sachs, Microsoft Azure, IBM Cloud, NetApp, Alibaba and Oracle Cloud Infrastructure.

The Pensando distributed services card was one of the first DPUs to support the Distributed Services Engine in VMware vSphere 8, formerly Project Monterey. The company says this partnership aims to help companies cut operational costs through unified management of workloads, while offloading processing from CPUs and adding a layer of security by isolating infrastructure services from tenants.

AMD is making it easy for customers to test-drive the product by offering it in a hands-on lab with the VMware vSphere Distributed Services Engine, or vDSE. The lab will allow users to compare the performance of accelerated and unaccelerated infrastructure. They'll also be able to learn more about the security and efficiency design points of vDSE.

One of the interesting questions regarding DPUs going forward is support for Ethernet. There has been great debate in the industry over whether and when Ethernet would displace InfiniBand. The latter is fast and lossless and has been the gold standard for connectivity within compute clusters, but it also presents scheduling issues, security challenges and management problems.

Given the high demand for AI training and inferencing among hyperscalers, the key requirements are programmable congestion control, standardization on transport, telemetry and support for both scale-up and scale-out. On those terms, Ethernet now presents a better option. For Pensando, AMD uses Ethernet, the winner in every other part of networking. Ethernet has all the features that InfiniBand lacks and now has the market acceptance that you need in a technology, with support from the likes of Amazon Web Services Inc., Alibaba Cloud, Google Cloud, Meta Platforms Inc., Oracle Corp. and others.

Also, a new standard for Ethernet is on the horizon: Ultra Ethernet, which is intended to close any performance gap between Ethernet and InfiniBand. AMD is a key contributor to the Ultra Ethernet Consortium. In addition to AMD, the consortium has many strong supporters, including Cisco Systems Inc., the market leader in networking; Broadcom Inc. and Intel Corp., the largest silicon manufacturers; Arista Networks Inc., the leader in high-performance networking; and Hewlett Packard Enterprise Co., one of the biggest computing vendors.

Another interesting element of the AMD Pensando DPU is that it is fully programmable via its Software-in-Silicon Development Kit, or SSDK, an environment for creating software that runs on the DPU. Instead of developing their own pipeline from scratch, customers can use the SSDK to build a custom pipeline and gain programmatic control of the DPU.

The SSDK provides a container-based development environment on x86 systems. The code can run on a physical DPU or on a DPU simulator on an x86 host. Key features of the SSDK include simplified setup; examples and reference pipelines to accelerate development; containerized infrastructure to build, test and debug code; and extensive documentation to help developers navigate the development environment.

DPUs used to be called “offload engines.” I like the term DPU more, but the “offload” metaphor is apt. For busy information technology and networking groups struggling to keep pace with the explosion of data and the need to move large chunks of it regularly, AMD offers an excellent approach to DPUs.

The server era for data infrastructure processing is coming to an end. Putting processing-intensive infrastructure loads on your servers is inefficient and can harm a business. So, let me ask again: Why keep doing the same thing and expecting a different result? DPUs are a great option for businesses processing massive amounts of data, which describes almost all companies.

The AI era is rapidly approaching, and it will further increase data processing requirements. Organizations need to ensure their infrastructure has been modernized, and DPUs are a key part of this.

Author: Zeus Kerravala

Zeus Kerravala is the founder and principal analyst with ZK Research. Kerravala provides a mix of tactical advice to help his clients in the current business climate and long-term strategic advice.