How Nvidia enables accelerated computing

This syndicated post originally appeared at Zeus Kerravala – SiliconANGLE.

Everywhere we look these days, data-intensive applications are multiplying at breakneck speed. One of the companies at the center of this development is Nvidia Corp., which has been riding high of late because of the chips it makes to power artificial intelligence.

Recently, Nvidia hosted an analyst briefing, presented by John Zedlewski, senior director of data science engineering at Nvidia, about how the company is approaching accelerated computing. This post includes some thoughts on the briefing.

Nvidia was on the ground floor of accelerated computing a few decades ago. It has come a long way in that time and has sped up considerably over the past year or two. Discussing system architectures, Zedlewski made an interesting point.

“All that hardware is fantastic and sometimes, exotic as it is, isn’t successful without software to run it,” he said. “We want to make it really easy for developers to get the maximum performance out of this incredibly sophisticated hardware and make that performance easy to drop into your application domain.”

Nvidia packages its offering in platforms such as Nvidia AI and end-to-end frameworks such as NeMo for large language models and MONAI for medical imaging, Zedlewski noted. Most think of Nvidia as a graphics processing unit manufacturer, and though it is arguably best-in-class in this area, its systems approach has kept it ahead of its rivals, Intel Corp. and Advanced Micro Devices Inc.

Nvidia packages its GPUs with software development kits, acceleration libraries, system software and hardware for an end-to-end solution. This simplifies the process of using Nvidia technology since it becomes almost “plug and play.”

Zedlewski added that before you train a large language model, you start by figuring out the data set you need — maybe even something as broad as all the text on the internet — which presents massive data science and data management problems.

“If you want to do that effectively, if you want to be able to iterate, refine and improve your data, you need a way to accelerate it so you’re not waiting for months for each iteration,” he said. “We hear this from our forecasting partners all the time. They say, ‘Look, we have legacy systems that have been great at doing monthly and weekly forecasts.’”

Those partners need a way to build and run models fast enough to forecast not monthly, weekly or even daily, but in real time. Speed is also critical in other applications, such as fraud detection, genomics and cybersecurity, where massive data sets must be analyzed as events unfold. The tools data scientists use can't keep up with the need to comb through vast data stores.

Nvidia’s Triton, an open-source inference platform specializing in deep learning inference, has been augmented to support many tree-based models that data scientists and machine learning engineers are still building across the industry.
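To give a flavor of what serving such a model looks like, here is a minimal sketch of sending a batch of tabular features to a tree-based model hosted by a Triton server, using the standard tritonclient Python package. The model name and tensor names ("fraud_xgboost", "input__0", "output__0") are hypothetical and depend entirely on how the model is configured; this is an illustration, not Nvidia's reference deployment.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port).
client = httpclient.InferenceServerClient(url="localhost:8000")

# A batch of 64 rows with 32 features each; purely synthetic data.
batch = np.random.random((64, 32)).astype(np.float32)

# Tensor names are hypothetical and must match the model's config.
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer(model_name="fraud_xgboost", inputs=[infer_input])
scores = result.as_numpy("output__0")
print(scores[:5])
```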

“Increasingly, we’re seeing interest in deployment frameworks that include vector search, whether that’s a large language model that has a vector search component, image search or a recommender system,” Zedlewski told me. “So we also have accelerators for vector search in RAPIDS RAFT.”

Nvidia enables data scientists to work comfortably with datasets containing hundreds of millions of rows. The company also recognizes that no one tool can do everything, so it offers more than 100 open-source and commercial software integrations. Zedlewski told me those integrations are there to make the work seamless, so that building complex, multicomponent pipelines is simpler.
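As a small illustration of what working at that scale looks like, here is a minimal sketch using cuDF, the RAPIDS DataFrame library, whose API mirrors pandas. The file path and column names are illustrative, not from Nvidia's briefing.

```python
import cudf

# Read a large Parquet dataset directly into GPU memory.
# The path and column names here are hypothetical.
df = cudf.read_parquet("sales_history.parquet")

# Pandas-style feature engineering, executed on the GPU.
daily = (
    df.groupby(["store_id", "sku", "date"])
      .agg({"units_sold": "sum", "price": "mean"})
      .reset_index()
)
print(daily.head())
```

Because the API tracks pandas closely, moving an existing feature-engineering step onto the GPU is often a matter of swapping the import rather than rewriting the pipeline.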

There are 350 contributors to the company’s open-source projects on GitHub, a sign of how its open-source tools are being adopted. More than 25% of Fortune 500 companies use RAPIDS, with enterprise adoption picking up steam, according to Zedlewski. Companies using RAPIDS include Adobe Inc., Walmart Inc. and AstraZeneca PLC.

With a central processing unit-based model, Walmart just couldn’t crunch through enough data in its fixed nightly window to forecast how many perishable goods to ship to its stores, a decision with substantial financial implications. So, to fit the time window, Walmart’s data scientists compromised on model quality.

That approach was not working, so the company became one of the first users of RAPIDS. As a result, Walmart achieved 100 times faster feature engineering and 20 times faster model training with RAPIDS.
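To give a flavor of the model-training side of that kind of workflow, here is a minimal sketch of GPU-accelerated training with XGBoost, which can consume cuDF data directly. The feature and label variables are hypothetical stand-ins for the output of a RAPIDS feature-engineering step, and the actual speedup will vary by workload; this is not Walmart's pipeline.

```python
import xgboost as xgb

# `features` is assumed to be a cuDF DataFrame and `labels` a cuDF Series
# produced by an earlier GPU feature-engineering step (hypothetical names).
dtrain = xgb.DMatrix(features, label=labels)

params = {
    "objective": "reg:squarederror",
    "tree_method": "gpu_hist",  # GPU histogram algorithm; newer XGBoost
                                # versions use device="cuda" with "hist"
    "max_depth": 8,
}
model = xgb.train(params, dtrain, num_boost_round=500)
```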

Zedlewski told me he hears from large partners that when they experiment with integrating graph features into their models, or add graph analytic steps at serving time, it boosts model accuracy, especially for fraud and cybersecurity.

For such a challenge, RAPIDS cuGraph can handle the preprocessing, post-processing and traditional algorithms needed for modern graph analytics. It can support graphs with more than a trillion edges, works with familiar application programming interfaces and runs as much as 85 times faster than on a CPU.
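As a rough sketch of how that looks from the data scientist's side, cuGraph works directly with cuDF edge lists. The file and column names below are illustrative, and PageRank simply stands in for whatever graph features a fraud or cyber model might consume.

```python
import cudf
import cugraph

# Load an edge list (src, dst) into GPU memory; the path is hypothetical.
edges = cudf.read_csv(
    "transaction_edges.csv", names=["src", "dst"], dtype=["int32", "int32"]
)

# Build a graph from the cuDF edge list.
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# PageRank as an example of a graph feature computed on the GPU.
scores = cugraph.pagerank(G)
print(scores.sort_values("pagerank", ascending=False).head(10))
```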

The RAPIDS RAFT accelerator can take a challenging problem, such as sifting through hundreds of millions or even a billion pieces of content (a product, an image or a piece of text), and tackle it with nearest-neighbor and approximate nearest-neighbor methods. The result is 10 times higher throughput and 33 times faster index build times. And what used to consume a bank of servers can now happen on a single machine.
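For illustration, here is a minimal nearest-neighbor sketch using cuML's NearestNeighbors, which draws on RAFT primitives under the hood and mirrors the scikit-learn API. The embedding dimensions are arbitrary and the data is synthetic; a production vector-search deployment would use an approximate index rather than this brute-force example.

```python
import cupy as cp
from cuml.neighbors import NearestNeighbors

# Synthetic embeddings standing in for product, image or text vectors.
embeddings = cp.random.random((1_000_000, 128), dtype=cp.float32)

# Fit a GPU nearest-neighbor index; the API mirrors scikit-learn.
nn = NearestNeighbors(n_neighbors=10)
nn.fit(embeddings)

# Query with a batch of new vectors and get the 10 closest matches each.
queries = cp.random.random((1_000, 128), dtype=cp.float32)
distances, indices = nn.kneighbors(queries)
print(indices.shape)  # (1000, 10)
```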

I’m intrigued to see how the adoption of Nvidia RAPIDS will continue, and I look forward to more stories about new applications. I’ll also be interested to see whether Nvidia embraces the work of the Ultra Ethernet Consortium, which promises to make Ethernet better suited for accelerated computing and AI than InfiniBand.

I asked Nvidia about it and was told, “We share the vision that Ethernet needs to evolve in the era of AI, and our Quantum and Spectrum-X end-to-end platforms already embody these AI compute fabric virtues. These platforms will continue to evolve, and we will support new standards that may emerge.”

That said, network vendors have been trying to displace InfiniBand with Ethernet for decades, and Ethernet has yet to win out for high-performance workloads. Nvidia has been good about doing what’s best for the customer, so if Ultra Ethernet does live up to its promise, I’m sure Nvidia will support it. Until then, tried-and-true InfiniBand is there.

We’re seeing rapid development almost daily, but it’s important to remember that we are at the dawn of accelerated computing. It’s kind of like the Web in 1994. Let’s see where the next 30 years take us.

Author: Zeus Kerravala

Zeus Kerravala is the founder and principal analyst with ZK Research. Kerravala provides a mix of tactical advice to help his clients in the current business climate and long-term strategic advice.