Nine notable innovations from AWS CEO Matt Garman’s re:Invent keynote

This syndicated post originally appeared at Zeus Kerravala – SiliconANGLE.

Amazon Web Services Inc. Chief Executive Matt Garman delivered a three-hour keynote at the company’s annual re:Invent conference to an audience of 60,000 attendees in Las Vegas and another 400,000 watching online. They heard a lot of news from the new leader, who became CEO earlier this year after joining the company in 2006.

The conference, dedicated to builders and developers, offered 1,900 in-person sessions and featured 3,500 speakers. Many of the sessions were led by customers, partners and AWS experts. In his keynote, Garman (pictured) announced a litany of advancements designed to make developers’ work easier and more productive.

Here are nine key innovations he shared:

AWS will play a big role in AI

Garman kicked off his presentation by announcing the general availability of the company’s latest Trainium chip, Trainium2, along with EC2 Trn2 instances. He described these as the most powerful instances for generative artificial intelligence thanks to custom processors built in-house by AWS.

He said Trainium2 delivers 30% to 40% better price performance than current graphics processing unit-powered instances. “These are purpose-built for the demanding workloads of cutting-edge gen AI training and inference,” Garman said. Trainium2 gives customers “more choices as they think about the perfect instance for the workload they’re working on.”

Beta tests showed “impressive early results,” according to Garman. He said the organizations that did the testing — Adobe Inc., Databricks Inc. and Qualcomm Inc. — all expect the new chips and instances will deliver better results and a lower total cost of ownership. He said some customers expect to save 30% to 40% over the cost of alternatives. “Qualcomm will use the new chips to deliver AI systems that can train in the cloud and then deploy at the edge,” he said.

When the announcement was made, many media outlets painted Trn2 as Amazon looking to go to war with Nvidia Corp. I asked Garman about this in the analyst Q&A, and he emphatically said that was not the case. The goal with its own silicon is to make the overall AI silicon pie bigger so that everyone wins. This is how Amazon approaches the processor industry, and there is no reason to assume it will change how it handles partners, clickbait headlines aside. More Nvidia workloads are run in the AWS cloud, and I don’t see that changing.

New servers to accommodate huge models

Today’s models have become very big and very fast, with hundreds of billions to trillions of parameters. That makes them too big to fit on a single server. To address that, AWS announced EC2 Trainium2 UltraServers. These connect four Trainium2 instances (64 Trainium2 chips), all interconnected by high-speed, low-latency NeuronLink connectivity.

This gives customers a single ultranode with more than 83 petaflops of compute power. Garman said this will have a “massive impact on latency and performance.” It enables very large models to be loaded into a single node to deliver much better latency and performance without having to break them up across multiple nodes. Garman said Trainium3 chips will be available in 2025 to keep up with gen AI’s evolving needs and provide the landscape customers need for their inferences.
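As a rough sanity check on those UltraServer figures, the short Python sketch below divides the quoted totals to get per-instance and per-chip numbers. The inputs come from the keynote figures above. The per-chip value is only a derived lower bound, since the total was quoted as "over 83 petaflops," and it assumes the compute is spread evenly across chips.

```python
# Figures quoted in the keynote coverage above.
instances_per_ultraserver = 4
chips_per_ultraserver = 64
total_petaflops = 83  # quoted as "over 83", so treat this as a lower bound

# Derived values (assumption: chips and compute are evenly distributed).
chips_per_instance = chips_per_ultraserver // instances_per_ultraserver
petaflops_per_chip = total_petaflops / chips_per_ultraserver

print(chips_per_instance)            # 16
print(round(petaflops_per_chip, 2))  # 1.3
```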

Leveraging Nvidia’s Blackwell architecture

Garman said AWS is the easiest, most cost-effective way for customers to use Nvidia’s Blackwell architecture. AWS announced a new P6 family of instances based on Blackwell. Coming in early 2025, the new instances featuring Nvidia’s latest GPUs will deliver up to 2.5 times faster compute than the current generation of GPUs.

AWS’s collaboration with Nvidia has led to significant advancements in running generative AI workloads. Bedrock gives customers model choice: It’s not one model to rule them all but a single source for a wide range of models, including AWS’s newly announced Nova models. There won’t be a divide between applications and gen AI applications. Gen AI will be part of every application, using inference to enhance, build or change an application.

Garman said Bedrock resonates with customers because it provides everything they need to integrate gen AI into production applications, not just proofs of concept. He said customers are starting to see real impact from this. Genentech Inc., a leading biotech and pharmaceutical company, wanted to accelerate drug discovery and development by using scientific data and AI to rapidly identify and target new medicines and biomarkers for their trials. Finding all this data required scientists to scour many external and internal sources.

Using Bedrock, Genentech devised a gen AI system so scientists can ask detailed questions about the data. The system can identify the appropriate databases and papers from a huge library and synthesize the insights and data sources.

It summarizes where it gets the information and cites the sources, which is incredibly important so scientists can do their work. It used to take Genentech scientists many weeks to do one of these lookups. Now, it can be done in minutes.

According to Garman, Genentech expects to automate five years of manual efforts and deliver new medications more quickly. “Leading ISVs, like Salesforce, SAP, and Workday, are integrating Bedrock deep into their customer experiences to deliver GenAI applications,” he said.

Bedrock model distillation simplifies a complex process

Garman said AWS is making it easier for companies to take a large, highly capable frontier model and send it all their prompts for the questions they want to ask. “Then you take all of the data and the answers that come out of that, and you use that output and your questions to train a smaller model to be an expert at one particular thing,” he explained. “So, you get a smaller, faster model that knows the right way to answer one particular set of questions. This works quite well to deliver an expert model but requires machine learning involvement. You have to manage all of the data workflows and training data. You have to tune model parameters and think about model weights. It’s pretty challenging. That’s where model distillation in Bedrock comes into play.”

Distilled models can run 500% faster and 75% more cheaply than the model from which they were distilled. “This is a massive difference, and Bedrock does it for you,” he said. This difference in cost can turn the gen AI application ROI around, from being too expensive to roll out in production to being very valuable. You send Bedrock sample prompts from your application, and it does all of the work.
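To make the manual workflow Garman describes more concrete, here is a minimal Python sketch of its first step: sending prompts to a large "teacher" model and pairing each prompt with the answer, so the pairs can later train a smaller model. This is not Bedrock's API. The function names, the record layout and the toy teacher are all hypothetical stand-ins for illustration.

```python
from typing import Callable, Dict, List


def build_distillation_dataset(
    prompts: List[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each prompt with the teacher model's answer."""
    dataset = []
    for prompt in prompts:
        answer = teacher(prompt)  # in practice, a call to a large frontier model
        dataset.append({"prompt": prompt, "completion": answer})
    return dataset


def toy_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a frontier model; it just echoes the prompt.
    return f"Answer to: {prompt}"


prompts = ["What does my policy cover?", "How do I file a claim?"]
dataset = build_distillation_dataset(prompts, toy_teacher)

for record in dataset:
    print(record)
```

The resulting prompt and completion pairs are what a smaller model would then be fine-tuned on. Managing that training step, along with the data workflows around it, is the part Garman says Bedrock takes over.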

But getting the right model is just the first step. “The real value in Generative AI applications is when you bring your enterprise data together with the smart model. That’s when you get really differentiated and interesting results that matter to your customers. Your data and your IP really make the difference,” Garman said.

AWS has expanded Bedrock’s support for a wide range of formats and added new vector databases, such as OpenSearch and Pinecone. Bedrock enables users to get the right model, accommodates an organization’s enterprise data, and sets boundaries for what applications can do and what the responses look like.

Enabling customers to deploy responsible AI — with guardrails

Bedrock Guardrails make it easy to define the safety of applications and implement responsible AI checks. “These are guides to your models,” said Garman. “You only want your gen AI applications to talk about the relevant topics. Let’s say, for instance, you have an insurance application, and customers come and ask about various insurance products you have. You’re happy to have it answer questions about policy, but you don’t want it to answer questions about politics or give healthcare advice, right? You want these guardrails saying, ‘I only want you to answer questions in this area.’”

This is a huge capability for developing production applications, Garman said. “This is why Bedrock is so popular,” he explained. “Last year, lots of companies were building POCs for gen AI applications, and capabilities like Guardrails were less critical. It was OK to have models ‘do cool things.’ But when you integrate gen AI deeply into your enterprise applications, you must have many of these capabilities as you move to production applications.”
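The insurance example can be illustrated with a toy topic filter. The Python sketch below is not how Bedrock Guardrails is implemented or configured. It is a deliberately simple keyword check, and the topic list, refusal message and function names are assumptions made up for this illustration.

```python
from typing import Callable, Set

# Hypothetical allow-list for an insurance assistant.
ALLOWED_TOPICS: Set[str] = {"policy", "claim", "premium", "coverage"}
REFUSAL = "Sorry, I can only answer questions about our insurance products."


def is_on_topic(question: str, allowed: Set[str] = ALLOWED_TOPICS) -> bool:
    """Return True if any word in the question matches an allowed topic."""
    words = {word.strip(".,?!").lower() for word in question.split()}
    return bool(words & allowed)


def guarded_answer(question: str, model: Callable[[str], str]) -> str:
    """Call the model only for on-topic questions; otherwise refuse."""
    if not is_on_topic(question):
        return REFUSAL
    return model(question)


def toy_model(question: str) -> str:
    # Hypothetical stand-in for a gen AI model.
    return f"Model answer for: {question}"


print(guarded_answer("What does my policy cover?", toy_model))
print(guarded_answer("Who should I vote for?", toy_model))
```

A production guardrail would rely on far more robust classification than keyword matching, but the control flow is the same idea: check the request against the allowed area before the model is permitted to respond.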

Making it easier for developers to develop

Garman said AWS wants to help developers innovate and free them from undifferentiated heavy lifting so they can focus on the creative things that “make what you’re building unique.” Gen AI is a huge accelerator of this capability. It allows developers to focus on those pieces and push off some of that undifferentiated heavy lifting. Q Developer, which debuted in 2023, is the developers’ “AWS expert.” It’s the “most capable gen AI assistant for software development,” he said.

Q Developer helped Datapel Systems “achieve up to 70% efficiency improvements. They reduced the time needed to deploy new features, completed tasks faster, and minimized repetitive actions,” Garman said.

But it’s about more than efficiency. The Financial Industry Regulatory Authority, or FINRA, has seen a 20% improvement in code quality and integrity by using Q Developer to help it create better-performing and more secure software. Amazon Q has the “highest reported acceptance rate of any multi-line coding assistant in the market,” said Garman.

However, a coding assistant is just a tiny part of what most developers need. AWS research shows that developers spend just one hour a day coding. They spend the rest of the time on other end-to-end development tasks.

Three new autonomous agents for Amazon Q

According to Garman, autonomous agents for generating user tests, documentation and code reviews are now generally available. The first enables Amazon Q to generate end-to-end user tests automatically. It leverages advanced agents and knowledge of the entire project to provide developers with full test coverage.

The second can automatically create accurate documentation. “It doesn’t just do this for new code,” Garman said. “The Q agent can apply to legacy code as well. So, if a code base wasn’t perfectly documented, Q can understand what that code is doing.”

The third new Q agent can perform automatic code reviews. It will “scan for vulnerabilities, flag suspicious coding patterns, and even identify potential open-source package risks that might be present,” said Garman. It will identify where it views a deployment risk and suggest mitigations to make deployment safer.

“We think these agents can materially reduce a lot of the time spent on really important, but maybe undifferentiated tasks and allow developers to spend more time on value-added activities,” he said.

Garman also announced a new “deep integration between Q Developer and GitLab.” Q Developer functionality is now deeply embedded in GitLab’s platform. “This will help power many of the popular aspects of their Duo Assistant,” he said. Teams can access Q Developer capabilities, which will be natively available in the GitLab workflows. Garman said more will be added over time.

Mainframe modernization

Another new Q Developer capability is mainframe modernization. Garman called mainframe workloads “by far the most difficult to migrate to the cloud.” Q Transformation for Mainframe offers several agents that can help organizations streamline this complex and often overwhelming workflow. “It can do code analysis, planning, and refactor applications,” he said. “Most mainframe code is not very well-documented. People have millions of lines of COBOL code, and they have no idea what it does. Q can take that legacy code and build real-time documentation that lets you know what it does. It helps let you know which applications you want to modernize.”

Garman said it’s not yet possible to make mainframe migration a “one-click process,” but with Q, instead of a multiyear effort, it can be a “multiquarter process.”

Integrated analytics

Garman introduced the next generation of Amazon SageMaker, which he called “the center for all your data, analytics and AI needs.” He said AWS is expanding SageMaker by adding “the most comprehensive set of data, analytics, and AI tools.” SageMaker scales up analytics and now provides “everything you need for fast analytics, data processing, search data prep, AI model development and generative AI” for a single view of your enterprise data.

He also introduced SageMaker Unified Studio, “a single data and AI development environment that allows you to access all the data in your organization and act on it with the best tool for the job.” Garman said SageMaker Unified Studio, which is currently in preview, “consolidates the functionality that analysts and data scientists use across a wide range of standalone studios in AWS today.” It offers standalone query editors and a variety of visual tools, such as EMR, Glue, Redshift, Bedrock and all the existing SageMaker Studio capabilities.

Even with all these new and upgraded products, solutions and capabilities, Garman promised more to come.

Author: Zeus Kerravala

Zeus Kerravala is the founder and principal analyst with ZK Research. Kerravala provides a mix of tactical advice to help his clients in the current business climate and long-term strategic advice.