Amazon Bedrock adds Anthropic’s Claude 3 family and Mistral Large

Amazon Web Services Inc. recently announced that Anthropic PBC, an artificial intelligence safety and research company, will make its Claude 3 family of models available on Amazon Bedrock. Claude 3 Sonnet and Claude 3 Haiku are already available on Bedrock, and today Claude 3 Opus is generally available to AWS customers.

AWS says this should help customers test, build and deploy generative AI apps. Amazon Bedrock provides a range of fully managed large language models and foundation models, along with ease-of-use capabilities and built-in responsible AI features, and AWS bills it as the easiest way to build and scale gen AI applications.

This news comes shortly after Mistral Large, the latest and most advanced large language model from Mistral AI, a leading AI startup out of France, became available on Bedrock. Mistral Large joins the already available Mistral 7B and Mixtral 8x7B models. Customers can use Mistral AI’s models, with their deep understanding of text structure and architecture, to summarize text, answer questions and help organize information. The work with Mistral AI and Anthropic solidifies AWS’ commitment to delivering AI solutions to its customers across various industries.

Before Opus’ general availability announcement, I talked Bedrock with Vasi Philomin, AWS’ vice president of generative AI. He told me that access to multiple types of models is a clear need.

Using different models

“When you’re building a generative AI application, you’re going to need access to different models,” he told me. “There’s no way that one model will be sufficient. We’ve heard this loud and clear from our customers.” He said AWS’ position from the start was to get the best models out there and make them available on Bedrock as a serverless service so that customers can experiment and iterate.

This new offering from Anthropic adds to Amazon Bedrock’s already extensive array of gen AI models — including existing models from Anthropic, plus AI21 Labs, Cohere, Meta Platforms, Mistral AI, Stability AI and Amazon.

Anthropic recently reported that Claude 3 Opus outperforms other available models, including OpenAI’s GPT-4, in reasoning, math and coding. Claude 3 Sonnet, according to Anthropic, is twice as fast as earlier Claude models, but it doesn’t sacrifice intelligence in the process. And Claude 3 Haiku, designed to provide near-instant responses, is the most affordable of the three.

Not being locked into a specific model such as OpenAI’s GPT-4 gives users options. But finding the right model for a given job can be challenging. Philomin claims AWS has that figured out.

Evaluating what you need

“At re:Invent last year, we announced the preview of the model evaluation capability on Bedrock itself,” he said. “And here’s what it allows customers to do: You go into Bedrock and pick many models you want to evaluate. Then, for your application, you need a set of prompts that determine the kind of application that you’re working on. Then you can do a comparison between these models with your prompts.”

He said there are two approaches to the model evaluation capability within Bedrock.

“The first one is the automated approach, where we have a bunch of prepopulated metrics so customers can just use the metrics we’ve defined,” he said. “We also have some additional and public datasets if they want to use them. But, ideally, they should have their datasets — the prompts I talked about.”

Customers then use the automated evaluation capability to determine which model is better.
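
The comparison Philomin describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Bedrock’s evaluation feature or its API: the two “models” are hypothetical stand-in functions, the prompts and reference answers are made up, and the token-overlap F1 metric is an assumed example of a predefined metric.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a hosted model: maps a prompt to a completion.
ModelFn = Callable[[str], str]


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model output and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    remaining = list(ref)  # reference tokens not yet matched
    overlap = 0
    for tok in pred:
        if tok in remaining:
            remaining.remove(tok)
            overlap += 1
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def compare_models(
    models: Dict[str, ModelFn], dataset: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Return each model's mean score over the (prompt, reference) pairs."""
    scores = {}
    for name, fn in models.items():
        total = sum(token_f1(fn(prompt), ref) for prompt, ref in dataset)
        scores[name] = total / len(dataset)
    return scores


# Made-up prompts with reference answers, standing in for a customer's dataset.
dataset = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]

# Toy models for illustration only; real ones would call a hosted endpoint.
models: Dict[str, ModelFn] = {
    "model_a": lambda p: "Paris" if "France" in p else "4",
    "model_b": lambda p: "I am not sure",
}

scores = compare_models(models, dataset)
best = max(scores, key=scores.get)
print(scores, best)
```

The point of the sketch is the shape of the workflow: the same prompts go to every candidate model, one metric scores every output, and the averages give the “sense of direction” that a human review can then confirm.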

Using humans to figure out the best fit

“I always believe that automated evaluations give you a sense of direction,” he said. “But ultimately, you want to have a human evaluation to determine which one’s better-suited. So, as part of the model evaluation capability on Bedrock, we also offer the opportunity for you to bring in your human workforce.”

This might involve company employees reviewing and rating the model outputs for the prompts relevant to a specific application. He said AWS built those workflows into Bedrock for that purpose. The company has other evaluation capabilities, but Philomin thinks most customers will rely on the model evaluation capability.
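
A human review of this kind boils down to collecting ratings and summarizing them per model. The Python sketch below is a hedged illustration, not the workflow Bedrock ships: the `Rating` record, the 1-to-5 scale, the reviewer names and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Rating(NamedTuple):
    """One reviewer's score for one model's output on one prompt (hypothetical schema)."""
    reviewer: str
    model: str
    prompt_id: int
    score: int  # assumed scale: 1 (poor) to 5 (excellent)


def mean_rating_by_model(ratings: List[Rating]) -> Dict[str, float]:
    """Average the reviewers' scores for each model."""
    by_model = defaultdict(list)
    for r in ratings:
        by_model[r.model].append(r.score)
    return {model: mean(scores) for model, scores in by_model.items()}


# Made-up ratings from two reviewers on a single prompt.
ratings = [
    Rating("alice", "model_a", 0, 4),
    Rating("bob", "model_a", 0, 5),
    Rating("alice", "model_b", 0, 3),
    Rating("bob", "model_b", 0, 2),
]

summary = mean_rating_by_model(ratings)
preferred = max(summary, key=summary.get)
print(summary, preferred)
```

A real review would likely also track how much reviewers disagree with each other, since a mean alone can hide a split verdict.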

A few key takeaways

While Microsoft has gone down the “one model to rule them all” path, which is very Redmond, AWS has chosen to build its AI strategy on openness and choice. This is a longer, more challenging road, since customers must be able to evaluate, test and choose suitable models, but it potentially has a better long-term payoff. Today, the company has a broad array of models available for evaluation and testing, and adding the latest batch from Anthropic enhances its lead.

If you look to customer lists to see which solution is best for you, it’s hard to beat the names AWS touts as building gen AI apps with Claude tech, including ADP, Amdocs, Bridgewater Associates, Dana-Farber Cancer Institute, Delta Air Lines, GoDaddy, Intuit, LexisNexis Legal & Professional, Pfizer, PGA TOUR and Siemens.

Models such as GPT-4 are intriguing and very capable. However, business applications might require a mix of datasets. Google LLC recently announced it has Opus in preview, but Amazon Bedrock is the only service to offer all three Claude 3 models – Sonnet, Haiku and Opus – as generally available to customers, putting it in the gen AI pole position, at least for now.