Anthropic Claude 3.5 Sonnet, a large language model (LLM), tops the S&P AI Benchmarks by Kensho – an AI Innovation Hub for S&P Global. Kensho utilised Amazon Bedrock to test the LLM across a suite of business and financial tasks, emphasising limitations of standard LLM evaluations which often fare poorly in domain-specific tasks.
Kensho’s S&P AI Benchmarks aim to fill the gap for real-world finance industry evaluations. The benchmark measures a model’s domain knowledge, quantity extraction, and quantitative reasoning – crucial components of tackling financial domain tasks. Anthropic Claude 3.5 Sonnet ranks first on this benchmark (as of July 2024), testifying to its acumen in the business and finance sector.
The model faces hurdles in various categories including domain knowledge where it must understand business and financial terms; quantity extraction where it needs to identify relevant numerical information from financial reports; and quantitative reasoning where it has to perform complex calculations to generate accurate responses to finance-specific questions. In all the three categories, Anthropic Claude 3.5 Sonnet has shown strong capability and understanding despite the complexity of the tasks involved.
Apart from its high ranking on the S&P AI Benchmarks, Anthropic Claude 3.5 Sonnet performs exceptionally across a diverse range of tasks such as undergraduate-grade knowledge (MMLU), graduate-grade reasoning (GPQA), and code generation (HumanEval). The LLM boasts improvements across visual processing, writing and content generation, and insight development.
Anthropic Claude 3.5 Sonnet is available in Amazon Bedrock, along with other models by AI-industry leaders. Amazon Bedrock offers a host of features to assist in the development of generative AI applications, with strong privacy and security controls built in. The benchmarking of Anthropic Claude 3.5 Sonnet on Amazon Bedrock was done within 24 hours due to the efficient and rapid accessibility provided by the platform.
In conclusion, the S&P AI Benchmark scores released by Kensho affirm the efficiency and reliability of Anthropic Claude 3.5 Sonnet in business and financial tasks. As AI adoption accelerates across industries, the single-API access to AI models on Amazon Bedrock is set to facilitate generative AI applications across companies of all sizes.
This post was co-authored by Qingwei Li, Joe Dunn, Raghvender Arni, Simon Zamarin, and Scott Mullins – from Amazon Web Services and AWS Generative AI GTM team – who have considerable experience in infrastructure architecture, financial services, and AI/ML solutions.