This is a joint post between Salesforce and AWS discussing how the Salesforce Einstein AI Platform team used Amazon SageMaker to improve the efficiency and performance of its code generation large language model (LLM) features, known as CodeGen.
Salesforce, a cloud-based software company, offers customer relationship management (CRM) applications serving numerous business sectors. In recent years, the company has expanded its AI technology, Salesforce Einstein, to more than 60 features designed to improve business productivity and customer engagement. Salesforce continues to invest in AI model development, with current efforts focused on improving the performance and capabilities of LLMs used in Einstein product offerings.
An internal challenge the Salesforce team addressed was how to efficiently host CodeGen, Salesforce's open source LLM for code understanding and generation. After a comprehensive evaluation of potential solutions and services, the team chose Amazon SageMaker as the most efficient and effective option, thanks to its access to GPUs, scalability, flexibility, and performance optimizations.
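To illustrate the hosting pattern, here is a minimal deployment sketch using the SageMaker Python SDK. The IAM role, S3 artifact path, container version, endpoint name, and instance type are hypothetical placeholders, not details from the Salesforce deployment.

```python
# A minimal deployment sketch, assuming a packaged CodeGen artifact in S3.
# The role ARN, bucket, container version, and instance type below are
# hypothetical placeholders, not values from this post.
import sagemaker
from sagemaker import image_uris
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical

# Retrieve a SageMaker Large Model Inference (LMI) container image that
# bundles DJL Serving with the FasterTransformer engine.
image_uri = image_uris.retrieve(
    framework="djl-fastertransformer",
    region=session.boto_region_name,
    version="0.23.0",  # assumption: pick the latest available version
)

model = Model(
    image_uri=image_uri,
    model_data="s3://my-bucket/codegen/model.tar.gz",  # hypothetical artifact
    role=role,
    sagemaker_session=session,
)

# Provision a GPU-backed real-time endpoint.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",   # assumption: a multi-GPU instance type
    endpoint_name="codegen-endpoint", # hypothetical
)
```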
Using Amazon SageMaker, the Salesforce team significantly improved CodeGen's performance. They followed a blueprint of model performance optimization parameters provided by the SageMaker Large Model Inference (LMI) container, used in conjunction with NVIDIA's FasterTransformer library. With these optimizations in place, the system now handles around 400 requests per minute at a much reduced latency of approximately seven seconds per request, an increase in throughput of more than 6,500 percent.
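For context, the LMI container is configured through a serving.properties file, and the sketch below shows the kind of parameters such a blueprint tunes. The model ID, tensor-parallel degree, and data type here are illustrative assumptions, not the values Salesforce used.

```properties
# serving.properties — an illustrative tuning sketch for the LMI container;
# the model ID and parallelism settings are assumptions, not Salesforce's values.
engine=FasterTransformer
option.model_id=Salesforce/codegen-16B-mono
option.tensor_parallel_degree=4
option.dtype=fp16
```

Tensor parallelism and reduced precision are typical levers here: sharding the weights across GPUs lowers per-GPU memory pressure, while fp16 roughly halves memory traffic per token.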
The Salesforce team took away several key lessons from this experience. First, it's important to stay up to date with the latest inference engines and optimization techniques, because they significantly influence model performance. Second, optimization strategies must be tailored to each model, which may require a unique approach. Other takeaways include the need for cost-effective model hosting, such as GPU virtualization that hosts multiple models on a single GPU (sketched below), and keeping pace with innovations such as Amazon SageMaker JumpStart and Amazon Bedrock.
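On the multi-model point, one way to share a single GPU-backed instance across models is a SageMaker multi-model endpoint. The sketch below assumes a serving container that supports multi-model hosting on GPU instances (typically one of SageMaker's Triton images); every name, ARN, and S3 path is hypothetical.

```python
# A minimal sketch of sharing one GPU-backed instance across several models
# with a SageMaker multi-model endpoint. Assumes a serving container that
# supports multi-model hosting; every name, ARN, and S3 path is hypothetical.
import json

import boto3
from sagemaker.multidatamodel import MultiDataModel

mme = MultiDataModel(
    name="einstein-shared-gpu-models",               # hypothetical endpoint name
    model_data_prefix="s3://my-bucket/mme-models/",  # hypothetical prefix of model archives
    image_uri="<an MME-capable GPU serving image>",  # assumption: e.g. a SageMaker Triton image
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # hypothetical
)
mme.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Each request names the artifact to serve, so models are loaded on demand
# and share the same GPU instance.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="einstein-shared-gpu-models",
    ContentType="application/json",
    TargetModel="codegen-350m.tar.gz",  # hypothetical model archive under the prefix
    Body=json.dumps({"inputs": "def add(a, b):"}),
)
print(response["Body"].read().decode())
```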
In conclusion, by using Amazon SageMaker, the Salesforce Einstein AI Platform team dramatically improved the latency and throughput of its code generation LLM. This collaboration showcases the benefits of using innovative, advanced technologies to optimize model serving and improve efficiency.