With the rise of virtual business meetings in the corporate world, especially due to the impact of the COVID-19 pandemic, managing the information flow from multiple meetings has become a significant challenge. According to a survey conducted by American Express in 2023, it’s projected that by 2024, 41% of business meetings will occur in a hybrid or virtual format. The abundance of meetings can negatively impact project timelines and customer trust, with meeting summaries often disrupting focus. However, generative artificial intelligence (AI) and speech-to-text technologies can create meeting summaries automatically at the end of a call.
This article introduces a solution to automatically summarize virtual meetings using Amazon Transcribe and Amazon SageMaker Hugging Face containers. Uploaded meeting recordings are transcribed and then analysed by a large language model (LLM) to generate a summary.
Amazon Transcribe utilises automatic speech recognition (ASR) to transcribe audio data to text and supports speaker diarization, recognizing up to 10 unique speakers in a conversation. The transcribed data is then processed through Hugging Face, an open-source machine learning (ML) platform that provides extensive resources for the development of AI projects. This includes over 200,000 pre-trained models and 30,000 datasets, integrated with Amazon SageMaker for deep learning.
The workflow involves uploaded meeting recordings triggering an AWS Lambda Transcribe function, which transcribes the meeting recording to text and stores the transcripts. This triggers an inference lambda function, which processes the transcript for ML inference, sends it to a SageMaker endpoint hosting the Hugging Face model, and produces a meeting summary which is stored and emailed to subscribers.
The article uses the Mistral 7B Instruct LLM, developed by Mistral AI, which can perform summarization tasks based on user instructions. This model is equipped with over 7 billion parameters to generate contextually appropriate text.
The post also provides prerequisites for using the discussed workflow, a guide to deploying the solution, limitations of the model (which include high accuracy for English language and a context length limit), and steps to delete the deployed resources.
In conclusion, the combination of Amazon services and Hugging Face offer an efficient way of summarising meeting content to allow teams to keep track of details and progress without disrupting their attention during the sessions. For more specific contexts such as contact centre environments, this technique could be incorporated into Live Call and Post Call Analytics suites of solutions for further value.
This post is a collaborative effort by Gabriel Rodriguez Garcia, Jahed Zaïdi, Mateusz Zaremba, and Kemeng Zhang, all part of AWS Professional Services. Their roles involve assisting customers in achieving their business goals with ML use cases and implementing cloud computing initiatives.