This article focuses on moderating content on online platforms, such as games and social communities, to prevent hate speech, cyberbullying, harassment, and scams, and presents a solution built from various services provided by Amazon Web Services (AWS).
Social platforms need a moderation solution that is easy to set up, customizable, and mindful of latency and cost. Today, many companies rely on human moderators to review toxic content, a process that is time-consuming and hard to scale. The article therefore proposes automating moderation with AWS APIs and machine learning models.
One proposed approach moderates audio and text chat using Amazon Transcribe, Amazon Comprehend, Amazon Bedrock, and Amazon OpenSearch Service. These services classify toxic content using pre-trained models and large language models (LLMs), balancing simplicity, latency, and cost while remaining flexible enough to accommodate varied requirements. The process can be initiated when users report violations such as profanity, hate speech, or harassment.
The proposed workflow receives the reported audio file, stores it in an Amazon S3 bucket, and then invokes the Amazon Transcribe StartTranscriptionJob API with Toxicity Detection enabled. If the toxicity analysis returns a score exceeding a chosen threshold, for example 50%, Amazon Bedrock then evaluates the message against customized policies using LLMs.
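A minimal sketch of this first step using boto3 might look like the following; the bucket, key, and job names are hypothetical placeholders, and toxicity detection in Amazon Transcribe currently requires US English audio.

```python
import boto3

transcribe = boto3.client("transcribe")

# Hypothetical names for illustration only.
JOB_NAME = "reported-chat-clip-001"
MEDIA_URI = "s3://example-moderation-bucket/reports/chat-clip-001.wav"

# Start an asynchronous transcription job with toxicity detection enabled.
transcribe.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="wav",
    LanguageCode="en-US",  # toxicity detection supports US English
    ToxicityDetection=[{"ToxicityCategories": ["ALL"]}],
)
```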
The human moderator then receives an audio moderation report highlighting the conversation segments considered toxic and in violation of policy, allowing them to make an informed decision. The system can also be run proactively, recording all conversations and applying the analysis automatically.
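Once the transcription job completes, its output JSON includes per-segment toxicity scores that can be filtered against the threshold to build the report. A sketch, assuming the documented output shape and the example 50% threshold from above:

```python
import json
import urllib.request

import boto3

transcribe = boto3.client("transcribe")
THRESHOLD = 0.5  # example threshold from the article

# Fetch the completed job's transcript via the pre-signed URI Transcribe returns.
job = transcribe.get_transcription_job(TranscriptionJobName="reported-chat-clip-001")
uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
with urllib.request.urlopen(uri) as resp:
    transcript = json.load(resp)

# Collect segments whose overall toxicity score crosses the threshold;
# these become the highlighted entries in the moderator's report.
flagged = [
    {
        "text": seg["text"],
        "score": seg["toxicity"],
        "start": seg["start_time"],
        "end": seg["end_time"],
    }
    for seg in transcript["results"].get("toxicity_detection", [])
    if seg["toxicity"] >= THRESHOLD
]
```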
Toxicity Detection in Amazon Transcribe uses machine learning to identify and classify toxic content across seven categories, including sexual harassment, hate speech, threats, abuse, and graphic language. A key benefit of LLMs is their flexibility: prompts can be modified in plain human language, improving efficiency and reducing the time needed compared to training custom models.
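For example, a policy check can be expressed as a prompt and sent to a model through the Bedrock Converse API. The model choice, policy excerpt, and prompt wording below are illustrative assumptions, not a prescribed configuration; adapting the system to a new policy is just a matter of editing this text.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical policy excerpt and flagged message for illustration.
policy = "Players must not threaten, harass, or demean other players."
message = "<flagged chat segment goes here>"

prompt = (
    "You are a content moderation assistant. Given the policy below, "
    "answer VIOLATION or NO_VIOLATION and give a one-sentence reason.\n\n"
    f"Policy: {policy}\n\nMessage: {message}"
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model choice
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
verdict = response["output"]["message"]["content"][0]["text"]
```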
Amazon Bedrock Knowledge Bases, another component of the system, provides a managed Retrieval Augmented Generation (RAG) capability. Policies can be managed flexibly in this system: for each input message, only the relevant policy segments are retrieved and passed to the LLM for analysis.
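Assuming the policy documents have been indexed in a knowledge base, a single RetrieveAndGenerate call can fetch the relevant policy segments and have the model evaluate the message against them; the knowledge base ID and model ARN here are placeholders.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "Does this message violate our policy? Message: <flagged text>"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",  # placeholder ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
answer = response["output"]["text"]
```

Retrieving only the relevant policy segments keeps prompts short, which helps with both latency and per-token cost.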
The text chat moderation workflow is similar but uses Amazon Comprehend toxicity analysis, which is tailored for text. If the analysis returns a toxicity score exceeding the threshold, Amazon Bedrock evaluates the message against the customized policies using the LLM.
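A sketch of the text path, again assuming the example 50% threshold; Comprehend's DetectToxicContent API returns an overall toxicity score plus per-category label scores for each submitted segment.

```python
import boto3

comprehend = boto3.client("comprehend")
THRESHOLD = 0.5  # example threshold

# Score one or more chat messages in a single call.
result = comprehend.detect_toxic_content(
    TextSegments=[{"Text": "<chat message goes here>"}],
    LanguageCode="en",
)

# Each entry carries an overall Toxicity score plus per-category label scores.
for entry in result["ResultList"]:
    if entry["Toxicity"] >= THRESHOLD:
        # Hand the message off to the Bedrock policy check shown earlier.
        print("Flagged for LLM policy evaluation:", entry["Labels"])
```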
In conclusion, AWS provides a range of services for moderating content on social platforms, making it easier to detect toxic messages and prevent policy violations. These solutions use pre-trained models for toxicity analysis and can be further refined with generative LLMs, striking a balance between accuracy, cost, and latency.