Amazon’s EU Design and Construction (Amazon D&C) team designs and constructs Amazon warehouses. The team must sift through a large volume of documents and information to ensure that warehouse designs meet high standards. As part of a pilot, Amazon D&C implemented an AI-powered solution built on Amazon SageMaker to help identify accurate information in large volumes of documents.
This solution used a Retrieval Augmented Generation (RAG) pipeline with a fine-tuned large language model (LLM). Amazon engineers tested the solution, and their feedback was collected and analyzed to identify inaccuracies and hallucinations in the RAG responses. This feedback was then used to train the model through reinforcement learning, and another LLM was used to generate feedback scores, which increased the number of training samples.
User feedback was collected through a user interface in which engineers could select one of five satisfaction levels and either provide a better answer or comment on how satisfactory the LLM response was. The analysis showed that 45% of the feedback was negative (53 out of 118), and that these errors could be addressed through LLM fine-tuning and reinforcement learning.
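The headline figure above is simple arithmetic and can be reproduced in a few lines. The sketch below is illustrative only: the function name `negative_share` is hypothetical, and the only inputs taken from the text are the two counts (53 negative items out of 118).

```python
def negative_share(negative: int, total: int) -> float:
    """Return the percentage of feedback items that were negative."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= negative <= total:
        raise ValueError("negative must be between 0 and total")
    return 100.0 * negative / total


# Counts reported in the text: 53 negative items out of 118.
share = negative_share(53, 118)
print(f"{share:.1f}% of feedback was negative")  # prints "44.9% of feedback was negative"
```

The exact value is about 44.9%, which the text rounds to 45%.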
In addition to human feedback, AI feedback was also used for reinforcement learning. Another LLM provided evaluation scores, which made the training process less subjective, less dependent on a small group of subject matter experts (SMEs), and much more scalable. The adoption of AI feedback also reduced the validation workload for SMEs by about 80%.
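One way to picture the AI-feedback step is as a loop that asks a judge model to score each question and answer pair and stores the result as a training sample. The sketch below is a rough illustration under stated assumptions, not the team's implementation: the `Judge` type, `score_responses`, and `stub_judge` are hypothetical names, the judge is a plain Python callable standing in for a real LLM call, and the 1 to 5 score range is assumed to mirror the five satisfaction levels used for human feedback.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a judge takes (question, answer) and returns a score.
Judge = Callable[[str, str], int]


def score_responses(pairs: List[Tuple[str, str]], judge: Judge,
                    min_score: int = 1, max_score: int = 5) -> List[Dict]:
    """Score each (question, answer) pair and clamp the score to the range."""
    records = []
    for question, answer in pairs:
        raw = judge(question, answer)
        # Clamp so a misbehaving judge cannot produce out-of-range rewards.
        score = max(min_score, min(max_score, int(raw)))
        records.append({"question": question, "answer": answer, "score": score})
    return records


def stub_judge(question: str, answer: str) -> int:
    """Placeholder for a real LLM call: only checks that the answer is non-empty."""
    return 4 if answer.strip() else 1


samples = score_responses(
    [("What is the required clearance?", "See section 4."), ("Which standard applies?", "  ")],
    stub_judge,
)
```

In practice the stub would be replaced by a call to the evaluating LLM, and the resulting records would serve as additional reward samples alongside the human feedback.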
The fine-tuned model was then further improved through reinforcement learning using the Proximal Policy Optimization (PPO) algorithm. As a result, the share of positive scores (above 3) rose from 78.1% to 85.5%, and the share of negative scores (below 3) fell from 21.9% to 14.5%.
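For readers unfamiliar with PPO, its core idea is a clipped surrogate objective that limits how far each update can move the policy away from the one that generated the samples. The sketch below shows that standard objective in plain Python. It is a teaching example, not the training code used in the pilot: the function name `ppo_clipped_objective` is hypothetical, and the clip range of 0.2 is a commonly used default assumed here, since the text does not give hyperparameters.

```python
import math
from typing import Sequence


def ppo_clipped_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                          advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    if not new_logp or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for n, o, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(n - o)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(new_logp)
```

With a positive advantage, the clipping caps the gain from raising an action's probability too far; with a negative advantage, the unclipped term is kept so that large harmful shifts are still fully penalized.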
Through this process, Amazon learnt several valuable lessons, including the importance of human validation and augmentation and the utility of AI in automating the evaluation and learning cycle. Reinforcement learning also allows bot quality to improve regularly as new data becomes available. Amazon plans to further automate this continuous learning process while keeping a human in the loop.