During the AWS re:Invent 2023, Amazon announced the general availability of Knowledge Bases for Amazon Bedrock. This allows secure connection of foundation models in Amazon Bedrock to your company data using a fully managed Retrieval Augmented Generation (RAG) model.
This feature helps in improving the accuracy of the generated responses from foundation models depending upon the context provided to the model. Contexts are extracted from vector stores based on user queries. The newly released feature, hybrid search, facilitates a combination of semantic search with keyword search yielding more accurate search results.
In addition to being able to retrieve semantically relevant chunks, you can also fetch a well-defined subset of those relevant chunks based on applied metadata filters and values. This feature enables you to utilize a custom metadata file for each document in the knowledge base – aiding in filtering your retrievals. This added control over retrieved documents significantly contributes to enhancing search accuracy, consequently leading to better performance advantages like reduction in CPU cycles and cost of querying the vector store.
For instance, you can use legal documents with similar terms for different contexts, or movies that have a similar plot released in different years. You could further narrow down the results based on various filtering options like document chatbot for a software company, conversational search of an organization’s application, or intelligent search for software developers.
To use metadata filtering, users need to provide metadata files with the same name as the source data file, with a .metadata.json suffix. It is available in the AWS regions US East (N. Virginia), and US West (Oregon). The metadata can include string, number, or Boolean values.
Moreover, this post provides detailed instructions on how to prepare a dataset for use as a knowledge base and then query it with metadata filtering. Users can query using either the AWS Management Console or SDK.
The metadata filtering capability in Knowledge Bases for Amazon Bedrock helps to increase the context accuracy and cut the cost of querying the vector database. To use it, custom metadata must be added to documents and utilized as filters when retrieving and querying the documents. Some extensive resources for further information include the user guide for Knowledge bases for Amazon Bedrock, a YouTube video on the topic, and some GitHub repo code samples.