Image captioning and text-to-image search have long fascinated machine-learning practitioners and businesses. The open-source models Contrastive Language-Image Pre-training (CLIP) and Bootstrapping Language-Image Pre-training (BLIP) were the first to produce near-human results. More recently, generative multimodal models have been used to map text and images into the same embedding space to achieve the best results. Amazon has now combined Amazon Personalize with Amazon OpenSearch Service and Amazon Titan Multimodal Embeddings to enhance users’ image search experience based on learned user preferences.
Multimodal models are widely used for text-to-image search across various industries. What these models lack is the ability to incorporate individual user preferences into the search results. Ideally, a user’s preferences can be learned from their previous interactions with images, such as images they viewed, favorited, or downloaded. These preferences can then be used to return contextually relevant images in line with the user’s recent interactions and style preferences. This requires six steps: creating embeddings for the images, storing these embeddings in a datastore, clustering the embeddings, updating the image interactions dataset with each image’s cluster, creating an Amazon Personalize personalized ranking solution, and finally, serving user search requests.
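The clustering and dataset-update steps can be illustrated with a minimal sketch. It makes several assumptions that the text does not confirm: k-means is used as the clustering method, toy 2-D vectors stand in for the real Titan embeddings, the `IMAGE_ID` column and the helper names are hypothetical, and the cluster ID is written into the `ITEM_ID` field that Amazon Personalize interactions datasets expect.

```python
import random
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vector, centroids[i]))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means (an assumed choice); returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        labels = [nearest_centroid(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Final assignment against the last set of centroids.
    return [nearest_centroid(v, centroids) for v in vectors]


def add_cluster_ids(interactions: List[Dict], image_to_cluster: Dict[str, int]) -> List[Dict]:
    """Copy each interaction row and set ITEM_ID to the image's cluster.

    Rows whose image has no cluster are dropped. IMAGE_ID is a
    hypothetical column name used only for this illustration.
    """
    return [
        {**row, "ITEM_ID": str(image_to_cluster[row["IMAGE_ID"]])}
        for row in interactions
        if row["IMAGE_ID"] in image_to_cluster
    ]


# Toy 2-D vectors standing in for real image embeddings.
embeddings = {
    "img-1": [0.0, 0.0],
    "img-2": [0.1, 0.2],
    "img-3": [10.0, 10.0],
    "img-4": [10.1, 9.9],
}
image_ids = list(embeddings)
labels = kmeans([embeddings[i] for i in image_ids], k=2)
image_to_cluster = dict(zip(image_ids, labels))

interactions = [
    {"USER_ID": "u1", "IMAGE_ID": "img-1", "TIMESTAMP": 1700000000},
    {"USER_ID": "u1", "IMAGE_ID": "img-4", "TIMESTAMP": 1700000100},
    {"USER_ID": "u2", "IMAGE_ID": "img-9", "TIMESTAMP": 1700000200},  # unknown image
]
updated = add_cluster_ids(interactions, image_to_cluster)
print(updated)
```

Using clusters rather than individual images as ranking items keeps the item catalog small, so that a preference learned from one image can carry over to visually similar images the user has not yet interacted with.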
Implementing the solution requires an AWS account and familiarity with Amazon Personalize, Amazon SageMaker, OpenSearch Service, and Amazon Bedrock, among others. The workflow serves a user’s search request and returns personalized, ranked results based on that user’s previous interactions with images. Because Amazon Personalize is combined with OpenSearch Service, those interactions influence the order of the search results. As a user’s style preferences change, the search results update accordingly, so they stay relevant and aligned with the user’s current style preferences.
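One way the serving step might look is sketched below, under stated assumptions: the vector field name `image_vector`, the `cluster_id` key on each hit, and the helper names are hypothetical, and the Personalize items are assumed to be image clusters. The Personalize runtime client is passed in as a parameter; in practice it would be created with `boto3.client("personalize-runtime")`, whose `get_personalized_ranking` call is used here.

```python
from typing import Any, Dict, List


def build_knn_query(query_vector: List[float], k: int = 10,
                    field: str = "image_vector") -> Dict[str, Any]:
    """Request body for an OpenSearch k-NN search (field name is assumed)."""
    return {"size": k, "query": {"knn": {field: {"vector": query_vector, "k": k}}}}


def personalized_cluster_order(personalize_runtime: Any, campaign_arn: str,
                               user_id: str, cluster_ids: List[str]) -> List[str]:
    """Ask Amazon Personalize to rank the candidate clusters for one user."""
    response = personalize_runtime.get_personalized_ranking(
        campaignArn=campaign_arn, userId=user_id, inputList=cluster_ids
    )
    return [item["itemId"] for item in response["personalizedRanking"]]


def rerank_hits(hits: List[Dict[str, Any]], ranked_cluster_ids: List[str]) -> List[Dict[str, Any]]:
    """Reorder search hits by the personalized cluster ranking.

    sorted() is stable, so hits within the same cluster keep their
    original similarity order. Hits from unranked clusters go last.
    """
    position = {cid: i for i, cid in enumerate(ranked_cluster_ids)}
    fallback = len(position)
    return sorted(hits, key=lambda hit: position.get(hit["cluster_id"], fallback))


def search_personalized(personalize_runtime: Any, campaign_arn: str, user_id: str,
                        hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Re-rank k-NN hits (already fetched from OpenSearch) for one user."""
    # De-duplicate cluster IDs while preserving first-seen order.
    cluster_ids = list(dict.fromkeys(hit["cluster_id"] for hit in hits))
    ranked = personalized_cluster_order(personalize_runtime, campaign_arn, user_id, cluster_ids)
    return rerank_hits(hits, ranked)
```

Passing the client in rather than creating it inside the function keeps the re-ranking logic testable with a stub, and the k-NN search itself can be issued separately with the body returned by `build_knn_query`.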
To avoid future charges, the resources created should be deleted. By combining Amazon Titan Multimodal Embeddings, OpenSearch Service vector indexing, and Amazon Personalize ML recommendations, the solution enhances the user experience: it learns from users’ previous interactions and preferences and surfaces more relevant items in their search results.