As pre-trained language models have been widely adopted in recent years, neural retrieval models have become increasingly common. Dense Retrieval (DR) is one such approach, known for its strong ranking performance on several benchmarks. Multi-Vector Dense Retrieval (MVDR), in particular, represents each document or query with multiple vectors rather than a single one.
Generative Retrieval (GR) is a newer paradigm in information retrieval that, instead of following the traditional index-then-retrieve pipeline, directly generates the identifiers of relevant documents for a given query. It uses a single sequence-to-sequence model to handle indexing, retrieval, and ranking, relying on an encoder-decoder architecture to map queries directly to relevant document identifiers.
However, little was known about how GR relates to other retrieval techniques, particularly dense retrieval models. To address this, a research team from Shandong University, China, and the University of Amsterdam systematically connected MVDR methods and generative retrieval.
The research shows that MVDR and GR are alike in both their semantic matching and their training targets. By examining the attention layer and the prediction head of the GR model, the team showed that the GR loss function can be reformulated to fit the unified MVDR framework. They also clarified how GR and MVDR differ in document encoding and alignment.
Notably, the team found that MVDR and GR use the same framework to assess how relevant a document is to a given query: both compute relevance as a sum of products of query and document vectors and an alignment matrix.
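The shared scoring scheme described above can be illustrated with a minimal NumPy sketch. It is not the team's implementation: the function names, the random toy vectors, and the use of dot products between query and document token vectors are assumptions made for illustration. The hard "max" alignment follows the well-known late-interaction (MaxSim) style of multi-vector scoring, while the softmax alignment is only a stand-in for a dense, attention-like alignment and should not be read as the exact GR formulation.

```python
import numpy as np

def relevance(Q, D, A):
    """Aggregate token-level similarities weighted by an alignment matrix.

    Q: (n_query_tokens, dim) query token vectors
    D: (n_doc_tokens, dim) document token vectors
    A: (n_query_tokens, n_doc_tokens) alignment weights
    """
    sim = Q @ D.T                      # pairwise dot products
    return float(np.sum(A * sim))      # sum of elementwise products

def maxsim_alignment(Q, D):
    """Hard alignment: each query token aligns to its best document token."""
    sim = Q @ D.T
    A = np.zeros_like(sim)
    A[np.arange(sim.shape[0]), sim.argmax(axis=1)] = 1.0
    return A

def softmax_alignment(Q, D, temperature=1.0):
    """Hypothetical dense alignment: softmax over document tokens per query token."""
    sim = (Q @ D.T) / temperature
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(sim)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy data (assumed): 3 query tokens, 5 document tokens, 8 dimensions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
D = rng.normal(size=(5, 8))

score_sparse = relevance(Q, D, maxsim_alignment(Q, D))
score_dense = relevance(Q, D, softmax_alignment(Q, D))
```

In this sketch only the alignment matrix changes between the two scores; the aggregation step is identical, which is the sense in which the two paradigms can be described by one framework.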
Looking more closely at how each model assesses relevance, the researchers found that GR uses its own distinct strategies to compute the alignment matrix and the document token vectors. Their experiments confirmed these conclusions and showed that both models exhibit similar term matching in their alignment matrices.
In summary, the team offered new insights into Generative Retrieval (GR) from a Multi-Vector Dense Retrieval (MVDR) perspective and proposed a common paradigm for determining query-document relevance. They examined GR methods specifically, clarifying how they apply this paradigm. Their in-depth analytical experiments highlight the term-matching phenomenon and the properties of different alignment directions in both GR and MVDR, adding to the empirical understanding of these methods. The team's findings have been made publicly available for further study and application in the field.