The growing demand for financial data analysis and management has driven the expansion of question-answering (QA) systems powered by artificial intelligence (AI). These systems improve customer service, aid risk management, and deliver personalized stock recommendations, all of which require a comprehensive understanding of financial data. That data is difficult to analyze: it is complex, laden with domain-specific terminology, subject to market volatility, and tied to high-stakes decision-making. Against this backdrop, long-form question answering (LFQA) gains importance because of the intricate tasks it entails, such as document retrieval, summarization, analysis, comprehension, and reasoning.
Several LFQA datasets, such as ELI5, WikiHowQA, and WebCPM, are publicly available, but none caters specifically to the financial sector. This is a considerable gap, given that complex, open-domain financial questions typically demand paragraph-length answers grounded in retrieved documents. Existing financial QA benchmarks, which focus largely on numerical calculation and sentiment analysis, struggle to cover the diversity and complexity of such questions.
To address these challenges, researchers from HSBC Lab, Hong Kong University of Science and Technology (Guangzhou), and Harvard University have developed FinTextQA, a novel dataset for evaluating QA models on general finance, policy, and regulation. The dataset comprises long-form QA pairs drawn from finance textbooks and government agencies' websites. FinTextQA contains 1,262 QA pairs with document contexts, selected through five rounds of human screening; it spans six question categories, and its texts average 19.7k words in length.
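To make that structure concrete, the sketch below shows how one might load and inspect a FinTextQA-style dataset. It is a minimal sketch under stated assumptions: the file name and the field names (question, answer, context, category) are illustrative guesses, not the dataset's published schema.

```python
import json
from collections import Counter

# Hypothetical loader for a FinTextQA-style dataset. The file name and the
# field names ("question", "answer", "context", "category") are assumptions
# for illustration; the released dataset may use a different schema.
def load_qa_pairs(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    pairs = load_qa_pairs("fintextqa.json")
    print(f"{len(pairs)} QA pairs")  # the full dataset has 1,262

    # Distribution over the six question categories.
    print(Counter(p["category"] for p in pairs))

    # Average context length in words (the paper reports ~19.7k).
    avg_len = sum(len(p["context"].split()) for p in pairs) / len(pairs)
    print(f"average context length: {avg_len:.0f} words")
```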
While the dataset is a significant advance for the field, the researchers acknowledge its limitations. It contains fewer QA pairs than larger, AI-generated datasets, so models trained on FinTextQA may not generalize to broader real-world scenarios. Furthermore, acquiring high-quality data remains challenging, and copyright restrictions often impede sharing it. Future work should therefore concentrate on data augmentation and other strategies for overcoming data scarcity. Exploring more advanced retrieval-augmented generation (RAG) configurations and retrieval methods, along with expanding the dataset to a wider range of sources, may also prove beneficial.
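As a minimal illustration of the retrieve-then-generate pattern that such RAG benchmarks evaluate, the sketch below pairs TF-IDF retrieval with a placeholder generator. This is not the paper's pipeline: scikit-learn's TfidfVectorizer stands in for whatever retriever a production system would use, and generate_answer is a hypothetical stub where a long-form generator model would be called.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A minimal retrieve-then-generate sketch, not the authors' pipeline.
# TF-IDF stands in for a real retriever; generate_answer is a stub.
documents = [
    "Monetary policy is set by the central bank to manage inflation.",
    "Capital requirements oblige banks to hold a minimum level of equity.",
    "A bond's yield moves inversely to its price.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def generate_answer(question: str, passages: list[str]) -> str:
    # Placeholder: a real system would prompt an LLM with the passages
    # and return a paragraph-length, long-form answer.
    return f"[answer to {question!r} conditioned on {len(passages)} passages]"

question = "Why do banks need capital requirements?"
print(generate_answer(question, retrieve(question)))
```

Swapping the retriever (dense embeddings, hybrid search) or the generator at this seam is exactly the kind of configuration variation a benchmark like FinTextQA is designed to compare.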
Despite these hurdles, FinTextQA represents critical progress toward better comprehension of financial concepts: it introduces the first long-form financial QA dataset and reports extensive benchmark trials on it. Beyond showcasing the efficacy of different model configurations, the team's experiments underscore the need to refine current strategies so that financial QA systems become more accurate and comprehensible.