Data exploration uncovers patterns in datasets and reveals potential relationships among variables. Through a sequence of steps such as filtering, sorting, and grouping, an analyst can extract key insights from the data. However, the process is typically interactive and manual, which makes it time-consuming and dependent on domain expertise. To tackle this issue, researchers at Microsoft have released InsightPilot, a system that uses large language models (LLMs) to automate data exploration.
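To make the manual steps concrete, here is a minimal sketch of one filter, group, and sort pass in plain Python. The rows, column names, and the `min_units` threshold are all made up for illustration and do not come from InsightPilot or its evaluation.

```python
from collections import defaultdict

# Made-up rows for illustration only; not taken from any real dataset.
rows = [
    {"region": "North", "product": "A", "units": 120},
    {"region": "North", "product": "B", "units": 80},
    {"region": "South", "product": "A", "units": 200},
    {"region": "South", "product": "B", "units": 20},
    {"region": "East", "product": "A", "units": 5},
]


def units_by_region(rows, min_units=10):
    """Filter out small rows, total units per region, sort by total (descending)."""
    # Filter: keep rows at or above the (hypothetical) threshold.
    kept = [r for r in rows if r["units"] >= min_units]
    # Group: sum units per region.
    totals = defaultdict(int)
    for r in kept:
        totals[r["region"]] += r["units"]
    # Sort: largest total first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


summary = units_by_region(rows)
print(summary)
```

Each question an analyst asks usually means another pass like this one with different filters or grouping keys, which is where the time and domain expertise go.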
InsightPilot consists of three components: an interface that lets users ask questions in natural language and displays the analysis results, an LLM that steers the exploration by selecting the appropriate analysis based on context, and an insight engine that performs the analysis and presents the results in natural language. After a user poses a query in the interface, the insight engine generates preliminary insights. Depending on the context, the LLM identifies the most relevant insights and keeps querying the engine for more details. At the end of the exploration, the engine compiles the top-K insights into a coherent report, which is then displayed to the user via the interface.
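The loop described above can be sketched roughly as follows. This is not InsightPilot's code: `Insight`, `ToyInsightEngine`, `explore`, the numeric `score`, and the `max_steps` limit are hypothetical names and assumptions introduced for illustration, and a simple "pick the highest score" function stands in for the LLM's choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Insight:
    description: str
    score: float  # hypothetical relevance score; the article does not define one


class ToyInsightEngine:
    """Hypothetical stand-in for the insight engine, backed by a lookup table."""

    def __init__(self, table: Dict[str, List[Insight]]):
        self._table = table

    def initial_insights(self, question: str) -> List[Insight]:
        # Preliminary insights produced right after the user's query.
        return list(self._table.get(question, []))

    def drill_down(self, insight: Insight) -> List[Insight]:
        # More detailed insights related to one chosen insight.
        return list(self._table.get(insight.description, []))


def explore(
    question: str,
    engine: ToyInsightEngine,
    pick: Callable[[str, List[Insight]], Insight],
    max_steps: int = 3,
    top_k: int = 2,
) -> List[str]:
    """Alternate between choosing an insight and querying the engine for details."""
    collected = engine.initial_insights(question)
    candidates = list(collected)
    for _ in range(max_steps):
        if not candidates:
            break
        chosen = pick(question, candidates)      # the LLM's role in the real system
        candidates = engine.drill_down(chosen)   # ask the engine for more detail
        collected.extend(candidates)
    # Final report: the top-K insights by score.
    ranked = sorted(collected, key=lambda i: i.score, reverse=True)
    return [i.description for i in ranked[:top_k]]


def pick_highest_score(question: str, candidates: List[Insight]) -> Insight:
    # Placeholder for the LLM: simply take the highest-scoring candidate.
    return max(candidates, key=lambda i: i.score)


# Made-up insights for illustration only.
table = {
    "How are sales trending?": [Insight("overall trend", 0.5), Insight("outlier month", 0.3)],
    "overall trend": [Insight("top contributor", 0.9)],
    "top contributor": [Insight("comparison with peer", 0.7)],
}
engine = ToyInsightEngine(table)
report = explore("How are sales trending?", engine, pick_highest_score)
print(report)
```

In the real system the selection step depends on the conversation context and the report is written in natural language; the sketch only shows the control flow of selecting, drilling down, and ranking.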
To evaluate InsightPilot, the researchers conducted user studies that simulated real-world use cases. The results showed that InsightPilot consistently outperformed both OpenAI Code Interpreter and LangChain Pandas Agent. The researchers also ran a case study on a car sales dataset. When asked about the overall trend of Toyota’s car sales, the system not only identified ‘Camry’ as the key driver of Toyota’s sales but also compared Toyota’s sales with those of Honda and surfaced other notable insights.
InsightPilot is a promising step toward automating data exploration and could save analysts considerable time and effort. Although it outperformed the other systems it was compared against, it often produces vague answers that require manual evaluation. Further research is therefore needed before the method can be deployed in real-world scenarios, where it could improve efficiency and support data-driven decision-making. We can’t wait to see what this system can do and what insights it will uncover!