QUASAR

Description

We present the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture: a pipeline of evidence retrieval followed by answer generation, the latter powered by a moderate-sized language model. Additionally and uniquely, QUASAR has components for question understanding, which derive crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence, so that only the most informative pieces are fed into answer generation. Experiments on three different benchmarks show that the answering quality of our approach is on par with or better than that of large GPT models, while computational cost and energy consumption are orders of magnitude lower.

The QUASAR system is a pipeline of four major stages, as illustrated in the Figure above. First, the input question is analyzed and decomposed to compute a structured intent (SI) representation, which is passed on to the subsequent stages along with the original question. Second, the SI is used to retrieve pieces of evidence from the different sources: text, knowledge graphs (KG), and tables. Third, this pool of potentially useful evidence is filtered down by iterative re-ranking to a tractably small set of the most promising pieces. Fourth, the answer is generated from this evidence and returned together with evidence snippets that serve as a user-comprehensible explanation.
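The four stages described above can be sketched as a small toy pipeline. This is only an illustrative sketch, not the QUASAR implementation: all names (StructuredIntent, Evidence, understand_question, retrieve, rerank_and_filter, generate_answer, answer_question) are hypothetical, keyword overlap stands in for the actual retrievers and re-rankers, and the generation step simply returns the top-ranked evidence where the real system would call a language model.

```python
import re
from dataclasses import dataclass, replace

# Toy stopword list (an assumption for this sketch only).
STOPWORDS = {"who", "what", "when", "where", "which", "the", "a", "an",
             "of", "is", "was", "did", "in"}


@dataclass(frozen=True)
class StructuredIntent:
    """Hypothetical, heavily simplified stand-in for the SI."""
    question: str
    keywords: tuple


@dataclass(frozen=True)
class Evidence:
    text: str
    source: str        # "text", "kg", or "table"
    score: float = 0.0


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def understand_question(question):
    # Stage 1: derive a crisper representation of the question.
    keywords = tuple(t for t in tokenize(question) if t not in STOPWORDS)
    return StructuredIntent(question, keywords)


def retrieve(si, corpus):
    # Stage 2: collect evidence from all sources that matches the SI.
    pool = []
    for ev in corpus:
        tokens = set(tokenize(ev.text))
        hits = sum(1 for k in si.keywords if k in tokens)
        if hits:
            pool.append(replace(ev, score=hits / len(si.keywords)))
    return pool


def rerank_and_filter(pool, top_k=2):
    # Stage 3: shrink the pool round by round until top_k pieces remain.
    # Scores are fixed in this toy, so the rounds only illustrate the
    # iterative narrowing of the candidate set.
    candidates = sorted(pool, key=lambda ev: ev.score, reverse=True)
    while len(candidates) > top_k:
        candidates = candidates[:max(top_k, len(candidates) // 2)]
    return candidates


def generate_answer(si, evidence):
    # Stage 4: placeholder for the language model; returns the best snippet.
    return evidence[0].text if evidence else "no answer found"


def answer_question(question, corpus, top_k=2):
    si = understand_question(question)
    evidence = rerank_and_filter(retrieve(si, corpus), top_k)
    # The answer is returned together with its supporting evidence.
    return generate_answer(si, evidence), evidence
```

Calling answer_question("Who directed Inception?", corpus) on a small list of Evidence items returns the answer string together with the retained evidence, so the supporting snippets can be shown to the user alongside the answer.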

Further details are available in our IEEE Data Engineering Bulletin paper.

Code

GitHub link to QUASAR code
Directly download QUASAR code

Contact

For feedback and clarifications, please contact:

Related Papers

RAG-based Question Answering over Heterogeneous Data and Text,
IEEE Data Engineering Bulletin '24 (Special Issue on Retrieval-Augmented Generation).
[Preprint]

CompMix: A Benchmark for Heterogeneous Question Answering,
Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum.
WWW '24 (Resource Track).
[Preprint] [Slides] [Video] [Website]

CompMix Benchmark

Please check out our website on CompMix for details.
Download Training Set (4966 questions)
Download Dev Set (1680 questions)
Download Test Set (2764 questions)

The CompMix benchmark is licensed under a Creative Commons Attribution 4.0 International License.

The CompMix dataset is also available at Hugging Face:
https://huggingface.co/datasets/pchristm/CompMix