CompMix

Download Training Set (4,966 questions)
Download Dev Set (1,680 questions)
Download Test Set (2,764 questions)

The CompMix benchmark is licensed under a Creative Commons Attribution 4.0 International License.

The CompMix dataset is also available on Hugging Face:
https://huggingface.co/datasets/pchristm/CompMix

Description

CompMix collates the completed versions of the conversational questions in ConvMix, which were written directly by crowdworkers on Amazon Mechanical Turk (AMT). Questions in CompMix exhibit complex phenomena such as multiple entities, relations, temporal conditions, comparisons, and aggregations. The benchmark is aimed at evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes). The dataset has 9,410 questions, split into train (4,966 questions), dev (1,680), and test (2,764) sets. All answers in CompMix are grounded to the KB, except for dates, which are normalized, and other literals such as names.
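Because answers are KB entities, except for normalized dates and other literals, one natural way to check a predicted answer is to compare KB identifiers exactly and literals after light normalization. The sketch below is a hypothetical illustration, not the official CompMix evaluation code: it assumes Wikidata item IDs of the form `Q<digits>` as entity identifiers, and the function names `normalize_literal` and `is_correct` as well as the normalization rules are assumptions.

```python
import re
import unicodedata

# Assumption: KB-grounded answers are Wikidata item IDs such as "Q42".
QID_PATTERN = re.compile(r"^Q\d+$")


def normalize_literal(text: str) -> str:
    """Hypothetical normalization for literal answers (names, dates)."""
    # Unicode-normalize, case-fold, and collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


def is_correct(predicted: str, gold_answers: list[str]) -> bool:
    """Return True if the prediction matches any gold answer.

    KB answers (QIDs) must match exactly; literals are compared
    after normalize_literal().
    """
    pred = predicted.strip()
    for gold in gold_answers:
        gold = gold.strip()
        if QID_PATTERN.match(gold):
            if pred == gold:
                return True
        elif normalize_literal(pred) == normalize_literal(gold):
            return True
    return False
```

For example, `is_correct("  Harper  Lee ", ["harper lee"])` holds because both sides normalize to the same literal, whereas a KB answer such as `"Q1"` only matches the identical ID. Comparing entities by ID rather than by label avoids spurious mismatches between surface forms of the same entity.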

Further details will be provided in a dedicated write-up soon.

How was CompMix created?

CompMix collates the completed versions of the conversational questions in ConvMix; these completed questions were provided directly by the crowdworkers.

The ConvMix benchmark, on which CompMix is based, was created by real humans, and we tried to keep the collected data as natural as possible. Master crowdworkers on Amazon Mechanical Turk (AMT) selected an entity of interest in a specific domain and then issued conversational questions about this entity, potentially drifting to other topics of interest over the course of the conversation. By letting users choose the entities themselves, we aimed to ensure that they were genuinely interested in the topics of their conversations. After writing a question, users were asked to find the answer in either Wikidata, Wikipedia text, a Wikipedia table, or a Wikipedia infobox, whichever they found most natural for the question at hand. Since Wikidata requires some basic understanding of knowledge bases, we provided video guidelines that illustrated, using an example conversation, how to find answers in Wikidata.

For each conversational question, which might be incomplete, the crowdworker also provided a completed question that is intent-explicit and can be answered without the conversational context. These completed questions constitute the CompMix dataset. For each question, we also provide the source in which the user found the answer, as well as the question entities.
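Each CompMix question thus comes with its completed question text, its answers, the source in which the crowdworker found the answer, and the question entities. A minimal Python sketch of such a record follows. The field names (`question`, `answers`, `answer_src`, `entities`), the source codes in `SOURCES`, and the example source assignments are hypothetical illustrations rather than the official schema, so check the released files before relying on them. The helper `count_by_source` tallies how many questions were answered from each source.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical source codes: KB, Wikipedia text, table, infobox.
SOURCES = ("kb", "text", "table", "info")


@dataclass
class CompMixRecord:
    question: str                 # completed, self-contained question
    answers: list[str]            # KB entity IDs or normalized literals
    answer_src: str               # source the crowdworker used (hypothetical key)
    entities: list[str] = field(default_factory=list)  # question entities


def count_by_source(records: list[CompMixRecord]) -> Counter:
    """Count how many questions were answered from each source."""
    counts = Counter(r.answer_src for r in records)
    unknown = set(counts) - set(SOURCES)
    if unknown:
        raise ValueError(f"unexpected answer sources: {sorted(unknown)}")
    return counts


# Illustrative records; answer labels stand in for KB IDs, sources are invented.
records = [
    CompMixRecord("In what year was André Jardine born?", ["1979"], "kb"),
    CompMixRecord("Author of the book To Kill a Mockingbird?", ["Harper Lee"], "text"),
]
print(count_by_source(records))  # Counter({'kb': 1, 'text': 1})
```

Rejecting unknown source codes early makes a mismatch between this assumed schema and the actual files fail loudly instead of silently producing skewed counts.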


How do questions in CompMix start?

What do answers in CompMix look like?

Contact

For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mpi HYPHEN inf DOT mpg DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

To learn more about our group, please visit https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/question-answering/.

Paper

CompMix: A Benchmark for Heterogeneous Question Answering,
Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum.
WWW '24 (Resource Track).
[Preprint] [Slides] [Video]

RAG-based Question Answering over Heterogeneous Data and Text,
IEEE Data Engineering Bulletin '24 (Special Issue on Retrieval-Augmented Generation).
[Preprint] [Website]

CompMix Leaderboard

Model	Reference	P@1	MRR	Hit@5
HQA-GPT-4*	Lehmann et al. '24	0.655	-	-
SPAGHETTI (GPT-4)	Zhang et al. '24	0.565	-	-
QUASAR (Llama-8B)	Christmann and Weikum '24	0.564	0.609	0.632
GPT-3 (text-davinci-003)	Brown et al. '20	0.502	-	-
EXPLAIGNN	Christmann et al. '23	0.442	0.518	0.617
UniK-QA	Oguz et al. '22	0.440	0.467	0.494
CONVINSE	Christmann et al. '22	0.407	0.437	0.483

* Result computed on a random sample of 200 questions.
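The leaderboard reports precision at 1 (P@1), mean reciprocal rank (MRR), and Hit@5. The sketch below uses the usual definitions of these metrics, assuming each system returns a ranked list of answers per question and each question has a set of gold answers. It is not the official CompMix evaluation script, and details such as a rank cut-off for MRR may differ there; the function names are hypothetical.

```python
def reciprocal_rank(ranked: list[str], gold: set[str]) -> float:
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def evaluate(predictions: list[list[str]], golds: list[set[str]], k: int = 5) -> dict[str, float]:
    """Average P@1, MRR and Hit@k over all questions."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("predictions and golds must be non-empty and of equal length")
    p_at_1 = mrr = hit_at_k = 0.0
    for ranked, gold in zip(predictions, golds):
        # P@1: is the top-ranked answer correct? (0 for an empty ranking)
        p_at_1 += float(bool(ranked) and ranked[0] in gold)
        mrr += reciprocal_rank(ranked, gold)
        # Hit@k: is any of the top-k answers correct?
        hit_at_k += float(any(a in gold for a in ranked[:k]))
    n = len(golds)
    return {"P@1": p_at_1 / n, "MRR": mrr / n, f"Hit@{k}": hit_at_k / n}
```

For three questions where the first is answered correctly at rank 1, the second at rank 2, and the third not at all, this yields P@1 = 1/3, MRR = 0.5, and Hit@5 = 2/3. Note that under these definitions a system returning a single answer per question gets identical P@1, MRR, and Hit@5 scores.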

What do questions in CompMix look like?

The sources in square brackets indicate where the respective answer can be found.

Comparative

Question

Which movie is longer, Hamlet or Gone with the Wind?

Answer

Hamlet [KB, Infobox]

Superlative

Question

Which soccer player scored the most number of goals in the UEFA Euro 2004 tournament?

Answer

Milan Baroš [KB, Text, Infobox, Table]

Count

Question

How many matches has João Félix played for Portugal in 2019?

Answer

5 [Table]

Ordinal

Question

Where did the Uruguay national football team play their first recorded match?

Answer

Paso del Molino [Text]

Temporal

Question

Who was the kit manufacturer of Chelsea Football Club from 1981 to 1983?

Answer

Le Coq sportif [Text, Table]

Multiple complexities

Question

Which player was awarded the most number of Man of the match titles in the FIFA world cup of 2006?

Answer

Andrea Pirlo [KB, Text]

Ad-hoc

Question

Author of the book To Kill a Mockingbird?

Answer

Harper Lee [KB, Text, Infobox]

Simple

Question

In what year was André Jardine born?

Answer

1979 [KB, Text, Infobox]