CompMix

Download Training Set (4966 questions)
Download Dev Set (1680 questions)
Download Test Set (2764 questions)

The CompMix benchmark is licensed under a Creative Commons Attribution 4.0 International License.

The CompMix dataset is also available at Hugging Face:
https://huggingface.co/datasets/pchristm/CompMix

Description

CompMix collates the completed versions of the conversational questions in ConvMix, which were provided directly by crowdworkers on Amazon Mechanical Turk (AMT). Questions in CompMix exhibit complex phenomena such as multiple entities, relations, temporal conditions, comparisons, aggregations, and more. The benchmark is aimed at evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes). The dataset has 9,410 questions, split into train (4,966 questions), dev (1,680), and test (2,764) sets. All answers provided in the CompMix dataset are grounded to the KB (except for dates, which are normalized, and other literals such as names).
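As a quick sanity check on the split sizes stated above, the following minimal sketch recomputes the total and the share of each split. The dictionary name `split_sizes` is chosen here for illustration only and is not part of the dataset files.

```python
# Split sizes as stated in the description above.
split_sizes = {"train": 4966, "dev": 1680, "test": 2764}

# The three splits should add up to the stated total of 9,410 questions.
total = sum(split_sizes.values())

# Fraction of the full dataset covered by each split.
shares = {name: size / total for name, size in split_sizes.items()}

for name, size in split_sizes.items():
    print(f"{name}: {size} questions ({shares[name]:.1%})")
print(f"total: {total} questions")
```

The same comparison can be made against the downloaded files by counting the questions in each split.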

Further details will be provided in a dedicated write-up soon.

How was CompMix created?

CompMix collates the completed versions of the conversational questions in ConvMix, which were provided directly by the crowdworkers.

The ConvMix benchmark, on which CompMix is based, was created by real humans. We tried to ensure that the collected data is as natural as possible. Master crowdworkers on Amazon Mechanical Turk (AMT) selected an entity of interest in a specific domain, and then started issuing conversational questions about this entity, potentially drifting to other topics of interest over the course of the conversation. By letting users choose the entities themselves, we aimed to ensure that they were genuinely interested in the topics the conversations are based on. After writing a question, users were asked to find the answer in either Wikidata, Wikipedia text, a Wikipedia table, or a Wikipedia infobox, whichever they found most natural for the specific question at hand. Since Wikidata requires some basic understanding of knowledge bases, we provided video guidelines that illustrated how Wikidata can be used for finding answers, following an example conversation. For each conversational question, which might be incomplete, the crowdworker provided a completed question that is intent-explicit and can be answered without the conversational context. These questions constitute the CompMix dataset. We also provide the question entities and the answer source in which the user found the answer.


How do questions in CompMix start?

What do answers in CompMix look like?

Contact

For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mpi HYPHEN inf DOT mpg DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

To know more about our group, please visit https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/question-answering/.

Paper

"CompMix: A Benchmark for Heterogeneous Question Answering",
Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum.
WWW '24 (Resource Track).
[Preprint] [Slides] [Video]

CompMix Leaderboard

Model                       Reference                P@1      MRR      Hit@5
HQA-GPT-4 *                 Lehmann et al. '24       0.655    -        -
SPAGHETTI (GPT-4)           Zhang et al. '24         0.565    -        -
GPT-3 (text-davinci-003)    Brown et al. '20         0.502    -        -
EXPLAIGNN                   Christmann et al. '23    0.442    0.518    0.617
UniK-QA                     Oguz et al. '22          0.440    0.467    0.494
CONVINSE                    Christmann et al. '22    0.407    0.437    0.483

* Result computed on a random sample of 200 questions.
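The leaderboard reports P@1, MRR and Hit@5. The sketch below shows how these three metrics are commonly computed from a ranked list of predicted answers per question. It assumes the usual definitions (precision at rank 1, mean reciprocal rank of the first correct answer, and whether a correct answer appears in the top 5). The official evaluation may differ in details such as answer normalization or tie handling. The function names and the toy predictions are hypothetical and are not taken from the benchmark code or from any system output.

```python
from typing import Dict, List, Sequence, Set


def precision_at_1(ranked: Sequence[str], gold: Set[str]) -> float:
    """1.0 if the top-ranked answer is correct, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """1.0 if any of the top-k answers is correct, else 0.0."""
    return 1.0 if any(answer in gold for answer in ranked[:k]) else 0.0


def evaluate(predictions: List[Sequence[str]], gold_answers: List[Set[str]]) -> Dict[str, float]:
    """Average the per-question metrics over all questions."""
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have the same length")
    n = len(predictions)
    if n == 0:
        raise ValueError("cannot evaluate an empty set of questions")
    pairs = list(zip(predictions, gold_answers))
    return {
        "P@1": sum(precision_at_1(p, g) for p, g in pairs) / n,
        "MRR": sum(reciprocal_rank(p, g) for p, g in pairs) / n,
        "Hit@5": sum(hit_at_k(p, g, k=5) for p, g in pairs) / n,
    }


# Toy ranked predictions (hypothetical, for illustration only).
toy_predictions = [
    ["Hamlet", "Gone with the Wind"],   # correct answer at rank 1
    ["Truman Capote", "Harper Lee"],    # correct answer at rank 2
    ["1980", "1981"],                   # no correct answer
]
toy_gold = [{"Hamlet"}, {"Harper Lee"}, {"1979"}]

scores = evaluate(toy_predictions, toy_gold)
print(scores)
```

Systems that return a single answer per question have no ranked list to score beyond the first position, which would be consistent with the leaderboard entries that list only P@1.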

What do questions in CompMix look like?

The sources in square brackets are the ones the respective answer can be found in.

Comparative
Question: Which movie is longer, Hamlet or Gone with the Wind?
Answer: Hamlet [KB, Infobox]

Superlative
Question: Which soccer player scored the most number of goals in the UEFA Euro 2004 tournament?
Answer: Milan Baroš [KB, Text, Infobox, Table]

Count
Question: How many matches has João Félix played for Portugal in 2019?
Answer: 5 [Table]

Ordinal
Question: Where did the Uruguay national football team play their first recorded match?
Answer: Paso del Molino [Text]

Temporal
Question: Who was the kit manufacturer of Chelsea Football Club from 1981 to 1983?
Answer: Le Coq sportif [Text, Table]

Multiple complexities
Question: Which player was awarded the most number of Man of the match titles in the FIFA world cup of 2006?
Answer: Andrea Pirlo [KB, Text]

Ad-hoc
Question: Author of the book To Kill a Mockingbird?
Answer: Harper Lee [KB, Text, Infobox]

Simple
Question: In what year was André Jardine born?
Answer: 1979 [KB, Text, Infobox]