ComQA

What is ComQA?

ComQA is a dataset of 11,214 questions collected from WikiAnswers, a community question answering website. Collecting questions from such a site ensures that the information needs are of interest to actual users. Moreover, questions posed there often cannot be answered by commercial search engines or QA technology, making them more interesting for driving future research than questions collected from a search engine's query log. The dataset contains questions with various challenging phenomena, such as the need for temporal reasoning, comparison (e.g., comparatives, superlatives, ordinals), compositionality (multiple, possibly nested, subquestions with multiple entities), and unanswerable questions (e.g., Who was the first human being on Mars?). Through a large crowdsourcing effort, questions in ComQA are grouped into 4,834 paraphrase clusters, where the questions in a cluster express the same information need. Each cluster is annotated with its answer(s). ComQA answers come in the form of Wikipedia entities wherever possible. Where answers are temporal or measurable quantities, TIMEX3 and the International System of Units (SI) are used for normalization.

To cite ComQA, please use:

"ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters", Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT '19), Minneapolis, USA, 02 - 07 June 2019, pages 307 - 317.

Examples

Cluster 1

Question 1: What is the least populous country in Central America?
Question 2: What country in Central America is the least populated?
Answer: [https://en.wikipedia.org/wiki/belize]

Cluster 2

Question 1: What is the second largest city in France?
Question 2: Which city is 2nd biggest in France?
Question 3: What is the second biggest city in France?
Answer: [https://en.wikipedia.org/wiki/marseille]

Cluster 3

Question 1: US president during Vietnam conflict?
Question 2: Who was the president of the US during Vietnam war?
Question 3: Who was the US president during the war of Vietnam?
Answer: [ https://en.wikipedia.org/wiki/richard_nixon,
https://en.wikipedia.org/wiki/dwight_d._eisenhower,
https://en.wikipedia.org/wiki/lyndon_b._johnson,
https://en.wikipedia.org/wiki/john_f._kennedy,
https://en.wikipedia.org/wiki/gerald_ford]

Cluster 4

Question 1: What movie did Steven Strait act in?
Question 2: What movie did Steven Strait play in?
Answer: [ https://en.wikipedia.org/wiki/the_covenant_(film),
https://en.wikipedia.org/wiki/10,000_bc_(film),
https://en.wikipedia.org/wiki/after_(2012_film),
Extra Credit and Arthur Darks,
https://en.wikipedia.org/wiki/undiscovered,
https://en.wikipedia.org/wiki/stop-loss_(film),
https://en.wikipedia.org/wiki/city_island_(film),
Hot,
https://en.wikipedia.org/wiki/sky_high_(2005_film)]

Cluster 5

Question 1: What is the name of Kristen Stewart adopted brother?
Answer: [Taylor Stewart, Dana Stewart]

Cluster 6

Question 1: Who was the first human being on Mars?
Question 2: Who is the first human landed in Mars?
Question 3: first human in Mars?
Answer: []
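
The cluster layout shown in the examples above can be sketched as a small data structure. This is only an illustration and not the format of the released files: the class name `Cluster`, its field names, and the helper `is_entity` are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

# Prefix used by the entity answers in the examples above.
WIKI_PREFIX = "https://en.wikipedia.org/wiki/"


@dataclass
class Cluster:
    """A paraphrase cluster: questions that share one information need.

    Hypothetical layout for illustration; the released files may differ.
    """
    questions: List[str]
    answers: List[str] = field(default_factory=list)

    @property
    def unanswerable(self) -> bool:
        # An empty answer list marks a cluster with no correct answer.
        return not self.answers


def is_entity(answer: str) -> bool:
    """Return True if the answer is a Wikipedia URL rather than a plain string."""
    return answer.strip().startswith(WIKI_PREFIX)


belize = Cluster(
    questions=[
        "What is the least populous country in Central America?",
        "What country in Central America is the least populated?",
    ],
    answers=["https://en.wikipedia.org/wiki/belize"],
)
mars = Cluster(questions=["Who was the first human being on Mars?"])
brothers = Cluster(
    questions=["What is the name of Kristen Stewart adopted brother?"],
    answers=["Taylor Stewart", "Dana Stewart"],
)

print(belize.unanswerable, mars.unanswerable)
print([is_entity(a) for a in belize.answers + brothers.answers])
```

Keeping the answers as a plain list covers all three cases in the examples: a single entity, several entities mixed with literal strings, and the empty list for unanswerable questions.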

Leaderboard

	System	P	R	F1
1	Abujabal et al. (2017)	21.2	38.4	22.4
2	Bast and Haussmann (2015)	20.7	37.6	21.6
3	Berant et al. (2013)	13.7	20.1	12.0
4	Berant and Liang (2015)	10.7	15.4	10.6
5	Fader et al. (2013)	7.22	6.59	6.73
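
This page does not state how P, R, and F1 are computed; the Evaluation Script listed on this page is the authoritative reference. As a rough sketch, one common convention for set-valued answers is to score each question separately and then average over questions. The sketch below assumes that convention, and also assumes that an unanswerable question scores 1 only when the system returns an empty set. The function names `prf` and `macro_average` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def prf(gold: Iterable[str], predicted: Iterable[str]) -> Scores:
    """Precision, recall, and F1 for one question, comparing answer sets."""
    gold_set: Set[str] = {a.strip().lower() for a in gold}
    pred_set: Set[str] = {a.strip().lower() for a in predicted}

    # Assumption: an unanswerable question (empty gold set) scores 1
    # only if the system also returns nothing, and 0 otherwise.
    if not gold_set:
        score = 1.0 if not pred_set else 0.0
        return score, score, score
    if not pred_set:
        return 0.0, 0.0, 0.0

    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    f1 = 0.0 if correct == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(scores: List[Scores]) -> Scores:
    """Average per-question scores; F1 is averaged, not recomputed from P and R."""
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f = sum(s[2] for s in scores) / n
    return p, r, f


per_question = [
    # One correct and one wrong answer for Cluster 2.
    prf(["https://en.wikipedia.org/wiki/marseille"],
        ["https://en.wikipedia.org/wiki/marseille",
         "https://en.wikipedia.org/wiki/lyon"]),
    # Unanswerable question from Cluster 6, correctly left empty.
    prf([], []),
]
print(macro_average(per_question))
```

Under per-question averaging, the averaged F1 generally differs from the harmonic mean of the averaged P and R. The leaderboard rows show the same pattern, which is consistent with this kind of averaging but does not confirm it.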

ComQA paper

Training Set

Dev Set

Test Set

Evaluation Script