A methodology for creating question answering corpora using inverse data annotation

Deriu, Jan Milan; Mlynchyk, Katsiaryna; Schläpfer, Philippe; Rodrigo, Alvaro; von Grünigen, Dirk; Kaiser, Nicolas; Stockinger, Kurt; Agirre, Eneko; Cieliebak, Mark

doi:10.18653/v1/2020.acl-main.84

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-20319

Full metadata record

DC Field	Value	Language
dc.contributor.author	Deriu, Jan Milan	-
dc.contributor.author	Mlynchyk, Katsiaryna	-
dc.contributor.author	Schläpfer, Philippe	-
dc.contributor.author	Rodrigo, Alvaro	-
dc.contributor.author	von Grünigen, Dirk	-
dc.contributor.author	Kaiser, Nicolas	-
dc.contributor.author	Stockinger, Kurt	-
dc.contributor.author	Agirre, Eneko	-
dc.contributor.author	Cieliebak, Mark	-
dc.date.accessioned	2020-08-05T15:21:20Z	-
dc.date.available	2020-08-05T15:21:20Z	-
dc.date.issued	2020-07	-
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/20319	-
dc.description.abstract	In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is a challenging dataset and that the token alignment can be leveraged to significantly increase the performance.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Association for Computational Linguistics	de_CH
dc.rights	https://creativecommons.org/licenses/by/4.0/	de_CH
dc.subject	Natural language interface to database	de_CH
dc.subject	Artificial intelligence	de_CH
dc.subject	Deep learning	de_CH
dc.subject	Semantic parsing	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.subject.ddc	400: Language and linguistics	de_CH
dc.title	A methodology for creating question answering corpora using inverse data annotation	de_CH
dc.type	Conference: Paper	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.18653/v1/2020.acl-main.84	de_CH
dc.identifier.doi	10.21256/zhaw-20319	-
zhaw.conference.details	58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), online, 5-10 July 2020	de_CH
zhaw.funding.eu	No	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.pages.end	911	de_CH
zhaw.pages.start	897	de_CH
zhaw.publication.status	publishedVersion	de_CH
zhaw.publication.review	Peer review (publication)	de_CH
zhaw.title.proceedings	Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics	de_CH
zhaw.webfeed	Information Engineering	de_CH
zhaw.webfeed	Natural Language Processing	de_CH
zhaw.funding.zhaw	LIHLITH - Learning to Interact with Humans by Lifelong Interaction with Humans	de_CH
zhaw.funding.zhaw	EU Horizon 2020: INODE - Intelligent Open Data Exploration	de_CH
zhaw.author.additional	No	de_CH
zhaw.display.portrait	Yes	de_CH
Appears in collections:	Publications School of Engineering

Files in This Item:

File	Description	Size	Format
2020_Deriu-etal_Question-answering-corpora-inverse-data-annotation.pdf		556.6 kB	Adobe PDF

Deriu, J. M., Mlynchyk, K., Schläpfer, P., Rodrigo, A., von Grünigen, D., Kaiser, N., Stockinger, K., Agirre, E., & Cieliebak, M. (2020). A methodology for creating question answering corpora using inverse data annotation [Conference paper]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. https://doi.org/10.18653/v1/2020.acl-main.84

Deriu, J.M. et al. (2020) ‘A methodology for creating question answering corpora using inverse data annotation’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 897–911. Available at: https://doi.org/10.18653/v1/2020.acl-main.84.

J. M. Deriu et al., “A methodology for creating question answering corpora using inverse data annotation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp. 897–911. doi: 10.18653/v1/2020.acl-main.84.

DERIU, Jan Milan, Katsiaryna MLYNCHYK, Philippe SCHLÄPFER, Alvaro RODRIGO, Dirk VON GRÜNIGEN, Nicolas KAISER, Kurt STOCKINGER, Eneko AGIRRE and Mark CIELIEBAK, 2020. A methodology for creating question answering corpora using inverse data annotation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Conference paper. Association for Computational Linguistics. July 2020. pp. 897–911

Deriu, Jan Milan, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, and Mark Cieliebak. 2020. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Conference paper. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.84.

Deriu, Jan Milan, et al. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 897–911, https://doi.org/10.18653/v1/2020.acl-main.84.