Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-20319
Full metadata record
DC Field | Value | Language
dc.contributor.author | Deriu, Jan Milan | -
dc.contributor.author | Mlynchyk, Katsiaryna | -
dc.contributor.author | Schläpfer, Philippe | -
dc.contributor.author | Rodrigo, Alvaro | -
dc.contributor.author | von Grünigen, Dirk | -
dc.contributor.author | Kaiser, Nicolas | -
dc.contributor.author | Stockinger, Kurt | -
dc.contributor.author | Agirre, Eneko | -
dc.contributor.author | Cieliebak, Mark | -
dc.date.accessioned | 2020-08-05T15:21:20Z | -
dc.date.available | 2020-08-05T15:21:20Z | -
dc.date.issued | 2020-07 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/20319 | -
dc.description.abstract | In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators only have to write the corresponding question and assign its tokens to the operations. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase performance. | de_CH
dc.language.iso | en | de_CH
dc.publisher | Association for Computational Linguistics | de_CH
dc.rights | https://creativecommons.org/licenses/by/4.0/ | de_CH
dc.subject | Natural language interface to database | de_CH
dc.subject | Artificial intelligence | de_CH
dc.subject | Deep learning | de_CH
dc.subject | Semantic parsing | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.subject.ddc | 400: Language and linguistics | de_CH
dc.title | A methodology for creating question answering corpora using inverse data annotation | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.18653/v1/2020.acl-main.84 | de_CH
dc.identifier.doi | 10.21256/zhaw-20319 | -
zhaw.conference.details | 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), online, 5-10 July 2020 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 911 | de_CH
zhaw.pages.start | 897 | de_CH
zhaw.publication.status | publishedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.title.proceedings | Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics | de_CH
zhaw.webfeed | Information Engineering | de_CH
zhaw.webfeed | Natural Language Processing | de_CH
zhaw.funding.zhaw | LIHLITH - Learning to Interact with Humans by Lifelong Interaction with Humans | de_CH
zhaw.funding.zhaw | EU Horizon 2020: INODE - Intelligent Open Data Exploration | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
Appears in collections: Publications School of Engineering

Files in This Item:
File | Description | Size | Format
2020_Deriu-etal_Question-answering-corpora-inverse-data-annotation.pdf | | 556.6 kB | Adobe PDF
Deriu, J. M., Mlynchyk, K., Schläpfer, P., Rodrigo, A., von Grünigen, D., Kaiser, N., Stockinger, K., Agirre, E., & Cieliebak, M. (2020). A methodology for creating question answering corpora using inverse data annotation [Conference paper]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. https://doi.org/10.18653/v1/2020.acl-main.84
Deriu, J.M. et al. (2020) ‘A methodology for creating question answering corpora using inverse data annotation’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 897–911. Available at: https://doi.org/10.18653/v1/2020.acl-main.84.
J. M. Deriu et al., “A methodology for creating question answering corpora using inverse data annotation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp. 897–911. doi: 10.18653/v1/2020.acl-main.84.
DERIU, Jan Milan, Katsiaryna MLYNCHYK, Philippe SCHLÄPFER, Alvaro RODRIGO, Dirk VON GRÜNIGEN, Nicolas KAISER, Kurt STOCKINGER, Eneko AGIRRE and Mark CIELIEBAK, 2020. A methodology for creating question answering corpora using inverse data annotation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Conference paper. Association for Computational Linguistics. July 2020. pp. 897–911
Deriu, Jan Milan, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, and Mark Cieliebak. 2020. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Conference paper. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 897–911. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.84.
Deriu, Jan Milan, et al. “A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 897–911, https://doi.org/10.18653/v1/2020.acl-main.84.

