Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-29048
Full metadata record
DC Field | Value | Language
dc.contributor.author | Deriu, Jan | -
dc.contributor.author | von Däniken, Pius | -
dc.contributor.author | Tuggener, Don | -
dc.contributor.author | Cieliebak, Mark | -
dc.date.accessioned | 2023-11-10T18:02:13Z | -
dc.date.available | 2023-11-10T18:02:13Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/29048 | -
dc.description.abstract | A major challenge in the field of Text Generation is evaluation: human evaluations are cost-intensive, and automated metrics often display considerable disagreement with human judgments. In this paper, we propose to apply automated metrics for Text Generation in a preference-based evaluation protocol. The protocol features a statistical model that incorporates various levels of uncertainty to account for the error-proneness of the metrics. We show that existing metrics are generally over-confident in assigning significant differences between systems. As a remedy, the model allows human ratings to be combined with automated ratings. We show that it can reduce the number of human ratings required to arrive at robust and statistically significant results by more than 50%, while yielding the same evaluation outcome as the pure human evaluation in 95% of cases. We showcase the benefits of the evaluation protocol for three text generation tasks: dialogue systems, machine translation, and text summarization. | de_CH
dc.language.iso | en | de_CH
dc.publisher | Association for Computational Linguistics | de_CH
dc.rights | http://creativecommons.org/licenses/by/4.0/ | de_CH
dc.subject | Preference rating | de_CH
dc.subject | Automated metrics | de_CH
dc.subject | Machine translation | de_CH
dc.subject | Text generation | de_CH
dc.subject | Bayesian | de_CH
dc.subject | Error correction | de_CH
dc.subject.ddc | 410.285: Computational linguistics | de_CH
dc.title | Correction of errors in preference ratings from automated metrics for text generation | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Centre for Artificial Intelligence (CAI) | de_CH
dc.identifier.doi | 10.18653/v1/2023.findings-acl.404 | de_CH
dc.identifier.doi | 10.21256/zhaw-29048 | -
zhaw.conference.details | 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, 9-14 July 2023 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 6474 | de_CH
zhaw.pages.start | 6456 | de_CH
zhaw.parentwork.editor | Rogers, Anna | -
zhaw.parentwork.editor | Boyd-Graber, Jordan | -
zhaw.parentwork.editor | Okazaki, Naoaki | -
zhaw.publication.status | publishedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.title.proceedings | Findings of the Association for Computational Linguistics: ACL 2023 | de_CH
zhaw.webfeed | Natural Language Processing | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
Appears in collections:Publikationen School of Engineering

Files in This Item:
File | Description | Size | Format
2023_Deriu-etal_Correction-of-errors-in-preference-ratings.pdf | | 623.63 kB | Adobe PDF
Deriu, J., von Däniken, P., Tuggener, D., & Cieliebak, M. (2023). Correction of errors in preference ratings from automated metrics for text generation [Conference paper]. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6456–6474). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.404
Deriu, J. et al. (2023) ‘Correction of errors in preference ratings from automated metrics for text generation’, in A. Rogers, J. Boyd-Graber, and N. Okazaki (eds) Findings of the Association for Computational Linguistics: ACL 2023. Association for Computational Linguistics, pp. 6456–6474. Available at: https://doi.org/10.18653/v1/2023.findings-acl.404.
J. Deriu, P. von Däniken, D. Tuggener, and M. Cieliebak, “Correction of errors in preference ratings from automated metrics for text generation,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 6456–6474. doi: 10.18653/v1/2023.findings-acl.404.
DERIU, Jan, Pius VON DÄNIKEN, Don TUGGENER and Mark CIELIEBAK, 2023. Correction of errors in preference ratings from automated metrics for text generation. In: Anna ROGERS, Jordan BOYD-GRABER and Naoaki OKAZAKI (eds.), Findings of the Association for Computational Linguistics: ACL 2023. Conference paper. Association for Computational Linguistics. 2023. pp. 6456–6474
Deriu, Jan, Pius von Däniken, Don Tuggener, and Mark Cieliebak. 2023. “Correction of Errors in Preference Ratings from Automated Metrics for Text Generation.” Conference paper. In Findings of the Association for Computational Linguistics: ACL 2023, edited by Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, 6456–74. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.404.
Deriu, Jan, et al. “Correction of Errors in Preference Ratings from Automated Metrics for Text Generation.” Findings of the Association for Computational Linguistics: ACL 2023, edited by Anna Rogers et al., Association for Computational Linguistics, 2023, pp. 6456–74, https://doi.org/10.18653/v1/2023.findings-acl.404.

