Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-29666
Publication type: Bachelor thesis
Title: The influence of audio length on the performance of Swiss-German speech translation models
Authors: van der Heide, Niklas Rijk
Saaro, Felix Matthias
Advisors / Reviewers: Cieliebak, Mark
Deriu, Jan Milan
DOI: 10.21256/zhaw-29666
Extent: 89
Issue Date: 2023
Series: Bachelorarbeiten ZHAW School of Engineering
Publisher / Ed. Institution: ZHAW Zürcher Hochschule für Angewandte Wissenschaften
Publisher / Ed. Institution: Winterthur
Language: English
Subject (DDC): 418.02: Translating and interpreting
430: German
Abstract: Speech translation models that convert spoken Swiss-German to written German have existed for some time. While these models generally perform well, their performance in various scenarios remains poorly understood. In this thesis, we explore the influence of audio length on the performance of Swiss-German speech translation models and identify the factors necessary for achieving better performance on longer audio segments. To this end, we examined four speech translation models from different institutions: a model from the Zurich University of Applied Sciences (ZHAW), one from the University of Applied Sciences Northwestern Switzerland (FHNW), a model from Microsoft, and a model from OpenAI called Whisper. We conducted eight different experiments using a Swiss-German corpus collected by the ZHAW and FHNW. In the experiments, the audio length was augmented in various ways. We found that while the ZHAW, FHNW and Microsoft models tended to perform worse on longer durations, extending the duration by adding silence did not influence performance. Changing the playback speed had a negative influence on the ZHAW, Microsoft and Whisper models, both when segments were sped up and when they were slowed down. The FHNW model exhibited extraordinary robustness to changes in playback speed: its results when accelerated by a factor of 1.25 were nearly identical to its results at unaltered playback speed. The biggest influence on performance came from adding more than one sentence to a segment. Without segmentation of the input audio, the ZHAW, FHNW and Microsoft models performed badly, indicating that segmentation should be introduced as soon as more than one sentence appears in an audio recording.
Training a model specifically on multi-sentence segments showed promising results on single-sentence segments, on multi-sentence segments, and in scenarios where sentences are split while segmenting the audio recordings. Comparing a sentence-based segmentation, which is considered ideal for models trained on single-sentence segments, to a fixed-window segmentation with an overlap showed almost identical results. Examining the models on a real-life recording showed that the ZHAW (lowercase) and ZHAW (multisentence) models perform considerably worse than the FHNW, Microsoft and Whisper models, indicating that more investigation is required to fully understand what makes a speech translation model work well in real-life scenarios.
URI: https://digitalcollection.zhaw.ch/handle/11475/29666
License (according to publishing contract): CC BY 4.0: Attribution 4.0 International
Departement: School of Engineering
Appears in collections: Bachelorarbeiten ZHAW School of Engineering

Files in This Item:
File: 2023_van-der-Heide-Niklas_Saaro-Felix_BA_SoE.pdf
Size: 10.04 MB
Format: Adobe PDF
van der Heide, N. R., & Saaro, F. M. (2023). The influence of audio length on the performance of Swiss-German speech translation models [Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften]. https://doi.org/10.21256/zhaw-29666
van der Heide, N.R. and Saaro, F.M. (2023) The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. Available at: https://doi.org/10.21256/zhaw-29666.
N. R. van der Heide and F. M. Saaro, “The influence of audio length on the performance of Swiss-German speech translation models,” Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften, Winterthur, 2023. doi: 10.21256/zhaw-29666.
VAN DER HEIDE, Niklas Rijk and Felix Matthias SAARO, 2023. The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
van der Heide, Niklas Rijk, and Felix Matthias Saaro. 2023. “The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models.” Bachelor’s thesis, Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften. https://doi.org/10.21256/zhaw-29666.
van der Heide, Niklas Rijk, and Felix Matthias Saaro. The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models. ZHAW Zürcher Hochschule für Angewandte Wissenschaften, 2023, https://doi.org/10.21256/zhaw-29666.

