Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-29666
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Cieliebak, Mark | -
dc.contributor.advisor | Deriu, Jan Milan | -
dc.contributor.author | van der Heide, Niklas Rijk | -
dc.contributor.author | Saaro, Felix Matthias | -
dc.date.accessioned | 2024-01-27T13:41:27Z | -
dc.date.available | 2024-01-27T13:41:27Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/29666 | -
dc.description.abstract | Speech translation models designed to convert spoken Swiss-German to written German have been in existence for some time. While these models generally perform well, their performance in various scenarios remains poorly understood. In this thesis, we explore the influence of audio length on the performance of Swiss-German speech translation models and identify the factors necessary for achieving better performance on longer audio segments. To achieve this, we examined four speech translation models from different institutions: a model from the Zurich University of Applied Sciences (ZHAW), one from the University of Applied Sciences Northwestern Switzerland (FHNW), a model from Microsoft, and a model from OpenAI called Whisper. We conducted eight different experiments using a Swiss-German corpus collected by the ZHAW and FHNW. In the experiments, the audio length was altered in various ways. We found that while the ZHAW, FHNW and Microsoft models showed a tendency to perform worse on longer segments, extending the duration by adding silence did not influence performance. Changing the playback speed had a negative influence on the ZHAW, Microsoft and Whisper models, both when segments were sped up and when they were slowed down. The FHNW model exhibited extraordinary robustness to changes in playback speed: its results when accelerated by a factor of 1.25 were nearly identical to its results at the unaltered playback speed. The biggest influence on performance came from adding more than one sentence to a segment. Without segmentation of the input audio, the ZHAW, FHNW and Microsoft models performed poorly, indicating that segmentation should be introduced as soon as more than one sentence appears in an audio recording.
Training a model specifically on multi-sentence segments showed promising results on single-sentence and multi-sentence segments, as well as in scenarios where sentences are split when the audio recordings are segmented. Comparing a sentence-based segmentation, which is considered ideal for models trained on single-sentence segments, with a fixed-window segmentation with an overlap yielded almost identical results. Examining the models on a real-life recording showed that the ZHAW (lowercase) and ZHAW (multisentence) models perform considerably worse than the FHNW, Microsoft and Whisper models, indicating that more investigation is required to fully understand what makes a speech translation model work well in real-life scenarios. | de_CH
dc.format.extent | 89 | de_CH
dc.language.iso | en | de_CH
dc.publisher | ZHAW Zürcher Hochschule für Angewandte Wissenschaften | de_CH
dc.relation.ispartofseries | Bachelorarbeiten ZHAW School of Engineering | de_CH
dc.rights | http://creativecommons.org/licenses/by/4.0/ | de_CH
dc.subject.ddc | 418.02: Translation studies | de_CH
dc.subject.ddc | 430: German | de_CH
dc.title | The influence of audio length on the performance of Swiss-German speech translation models | de_CH
dc.type | Thesis: Bachelor | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.publisher.place | Winterthur | de_CH
dc.identifier.doi | 10.21256/zhaw-29666 | -
zhaw.originated.zhaw | Yes | de_CH
Appears in collections: Bachelorarbeiten ZHAW School of Engineering

Files in This Item:
File | Description | Size | Format
2023_van-der-Heide-Niklas_Saaro-Felix_BA_SoE.pdf | | 10.04 MB | Adobe PDF
van der Heide, N. R., & Saaro, F. M. (2023). The influence of audio length on the performance of Swiss-German speech translation models [Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften]. https://doi.org/10.21256/zhaw-29666
van der Heide, N.R. and Saaro, F.M. (2023) The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. Available at: https://doi.org/10.21256/zhaw-29666.
N. R. van der Heide and F. M. Saaro, “The influence of audio length on the performance of Swiss-German speech translation models,” Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften, Winterthur, 2023. doi: 10.21256/zhaw-29666.
VAN DER HEIDE, Niklas Rijk and Felix Matthias SAARO, 2023. The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
van der Heide, Niklas Rijk, and Felix Matthias Saaro. 2023. “The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models.” Bachelor’s thesis, Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften. https://doi.org/10.21256/zhaw-29666.
van der Heide, Niklas Rijk, and Felix Matthias Saaro. The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models. ZHAW Zürcher Hochschule für Angewandte Wissenschaften, 2023, https://doi.org/10.21256/zhaw-29666.

