Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-29666
Publication type: | Bachelor thesis |
Title: | The influence of audio length on the performance of Swiss-German speech translation models |
Authors: | van der Heide, Niklas Rijk; Saaro, Felix Matthias |
Advisors / Reviewers: | Cieliebak, Mark; Deriu, Jan Milan |
DOI: | 10.21256/zhaw-29666 |
Extent: | 89 |
Issue Date: | 2023 |
Series: | Bachelorarbeiten ZHAW School of Engineering |
Publisher / Ed. Institution: | ZHAW Zürcher Hochschule für Angewandte Wissenschaften |
Publisher / Ed. Institution: | Winterthur |
Language: | English |
Subject (DDC): | 418.02: Translating and interpreting; 430: German |
Abstract: | Speech translation models that convert spoken Swiss-German to written German have existed for some time. While these models generally perform well, their performance in various scenarios remains poorly understood. In this thesis, we explore the influence of audio length on the performance of Swiss-German speech translation models and identify the factors necessary for achieving better performance on longer audio segments. To this end, we examined four speech translation models from different institutions: one from the Zurich University of Applied Sciences (ZHAW), one from the University of Applied Sciences Northwestern Switzerland (FHNW), one from Microsoft, and Whisper from OpenAI. We conducted eight experiments using a Swiss-German corpus collected by the ZHAW and FHNW, in which the audio length was altered in various ways. We found that while the ZHAW, FHNW and Microsoft models tended to perform worse on longer segments, extending the duration by adding silence did not influence performance. Changing the playback speed had a negative influence on the ZHAW, Microsoft and Whisper models, both when speeding segments up and when slowing them down. The FHNW model exhibited extraordinary robustness to changes in playback speed: its results when accelerated by a factor of 1.25 were nearly identical to its results at the original speed. The biggest influence on performance came from adding more than one sentence to a segment. Without segmentation of the input audio, the ZHAW, FHNW and Microsoft models performed badly, indicating that segmentation should be introduced as soon as an audio recording contains more than one sentence. 
Training a model specifically on multi-sentence segments showed promising results on single-sentence segments, on multi-sentence segments, and in scenarios where sentences are split while segmenting the audio recordings. A sentence-based segmentation, which is considered ideal for models trained on single-sentence segments, and a fixed-window segmentation with an overlap produced almost identical results. Evaluating the models on a real-life recording showed that the ZHAW (lowercase) and ZHAW (multisentence) models perform considerably worse than the FHNW, Microsoft and Whisper models, indicating that more investigation is required to fully understand what makes a speech translation model work well in real-life scenarios. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/29666 |
License (according to publishing contract): | CC BY 4.0: Attribution 4.0 International |
Departement: | School of Engineering |
Appears in collections: | Bachelorarbeiten ZHAW School of Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2023_van-der-Heide-Niklas_Saaro-Felix_BA_SoE.pdf | | 10.04 MB | Adobe PDF | View/Open |
van der Heide, N. R., & Saaro, F. M. (2023). The influence of audio length on the performance of Swiss-German speech translation models [Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften]. https://doi.org/10.21256/zhaw-29666
van der Heide, N.R. and Saaro, F.M. (2023) The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. ZHAW Zürcher Hochschule für Angewandte Wissenschaften. Available at: https://doi.org/10.21256/zhaw-29666.
N. R. van der Heide and F. M. Saaro, “The influence of audio length on the performance of Swiss-German speech translation models,” Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften, Winterthur, 2023. doi: 10.21256/zhaw-29666.
VAN DER HEIDE, Niklas Rijk and Felix Matthias SAARO, 2023. The influence of audio length on the performance of Swiss-German speech translation models. Bachelor’s thesis. Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften
van der Heide, Niklas Rijk, and Felix Matthias Saaro. 2023. “The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models.” Bachelor’s thesis, Winterthur: ZHAW Zürcher Hochschule für Angewandte Wissenschaften. https://doi.org/10.21256/zhaw-29666.
van der Heide, Niklas Rijk, and Felix Matthias Saaro. The Influence of Audio Length on the Performance of Swiss-German Speech Translation Models. ZHAW Zürcher Hochschule für Angewandte Wissenschaften, 2023, https://doi.org/10.21256/zhaw-29666.