The influence of audio length on the performance of Swiss-German speech translation models

van der Heide, Niklas Rijk; Saaro, Felix Matthias

doi:10.21256/zhaw-29666

Publication type:	Bachelor thesis
Title:	The influence of audio length on the performance of Swiss-German speech translation models
Authors:	van der Heide, Niklas Rijk; Saaro, Felix Matthias
Advisors / Reviewers:	Cieliebak, Mark; Deriu, Jan Milan
DOI:	10.21256/zhaw-29666
Extent:	89
Issue Date:	2023
Series:	Bachelorarbeiten ZHAW School of Engineering
Publisher / Ed. Institution:	ZHAW Zürcher Hochschule für Angewandte Wissenschaften
Publisher / Ed. Institution:	Winterthur
Language:	English
Subject (DDC):	418.02: Translating and interpreting; 430: German
Abstract:	Speech translation models that convert spoken Swiss-German to written German have existed for some time. While these models generally perform well, their performance in various scenarios remains poorly understood. In this thesis, we explore the influence of audio length on the performance of Swiss-German speech translation models and identify the factors necessary for achieving better performance on longer audio segments. To this end, we examined four speech translation models from different institutions: a model from the Zurich University of Applied Sciences (ZHAW), one from the University of Applied Sciences Northwestern Switzerland (FHNW), a model from Microsoft, and Whisper, a model from OpenAI. We conducted eight experiments using a Swiss-German corpus collected by the ZHAW and FHNW. In these experiments, the audio length was augmented in various ways. We found that while the ZHAW, FHNW and Microsoft models tended to perform worse on longer audio, extending the duration by adding silence did not influence performance. Changing the playback speed had a negative influence on the ZHAW, Microsoft and Whisper models, both when segments were sped up and when they were slowed down. The FHNW model exhibited extraordinary robustness to changes in playback speed: its results when the audio was accelerated by a factor of 1.25 were nearly identical to its results at unaltered playback speed. The largest influence on performance came from adding more than one sentence to a segment. Without segmentation of the input audio, the ZHAW, FHNW and Microsoft models performed poorly, indicating that segmentation should be introduced as soon as more than one sentence appears in an audio recording.
Training a model specifically on multi-sentence segments showed promising results on single-sentence segments, on multi-sentence segments, and in scenarios where sentences are split while segmenting the audio recordings. Comparing sentence-based segmentation, which is considered ideal for models trained on single-sentence segments, with fixed-window segmentation with an overlap showed almost identical results. Evaluating the models on a real-life recording showed that the ZHAW (lowercase) and ZHAW (multisentence) models perform considerably worse than the FHNW, Microsoft and Whisper models, indicating that more investigation is required to fully understand what makes a speech translation model work well in real-life scenarios.
URI:	https://digitalcollection.zhaw.ch/handle/11475/29666
License (according to publishing contract):	CC BY 4.0: Attribution 4.0 International
Departement:	School of Engineering
Appears in collections:	Bachelorarbeiten ZHAW School of Engineering

Files in This Item:

File	Description	Size	Format
2023_van-der-Heide-Niklas_Saaro-Felix_BA_SoE.pdf		10.04 MB	Adobe PDF

van der Heide, N. R., & Saaro, F. M. (2023). The influence of audio length on the performance of Swiss-German speech translation models [Bachelor’s thesis, ZHAW Zürcher Hochschule für Angewandte Wissenschaften]. https://doi.org/10.21256/zhaw-29666
