Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-3784
Full metadata record
DC Field | Value | Language
dc.contributor.author | Stadelmann, Thilo | -
dc.contributor.author | Glinski-Haefeli, Sebastian | -
dc.contributor.author | Gerber, Patrick | -
dc.contributor.author | Dürr, Oliver | -
dc.date.accessioned | 2018-06-28T09:44:15Z | -
dc.date.available | 2018-06-28T09:44:15Z | -
dc.date.issued | 2018 | -
dc.identifier.isbn | 978-3-319-99977-7 | de_CH
dc.identifier.isbn | 978-3-319-99978-4 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/7429 | -
dc.description.abstract | Deep neural networks have become a veritable alternative to classic speaker recognition and clustering methods in recent years. However, although the speech signal is clearly a time series, and despite the body of literature on the benefits of prosodic (suprasegmental) features, identifying voices has usually not been approached with sequence learning methods. Only recently has a recurrent neural network (RNN) been successfully applied to this task, while the use of convolutional neural networks (CNNs), which, unlike RNNs, cannot capture arbitrary time dependencies, still prevails. In this paper, we show the effectiveness of RNNs for speaker recognition by improving state-of-the-art speaker clustering performance and robustness on the classic TIMIT benchmark. We argue why RNNs are superior by experimentally demonstrating a “sweet spot” in segment length for successfully capturing prosodic information, as theoretically predicted in previous work. | de_CH
dc.language.iso | en | de_CH
dc.publisher | Springer | de_CH
dc.relation.ispartofseries | Lecture Notes in Computer Science | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Speaker clustering | de_CH
dc.subject | Speaker recognition | de_CH
dc.subject | Recurrent neural network | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | Capturing suprasegmental features of a voice with RNNs for improved speaker clustering | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.1007/978-3-319-99978-4_26 | de_CH
dc.identifier.doi | 10.21256/zhaw-3784 | -
zhaw.conference.details | 8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 345 | de_CH
zhaw.pages.start | 333 | de_CH
zhaw.publication.status | acceptedVersion | de_CH
zhaw.series.number | 11081 | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.title.proceedings | Artificial Neural Networks in Pattern Recognition | de_CH
zhaw.webfeed | Datalab | de_CH
zhaw.webfeed | Information Engineering | de_CH
zhaw.webfeed | Machine Perception and Cognition | de_CH
Appears in collections: Publications School of Engineering

Files in This Item:
File | Description | Size | Format
ANNPR_2018b.pdf | Accepted Version | 692.47 kB | Adobe PDF
Stadelmann, T., Glinski-Haefeli, S., Gerber, P., & Dürr, O. (2018). Capturing suprasegmental features of a voice with RNNs for improved speaker clustering [Conference paper]. Artificial Neural Networks in Pattern Recognition, 333–345. https://doi.org/10.1007/978-3-319-99978-4_26
Stadelmann, T. et al. (2018) ‘Capturing suprasegmental features of a voice with RNNs for improved speaker clustering’, in Artificial Neural Networks in Pattern Recognition. Springer, pp. 333–345. Available at: https://doi.org/10.1007/978-3-319-99978-4_26.
T. Stadelmann, S. Glinski-Haefeli, P. Gerber, and O. Dürr, “Capturing suprasegmental features of a voice with RNNs for improved speaker clustering,” in Artificial Neural Networks in Pattern Recognition, 2018, pp. 333–345. doi: 10.1007/978-3-319-99978-4_26.
STADELMANN, Thilo, Sebastian GLINSKI-HAEFELI, Patrick GERBER and Oliver DÜRR, 2018. Capturing suprasegmental features of a voice with RNNs for improved speaker clustering. In: Artificial Neural Networks in Pattern Recognition. Conference paper. Springer. 2018. pp. 333–345. ISBN 978-3-319-99977-7
Stadelmann, Thilo, Sebastian Glinski-Haefeli, Patrick Gerber, and Oliver Dürr. 2018. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Conference paper. In Artificial Neural Networks in Pattern Recognition, 333–45. Springer. https://doi.org/10.1007/978-3-319-99978-4_26.
Stadelmann, Thilo, et al. “Capturing Suprasegmental Features of a Voice with RNNs for Improved Speaker Clustering.” Artificial Neural Networks in Pattern Recognition, Springer, 2018, pp. 333–45, https://doi.org/10.1007/978-3-319-99978-4_26.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.