Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-3761
Publication type: | Conference paper |
Type of review: | Peer review (publication) |
Title: | Speaker identification and clustering using convolutional neural networks |
Authors: | Lukic, Yanick; Vogt, Carlo; Dürr, Oliver; Stadelmann, Thilo |
DOI: | 10.21256/zhaw-3761; 10.1109/MLSP.2016.7738816 |
Proceedings: | 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) |
Conference details: | 26th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2016), Vietri sul Mare, Italy, 13-16 Sept. 2016 |
Issue Date: | 2016 |
Publisher / Ed. Institution: | IEEE |
ISBN: | 978-1-5090-0746-2 |
Other identifiers: | INSPEC Accession Number: 16449884 |
Language: | English |
Subjects: | Datalab; Speaker identification; Speaker clustering; Deep learning |
Subject (DDC): | 006: Special computer methods |
Abstract: | Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and subsequent individual sub-systems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of those networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network trained for speaker identification to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features. |
URI: | https://digitalcollection.zhaw.ch/handle/11475/7087 |
Fulltext version: | Accepted version |
License (according to publishing contract): | Licence according to publishing contract |
Departement: | School of Engineering |
Organisational Unit: | Institute of Applied Information Technology (InIT); Institute of Data Analysis and Process Design (IDP) |
Appears in collections: | Publikationen School of Engineering |
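
The abstract above describes feeding plain spectrograms into a CNN trained for speaker identification and then reusing that network for speaker clustering. As a rough illustration of that idea (not the authors' exact architecture or training setup; all layer sizes, names, and the embedding dimension below are assumptions), a minimal PyTorch sketch of such a network could look like this:

```python
# Illustrative sketch only: a small CNN over spectrogram excerpts with an
# identification head; the penultimate-layer activations can be reused as
# per-utterance embeddings for clustering. Not the paper's exact architecture.
import torch
import torch.nn as nn

class SpeakerCNN(nn.Module):
    def __init__(self, n_speakers, emb_dim=128):
        super().__init__()
        # Input: 1 x freq x time spectrogram patch
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.embed = nn.Linear(64 * 4 * 4, emb_dim)      # embedding layer reused for clustering
        self.classify = nn.Linear(emb_dim, n_speakers)   # identification head (softmax over known speakers)

    def forward(self, x, return_embedding=False):
        z = self.embed(self.features(x).flatten(1))
        return z if return_embedding else self.classify(z)
```

Usage idea under the same assumptions: train with cross-entropy on spectrograms of labelled speakers, then for unseen speakers compute `model(spec, return_embedding=True)` per utterance and run an off-the-shelf clustering algorithm (e.g. agglomerative clustering) on the resulting embeddings.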
Files in This Item:
File | Description | Size | Format
---|---|---|---
MLSP_2016.pdf | | 897.9 kB | Adobe PDF
Lukic, Y., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). https://doi.org/10.21256/zhaw-3761
Lukic, Y. et al. (2016) ‘Speaker identification and clustering using convolutional neural networks’, in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. Available at: https://doi.org/10.21256/zhaw-3761.
Y. Lukic, C. Vogt, O. Dürr, and T. Stadelmann, “Speaker identification and clustering using convolutional neural networks,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 2016. doi: 10.21256/zhaw-3761.
LUKIC, Yanick, Carlo VOGT, Oliver DÜRR and Thilo STADELMANN, 2016. Speaker identification and clustering using convolutional neural networks. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). Conference paper. IEEE. 2016. ISBN 978-1-5090-0746-2
Lukic, Yanick, Carlo Vogt, Oliver Dürr, and Thilo Stadelmann. 2016. “Speaker Identification and Clustering Using Convolutional Neural Networks.” Conference paper. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. https://doi.org/10.21256/zhaw-3761.
Lukic, Yanick, et al. “Speaker Identification and Clustering Using Convolutional Neural Networks.” 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2016, https://doi.org/10.21256/zhaw-3761.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.