A Twitter corpus and benchmark resources for German sentiment analysis

Cieliebak, Mark; Deriu, Jan Milan; Egger, Dominic; Uzdilli, Fatih

doi:10.21256/zhaw-1530

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-1530

Full metadata record

DC Field	Value	Language
dc.contributor.author	Cieliebak, Mark	-
dc.contributor.author	Deriu, Jan Milan	-
dc.contributor.author	Egger, Dominic	-
dc.contributor.author	Uzdilli, Fatih	-
dc.date.accessioned	2017-12-14T14:26:16Z	-
dc.date.available	2017-12-14T14:26:16Z	-
dc.date.issued	2017	-
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/1856	-
dc.description.abstract	In this paper, we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Association for Computational Linguistics	de_CH
dc.rights	Licence according to publishing contract	de_CH
dc.subject	Sentiment Analysis	de_CH
dc.subject	Corpus	de_CH
dc.subject	Twitter	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.subject.ddc	410.285: Computational linguistics	de_CH
dc.title	A Twitter corpus and benchmark resources for German sentiment analysis	de_CH
dc.type	Conference: Paper	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.21256/zhaw-1530	-
dc.identifier.doi	10.18653/v1/W17-1106	de_CH
zhaw.conference.details	5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017	de_CH
zhaw.funding.eu	No	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.pages.end	51	de_CH
zhaw.pages.start	45	de_CH
zhaw.publication.status	publishedVersion	de_CH
zhaw.publication.review	Peer review (Abstract)	de_CH
zhaw.webfeed	Datalab	de_CH
zhaw.webfeed	Software Systems	de_CH
zhaw.webfeed	Natural Language Processing	de_CH
Appears in collections:	Publications School of Engineering

Files in This Item:

File	Description	Size	Format
10_Paper.pdf		516.72 kB	Adobe PDF

Cieliebak, M., Deriu, J. M., Egger, D., & Uzdilli, F. (2017). A Twitter corpus and benchmark resources for German sentiment analysis [Conference paper]. 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 45–51. https://doi.org/10.21256/zhaw-1530

Cieliebak, M. et al. (2017) ‘A Twitter corpus and benchmark resources for German sentiment analysis’, in 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017. Association for Computational Linguistics, pp. 45–51. Available at: https://doi.org/10.21256/zhaw-1530.

M. Cieliebak, J. M. Deriu, D. Egger, and F. Uzdilli, “A Twitter corpus and benchmark resources for German sentiment analysis,” in 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 2017, pp. 45–51. doi: 10.21256/zhaw-1530.

CIELIEBAK, Mark, Jan Milan DERIU, Dominic EGGER and Fatih UZDILLI, 2017. A Twitter corpus and benchmark resources for German sentiment analysis. In: 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017. Conference paper. Association for Computational Linguistics. 2017. pp. 45–51

Cieliebak, Mark, Jan Milan Deriu, Dominic Egger, and Fatih Uzdilli. 2017. “A Twitter Corpus and Benchmark Resources for German Sentiment Analysis.” Conference paper. In 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, 45–51. Association for Computational Linguistics. https://doi.org/10.21256/zhaw-1530.

Cieliebak, Mark, et al. “A Twitter Corpus and Benchmark Resources for German Sentiment Analysis.” 5th International Workshop on Natural Language Processing for Social Media, Boston MA, USA, 11 December 2017, Association for Computational Linguistics, 2017, pp. 45–51, https://doi.org/10.21256/zhaw-1530.