Leveraging large amounts of weakly supervised data for multi-language sentiment classification

Deriu, Jan Milan; Lucchi, Aurelien; De Luca, Valeria; Severyn, Aliaksei; Müller, Simone; Cieliebak, Mark; Hofmann, Thomas; Jaggi, Martin

doi:10.1145/3038912.3052611

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-1525

Full metadata record

DC Field	Value	Language
dc.contributor.author	Deriu, Jan Milan	-
dc.contributor.author	Lucchi, Aurelien	-
dc.contributor.author	De Luca, Valeria	-
dc.contributor.author	Severyn, Aliaksei	-
dc.contributor.author	Müller, Simone	-
dc.contributor.author	Cieliebak, Mark	-
dc.contributor.author	Hofmann, Thomas	-
dc.contributor.author	Jaggi, Martin	-
dc.date.accessioned	2017-12-14T14:14:52Z	-
dc.date.available	2017-12-14T14:14:52Z	-
dc.date.issued	2017	-
dc.identifier.isbn	9781450349130	de_CH
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/1851	-
dc.description.abstract	This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly-supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse, but still acceptable, performance when compared to the single-language model, while benefiting from better generalization properties across languages.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Association for Computing Machinery	de_CH
dc.rights	Licence according to publishing contract	de_CH
dc.subject	Sentiment Analysis	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.title	Leveraging large amounts of weakly supervised data for multi-language sentiment classification	de_CH
dc.type	Conference: Paper	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.1145/3038912.3052611	de_CH
dc.identifier.doi	10.21256/zhaw-1525	-
zhaw.conference.details	26th International World Wide Web Conference (WWW 2017), Perth, Australia, 3-7 April 2017	de_CH
zhaw.funding.eu	No	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.pages.end	1052	de_CH
zhaw.pages.start	1045	de_CH
zhaw.publication.status	publishedVersion	de_CH
zhaw.publication.review	Not specified	de_CH
zhaw.title.proceedings	Proceedings of the 26th International Conference on World Wide Web	de_CH
zhaw.webfeed	Software Systems	de_CH
zhaw.webfeed	Natural Language Processing	de_CH
zhaw.funding.zhaw	DeepText: Intelligente Textanalyse mit Deep Learning	de_CH
Appears in collections:	Publications School of Engineering

Files in This Item:

File	Description	Size	Format
p1045-deriu.pdf		3.78 MB	Adobe PDF

Deriu, J. M., Lucchi, A., De Luca, V., Severyn, A., Müller, S., Cieliebak, M., Hofmann, T., & Jaggi, M. (2017). Leveraging large amounts of weakly supervised data for multi-language sentiment classification [Conference paper]. Proceedings of the 26th International Conference on World Wide Web, 1045–1052. https://doi.org/10.1145/3038912.3052611

Deriu, J.M. et al. (2017) ‘Leveraging large amounts of weakly supervised data for multi-language sentiment classification’, in Proceedings of the 26th International Conference on World Wide Web. Association for Computing Machinery, pp. 1045–1052. Available at: https://doi.org/10.1145/3038912.3052611.

J. M. Deriu et al., “Leveraging large amounts of weakly supervised data for multi-language sentiment classification,” in Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 1045–1052. doi: 10.1145/3038912.3052611.

DERIU, Jan Milan, Aurelien LUCCHI, Valeria DE LUCA, Aliaksei SEVERYN, Simone MÜLLER, Mark CIELIEBAK, Thomas HOFMANN and Martin JAGGI, 2017. Leveraging large amounts of weakly supervised data for multi-language sentiment classification. In: Proceedings of the 26th International Conference on World Wide Web. Conference paper. Association for Computing Machinery. 2017. pp. 1045–1052. ISBN 9781450349130

Deriu, Jan Milan, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simone Müller, Mark Cieliebak, Thomas Hofmann, and Martin Jaggi. 2017. “Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification.” Conference paper. In Proceedings of the 26th International Conference on World Wide Web, 1045–52. Association for Computing Machinery. https://doi.org/10.1145/3038912.3052611.

Deriu, Jan Milan, et al. “Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification.” Proceedings of the 26th International Conference on World Wide Web, Association for Computing Machinery, 2017, pp. 1045–52, https://doi.org/10.1145/3038912.3052611.