Opinion corpus of Slovene web commentaries KKS

Version 1.001, 2017-04-24

The corpus of web commentaries with sentiment categorizations was developed as part of a BSc thesis (Kadunc, 2016) and was used to evaluate the Slovene Sentiment Lexicon KSS. It contains web commentaries on various topics (business, politics, sport, and other) from four Slovene web portals (RtvSlo, 24ur, Finance, Reporter). The corpus is in XML format and comes in two forms; both files are available from the Clarin.si repository.
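
Since the corpus is distributed as XML, a first look at a downloaded file can be taken with a short script. The following is a minimal sketch in Python that makes no assumption about the corpus schema: it only counts how often each element tag occurs. The function name summarize_xml and the element and attribute names in the in-memory sample (corpus, comment, sentiment) are hypothetical and chosen for illustration; the actual tag names should be checked against the files themselves.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(source):
    """Count how often each element tag occurs in an XML file or file-like object."""
    tag_counts = Counter()
    # iterparse streams the document, so a large corpus file need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag_counts[elem.tag] += 1
        elem.clear()  # release the element's content once it has been counted
    return tag_counts


# Hypothetical sample: these element and attribute names are illustrative only
# and are NOT taken from the actual corpus schema.
sample = io.BytesIO(
    b"<corpus>"
    b"<comment sentiment='positive'>Odlicno!</comment>"
    b"<comment sentiment='negative'>Slabo.</comment>"
    b"</corpus>"
)
counts = summarize_xml(sample)
print(counts.most_common())
```

To inspect a real file, pass its path instead of the in-memory sample, e.g. summarize_xml("path/to/corpus.xml"), where the path is a placeholder for wherever the downloaded file was saved.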

References:
  1. Klemen Kadunc (2016). Določanje sentimenta slovenskim spletnim komentarjem s pomočjo strojnega učenja. Diplomsko delo. Univerza v Ljubljani, Fakulteta za računalništvo in informatiko (in Slovene).
  2. Klemen Kadunc, Marko Robnik-Šikonja (2016). Analiza mnenj s pomočjo strojnega učenja in slovenskega leksikona sentimenta. Conference on Language Technologies & Digital Humanities, Ljubljana (in Slovene).
