Opinion corpus of Slovene web commentaries KKS
Version 1.001, 2017-04-24
The corpus of web commentaries with sentiment annotations was
developed as part of a BSc thesis (Kadunc, 2016) and was used to
evaluate the Slovene Sentiment Lexicon KSS. It contains web
commentaries on different topics (business, politics, sport, and
other) from four Slovene web portals (RtvSlo, 24ur, Finance,
Reporter). The corpus is in XML format and is available in two forms:
- original corpus, containing 4,777 commentaries: 898 positive,
3,291 negative, and 588 neutral.
- balanced corpus, containing 1,740 commentaries: 580 for each
sentiment class (positive, negative, and neutral).
Both files are available from the CLARIN.SI repository.
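Because the corpus is distributed as XML, the class counts listed above can be checked with a few lines of Python. The following is only a minimal sketch: the element name `comment`, the attribute name `sentiment`, and the inline sample are hypothetical, since the actual KKS schema is not described here, so adjust both names after inspecting the downloaded files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_sentiments(xml_text, item_tag="comment", label_attr="sentiment"):
    """Count sentiment labels over all elements named `item_tag`.

    `item_tag` and `label_attr` are assumed names, not the documented
    KKS schema; pass the real ones once you have inspected the files.
    """
    root = ET.fromstring(xml_text)
    return Counter(
        elem.get(label_attr, "unlabelled") for elem in root.iter(item_tag)
    )


# Hypothetical miniature sample for illustration only.
SAMPLE = """<corpus>
  <comment sentiment="positive">Odlično!</comment>
  <comment sentiment="negative">Slabo.</comment>
  <comment sentiment="negative">Zelo slabo.</comment>
  <comment sentiment="neutral">Morda.</comment>
</corpus>"""

counts = count_sentiments(SAMPLE)
for label, n in sorted(counts.items()):
    print(label, n)
```

For a real corpus file, read its contents and pass them to `count_sentiments` together with the tag and attribute names actually used in the file; for the original corpus the totals should match the 898 / 3,291 / 588 split given above.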
References:
- Klemen Kadunc (2016). Določanje sentimenta slovenskim spletnim
komentarjem s pomočjo strojnega učenja. Diplomsko delo. Univerza v
Ljubljani, Fakulteta za računalništvo in informatiko (in Slovene).
- Klemen Kadunc, Marko Robnik-Šikonja (2016). Analiza mnenj s
pomočjo strojnega učenja in slovenskega leksikona sentimenta.
Conference on Language Technologies & Digital Humanities,
Ljubljana (in Slovene).