Comprehensive study on lexicon-based ensemble classification sentiment analysis

Abstract

We propose a novel method for computing sentiment orientation that outperforms supervised learning approaches in time and memory complexity, while its accuracy is not statistically significantly different from theirs. The method centers on a new approach to generating unigram, bigram and trigram lexicons. Called frequentiment, it calculates the frequency of features (words) in documents and averages their impact on the sentiment score relative to documents that do not contain these features. We then use ensemble classification to improve the overall accuracy of the method. Importantly, the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3 to 5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles that take the lexicons' predictions as input, and supervised learners) applied to 10 Amazon review data sets and provide the first statistical comparison of sentiment annotation methods that includes ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.
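
The idea of scoring a word by contrasting documents that contain it with documents that do not can be illustrated with a minimal sketch. This is not the paper's exact frequentiment formula: the scoring rule (difference of mean labels), the function names `build_lexicon` and `classify`, the `threshold` parameter, and the toy data are all assumptions made for illustration, and only unigrams are handled.

```python
from collections import defaultdict


def build_lexicon(docs, labels, threshold=0.0):
    """Build a word -> score lexicon (illustrative, not the paper's formula).

    docs:   list of token lists
    labels: numeric sentiment per document (here -1 or +1)
    A word's score is the mean label of documents containing it minus
    the mean label of documents that do not contain it.
    """
    n = len(docs)
    total = sum(labels)
    count = defaultdict(int)        # number of documents containing the word
    label_sum = defaultdict(float)  # sum of labels over those documents

    for tokens, y in zip(docs, labels):
        for word in set(tokens):    # count each word once per document
            count[word] += 1
            label_sum[word] += y

    lexicon = {}
    for word, c in count.items():
        if c == n:
            continue  # word occurs everywhere, so there is no contrast group
        mean_with = label_sum[word] / c
        mean_without = (total - label_sum[word]) / (n - c)
        score = mean_with - mean_without
        if abs(score) >= threshold:  # assumed form of threshold selection
            lexicon[word] = score
    return lexicon


def classify(tokens, lexicon):
    """Sum lexicon scores; return +1, -1, or 0 when the evidence is neutral."""
    s = sum(lexicon.get(word, 0.0) for word in tokens)
    return 1 if s > 0 else -1 if s < 0 else 0


# Toy data, invented for this example.
docs = [
    ["great", "phone"],
    ["bad", "phone"],
    ["great", "battery"],
    ["bad", "screen"],
]
labels = [1, -1, 1, -1]
lexicon = build_lexicon(docs, labels, threshold=0.5)
```

In this toy run, "phone" appears equally in positive and negative reviews, so its score is zero and the threshold drops it from the lexicon. An ensemble step, as described in the abstract, would take the predictions of several such lexicons as input features for a second classifier.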

Publication
Entropy
Łukasz Augustyniak
PhD Student

Data Scientist, Machine Learning Engineer, Lawyer.

Piotr Szymański
Associate Professor

Piotr Szymański is an Assistant Professor at the Department of Computational Intelligence at the Wrocław University of Science and Technology and a Machine Learning Engineer at Avaya. He works professionally in data analysis, statistical reasoning, geospatial data science, natural language processing, machine learning, and artificial intelligence.

Tomasz Kajdanowicz
Associate Professor, head of faculty’s doctoral studies, head of department

My research interests include representation learning, social network and media analysis, and machine learning.
