Title
A Case Study of Spanish Text Transformations for Twitter Sentiment Analysis
Authors
Oscar Sánchez Siordia
Eric Tellez
Sabino Miranda Jiménez
Mario Graff
Daniela Moctezuma
Elio Atenógenes Villaseñor García
Access Level
Embargoed
Alternate Identifier
doi: https://doi.org/10.1016/j.eswa.2017.03.071
Subjects
Abstract or Description
Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining on micro-blogging platforms. These new forms of textual expression pose new challenges for text analysis because of the use of slang and orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle large workloads efficiently. The aim of this research is to identify, in a large set of combinations, which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes have the greatest impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology is to exhaustively analyze all combinations of text transformations and their respective parameters to find out what characteristics the best-performing classifiers have in common. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS’15 datasets, respectively.
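The combination of word-based n-grams and character-based q-grams mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `w:` and `c:` prefixes, the default sizes, and the lowercasing step are assumptions made for illustration, and the token weighting and Support Vector Machine training are left out.

```python
def word_ngrams(text, n):
    """Return the word n-grams of a text, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text, q):
    """Return the character q-grams of a text (runs of whitespace collapsed)."""
    s = " ".join(text.split())
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def combined_tokens(text, word_sizes=(1, 2), char_sizes=(3,)):
    """Build one token list holding both word n-grams and character q-grams.

    The "w:" and "c:" prefixes are a hypothetical convention that keeps the
    two token families apart in a shared vocabulary.
    """
    text = text.lower()  # assumed normalization step
    tokens = []
    for n in word_sizes:
        tokens.extend("w:" + t for t in word_ngrams(text, n))
    for q in char_sizes:
        tokens.extend("c:" + t for t in char_qgrams(text, q))
    return tokens


tokens = combined_tokens("Muy buen servicio")
```

The resulting token list could then be passed to any token-weighting scheme and classifier. Character q-grams tend to tolerate misspellings better than whole words, which is one plausible reason the combination helps on noisy micro-blog text.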
Publisher
Elsevier
Publication Date
September 2017
Publication Type
Article
Publication Version
Accepted version
Information Resource
Format
application/pdf
Source
Expert Systems with Applications, Volume 81, 15 September 2017, Pages 457-471
Language
English
Audience
Students
Researchers
Teachers
Source Repository
Repositorio Institucional de CENTROGEO
Downloads
0