Bilingual sentiment analysis on Malaysian social media using VADER and normalisation heuristics

Mountstephens, James and Tan, Mathieson Zui Quen and Lai, Po Hung (2023) Bilingual sentiment analysis on Malaysian social media using VADER and normalisation heuristics. Journal of Theoretical and Applied Information Technology, 101 (12). pp. 5037-5050. ISSN 1992-8645

Abstract

This research addresses several important issues in performing Sentiment Analysis (SA) on Malaysian Social Media (SM): bilingual or mixed-language text, the choice of sentiment lexicon, normalisation heuristics, and the use of public datasets. This work is the first to quantify the level of language mixing in informal Malaysian text. Analysis of the 2M-tweet Malaya dataset revealed a significant level of English sentiment content in Malaysian social media (13.5%), demonstrating the necessity of a bilingual approach to Malaysian Sentiment Analysis. Significant patterns in noisy Malaysian SM text were identified, and heuristics for normalising them were devised. The popular and effective English lexicon-based SA system VADER (Valence Aware Dictionary and sEntiment Reasoner) was translated into Malay using automatic and manual methods, and the English and Malay versions of VADER were combined to yield a bilingual SA system. A subset of the Malaya dataset was both corrected and extended from two to three classes in order to test the bilingual SA system properly. Bilingual VADER with normalisation heuristics achieved strong performance on a three-class problem (accuracy = 0.71, mean F1 = 0.72) compared with Malay VADER alone and several popular machine-learning-based algorithms.
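
The general idea of a bilingual lexicon lookup preceded by a normalisation step can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries and valence scores are toy values chosen for illustration, the single heuristic shown (shortening elongated words such as "bagusssss") is only an assumed example of the kind of pattern the abstract refers to, and the function names `candidates`, `token_valence`, and `classify` are hypothetical. VADER itself applies further rules (negation, intensifiers, punctuation) that are omitted here.

```python
import re

# Toy valence lexicons. The words and scores are illustrative assumptions,
# not entries taken from VADER or from its Malay translation.
ENGLISH_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5}
MALAY_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5}

# Merging both lexicons lets one lookup cover mixed-language text.
BILINGUAL_LEXICON = {**ENGLISH_LEXICON, **MALAY_LEXICON}

# Matches any character repeated three or more times in a row.
_REPEATS = re.compile(r"(.)\1{2,}")


def candidates(token):
    """Yield the token, then forms with elongated runs cut to two and to one."""
    yield token
    yield _REPEATS.sub(r"\1\1", token)
    yield _REPEATS.sub(r"\1", token)


def token_valence(token, lexicon):
    """Return the valence of the first candidate form found, else 0.0."""
    for form in candidates(token.lower()):
        if form in lexicon:
            return lexicon[form]
    return 0.0


def classify(text, lexicon=BILINGUAL_LEXICON):
    """Label text as positive, negative, or neutral from summed valences."""
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    score = sum(token_valence(t, lexicon) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these toy lexicons, `classify("teruk", ENGLISH_LEXICON)` returns `"neutral"` because the English-only lexicon has no entry for the Malay word, whereas the merged lexicon labels it `"negative"`. This mirrors, in miniature, why a monolingual lexicon can miss sentiment in mixed-language text.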

Item Type: Article
Subjects: Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
T Technology > T Technology (General) > T1-995 Technology (General) > T10.5-11.9 Communication of technical information
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: SITI AZIZAH BINTI IDRIS
Date Deposited: 19 Jul 2024 16:13
Last Modified: 19 Jul 2024 16:13
URI: https://eprints.ums.edu.my/id/eprint/39226
