James Mountstephens and Mathieson Tan Zui Quen (2023) Mixed-language sentiment analysis on Malaysian social media using translated VADER and normalization heuristics.
Abstract
Most work in Sentiment Analysis has so far been in a single language context, primarily English. This work addresses the neglected issue of Sentiment Analysis in a mixed-language environment: Malaysian social media, which freely combines both Malay and English. The highly cited and effective English Sentiment Analysis system VADER was converted to Malay for the first time and used in combination with English VADER to create a Multilanguage Sentiment Analysis system. Significant patterns in noisy Malaysian Social Media text were identified and heuristics for normalizing them were devised. Mixed-language VADER with normalization heuristics was able to achieve a 12% improvement in accuracy as compared to Malay VADER alone. In absolute terms, performance must be improved, but the results obtained here are encouraging for the future continuation of this approach.
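The abstract does not spell out the lexicons or the normalization heuristics, so the following is only a minimal sketch of the general idea: normalize noisy tokens, look each one up in an English and a Malay valence lexicon, and squash the summed valence into a compound score. The lexicon entries, their scores, and the elongation-collapsing rule are illustrative assumptions, not the authors' resources or method; the `x / sqrt(x^2 + 15)` squashing follows the normalization used by English VADER.

```python
import math
import re

# Toy lexicons with made-up valence scores (hypothetical, NOT from the paper).
ENGLISH_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "hate": -2.7}
MALAY_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5, "benci": -2.7}

ALPHA = 15  # squashing constant used by English VADER's compound score


def candidates(token):
    """Return spelling candidates for a noisy token (assumed heuristic).

    Elongated words such as "bagusss" or "goooood" are tried as-is, then
    with runs of 3+ repeated letters collapsed to two, then to one.
    """
    t = token.lower()
    return [
        t,
        re.sub(r"(.)\1{2,}", r"\1\1", t),
        re.sub(r"(.)\1{2,}", r"\1", t),
    ]


def valence(token):
    """Look a token up in both lexicons; unknown words score 0.0."""
    for cand in candidates(token):
        for lexicon in (ENGLISH_LEXICON, MALAY_LEXICON):
            if cand in lexicon:
                return lexicon[cand]
    return 0.0


def compound_score(text):
    """Sum token valences and squash the total into the range (-1, 1)."""
    tokens = re.findall(r"[A-Za-z]+", text)
    total = sum(valence(tok) for tok in tokens)
    return total / math.sqrt(total * total + ALPHA)


if __name__ == "__main__":
    for sample in ["Makanan ni bagusss, I love it", "Teruk gila, I hate this"]:
        print(sample, "->", round(compound_score(sample), 3))
```

Checking both lexicons per token, rather than first detecting the language of the whole post, is one simple way to handle sentences that switch between Malay and English mid-stream; a real system would also need VADER's handling of negation, intensifiers, and punctuation, which this sketch omits.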
Item Type: | Proceedings |
---|---|
Keyword: | Sentiment analysis, Mixed language, VADER, Normalization, Malaysian social media |
Subjects: | L Education > LB Theory and practice of education > LB5-3640 Theory and practice of education > LB2300-2430 Higher education > LB2331.7-2335.8 Teaching personnel; P Language and Literature > PE English language > PE1-3729 English > PE1001-1693 Modern English |
Department: | FACULTY > Faculty of Computing and Informatics |
Depositing User: | JUNAINE JASNI |
Date Deposited: | 08 Aug 2025 16:16 |
Last Modified: | 08 Aug 2025 16:16 |
URI: | https://eprints.ums.edu.my/id/eprint/44801 |