Mixed-language sentiment analysis on Malaysian social media using translated Vader and normalization heuristics

James Mountstephens and Mathieson Tan Zui Quen (2023) Mixed-language sentiment analysis on Malaysian social media using translated Vader and normalization heuristics.

Abstract

Most work in sentiment analysis has so far been carried out in a single-language context, primarily English. This work addresses the neglected problem of sentiment analysis in a mixed-language environment: Malaysian social media, which freely combines Malay and English. VADER, a highly cited and effective English sentiment analysis system, was converted to Malay for the first time and used in combination with English VADER to create a multi-language sentiment analysis system. Significant patterns in noisy Malaysian social media text were identified, and heuristics for normalizing them were devised. Mixed-language VADER with normalization heuristics achieved a 12% improvement in accuracy over Malay VADER alone. In absolute terms, performance still needs to improve, but the results are encouraging for continuing this approach.
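
The general idea in the abstract (a lexicon-based scorer that consults both a Malay and an English lexicon after normalizing noisy text) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicons, their valence values, the letter-squeezing heuristic, and the function names are all assumptions made for illustration. Only the compound-score normalization (alpha = 15) and the ±0.05 classification thresholds follow the conventions of the original English VADER. Negation, intensifiers, and punctuation handling are omitted.

```python
import math
import re

# Toy lexicons with made-up valence values (assumptions). A real system would
# load the full English VADER lexicon and a translated Malay counterpart.
ENGLISH_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "hate": -2.7}
MALAY_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5, "benci": -2.7}


def candidates(token):
    """Yield the token, then variants with elongated letters squeezed.

    This is an assumed normalization heuristic for stretched words,
    not necessarily one of the heuristics devised in the paper.
    """
    yield token
    yield re.sub(r"(.)\1{2,}", r"\1\1", token)  # "goooood" -> "good"
    yield re.sub(r"(.)\1+", r"\1", token)       # "bagussss" -> "bagus"


def lookup(token):
    """Return the valence of the first candidate found in either lexicon."""
    for cand in candidates(token):
        for lexicon in (MALAY_LEXICON, ENGLISH_LEXICON):
            if cand in lexicon:
                return lexicon[cand]
    return 0.0  # unknown words contribute nothing


def compound_score(text, alpha=15.0):
    """Sum token valences and squash the total into (-1, 1), VADER-style."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = sum(lookup(tok) for tok in tokens)
    return total / math.sqrt(total * total + alpha)


def classify(text):
    """Map the compound score to a label using VADER's usual thresholds."""
    score = compound_score(text)
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ("Makanan ni bagussss, I love it",
                 "service teruk, I hate waiting",
                 "pergi kedai"):
        print(post, "->", classify(post))
```

Looking up each token in both lexicons is one simple way to handle posts that switch language mid-sentence. The paper may combine the two VADER variants differently.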

Item Type: Proceedings
Keyword: Sentiment analysis, Mixed language, VADER, Normalization, Malaysian social media
Subjects: L Education > LB Theory and practice of education > LB5-3640 Theory and practice of education > LB2300-2430 Higher education > LB2331.7-2335.8 Teaching personnel
P Language and Literature > PE English language > PE1-3729 English > PE1001-1693 Modern English
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: JUNAINE JASNI
Date Deposited: 08 Aug 2025 16:16
Last Modified: 08 Aug 2025 16:16
URI: https://eprints.ums.edu.my/id/eprint/44801