An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora

Rayner Alfred and Leow, Ching Leong and Joe Henry Obit (2017) An evolutionary-based term reduction approach to bilingual clustering of Malay-English corpora. In: International Conference on Advances in Information and Communication Technology (ICTA 2016), 12–13 December 2016, Thai Nguyen city, Vietnam.

Abstract

Document clustering groups unstructured text documents into a predefined set of clusters in order to provide more information to users. Many studies have addressed the clustering of monolingual documents, and advances in current technologies have made the study of bilingual clustering feasible. However, bilingual document clustering still faces the same problem as monolingual document clustering: the “curse of dimensionality”. This motivates the study of term reduction techniques for clustering bilingual documents. The objective of this study is to examine the effects of reducing the number of terms considered when clustering a bilingual corpus in parallel for English and Malay documents. A genetic algorithm (GA) is used to reduce the number of features selected, with single-point crossover and a crossover rate of 0.8. The study also assesses the effects of applying different mutation rates (e.g., 0.1 and 0.01) on the number of features selected for clustering bilingual documents. The results show that applying the GA improves the clustering mapping compared to the initial clustering mapping, and that the GA with a mutation rate of 0.01 produces better parallel clustering mapping results than the GA with a mutation rate of 0.1.
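
The GA operators named in the abstract can be illustrated with a minimal sketch. It assumes each chromosome is a binary mask over the term vocabulary (1 keeps a term, 0 drops it). Only single-point crossover, the crossover rate of 0.8, and the mutation rates 0.1 and 0.01 come from the abstract. The fitness function, tournament selection, elitism, population size, and generation count are hypothetical stand-ins, not the authors' method.

```python
import random

CROSSOVER_RATE = 0.8          # stated in the abstract
MUTATION_RATES = (0.1, 0.01)  # the two rates compared in the abstract


def single_point_crossover(parent_a, parent_b, rng, rate=CROSSOVER_RATE):
    """With probability `rate`, swap the tails of two parents at one cut point."""
    if len(parent_a) < 2 or rng.random() >= rate:
        return parent_a[:], parent_b[:]
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def toy_fitness(chromosome, useful_terms):
    """Hypothetical fitness: +1 per kept useful term, -1 per kept other term.
    The paper's real fitness (based on clustering quality) is not given here."""
    return sum(1 if i in useful_terms else -1
               for i, bit in enumerate(chromosome) if bit)


def run_ga(n_terms, fitness, mutation_rate, pop_size=20, generations=50, seed=0):
    """Evolve a term mask; selection scheme and elitism are assumptions."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_terms)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(population, 2), key=fitness)
            b = max(rng.sample(population, 2), key=fitness)
            child_1, child_2 = single_point_crossover(a, b, rng)
            next_population.append(mutate(child_1, rng, mutation_rate))
            if len(next_population) < pop_size:
                next_population.append(mutate(child_2, rng, mutation_rate))
        population = next_population
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    useful = {0, 3, 5, 8}  # hypothetical set of informative term indices
    for rate in MUTATION_RATES:
        mask = run_ga(12, lambda c: toy_fitness(c, useful), rate)
        print(rate, sum(mask), toy_fitness(mask, useful))
```

Running the script prints, for each mutation rate, how many terms the best mask keeps and its toy fitness. A real experiment would replace `toy_fitness` with a measure of clustering quality computed on the reduced term set.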

Item Type: Conference or Workshop Item (Paper)
Keyword: Clustering bilingual documents, Hierarchical agglomerative clustering, Evolutionary algorithm, Genetic algorithm, Corpus
Subjects: Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: SAFRUDIN BIN DARUN
Date Deposited: 13 Oct 2021 16:02
Last Modified: 13 Oct 2021 16:02
URI: https://eprints.ums.edu.my/id/eprint/29090