Clustering bilingual documents using various clustering linkages coupled with different proximity measurement techniques

Rayner Alfred, Leow Ching Leong, Joe Henry Obit, Mohd Hanafi Ahmad Hijazi and Chin Kim On (2015) Clustering bilingual documents using various clustering linkages coupled with different proximity measurement techniques. Advanced Science Letters, 21 (10). pp. 3307-3312. ISSN 1936-6612

Abstract

Clustering is an unsupervised learning task. The k-Means algorithm is one of the well-known and promising clustering algorithms and can converge to a local optimum within a few iterations. In this work, we hybridize the k-Means algorithm with a Genetic Algorithm to search the global solution space so that the clustering converges toward a global optimum. A known problem in clustering is that as the number of clusters approaches the total number of records in the dataset, each cluster ends up containing only a single record, and cluster purity therefore reaches its maximum value of 1. Such a result is useless, however, because the common regularities among records can no longer be seen. Choosing the best number of clusters is therefore non-trivial. Instead of choosing an inappropriate number of clusters and compromising the main purpose of the clustering process, a Genetic Algorithm based k-Means ensemble is proposed to find the consensus result of several runs of the clustering task using different numbers of clusters, k.
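The purity argument in the abstract can be checked on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes the common definition of purity (the fraction of records that fall in the majority class of their assigned cluster), and the function name `purity`, the labels, and the cluster assignments are all hypothetical.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of records that belong to the majority class of their cluster.

    Assumed definition: (1/N) * sum over clusters of the largest class count.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not class_labels:
        raise ValueError("at least one record is required")

    # Count how many records of each class land in each cluster.
    counts_per_cluster = {}
    for cluster_id, label in zip(cluster_ids, class_labels):
        counts_per_cluster.setdefault(cluster_id, Counter())[label] += 1

    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(c.values()) for c in counts_per_cluster.values())
    return majority_total / len(class_labels)


# Hypothetical toy data: six records drawn from two true classes.
labels = ["A", "A", "A", "B", "B", "B"]

# k = 2: one class-A record is grouped with the class-B records.
two_clusters = [0, 0, 1, 1, 1, 1]

# k = n: every record sits alone in its own cluster.
singletons = [0, 1, 2, 3, 4, 5]

purity_k2 = purity(two_clusters, labels)
purity_kn = purity(singletons, labels)

print(f"purity with k=2: {purity_k2:.3f}")
print(f"purity with k=n: {purity_kn:.3f}")
```

With one record per cluster, every cluster is trivially pure, so purity reaches 1 even though the partition reveals no shared structure. This is why purity alone cannot be used to pick k.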

Item Type: Article
Keyword: K-Means algorithm, hybridizing, dataset
Subjects: QA75
Department: FACULTY > Faculty of Engineering
Depositing User: ADMIN ADMIN
Date Deposited: 04 May 2017 14:07
Last Modified: 25 Oct 2017 14:06
URI: https://eprints.ums.edu.my/id/eprint/15349
