Ensemble clustering based on feature selection approach to learning relational data

Kung, Ke Shin (2015) Ensemble clustering based on feature selection approach to learning relational data. Universiti Malaysia Sabah. (Unpublished)

Abstract

Classification in learning big relational data is an important research field. Learning big relational data often involves high feature dimensionality, which can make it very time consuming. Many approaches have been developed to learn relational data, and one of them is DARA. The DARA algorithm is designed to summarize data with one-to-many relations. However, DARA suffers from a major drawback when the cardinalities of the attributes are very high, because the size of its vector space representation depends on the number of unique values that exist for all attributes in the dataset. A feature selection process can be introduced to overcome this problem. However, different feature selection methods select different sets of features and thus produce different classification results. The results obtained from these feature sets can be further optimized by computing a consensus result in order to achieve good classification performance. This can be done by introducing an ensemble technique into the framework. Ensembles are frequently used to improve predictive accuracy by producing a final consensus result from multiple classifiers. In this project, a two-layered genetic algorithm-based feature selection is proposed to form the basic ensemble in order to improve classification performance in learning relational datasets. Results from the experiments show that the proposed method is able to improve the accuracy of classification tasks, and that k-NN classifiers using Euclidean distance as the similarity measure outperformed the other classifiers.
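The consensus idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one k-NN classifier (Euclidean distance) per feature subset and a simple majority vote as the consensus rule. The toy data and the hand-picked `feature_subsets` are hypothetical stand-ins for the output of different feature selection methods, and the genetic algorithm and the DARA summarization step are not modelled.

```python
import math
from collections import Counter


def euclidean(a, b, features):
    """Euclidean distance computed only over the given feature indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))


def knn_predict(train_X, train_y, x, features, k=3):
    """Label x by majority vote among its k nearest training rows."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda j: euclidean(train_X[j], x, features),
    )
    votes = Counter(train_y[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train_X, train_y, x, feature_subsets, k=3):
    """Consensus label: majority vote across one k-NN per feature subset."""
    votes = Counter(
        knn_predict(train_X, train_y, x, subset, k) for subset in feature_subsets
    )
    return votes.most_common(1)[0][0]


# Hypothetical toy data: features 0 and 1 separate the classes, feature 2 is noise.
train_X = [
    [1.0, 1.0, 9.0],
    [1.2, 0.8, 1.0],
    [0.9, 1.1, 5.0],
    [5.0, 5.0, 1.5],
    [5.2, 4.8, 9.5],
    [4.9, 5.1, 5.5],
]
train_y = ["a", "a", "a", "b", "b", "b"]

# Hypothetical subsets, as if three different selectors had each picked one feature.
feature_subsets = [[0], [1], [2]]

query = [1.1, 0.9, 9.4]
single_noisy = knn_predict(train_X, train_y, query, [2])
consensus = ensemble_predict(train_X, train_y, query, feature_subsets)
```

In this toy setting the classifier built on the noisy feature alone mislabels the query, while the vote across all three subsets recovers the label suggested by the informative features. Majority voting is only one possible consensus rule; the abstract does not state which rule the thesis uses.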

Item Type: Academic Exercise
Uncontrolled Keywords: DARA algorithm, summarize data, ensembles, accuracies of classification, multiple classifiers
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: FACULTY > Faculty of Computing and Informatics
ID Code: 12101
Deposited By: IR Admin
Deposited On: 30 Oct 2015 11:54
Last Modified: 30 Oct 2015 11:54
