A direct ensemble classifier for learning imbalanced multiclass data

Samry @ Mohd Shamrie Sainin (2013) A direct ensemble classifier for learning imbalanced multiclass data. Doctoral thesis, Universiti Malaysia Sabah.

Abstract

A traditional direct single classifier can easily be applied to a multiclass classification problem. However, the performance of a single classifier decreases when the data in a multiclass classification task are imbalanced, so an ensemble of classifiers is one of the methods used for such tasks. This thesis studies the problem of learning from imbalanced multiclass data. In a multiclass classification problem, a decision can be estimated not only from the final single class label but also from other appropriate classes. Many real-world multiclass classification problems can be represented in a setting where non-crisp labels need to be observed. This thesis provides an in-depth review of this special learning task and explains a method for solving it. An alternative ensemble learning framework called Direct Ensemble Classifier for Imbalance Learning (DECIML) is proposed, combining the advantages of existing single classifiers with those of ensemble methods and strategies. The framework consists of ensemble learning and a decision combiner model, with general supervised learning algorithms as base learners. Feature selection is also applied in DECIML to increase the performance of the ensemble learning. To facilitate experiments and future research on the imbalanced multiclass problem, a standard pool of benchmark data is created, consisting of 16 datasets with different degrees of imbalance ratio and 4 datasets for imbalanced multiclass learning with feature selection. The benchmark data are used to evaluate and compare the proposed frameworks with several ensemble methods, such as bagging and AdaBoost. DECIML with feature selection is also evaluated and compared with the methods CfsSubsetEval and FilteredSubsetEval. The results show that the proposed learning frameworks are comparable to the other methods.
In addition, the selected benchmark data, the experiments and the results are useful for future research on the imbalanced multiclass classification problem. Furthermore, the DECIML framework was applied to a real-world leaf classification problem based on shape features. Extensive experiments and results show that the DECIML method provides promising performance on imbalanced multiclass problems with highly noisy data.
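
Two notions recur in the abstract: the degree of imbalance of a dataset, and a decision combiner that can report other appropriate classes alongside the final label. The sketch below illustrates both in a minimal form. It is not the DECIML method, whose combiner the abstract does not specify. Plain majority voting is used in its place, the imbalance ratio is assumed to be the common largest-class over smallest-class definition, and the function names `imbalance_ratio` and `combine_votes` are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def combine_votes(predictions):
    """Majority-vote combiner that keeps the runner-up classes.

    predictions: one predicted label per base classifier, for a single sample.
    Returns (label, vote share) pairs, highest share first; ties are broken
    by label so the ranking is deterministic.
    """
    counts = Counter(predictions)
    total = len(predictions)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, n / total) for label, n in ranked]


# Toy three-class label set: 90 / 30 / 10 samples per class.
labels = ["a"] * 90 + ["b"] * 30 + ["c"] * 10
ratio = imbalance_ratio(labels)

# Hypothetical outputs of five base classifiers for one sample.
votes = ["b", "c", "b", "a", "b"]
ranking = combine_votes(votes)
```

Here `ranking[0]` holds the crisp majority label, while the remaining entries give the non-crisp alternatives with their vote shares.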

Item Type: Thesis (Doctoral)
Keyword: Imbalanced data, Multiclass classification, Data
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK1-9971 Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics > TK7885-7895 Computer engineering. Computer hardware
Department: SCHOOL > School of Engineering and Information Technology
Depositing User: DG MASNIAH AHMAD -
Date Deposited: 29 Apr 2024 10:37
Last Modified: 29 Apr 2024 10:37
URI: https://eprints.ums.edu.my/id/eprint/38557
