Comparison of imbalanced data treatments: a case study on cleft lip and palate data

Zaturrawiah Ali Omar and Chin, Su Na and Siti Rahayu Mohd. Hashim and Norhafiza Hamzah (2020) Comparison of imbalanced data treatments: a case study on cleft lip and palate data.

Abstract

This study investigated whether resampling and penalized approaches to balancing a small, imbalanced dataset would improve the classification model produced by the random forests learning algorithm on a Cleft Lip and Palate (CLP) patients’ dataset. Balanced Random Forest (BRF), Synthetic Minority Over-sampling Technique (SMOTE) with Random Forests (RF), and Weighted Random Forest (WRF) were applied to the CLP dataset, and the results were compared using the area under the curve (AUC) and the tradeoff between sensitivity and specificity. The results showed no difference in predictive ability between the untreated (RF), oversampling (SMOTE+RF) and penalty (WRF) treatments, but poor performance for the downsampling treatment (BRF). The small training and test sample sizes were observed to have contributed to these results and to have severely affected the performance of the classifier used for each treatment. The SMOTE+RF oversampling method nevertheless appeared promising for the CLP dataset.
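
The evaluation measures named in the abstract (sensitivity, specificity and AUC) can be illustrated with a minimal sketch in plain Python. The labels, scores and the 0.5 threshold below are hypothetical toy values chosen for illustration, not data or settings from the study, and the function names are assumptions introduced here. AUC is computed as the proportion of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 is the minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy test set: 2 minority cases among 10 (not study data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
roc_auc = auc(y_true, scores)
print(sens, spec, roc_auc)
```

With only two minority cases in this toy test set, sensitivity can take only the values 0, 0.5 or 1, which gives a sense of why a very small test sample makes such measures coarse and unstable.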

Item Type: Proceedings
Uncontrolled Keywords: Imbalanced data, SMOTE, Weighted Random Forest, Balanced Random Forest, Random Forests
Subjects: Q Science > QA Mathematics
S Agriculture > SD Forestry
Divisions: FACULTY > Faculty of Science and Natural Resources
Depositing User: SITI AZIZAH BINTI IDRIS
Date Deposited: 17 Jun 2021 02:32
Last Modified: 17 Jun 2021 02:32
URI: http://eprints.ums.edu.my/id/eprint/21431
