Exploring clusters of rare events using unsupervised random forests

Z A Omar and Chin, Su Na and Siti Rahayu Mohd. Hashim and N Hamzah (2022) Exploring clusters of rare events using unsupervised random forests. In: 14th Seminar on Science and Technology 2021 (S&T 2021) (Virtual Conference), 8 - 9 September 2021, Universiti Malaysia Sabah (UMS).

Given highly imbalanced data, most learning algorithms struggle to predict rare events accurately, even though these cases often carry the most important and useful knowledge. In a dataset with a binary class label, the rare events are those in the minority class. This study used a stroke dataset with a binary class label and a class imbalance ratio of 54:1. The dataset also contained missing values and mixed data types. To identify intrinsic structure in the minority class (the stroke group), Random Forest Clustering was used to produce a proximity matrix, which was then fed to the Partitioning Around Medoids (PAM) clustering method to identify the optimal number of clusters. The proximity plot suggested a clustering tendency, supported by a Hopkins statistic of H = 0.8735, and k = 2 was identified as the optimal number of clusters. Internal cluster validation, however, gave a small silhouette width (0.1), indicating that many of the data objects lay close to the boundary of the neighbouring cluster. The paper proposes a plan for further investigation.
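
As a rough illustration of the internal validation step mentioned in the abstract, the sketch below computes silhouette widths from a precomputed dissimilarity matrix. It is not the authors' code: the 4 x 4 proximity matrix is a hypothetical toy example, and converting proximity to dissimilarity as 1 - proximity is an assumption (a common convention for random forest proximities). The function name `silhouette_widths` is likewise illustrative.

```python
def silhouette_widths(dissim, labels):
    """Return the silhouette width s(i) = (b - a) / max(a, b) for each object.

    a is the mean dissimilarity from object i to the other members of its own
    cluster; b is the smallest mean dissimilarity from i to any other cluster.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for i in range(n):
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dissim[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            widths.append(0.0)  # convention: a singleton cluster gets width 0
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean(values):
    return sum(values) / len(values)


# Hypothetical proximity matrix (toy data, not taken from the stroke dataset).
proximity = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
]
# Assumption: dissimilarity = 1 - proximity.
dissim = [[1.0 - p for p in row] for row in proximity]

# A labelling that matches the block structure, and one that cuts across it.
well_separated = mean(silhouette_widths(dissim, [0, 0, 1, 1]))
poorly_separated = mean(silhouette_widths(dissim, [0, 1, 0, 1]))
print(round(well_separated, 3), round(poorly_separated, 3))
```

A mean width near 1 indicates compact, well-separated clusters, while values near 0 (such as the 0.1 reported here) indicate that many objects sit close to the boundary between clusters, and negative values suggest objects assigned to the wrong cluster.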

Item Type: Conference or Workshop Item (Paper)
Keyword: Forest clustering, Partition around Medoid, PAM
Subjects: S Agriculture > SD Forestry > SD1-669.5 Forestry > SD388 Forestry machinery and engineering
Department: FACULTY > Faculty of Science and Natural Resources
Depositing User: SAFRUDIN BIN DARUN
Date Deposited: 12 Oct 2022 10:28
Last Modified: 12 Oct 2022 10:28
URI: https://eprints.ums.edu.my/id/eprint/34420
