Enhancing phishing detection with advanced ensemble learning techniques

Nur Syakirin Mohd Shahiran and Mohammed Ahmed (2025) Enhancing phishing detection with advanced ensemble learning techniques. International Journal of Machine Intelligence and Computing, 1 (2). 26-37.

Abstract

Phishing attacks contribute to over 90% of data breaches, posing a severe cybersecurity threat by tricking users into divulging sensitive information. Traditional detection methods, such as blacklists and heuristic-based approaches, are often ineffective against new phishing websites due to their rapidly evolving nature. This study introduces an advanced phishing detection model that leverages ensemble learning techniques to improve accuracy, robustness, and adaptability. The model integrates Decision Tree, Support Vector Machine (SVM), and k-Nearest Neighbours (kNN) as base classifiers, combined through a stacking ensemble approach, with Logistic Regression serving as the meta-classifier. Feature selection is performed using Random Forest, selecting the most impactful attributes based on importance scores greater than 0.01. Principal Component Analysis (PCA) is applied to reduce dimensionality while retaining 95% of the variance, minimizing information loss. Hyperparameter optimization is achieved through Grid Search. The dataset was sourced from an open-access phishing detection repository and consists of 11,430 URLs, with 60% classified as phishing and 40% as legitimate. It includes 87 features that are categorized into URL structure, webpage content, and external service queries. The model's performance is evaluated using accuracy, precision, recall, and F1-score across various test sizes (10%, 20%, 30%, and 40%). Experimental results demonstrate that the stacking ensemble model achieves a peak accuracy of 97.64% with PCA (95%) and feature selection (importance score >0.01) at a 10% test size, significantly outperforming traditional methods. Performance comparisons across different test sizes highlight the positive impact of feature selection and PCA on phishing detection. Statistical validation through t-tests (p < 0.05) further confirms the model's reliability, indicating substantial improvements over baseline methods. This study showcases the potential of ensemble learning and feature optimization in enhancing phishing detection, offering a robust solution for practical cybersecurity applications.
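
The pipeline summarized in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the synthetic data from make_classification stands in for the 11,430-URL dataset, the StandardScaler step, the parameter grid, the fold counts, and all random seeds are assumptions made only so the example runs end to end. It shows Random Forest feature selection at an importance threshold of 0.01, PCA retaining 95% of the variance, a stacking ensemble of Decision Tree, SVM, and kNN with a Logistic Regression meta-classifier, Grid Search tuning, and evaluation on a 10% test split.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, roughly 60% positive ("phishing") labels.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=10,
    weights=[0.4, 0.6], random_state=0,
)

# 10% test split, one of the test sizes named in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0
)

# Stacking ensemble: three base classifiers, Logistic Regression on top.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # assumption: fold count for the out-of-fold base predictions
)

pipeline = Pipeline([
    # Keep features whose Random Forest importance reaches the 0.01 threshold.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        threshold=0.01,
    )),
    # Assumption: scaling is not mentioned in the abstract; added because
    # PCA, SVM, and kNN are sensitive to feature scale.
    ("scale", StandardScaler()),
    # A float n_components keeps enough components for 95% of the variance.
    ("pca", PCA(n_components=0.95)),
    ("stack", stack),
])

# Assumption: a deliberately tiny grid; the paper's actual grid is not given.
param_grid = {
    "stack__svm__C": [1, 10],
    "stack__knn__n_neighbors": [3, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline with the four metrics from the abstract.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())
n_components = search.best_estimator_.named_steps["pca"].n_components_

print("best params:", search.best_params_)
print("features kept:", n_selected, "PCA components:", n_components)
print(metrics)
```

Placing selection, scaling, and PCA inside the Pipeline means each Grid Search fold refits them on its own training portion, which avoids leaking test-fold information into the preprocessing. Scores on the synthetic data say nothing about the figures reported in the abstract.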

Item Type: Article
Keyword: Phishing detection, ensemble learning, feature selection, principal component analysis, stacking model
Subjects: Q Science > QA Mathematics > QA1-939 Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
Department: FACULTY > Faculty of Computing and Informatics
Depositing User: JUNAINE JASNI
Date Deposited: 11 Aug 2025 12:01
Last Modified: 13 Aug 2025 12:19
URI: https://eprints.ums.edu.my/id/eprint/44866
