A Data-Centric and Explainable Framework for Trustworthy Network Intrusion Detection

By: Mayes Nasser Ahmad | Qasem Abu Al-Haija | Pages: 1 - 8

Abstract

Intrusion Detection Systems (IDSs) play a critical role in protecting modern networks against increasingly sophisticated cyber threats. This paper presents a data-centric and explainable machine learning framework for network intrusion detection using the CIC-IDS2017 benchmark dataset. The framework integrates data preprocessing, SMOTE-based class imbalance mitigation, supervised machine learning, and explainable artificial intelligence techniques to improve both detection performance and transparency. Three widely used classifiers (Logistic Regression, Random Forest, and XGBoost) are evaluated using security-oriented metrics: accuracy, precision, recall, F1-score, confusion matrices, and ROC-AUC. In the experiments, XGBoost performs best, achieving 99.79% accuracy, 99.85% precision, 99.79% recall, and a 99.81% F1-score, along with an ROC-AUC of 1.0 for binary intrusion detection. Furthermore, SHAP-based explainability analysis identifies the network-flow features that most influence attack detection decisions. The results confirm that combining data-centric preprocessing, imbalance-aware learning, and explainable AI can significantly enhance the robustness, interpretability, and practical applicability of machine learning-based intrusion detection systems.
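
To make two of the steps named in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of the binary evaluation metrics, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `smote` and `binary_metrics`, the parameters `n_new`, `k`, and `seed`, and the convention that the attack class is labelled 1 are all hypothetical and do not come from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, assuming the attack class is 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data only; these values are not taken from CIC-IDS2017.
attack_flows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra_attacks = smote(attack_flows, n_new=6, k=2, seed=1)
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice a library implementation such as the `SMOTE` class in imbalanced-learn would likely be preferred over a hand-written version. Oversampling is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data; the abstract does not state how the paper handles this.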

DOI URL: https://doi.org/10.64820/AEPJICS.21.1.8.62026