Analysis of the Impact of SelectKBest and SMOTEENN on the Accuracy of Monkeypox Classification Models Using Machine Learning Algorithms

  • Agung Triatna, Universitas Buana Perjuangan Karawang
  • Yana Cahyana, Universitas Buana Perjuangan Karawang
  • Tohirin Al Mudzakir, Universitas Buana Perjuangan Karawang
  • Adi Rizky Pratama, Universitas Buana Perjuangan Karawang
Keywords: Monkeypox, Classification, SelectKBest, SMOTEENN, Machine Learning

Abstract

The rapid and hard-to-control spread of monkeypox calls for accurate disease prediction methods. A false negative prediction can leave an infection undetected. Conversely, a false positive diagnosis causes unnecessary anxiety and burdens health facilities with cases that are not actually infected. This study examines the effect of SelectKBest and SMOTEENN on the accuracy of monkeypox classification models. The dataset contains medical records of the clinical symptoms of monkeypox patients, with shape (25,000, 11), that is, 25,000 rows and 11 columns. The data processing stages comprise data collection, exploratory data analysis (EDA), preprocessing, modeling, and evaluation. Four dataset variants are used: the original unmodified dataset, the dataset after feature selection with SelectKBest, the dataset after resampling with SMOTEENN, and the dataset combining SelectKBest and SMOTEENN. The results show that the combination of SelectKBest and SMOTEENN was the most effective at improving classification accuracy. With this combination, the XGBoost algorithm reached an accuracy of 100%, followed by Gradient Boosting at 98.57% and AdaBoost at 89.97%. These findings indicate that appropriate feature selection, combined with data resampling, can improve model performance in monkeypox classification.

Published
2026-01-31