Comparison of SMOTE and ADASYN Oversampling Methods for Diabetes Classification Using the CatBoost Algorithm

Authors

  • Jose Julian Hidayat, Universitas Pelita Bangsa
  • Zaenur Rozikin, Universitas Pelita Bangsa

DOI:

https://doi.org/10.51903/mifortekh.v6i1.1157

Keywords:

Diabetes Classification, CatBoost Algorithm, SMOTE Oversampling, ADASYN Oversampling, Class Imbalance

Abstract

Class imbalance is a major challenge in diabetes classification because it can bias models toward the majority class. Oversampling approaches such as the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) address this issue by adding synthetic minority-class samples, improving the representation of the minority class. This study compares the two methods on a diabetes classification dataset using the CatBoost algorithm, evaluated with accuracy, precision, recall, F1-score, and ROC-AUC. The baseline CatBoost model already performs strongly, with an accuracy of 0.9720 and a ROC-AUC of 0.9796; however, its recall for the minority class remains relatively low at 0.6935. SMOTE yields the largest improvement, achieving an accuracy of 0.9727, a precision of 0.9737, a recall of 0.6971, and an F1-score of 0.8125 while maintaining a ROC-AUC of 0.9796. ADASYN performs slightly below SMOTE and, on both metrics reported for it, marginally below the baseline, with an accuracy of 0.9719 and a recall of 0.6924. Overall, SMOTE is the more effective method for improving the CatBoost model's detection of the minority class without compromising overall performance, although its gain in minority-class recall over the baseline is small (0.6935 to 0.6971). SMOTE is therefore recommended as the more stable and effective of the two oversampling methods for handling imbalanced data in diabetes classification tasks.
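
Both methods create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors; they differ in how the seed samples are chosen. SMOTE draws seeds uniformly from the minority class, whereas ADASYN assigns more synthetic samples to minority points whose neighborhood contains more majority samples. The following is a minimal from-scratch sketch of those two rules, not the authors' code (the abstract does not say which implementation was used). It assumes Euclidean k-nearest neighbors, full class balancing (ADASYN's β = 1), and hypothetical helper names (`smote`, `adasyn`, `adasyn_allocation`, `f1`). It also recomputes the reported SMOTE F1-score from the reported precision and recall as a consistency check.

```python
import math
import random


def _dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _knn(i, X, pool, k):
    # Indices from `pool` of the k points nearest to X[i], excluding i itself
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: _dist(X[i], X[j]))
    return others[:k]


def _interpolate(a, b, rng):
    # Synthetic point on the segment a -> b: a + lam * (b - a), lam uniform in [0, 1)
    lam = rng.random()
    return [p + lam * (q - p) for p, q in zip(a, b)]


def smote(X, y, minority=1, k=5, seed=0):
    """SMOTE sketch: seeds are drawn uniformly from the minority class."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(mino)) - len(mino)  # enough to balance the two classes
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(mino)
        j = rng.choice(_knn(i, X, mino, k))  # neighbor taken from the minority class only
        synthetic.append(_interpolate(X[i], X[j], rng))
    return synthetic


def adasyn_allocation(X, y, minority=1, k=5):
    """ADASYN step 1: how many synthetic points each minority sample receives."""
    all_idx = list(range(len(y)))
    mino = [i for i in all_idx if y[i] == minority]
    G = (len(y) - len(mino)) - len(mino)  # assumption: beta = 1 (full balance)
    # r_i = share of majority samples among the k nearest neighbors in the full data
    r = {i: sum(y[j] != minority for j in _knn(i, X, all_idx, k)) / k for i in mino}
    total = sum(r.values())
    if total == 0:
        raise ValueError("no minority sample has a majority neighbor")
    return {i: round(r[i] / total * G) for i in mino}


def adasyn(X, y, minority=1, k=5, seed=0):
    """ADASYN sketch: boundary (harder) minority samples get more synthetic points."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for i, g_i in adasyn_allocation(X, y, minority, k).items():
        for _ in range(g_i):
            j = rng.choice(_knn(i, X, mino, k))
            synthetic.append(_interpolate(X[i], X[j], rng))
    return synthetic


def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 8 majority points near the origin, one minority point (2, 2)
    # on the class boundary, and a minority cluster far away around (9, 9).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (2, 1), (1, 2),
         (2, 2), (9, 9), (9, 10), (10, 9)]
    y = [0] * 8 + [1] * 4
    print(len(smote(X, y, k=3)))         # 4 new points, seeds chosen uniformly
    print(adasyn_allocation(X, y, k=3))  # {8: 4, 9: 0, 10: 0, 11: 0}
    print(round(f1(0.9737, 0.6971), 4))  # 0.8125, matching the reported SMOTE F1-score
```

On the toy data, ADASYN places all four synthetic points around the boundary sample (2, 2), while SMOTE spreads them across the whole minority class. In a real pipeline, oversampling is normally applied to the training split only, so synthetic points never leak into evaluation; library implementations such as `SMOTE` and `ADASYN` from imbalanced-learn (`imblearn.over_sampling`) would then feed a `catboost.CatBoostClassifier`.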

Published

2026-05-01

How to Cite

Hidayat, J. J., & Rozikin, Z. (2026). Perbandingan Metode Oversampling SMOTE dan ADASYN pada Klasifikasi Diabetes Menggunakan Algoritma CatBoost. Jurnal Manajemen Informatika & Teknologi, 6(1), 151–164. https://doi.org/10.51903/mifortekh.v6i1.1157
