Comparison of the SMOTE and ADASYN Oversampling Methods for Diabetes Classification Using the CatBoost Algorithm
DOI:
https://doi.org/10.51903/mifortekh.v6i1.1157
Keywords:
Diabetes Classification, CatBoost Algorithm, SMOTE Oversampling, ADASYN Oversampling, Class Imbalance
Abstract
Class imbalance is a major challenge in diabetes classification because it biases models toward the majority class. Oversampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) address this by increasing the representation of the minority class. This study compares the two methods using the CatBoost algorithm on a diabetes classification dataset, evaluated with accuracy, precision, recall, F1-score, and ROC-AUC. The baseline CatBoost model already performs strongly, with an accuracy of 0.9720 and a ROC-AUC of 0.9796, but its recall for the minority class is relatively low at 0.6935. SMOTE gives the best results: an accuracy of 0.9727, precision of 0.9737, recall of 0.6971, and F1-score of 0.8125, with ROC-AUC unchanged at 0.9796. ADASYN is also reported to improve on the baseline, but its results are slightly lower than SMOTE's, with an accuracy of 0.9719 and a recall of 0.6924. Overall, SMOTE is the more effective of the two at improving CatBoost's detection of the minority class without reducing overall performance, and it is recommended as the more stable oversampling method for imbalanced data in diabetes classification.
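The F1-score reported for SMOTE can be checked against the reported precision and recall, because F1 is the harmonic mean of the two. The following is a minimal sketch in Python. The helper name `f1_from_precision_recall` is illustrative and not taken from the study, and the inputs are the rounded values quoted in the abstract, so the result is only expected to agree to about four decimal places.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not part of the study's code.
    """
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract for CatBoost with SMOTE.
smote_precision = 0.9737
smote_recall = 0.6971

smote_f1 = f1_from_precision_recall(smote_precision, smote_recall)
print(f"F1 (SMOTE) = {smote_f1:.4f}")  # prints "F1 (SMOTE) = 0.8125"
```

The computed value matches the reported F1-score of 0.8125. It also shows why F1 sits well below the precision of 0.9737: the harmonic mean is pulled toward the lower of the two inputs, here the minority-class recall.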
License
Copyright (c) 2025 Jurnal Manajemen Informatika & Teknologi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.