Comparison of data mining algorithms (random forest, C4.5, catboost) based on adaptive boosting in predicting diabetes mellitus
DOI: https://doi.org/10.35335/cit.Vol16.2024.730.pp1-12
Keywords: Diabetes Mellitus, Comparative Analysis, Data Mining
Abstract
This research evaluates the performance of three data mining algorithms, namely C4.5, Random Forest, and CatBoost Classifier, each strengthened with Adaptive Boosting (AdaBoost), in predicting diabetes mellitus in humans. The analysis found that C4.5 with Adaptive Boosting obtained an average accuracy of 73.74%, a precision of 61.39%, and a recall of 69.00%. Random Forest with Adaptive Boosting achieved an average accuracy of 73.52%, a precision of 65.79%, and a recall of 65.06%. Meanwhile, CatBoost Classifier with Adaptive Boosting achieved an average accuracy of 73.67%, a precision of 61.19%, and a recall of 69.18%. Thus, although all three algorithms perform similarly, C4.5 with Adaptive Boosting stands out with the highest average accuracy, while Random Forest attains the highest precision and CatBoost the highest recall. The implication of this research is that C4.5 with Adaptive Boosting can be a more effective approach to support the early detection of diabetes mellitus in humans.
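The abstract compares base learners wrapped in Adaptive Boosting and scores them by accuracy, precision, and recall. The sketch below illustrates both ideas in plain Python: discrete AdaBoost, which fits a weak learner, weights it by $\alpha = \tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$ and then up-weights the samples it misclassified, followed by the three metrics computed from the predictions. It is a hedged illustration under stated assumptions, not the authors' pipeline: one-level decision stumps stand in for the C4.5, Random Forest, and CatBoost base learners, labels are assumed to be coded +1 (diabetic) / -1 (non-diabetic), the function names are hypothetical, and the glucose/BMI readings in the demo are made up.

```python
import math


def fit_stump(X, y, w):
    """Hypothetical weak learner: the single-feature threshold rule with the
    lowest weighted error. Returns (error, feature, threshold, polarity)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # Predict `polarity` when feature j >= thr, else the opposite.
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {+1, -1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:
            break  # weak learner is no better than chance
        eps = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        if err == 0:
            break  # a perfect stump already fits every training sample
        # Misclassified samples (y * h(x) = -1) gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    # Sign of the alpha-weighted vote of all weak learners.
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall with `positive` as the diabetic class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Made-up [glucose mg/dL, BMI] readings; +1 = diabetic, -1 = not diabetic.
    X = [[85, 22], [90, 30], [100, 25], [110, 33], [140, 28],
         [150, 35], [160, 24], [170, 31], [95, 36], [130, 21]]
    y = [-1, -1, -1, -1, 1, 1, 1, 1, 1, -1]
    model = adaboost_fit(X, y, n_rounds=10)
    preds = [adaboost_predict(model, row) for row in X]
    acc, prec, rec = metrics(y, preds)
    print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%}")
```

The demo scores the training data only for brevity. Averaged figures like those in the abstract would instead come from held-out data, for example by averaging the metrics over cross-validation folds. Swapping the stump for a deeper tree or another classifier changes only `fit_stump` and `stump_predict`. The re-weighting loop stays the same.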
License
Copyright (c) 2024 Yennimar Yennimar, William Leonardi, Harris Weide, Devin Cantona, Gani Mores Hutagalung

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

