Comparison of distance metric in k-mean algorithm for clustering wheat grain datasheet

Authors

  • Suraya Suraya Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia
  • Muhammad Sholeh Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia
  • Dina Andayati Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia

DOI:

https://doi.org/10.35335/cit.Vol15.2023.408.pp73-83

Keywords:

Clustering, dataset, K-means, Davies-Bouldin, Distance metric

Abstract

Clustering is a data mining technique that groups data so that items in the same group are close to each other. This research clusters a wheat seed dataset, which contains data on several varieties of wheat. The purpose of the research is to build a clustering model with the K-means algorithm and to compare the results obtained with several distance metrics. The dataset was clustered with K-means for numbers of clusters (k) ranging from k = 2 to k = 6, and the results were compared across three distance metrics: Euclidean distance, Manhattan distance, and Chebyshev distance. Every test run was evaluated in order to select the number of clusters that gives a good grouping. The evaluation uses the Davies-Bouldin index: the best grouping is the one with the smallest Davies-Bouldin value, and if an evaluation result is negative, its absolute value is used. The research method is Knowledge Discovery in Databases (KDD). The results show that, with the same dataset, the same K-means algorithm, and the same evaluation method, different distance metrics produce different evaluation values. The Euclidean, Manhattan, and Chebyshev distance metrics all give the best result at k = 2, so the study concludes that the wheat seed dataset is best clustered into k = 2 groups.
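The procedure summarized above (K-means for k = 2 to k = 6 under three distance metrics, each run scored with the Davies-Bouldin index, the smallest score winning) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the chosen metric is used both to assign points to centroids and to compute the Davies-Bouldin terms; centroids are still updated as arithmetic means (strictly, the median minimizes Manhattan distance); initialization is a deterministic farthest-first choice; and the function names (`euclidean`, `manhattan`, `chebyshev`, `kmeans`, `davies_bouldin`, `best_k`) and the two-blob toy data are hypothetical stand-ins for the wheat seed dataset. The metrics are sqrt(Σ(x_i − y_i)²) for Euclidean, Σ|x_i − y_i| for Manhattan, and max_i |x_i − y_i| for Chebyshev. The Davies-Bouldin index averages, over clusters, the worst ratio (S_i + S_j) / M_ij of within-cluster scatter to centroid separation; lower is better, and this sketch returns it as a non-negative number, so no sign correction is needed.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, dist: Metric,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    # Farthest-first initialisation (an assumption, chosen for reproducibility):
    # start from the first point, then keep adding the point farthest from
    # its nearest already-chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen metric
        # (ties go to the lowest cluster index).
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: arithmetic mean of each cluster (a simplification for
        # Manhattan/Chebyshev); an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def davies_bouldin(points: List[Point], labels: List[int],
                   centroids: List[List[float]], dist: Metric) -> float:
    used = sorted(set(labels))  # ignore clusters that ended up empty
    if len(used) < 2:
        return math.inf
    # S_i: mean distance of each cluster's members to its centroid.
    scatter = {}
    for j in used:
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter[j] = sum(dist(p, centroids[j]) for p in members) / len(members)
    total = 0.0
    for i in used:
        worst = 0.0
        for j in used:
            if j == i:
                continue
            sep = dist(centroids[i], centroids[j])  # M_ij: centroid separation
            ratio = math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep
            worst = max(worst, ratio)
        total += worst
    return total / len(used)  # lower is better; never negative


def best_k(points: List[Point], dist: Metric,
           k_values=range(2, 7)) -> Tuple[int, Dict[int, float]]:
    scores: Dict[int, float] = {}
    for k in k_values:
        labels, centroids = kmeans(points, k, dist)
        scores[k] = davies_bouldin(points, labels, centroids, dist)
    return min(scores, key=scores.get), scores  # smallest index wins


if __name__ == "__main__":
    # Hypothetical toy data: two tight 3x3 grids around (0, 0) and (10, 10).
    offsets = [-0.5, 0.0, 0.5]
    data = [(cx + dx, cy + dy) for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
            for dx in offsets for dy in offsets]
    for name, metric in [("Euclidean", euclidean), ("Manhattan", manhattan),
                         ("Chebyshev", chebyshev)]:
        k, scores = best_k(data, metric)
        print(name, "best k =", k, {kk: round(v, 4) for kk, v in scores.items()})
```

On this toy data every metric selects k = 2, yet the k = 2 scores differ (about 0.076 for Euclidean, 0.067 for Manhattan, and 0.089 for Chebyshev). That echoes the abstract's observation that the same data and algorithm yield different evaluation values under different metrics.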

Published

2023-05-31

How to Cite

Suraya, S., Sholeh, M., & Andayati, D. (2023). Comparison of distance metric in k-mean algorithm for clustering wheat grain datasheet. Jurnal Teknik Informatika C.I.T Medicom, 15(2), 73–83. https://doi.org/10.35335/cit.Vol15.2023.408.pp73-83