Comparison of distance metric in k-mean algorithm for clustering wheat grain datasheet

Suraya Suraya; Muhammad Sholeh; Dina Andayati

doi:10.35335/cit.Vol15.2023.408.pp73-83

Authors

Suraya Suraya, Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia
Muhammad Sholeh, Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia
Dina Andayati, Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia

Keywords:

Clustering, dataset, K-means, Davies-Bouldin, distance metric

Abstract

Clustering is a data mining technique that groups data so that items within a group are close to one another. This research applies clustering to a wheat seed dataset, which contains data on several types of wheat, with the aim of building a clustering model. The K-means algorithm is used, and its results are compared across three distance metrics: Euclidean, Manhattan, and Chebyshev distance. Under each metric, the dataset was clustered with K-means for numbers of clusters ranging from k = 2 to k = 6. Every run was evaluated with the Davies-Bouldin index (DBI) in order to select the best grouping: a smaller DBI indicates better clustering, so the smallest value is sought, and when the evaluation result is reported as a negative number, its absolute value is used. The research follows the Knowledge Discovery in Databases (KDD) method. The results show that the same dataset, clustered with the same K-means algorithm and evaluated with the same measure, yields different evaluation values depending on the distance metric. Nevertheless, the Euclidean, Manhattan, and Chebyshev metrics all give the best result at k = 2, so the study concludes that the wheat seed dataset is best clustered into k = 2 groups.
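The procedure in the abstract runs K-means for k = 2 to 6 under three distance metrics and keeps the k with the smallest Davies-Bouldin index. Below is a minimal pure-Python sketch of that sweep. It is not the authors' implementation, because the abstract does not give their tool settings. Several choices here are assumptions: the chosen metric drives both the assignment step and the DBI, centroids are updated as coordinate-wise means, and seeding follows a deterministic farthest-first rule. The names `kmeans`, `davies_bouldin`, `sweep`, `best_k` and the `TOY` data are hypothetical. The index follows the usual definition DB = (1/k) Σ_i max_{j≠i} (S_i + S_j) / M_ij, where S_i is the mean distance from cluster i's members to its centroid and M_ij is the distance between centroids i and j. Computed directly like this, the index is never negative. The absolute-value step mentioned in the abstract only matters when a tool reports the index with a negative sign.

```python
import math

# Hypothetical toy data (two well-separated groups); NOT the wheat dataset.
TOY = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]


def euclidean(a, b):
    # d(a, b) = sqrt(sum_i (a_i - b_i)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # d(a, b) = sum_i |a_i - b_i|
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    # d(a, b) = max_i |a_i - b_i|
    return max(abs(x - y) for x, y in zip(a, b))


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}


def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(col) / len(points) for col in zip(*points))


def init_farthest_first(data, k, dist):
    # Assumed deterministic seeding: start at the first point, then keep adding
    # the point whose distance to its nearest chosen centroid is largest.
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(data, k, dist, max_iter=100):
    centroids = init_farthest_first(data, k, dist)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: plain K-means mean update, kept for every metric (an assumption).
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def davies_bouldin(data, labels, centroids, dist):
    # DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / M_ij, over non-empty clusters.
    clusters = sorted(set(labels))
    scatter = {}  # S_i: mean distance from cluster i's members to its centroid
    for i in clusters:
        members = [p for p, lab in zip(data, labels) if lab == i]
        scatter[i] = sum(dist(p, centroids[i]) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        worst = 0.0
        for j in clusters:
            if j != i:
                sep = dist(centroids[i], centroids[j])  # M_ij
                worst = max(worst, (scatter[i] + scatter[j]) / sep if sep > 0 else math.inf)
        total += worst
    return total / len(clusters)


def sweep(data, k_values=range(2, 7), metrics=METRICS):
    # DBI for every (metric, k) pair; k = 2..6 mirrors the range in the abstract.
    results = {}
    for name, dist in metrics.items():
        results[name] = {}
        for k in k_values:
            labels, centroids = kmeans(data, k, dist)
            results[name][k] = davies_bouldin(data, labels, centroids, dist)
    return results


def best_k(scores):
    # Lower DBI is better, so pick the k with the smallest score.
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for name, scores in sweep(TOY).items():
        print(name, {k: round(v, 3) for k, v in scores.items()}, "best k:", best_k(scores))
```

The sketch swaps only the distance function and keeps the mean update. With Manhattan distance, the coordinate-wise median rather than the mean minimizes within-cluster L1 cost (the K-medians variant). Whether the original study's tool uses the mean or the median is not stated in the abstract. To run the sweep on real data, replace `TOY` with the wheat feature vectors. Scaling the features first is usually advisable, because all three metrics are sensitive to feature ranges.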

Jurnal Teknik Informatika C.I.T Medicom
