Foundational Study on Integrating Machine Learning with Distributed Computing for Scalable Intelligent Systems
DOI: https://doi.org/10.35335/cit.Vol17.2025.1391.pp171-182

Keywords: Distributed Machine Learning, Scalable Intelligent Systems, Distributed Computing Architecture, Parallel Processing, High-Performance Computing (HPC)

Abstract
The rapid growth of data-intensive applications and increasingly complex machine learning (ML) models has created an urgent need for computational architectures capable of supporting large-scale intelligent systems. This research presents a foundational study on integrating machine learning with distributed computing to achieve scalable, high-performance AI workflows. The study develops a conceptual integration model comprising four core layers (data, compute, communication, and model) designed to address scalability, fault tolerance, and resource optimization. Using experimental benchmarking and architectural analysis, the research evaluates multiple distributed frameworks, data partitioning strategies, and ML models to measure improvements in training speed, throughput, latency, and resource utilization across cluster-based and cloud environments. Results demonstrate significant performance gains compared to single-node execution, particularly for deep learning workloads, while also identifying critical bottlenecks such as communication overhead, synchronization delays, heterogeneous hardware constraints, and data imbalance. The findings highlight key trade-offs between accuracy and computational speed, as well as between cost and system performance, underscoring the importance of strategic design decisions in large-scale ML deployments. This study contributes theoretical and practical insights into distributed ML integration and offers a framework that can guide the development of next-generation intelligent systems capable of operating across massively distributed environments.
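The core pattern behind the training-speed gains the abstract describes is synchronous data-parallel training: the dataset is partitioned across workers, each worker computes a gradient on its shard, and the gradients are averaged before every parameter update. The sketch below is purely illustrative (pure-Python toy on a one-parameter linear model, not the paper's benchmarked implementation); all function names are hypothetical.

```python
# Illustrative sketch of synchronous data-parallel SGD:
# each "worker" holds one data shard, computes a local gradient,
# and a synchronization step averages gradients before updating.

def partition(data, num_workers):
    """Split data into near-equal contiguous shards, one per worker."""
    k, r = divmod(len(data), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        end = start + k + (1 if i < r else 0)
        shards.append(data[start:end])
        start = end
    return shards

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.01):
    """Synchronous step: average per-worker gradients, then update once."""
    grads = [local_gradient(w, s) for s in shards]  # runs in parallel in practice
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Toy data generated from y = 3x; training drives w toward 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = partition(data, num_workers=4)
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
```

In a real deployment the gradient averaging is a collective all-reduce over the network, which is exactly where the communication-overhead and synchronization-delay bottlenecks identified in the study arise: every worker must wait for the slowest participant before the next step.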
License
Copyright (c) 2025 Galih Prakoso Rizky A, Rohani Situmorang

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

