Power, Control, and Data Processing Systems

Graph Topology and Machine Learning for Enhanced Link Prediction

Document Type: Original Research

Author
Iowa State University
Abstract
Social networking platforms have emerged as a focal point for academic and practical research, driven by their growing influence in modern society. Among various analytical tasks, link prediction has gained prominence as a critical challenge in social network analysis. This study examines three primary link prediction strategies: feature-based methods, Bayesian statistical models, and probabilistic relational models. Acknowledging the significant challenge of class imbalance in link prediction, we explore a combination of algorithmic techniques, advanced data preprocessing, and effective feature selection methods to improve predictive outcomes. Our research specifically focuses on coauthorship networks, leveraging topological attributes and enhanced data mining practices to extract meaningful patterns. Through extensive experimentation, we evaluate the performance of different approaches, emphasizing decision trees and Naive Bayes classifiers. These models consistently outperform alternatives in terms of predictive accuracy, particularly when assessed using F-measure and AUC metrics. Notably, our findings underscore the critical role of robust data preprocessing in achieving superior results, highlighting its potential to mitigate the impact of class imbalance. This study contributes valuable insights to the field of link prediction, offering practical guidance for developing more effective algorithms and addressing challenges in real-world social network applications.
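As a rough illustration of the feature-based approach described above, the sketch below derives a few topological attributes for a candidate author pair in a toy coauthorship graph and computes the F-measure from confusion counts. The abstract does not list the specific attributes used, so the four chosen here (common neighbors, Jaccard coefficient, Adamic-Adar, preferential attachment) are assumptions drawn from common practice in link prediction. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import log


def build_graph(papers):
    """Build an undirected coauthorship graph as {author: set of coauthors}."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pair_features(graph, u, v):
    """Topological features for the candidate link (u, v).

    The feature set is an assumption; the study may use different attributes.
    """
    nu, nv = graph[u], graph[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared neighbors with few links count more than hubs.
        "adamic_adar": sum(
            1.0 / log(len(graph[w])) for w in common if len(graph[w]) > 1
        ),
        "preferential_attachment": len(nu) * len(nv),
    }


def f_measure(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["ana", "ben", "cy"], ["ben", "dee"], ["cy", "dee"]]
graph = build_graph(papers)
feats = pair_features(graph, "ana", "dee")  # ana and dee have not coauthored
score = f_measure(tp=8, fp=2, fn=8)
```

Feature vectors like `feats`, computed for linked and unlinked pairs, could then be passed to a decision tree or Naive Bayes classifier. Because unlinked pairs usually far outnumber linked ones, F-measure and AUC are more informative than raw accuracy in this setting.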

Volume 2, Issue 3
Summer 2025
Pages 1-8

  • Receive Date 09 December 2024
  • Revise Date 06 June 2025
  • Accept Date 07 June 2025
  • First Publish Date 07 June 2025
  • Publish Date 01 September 2025