Heterogeneous Graph Neural Networks (HGNN) may be used to characterise scientific production of Academic Collaboration Networks, using potentially flawed and incomplete data as a starting point.
This work examines how Heterogeneous Graph Neural Networks (HGNN) may be used to characterise the scientific production of Academic Collaboration Networks (ACN), using potentially flawed and incomplete data as a starting point. We base our investigation on a dataset of publications by faculty members of the Machine Learning Genoa Center (MaLGa). Our direct knowledge of MaLGa lets us accurately assess and interpret the results and their quality. We start by collecting the list of papers published by MaLGa faculty. The publicly available dataset, [ACN-MaLGa](https://github.com/annalisabarla/ACN-MaLGa), is sourced from an institutional public repository of scientific results and, being heterogeneous and potentially sparse, presents two key challenges: non-normalised author data and incomplete semantic attributes, including missing keywords and abstracts. We tackle these issues with a preprocessing pipeline that uses authoring files and prior knowledge to normalise authors' names. We also fill in the missing keywords using a keyword attribution strategy based on NLP and LLMs, selected among several state-of-the-art methods. This preprocessing phase enables the construction of an information-rich heterogeneous graph, which we use to characterise the ACN in terms of node-type prediction. We compare several embedding techniques on heterogeneous graphs, with and without authors' information, to identify the most effective approach. Specifically, we employ a predictive model based on HGNN, which achieves excellent performance in our evaluation.
Our results suggest that artificial intelligence can help scale up the comprehension of intricate scientific results, particularly in fields where organised scientific databases are scarce.
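The preprocessing and graph-construction steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the roster `KNOWN_AUTHORS`, the helpers `author_key`, `normalise` and `build_graph`, the node types (`author`, `paper`, `keyword`), the relation names and the sample records are all hypothetical, chosen only to show how name variants might collapse onto one author node before a heterogeneous graph is assembled. It uses the Python standard library only and assumes simple single-word surnames.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical roster standing in for the "prior knowledge" of the faculty.
KNOWN_AUTHORS = ["Annalisa Barla", "Andrea Vian"]


def author_key(name: str) -> str:
    """Reduce a name variant to 'surname_initial' (ASCII, lowercase)."""
    # Strip accents so that accented and unaccented spellings compare equal.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in ascii_name:
        # 'Surname, Given' form.
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        # 'Given Surname' or 'G. Surname' form: the last token is taken as the surname.
        tokens = ascii_name.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    initial = re.sub(r"[^a-z]", "", given.lower())[:1]
    return f"{surname.lower()}_{initial}"


# Lookup table from normalised key to canonical spelling.
CANONICAL = {author_key(n): n for n in KNOWN_AUTHORS}


def normalise(name: str) -> str:
    """Map a variant to its canonical form; unknown names are only whitespace-cleaned."""
    return CANONICAL.get(author_key(name), " ".join(name.split()))


def build_graph(papers):
    """Return typed node sets and typed edge lists for a list of paper records."""
    nodes = defaultdict(set)   # node type -> set of node identifiers
    edges = defaultdict(list)  # (source type, relation, target type) -> list of pairs
    for paper in papers:
        nodes["paper"].add(paper["title"])
        for raw_name in paper["authors"]:
            author = normalise(raw_name)
            nodes["author"].add(author)
            edges[("author", "writes", "paper")].append((author, paper["title"]))
        for keyword in paper.get("keywords", []):
            nodes["keyword"].add(keyword.lower())
            edges[("paper", "has", "keyword")].append((paper["title"], keyword.lower()))
    return nodes, edges


# Made-up records: three spellings that refer to two people.
papers = [
    {"title": "Paper A", "authors": ["Barla, Annalisa", "A. Vian"], "keywords": ["Graphs"]},
    {"title": "Paper B", "authors": ["A. Barla"], "keywords": []},
]
nodes, edges = build_graph(papers)
print(sorted(nodes["author"]))
print(len(edges[("author", "writes", "paper")]))
```

In this toy input the three name spellings resolve to two author nodes, while each authorship is kept as a separate edge. Keying edges by a (source type, relation, target type) tuple follows a convention common in heterogeneous-graph libraries such as PyTorch Geometric; the text above does not state which library was used. A surname-plus-initial key will merge distinct people who share both and will mishandle multi-word surnames, which is one reason a real pipeline would also rely on curated files and prior knowledge.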
Andrea Vian, Annalisa Barla
D. Pretolesi, D. Garbarino, D. Giampaoli, A. Vian and A. Barla
"Geometric Deep Learning Strategies for the Characterization of Academic Collaboration Networks" in IEEE Transactions on Emerging Topics in Computing, vol. , no. 01, pp. 1-12, 5555.
D. Pretolesi, M. Cuneo, A. Barla and A. Vian
"Heterogeneous graph neural networks for Academic Collaboration Network characterisation from spurious data"
NETSCI 2023: INTERNATIONAL SCHOOL AND CONFERENCE ON NETWORK SCIENCE, Napoli, Italy
D. Pretolesi, A. Vian, M. Cuneo, G. Carella, F. Zurlo, A. Barla
"Graph-based ML methods foster innovation and understanding of interdisciplinary scientific research."
Complex Networks 2023, Menton, France
Daniele Pretolesi (AIT Austrian Institute of Technology), Marina Cuneo (Consultant)
This work is a preliminary step towards a series of deeper investigations into assistive software for the understanding and dissemination of science. As a paradigmatic example of a complex scenario, we consider biomedical research, specifically two neurodegenerative diseases:
[Alzheimer’s disease (AD)](https://observablehq.com/d/13b51125f0cfa295)
Multiple Sclerosis (MS)
In both cases, the etiology and characterisation of the disease remain poorly understood. We frame our investigation within Science of Science (SciSci) research, whose main aim is to leverage the ever-increasing digital information on scientific production to gain insight into the progress of science, the extent of collaboration between researchers, and the degree of openness.
Andrea Vian, Annalisa Barla, Ilaria Stanzani
I. Stanzani, G. Ratto, A. Vian, A. Barla
Poster presentation at the Italian Regional Conference on Complex Systems 2023, Napoli, Italy
Annalisa Barla, Andrea Vian
Daniele Pretolesi (AIT, Vienna)