Apprentissage de graphes

Nous vous rappelons que, afin de garantir l'accès de tous les inscrits aux salles de réunion, l'inscription aux réunions est gratuite mais obligatoire.

Inscriptions closes à cette réunion.

Inscriptions

25 personnes membres du GdR ISIS, et 46 personnes non membres du GdR, sont inscrits à cette réunion.
Capacité de la salle : 90 personnes.

Annonce

Les structures de graphes ont été exploitées avec succès dans de nombreux domaines tels que le traitement du signal (GSP), le transport optimal, et l'apprentissage profond (GNN). Dans ces exemples, la plupart des travaux traitent d'un graphe connu, c'est-à-dire qu'ils développent des algorithmes qui tiennent compte d'une connaissance a priori de la structure des relations entre les entités. En amont, les problématiques d'apprentissage de graphes visent à révéler (ou raffiner) une topologie de graphe directement à partir des données, afin de pouvoir appliquer ensuite les techniques évoquées. Cette problématique a motivé de nombreux travaux, tant dans le domaine non supervisé (apprentissage d'un graphe pour refléter la structure des données), que supervisé (apprentissage d'un graphe pour optimiser une tâche telle que la classification).

Cette journée a pour but de proposer une plateforme d'échange autour de cette thématique qui regroupe plusieurs communautés, avec différentes approches, outils, et applications. Elle s'organisera autour de plusieurs exposés sur des format tutoriel, ainsi que des formats plus courts.

Les propositions de communications concernant l'apprentissage de graphe, ainsi que l'utilisation de graphe en traitement du signal et apprentissage statistique sont les bienvenues. Ces propositions de communications (titre et résumé de quelques lignes) sont à envoyer aux organisateurs d'ici le 17 mai :

Arnaud Breloy, arnaud.breloy@lecnam.net
Benjamin Girault, benjamin.girault@inria.fr

Date : jeudi 6 juin 2024

Lieu : CNAM Paris, Amphithéâtre Fabry Pérot

http://cedric.cnam.fr/~courtiep/planCnam/plan_Cnam_3e_arrondissement.html

Programme

--- AM ---

9h00 : Rémi Flamary, CMAP, Ecole Polytechnique
Learning on graphs with Gromov-Wasserstein : From unsupervised learning to GNN
Abstract : In recent years the Optimal Transport (OT) based Gromov-Wasserstein (GW) divergence has been investigated as a similarity measure between structured data expressed as distributions typically lying in different metric spaces, such as graphs with arbitrary sizes. In this talk, we will address the optimization problem inherent in the computation of GW and some of its recent extensions, such as Entropic, Fused and semi-relaxed GW divergences. Next we will illustrate how these OT problems can be used to model graph data in learning scenarios such as graph compression, clustering, classification and structured prediction. Finally we will present a recent application of GW distance as a novel pooling layer in Graph Neural Networks.

10h00 : Johannes Lutzeyer, LIX, Ecole Polytechnique
Graph Learning for and with Graph Neural Networks
Abstract : In this talk, we will discuss three different perspectives of -Graph Learning- that arise in the context of Graph Neural Networks (GNNs). We will consider GNN approaches, including some of our own, in which the graph structure is altered (or learned) 1) previous to training a GNN and 2) during the training of a GNN. A third perspective that we offer on graph learning is a discussion of one of the many important topics that arise when performing learning on graphs. In particular, we will discuss the robustness of GNNs to adversarial attacks and review recent work in which we propose more robust GNN variants.

11h00 : Yixuan He (Sheyrl), Department of Statistics, University of Oxford
MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian
Abstract : Signed and directed networks are ubiquitous in real-world applications. However, there has been relatively little work proposing spectral graph neural networks (GNNs) for such networks. Here we introduce a signed directed Laplacian matrix, which we call the magnetic signed Laplacian, as a natural generalization of both the signed Laplacian on signed graphs and the magnetic Laplacian on directed graphs. We then use this matrix to construct a novel efficient spectral GNN architecture and conduct extensive experiments on both node clustering and link prediction tasks. In these experiments, we consider tasks related to signed information, tasks related to directional information, and tasks related to both signed and directional information. We demonstrate that our proposed spectral GNN is effective for incorporating both signed and directional information, and attains leading performance on a wide range of data sets. Additionally, we provide a novel synthetic network model, which we refer to as the Signed Directed Stochastic Block Model, and a number of novel real-world data sets based on lead-lag relationships in financial time series.

11h30 : Titouan Vayer, LIP, ENS Lyon
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection
Abstract : Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). Typically, this process involves aligning two graphs depicting the relationship between samples in the input high-dimensional space and their corresponding positions in the output low-dimensional space. In this talk we will present a new perspective on these approaches that is based on optimal transport and the Gromov-Wasserstein distance. Precisely, we will propose a new general framework, called distributional reduction, that recovers dimension reduction and clustering as special cases and allows us to address them jointly with a single optimization problem. We then empirically showcase the relevance of our approach on both image and genomics datasets.

12h00 : Break

--- PM ---

13h30 : Junjie Yang, S2A, LTCI, Télécom Paris, IP Pari
Exploiting Edge Features in Graph-based Learning with Fused Network Gromov-Wasserstein Distance
Abstract : Pairwise comparison of graphs is key to many applications in Machine Learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them, namely the Gromov-Wasserstein distance and its variant the fused Gromov-Wasserstein that applies on node attributed graphs. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of the fused Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We present a range of studies that illustrate the properties of the proposed distance and empirically demonstrate its effectiveness in supervised graph prediction tasks.

13h50 : Kévin CORTIAL, Institut Pascal - OpenStudio, Université Clermont Auvergne
Clustering du Product Space par apprentissage de graphe pour la diversification de la production industrielle
Abstract : La diversification de la production industrielle émerge comme une stratégie importante pour répondre aux divers défis sociétaux. Le Product Space, un graphe représentant la proximité des savoir-faire industriels entre les produits manufacturés, constitue un outil précieux pour recommander des diversifications aux industriels. Ces recommandations s'appuient sur les arêtes du graphe pour identifier les produits recommandables. Elles peuvent être améliorées en regroupant des produits similaires, ce qui entraîne des suggestions de diversifications plus précises et pertinentes. Contrairement à la topologie, les données des noeuds (textuelles dans le Product Space) sont généralement peu utilisées dans les méthodes de clustering de graphe. Dans ce contexte, nous proposons un apprentissage de ce graphe économique qui intègre l'apprentissage des données des noeuds en plus de la topologie du réseau. En appliquant cette méthode au jeu de données du Product Space, nous montrons comment les recommandations de diversification ont été améliorées en présentant des applications et évaluations réelles sur le terrain. Notre recherche, utilisant un Graph Neural Network (GNN), démontre des performances supérieures par rapport aux méthodes traditionnelles telles que Louvain et I-Louvain. L'utilisation de méthodes d'apprentissage de graphe constitue une avancée dans la littérature macroéconomique et répond au besoin de diversifier la production industrielle lors de pénuries notamment. Nous discutons à la fois des avantages et des limitations des modèles d'apprentissage profond sur les graphes en économie.

14h10: Thu Ha Phi, LEME, Université Paris Nanterre
Leveraging standardization in conditionnal correlation graph learning
Abstract : Standardization of the variables is a staple scaling tool that can be beneficial in most statistical learning tasks. In the context of graph learning, we propose to take further advantage of this pre-processing, as it yields an additional implicit normalization structure for the covariance matrix. For Gaussian graphical models, we propose a new algorithm that aims at estimating a sparse precision matrix while its inverse is constrained to be a (possibly low-rank structured) correlation matrix. The corresponding optimization problem is addressed by using the framework of Riemannian optimization. Simulations over various graph structures illustrate the interest of the proposed approach.

14h30 : Juliana DU, LPENSL, ENS Lyon
Graph-regularized Covid-19 reproduction number estimators assessment
Abstract : Accurate estimation of the reproduction number -- number of secondary infections generated by one infected person -- is a major challenge for the surveillance and control of epidemics. Our goal is to compare different estimations of the spatio-temporal Covid19 reproduction number. In particular, we want to assess the impact of a graph regularized strategy taking into account geographic links (e.g. sharing borders territories) encoded in a graph. This comparison is impaired due to the lack of knowledge about the true evolution of the pandemic. Multivariate synthetic reproduction numbers are thus constructed using an original graph-based Tikhonov-regularization strategy. These spatially correlated ground truth further enables the synthesis of realistic spatio-temporal infection counts. They are then used to compare several state-of-the-art estimators through quantitative performance assessments, showing the superiority of the graph-regularized strategy compared to the other estimators.

14h50 : Ronny Tonato Zambrano, CEA-leti
Unraveling Biological Networks: Graph-Based Microscope Image Analysis
Abstract : Understanding the phenomenon of cell network formation has become an important area of study in biology because its overall analysis can conduct to the study of diseases and the potential development of treatments. In this talk I will present a workflow to analyze and characterize cell behaviors in cellular networks formed by pancreatic epithelial cells.

15h10 short break

15h30 : Antonin JOLY, INRIA Rennes
Graph Coarsening with Message-Passing Guarantees
Abstract : In data science, graph coarsening aims to reduce the size of a large graph, while preserving some of its key properties. It has been used in many applications, to reduce computational load and memory footprint. For instance, in machine learning, training Graph Neural Networks (GNNs) on coarsened graphs leads to drastic savings in time and memory. However, GNNs rely on the message-passing paradigm, and classical spectral guarantees for graph coarsening do not directly lead to guarantees when performing naive message-passing on the coarsened graph. In this work, we propose a new message-passing operation specific to coarsened graphs, which exhibit theoretical guarantees on the preservation of the propagated signal. Interestingly, and in a sharp departure from previous proposals, this operation on coarsened graphs is oriented, even when the original graph is undirected. We conduct node classification tasks on synthetic and real data and observe improved results compared to performing naive message-passing on the coarsened graph.

15h50: Benjamin Girault, Laboratoire Hubert Curien, Saint-Étienne

Learning a Graph and Importance of its Vertices
Abstract : Graph learning is the topic of discovering the structure between entities of a dataset, with the goal of using this structure to understand the relation between the entities, or process additional data. One of key approach is to identify the Laplacian of the graph with a precision matrix of the data. This involves inverting the covariance matrix of data, of which only an estimate can be generally computed. Instead, using a multivariate Gaussian assumption, the classical approach perform maximum likelihood estimation of the precision matrix, and regularize the obtained cost function using a l1 sparse penality: this is the graphical Lasso method. Unfortunately, including constraints to obtain a valid graph is not satisfactory, with the literature showing that the l1 sparse penality decreases sparsity of the graph instead of increasing it. In this work, we alter the initial assumption identifying the Laplacian of the graph with the precision matrix. Instead, we identify the precision matrix to the Laplacian matrix plus a diagonal matrix of positive vertex importances. We show three key benefits of such an approach: (i) this can be identified to learning a generalized Laplacian, (ii) experimentally, the graph learnt are much sparser, without the need for a regularization, and (iii) we keep a spectral interpretation of the statistics of the signals, albeit with a different power spectrum.

16h10 : Paul-Antoine Beaudoin, TIMC, Université Grenoble Alpes
Utilisation de l'IA pour contribuer à l'évaluation du service médical rendu au sein des entrepôts de sonnées de santé
Abstract : Depuis 2019, le CHU Grenoble Alpes dispose d'un entrepôt de données de santé (EDS) autorisé par la commission de l'informatique et des libertés. Cet entrepôt regroupe des données de vraie vie, hétérogènes et multi-sources, harmonisées et modélisées selon un modèle de données graphe. Cette modélisation se montre particulièrement bien adaptée aux premiers outils de traitement actuellement déployés sur l'entrepôt en réponse aux différents cas d'usage rencontrés. Néanmoins, à ce jour, force est de constater les difficultés à promouvoir des approches permettant l'exploitation de toute la richesse intrinsèque des données et informations de vraie vie présentes dans l'entrepôt. Ces dernières sont en effet complexes du fait de nombreux facteurs tels que leur caractère hétérogène, la multitude des référentiels médicaux considérés, les biais d'usage associé aux pratiques, leur organisation temporelle ou encore des redondances ou des données non présentes. C'est dans ce contexte qu'il est nécessaire de développer de nouvelles approches de traitement de données de santé, plus intégratives et permettant de répondre au défi précédemment identifié. Les algorithmes d'intelligence artificielle basés sur des graphes permettent aujourd'hui de bénéficier de la représentation particulière de données complexes et connectées pour leur analyse. L'implémentation de ces méthodes au sein d'un EDS pourrait permettre une approche plus intégrative des données de soin des patients en prenant notamment en considération la temporalité dans le processus de soin ainsi que des données multimodales comme les textes de compte-rendu médicaux ou encore l'imagerie médicale. La structuration des données graphe et leur exploitation algorithmique pourraient ainsi servir à des tâches aidant le clinicien, comme la prédiction de diagnostic, la prédiction de risques cliniques ou encore le risque de ré hospitalisation ou l'admission en unité de soins intensifs. La communication proposée présentera d'abord ce qu'est un EDS et les spécificités de ces données avant de détailler l'implémentation d'un modèle de données graphe pour modéliser les données de vraie vie hospitalières. Enfin, grâce à une revue de la littérature du domaine, il s'agira de détailler des perspectives prometteuses d'analyse de ces graphes au moyen d'algorithmes d'intelligence artificielle.

16h30 : Fatia Lekbour, ENSEA
Alignement d'entités : des modèles de translations aux GNNs
Abstract : La multiplication des graphes de connaissances (KG) dans divers domaines et le besoin d'intégration des connaissances par différents KG à amené la tâche d'alignement d'entité, dont l'objectif est de détecter les noeuds d'une paire de KG qui potentiellement décrivent la même entité du monde réel. Actuellement on retrouve dans la littérature deux grandes familles de méthodes d'alignement d'entités. Tout d'abord, les modèles de translation, une classe d'approches largement utilisée dans l'alignement d'entités, visent à capturer les relations sémantiques entre les entités à travers des opérations de translation dans l'espace vectoriel des embeddings. Cependant, ces méthodes ne permettent pas de prendre en compte la totalité de l'information structurelle car elles considèrent chaque triplet de manière indépendant. D'un autre côté, les réseaux de neurones sur graphes (GNNs)utilisent la totalité de l'information structurelle de chaque entité, mais les informations de relations entre les entités sont plus difficilement exploitables. Dans ce travail préliminaire, nous cherchons à associer les GNN aux modèles de translation. En nous basant sur les travaux de [1], nous proposons d'intégrer les informations de relation sémantique sous la forme d'un encodage de position apprenable, afin de pouvoir les utiliser avec un GNN qui utilise l'information structurelle. L'analyse comparative des résultats obtenus avec différentes méthodes d'encodage de positions montre des résultats encourageants. Finalement dans de futurs travaux, nous envisageons d'explorer différents modèles de translation. Des méthodes de masking sont également envisagées. [1] Li, G., Sun, Z., Hu, W., Cheng, G., & Qu, Y. (2023). Position-Aware Relational Transformer for Knowledge Graph Embedding. IEEE Transactions on Neural Networks and Learning Systems.

16h50 Paul Krzakala,
Telecom Paris Prédiction supervisée de graphes via un modèle de deep learning et une loss de transport optimal.
Abstract : L'objectif de ce travail est de concevoir un modèle de Deep Learning capable d'apprendre à prédire des graphes. Contrairement aux GNN, la sortie du modèle est un graphe tandis que l'entrée peut prendre plusieurs formes (images, texte etc.). L'apprentissage se fait de manière supervisée à travers d'une nouvelle loss basée sur le transport optimal.

17h10 Yannis Karmim, CEDRIC, CNAM
Supra-Laplacian Encoding for Transformer on Dynamic Graph (SLATE)
Abstract : Fully connected Graph Transformers (GT) have rapidly become prominent in the static graph community as an alternative to Message-Passing models, which suffer from a lack of expressivity, oversquashing, and under-reaching. However, in a dynamic context, by interconnecting all nodes at multiple snapshots with self-attention, traditional GT loose both structural and temporal information. We introduce Supra-Laplacian Encoding for Transformer on Dynamic Graph (SLATE), a new spatio-temporal encoding to leverage the GT architecture while keeping spatio-temporal information. Specifically, we transform Discrete Time Dynamic Graphs into multi-layer graphs and take advantage of the spectral properties of their associated supra-Laplacian matrix. Our second contribution explicitly model nodes' pairwise relationships with a cross-attention mechanism, providing an accurate edge representation for dynamic link prediction. SLATE outperforms numerous state-of-the-art methods based on Message-Passing Graph Neural Networks combined with recurrent models (e.g, LSTM), and Dynamic Graph Transformers, on~9 datasets. Code and instructions to reproduce our results will be open-sourced.

17h30 end

Identification

Apprentissage de graphes

Inscriptions

Annonce

Programme

Résumés des contributions