We remind you that, in order to guarantee every registrant access to the meeting rooms, registration for meetings is free but mandatory.
Registration for this meeting is closed.
33 members of GdR ISIS and 52 non-members are registered for this meeting.
Room capacity: 95 people.
Optimal transport (OT) and its associated Wasserstein distance have in recent years become widely used tools in statistical learning and signal processing. They have been applied successfully in supervised learning, unsupervised learning, and the learning of generative models (Wasserstein GAN), with applications in domain adaptation and dictionary learning. OT-based distances can indeed capture the intrinsic geometry of the data and encode this information when computing similarities. The recent development of entropic regularization of the transport plan has brought new efficient algorithms and made OT easy to integrate into learning algorithms based on differentiable programming.
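As a generic illustration of the entropic regularization mentioned above (this is a minimal sketch, not code from any of the talks below), the Sinkhorn iterations can be written in a few lines of NumPy; the histograms and cost matrix are made up for the example.

```python
import numpy as np

def sinkhorn(a, b, M, reg, n_iter=1000):
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn iterations."""
    K = np.exp(-M / reg)                 # Gibbs kernel of the ground cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match marginal b
        u = a / (K @ v)                  # scale rows to match marginal a
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

# Toy example: two histograms on 4 points with a squared-distance ground cost.
x = np.arange(4, dtype=float)
M = (x[:, None] - x[None, :]) ** 2
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.3, 0.4])
P = sinkhorn(a, b, M, reg=0.1)
cost = np.sum(P * M)   # regularized OT transport cost
```

Because each iteration is just two matrix-vector products, this scheme differentiates cleanly, which is what makes it attractive for the differentiable-programming pipelines mentioned above.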
This workshop, organized in partnership with the GDR MIA, aims to review ongoing work on these fundamental problems and calls for contributions on the following (non-exhaustive) themes:
Those wishing to present their work are invited to notify the organizers before June 14, 2019, by emailing a title, an abstract, and the list of authors, with the subject line "GDR ISIS OTML", to the following addresses: remi.flamary@unice.fr, ncourty@irisa.fr
Gabriel Peyré
TBA
Mokhtar Z. Alaya, Maxime Bérar, Gilles Gasso, Alain Rakotomamonjy
We introduce in this paper a novel strategy for efficiently approximating the Sinkhorn distance between two discrete measures. After identifying negligible components of the dual solution of the regularized Sinkhorn problem, we propose to screen those components by directly setting them at that value before entering the Sinkhorn problem. This allows us to solve a smaller Sinkhorn problem while ensuring an approximation with provable guarantees. More formally, the approach is based on a reformulation of the dual of the Sinkhorn divergence problem and on the KKT optimality conditions of this problem, which enable the identification of the dual components to be screened. This new analysis leads to the SCREENKHORN algorithm. We illustrate the efficiency of SCREENKHORN on complex tasks such as dimensionality reduction or domain adaptation involving regularized optimal transport.
Arthur Leclaire, Julien Rabin (Ensicaen)
The optimal transport (OT) framework has been largely used in inverse imaging and computer vision problems, as an interesting way to incorporate statistical constraints or priors. In recent years, OT has also been used in machine learning, mostly as a metric to compare probability distributions. This work addresses the semi-discrete OT problem, where a continuous source distribution is matched to a discrete target distribution. We introduce a fast stochastic algorithm to approximate such a semi-discrete OT problem using a hierarchical multi-layer transport plan. This method allows for tractable computation in the high-dimensional case and for large point clouds, both at training and at synthesis time. Experiments demonstrate its numerical advantage over multi-scale (or multi-level) methods. Applications to exemplar-based texture synthesis, based on patch matching with two layers, also show stunning improvements over previous single-layer approaches.
Vianney Perchet, Marc Quincampoix
Studying the continuous-time counterpart of some discrete-time dynamics is now a standard and fruitful technique, as some properties hold in both setups. In game theory, this is usually done by considering differential games on Euclidean spaces. This allows one to infer properties on the convergence of values of a repeated game, to deal with the various concepts of approachability, etc. In this paper, we introduce a specific but quite abstract differential game defined on the Wasserstein space of probability distributions and we prove the existence of its value. Going back to the discrete-time dynamics, we derive results on weak approachability with partial monitoring: we prove that any set satisfying a suitable compatibility condition is either weakly approachable or weakly excludable. We also obtain that the value for differential games with nonanticipative strategies is the same as the one defined with a new concept of strategies well suited to making links with repeated games.
Nicolas Papadakis
TBA
Loqman Salamatian, Dali Kaafar, Kavé Salamatian
The monitoring of large dynamic networks is a major challenge for a wide range of applications. The complexity stems from properties of the underlying graphs, in which slight local changes can lead to sizable variations of global properties; e.g., under certain conditions, a single link cut that may be overlooked during monitoring can result in splitting the graph into two disconnected components. Moreover, it is often difficult to determine whether a change will propagate globally or remain local. Traditional graph-theoretic measures, e.g. graph centrality or assortativity features, are often ill-suited to characterizing the global evolving properties of a dynamic graph. In this presentation we tackle the problem of real-time monitoring of dynamic large-scale graphs by developing a geometric approach that leverages notions of geometric curvature and recent developments in graph embedding using Ollivier-Ricci curvature and optimal transport. We illustrate the use of our method by considering the practical case of monitoring dynamic variations of the global Internet using topology-change information obtained by combining several BGP feeds. In particular, we use our method to detect major events and changes via the geometry of the embedding of the graph, where a major event is defined as an incident that either impacts a substantial number of prefixes or can be observed by numerous route monitors within a given period of time. We first adapt the Ricci curvature to characterize the AS-level graph and develop an anomaly-tracking mechanism to detect large variations of curvature that translate into major variations in the topology of the graph. These changes are then considered as major BGP-level events. We then describe a mechanism for identifying the network elements responsible for the set of coordinated changes and isolate their implications in the geometric space. We evaluate this system in operational settings and show its performance in real-life scenarios.
In the course of this analysis we will show that a version of the Gauss-Bonnet theorem holds over empirical graphs.
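The Ollivier-Ricci curvature used above compares, for an edge (x, y), the neighbour distributions of its endpoints with a Wasserstein distance: kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y). A minimal sketch follows; it assumes both endpoints have equal degree so that W1 between uniform neighbour measures reduces to an assignment problem, and the two toy graphs are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shortest_paths(adj):
    """All-pairs shortest-path lengths by BFS on an adjacency dict {node: [neighbours]}."""
    n = len(adj)
    D = np.full((n, n), np.inf)
    for s in adj:
        D[s, s] = 0
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if D[s, v] == np.inf:
                        D[s, v] = D[s, u] + 1
                        nxt.append(v)
            frontier = nxt
    return D

def ollivier_ricci(adj, x, y):
    """kappa(x, y) = 1 - W1(mu_x, mu_y)/d(x, y), with mu_z uniform on the neighbours of z.
    Assumes deg(x) == deg(y), so W1 is an optimal assignment scaled by 1/deg."""
    D = shortest_paths(adj)
    Nx, Ny = list(adj[x]), list(adj[y])
    C = np.array([[D[u, v] for v in Ny] for u in Nx])
    r, c = linear_sum_assignment(C)
    w1 = C[r, c].mean()
    return 1.0 - w1 / D[x, y]

# Triangle (positively curved) vs 4-cycle (flat).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
k_tri = ollivier_ricci(triangle, 0, 1)   # 0.5
k_sq = ollivier_ricci(square, 0, 1)      # 0.0
```

Positive curvature signals overlapping neighbourhoods (triangles), while curvature drops on bridge-like edges, which is the intuition behind using large curvature variations as topology-change alarms.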
Ievgen Redko
TBA
Hermina Petric Maretic, Mireille EL Gheche, Giovanni Chierchia, Pascal Frossard
We study the problem of comparing graphs or networks, a very timely topic in data science. We use optimal transport methods to define an explicit measure of the distance between two graphs and to build an optimization method for estimating graph alignments.
More precisely, we exploit the Gaussian distribution of smooth signals on graphs, defined as a function of the graph topology. This allows us to derive an explicit expression for the Wasserstein distance between the signal distributions in terms of the Laplacian matrices, which we take as representative of the distance between graphs. In particular, this measure is able to take the global structure of the graphs into account, and it proves particularly useful for estimating the permutation between two graphs. To solve the associated non-convex optimization problem, we propose a stochastic algorithm based on Bayesian exploration.
We demonstrate the performance of the proposed approach on three different applications: graph alignment, graph classification, and the prediction of signals on graphs. We show the effectiveness of the method compared with existing approaches.
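The explicit distance referred to above is the closed-form Wasserstein-2 distance between zero-mean Gaussians, here with the Laplacian pseudoinverse playing the role of the covariance of smooth graph signals. The following is a sketch under that reading of the abstract; the two small graphs are made-up examples.

```python
import numpy as np

def psd_sqrt(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def graph_w2(L1, L2):
    """W2 between N(0, L1^+) and N(0, L2^+), Gaussian closed form:
    W2^2 = tr(S1) + tr(S2) - 2 tr((S1^{1/2} S2 S1^{1/2})^{1/2})."""
    S1, S2 = np.linalg.pinv(L1), np.linalg.pinv(L2)
    R = psd_sqrt(S1)
    cross = psd_sqrt(R @ S2 @ R)
    d2 = np.trace(S1) + np.trace(S2) - 2 * np.trace(cross)
    return np.sqrt(max(d2, 0.0))

# Hypothetical example: a path graph vs a cycle on 4 nodes.
A_path = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
A_cycle = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], dtype=float)
d = graph_w2(laplacian(A_path), laplacian(A_cycle))
d_same = graph_w2(laplacian(A_path), laplacian(A_path))
```

Because the distance is a smooth function of the Laplacians, it can be plugged into the permutation-estimation objective described in the abstract.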
Hicham Janati, Marco Cuturi, Alexandre Gramfort
We focus in this paper on high-dimensional regression problems where each regressor can be associated to a location in a physical space, or more generally a generic geometric space. Such problems often employ sparse priors, which promote models using a small subset of regressors. To increase statistical power, the so-called multi-task techniques were proposed, which consist in the simultaneous estimation of several related models. Combined with sparsity assumptions, this leads to models that enforce the active regressors to be shared across tasks, thanks to, for instance, L1/Lq norms. We argue in this paper that these techniques fail to leverage the spatial information associated to regressors. Indeed, while sparse priors enforce that only a small subset of variables is used, the assumption that these regressors overlap across all tasks is overly simplistic given the spatial variability observed in real data. In this paper, we propose a convex regularizer for multi-task regression that encodes a more flexible geometry. Our regularizer is based on unbalanced optimal transport (OT) theory, and can take into account prior geometric knowledge on the regressor variables, without necessarily requiring overlapping supports. We derive an efficient algorithm based on a regularized formulation of OT, which iterates through applications of Sinkhorn's algorithm along with coordinate descent iterations. The performance of our model is demonstrated on regular grids with both synthetic and real datasets as well as complex triangulated geometries of the cortex with an application in neuroimaging.
Antoine Liutkus
By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close as possible to the data distribution and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.
Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, Nicolas Courty
Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions that do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non-convex quadratic program which is most of the time very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large-scale optimization. Among those, the Sliced Wasserstein (SW) distance exploits the direct solution of W on the line, which only requires sorting discrete samples in 1D. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large-scale distributions via a slicing approach, and we show how it relates to the GW distance while being O(n log(n)) to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle similar problems as GW while being several orders of magnitude faster to compute.
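For intuition, the slicing construction SGW builds on can be illustrated with the ordinary Sliced Wasserstein distance: project onto random directions and solve the 1D problem exactly by sorting. A minimal NumPy sketch (the point clouds and the number of projections are arbitrary choices for the example):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=50, seed=0):
    """Monte-Carlo Sliced Wasserstein-2 between two equal-size point clouds.
    Each random 1D projection is solved exactly by sorting (monotone matching)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random direction on the sphere
        x, y = np.sort(X @ theta), np.sort(Y @ theta)   # 1D OT = sorted matching
        total += np.mean((x - y) ** 2)
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
sw_same = sliced_wasserstein(X, X)          # identical clouds: distance 0
sw_shift = sliced_wasserstein(X, X + 2.0)   # shifted cloud: strictly positive
```

The per-projection cost is dominated by the sort, which is where the O(n log(n)) complexity claimed for the sliced approach comes from.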
Benjamin Tardy, Jordi Inglada, Julien Michel
The automatic production of land-cover maps is governed by the availability of reference data (supervised classification). These data are extremely costly to obtain (in time and human resources) and their validity is limited to the acquisition period because of the various land-cover changes (agricultural crops, deforestation, ...). Despite these costs, reference data are acquired annually, but they delay map production. The idea is therefore to exploit past data in order to bypass the dependence on the corresponding reference data, which places us in a typical domain adaptation setting. Optimal transport has been applied successfully to domain adaptation for the classification of remote sensing images. The idea is thus to exploit optimal transport to produce the land-cover map of a year for which we have no reference data. To this end we used a dataset composed of 7 annual time series of Formosat-2 images (24x24 km, 8 m resolution, 108 spectral bands and dates) and the corresponding reference data (1 million pixels on average). For each pair of years, one represents the Source domain (reference data and images) and the other the Target domain, whose reference data are used exclusively for validation. We use a 17-class nomenclature, representative of the mostly agricultural study area.
A first study measured the relevance of the transport algorithms provided by the POT library (https://github.com/rflamary/POT), namely EMD and its regularized Sinkhorn version. In this experiment the samples are fully controlled (they come from the training sets of both domains), removing the bias linked to sample selection. This study revealed a problem too complex for Sinkhorn, with an average OA (overall accuracy) of 30%. EMD offers better performance, with an average OA of 65%. Using the labels of both domains when estimating the transport brings a substantial gain of 35% for Sinkhorn and 20% for EMD, which approaches the performance of supervised classification, evaluated at 89% on average.
The second study seeks to reproduce these performances without exploiting the reference data of the Target domain. We then faced a real difficulty, leading us to propose several sample-selection methods in order to produce a relevant transport. The best approach was to split the problem between annual classes (crops) and perennial classes (forests, buildings, ...) and to perform one transport per annual class. Samples of the perennial classes are extracted directly from the Target domain. To guide the sample selection, a first "naive" classification is carried out to filter the classes. A naive classification is defined as training a classifier on the Source domain data and applying it to the Target domain without any other form of adaptation. The naive classification is evaluated at 65% OA on average. Using the transport brings a minor gain of about 4%, with an average OA of 69%. The results remain encouraging when evaluating the accuracy gain for the annual classes.
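The EMD transport evaluated above is provided by POT; as a library-free sketch of the same idea, EMD between uniform measures with equally many samples reduces to an assignment problem, and the resulting matching transfers source samples to the target year (a barycentric mapping). The data below are synthetic stand-ins for the Formosat-2 features, with the target year simulated as a radiometric shift.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
shift = np.array([1.0, 0.5, 0.0, 0.0])        # simulated acquisition shift
Xs = rng.normal(size=(200, 4))                # source-year pixel features
Xt = Xs + shift                               # target-year pixel features

# Squared Euclidean ground cost between all source/target pairs.
M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)

# With uniform weights over equal-size samples, EMD is an optimal assignment.
rows, cols = linear_sum_assignment(M)
Xs_mapped = Xt[cols]                          # map each source sample to its match
residual = np.abs(Xs_mapped - (Xs + shift)).mean()
```

For a pure translation under squared cost the optimal matching is the identity, so the mapped samples recover the target-year features exactly; real multi-year imagery of course departs from this idealized shift, which is why sample selection mattered so much in the study above.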
Yaël Frégier, Jean-Baptiste Gouray
We propose an approach for transfer learning with GAN architectures. In general, transfer learning enables deep networks for classification tasks to be trained with limited computing and data resources. However, a similar approach is missing in the specific context of generative tasks. This is partly due to the fact that the extremal layers of the two networks of a GAN, which should be learned during transfer, are on two opposite sides. This requires back-propagating information through both networks, which is computationally expensive. We develop a method to directly train these extremal layers against each other, bypassing all the intermediate layers. We also prove rigorously, for Wasserstein GANs, a theorem ensuring the convergence of the learning of the transferred GAN. Finally, we compare our method to state-of-the-art methods and show that our method converges much faster and requires less data.
Emmanuel de Bézenac, Ibrahim Ayed, Patrick Gallinari
Domain Translation is the problem of finding a meaningful correspondence between two domains. Since in a majority of settings paired supervision is not available, much work focuses on Unsupervised Domain Translation (UDT) where data samples from each domain are unpaired. Following the seminal work of CycleGAN for UDT, many variants and extensions of this model have been proposed. However, there is still little theoretical understanding behind their success. We observe that these methods yield solutions which are approximately minimal w.r.t. a given transportation cost, leading us to reformulate the problem in the Optimal Transport (OT) framework. This viewpoint gives us a new perspective on Unsupervised Domain Translation and allows us to prove the existence and uniqueness of the retrieved mapping, given a large family of transport costs. We then propose a novel framework to efficiently compute optimal mappings in a dynamical setting. We show that it generalizes previous methods and enables a more explicit control over the computed optimal mapping. It also provides smooth interpolations between the two domains. Experiments on toy and real world datasets illustrate the behavior of our method.
Bharath Bhushan Damodaran, Kilian Fatras, Sylvain Lobry, Rémi Flamary, Devis Tuia, Nicolas Courty
Noisy labels often occur in vision datasets, especially when they are issued from crowdsourcing or Web scraping. In this paper, we propose a new regularization method which enables one to learn robust classifiers in the presence of noisy data. To achieve this goal, we augment the virtual adversarial loss with a Wasserstein distance. This distance allows us to take into account specific relations between classes by leveraging the geometric properties of this optimal transport distance. Notably, we encode the class similarities in the ground cost that is used to compute the Wasserstein distance. As a consequence, we can promote smoothness between classes that are very dissimilar, while keeping the classification decision function sufficiently complex for similar classes. While designing this ground cost can be left as a problem-specific modeling task, we show in this paper that using the semantic relations between class names already leads to good results. Our proposed Wasserstein Adversarial Training (WAT) outperforms the state of the art on four datasets corrupted with noisy labels: three classical benchmarks and one real case in remote sensing image semantic segmentation.
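The role of the semantic ground cost can be illustrated in isolation (this sketch shows only how such a cost prices label confusions, not the full adversarial training of the paper; the class names, cost values, and prediction are made up). The exact transport cost between two histograms is computed here as a small linear program:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(p, q, C):
    """Exact OT cost between histograms p, q under ground cost C (LP formulation)."""
    n, m = C.shape
    # Decision variable: the transport plan, flattened row-major.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i*m:(i+1)*m] = 1      # row i of the plan sums to p[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1         # column j of the plan sums to q[j]
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

# Hypothetical 3-class ground cost: "cat" and "dog" are semantically close, "truck" is far.
C = np.array([[0.0, 0.2, 1.0],
              [0.2, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
pred = np.array([0.7, 0.25, 0.05])     # classifier output, mostly "cat"
lab_dog = np.array([0.0, 1.0, 0.0])    # a (possibly noisy) "dog" label
lab_truck = np.array([0.0, 0.0, 1.0])  # a (possibly noisy) "truck" label
d_dog = wasserstein(pred, lab_dog, C)      # 0.19
d_truck = wasserstein(pred, lab_truck, C)  # 0.95
```

A cat-labelled image given a noisy "dog" label incurs a much smaller transport cost than one labelled "truck", which is precisely how the ground cost lets the loss tolerate plausible confusions while penalizing implausible ones.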
Date: 2019-07-09
Venue: Conference room, Délégation CNRS, 27 Rue P. Bert, Ivry-sur-Seine https://goo.gl/maps/TBSfB1Yh72E2
Scientific themes:
A - Methods and models in signal processing
T - Machine learning for signal and image analysis