Théorie du deep learning

Nous vous rappelons que, afin de garantir l'accès de tous les inscrits aux salles de réunion, l'inscription aux réunions est gratuite mais obligatoire.

Inscriptions closes à cette réunion.

Inscriptions

79 personnes membres du GdR ISIS, et 102 personnes non membres du GdR, sont inscrits à cette réunion.
Capacité de la salle : 200 personnes.

Annonce

Les réseaux de neurones profonds ont marqué l'entrée dans une nouvelle ère de l'intelligence artificielle, ponctuée par des succès opérationnels dans des domaines variés de la science des données comme la classification d'images, la reconnaissance vocale, ou le traitement de la langue naturelle.
En dépit de ces succès importants, les garanties théoriques associées à ces modèles décisionnels restent aujourd'hui toujours fragiles. L'objectif de cette journée est de faire un état des lieux sur la compréhension du fonctionnement des réseaux de neurones profonds, à travers un appel à contributions centré autour les thèmes (non exhaustifs) suivants :

Expressivité des modèles
Robustesse décisionnelle (incertitude, stabilité, attaques adversaires)
Optimisation et problèmes non convexes
Théorie de la généralisation
Lien entre modèles physiques et architectures de réseaux de neurones

Les outils utilisés pour aborder ces thématiques pourront venir de l'apprentissage statistique, mais des méthodes venant de disciplines connexes (décomposition tensorielles, analyse harmonique, méthodes géométriques / algébriques, physique statistique) sont fortement encouragées.

Orateurs inivtés :

Rémi Gribonval, LIP ENS Lyon,
Edouard Oyallon, LIP6 Paris, CNRS
Stefano Spigler, EPFL Lausanne

Appel à contributions :
Les personnes souhaitant présenter leurs travaux à cette journée sont invitées à envoyer, par e-mail, leur proposition (titre et résumé de 1 page maximum) aux organisateurs avant le 26 septembre 2019.

Organisateurs :

Caroline Chaux (caroline.chaux@univ-amu.fr), Université Aix-Marseille, I2M
Valentin Emiya (valentin.emiya@lis-lab.fr), Université Aix-Marseille, LIS
François Malgouyres (Francois.Malgouyres@math.univ-toulouse.fr), Institut de Mathématiques de Toulouse (IMT, CNRS UMR 5219)
Nicolas Thome (nicolas.thome@cnam.fr), Cnam Paris
Konstantin Usevich (konstantin.usevich@univ-lorraine.fr), CRAN, Nancy

Programme

09:00-09:10 Accueil
09:10-09:30 Adrien Chan-Hon-Tong (ONERA), "Adversarial poisoning against deep networks"
09:30-09:50 Nicolas Keriven (GIPSA-lab), "Universal Invariant and Equivariant Graph Neural Networks"
09:50- 10:10 Charles Corbière (CNAM), "Addressing Failure Prediction by Learning Model Confidence"
10:10-10:30 Pause
10:30-11:20 Présentation invitée : Rémi Gribonval (ENS Lyon), "Approximation with sparsely connected deep networks"
11:20-11:40 Achille Salaün (Nokia-FR/Paris Saclay), "Comparaison de l'expressivité des réseaux de neurones récurrents et des modèles de Markov cachés"
11:40-12:00 El Mehdi Achour (IMT, Toulouse), "Properties of second order critical points in deep linear networks"
12:00-12:20 Camille Castera (IRIT, Toulouse), "An Inertial Newton Algorithm for Deep Learning with Convergence Guarantees"
12:20-14:00 Pause déjeuner
14:00-14:50 Présentation invitée : Stefano Spigler (EPFL Lausanne), "Loss landscape and performance in deep learning"
14:50-15:10 Alexei Tsygvintsev (ENS Lyon), "On the overfly algorithm in deep learning of neural networks"
15:10-15:30 Abdellatif Zaidi (Université Paris-Est Marne la Vallée), "Variational Information Bottleneck, Connections and Recent Advances"
15:30-15:50 Michel Barlaud (I3S), "Learning structured sparse deep neural networks using a splitting projection-gradient method"
15:50-16:10 pause
16:10-17:00 Présentation invitée : Edouard Oyallon (CNRS/LIP6), "Two training sins: greedy and lazy?"
17:00-17:20 Hervé Le Borgne (CEA LIST), "Contrôle des facteurs de variation continus d'images générés par GAN et VAE"
17:20-17:40 Frédéric Barbaresco (Thales), "Structures géométriques de l'apprentissage machine pour les groupes de Lie"
17:40-18:00 Pierre Jacob (ETIS/ENSEA), "Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings projection-gradient method"

Résumés des contributions

Adversarial poisoning against deep networks, Adrien Chan-Hon-Tong (ONERA).
cedric.cnam.fr/~thomen/recherche/ISIS/DL-Theory/Chan-Hon-Tong.pdf

Addressing Failure Prediction by Learning Model Confidence, Charles Corbière (Cnam).
Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models.
We propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context.
Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.

arxiv.org/pdf/1910.04851.pdf

Universal Invariant and Equivariant Graph Neural Networks, Nicolas Keriven (GIPSA-lab).
cedric.cnam.fr/~thomen/recherche/ISIS/DL-Theory/Keriven.pdf

Présentation invitée : Approximation with sparsely connected deep networks, Rémi Gribonval (ENS Lyon).

Many of the data analysis and processing pipelines that have been carefully engineered by generations of practitioners and mathematicians correspond to functions that can be implemented as deep networks. Allowing the parameters of these networks to be automatically trained (or even randomized) allows to revisit certain classical constructions -such as fast linear transforms- and calls for a better understanding of the tradeoffs between complexity and quality.

Measuring a network's complexity by its number of connections or by its number of neurons, one can consider the class of functions for which the approximation error decays at a certain rate when increasing the network's complexity budget. Perhaps surprisingly, classical results from approximation theory imply that this function class is a linear space, called approximation space. We further establish that allowing the networks to have certain types of skip connections does not change the resulting approximation spaces: in other words, this has no impact on the expressivity of the networks. Finally, we discuss the role of the network's nonlinearity on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity (and its powers), the newly constructed spaces can be related to classical smoothness spaces from functional analysis. The established connections highlight the ability of sparse neural networks to approximate well some functions of very low smoothness, if the networks are sufficiently deep.

Comparaison de l'expressivité des réseaux de neurones récurrents et des modèles de Markov cachés, Achille Salaün (Nokia).

cedric.cnam.fr/~thomen/recherche/ISIS/DL-Theory/Salaun.pdf

Properties of second order critical points in deep linear networks, El Mehdi Achour (IMT Toulouse).

On s'intéresse dans ce travail aux point critiques de la fonction de coût d'un réseau de neurones linéaire à plusieurs couches lors d'un apprentissage supervisé (dans ce cas la fonction est celle des moindres carrés). L'objectif est de lier les points critiques du second ordre aux minimiseurs globaux. Pour le cas général d'une profondeur quelconque on donne une réponse partielle et pour le cas particulier de 2 couches cachées on fournit une caractérisation complète des points critiques du second ordre en fonction du rang des matrices de paramètres.

An Inertial Newton Algorithm for Deep Learning with Convergence Guarantees, Camille Castera (IRIT Toulouse).

cedric.cnam.fr/~thomen/recherche/ISIS/DL-Theory/Castera.pdf

Présentation invitée : Loss landscape and performance in deep learning, Stefano Spigler (EPFL Lausanne).

Deep learning is very powerful at a variety of tasks, including self-driving cars and playing Go beyond human level. Despite these engineering successes, why deep learning works remains a question with many facets.
I will discuss two of them: (i) Deep learning is a fitting procedure, achieved by defining a loss function which is high when data are poorly fitted. Learning corresponds to a descent in the loss landscape. Why isn't it stuck in bad local minima, as occurs when cooling glassy systems in physics? What is the geometry of the loss landscape? (ii) In recent years it has been realised that deep learning works best in the over-parametrised regime, where the number of fitting parameters is much larger than the number of data to be fitted, contrarily to intuition and to standard views in statistics. I will propose a resolution of these two problems, based on both an analogy with the energy landscape of repulsive particles and an analysis of asymptotically wide nets.

On the overfly algorithm in deep learning of neural networks, Alexei Tsygvintsev (ENS Lyon).
We investigate the supervised backpropagation training of multilayer neural networks from a dynamical systems point of view. We discuss some links with the qualitative theory of differential equations and introduce the overfly algorithm to tackle the local minima problem. Our approach is based on the existence of first integrals of the generalised gradient system with build-in dissipation.

Reference: A. Tsygvintsev, On the overfly algorithm in deep learning of neural networks, Applied Mathematics and Computation, 349 (2019) 348-358

Variational Information Bottleneck, Connections and Recent Advances, Abdellatif Zaidi (Université Paris-Est).
We connect the information flow in a neural network to sufficient statistics; and show how techniques that are rooted in information theory, such as the source-coding based information bottleneck method can lead to improved architectures, as well as a better understanding of the theoretical foundation of neural networks, viewed as a cascade compression network. We illustrate our results and view through some numerical examples.

arxiv.org/abs/1807.04193

Structures géométriques de l'apprentissage machine pour les groupes de Lie, Frédéric Barbaresco (Thalès).
cedric.cnam.fr/~thomen/recherche/ISIS/DL-Theory/Barbaresco.pdf

Présentation inivtée : Two training sins: greedy and lazy?, Edouard Oyallon (CNRS/LIP6).

The optimization of typical deep neural network is difficult to analyze because the objective loss is non convex and the objective of intermediary layers is not specified at training time. I will present you two ideas which challenge the common intuitions about deep training procedures. The first consists in the idea of lazy training [1], which corresponds to a setting in which a deep network behaves similarly as its linearization at initialization (work with Lénaïc Chizat and Francis Bach). Secondly, I will discuss greedy training [2], a procedure in which each intermediary layer objective is explicitly specified (work with Eugene Belilovsky and Michael Eickenberg). Applications to remove the locks of backward, update or forward [3] at training time will be discussed, through the scope of a decoupled learning procedure [4].
[1] arxiv.org/abs/1812.07956 [2] arxiv.org/abs/1812.11446 [3] arxiv.org/abs/1608.05343 [4] arxiv.org/abs/1901.08164

Contrôle des facteurs de variation continus d'images générés par GAN et VAE, Hervé Le Borgne (CEA LIST).
Les modèles génératifs profonds récents sont capables de produire des images photoréalistes ainsi que des prolongements (embeddings) visuel et textuel particulièrement puissants pour traiter diverses tâches de vision par ordinateur et de traitement du langage naturel. Leur utilité est néanmoins souvent limitée par le manque de contrôle sur le processus de génération ou par la mauvaise compréhension de la représentation apprise. Pour surmonter ces problèmes, des travaux récents ont montré l'intérêt d'étudier la sémantique de l'espace latent des modèles génératifs. Dans cette présentation, nous nous intéressons à l?interprétabilité des modèles génératifs profonds en introduisant une méthode permettant de trouver des directions utiles dans leur espace latent, le long desquelles il est possible de se déplacer pour contrôler avec précision les propriétés spécifiques de l'image générée. Notre méthode est faiblement supervisée et convient particulièrement à la recherche de directions codant des transformations simples de l'image générée, telles que la translation, le zoom ou les variations de couleur. Nous proposons aussi une fonction de coût permettant de produire des images comportant plus de détails (moins floues) ainsi qu'une méthode d'évaluation quantitative de la qualité des images générées sous contrôle de la position et de l'échelle de l'objet principal qu'elles contiennent. Nous fournissons des résultats pour des images de synthèse et des images naturelles, à la fois pour les GAN et les auto-encodeurs variationnels.

Learning structured sparse deep neural networks using a splitting projection-gradient method,
Michel Barlaud (I3S Nice).
In recent years, deep neural networks (DNN) have been applied to different domains and achieved dramatic performance improvements over state-of-the-art classical methods. These performances of DNNs were however often obtained with networks containing millions of parameters and which training required heavy computational power. However It is known that DNN models are largely over-parametrized and that in practice, relatively few network weights are actually necessary to learn accurately data features. Thus, in order to cope with this computational issue a huge literature deals with proximal regularization methods which are time consuming.
In this talk, we propose instead a constrained approach to learn these few network weights . We provide the general framework for our new splitting projection gradient method. Our splitting algorithm iterates a gradient step and a projection on the convex constraint. We design algorithms for three constraints: the classical $\ell_1$ constraint and structured constraints: the nuclear norm constraint and $\ell_2,1$ constraint.
We demonstrate the effectiveness of our method on two popular datasets (MNIST and CIFAR). Experiments on these datasets show that our splitting projection methods with nuclear or $\ell_2,1$ constraints provide the best structured sparsity resulting in high reduction of memory and computational power for both convolutional and linear layer.
Extending our method to other constraints is easy provided that efficient projection on the constraint are available.

Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings projection-gradient method,
Pierre Jacob (ETIS/ENSEA).
Learning an effective similarity measure between image representations is key to the success of recent advances in visual search tasks (e.g. verification or zero-shot learning). Although the metric learning part is well addressed, this metric is usually computed over the average of the extracted deep features. This representation is then trained to be discriminative. However, these deep features tend to be scattered across the feature space. Consequently, the representations are not robust to outliers, object occlusions, background variations, etc.
In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. This regularizer enforces visually-close images to have deep features with the same distribution which are well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on 4 well-known datasets (Cub-200-2011, Cars-196, Stanford Online Products and Inshop Clothes Retrieval).

Identification

Théorie du deep learning

Inscriptions

Annonce

Programme

Résumés des contributions