Explaining deep network models in classification, enhancement, and interpretation problems for images and signals/data

Registration for this meeting is closed.

Registration

104 members of GdR ISIS and 65 non-members are registered for this meeting.
Room capacity: 300 people.

Announcement

Joint day: Theme B, Theme T, GDR IGRV

Deep learning, one of the flagship tools of Artificial Intelligence, has been highly successful in many areas of image, video, and multimodal information processing and analysis. However, the black-box nature of deep neural networks has become one of the main obstacles to their wide acceptance in critical applications such as medical diagnosis and therapy, and even autonomous driving. Rather than developing and using deep neural networks as black boxes and adapting known architectures to a variety of problems, the goal of explainable deep learning is to propose methods to "understand" and "explain" how these systems produce their decisions. Explaining the decisions of deep networks has two aspects: analyzing the decisions and presenting the explanations to the user. It therefore involves two scientific communities: Artificial Intelligence/Image and Information Visualization. The objective of this joint day of GDR-ISIS and GDR-IGRV is to bring together the communities of researchers working on improving the explainability of AI algorithms and systems in the image-signal domain.

The main topics we propose to cover are the following, although the list may be extended:

explanation of the features generated by the convolution layers of deep convolutional networks;
attention mechanisms in deep neural networks and their explanation;
for temporal data, explanation of the features and moments that matter most for the prediction, and of the time intervals in which the contribution of each data item is important;
how explanation can help make deep learning architectures sparser and lighter;
when using multimodal data, how predictions in the data streams are correlated and explain one another;
automatic generation of explanations / justifications for the decisions of algorithms and systems;
visualization of explanations in a way that is interpretable for users;
evaluation of explanations generated by deep learning and other AI systems.

This day is jointly organized by Theme B (Image and Vision) and Theme T of GDR-ISIS and by the Visualization working group (GT Visualisation) of GDR IGRV. The program includes 2 invited talks:

"Reasoning vs. bias exploitation: X-raying high-capacity deep networks", Christian Wolf (LIRIS/INSA).

"A theoretical analysis of the LIME method", Damien Garreau, Laboratoire J.A. Dieudonné, UMR CNRS 7351, Université de Nice Côte d'Azur

and 6 short presentations.

Organizers:

GDR-ISIS

Nicolas Thome : nicolas.thome@cnam.fr
Jenny Benois-Pineau : jenny.benois-pineau@u-bordeaux.fr
Alexandre Benoit : alexandre.benoit@univ-smb.fr

GDR-IGRV

Romain Vuillemot : romain.vuillemot@ec-lyon.fr
Romain Bourqui : romain.bourqui@u-bordeaux.fr

Talk proposals should be sent to the organizers of the day. Registration capacity, including speakers, is 45 people.

Program

10h00 - 12h30

Invited talk 1:
Christian Wolf (LIRIS). Reasoning vs. bias exploitation: X-raying high-capacity deep networks

Invited talk 2:
Damien Garreau (INRIA Sophia Antipolis/Univ. de Nice). A theoretical analysis of LIME

Talks:

Tristan Gomez, Suiyi Ling, Thomas Fréour, Harold Mouchère (LS2N/Université de Nantes). Improving the interpretability of attention: an accurate and interpretable high-resolution attention model
L. Bouroux, J. Benois-Pineau, R. Bourqui, R. Giot (LABRI/Université de Bordeaux). Multi-layer FEM: an explanation method for CNNs in image classification problems

12h30 - 14h00 Lunch break

14h00 - 16h30

Talks:

Romain Xu-Darme, Georges Quenot, Zakaria Chihani, Marie-Christine Rousset (CEA-LIST/LIG). CASUAL - CASe-based reasoning using Unsupervised Attribute Learning
Olivier Petit, Nicolas Thome, Clément Rambour (CEDRIC, CNAM), Loic Themyr (CEDRIC, CNAM / IRCAD), Toby Collins (IRCAD), Luc Soler (Visible Patient). U-Net Transformer: Self and Cross Attention for Medical Image Segmentation
Christophe Hurter (ENAC, Toulouse). Transparent Artificial Intelligence and Automation to Air Traffic Management Systems
Pierre Dardouillet (LISTIC, Annecy). Explicability of neural networks: application to the semantic segmentation of images for the detection of hydrocarbons on the sea surface

16h30 - 17h30 Discussion and closing

Abstracts of contributions

Title: Reasoning vs. bias exploitation: X-raying high-capacity deep networks

Christian Wolf (LIRIS)

High-capacity deep networks trained on large-scale data are increasingly used to learn agents capable of automatically taking complex decisions on high-dimensional data like text, images, and videos. Certain applications require robustness, that is, the capacity to reliably take the right decisions at the right moments, with high risks associated with wrong decisions. We require these agents to acquire the right kind of reasoning capabilities, i.e. that they take decisions for the reasons the designers had in mind. This is made difficult by the diminishing role experts have in the design and engineering process, as the agents' decisions are in large part dominated by the impact of training data.

In this talk we will address the problem of learning explainable and interpretable models, in particular deep networks. We start by exploring the question of explainability in a broader sense in terms of feasibility and trade-offs.

We then focus on visual reasoning tasks and we target different situations involving the need of the agents to acquire a certain approximation of common knowledge, including robotics and vision-and-language reasoning. We explore this problem in a holistic way and study it from various angles: what are the tasks which lead to emergence of reasoning? How can we evaluate agents and measure reasoning vs. bias exploitation? How can we x-ray neural models and visualize their internal behavior? What are the bottlenecks in learning reasoning?

Title: A theoretical analysis of LIME

Damien Garreau (INRIA Sophia Antipolis/Université de Nice)

Abstract: Since its release in 2016, LIME has emerged as one of the main model-agnostic explainability methods. It is implemented in software toolboxes used by industry practitioners and is also at the source of many extensions. But while LIME is used to explain complicated models, there are few guarantees that it makes sense even on the simplest ones. In this talk, I will present a first theoretical analysis of LIME, focusing on image data. We will see that, despite some satisfying properties, LIME has a few issues which can be problematic in practice.

The main reference for this talk is Damien Garreau and Dina Mardaoui, What does LIME really see in images? ICML, 2021 (available at https://arxiv.org/abs/2102.06307)
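For readers unfamiliar with the method under analysis, the general LIME recipe for images can be sketched as follows: switch superpixels on and off at random, query the model on each perturbed image, and fit a locally weighted linear surrogate whose coefficients serve as the explanation. This is only an illustrative sketch under simplifying assumptions, not the reference implementation or the setting of the talk: the names `lime_image_sketch`, `predict`, `segments` and `bandwidth` are hypothetical, removed superpixels are replaced by the global image mean, and plain weighted least squares is used in place of a regularized regression.

```python
import numpy as np

def lime_image_sketch(predict, image, segments, n_samples=500, bandwidth=0.25, seed=0):
    """Return one linear coefficient per superpixel (a LIME-style explanation).

    predict  : callable mapping a 2-D image to a scalar score (hypothetical model)
    image    : 2-D float array (grayscale, for simplicity)
    segments : 2-D int array of the same shape, superpixel labels 0..d-1
    """
    rng = np.random.default_rng(seed)
    d = int(segments.max()) + 1

    # Interpretable features: z[i, j] = 1 keeps superpixel j, 0 removes it.
    z = rng.integers(0, 2, size=(n_samples, d))
    z[0] = 1  # the first sample is the unperturbed image

    # Assumption: removed superpixels are filled with the global mean intensity.
    baseline = image.mean()
    y = np.empty(n_samples)
    for i in range(n_samples):
        keep = z[i][segments].astype(bool)  # per-pixel keep mask
        y[i] = predict(np.where(keep, image, baseline))

    # Cosine distance between z and the all-ones vector is 1 - sqrt(kept fraction).
    dist = 1.0 - np.sqrt(z.mean(axis=1))
    weights = np.exp(-(dist ** 2) / (2.0 * bandwidth ** 2))

    # Weighted least squares with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), z])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept
```

A superpixel that the model ignores should receive a coefficient close to zero, while one that drives the score receives a large coefficient.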

Talks:

1. Tristan Gomez, Suiyi Ling, Thomas Fréour, Harold Mouchère (LS2N/Université de Nantes)
Title: Improving the interpretability of attention: an accurate and interpretable high-resolution attention model

The prevalence of attention mechanisms has raised concerns about the interpretability of the saliency maps they produce. Although attention gives some indication of how a model works, using it to explain the model's predictions remains difficult. The community is still looking for more interpretable methods to better identify the local active regions that contribute most to the final decision. To improve the interpretability of existing attention models, we propose a new bilinear representative non-parametric attention strategy (BR-NPA) that captures task-relevant interpretable information. The target model is first distilled to obtain higher-resolution intermediate feature maps. Representative features are then grouped according to local pairwise feature similarity, to produce finer and more precise attention maps that highlight the parts of the input relevant to the task. The resulting attention maps are ranked according to the "activity level" of the compound feature, which provides information on the importance of the highlighted regions.

The proposed model can easily be integrated into a wide variety of modern deep models that involve classification. It is also more accurate, faster, and less memory-hungry than the usual neural attention modules. Extensive experiments show that its visual explanations are more comprehensive than those of state-of-the-art visualization models on several tasks, including fine-grained image classification and person re-identification.

2. L. Bouroux, J. Benois-Pineau, R. Bourqui, R. Giot (LABRI/Université de Bordeaux)
Title: Multi-layer FEM: an explanation method for CNNs in image classification problems

In this talk we present an extension of the "Feature Explanation Method" (FEM) for explaining CNNs. The method explains how regions of the image to be classified contribute to the decision of the trained network. The contribution consists of analyzing the feature maps of the different layers of the network and fusing the explanations of each layer in a single backpropagation process. We propose an evaluation methodology that compares the resulting pixel importance maps with visual attention maps recorded during psychovisual experiments:

"guided by the recognition task" (MexCulture database);
"free viewing" (Salicon database).

The method is compared with other state-of-the-art methods for explaining the decisions of trained deep networks.

3. Romain Xu-Darme, Georges Quenot, Zakaria Chihani, Marie-Christine Rousset (CEA-LIST-LIG/INP).
Title: CASUAL - CASe-based reasoning using Unsupervised Attribute Learning

The purpose of CASUAL is to boost the performance and interpretability of image classifiers using case-based reasoning in the context of fine-grained recognition (classification task where all categories are very similar, e.g. distinguish different bird species or car models). Our approach is twofold:

We first perform an unsupervised learning of different semantic parts of the object. This method:
- reduces the cost of building training datasets for fine-grained recognition, as it does not require annotations. In the context of fine-grained recognition, we exploit the high similarities between training images and mine recurring patterns appearing in most images;
- can improve interpretability. After training, we attach a definition to each part detector by identifying the corresponding zones of interest on training images (part visualization and validation is performed using a modified version of integrated gradients).
We hope to improve the process of learning discriminative examples of each class (called "prototypes") by using our semantic part detectors to pre-select relevant parts of the training images.

4. Olivier Petit, Nicolas Thome, Clément Rambour (CEDRIC, CNAM), Loic Themyr (CEDRIC, CNAM / IRCAD), Toby Collins (IRCAD), Luc Soler (Visible Patient)
Title: U-Net Transformer: Self and Cross Attention for Medical Image Segmentation

In this talk, we introduce the U-Transformer network, which combines a U-shaped architecture for image segmentation with self- and cross-attention from Transformers. U-Transformer overcomes the inability of U-Nets to model long-range contextual interactions and spatial dependencies, which are arguably crucial for accurate segmentation in challenging contexts. To this end, attention mechanisms are incorporated at two main levels: a self-attention module leverages global interactions between encoder features, while cross-attention in the skip connections allows a fine spatial recovery in the U-Net decoder by filtering out non-semantic features. Experiments on two abdominal CT-image datasets show the large performance gain brought out by U-Transformer compared to U-Net and local Attention U-Nets. We also highlight the importance of using both self- and cross-attention, and the nice interpretability features brought out by U-Transformer.
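The difference between the two attention types mentioned in this abstract can be illustrated with generic scaled dot-product attention: in self-attention, queries, keys and values all come from the same feature set, whereas in cross-attention the queries come from a different set than the keys and values. The sketch below is a plain NumPy illustration of that mechanism only. It is not the U-Transformer module, which, like any Transformer layer, would also involve learned projections. The function names and the choice of which features act as queries are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention.

    queries : (n_q, d) array
    keys    : (n_k, d) array
    values  : (n_k, d_v) array
    Returns the attended features (n_q, d_v) and the attention weights (n_q, n_k).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # similarity of each query to each key
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ values, weights

# Hypothetical flattened feature maps: one row per spatial position.
rng = np.random.default_rng(0)
encoder_feats = rng.normal(size=(16, 8))
decoder_feats = rng.normal(size=(4, 8))

# Self-attention: every encoder position attends to all encoder positions.
self_out, self_w = attention(encoder_feats, encoder_feats, encoder_feats)

# Cross-attention (assumed roles): decoder positions query the encoder features.
cross_out, cross_w = attention(decoder_feats, encoder_feats, encoder_feats)
```

The weight matrices (`self_w`, `cross_w`) are the quantities usually visualized as attention maps when such models are inspected.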

5. Christophe Hurter (ENAC, Toulouse)
Title: Transparent Artificial Intelligence and Automation to Air Traffic Management Systems

Recently, artificial intelligence (AI) algorithms have attracted increasing interest in various application domains, including Air Traffic Management (ATM). Different AI algorithms, in particular Machine Learning (ML) algorithms, are used to provide decision support in autonomous decision-making tasks in the ATM domain, e.g. predicting air transportation traffic and optimising traffic flows. However, most of the time these automated systems are not accepted or trusted by the intended users, as the decisions provided by AI are often opaque, non-intuitive and not understandable by human operators. Safety is the major pillar of air traffic management, and no black-box process can be inserted in a decision-making process when human life is involved. To address this challenge of transparency of automated systems in the ATM domain, ARTIMATION investigates AI methods for predicting air transportation traffic and optimizing traffic flows based on Explainable Artificial Intelligence (XAI). Here, the explainability of AI models can be provided in air traffic management both in terms of understanding a decision (post hoc interpretability) and in terms of understanding how the model works (transparency). For systems that predict air transportation traffic and optimize traffic flows, ARTIMATION will provide a proof of concept of transparent AI models that includes visualization, explanation, and generalisation with adaptability over time to ensure safe and reliable decision support.

6. Pierre Dardouillet (LISTIC, Annecy)
Title: Explicability of neural networks: application to the semantic segmentation of images for the detection of hydrocarbons on the sea surface.

Explainability methods for deep neural network models are rapidly developing in image analysis on problems such as classification, captioning and visual question answering. The explanations produced are then linked to a rather global question about the content. For more local decision making, such as the semantic segmentation task, the state of the art in explainability is more limited. It is however necessary in some use cases to explain why an area in the image has been associated with a certain class and to identify the information that contributed to the decision.

In this context, we propose an explanation method based on the Kernel SHAP approach. This approach, close to LIME, is model agnostic. We then describe an adaptation to the semantic segmentation problem. It is analyzed in the application framework of ocean pollution monitoring. In particular, we are interested in the explanation of deep neural network models for the segmentation of oil slicks on the ocean surface from SAR type images. This approach allows us to compare the sensitivity of different segmentation models to the spatial neighborhood or even to the introduction of external data in a difficult context.
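As background on why Kernel SHAP is described as close to LIME: both fit a weighted linear surrogate on binary "feature present / feature absent" samples, and Kernel SHAP differs mainly in its choice of sample weights, the Shapley kernel. The sketch below computes that standard kernel only. It does not reproduce the adaptation to semantic segmentation presented in the talk, and the function name is hypothetical.

```python
from math import comb

def shapley_kernel_weight(n_features, subset_size):
    """Shapley kernel weight of a coalition with `subset_size` features present.

    Uses the standard Kernel SHAP formula
        (M - 1) / (C(M, s) * s * (M - s))
    with M = n_features and s = subset_size. The empty and full coalitions
    have infinite weight in theory (in practice they are enforced as
    constraints), so they are rejected here.
    """
    M, s = n_features, subset_size
    if not 0 < s < M:
        raise ValueError("subset_size must satisfy 0 < subset_size < n_features")
    return (M - 1) / (comb(M, s) * s * (M - s))

# With 4 features, coalitions of size 1 or 3 weigh more than those of size 2.
weights = {s: shapley_kernel_weight(4, s) for s in (1, 2, 3)}
```

The kernel is symmetric in the coalition size and puts the most weight on very small and very large coalitions, which are the most informative about individual feature contributions.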
