
Towards pragmatic learning in a context of limited labelled visual data

As a reminder, in order to guarantee access to the meeting rooms for all registrants, registration for meetings is free of charge but mandatory.

Registration for this meeting is closed.

Registration

34 GdR ISIS members and 13 non-members are registered for this meeting.
Room capacity: 50 people.

Announcement

Abstract

Deep convolutional neural networks have produced many impressive results, for example on image classification tasks such as object recognition or scene classification. However, for a deep convolutional network to successfully learn to recognise visual categories, thousands of training examples per target category must be collected and manually labelled, and (generally iterative) optimisation algorithms that are extremely expensive in computing resources (hundreds or even thousands of GPU hours) must be run on them. Moreover, if the model is later to be extended to other categories, enough new training data must be collected for the new categories and the whole training procedure must be repeated. These heavy requirements have led researchers to investigate lines of research that are less demanding in training data. The idea is to draw inspiration from the human visual system, which effortlessly learns new concepts from only a few examples and recognises them reliably. Reproducing this behaviour in learning-based artificial vision systems is one of the goals of current research, with many applications, notably real-world vision.

The goal of this one-day workshop is to review the state of the art in machine-learning-based computer vision with very few labelled data, on the following topics: detection, recognition, classification, segmentation.

Call for contributions

A call for contributions is open on the topics listed above. Researchers wishing to present their work are invited to submit a one-page abstract to the organisers before 30 October.

Invited speakers

Organizers

Program

9h00: Welcome and introduction to the day
By Anissa MOKRAOUI, Mustapha LEBBAH, Hanene AZZAG
9h20-10h00: IFSS-Net: Interactive Few-Shot Siamese Network for Faster Muscle Segmentation and Propagation in Volumetric Ultrasound
Dawood AL CHANTI, GIPSA-lab / Grenoble
10h00-10h40: Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation
Désiré SIDIBE, IBISC Université Evry Val d'Essonne
10h40-11h00: Coffee break -- Poster session
11h00-11h20: Micro-Unet: a Light Segmentation Architecture for Small Microscopy Datasets of Cerebral Organoid Bright Field Images
Clara BREMOND MARTIN, PhD Student, Université de Cergy-Pontoise - ETIS Research Center
11h20-12h00: Handling new target classes in semantic segmentation
Tuan-Hung VU, VALEO
12h00-12h20: Improving Few-Shot Learning through Multi-task Representation Learning Theory
Quentin BOUNIOT, PhD Student, CEA LIST, Saclay
12h20-14h00: Lunch break -- Poster session
14h00-14h40: One-shot Learning Landmarks Detection
Zihao WANG, INRIA, Nice
14h40-15h00: Spatial Contrastive Learning for Few-Shot Classification
Yassine OUALI, PhD, Université Paris-Saclay, CentraleSupélec, MICS
15h00-15h20: Coffee break -- Poster session
15h20-16h00: Zero-shot Learning with Deep Neural Networks for Object Recognition
Hervé LE BORGNE, CEA LIST, Saclay
16h00-16h20: Prototypical Faster R-CNN for few-shot object detection on aerial images
Pierre LE JEUNE, PhD Student, COSE, Université Sorbonne Paris Nord, L2TI/LIPN
16h20-16h30: Closing of the day

Abstracts of the contributions

IFSS-Net: Interactive Few-Shot Siamese Network for Faster Muscle Segmentation and Propagation in Volumetric Ultrasound
Dawood AL CHANTI, GIPSA-lab / Grenoble

Quantification of muscle volume is a useful biomarker for degenerative neuromuscular disease progression or sports performance. Measuring muscle volume often requires the segmentation of 3D images. While Magnetic Resonance (MR) is the modality of preference for imaging muscles, 3D Ultrasound (US) offers a real-time, inexpensive, and portable alternative. The motivation of our work is to assist the segmentation and volume computation of the lower-limb muscles from 3D freehand ultrasound volumes. We propose a novel deep learning segmentation and propagation method for 3D US data, which requires only a few expert-annotated slices per 3D volume (on average 48 annotations out of 1400 slices) and leverages unannotated sub-volumes using sequential pseudo-labelling. To produce a fast and accurate muscle segmentation, suitable for reliable volume computation, we design a minimal interactive setting. In practice, we design a Siamese network to capture a common feature representation between ultrasound and mask sub-volumes. The reference can either come from an annotated part of the volume or from prior predictions. To guarantee model convergence with limited annotated data, we propose a decremental learning strategy. We validate our approach for the segmentation, label propagation, and volume computation of the lower-limb muscles, namely the Gastrocnemius Medialis (GM), the Gastrocnemius Lateralis (GL), and the Soleus (SOL), on a dataset of 44 subjects. We demonstrate our method's capability to learn from a few annotations under a simulated weakly-supervised regime, keeping only 3.5% of the annotations.
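
As a rough illustration of the propagation mechanism described above, the sketch below (not the authors' code; layer sizes, 2D slice-wise processing, and all shapes are assumptions) shows a shared encoder embedding both an ultrasound slice and a reference mask, with each prediction fed back as the pseudo-label for the next slice.

```python
import torch
import torch.nn as nn

class SiameseSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(          # shared branch: image and mask
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(32, 1, 1)     # fuse both branches into logits

    def forward(self, image, ref_mask):
        feats = torch.cat([self.encode(image), self.encode(ref_mask)], dim=1)
        return self.decode(feats)

model = SiameseSegmenter()
volume = torch.rand(100, 1, 64, 64)           # 100 slices of a 3D US volume
mask = torch.rand(1, 1, 64, 64).round()       # one expert-annotated slice
for s in range(100):                          # propagate slice by slice
    logits = model(volume[s:s + 1], mask)
    mask = torch.sigmoid(logits).round().detach()  # pseudo-label for next slice
```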


Multiscale Attention-Based Prototypical Network for Few-Shot Semantic Segmentation
Désiré SIDIBE, IBISC Université Evry Val d'Essonne

Few-shot semantic segmentation refers to the pixel-level prediction of new categories on a test set, given only a few labelled examples. In this talk, we will highlight the challenges and specificities of this task compared to other few-shot learning tasks such as image classification, and present the main approaches from the literature. We will also present a method that takes advantage of complementary depth information to extend RGB-centric approaches.
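
To make the prototypical mechanism concrete, here is a minimal sketch, assuming masked average pooling over support features and cosine similarity for pixel-wise classification; the shapes and the pooling choice are illustrative assumptions, not the talk's exact method.

```python
import torch
import torch.nn.functional as F

def prototypes(support_feats, support_masks):
    # support_feats: (S, C, H, W); support_masks: (S, K, H, W) binary, K classes
    num = torch.einsum('schw,skhw->kc', support_feats, support_masks)
    den = support_masks.sum(dim=(0, 2, 3)).clamp(min=1e-6).unsqueeze(1)
    return num / den                             # (K, C), one prototype per class

def segment(query_feats, protos):
    # query_feats: (C, H, W) -> per-pixel labels by cosine similarity
    q = F.normalize(query_feats, dim=0)
    p = F.normalize(protos, dim=1)
    scores = torch.einsum('kc,chw->khw', p, q)   # (K, H, W)
    return scores.argmax(dim=0)                  # (H, W) predicted class map

feats = torch.randn(2, 32, 16, 16)
masks = torch.randint(0, 2, (2, 3, 16, 16)).float()
print(segment(torch.randn(32, 16, 16), prototypes(feats, masks)).shape)
```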


Micro-Unet: a Light Segmentation Architecture for Small Microscopy Datasets of Cerebral Organoid Bright Field Images
Clara BREMOND MARTIN, PhD Student, Université de Cergy-Pontoise -- ETIS Research Center

To further reduce training time and improve results, we propose Micro-Unet, a light U-Net architecture based on Mini-Unet. We compare U-Net with these two light architectures for several data augmentation strategies of cerebral organoid bright-field images. In each case, we perform a leave-one-out cross-validation on 80 images (40 original images and 40 generated).
While achieving segmentation accuracy similar to U-Net trained on data from an optimized augmentation strategy, the proposed Micro-Unet is more robust, regardless of the training dataset (GAN-based or classic augmentation). Mini-Unet gives the least accurate segmentation results.
In this study, by comparing augmentation strategies, we show that a very light version of U-Net segments quite successfully and is robust to various small training datasets. In the near future, we plan to test the model on other datasets of cerebral organoid bright-field images to further validate our approach.


Handling new target classes in semantic segmentation
Tuan-Hung VU, VALEO

Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this work, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. We present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time, so-called generalized zero-shot classification. We further define and address a novel domain adaptation (DA) problem in semantic scene segmentation, where the target domain not only exhibits a data distribution shift w.r.t. the source domain, but also includes novel classes that do not exist in the latter. Aiming at explicit test-time prediction for these new classes, we propose a framework, BudaNet, that leverages domain adaptation and zero-shot learning techniques to enable "boundless" adaptation in the target domain. This is joint work with Maxime Bucher, Matthieu Cord and Patrick Pérez.
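
The generative component can be sketched as follows, assuming a word-embedding dimension of 300 and a small MLP generator (both assumptions for illustration): synthetic features conditioned on an unseen class embedding can then serve as training data for that class's classifier.

```python
import torch
import torch.nn as nn

emb_dim, noise_dim, feat_dim = 300, 32, 256
generator = nn.Sequential(                 # noise + class embedding -> feature
    nn.Linear(emb_dim + noise_dim, 512), nn.ReLU(),
    nn.Linear(512, feat_dim))

def fake_features(class_embedding, n):
    # class_embedding: (emb_dim,), e.g. a word2vec vector for the class name
    cond = class_embedding.expand(n, -1)
    noise = torch.randn(n, noise_dim)
    return generator(torch.cat([cond, noise], dim=1))  # (n, feat_dim)

unseen_emb = torch.randn(emb_dim)
feats = fake_features(unseen_emb, 64)      # synthetic training set for the class
```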


Improving Few-Shot Learning through Multi-task Representation Learning Theory
Quentin BOUNIOT, PhD Student, CEA LIST, Saclay

We consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the generalization capacity of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on classic few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
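
As one plausible instance of such a spectral term (the exact regularizer used in the work may differ), the sketch below penalizes the largest singular value of the matrix of episode embeddings, added to a stand-in task loss.

```python
import torch

def spectral_penalty(embeddings):
    # embeddings: (N, D) feature matrix of one training episode
    return torch.linalg.svdvals(embeddings)[0]  # largest singular value

z = torch.randn(25, 64, requires_grad=True)     # e.g. a 5-way, 5-shot episode
task_loss = z.pow(2).mean()                     # stand-in for the few-shot loss
(task_loss + 0.1 * spectral_penalty(z)).backward()
```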


One-shot Learning Landmarks Detection
Zihao WANG, INRIA, Nice

Landmark detection in medical images is a mainstay of many clinical algorithms and applications. Learning-based landmark detection is now a major successful methodology for many types of object detection. However, learning-based approaches usually need a large annotated dataset to train the learning models. In this presentation, I will introduce an automatic one-shot learning-based landmark detection approach for identifying landmarks in 3D volume images. A convolutional neural network-based iterative object localization method, combined with a registration framework, is applied for automatic target organ localization and landmark matching. The evaluation results show that the proposed method is robust in convergence, accurate, and reliable for clinical usage.
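
The registration-based matching step can be illustrated as follows: given a transform aligning the annotated volume to a new one (here a hypothetical affine matrix; the registration itself is omitted), the known landmarks are simply mapped through it.

```python
import numpy as np

def propagate_landmarks(landmarks, affine):
    # landmarks: (N, 3) points in atlas space; affine: (4, 4) atlas -> target
    homog = np.hstack([landmarks, np.ones((len(landmarks), 1))])
    return (homog @ affine.T)[:, :3]

atlas_landmarks = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
affine = np.eye(4)
affine[:3, 3] = [2.0, -1.0, 0.5]   # assumed transform from registration
print(propagate_landmarks(atlas_landmarks, affine))
```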


Spatial Contrastive Learning for Few-Shot Classification
Yassine OUALI, PhD, Université Paris-Saclay, CentraleSupélec, MICS

In this work, we explore contrastive learning for few-shot classification, proposing to use it as an additional auxiliary training objective that acts as a data-dependent regularizer to promote more general and transferable features. In particular, we present a novel attention-based spatial contrastive objective to learn locally discriminative and class-agnostic features. As a result, our approach overcomes some of the limitations of the cross-entropy loss, such as its excessive discrimination towards seen classes, which reduces the transferability of features to unseen classes. With extensive experiments, we show that the proposed method outperforms state-of-the-art approaches, confirming the importance of learning good and transferable embeddings for few-shot learning.
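
Here is a minimal sketch of a spatial contrastive objective of this kind, assuming an InfoNCE-style loss where each spatial location of one augmented view is matched to the same location in the other view; the temperature and feature shapes are illustrative, and this is not the talk's exact attention-based formulation.

```python
import torch
import torch.nn.functional as F

def spatial_infonce(f1, f2, tau=0.1):
    # f1, f2: (C, H, W) feature maps of two augmented views of the same image
    z1 = F.normalize(f1.flatten(1).t(), dim=1)   # (H*W, C) location embeddings
    z2 = F.normalize(f2.flatten(1).t(), dim=1)
    logits = z1 @ z2.t() / tau                   # (H*W, H*W) similarities
    target = torch.arange(len(z1))               # positives on the diagonal
    return F.cross_entropy(logits, target)

loss = spatial_infonce(torch.randn(64, 8, 8), torch.randn(64, 8, 8))
```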


Zero-shot Learning with Deep Neural Networks for Object Recognition
Hervé LE BORGNE, CEA LIST, Saclay

Zero-shot learning (ZSL) deals with the ability to recognize objects without any visual training sample. To counterbalance this lack of visual data, each class to recognize is associated with a semantic prototype that reflects the essential features of the object. The general approach is to learn a mapping from visual data to semantic prototypes, then use it at inference to classify visual samples from the class prototypes only. Different settings of this general configuration can be considered depending on the use case of interest, in particular whether one only wants to classify objects that have not been employed to learn the mapping or whether one can use unlabelled visual examples to learn the mapping.
We present an overview of the ZSL domain, its principle, and how it has evolved over the past 10 years towards a less biased and more realistic evaluation, as well as the main approaches proposed in the field. Then, we present recent work addressing various aspects of ZSL, namely the realism of "generalized zero-shot learning", different methods to build class prototypes at large scale, and a method to benefit from unlabelled samples from the unseen classes.
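
The classical pipeline described above can be sketched as follows, with an untrained linear projection standing in for the learned visual-to-semantic mapping and random vectors standing in for the class prototypes; all dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

proj = nn.Linear(2048, 300)            # visual features -> semantic space
prototypes = torch.randn(12, 300)      # e.g. word embeddings, one per class

def classify(visual_feats):
    s = F.normalize(proj(visual_feats), dim=1)
    p = F.normalize(prototypes, dim=1)
    return (s @ p.t()).argmax(dim=1)   # nearest prototype = predicted class

print(classify(torch.randn(4, 2048)))
```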


Prototypical Faster R-CNN for few-shot object detection on aerial images
Pierre LE JEUNE, PhD Student, COSE, Université Sorbonne Paris Nord, L2TI

Few-shot object detection is a challenging task in computer vision, as it requires detecting novel classes from only a few annotated examples. In this talk, we introduce a new method based on Faster R-CNN and representation learning: Prototypical Faster R-CNN. The main idea is to combine the adaptability of prototypical networks with the high performance of Faster R-CNN on detection tasks. Each region of interest is mapped into an embedding space, and these regions are classified by comparing the distances between their representations and the class prototypes. This approach is limited by the quality of the prototypes, and intra-class variance is hard to overcome this way. An attention mechanism between the query image and the class prototypes is a possible solution: it can filter out irrelevant information (e.g. background-related information) contained in the RoIs or in the class prototypes in order to improve the matching.
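
A minimal sketch of this classification head (backbone, RPN, and training omitted; shapes assumed): each RoI embedding is scored against the class prototypes via a softmax over negative Euclidean distances.

```python
import torch

def roi_class_probs(roi_embeddings, prototypes):
    # roi_embeddings: (R, D), one vector per region of interest
    # prototypes:     (K, D), one vector per class
    dists = torch.cdist(roi_embeddings, prototypes)  # (R, K)
    return (-dists).softmax(dim=1)                   # closer prototype = higher prob

probs = roi_class_probs(torch.randn(5, 128), torch.randn(3, 128))
```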


Poster Session


Few-shot image-to-image translation with manifold deformation
Fabio PIZZATI, PhD Student, INRIA, Vislab

Image-to-image translation (i2i) methods often require a great amount of data in order to learn a mapping between source and target domains. In this work, instead, we propose a novel framework for few-shot i2i able to learn a context-aware representation of the target domain using only a few, or even one, target images. Our extensive experiments show we outperform the state of the art on all metrics and multiple tasks.


Between domain adaptation and learning from few labelled data: the new challenge of Support-Query Shift
Etienne BENNEQUIN, PhD Student, Université Paris-Saclay, Sicara

In recent years, much attention has been paid to the problem of learning from few labelled data (Few-Shot Learning), and some methods now show good results in the visual recognition of new objects from a small handful of examples. However, these methods, as well as the benchmarks used to compare them, assume that the images whose class we want to predict (queries) are drawn from the same distribution as the few example images (support).
In a large number of application cases, this assumption does not hold. We formalize this problem of Few-Shot Learning under Support-Query Shift, propose a testbed to evaluate methods on this new problem, and introduce Transported Prototypes, the first method dedicated to the Support-Query Shift problem. We build on Optimal Transport and Prototypical Networks, two approaches from domain adaptation and few-shot learning respectively. We show that state-of-the-art approaches in classical few-shot learning suffer a huge drop in performance under Support-Query Shift, and we close part of this gap with Transported Prototypes.

Date: 2021-11-26

Venue: MSH Paris Nord (Salle panoramique, 20 avenue George Sand, 93210 La Plaine Saint Denis)


Scientific themes:
B - Image and Vision


