3D Vision and Learning

As a reminder, to guarantee that all registered participants have access to the meeting rooms, registration for meetings is free but mandatory.

Registration for this meeting is closed.

Registrations

78 members of GdR ISIS and 55 non-members are registered for this meeting.
Room capacity: 150 people.

Announcement

The capacity of this meeting has been extended to 150 people. The connection link will be sent to registered participants the day before or the morning of the meeting.

Given the current Covid pandemic, the meeting will be held by videoconference. However, for technical reasons related to the number of simultaneous connections, registration for the meeting is free but mandatory. Login details will be sent by e-mail to registered participants the day before or the morning of the meeting.

This half-day meeting, "3D Vision and Learning", will focus on what learning methods, and deep learning in particular, can bring to 3D vision and, conversely, on how to introduce aspects of 3D geometry into learning techniques.

For example, during this meeting we will seek to answer the following questions: For which aspects of 3D vision are learning methods suited, and how should they be applied? Are there still applications for which purely geometric methods remain better suited, and why? How can scene dynamics or object deformation be taken into account in deep learning methods?

This meeting is jointly organized by themes B and T of GdR ISIS.

It will take place on 26 May from 14h to 17h and will include two invited speakers:

Mathieu Aubry, ENPC, http://imagine.enpc.fr/~aubrym/
Title: "Deep deformations estimation for 3D scenes representation"
Torsten Sattler, Chalmers University of Technology, https://www.chalmers.se/en/Staff/Pages/torsat.aspx
Title: "Careful Engineering makes a Difference for 3D Vision (and Why You Might Be Affected, Too)"

We are also issuing a call for contributions. Those wishing to present their work (15 minutes + 5 minutes of questions) are invited to send their proposal (title and half-page abstract) by e-mail to the meeting organizers before 20 May 2020:

Adrien Bartoli, Institut Pascal UMR CNRS 6602, adrien.bartoli@gmail.com
Cédric Demonceaux, VIBOT ERL CNRS 6000, ImViA, cedric.demonceaux@u-bourgogne.fr
Vincent Lepetit, Imagine-ENPC, vincent.lepetit@enpc.fr

Program

14h - 14h45 : Mathieu Aubry (LIGM-IMAGINE, ENPC) : "Deep deformations estimation for 3D scenes representation"

14h45 - 15h15 : Maximilian Jaritz (RITS-INRIA) : "xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation"

15h15 - 15h45 : Clara Fernandez (VIBOT-ImViA, Université Bourgogne Franche-Comté ; Robotics Lab, University of Zaragoza) : "Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets"

15h45 - 16h30 : Torsten Sattler (Chalmers University of Technology) : "Careful Engineering makes a Difference for 3D Vision (and Why You Might Be Affected, Too)"

16h30 - 17h00 : Karim Makki (LIS, Aix Marseille University) : "Towards a characterization of highly deformable 3D shape dynamics through a combination of Riemannian and Eulerian geometries"

Abstracts of the contributions

14h - 14h45 : Mathieu Aubry (LIGM-IMAGINE, ENPC) : "Deep deformations estimation for 3D scenes representation"

Abstract:

I will present an overview of two lines of work we pursued over the last couple of years on trying to represent the 3D structure of the world with neural networks. In a first part, I will discuss how neural networks can be used to match and reconstruct 3D surfaces by learning meaningful families of deformations. I will then show how we can relate this shape representation framework to object pose estimation. In a second part, I will present a few insights on deep object pose estimation, and argue that deep approaches, because of their simplicity and robustness, are well suited to applications in real scenarios and in particular robotics. I believe bringing together these two lines of work is a promising direction toward more effective scene representations.

References: AtlasNet for shape reconstruction, 3D-CODED for shape matching, AtlasNet V2 for part discovery and positioning. Pose estimation for arbitrary objects, synthetic training and advanced application for robotics.

14h45 - 15h15 : Maximilian Jaritz (RITS-INRIA) : "xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation"

Abstract:

Unsupervised Domain Adaptation (UDA) is crucial to tackle the lack of annotations in a new domain. There are many multi-modal datasets, but most UDA approaches are uni-modal. In this work, we explore how to learn from multi-modality and propose cross-modal UDA (xMUDA) where we assume the presence of 2D images and 3D point clouds for 3D semantic segmentation. This is challenging as the two input spaces are heterogeneous and can be impacted differently by domain shift. In xMUDA, modalities learn from each other through mutual mimicking, disentangled from the segmentation objective, to prevent the stronger modality from adopting false predictions from the weaker one. We evaluate on new UDA scenarios including day-to-night, country-to-country and dataset-to-dataset, leveraging recent autonomous driving datasets. xMUDA brings large improvements over uni-modal UDA on all tested scenarios, and is complementary to state-of-the-art UDA techniques. arXiv: https://arxiv.org/abs/1911.12676 (CVPR 2020)

15h15 - 15h45 : Clara Fernandez (VIBOT-ImViA, Université Bourgogne Franche-Comté ; Robotics Lab, University of Zaragoza) : "Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets"

Abstract:

Understanding the scene geometry (e.g., depth, shape, pose, motion) is essential for many vision tasks. Building learning architectures that leverage the geometric properties of the scene can improve and simplify the learning process compared to relying on semantic representations alone. Geometry becomes especially useful for unsupervised learning, where no ground-truth labels are available. Unsupervised learning is particularly exciting because getting large amounts of labeled data is not always possible and it offers a far more scalable framework. In this regard, I would like to talk about our latest work in collaboration with the Computer Vision Laboratory at ETH Zurich (https://arxiv.org/pdf/2003.07619.pdf). In this work, we aim at learning 3D keypoints from a collection of objects of some category, so that they meaningfully represent object shape and their correspondences can be simply established order-wise across all objects. We propose to solve this problem, in an unsupervised manner, by modeling keypoints with non-rigidity, based on symmetric linear basis shapes. We do not assume the plane of symmetry to be known and consider two different priors: instance-wise symmetry (rigid objects) and symmetric deformation space (non-rigid objects). Using multiple categories from four benchmark datasets, we demonstrate that the keypoints discovered by our method have one-to-one ordered correspondences and are geometrically and semantically consistent.

15h45 - 16h30 : Torsten Sattler (Chalmers University of Technology) : "Careful Engineering makes a Difference for 3D Vision (and Why You Might Be Affected, Too)"

Abstract:

3D reconstruction algorithms such as Structure-from-Motion and SLAM have the potential to provide powerful training signals for many computer vision problems, including semantic segmentation, stereo and single view depth estimation, and visual localization, by estimating the positions of cameras in a scene together with the underlying 3D structure. Naturally, obtaining accurate ground truth through 3D reconstruction requires careful engineering. Since the ground truth is generated by an imperfect algorithm, using 3D reconstruction systems to generate ground truth also requires an understanding of potential biases "hidden" in these systems. This talk focuses on both problems, i.e., the importance of careful engineering and potential biases "hidden" in 3D reconstruction systems.

The talk consists of three parts. In the first part, based on a paper to be presented as an oral at CVPR 2020, we consider the problem of accurate camera calibration, which is an important pre-requisite for many 3D computer vision and robotics applications. We show that the parametric camera models predominantly used in practice are not able to fully accurately model many real-world cameras, resulting in biases. We show that less-known generic camera models can be used to avoid this bias.

In the second part, based on a paper presented as an oral at CVPR 2019, we consider the problem of highly accurate RGB-D SLAM. We talk about the engineering tricks, both in software and hardware, necessary for high accuracy.

In the last part, based on ongoing work, we consider the problem of camera re-localization. We show that with careful engineering, classical feature-based methods are able to achieve performance similar to that of state-of-the-art learning-based methods. In addition, we show that there can be hidden biases in the way ground truth for visual localization is obtained, which impacts which methods perform best on a given dataset.

16h30 - 17h00 : Karim Makki (LIS, Aix Marseille University) : "Towards a characterization of highly deformable 3D shape dynamics through a combination of Riemannian and Eulerian geometries"

Abstract:

In this work, we propose a method for characterizing 3D shapes from point clouds and we show a direct application on a study of organ temporal deformations. As an example, we characterize the behavior of a bladder during a forced respiratory motion with a reduced number of 3D surface points: first, a set of equidistant points representing the vertices of a quadrilateral mesh for the surface in the first time frame is tracked throughout a long dynamic MRI sequence using a Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. Second, a novel geometric feature which is invariant to translation, scaling, and rotation is proposed for characterizing the temporal organ deformations by employing an Eulerian Partial Differential Equations (PDEs) methodology. We demonstrate the robustness of our feature on both synthetic 3D shapes and realistic dynamic MRI data portraying the bladder deformation during forced respiratory motions. Promising results are obtained, showing that the proposed feature may be useful for several computer vision applications such as medical imaging, aerodynamics and robotics.
