

Announcement

25 April 2024

Leveraging multimodal unlabeled data for visual scene understanding


Category: PhD student


PhD thesis on multimodal visual data analysis for scene understanding at IBISC lab, Univ Evry - Paris Saclay

 

Description

Multimodal learning, that is, learning from different modalities, has proven to be an effective way of leveraging multiple sources of information for better decision making [1]. In particular, in the context of computer vision, multimodal visual data provide complementary information about the same scene, thus enhancing the accuracy and robustness of complex scene analysis. Several approaches have been proposed in the literature, mainly focusing on the best fusion strategy for the different modalities [2].
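To make the fusion question concrete, the sketch below contrasts two standard strategies, early (feature-level) and late (decision-level) fusion, on synthetic features from two hypothetical modalities. All names, dimensions, and classifiers here are illustrative placeholders, not part of the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features extracted from two modalities of the same scene,
# e.g. an RGB image and a depth map (names and sizes are illustrative).
rgb_feat = rng.normal(size=128)
depth_feat = rng.normal(size=128)

# Early fusion: concatenate modality features before the decision layer.
early = np.concatenate([rgb_feat, depth_feat])  # shape (256,)

def softmax(z):
    """Numerically stable softmax over class scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Late fusion: each modality has its own classifier; the class
# probabilities are combined only at decision time.
W_rgb = rng.normal(size=(10, 128))    # per-modality linear classifiers
W_depth = rng.normal(size=(10, 128))  # (10 hypothetical classes)
late = 0.5 * (softmax(W_rgb @ rgb_feat) + softmax(W_depth @ depth_feat))
```

Early fusion lets the classifier model cross-modal interactions but requires aligned features; late fusion is more robust to a missing or degraded modality, which is one trade-off the fusion literature [2] explores.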

However, most, if not all, machine learning methods used in practice are supervised approaches that rely on annotated training data. While these supervised techniques have shown excellent results, it has also been shown in the literature that they often perform poorly under domain and distribution shifts [3, 4, 5]. From a theoretical point of view, the key assumption of classic supervised methods is the i.i.d. assumption, which postulates that the training and test data come from the same distribution. When this assumption is violated, we face an out-of-distribution (OOD) generalization problem, which must be addressed for learning methods to perform well. In real-world scenarios, the environment in which the vision system is deployed may diverge drastically from the training environment, leading to a domain and distribution shift. Moreover, the test distribution is typically unknown during training, and the system may encounter new, unseen categories at test time, for example when operating in unconstrained or unseen environments.

Moreover, collecting and labeling large amounts of data is a difficult and costly task, so fully supervised methods cannot be used in many applications. Recently, self-supervised methods, and in particular contrastive learning approaches, have been successfully developed in different application areas, including visual scene analysis and understanding [6, 7].
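The core mechanism of contrastive self-supervision can be sketched without any labels: embeddings of two views of the same scene (e.g. two modalities, or two augmentations) are pulled together while all other pairs in the batch act as negatives. The following is a minimal numpy version of the widely used InfoNCE loss; the batch size, dimensions, and temperature are arbitrary choices for illustration:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE contrastive loss for a batch of paired embeddings.

    z1[i] and z2[i] are two views of the same sample (the positive
    pair); every other pairing in the batch serves as a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    # Cross-entropy with the matching index i as the target for row i.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
# Identical views: positives are maximally similar, so the loss is low;
# unrelated views give a high loss.
low = info_nce(z, z)
high = info_nce(z, rng.normal(size=(8, 32)))
```

How to define the views when the "augmentations" are entire modalities of the same scene is exactly the kind of question this project addresses.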

On the other hand, when training a deep learning model, training examples are usually presented to the model in random order, ignoring the varying complexity of data samples and the current learning status of the model. Yet it has been shown that carefully selecting the order in which training data are presented improves the generalization capacity and convergence rate of the model [8]. This learning strategy is known as curriculum learning and consists of training a model from easier to harder examples. The basic idea is to train a machine learning model with easier data subsets (or easier subtasks) and gradually increase the difficulty level of the data (or subtasks) until the whole training dataset is used [9].
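The easy-to-hard scheduling described above can be sketched in a few lines, assuming some per-sample difficulty score is available (how to obtain such a score is itself one of the open questions of this project; the loss of a pretrained model is one common proxy):

```python
import numpy as np

def curriculum_stages(scores, n_stages=3):
    """Split training indices into cumulative easy-to-hard stages.

    `scores` are per-sample difficulty estimates (higher = harder).
    Stage k trains on the union of all stages up to k, so the model
    gradually grows toward the whole dataset, as in [9].
    """
    order = np.argsort(scores)                # easiest samples first
    chunks = np.array_split(order, n_stages)
    cumulative, seen = [], np.array([], dtype=int)
    for chunk in chunks:
        seen = np.concatenate([seen, chunk])
        cumulative.append(seen.copy())
    return cumulative

# Toy difficulty scores for six samples (illustrative values).
difficulty = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.3])
stages = curriculum_stages(difficulty, n_stages=3)
```

In the multimodal setting, the scheduling must reconcile per-modality scores, since, as noted below, a sample that is easy in one modality may be hard in another.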

Thus, the goal of this project is threefold: i) avoid the cost and burden of dense annotations by employing self-supervised learning approaches, ii) benefit from multimodal information to tackle difficult recognition cases, and iii) explore the use of curriculum learning for better training.

In particular, we aim to leverage multimodal information to answer the following questions:

- how to use data from different modalities for self-supervised learning?

- how to measure difficulty? That is, how to decide the relative “easiness” of a training example?

- how to select examples? That is, how to decide the order in which data are used during training, given that a data sample considered “easy” in one modality can be “difficult” in another and vice versa?

This research topic therefore opens new questions not previously addressed in the literature and can provide new ideas and methods for different tasks such as segmentation, detection, or recognition.

 

Applications

Applicants with a strong background in machine learning, computer vision, and related topics are invited to apply.

A master's degree in Computer Science, Mathematics, or a related field is required.

Please send your resume, a motivation letter, and transcripts of your Bachelor's and/or Master's programs to:

- Prof. Désiré Sidibé, drodesire.sidibe@univ-evry.fr

- Associate Prof. Dominique Fourer, dominique.fourer@univ-evry.fr

 

Application deadline: May 10th, 2024.

 


(c) GdR IASIS - CNRS - 2024.