Generative models: Control and (mis)Usage

Please note that, to guarantee access to the meeting rooms for all registered participants, registration for meetings is free but mandatory.

Registration for this meeting is closed.

Registrations

68 members of the GdR ISIS and 24 non-members are registered for this meeting.
Room capacity: 100 people.

Announcement

Since the advent of generative adversarial networks (Goodfellow et al., 2014) and variational autoencoders (Kingma and Welling, 2014), many neural architectures have been proposed to improve data-driven image synthesis. Once the most recent models have been trained on a dataset, sampling randomly in their latent space usually produces strikingly realistic images. Beyond the increasing quality of the resulting images, these models also offer more disentangled representations, which allow a deeper interpretation of their internal structure. In particular, some works have shown that moving latent codes along a certain path in the latent space can vary the semantic attributes of the corresponding generated images.
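
As a minimal illustration of the latent-path idea described above, the sketch below moves a latent code along a fixed direction. The function name `edit_latent`, the toy 4-dimensional code, and the direction are all hypothetical; in practice the direction would be obtained by analysing a trained generator, and each edited code would be passed to that generator to produce an image.

```python
import math

def edit_latent(z, direction, alpha):
    """Return z moved by `alpha` along the unit-normalized `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    return [zi + alpha * ui for zi, ui in zip(z, unit)]

# Hypothetical 4-dimensional latent code and attribute direction.
z = [0.5, -1.0, 0.25, 2.0]
direction = [0.0, 3.0, 0.0, 4.0]

# A linear path in latent space: one edited code per step size.
path = [edit_latent(z, direction, alpha) for alpha in (-2.0, 0.0, 2.0)]
```

A step size of 0 leaves the code unchanged, and opposite signs of `alpha` are expected to push the attribute in opposite directions when the direction is meaningful for the generator.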

Such an ability to semantically edit images is useful for various real-world tasks, including artistic visualization, design, photo enhancement, inpainting and retouching, film post-production and targeted data augmentation. This line of work is also of theoretical interest, as it helps to better understand the internal structure of the models and the learning procedure.

This meeting aims to bring together researchers and students to discuss recent theoretical and practical advances in generative models. Topics include, but are not limited to:

neural architectures for image generation and editing
GAN inversion
disentangled representation
conditional image synthesis
style transfer
cross-modal content generation
image editing or generation with non-RGB images (infrared, X-ray, ultrasound...)
image editing with few data, unsupervised and self-supervised approaches
evaluation of image editing and generation
applications of image editing and generation with generative models

Two invited speakers have agreed to give a talk:

Symeon (Akis) Papadopoulos will present the activity of MeVer (https://mever.iti.gr/), a research team that develops technologies and services for the detection of media-based disinformation, in particular fake videos
Guillaume Le Moing, the first author of Semantic Palette (https://arxiv.org/abs/2106.01629) and Context-aware Controllable Video Synthesis (https://16lemoing.github.io/ccvs/)

If you are interested in presenting your work at this meeting, send an email to the organizers before May 12th with (1) the title of the presentation, (2) the list of authors, and (3) an abstract of the work presented.

Organizers:

Access

The meeting will be held both online and in person at 7, rue Guy Môquet, 94800 Villejuif (CNRS), depending on the health situation (conference room, Building L, level -1).

Metro: line 7, Villejuif Paul Vaillant-Couturier station

Bus: 131, 162, 172, 180

Only people registered for the meeting are allowed to access the Villejuif campus. The video conference is open to a wider audience.

For the video conference, you need to register at https://app.livestorm.co/list-diasi/generative-models-control-usage
You will receive the link promptly by email. Warning: there are two separate sessions (morning and afternoon).

Programme

9h45: intro to the meeting

10h - 10h45: [invited talk] Guillaume Le Moing: Controllable image and video synthesis, two use cases

11h - 12h30: presentations

Thibaut Issenhuth: Learning disconnected manifolds with generative adversarial networks.
Perla Doubinsky: Multi-Attribute Balanced Sampling for Disentangled GAN Controls
Pedro Coutinho: Disentangling Multiple Specified Latents using Variational Autoencoder and Total Correlation Loss

12h30 - 13h45: free time for lunch

13h45 - 16h: presentations

Javiera Castillo-Navarro: Energy-based models in Earth observation: from generation to semi-supervised learning
Fabio Pizzati: Few-shot image-to-image translation with manifold deformation
Jelica Vasiljevic: CycleGAN for virtual stain transfer: is seeing really believing?
Jean Prost: Diverse super-resolution with deep hierarchical variational autoencoders
Farideh Bazangani: FDG-PET to T1 weighted MRI translation with 3D Elicit Generative Adversarial Network (E-GAN)

16h - 16h15: break

16h15 - 17h: [invited talk] Symeon (Akis) Papadopoulos: DeepFakes: Technology, Methods, Trends and Challenges

Abstracts of the contributions

[invited] Guillaume Le Moing

Title: Controllable image and video synthesis, two use cases.

Abstract: The recent progress of generative models at synthesizing photo-realistic images and videos has opened new perspectives for the computer vision community and beyond. However, without control over the synthesis process, their practical use remains limited. The talk will detail two approaches for controllable synthesis. In the first one, we will see how one can generate images while guiding the scene composition by accommodating semantic class proportions, with various applications, ranging from real image editing to data augmentation. In the second one, we will focus on an autoregressive method affording simple mechanisms for handling multimodal ancillary information for controlling the synthesis of a video (e.g., a few sample frames, an audio track, a trajectory in image space) and taking into account the intrinsically uncertain nature of the future by allowing multiple predictions.

Thibaut Issenhuth - CIFRE PhD student at Criteo AI Lab / Ecole des Ponts ParisTech, Imagine Lab.

Title: Learning disconnected manifolds with generative adversarial networks.

Abstract: Generative Adversarial Networks (GANs) recently achieved impressive results in unconditional image synthesis (e.g. face synthesis), but still struggle on large-scale multi-class datasets. In this presentation, we will formalize a fundamental limitation of GANs when the target distribution lies on disconnected manifolds. We establish a "no free lunch" theorem for disconnected manifold learning, stating an upper bound on the precision of the generated distribution. Then, we will present two methods to improve the precision of a pre-trained generator: a heuristic method rejecting generated samples with high Jacobian Frobenius norms, and a learning-based method trained to minimize the Wasserstein distance between generated and target distributions. This work is based on the two following papers:
[1] Tanielian, U., Issenhuth, T., Dohmatob, E., & Mary, J. Learning disconnected manifolds: a no GAN's land. In International Conference on Machine Learning (ICML) 2020.
[2] Issenhuth, T., Tanielian, U., Picard, D., & Mary, J. (2022). Latent reweighting, an almost free improvement for GANs. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1668-1677).
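
The heuristic mentioned in the abstract (rejecting samples whose generator Jacobian has a high Frobenius norm) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `generator` is a hypothetical 2D toy function, the Jacobian is estimated with central finite differences, and the rejection threshold is set by a hypothetical `keep_fraction` parameter.

```python
import math
import random

def generator(z):
    """Toy 2D-to-2D generator (hypothetical stand-in for a trained GAN).

    The steep tanh makes the output jump between two regions, so the
    Jacobian is large near z[0] = 0, where samples fall between modes.
    """
    return [math.tanh(10.0 * z[0]), 0.1 * z[1]]

def jacobian_frobenius_norm(g, z, eps=1e-4):
    """Estimate the Frobenius norm of the Jacobian of g at z
    with central finite differences."""
    total = 0.0
    for j in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        for a, b in zip(g(z_plus), g(z_minus)):
            total += ((a - b) / (2.0 * eps)) ** 2
    return math.sqrt(total)

def reject_high_jacobian(g, latents, keep_fraction=0.8):
    """Keep the `keep_fraction` of latents with the smallest Jacobian norm."""
    scored = sorted(latents, key=lambda z: jacobian_frobenius_norm(g, z))
    return scored[: int(len(scored) * keep_fraction)]

rng = random.Random(0)
latents = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
kept = reject_high_jacobian(generator, latents, keep_fraction=0.8)
```

In this toy setting the rejected latents are those closest to the transition at `z[0] = 0`, which is where the generator maps a small latent region onto the gap between the two output modes.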

Perla Doubinsky, Nicolas Audebert, Michel Crucianu, Hervé Le Borgne

Title: Multi-Attribute Balanced Sampling for Disentangled GAN Controls

Abstract: Various controls over the generated data can be extracted from the latent space of a pre-trained GAN, as it implicitly encodes the semantics of the training data. The discovered controls make it possible to vary semantic attributes in the generated images but usually lead to entangled edits that affect multiple attributes at the same time. Supervised approaches typically sample and annotate a collection of latent codes, then train classifiers in the latent space to identify the controls. Since the data generated by GANs reflects the biases of the original dataset, so do the resulting semantic controls. We propose to address disentanglement by balancing the semantics of the dataset before training the classifiers. We demonstrate the effectiveness of this approach by extracting disentangled linear directions for face manipulation on two popular GAN architectures, PGGAN and StyleGAN, and two datasets, CelebAHQ and FFHQ. We show that this simple and general approach outperforms state-of-the-art classifier-based methods while avoiding the need for disentanglement-enforcing post-processing.
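
One simple reading of "balancing the semantics of the dataset" is to draw the same number of annotated latent codes from every combination of attribute values before fitting the classifiers. The sketch below illustrates that reading only; the function `balanced_sample`, the two binary attributes, and the synthetic biased annotations are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

def balanced_sample(codes, labels, per_group, seed=0):
    """Draw `per_group` latent codes from every attribute combination.

    `labels[i]` is a tuple of binary attribute annotations for `codes[i]`.
    """
    groups = defaultdict(list)
    for code, label in zip(codes, labels):
        groups[tuple(label)].append(code)
    rng = random.Random(seed)
    sample = []
    for label, members in sorted(groups.items()):
        if len(members) < per_group:
            raise ValueError(f"only {len(members)} codes for attributes {label}")
        sample.extend((code, label) for code in rng.sample(members, per_group))
    return sample

# Hypothetical biased annotations: the two attributes agree 90% of the time,
# mimicking correlated attributes in the training data.
rng = random.Random(1)
codes, labels = [], []
for _ in range(1000):
    a = rng.random() < 0.5
    b = a if rng.random() < 0.9 else not a
    codes.append([rng.gauss(0.0, 1.0) for _ in range(8)])
    labels.append((int(a), int(b)))

balanced = balanced_sample(codes, labels, per_group=20)
counts = Counter(label for _, label in balanced)
```

After balancing, each of the four attribute combinations is equally represented, so a classifier trained on one attribute can no longer rely on the other as a proxy.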

Pedro Caio Castro Côrtes C Coutinho, Yannick Berthoumieu, Marc Donias, Sébastien Guillon

Title: Disentangling Multiple Specified Latents using Variational Autoencoder and Total Correlation Loss

Abstract: Recently, Variational Autoencoders (VAEs) [5] have gained considerable attention due to their capacity to encode high-dimensional data into a lower-dimensional latent space. In this context, several works have proposed methods to produce disentangled representations, where each factor of variation in the data space is controlled by a single component of the latent space. While this can be done in an unsupervised manner ([2], [4], [6]), some weakly supervised methods propose the explicit disentanglement of a specified factor of variation ([9], [11], [3], [1]) by dividing the latent space into two subspaces, where one of them is explicitly designed for controlling such a factor. In this work, we propose a model that extends the latter notion to multiple specified factors, by dividing the latent space into multiple subspaces, with the objective of allowing the explicit disentanglement of as many specified factors of variation as desired. This is done in a weakly supervised manner, where, for each specified factor, only a pair of data generated with the same factor is needed during training, with no need for actual labels. Furthermore, in order to reinforce the independence among the different subspaces, a total correlation loss is implemented using an adversarially trained discriminator ([4], [8]). Experiments are carried out on different datasets, such as MNIST [7] and Sprites [10], and show that our model is able to disentangle one or more specified factors of variation, and to generate new data while constraining some desired properties.
[1] Joey Hejna, Ashwin Vangipuram, and Kara Liu. Improving latent representations via explicit disentanglement.
[2] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. 2016.
[3] Ananya Harsh Jha, Saket Anand, Maneesh Singh, and VSR Veeravasarapu. Disentangling factors of variation with cycle-consistent variational auto-encoders. In Proceedings of the European Conference on Computer Vision (ECCV), pages 805-820, 2018.
[4] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In International Conference on Machine Learning, pages 2649-2658. PMLR, 2018.
[5] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[6] Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. arXiv:1711.00848, 2017.
[7] Yann LeCun and Corinna Cortes. The mnist database of handwritten digits. 2005.
[8] Ziwen Liu, Mingqiang Li, and Congying Han. Blocked and hierarchical disentangled representation from information theory perspective. arXiv preprint arXiv:2101.08408, 2021.
[9] Michael F Mathieu, Junbo Jake Zhao, Junbo Zhao, Aditya Ramesh, Pablo Sprechmann, and Yann LeCun. Disentangling factors of variation in deep representation using adversarial training. Advances in neural information processing systems, 29, 2016.
[10] Scott E Reed, Yi Zhang, Yuting Zhang, and Honglak Lee. Deep visual analogy-making. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
[11] Attila Szabó, Qiyang Hu, Tiziano Portenier, Matthias Zwicker, and Paolo Favaro. Challenges in disentangling independent factors of variation. arXiv preprint arXiv:1711.02245, 2017.
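
For reference, the total correlation of a latent code split into K subspaces can be written as below. The notation (q, K, D) is assumed here and is not taken from the abstract. The approximation on the second line is the density-ratio estimate used in the adversarial approach of [4], where D(z) is a discriminator's estimate of the probability that z was drawn from q(z) rather than from the product of its marginals; the model presented in the talk may use a different variant.

```latex
\mathrm{TC}(z) = \mathrm{KL}\Big( q(z) \,\Big\|\, \prod_{k=1}^{K} q(z_k) \Big)
\approx \mathbb{E}_{q(z)}\left[ \log \frac{D(z)}{1 - D(z)} \right]
```

The total correlation is zero exactly when the subspaces are mutually independent under q, which is why penalizing it encourages each subspace to capture a separate factor.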

Javiera Castillo-Navarro, Bertrand Le Saux, Alexandre Boulch, Sébastien Lefèvre

Title: Energy-based models in Earth observation: from generation to semi-supervised learning

Abstract: Deep learning, together with the availability of large amounts of data, has transformed the way we process Earth observation (EO) tasks, such as land cover mapping or image registration. Yet, today, new models are needed to push further the revolution and enable new possibilities. This work focuses on a recent framework for generative modeling and explores its applicability to the EO images. The framework learns an energy-based model (EBM) to estimate the underlying joint distribution of the data and the categories, obtaining a neural network that is able to classify and synthesize images. On these two tasks, we show that EBMs reach comparable or better performances than convolutional networks on various public EO datasets and that they are naturally adapted to semi-supervised settings, with very few labeled data. Moreover, models of this kind allow us to address high-potential applications, such as out-of-distribution analysis and land cover mapping with confidence estimation.
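
The abstract does not say which formulation is used. One common way to obtain a joint model of data and categories from a single network is to reinterpret classifier logits f(x) as negative joint energies, which is assumed in the sketch below: the class posterior is the usual softmax, and the energy of an input is the negative log-sum-exp of its logits. The logit values are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def class_probabilities(logits):
    """p(y | x): the usual softmax over the logits."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]

def energy(logits):
    """E(x) = -logsumexp_y f(x)[y], so p(x) is proportional to exp(-E(x))."""
    return -logsumexp(logits)

logits = [2.0, 0.5, -1.0]  # hypothetical network outputs f(x) for three classes
probs = class_probabilities(logits)
e = energy(logits)
```

Under this assumption the same logits serve both tasks: `probs` is used for classification, while `e` gives an unnormalized log-density that can drive sampling or out-of-distribution scoring.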

Fabio Pizzati, Jean-François Lalonde, Raoul de Charette

Title: Few-shot image-to-image translation with manifold deformation

Abstract: Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. To enforce feature consistency, our framework learns a style manifold between source and additional anchor domains (assumed to be composed of large numbers of images). The learned manifold is interpolated and deformed towards the few-shot target domain via patch-based adversarial and feature statistics alignment losses. All of these components are trained simultaneously during a single end-to-end loop. In addition to the general few-shot translation task, our approach can alternatively be conditioned on a single exemplar image to reproduce its specific style. Extensive experiments demonstrate the efficacy of ManiFest on multiple tasks, outperforming the state-of-the-art on all metrics and in both the general- and exemplar-based scenarios. Our code is open source.

Jelica Vasiljevic, Zeeshan Nisar, Friedrich Feuerhake, Cédric Wemmert, Thomas Lampert

Title: CycleGAN for virtual stain transfer: is seeing really believing?

Abstract: Digital Pathology is an area prone to high variation due to multiple factors which can strongly affect diagnostic quality and visual appearance of the Whole-Slide-Images (WSIs). The state-of-the-art methods to deal with such variation tend to address this through GAN-based style-transfer approaches. Usually, these solutions directly apply successful approaches from the literature, potentially with some task-related modifications. The majority of the obtained results are visually convincing; however, this is not a guarantee that such images can be directly used for either medical diagnosis or reducing domain shift. This talk will show that a slight modification in a stain transfer architecture, such as the choice of normalisation layer, while resulting in a variety of visually appealing results, surprisingly has a large effect on the ability of a stain transfer model to reduce domain shift. Extensive qualitative and quantitative evaluations confirm that translations resulting from different stain transfer architectures are distinct from each other and from the real samples. Therefore, conclusions made by visual inspection or pretrained model evaluation might be misleading.

Jean Prost, Antoine Houdard, Nicolas Papadakis and Andrés Almansa

Title: Diverse super-resolution with deep hierarchical variational autoencoders

Abstract: Image super-resolution is a one-to-many problem, but most deep-learning based methods only provide one single solution to this problem. In this work, we tackle the problem of diverse super-resolution by reusing VD-VAE, a state-of-the-art variational autoencoder (VAE). We find that the hierarchical latent representation learned by VD-VAE naturally separates the image low-frequency information, encoded in the latent groups at the top of the latent hierarchy, from the image high-frequency details, determined by the latent groups at the bottom of the latent hierarchy. Starting from this observation, we design a super-resolution model exploiting the specific structure of the VD-VAE latent space. Specifically, we train an encoder to encode low-resolution images in the first groups of the VD-VAE latent space, and we combine this encoder with the VD-VAE generative model to sample diverse super-resolved versions of a low-resolution input. We demonstrate the ability of our method to generate diverse solutions to the super-resolution problem on face super-resolution with upsampling factors ×4, ×8 and ×16.

Farideh Bazangani, Badih Ghattas

Title: FDG-PET to T1 weighted MRI translation with 3D Elicit Generative Adversarial Network (E-GAN)

Abstract: With the strengths of deep learning, computer-aided diagnosis (CAD) is a hot topic for researchers in medical image analysis. One of the main requirements for training a deep learning model is providing enough data for the network. However, in medical imaging, due to the difficulties of data collection and data privacy, finding an appropriate dataset (balanced, enough samples, etc.) is quite a challenge. Although image synthesis could help to overcome this issue, synthesizing 3D images is a hard task. The main objective of this paper is to generate a 3D T1 weighted MRI corresponding to FDG-PET. In this study, we propose a separable convolution-based elicit generative adversarial network (E-GAN). The proposed architecture can reconstruct 3D T1 weighted MRI from 2D high-level features and geometrical information retrieved from a Sobel filter. Experimental results on the ADNI datasets for healthy subjects show that the proposed model improves the quality of images compared with the state of the art. In addition, when E-GAN is evaluated against state-of-the-art methods, the proposed model gives better results on structural and textural information.

[invited] Symeon (Akis) Papadopoulos

Title: DeepFakes: Technology, Methods, Trends and Challenges

Abstract: DeepFakes pose an increasing risk to democratic societies as they threaten to undermine the credibility of audiovisual material as evidence of real-world events. The technological field of DeepFakes is evolving rapidly, with new generation methods constantly improving the quality, convincingness and ease of generation of synthetic content, and new detection methods aiming to detect as many cases of synthetic content generation and manipulation as possible. The talk will provide a short overview of the technology behind DeepFake content generation and detection, highlighting the main methods and tools available, and discussing some ongoing trends. It will also briefly discuss the experience of the Media Verification (MeVer) team with developing, evaluating, and deploying a DeepFake detection service in the context of the AI4Media project.
