
December 3, 2024

Leveraging multimodal unlabeled data for visual scene understanding


Category: Internship


Recovering scene properties such as shape, material, and lighting is a crucial task in many computer vision and computer graphics applications. This task is referred to as inverse rendering, and it enables object insertion [1], scene relighting [2], and scene editing [3].
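As background (our framing, not a formula from the announcement), inverse rendering can be viewed as inverting the image formation model. In the surface form of the rendering equation, the observed radiance couples all three unknowns:

```latex
% Surface form of the rendering equation: L_o is the observed outgoing
% radiance, f_r the BRDF (material), L_i the incident lighting, and n
% the surface normal at point x.
L_o(\mathbf{x}, \omega_o) = \int_{\Omega}
    f_r(\mathbf{x}, \omega_i, \omega_o)\,
    L_i(\mathbf{x}, \omega_i)\,
    (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i
```

Inverse rendering estimates f_r, L_i, and the geometry from observations of L_o alone; since many combinations of material and lighting explain the same image, the problem is ill-posed, which motivates the learned priors discussed below.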

Recent advances in learned 3D scene representations such as NeRF [4] and 3D Gaussian Splatting [5] have shown impressive novel view synthesis results. However, these representations are neither relightable nor editable, as the scene properties are baked into the radiance field. Several state-of-the-art methods [6,7,8] have proposed inverse rendering pipelines that make these 3D representations editable. Despite this progress, current methods often struggle with cast shadows, specular highlights, and other complex lighting interactions.
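To make the "baked-in" limitation concrete, here is a minimal illustrative PyTorch sketch (ours, not code from [4-8]). The first network maps position and view direction directly to color, so any change of lighting requires retraining; the second, hypothetical inverse rendering variant predicts intrinsic properties that a differentiable shading model can recombine under arbitrary lighting:

```python
# Illustrative sketch: why a plain radiance field is not relightable.
import torch
import torch.nn as nn

class BakedRadianceField(nn.Module):
    """NeRF-style field (simplified): (position, view dir) -> (rgb, density).
    Color conflates material and lighting, so lighting is baked in."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # rgb (3) + density (1)
        )

    def forward(self, xyz, view_dir):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:])

class DecomposedField(nn.Module):
    """Hypothetical inverse-rendering variant: predicts intrinsic
    properties that a differentiable shading model can recombine
    under any lighting, making the scene relightable and editable."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 1 + 3 + 1),  # albedo, roughness, normal, density
        )

    def forward(self, xyz):
        out = self.net(xyz)
        albedo = torch.sigmoid(out[..., :3])
        roughness = torch.sigmoid(out[..., 3:4])
        normal = nn.functional.normalize(out[..., 4:7], dim=-1)
        sigma = torch.relu(out[..., 7:])
        return albedo, roughness, normal, sigma
```

The split output of the second network is exactly what makes downstream relighting and material editing possible: shading is recomputed at render time instead of being memorized in the weights.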

Diffusion-based generative models [9] have emerged as a promising approach to visual generation. Diffusion models can edit many aspects of an image, such as its style [10], blend foreground objects into the background [11], relight scenes [12], or recolor specific objects [13]. This adaptability highlights their potential to learn and manipulate intrinsic scene attributes, including materials and lighting conditions, while maintaining photorealism. Their use in editing tasks yields flexible representations in which scene properties can be manipulated [14, 15].
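The training objective underlying these models is compact enough to show. The sketch below follows the simplified noise-prediction loss of Ho et al. [9]; the network architecture and noise schedule are left abstract, and the function name is ours:

```python
# Simplified DDPM training objective from Ho et al. [9]: the network
# learns to predict the noise that was injected into a clean image.
import torch

def ddpm_loss(model, x0, alphas_cumprod):
    """One training step. model(x_t, t) -> predicted noise; x0 is a
    batch of clean images (B, C, H, W); alphas_cumprod is the cumulative
    product of the noise schedule, a 1-D tensor of length T."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    # Forward (noising) process: x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    # Simplified objective: mean squared error on the injected noise
    return torch.nn.functional.mse_loss(model(x_t, t), eps)
```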

This internship will focus on developing methods that leverage diffusion models to disentangle and manipulate intrinsic scene properties, including materials and lighting. The intern will explore novel approaches to produce representations that are fully editable and relightable. The specific objectives include:

· Intrinsic scene decomposition: Investigate current state-of-the-art diffusion models that decompose images into their core components, enabling photorealistic editing and relighting while preserving scene coherence.

· Handling complex lighting interactions: Address the difficulties current state-of-the-art methods face in rendering complex lighting effects, such as realistic shadows and specular highlights.

· Relightable and editable representation: Develop a novel framework that leverages diffusion models to produce high-quality 3D scene representations in which intrinsic scene properties, such as material and lighting, can be manipulated. The goal is to create models that are not only relightable and editable but also maintain high fidelity and realism in their outputs. This work will build on recent advances in inverse rendering and diffusion models.

· Evaluation: Evaluate the proposed techniques against existing state-of-the-art solutions, aiming to outperform them in realism, fidelity, and editing quality across different lighting conditions.
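As a pointer for the evaluation objective, a typical quantitative protocol (our sketch, not one prescribed by the announcement) compares renders under held-out lighting against ground truth. PSNR is implemented below; SSIM and LPIPS are equally standard but depend on third-party packages, so they are only mentioned:

```python
# Sketch of a typical relighting-fidelity evaluation: average PSNR
# against ground-truth renders under novel lighting conditions.
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def relighting_error(pred_renders, gt_renders):
    """Mean PSNR over renders under held-out lighting.
    Both arguments: tensors of shape (num_lights, H, W, 3)."""
    scores = [psnr(p, g) for p, g in zip(pred_renders, gt_renders)]
    return torch.stack(scores).mean()
```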

References:

[1] Wang, Z., et al., "Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes," CVPR, 2023.

[2] Zhu, X., et al., "Relighting Scenes with Object Insertions in Neural Radiance Fields," arXiv preprint arXiv:2406.14806, 2024.

[3] Zhu, J., et al., "I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs," CVPR, 2023.

[4] Mildenhall, B., et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," ECCV, 2020.

[5] Kerbl, B., et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering," SIGGRAPH, 2023.

[6] Zhang, K., et al., "PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting," CVPR, 2021.

[7] Zhang, X., et al., "NeRFactor: Neural Factorization of Shape and Reflectance under an Unknown Illumination," SIGGRAPH Asia, 2021.

[8] Liang, R., et al., "ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting," ICCV, 2023.

[9] Ho, J., et al., "Denoising Diffusion Probabilistic Models," NeurIPS, 2020.

[10] Haque, A., et al., "Instruct-NeRF2NeRF," ICCV, 2023.

[11] Nichol, A., et al., "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models," ICML, 2022.

[12] Kocsis, P., et al., "LightIt: Illumination Modeling and Control for Diffusion Models," CVPR, 2024.

[13] Liang, Z., et al., "Control Color: Multimodal Diffusion-Based Interactive Image Colorization," arXiv preprint arXiv:2402.10855, 2024.

[14] Chen, X., et al., "IntrinsicAnything: Learning to Decompose and Edit Scenes from Images," arXiv preprint arXiv:2404.11593, 2024.

[15] Jin, H., et al., "Neural Gaffer: Lighting and Material Editing via Neural Diffusion," NeurIPS, 2024.

Description of research activities:

· Study state-of-the-art inverse rendering and relighting methods.

· Identify bottlenecks in current state-of-the-art approaches.

· Propose new solutions to improve scene understanding and editing.

· Research and develop algorithms based on the proposed solutions.

· Apply proposed algorithms to existing datasets.

· Publish research results at top conferences and participate in scientific seminars.



Supervision:

This internship will be supervised by a PhD candidate working on NeRF-based inverse rendering and a Professor at the University of Evry.

Prerequisites:

The candidate should be motivated to carry out world-class research and should hold a Master's degree in Computer Vision and/or Robotics. They should have solid skills in the following domains:

· Implementing code in Python and, optionally, CUDA, using deep learning libraries (PyTorch, PyTorch Lightning)

· Good knowledge of Git and Linux systems

· Project reporting and planning

· Writing scientific publications and participating in conferences

· Fluency in spoken and written English and/or French is a plus

· Intercultural and coordination skills, hands-on and can-do attitude

· Interpersonal skills, team spirit, and independent working style

Contacts and Application:

Hala DJEGHIM – hala.djeghim@gmail.com

Désiré SIDIBE – drodesire.sidibe@univ-evry.fr

Internship information: start in February/March 2025, duration 6 months, located at the University of Evry.

Application deadline: January 10th, 2025.

Application files: CV + motivation letter + transcript of records for academic years 2022-2023 and 2023-2024 + any other relevant document. To apply, send all documents in a single PDF file to the contact addresses above.
