Compression and quality of 360, Light Field and Point Cloud (3D) content

Please note that, to guarantee all registered participants access to the meeting rooms, registration for meetings is free but mandatory.

Registration for this meeting is closed.

Registration

10 members of GdR ISIS and 37 non-members are registered for this meeting.
Room capacity: 80 people.

Announcement

Immersive technologies are booming, driven in particular by virtual reality (VR), augmented reality (AR) and mixed reality (MR). To feed these technologies, content can be captured in various ways, using camera arrays as well as spherical or plenoptic cameras. This makes it possible to produce 360° images and videos, point clouds (PC) and light fields (LF). The advent of this content has enabled new services, including real-time 3D immersive communication, VR content viewing with interactive parallax, 3D free-viewpoint television, autonomous navigation and many other applications.

At the same time, these new formats bring new challenges at every level of the chain, from acquisition to rendering, by way of coding and transmission. Each step raises its own questions, in addition to those concerning the user's quality of experience and the health risks that may arise from it.

The aim of this day is to bring together academics and industry professionals around the topics mentioned above. The goal is to create a forum for exchange and for presenting recent work on these topical subjects. It is also an opportunity for academics and industry professionals, senior and junior researchers, and researchers from different disciplines to meet. Senior talks covering the entire processing chain will be scheduled to give a better understanding of the issues. The program will be completed by junior talks giving an overview of ongoing work in the field.

Call for contributions:

We welcome both practical and theoretical contributions related to the topics of the day. Please send a title, an abstract (10-15 lines), the presenter's name and affiliation by email to the organizers before February 28, 2019.

Access to the INSA campus:

The day will take place in the GC lecture hall (amphi GC), located in building 7 of the INSA campus:

https://www.insa-rennes.fr/informations-complementaires/acces.html

Access from Rennes railway station (Gare de Rennes):

Take the metro towards J.F. Kennedy to République station (2 stops).

From République, take bus C4 towards ZA Saint-Sulpice to the INSA Rennes stop (15-20 minutes).

Organizers:

Wassim Hamidouche, VAADER/IETR - INSA Rennes (whamidou@insa-rennes.fr)
Christine Guillemot, INRIA (Christine.Guillemot@inria.fr)
Vincent Ricordel, LS2N - University of Nantes (vincent.ricordel@univ-nantes.fr)
Chaker Larabi, XLIM - University of Poitiers (chaker.larabi@univ-poitiers.fr)
Marc Antonini, I3S - Sophia Antipolis (am@i3s.unice.fr)

Program

09H50 - 10H00 Opening of the day

Presentation 1: 10H00 - 10H50

Title: Densely-sampled Light Field: Reconstruction, Compression and Applications

Author: Atanas Gotchev - Tampere University (TAU)

10H50 - 11H05 Coffee break

Presentation 2: 11H05 - 11H50

Title: Quality assessment of immersive media

Author: Jesus Gutierrez - LS2N, Université de Nantes

Presentation 3: 11H50 - 12H15

Title: Data acquisition and compression for user immersion in a 3D scene

Author: Thomas Maugey - INRIA Rennes

Presentation 4 & Demonstration: 14H00 - 14H25

Title: Backward Compatible Layered Video Coding for 360° Video Broadcast
Author: Thibaud Biatek - TDF Rennes

Presentation 5: 14H25 - 14H50

Title: Digitization of complex industrial facilities: current practices and new challenges at EDF
Author: Guillaume Terrasse - EDF R&D Saclay

Presentation 6: 14H50 - 15H15

Title: On Head Motion Prediction in 360° Virtual Reality Videos
Author: Miguel F. Romero Rondon - Université Côte d'Azur, CNRS, I3S

15H15 - 15H30 Coffee break

Presentation 7: 15H30 - 15H55

Title: Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression
Author: Maurice Quach - L2S, CNRS, CentraleSupelec

Presentation 8: 15H55 - 16H20

Title: Achieving 6DoF using divergent multi-view compression and synthesis - a paradigm shift.
Authors: Bappaditya Ray (1, 2), Joël Jung (1) and Chaker Larabi (2)
(1) Orange Labs, (2) XLIM UMR CNRS 7252 - Université de Poitiers

Presentation 9: 16H20 - 16H45

Title: Rate-Distortion Optimized Tree-Structured Point-Lattice Vector Quantization for Compression of 3-D Point Cloud Geometry

Authors: Amira Filali (1), Vincent Ricordel (1), Nicolas Normand (1) and Wassim Hamidouche (2)

(1) LS2N Nantes, (2) IETR Rennes

Presentation 10: 16H45 - 17H05

Title: Towards practical hologram streaming using view-dependent scalable coding

Author: Anas El Rhammad - b<>com Rennes

Abstracts of contributions

Presentation 1: 10H00 - 10H50

Title: Densely-sampled Light Field: Reconstruction, Compression and Applications

Author: Atanas Gotchev - Tampere University (TAU)

Abstract: A Densely-sampled Light Field (DSLF) is a discrete representation of the 4D continuous light field in which neighbouring camera views have a maximum disparity of 1 pixel. According to plenoptic sampling theory, such sampling is sufficient to reconstruct any arbitrary ray by simple linear interpolation. This makes the DSLF an attractive representation for arbitrary ray interpolation and view synthesis. A DSLF has a prohibitively high number of views and cannot be captured directly. Instead, it can be reconstructed from sparse multi-perspective views using computational methods. In the talk, we discuss our approach to reconstructing a DSLF using sparsification in the shearlet transform domain. Furthermore, we discuss DSLF compression and its applications in microscopy and full-parallax imaging.
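As a rough illustration of the linear interpolation mentioned in this abstract, the sketch below blends two neighbouring views of a toy 1D scene whose content shifts by one pixel between cameras. The function name `interpolate_view`, the blending weight `t` and the scanline values are assumptions made for this example only; they do not come from the talk, and actual DSLF rendering interpolates rays in 4D rather than along a single scanline.

```python
def interpolate_view(view_a, view_b, t):
    """Linearly blend two adjacent views; t=0 returns view_a, t=1 returns view_b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(view_a) != len(view_b):
        raise ValueError("views must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(view_a, view_b)]


# Toy scanlines (made-up values): the same edge seen by two neighbouring
# cameras, displaced by exactly one pixel between the views.
left = [0.0, 0.0, 1.0, 1.0, 1.0]
right = [0.0, 1.0, 1.0, 1.0, 1.0]

# Synthesize the view halfway between the two cameras.
middle = interpolate_view(left, right, 0.5)
print(middle)
```

With a disparity of at most one pixel, the blend only smooths the edge over a single pixel; with sparser views the same blend would produce visible ghosting, which is why sparse captures need a reconstruction step first.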

Presentation 2: 11H05 - 11H50

Title: Quality assessment of immersive media

Author: Jesus Gutierrez - LS2N, Université de Nantes

Abstract: The emergence of immersive media technologies (e.g., free-viewpoint video, virtual reality, augmented reality, etc.) is providing users with new interactive experiences, which allow a more natural and complete exploration of the represented content than previous technologies. These innovative solutions entail novel perceptual and technical factors that should be studied extensively so that technology development can meet the new user expectations. In this sense, this talk provides an overview of quality of experience (QoE) evaluation for immersive media through several studies with multiview video, 360-degree content, light field images, and 3D objects for augmented/mixed reality.

Presentation 3: 11H50 - 12H15

Title: Data acquisition and compression for user immersion in a 3D scene

Author: Thomas Maugey - INRIA Rennes

Abstract: In this talk, we present a new dataset intended to support research in Free Viewpoint Television (FTV) and 6 degrees-of-freedom (6DoF) immersive communication. This dataset relies on a novel acquisition procedure consisting of a synchronized capture of a scene by 40 omnidirectional cameras. We have also developed a calibration solution that estimates the position and orientation of each camera with respect to a common reference. This solution relies on a regular calibration of each individual camera and a graph-based synchronization of all these parameters. The videos and the calibration solution are made publicly available.

Then, we present an innovative solution for compressing such data for interactive communication, i.e., when users can choose their viewing angle in real time. In such a scenario, there is no need to transmit all the data, which requires rethinking the entire compression strategy. We show experimentally that our coder outperforms classical solutions such as tiling approaches.

Presentation 4 & Demonstration: 14H00 - 14H25

Title: Backward Compatible Layered Video Coding for 360° Video Broadcast
Author: Thibaud Biatek - TDF Rennes

Abstract: Recently, the coding and transmission of immersive 360° video has been intensely studied. The technologies provided by standards developing organizations mainly address requirements coming from over-the-top services. In many countries, terrestrial broadcast remains the principal medium for accessing high-quality content and for ensuring a wide audience reach for service providers. To introduce immersive video services over terrestrial broadcast, the deployed technologies must fulfill requirements such as backward compatibility with legacy receivers and high bandwidth efficiency. While bandwidth efficiency is addressed by existing techniques, none of them enables backward compatibility. In this paper, a novel scalable coding scheme addressing broadcast is proposed to enable the introduction of immersive services over terrestrial broadcast networks. The experiments show that the proposed approach provides substantial coding gains of 14.99% compared to simulcast coding and introduces a limited coding overhead of 5.15% compared to 360° single-layer coding. A real-time decoding implementation is proposed, highlighting the relevance of the proposed design. Finally, an end-to-end demonstrator illustrates how the proposed solution could be integrated in a real terrestrial broadcast environment.

Presentation 5: 14H25 - 14H50

Title: Digitization of complex industrial facilities: current practices and new challenges at EDF
Author: Guillaume Terrasse - EDF R&D Saclay

Abstract: As part of the maintenance and lifetime extension of its nuclear power plants, EDF has undertaken the digitization of these large, complex facilities, in particular the reactor buildings, which comprise 15 floors and half-floors, organized into nearly 200 rooms and containing more than 10,000 pieces of equipment. The digitization of a whole building includes high-resolution 360° panoramic photographs, a 3D point cloud and a reconstruction of this cloud. These data are put into context and accompanied by updated site plans. For one reactor building, this amounts to about 50 billion 3D points and 450 billion pixels. To this end, EDF has developed a software pipeline to process these data and then navigate through them (Hullo et al., 2015). Currently, this semi-automatic pipeline achieves a reconstruction accurate to within 5 cm (at 2.57 σ). Faced with the growing mass of data resulting from more intensive digitization, EDF's challenges are:

To increase the automation of tedious processing so that the teams in charge of modeling can devote themselves fully to quality control.
To link the semantic information of the Industrial Information System with the topographic and photographic data.

EDF R&D is already testing deep learning for the automatic segmentation of 3D point clouds: we recently evaluated the PointNet algorithm and equivalent models. We have also implemented deep learning algorithms to detect equipment and its identification numbers in photographs.

Hullo, J.-F., Thibault, G., Boucheny, C., Dory, F., Mas, A. (2015) Multi-Sensor As-Built Models of Complex Industrial Architectures. Remote Sensing 7(12):16339-16362. DOI: 10.3390/rs71215827

Presentation 6: 14H50 - 15H15

Title: On Head Motion Prediction in 360° Virtual Reality Videos
Author: Miguel F. Romero Rondon - Université Côte d'Azur, CNRS, I3S

Abstract: Streaming 360° videos is a major challenge for the development of Virtual Reality (VR). Designing efficient streaming algorithms that consume a limited network rate, however, requires a reliable head motion predictor to identify which region of the sphere to send in high quality. In this work, we show that the most recent literature has produced (deep network-based) predictors that perform worse than simple baselines.

We therefore revisit the problem of head motion prediction in VR, and show that an LSTM-based sequence-to-sequence architecture can be designed to outperform the baseline. We then tackle the crucial question of how to design an architecture that lets the prediction benefit both from the time series of past positions and from the visual content, as existing architectures claim to do but fail at. With a principled testing approach, and considering a wider set of architectures designed in other application contexts, we are able to pinpoint that fusing the visual information after the last recurrent unit should be avoided. The two approaches we identify as holding the most potential for combining positional and visual information are: (i) architectures with early fusion of these heterogeneous inputs, that is, fusion before the last recurrent unit, and (ii) architectures which consider the visual and positional inputs in the same feature space from the start, such as a ConvLSTM.
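To give an idea of what a "simple baseline" for head motion prediction can look like, here is a minimal sketch. The abstract does not say which baselines were used, so the predictor below (repeat the last known position), the (yaw, pitch) representation in radians and all function names are assumptions chosen for illustration. The error is measured as the great-circle angle between predicted and actual positions on the sphere.

```python
import math


def to_unit_vector(yaw, pitch):
    """Convert yaw/pitch angles in radians to a 3D unit vector on the sphere."""
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))


def orthodromic_distance(p, q):
    """Great-circle angle in radians between two (yaw, pitch) positions."""
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def static_baseline(past_positions, horizon):
    """Hypothetical trivial predictor: repeat the last known position."""
    return [past_positions[-1]] * horizon


# Made-up trajectory: the head pans steadily to the side.
past = [(0.0, 0.0), (0.1, 0.0)]
future = [(0.2, 0.0), (0.3, 0.0)]

predicted = static_baseline(past, len(future))
errors = [orthodromic_distance(p, f) for p, f in zip(predicted, future)]
print(errors)
```

A learned predictor is only useful if its error curve stays below that of such a trivial predictor over the whole prediction horizon.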

Presentation 7: 15H30 - 15H55

Title: Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression
Author: Maurice Quach - L2S, CNRS, CentraleSupelec

Abstract: Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can be in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face an exponential decrease in the number of points at low bitrates, our method still produces high-resolution outputs even at low bitrates.
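The idea of decoding as a binary classification of the occupancy map can be sketched as follows: the decoder outputs one occupancy probability per voxel, and the voxels classified as occupied become the decoded points. The function name `decode_occupancy`, the 0.5 threshold and the toy probabilities are assumptions made for this illustration; the abstract does not specify them.

```python
def decode_occupancy(probabilities, threshold=0.5):
    """Return the (x, y, z) indices of voxels classified as occupied.

    `probabilities` is a nested list indexed as [x][y][z]; the threshold
    value is an assumption made for this sketch.
    """
    points = []
    for x, plane in enumerate(probabilities):
        for y, row in enumerate(plane):
            for z, p in enumerate(row):
                if p > threshold:
                    points.append((x, y, z))
    return points


# Toy 2x2x2 grid of predicted occupancy probabilities (made-up values).
grid = [[[0.9, 0.1], [0.2, 0.7]],
        [[0.4, 0.6], [0.05, 0.95]]]

points = decode_occupancy(grid)
print(points)
```

Because every voxel of the output grid receives a probability, the number of decoded points is not tied to the depth of a tree structure, which is consistent with the abstract's remark about high-resolution outputs at low bitrates.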

Presentation 8: 15H55 - 16H20

Title: Achieving 6DoF using divergent multi-view compression and synthesis - a paradigm shift.
Authors: Bappaditya Ray (1, 2), Joël Jung (1) and Chaker Larabi (2)
(1) Orange Labs, (2) XLIM UMR CNRS 7252 - Université de Poitiers

Abstract: In this talk, a novel framework for achieving omnidirectional 6DoF is proposed, which codes divergent views and then performs view synthesis from the decoded divergent views. The proposed framework does away with the panoramic (ERP) representation and exploits the parallax between divergent views. The talk is composed of two major parts. In the first part, the proposed framework is compared with a state-of-the-art framework (based on the ERP representation) using HEVC coding. Simulation results show that the proposed framework performs better at low bitrates, with significantly reduced encoding and decoding runtimes. In the second part, the performance of the multi-view extensions of HEVC is investigated for the proposed framework, and a novel coding method is then proposed, which significantly outperforms HEVC and its multi-view extensions.

Presentation 9: 16H20 - 16H45

Title: Rate-Distortion Optimized Tree-Structured Point-Lattice Vector Quantization for Compression of 3-D Point Cloud Geometry

Authors: Amira Filali (1), Vincent Ricordel (1), Nicolas Normand (1) and Wassim Hamidouche (2)

(1) LS2N Nantes, (2) IETR Rennes

Abstract: This talk deals with current trends in new compression methods for 3-D point cloud content, which are required to ensure efficient transmission and storage. Representing the geometry of 3D point clouds remains a challenging problem, since this signal is unstructured. To address it, we introduce a new hierarchical geometry representation based on adaptive Tree-Structured Point-Lattice Vector Quantization (TSPLVQ). This representation enables hierarchically structured 3D content that improves the compression performance for static point clouds.

The novelty of the proposed scheme lies in the adaptive selection of the optimal quantization scheme for the geometric information, which better leverages the intrinsic correlations in the point cloud. Building on this adaptive, multiscale structure, two quantization schemes are used to recursively project the 3D point cloud onto a series of embedded truncated cubic lattices. At each step of the process, the optimal quantization scheme is selected according to a rate-distortion cost in order to achieve the best trade-off between coding rate and geometry distortion, so that compression flexibility and performance can be greatly improved. Experimental results demonstrate the value of the proposed multi-scale method for lossy geometry compression.
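The per-step selection by rate-distortion cost can be illustrated with a small sketch. It assumes the common Lagrangian form J = D + lambda * R, which the abstract does not state explicitly; the function name `select_scheme`, the scheme names and the rate and distortion figures are all hypothetical.

```python
def select_scheme(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


# Two hypothetical quantization schemes evaluated at one step of the tree
# (rates in bits per point and distortions are made-up values).
candidates = [
    {"name": "scheme_A", "rate": 2.0, "distortion": 0.5},
    {"name": "scheme_B", "rate": 1.0, "distortion": 1.0},
]

# A small lambda favours low distortion, a large lambda favours low rate.
print(select_scheme(candidates, lam=0.1)["name"])  # scheme_A (cost 0.7 vs 1.1)
print(select_scheme(candidates, lam=1.0)["name"])  # scheme_B (cost 2.0 vs 2.5)
```

Sweeping the multiplier in this way is what traces out the different operating points of a rate-distortion curve.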

Presentation 10: 16H45 - 17H05

Title: Towards practical hologram streaming using view-dependent scalable coding

Author: Anas El Rhammad - b<>com Rennes

Abstract: Holography is considered the most promising immersive technology for natural, comfortable and authentic three-dimensional (3D) visualization. However, high-quality holograms with a large field of view (FoV) require very large amounts of data (from gigapixels up to a few terapixels), and the bandwidth needed to access such holograms in a reasonable time cannot be provided by current communication networks. For example, generating a monochromatic hologram of size 10 cm x 10 cm with a FoV of 80° requires more than 60 gigapixels. Thus, transmitting a hologram of this size over a network bandwidth of 100 Mb/s would take around 1 hour and 20 minutes. To reduce the time needed to display the hologram, we propose a view-dependent scalable compression scheme based on a Gabor decomposition. At the encoder side, the hologram is first decomposed into a sparse set of diffracted light rays using Matching Pursuit over a dictionary of Gabor atoms. Then, the atoms corresponding to a given user's viewpoint are selected to form a sub-hologram. Finally, the pruned atoms are sorted and encoded according to their importance for the reconstructed view. The experimental results reveal that our approach outperforms conventional scalable codecs and enables progressive decoding and display from the first received bits. Finally, a streaming simulation of digital holograms is presented for different bandwidths compatible with today's transmission channels.
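The transmission time quoted in this abstract can be checked with a short calculation. The abstract gives the pixel count and the bandwidth but not the bit depth; the sketch below assumes 8 bits per pixel, a value chosen because it reproduces the stated figure. The function name is likewise an assumption.

```python
def transmission_time_seconds(pixels, bits_per_pixel, bandwidth_bps):
    """Time needed to send an uncompressed image over a link of given bandwidth."""
    return pixels * bits_per_pixel / bandwidth_bps


pixels = 60e9          # 60 gigapixels (lower bound quoted in the abstract)
bits_per_pixel = 8     # assumption: not stated in the abstract
bandwidth_bps = 100e6  # 100 Mb/s

seconds = transmission_time_seconds(pixels, bits_per_pixel, bandwidth_bps)
minutes = seconds / 60
print(seconds, minutes)  # 4800.0 seconds, i.e. 80.0 minutes (1 hour 20 minutes)
```

Since the hologram needs "more than" 60 gigapixels, 80 minutes is a lower bound under this assumption.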
