State of the art in multimedia data compression
We remind you that, in order to guarantee access to the meeting rooms for all registrants, registration for meetings is free but mandatory.
Registration for this meeting is closed.
Registrations
19 GdR ISIS members and 45 non-members of the GdR are registered for this meeting.
Room capacity: 100 people.
Announcement
Over the past two decades, multimedia content has become the dominant form of communication in the digital society, growing explosively to account for more than 80% of global data traffic. Compression remains an essential means of transmitting as few bits as possible while representing the input data at a given level of fidelity/quality. Moreover, multimedia data is consumed on heterogeneous devices ranging from televisions to smartphones, and even in machine-to-machine communications. Several areas have seen significant progress, such as real-time videoconferencing in work and education environments, immersive applications for entertainment and the industry of the future, and many others. These applications impose additional constraints on codec design, such as a dynamically controllable bit rate, a small computational and memory footprint, low latency, and end-user satisfaction, where an end user exists.
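The bits-versus-fidelity trade-off described above is commonly formalized as minimizing a Lagrangian cost J = R + λD over candidate operating points, which is how many codecs make mode decisions. A minimal, illustrative sketch follows; the operating points and λ values are made up for the example, not measurements from any real codec.

```python
# Sketch of Lagrangian rate-distortion optimization: pick the encoder
# operating point (rate R, distortion D) minimizing J = R + lambda * D.

def best_operating_point(candidates, lam):
    """Return the (rate, distortion) pair minimizing J = R + lam * D."""
    return min(candidates, key=lambda rd: rd[0] + lam * rd[1])

# Hypothetical operating points: (rate in bits, distortion as MSE).
points = [(1000, 40.0), (2000, 15.0), (4000, 6.0), (8000, 2.5)]

# A small lambda favours rate savings; a large lambda favours fidelity.
print(best_operating_point(points, lam=10.0))   # picks the low-rate point
print(best_operating_point(points, lam=500.0))  # picks a lower-distortion point
```

Sweeping λ traces out the convex hull of achievable (rate, distortion) pairs, which is exactly the "fewest bits at a given fidelity" frontier the paragraph refers to.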
To meet the needs of emerging applications while respecting application constraints, academia and industry are making considerable efforts. Standardization committees such as MPEG and JPEG have seen their activities diversify, notably with the adoption of artificial intelligence.
This GdR-ISIS day, the first in a series devoted to multimedia data compression, aims to take stock of compression in the broad sense. The objective is to present the latest developments in compression and to identify the challenges facing this field. It is intended for academics and industry practitioners alike who wish to understand both the stakes of this problem and the possible openings in the era of artificial intelligence. The idea is also to help create a synergy between academia and industry in order to jointly address the emerging challenges related to multimedia data compression.
Organizers:
- Chaker Larabi - XLIM, Université de Poitiers
- Didier Nicholson - EKTACOM
- Félix Henry - ORANGE LABS
- Marius Preda - ARTEMIS, TELECOM SUDPARIS
Local logistics: Lu Zhang - INSA Rennes/IETR
Program
Abstracts of the contributions
- The next generation of JPEG standards - Touradj Ebrahimi
- Abstract: Artificial Intelligence (AI) has emerged as a key technology in a wide range of applications, including visual information processing. AI-based computer vision has shown outstanding results in pattern recognition and content understanding, and AI-based image processing has demonstrated unparalleled performance in cognitive enhancement of visual content, as witnessed by advances in super-resolution and denoising built on deep learning techniques. More recent efforts in AI-based compression have shown superior performance compared to the state of the art in image coding. More recently, generative AI has been used to create realistic visual information in the form of images, video and 3D models. The JPEG standardization committee has initiated several new projects to cope with the needs of emerging imaging applications relying on AI. The first such initiative, referred to as JPEG AI, started four years ago and focuses on the compression of images; it was shortly followed by a second, known as JPEG Pleno, dealing with immersive content representation. More recently, JPEG Trust was initiated as a new standardization effort to address the issue of trust in imaging, made especially pressing by trends in generative AI. Finally, JPEG has started an exploration, known under the acronym JPEG XE, into how it can help develop an exchange format for event-based vision, which is increasingly used in computer vision applications. This talk will provide an overview of the JPEG AI, JPEG Pleno, JPEG Trust and JPEG XE standards, their current status and their roadmaps.
- Biography: Touradj Ebrahimi is professor of image processing at Ecole Polytechnique Fédérale de Lausanne (EPFL). He is active in teaching and research in multimedia signal processing and heads the Multimedia Signal Processing Group at EPFL. Since 2014, he has been the Convenor (Chairman) of the JPEG standardization committee, which has produced a family of standards that have revolutionised the world of imaging. He represents Switzerland as the head of its delegation to JTC1 (in charge of standardization of information technology in ISO and IEC) and SC29 (the body overseeing MPEG and JPEG standardization), and is a member of ITU representing EPFL. He was previously the chairman of the SC29 Advisory Group on Management until 2014. Prof. Ebrahimi has been involved in Ecma International as a member of its ExeCom since 2020, serves as consultant, evaluator and expert for the European Commission and other governmental funding agencies in Europe, and advises a number of venture capital companies in Switzerland in their scientific and technical audits. He has founded several startup and spinoff companies in the past two decades, including most recently RayShaper SA, a research company based in Crans-Montana working on AI-powered multimedia. Prof. Ebrahimi is author or co-author of over 400 scientific publications and holds a dozen invention patents. His areas of interest include image and video compression, media security, quality of experience in multimedia, and AI-based image and video processing and analysis. Prof. Ebrahimi is a Fellow of the IEEE, SPIE and EURASIP and has received several awards and distinctions, including a Prime Time Emmy Award for JPEG, the IEEE Star Innovator Award in Multimedia and the SMPTE Progress Medal.
- Recent trends at JVET: What if AI is actually the answer? - Mohsen Abdoli
- Abstract: Following the standardization of Versatile Video Coding (VVC) in 2020, the experts of the Joint Video Experts Team (JVET) have remained tirelessly devoted to their mission and have already begun exploring new technologies for developing a next-generation video compression system. In addition to sticking to their routine of exploring new conventional compression tools, JVET is also taking a rather unconventional approach: utilizing AI for video compression. This factual presentation will provide a summary of the key activities undertaken by JVET in their quest for a next-generation codec.
- Biography: Mohsen Abdoli received his Master of Engineering degree (in Artificial Intelligence) in 2013 from Sharif University of Technology in Tehran, Iran, and his PhD in 2018, jointly from Université Paris-Saclay and CentraleSupélec in Paris, France. In 2015, he joined Orange Labs in Rennes, France, where he developed various video compression techniques targeting intra prediction and residual coding for the VVC standard established by JVET. In 2018, he joined Ateme in Rennes, France, where he contributed to the first-ever implementation of a real-time VVC transcoder. Currently, he is a standardization engineer at IRT b<>com in Rennes, France. His research interests include image/video compression, encoder optimization, machine learning, and quality assessment.
- An overview of the MPEG Immersive Video standard - Bertrand Chupeau
- Abstract: The MPEG Immersive Video (MIV) standard targets the compression of immersive video content, in which a real or virtual 3D scene is captured by multiple real or virtual cameras. It is designed to support virtual and augmented reality applications that require rendering of virtual viewpoints with 6 degrees of freedom in viewing position and orientation. At the encoder stage, redundant parts of the captured 3D scene are identified and pruned out; the non-redundant patches are projected from 3D to 2D and efficiently packed into atlases of texture, depth, and possibly other attributes such as transparency. The patch atlas sequences can then be encoded with conventional 2D video encoders. An associated metadata stream mainly transports the patch parameters, their positions in the atlas frames and their camera parameters, which allows the end-user device to reconstruct the 3D scene from the decoded atlases for rendering virtual viewports. The first edition of MIV is in the final publishing stage at ISO. We will conclude with an outlook on the new features of the second edition, which is under way.
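The patch/atlas mechanism the abstract describes can be sketched in a few lines: projected patches from several source views are packed into one atlas frame, and per-patch metadata records where each patch sits so a decoder can map atlas pixels back to source views. The naive shelf-packing strategy, the `Patch` fields and the metadata layout below are illustrative assumptions, not the MIV specification.

```python
# Illustrative MIV-style patch packing: place 2D patches into an atlas
# and emit the per-patch metadata a decoder would need.
from dataclasses import dataclass

@dataclass
class Patch:
    view_id: int   # source camera the patch was projected from
    width: int     # patch size in atlas pixels
    height: int

def pack_patches(patches, atlas_width):
    """Naive shelf packing: place patches left to right, opening a new
    row (shelf) when the current one is full. Returns per-patch metadata
    and the atlas height actually used."""
    metadata, x, y, shelf_h = [], 0, 0, 0
    for p in patches:
        if x + p.width > atlas_width:          # start a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        metadata.append({"view": p.view_id, "x": x, "y": y,
                         "w": p.width, "h": p.height})
        x += p.width
        shelf_h = max(shelf_h, p.height)
    return metadata, y + shelf_h

patches = [Patch(0, 640, 360), Patch(1, 320, 180), Patch(2, 512, 288)]
meta, atlas_height = pack_patches(patches, atlas_width=1024)
```

The atlas frames themselves would then go through an ordinary 2D video encoder, while `meta` plays the role of the metadata stream mentioned above.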
- Biography: Bertrand Chupeau is Senior Scientist and project leader at InterDigital research labs in Rennes. He has been deeply involved in the development of the MIV standard and was co-editor of the specification of the first edition.
- Point cloud compression in MPEG - Marius Preda
- Abstract: Point clouds contain vast amounts of data, which limits their use in mass-market applications. However, the ease with which they capture and render spatial information makes them increasingly popular for immersive volumetric data. This talk presents two approaches developed during the MPEG standardization process for compressing point clouds: V-PCC projects 3D space into 2D patches and uses traditional video technologies, while G-PCC traverses 3D space to create predictors. V-PCC provides a 125:1 compression ratio for 1 million points at 8 Mbit/s, while G-PCC provides up to 10:1 lossless and up to 35:1 lossy compression ratios. These standards are expected to enable various applications, including immersive media, VR/AR, real-time communication, autonomous driving, and cultural heritage. The talk also introduces two ongoing standardization projects in MPEG for dynamic mesh compression and AI-based graphics compression.
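As a back-of-the-envelope check of the figures quoted above: an 8 Mbit/s stream at a 125:1 ratio implies a raw rate of about 1 Gbit/s, i.e. roughly 33 bits per point if we assume 30 frames per second with 1 million points per frame. The frame rate is our assumption for the arithmetic, not a stated V-PCC test condition.

```python
# Relating the quoted V-PCC numbers: 1 million points, 8 Mbit/s, 125:1.
compressed_bitrate = 8e6          # 8 Mbit/s, as quoted in the abstract
ratio = 125                       # quoted compression ratio
raw_bitrate = compressed_bitrate * ratio          # 1e9 bit/s raw

points_per_frame = 1_000_000
fps = 30                          # assumed frame rate (not from the abstract)
bits_per_point = raw_bitrate / (points_per_frame * fps)

print(raw_bitrate / 1e9)          # raw rate in Gbit/s
print(round(bits_per_point, 1))   # implied raw bits per point
```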
- Biography: Marius Preda is known for his contributions to the field of multimedia and 3D data compression and has been involved in the MPEG standardization process. He has served as the convenor of MPEG 3D, a group responsible for defining standards for 3D data representation, compression, and transmission. He has also been involved in various research projects related to multimedia systems and signal processing and has published numerous articles in leading academic journals.
- Overview of standardization activities on media coding (MPEG and JPEG) - Aylin Kip
- Abstract: The presentation aims to outline national and international standardization initiatives in the field of media coding and to explain how a standardization committee functions. It will also cover how a stakeholder can take part in standardization work.
- Biography: An IT law graduate, Aylin Kip is a standardization project manager for digital issues at AFNOR. Her missions lead her to work with actors at the national and international levels. In particular, she manages the activities of the standardisation committee on media coding.
- New ways to develop standards with new technologies - Leonardo Chiariglione
- Abstract: Standardisation used to be structured to serve specific interests in silos. MPEG was the first, at least on the scale at which it operated, to overthrow that paradigm. Before its disbanding, MPEG's propulsive force had lost strength. There was a need to apply MPEG's paradigm to the new challenges of Artificial Intelligence while alleviating some known distortions of traditional standardisation. MPAI is the international, unaffiliated, non-profit organisation developing standards for AI-based data coding, executing its mission in areas such as audio enhancement, human-machine conversation, finance, neural network watermarking, video coding and more.
- Biography: Leonardo Chiariglione, who holds an MSc from the Polytechnic of Torino and a PhD from the University of Tokyo, is the initiator and leader of international groups, such as MPEG, that have shaped media technology and business. He currently leads MPAI, the standards body for artificial-intelligence-based data coding.
- Alliance for Open Media and its video coding efforts - Andrey Norkin
- Abstract: The Alliance for Open Media (AOMedia) is an industry consortium formed in 2015 that develops media-related specifications. Its first project was the AV1 video codec, finalized in 2018-2019. Since then, AOMedia has worked on a number of projects and produced other media-related specifications. In the video compression field, the Alliance has been investigating compression beyond the AV1 codec's capabilities. This presentation will give a brief overview of AOMedia and will focus on its current efforts and the status of its video compression investigation.
- Biography: Andrey Norkin is a Research Scientist at Netflix, USA, working on video compression and encoding techniques for OTT video streaming. He is a co-chair of the Codec Working Group of AOMedia and has been contributing to compression research beyond AV1. Previously, he was with Ericsson Research, Sweden. He holds a Doctor of Science degree in signal processing from Tampere University of Technology, Tampere, Finland.
- Metamedia? - Philippe Guillotel
- Abstract: It is hard to ignore the all-things-"meta" trend. Multimedia is no exception, even when the term carries no real meaning. It is therefore interesting to examine whether multimedia data compression is also affected by this meta-trend, and whether the other associated technologies will have consequences for the way multimedia data is represented and coded. Through two major issues, the presentation will try to provide some elements of an answer: i) how to assess the impact of AI and deep learning (through their specific characteristics), and ii) does the metaverse, as a new communication environment, require the definition of new standards? A focus will be put on MPEG standardization and the various associated organizations/forums.
- Biography: Philippe Guillotel is Distinguished Scientist at InterDigital Research & Innovation. His research interests include user experiences (immersive experiences and user sensing), human perception technologies (vision and haptics), video compression (2D and 3D) and artificial intelligence (machine/deep learning), with a special focus on their use in entertainment applications. He currently leads a team of experts in the fields of i) digital humans and how to make avatars interact socially with users, ii) energy-aware multimedia communication, and iii) deep-learning-based video compression. He holds a Ph.D. in signal processing and telecommunications from the University of Rennes.
- MPEG Visual Volumetric Video-based Coding (and beyond) versus Omni- and Metaverse - Gauthier Lafruit
- Abstract: Taking for granted that the Metaverse will be the 3D internet of tomorrow (various other definitions exist), we might claim that there are already solutions out there that may serve this objective, e.g. Nvidia's Omniverse toolset, as well as Unity, Unreal and Blender, which will most likely be constituents of the global solution towards the Metaverse. Even though they cover file formats well supported by the Khronos group, there is a need to compress the gigantic amount of information streamed over the internet. The MPEG consortium, with its Visual Volumetric Video-based Coding (V3C), clearly has a role to play in this ecosystem. 3D object and scene coding for 3D free navigation is covered by two modes in V3C, both using a depth representation: the well-known 3D graphics formats (point clouds and meshes), as well as image-based formats like MPEG Immersive Video (MIV). A new challenge, however, is supporting this free-navigation functionality with high levels of realism, without any depth information being captured or estimated. Neural Radiance Fields (NeRF) and Implicit Neural Video Representations (INVR, from MPEG-Video) are two recent approaches trying to get rid of the cumbersome depth capture/estimation. But what about the compression performance of these tools? Will they ever become competitive with more conventional coding approaches that have been fine-tuned for decades? And what about older approaches developed in the course of the MPEG activities, e.g. Facial And Body animation (FAB)? Could they compete against 3D graphics raytracing or deepfakes? Which of all these tools (past, present or future) will best serve the multi-objective targets (low bitrate, high visual quality, high immersive feeling) imposed by the needs of a photo-realistic Metaverse?
The presentation will of course not provide a clear-cut answer to these questions, but will nevertheless pinpoint some compression technology strengths and weaknesses to better apprehend future research needs in the domain, on the 2030 horizon.
- Biography: Gauthier Lafruit is an associate professor of immersive light field technologies at Université Libre de Bruxelles, Brussels, B-1050, Belgium. He works in visual data compression and rendering, participating in compression standardization committees such as CCSDS (space applications), JPEG (still picture coding), and MPEG (moving picture coding). His research interests include depth image-based rendering, immersive video, and digital holography. Lafruit received his Ph.D. degree from Vrije Universiteit Brussel, Brussels.
- Perspectives on Next-generation Video Coding Development - Mathias Wien
- Abstract: This talk presents the perspective of the MPEG Visual Quality Assessment Advisory Group (ISO/IEC JTC 1/SC 29/AG 5) on the development of next-generation video coding schemes. Major technological directions are highlighted and development aspects are discussed. The rise of neural-network-based tools, as well as the increasing number of parameters estimated at the decoder side in proposed model-driven algorithms, have to be taken into account both in terms of complexity and in terms of impact on visual quality. The properties and requirements of quality metrics suitable for this task, and their impact on the development, are discussed.
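Among the quality metrics such evaluations rely on, the classic objective baseline is PSNR. AG5's actual evaluation protocols are much richer (including subjective testing), but a minimal sketch of the basic measurement, on two small hypothetical pixel sequences, looks like this:

```python
# Peak signal-to-noise ratio between a reference and a decoded signal.
import math

def psnr(reference, decoded, max_value=255):
    """PSNR in dB for two equal-length sequences of pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return float("inf")       # identical signals
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-pixel reference and decoded values for illustration.
ref = [52, 55, 61, 59, 79, 61, 76, 61]
dec = [50, 55, 60, 59, 80, 60, 75, 62]
print(round(psnr(ref, dec), 2))   # ~47.62 dB
```

Higher is better; the talk's point is precisely that such signal-fidelity metrics may rank neural-network-based codecs differently from human viewers, which motivates the search for better-suited metrics.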
- Biography: Mathias Wien received the Diploma and Dr.-Ing. degrees from Rheinisch-Westfälische Technische Hochschule Aachen (RWTH Aachen University), Aachen, Germany, in 1997 and 2004, respectively. In 2018, he achieved the habilitation, making him an independent scientist in the field of visual media communication. His research interests include image and video processing; immersive, space-frequency adaptive and scalable video compression; and video quality assessment. Since 2020, Mathias has served as Convenor of ISO/IEC JTC1 SC29/AG5 "MPEG Visual Quality Assessment". He has been an active contributor to H.264/AVC, HEVC, and VVC. He has participated in and contributed to ITU-T VCEG, ISO/IEC MPEG, the Joint Video Experts Team (JVET) and the preceding joint teams of VCEG and ISO/IEC MPEG. Mathias has published more than 70 scientific articles and conference papers in the area of video coding and has co-authored several patents in this area. He has further authored and co-authored more than 130 standardization documents. He published the Springer textbook "High Efficiency Video Coding: Coding Tools and Specification", which fully covers Version 1 of HEVC.