Efficient Hardware for Deep Neural Network Processing

Please note that, to guarantee all registered participants access to the meeting rooms, registration for meetings is free but mandatory.

Registration for this meeting is closed.

Registration

27 members of the GdR ISIS and 21 non-members are registered for this meeting.
Room capacity: 50 people.

Announcement

Efficient Hardware for Deep Neural Network Processing

Due to the large number of requests, we are setting up a videoconference:

Topic: Journée GDR ISIS -- Efficient Hardware for Deep Neural Network Processing
Time: Sep 22, 2022 09:00 AM Paris

Join Zoom Meeting
https://imt-atlantique.zoom.us/j/94982757611

Deep Neural Networks (DNNs) are used in a broad range of AI applications such as computer vision, natural language processing and robotics. These DNNs match and outperform previous state-of-the-art methods, at the cost of a dramatic increase in computational complexity and memory footprint. Solutions are therefore being explored to improve implementation metrics such as throughput, latency and energy efficiency. Along with work on efficient DNN architectures, which aim to achieve close-to-optimal performance with fewer parameters and a lower computation cost, the design of efficient hardware architectures is crucial to deploying DNNs, whether on powerful servers or on edge devices.

This GdR ISIS day targets research that explores hardware architectures and hardware co-design methods for efficient DNN processing. Efficiency is understood here as designs that target low complexity, high throughput, low latency or low energy consumption.

Topics of interest include, but are not limited to:

- Hardware architectures of DNN processors,
- Hardware/software co-design,
- DNN processing on heterogeneous systems,
- Approximate computing for DNN processing,
- New technologies for DNN processing (in- or near-memory computing).

The meeting will be held on September 22, 2022, beginning at 9:00 am.

It will include two guest lectures by:

Olivier Bichler, Main contributor of the N2D2 project, CEA, LIST
Thomas Preußer, Lead of the FINN project, AMD

People wishing to present their work at this meeting are invited to e-mail their proposal (title and abstract of 1 page maximum, with list of authors and affiliations) to the organizers before September 1st, 2022 (mathieu.leonardon@imt-atlantique.fr, matthieu.arzel@imt-atlantique.fr, erwan.libessart@centralesupelec.fr, anthony.kolar@centralesupelec.fr).

Organizers

Matthieu Arzel, IMT Atlantique, Lab-STICC, matthieu.arzel@imt-atlantique.fr
Anthony Kolar, CentraleSupélec, anthony.kolar@centralesupelec.fr
Mathieu Léonardon, IMT Atlantique, Lab-STICC, mathieu.leonardon@imt-atlantique.fr
Erwan Libessart, CentraleSupélec, erwan.libessart@centralesupelec.fr

Venue

Sorbonne Université, Jussieu.

Program

8:45 - 9:15: Welcome coffee

9:15 - 10:30: Building Inference Engines for Quantized Neural Networks Using Brevitas & FINN - Thomas Preusser

10:25 - 10:45: Break

10:45 - 12:00: DeepGreen: an independent platform for embedded deep learning - Olivier Bichler

12:00 - 14:00: Lunch at the CROUS Jussieu restaurant

14:00 - 14:30: Mobileflow: model and implementation for efficient optical flow inference - Agathe Archet

14:30 - 15:00: Compression via partial pseudo-randomization of convolutional neural networks - Florent Crozet

15:00 - 15:30: THINK: Test of Hardware Inferring Neural network, an IN2P3 R&T program - Frédéric Druillole

15:30 - 16:00: Break

16:00 - 16:30: A new figure of merit for neural network efficiency - Hugo Waltsburger

16:30 - 17:00: Leveraging Structured Pruning of Convolutional Neural Networks - Mathieu Léonardon

Abstracts

Invited talks

Building Inference Engines for Quantized Neural Networks Using Brevitas & FINN

Thomas Preusser
AMD

Deep Learning is penetrating an ever-increasing number of applications and establishing itself at the edge and on embedded platforms. With Dennard Scaling and Moore's Law fading, advances in silicon technology can no longer cater for the growth of its associated computational complexity and memory demands. The resulting gap needs to be closed by hardware specialization and custom hardware/software co-design. This talk tells the story of Brevitas & FINN. It shows (a) how these tools help to exploit and implement numeric quantization in convolutional neural networks to hedge the resource demands of custom inference engines, and (b) how an underlying streaming dataflow architecture delivers reliable computational throughput without compromising the inference latency by means like batching. The case for Brevitas & FINN will be illustrated by selected example projects that have used these tools over recent years. The central building blocks of the generated inference engines will be detailed. Entry points for engaging hands-on with Brevitas and FINN, as well as an outlook on ongoing developments, will be offered.

DeepGreen: an independent platform for embedded deep learning

Olivier Bichler
CEA, LIST, Gif-sur-Yvette

The DeepGreen project aims to bring together major French industrial companies and SMEs to deploy Artificial Intelligence on constrained hardware targets, through a software platform that meets the requirements and expectations of each. The project's first innovation is to address, within a single platform, the entire design and deployment chain of AI for embedded systems. Beyond this integrated aspect, the project plans to develop a set of high-value-added features that go beyond the state of the art, in order to produce innovative applications at lower cost:

- Robust optimization of deep neural network graphs, combining quantization, sparsity and robustness from the training of the models onward;
- High-level hardware design and benchmarking, facilitating the choice of hardware targets and the development of embedded systems;
- Embedded trust, notably through the integration of domain knowledge and formal constraints into embedded models, taking into account the specifics of sensors and physical processes;
- Learning algorithms for adapting models to embedded targets and for letting them evolve with little data and/or few labels.

In addition, extensive interoperability with the major standards and platforms on the market makes it possible to use the platform with existing developments, and is essential to the platform's adoption and longevity.

Contributed talks

Mobileflow: model and implementation for efficient optical flow inference

Mickaël Seznec¹,², Agathe Archet¹,², Nicolas Gac², François Orieux², Alvin Sashala Naik¹
¹Thales Research & Technology, Palaiseau
²Laboratoire L2S, Laboratoire des Signaux et Systèmes, Université Paris-Saclay, CentraleSupélec

Estimating the optical flow between two images is particularly costly in terms of computation and time. This is especially problematic for real-time deployments on embedded platforms, where algorithmic solutions must be light, fast and still effective. We therefore propose MobileFlow, a lightweight convolutional neural network based on PWC-Net and MobileNetV2. This network is more accurate (-12% on EPE, or EndPoint Error), more compact (-91% on the parameters' weight) and faster (+14% FPS with fp32 precision) than PWC-Net on the Flying Chairs dataset.
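For readers unfamiliar with the EPE metric quoted above, it is commonly defined as the mean Euclidean distance between predicted and ground-truth flow vectors. The following is a minimal sketch under that assumption; the function name and the list-of-pairs data layout are illustrative choices, not taken from the MobileFlow work.

```python
import math


def endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    pred, truth: lists of (u, v) displacement pairs, one pair per pixel.
    The flat list layout is an assumption made for this sketch.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("flow fields must be non-empty and of equal size")
    total = sum(
        math.hypot(pu - tu, pv - tv)
        for (pu, pv), (tu, tv) in zip(pred, truth)
    )
    return total / len(pred)


# Toy flow fields with three pixels: per-pixel errors are 0, 2 and 5.
pred = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
truth = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
epe = endpoint_error(pred, truth)
```

A lower value means the predicted flow is closer to the reference, which is why a reduction in EPE counts as an improvement.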

A new figure of merit for neural network efficiency

Hugo Waltsburger¹,², Erwan Libessart², Anthony Kolar², Chengfang Ren¹, Régis Guinvarc'h¹
¹Laboratoire SONDRA
²CentraleSupélec, Laboratoire GeePS

Resource limitation is the main constraint on embedded DNN implementation. This explains the search for an optimal trade-off between accuracy, throughput and power consumption. This trade-off can be modeled by a metric that takes these different quantities into account. Thanks to a software tool developed to monitor DNN inference in real time, it is then possible to analyze the impact of systemic network optimizations (quantization, pruning, etc.) on the efficiency metric.

Compression via partial pseudo-randomization of convolutional neural networks

Florent Crozet¹,², Stéphane Mancini², Marina Nicolas¹
¹STMicroelectronics
²Laboratoire TIMA, Grenoble

The number of parameters in convolutional neural networks is exploding, which creates a strong constraint on implementing these algorithms on embedded systems. In some electronic devices, memory capacity is strictly limited, and it becomes necessary to replace the storage of data with prior compression followed by on-the-fly decompression. To obtain a trade-off between accuracy and the number of parameters to store, we propose a compression method based on a linear decomposition of the filters into learned vectors and generated vectors. The learned vectors are stored, while the others are generated pseudo-randomly during inference. With this method, we divide by 8 the number of parameters to store for a VGG16 neural network while losing only 6% accuracy.
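The general idea of rebuilding a filter from stored and regenerated vectors can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not say how coefficients are stored, which generator is used, or how the vectors are distributed, so the seeded `random.Random` generator, the uniform range and the names `generated_vectors` and `reconstruct_filter` are all hypothetical.

```python
import random


def generated_vectors(seed, count, length):
    """Regenerate the same pseudo-random vectors from a seed (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(count)]


def reconstruct_filter(coeffs, learned, seed, n_generated):
    """Rebuild a flattened filter as a linear combination of basis vectors.

    Only `coeffs`, `learned` and `seed` would need to be stored; the
    generated part of the basis is recomputed at inference time.
    """
    length = len(learned[0])
    basis = learned + generated_vectors(seed, n_generated, length)
    if len(coeffs) != len(basis):
        raise ValueError("one coefficient is needed per basis vector")
    return [sum(c * vec[k] for c, vec in zip(coeffs, basis)) for k in range(length)]


# Two stored vectors plus two regenerated ones for a 4-weight filter.
learned = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
weights = reconstruct_filter([2.0, 3.0, 0.5, -0.5], learned, seed=7, n_generated=2)
```

Because the generator is seeded, the regenerated vectors are identical on every call, so they never have to be kept in memory between inferences.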

THINK: Test of Hardware Inferring Neural network, an IN2P3 R&T program

Jean-Pierre Cachemiche, Monnier Emmanuel, George Aad, Thomas Calvet, Arthur Ducheix, Etienne Fortin¹, Frédéric Magniette², Joana Fronteras-Pons³, Frédéric Druillole⁴, David Etasse⁵, Vladimir Gligorov, Le Dortz Olivie⁶, Fatih Bellachia, Lafrasse Sylvain⁷, Claude Girerd⁸
¹CPPM
²LLR
³IRFU/AIM
⁴LP2IB(CENBG)
⁵LPC
⁶LPNHE
⁷LAPP
⁸LP2IL

After two years of development, the THINK project has trained engineers in AI techniques using neural networks and tested GPU, MPPC, FPGA and neuromorphic circuit hardware, with the aim of running inference for all types of neural networks in the context of instrumentation for fundamental physics research. We propose to take stock of the inference hardware and tools available to the community. We will outline the prospects for the continuation of the project for the period 2023-2026.

Leveraging Structured Pruning of Convolutional Neural Networks

Hugo Tessier¹,², Vincent Gripon¹, Mathieu Léonardon¹, Matthieu Arzel¹, David Bertrand², Thomas Hannagan²
¹IMT Atlantique, Brest
²Stellantis, Vélizy-Villacoublay

Structured pruning is a popular method to reduce the cost of convolutional neural networks, which are the state of the art in many computer vision tasks. However, depending on the architecture, pruning introduces dimensional discrepancies that prevent the actual reduction of pruned networks. To tackle this problem, we propose a method that is able to take any structured pruning mask and generate a network that does not encounter any of these problems and can be leveraged efficiently. We provide an accurate description of our solution and show the gains in energy consumption and inference time obtained with pruned convolutional neural networks on embedded hardware.
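The kind of dimensional discrepancy mentioned above can be illustrated on a toy residual connection, where a pruned branch no longer has as many channels as the skip path it is added to. The sketch below is a generic illustration, not the method presented in the talk; the per-channel scalar model and the names `kept_channels`, `naive_add` and `indexed_add` are hypothetical.

```python
def kept_channels(mask):
    """Indices of the channels a structured pruning mask keeps."""
    return [i for i, keep in enumerate(mask) if keep]


def naive_add(skip, branch):
    """Element-wise residual addition, valid only if channel counts match."""
    if len(skip) != len(branch):
        raise ValueError(f"channel mismatch: {len(skip)} vs {len(branch)}")
    return [s + b for s, b in zip(skip, branch)]


def indexed_add(skip, branch, branch_channels):
    """Add each surviving branch channel onto its original position in skip."""
    out = list(skip)
    for value, channel in zip(branch, branch_channels):
        out[channel] += value
    return out


# One scalar per channel stands in for a feature map (toy model).
mask = [True, False, True, False]   # channels 1 and 3 are pruned
skip = [1.0, 1.0, 1.0, 1.0]         # full-width skip path
branch = [0.5, 0.25]                # pruned branch: two channels left

result = indexed_add(skip, branch, kept_channels(mask))
```

Calling `naive_add(skip, branch)` on the same data raises an error, since the branch has two channels and the skip path four; the index-aware addition is one simple way to keep the pruned branch small while still combining it with the full-width path.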
