Artificial Intelligence and Learning on Embedded Systems

Please note that, to guarantee all registrants access to the meeting rooms, registration for meetings is free but mandatory.

Registration for this meeting is closed.

Registrations

28 members of GdR ISIS and 41 non-members are registered for this meeting.
Room capacity: 80 people.

Announcement

Artificial intelligence has been a research field at the center of attention in recent years. Convolutional neural networks (CNNs) have proven their performance in pattern recognition in particular, and in signal and image processing more generally. The use of these techniques in real applications is at the heart of many academic and industrial experiments. Research in deep learning is abundant and most often focused on the training phase of the network: network sizing, choice of the training dataset, annotation, etc. The computation required in artificial intelligence to obtain good results is proportional to the size of the data processed. The current solution is to use remote computing clusters (cloud computing) on very large databases. Using embedded systems in this field has several benefits. The first is to reduce energy consumption in remote computing clusters. The second is to enable new applications by running part of the computation as close as possible to the sensors (edge computing / fog computing).

The day "Artificial Intelligence and Learning on Embedded Systems" aims to review ongoing work and to discuss prospects in this field. Ten presentations are scheduled for the day.

Organizers:

Fan YANG (fanyang@u-bourgogne.fr)
Jean-François NEZAN (jnezan@insa-rennes.fr)
Christian WOLF (christian.wolf@insa-lyon.fr)

Program

9h40-10h20 : Low-Complexity Approximate Convolutional Neural Networks, Stefan Duffner (LIRIS, INSA-Lyon)

10h20-11h00 : Statistical Learning via Information Bottleneck, Abdellatif Zaidi (Paris Research Center, Huawei France and Université Paris Est)

11h00-11h40 : Deep Learning on FPGA, a journey in mixed RTL/OpenCL design, Alban Bourge (Bull Atos Technologies)

11h40-12h20 : Accelerating the Inference of CNNs on FPGA-based Smart Cams, Kamel Abdelouahab (Institut Pascal)

12h20-13h40 : Lunch at the ESIEE restaurant.

13h40-14h20 : Designing neural networks for embedded systems, Nicolas Ventroux (CEA - Global Sensing)

14h20-14h40 : Study of CNN architectures for object detection in a video-surveillance context, Heng Zhang, Hatem Belhassen, Virginie Fresse (Laboratoire Hubert Curien)

14h40-15h00 : Algorithm-Architecture Matching of convolutional neural networks on FPGA, Lucien Del Bosque, Diego Martins, Virginie Fresse (Laboratoire Hubert Curien)

15h00-15h40 : Algorithm Level Timing Speculation for Convolutional Neural Network Forward Pass Accelerators, Thibaut Marty, Tomofumi Yuki, Steven Derrien (IRISA, Rennes 1)

15h40-16h00 : Machine learning-based embedded systems for autonomous cars, Smaïl Niar, Ihsen Alouani, Ayoub Neggaz, Yazid Lachachi and Abdelmalik Taleb-Ahmed (LAMIH/CNRS, Université Polytechnique Hauts-de-France)

16h00-16h40 : Neural networks assisted OCR for automated exploitation of eavesdropping interception, Erwan Nogues, Florent Montreuil (MI, DGA)

Abstracts

Low-Complexity Approximate Convolutional Neural Networks

Stefan Duffner (LIRIS, INSA-Lyon)

We present an approach for minimizing the computational complexity of trained Convolutional Neural Networks by replacing the trained weight and bias parameters with efficient approximations. Low-complexity convolution filters are obtained through a binary (zero-one) linear programming scheme based on the Frobenius norm over sets of dyadic rationals. The resulting matrices allow for multiplication-free computations requiring only addition and bit-shifting operations. Such low-complexity structures pave the way for low-power, efficient hardware designs. We applied our approach to three use cases of different sizes and obtained very low-complexity approximations that maintain almost equal classification performance.
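The multiplication-free idea can be illustrated with a minimal Python sketch. It is not the authors' binary linear programming scheme: each weight is simply rounded to the nearest dyadic rational k / 2^n, and a dot product over integer inputs is then evaluated with additions and bit shifts only. The function names and the choice of 4 fractional bits are assumptions made for illustration.

```python
def to_dyadic(w, frac_bits=4):
    """Round w to the nearest k / 2**frac_bits and return the integer k."""
    return round(w * (1 << frac_bits))


def shift_add_mul(x, k):
    """Compute x * k for integers using only additions and bit shifts."""
    negative = k < 0
    k = abs(k)
    acc = 0
    shift = 0
    while k:
        if k & 1:              # this power of two is present in k
            acc += x << shift
        k >>= 1
        shift += 1
    return -acc if negative else acc


def dyadic_dot(xs, weights, frac_bits=4):
    """Approximate sum(x * w) with dyadic weights (naive rounding, an assumption)."""
    acc = 0
    for x, w in zip(xs, weights):
        acc += shift_add_mul(x, to_dyadic(w, frac_bits))
    # Dividing by a power of two would be a plain shift in fixed-point hardware.
    return acc / (1 << frac_bits)


if __name__ == "__main__":
    xs = [3, -2, 5]                 # hypothetical integer activations
    weights = [0.5, -0.75, 0.3]     # hypothetical trained weights
    print("exact :", sum(x * w for x, w in zip(xs, weights)))
    print("dyadic:", dyadic_dot(xs, weights))
```

Weights that are already dyadic (0.5, -0.75) are reproduced exactly; the approximation error comes only from weights such as 0.3 that have to be rounded.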

Statistical Learning via Information Bottleneck

Abdellatif Zaidi (Paris Research Center, Huawei France and Université Paris Est)

We connect the information flow in a neural network to sufficient statistics, and show how techniques rooted in information theory, such as the source-coding-based information bottleneck method, can lead to improved architectures as well as a better understanding of the theoretical foundations of neural networks, viewed as a cascade of compression networks. We illustrate our results and views through some numerical examples.

Short Biography: A. Zaidi received the B.S. degree in Electrical Engineering from ENSTA ParisTech, Paris, in 2002 and the M.Sc. and Ph.D. degrees in Electrical Engineering from TELECOM ParisTech, Paris, in 2002 and 2006, respectively. From December 2002 to March 2006, he was with the Communications and Electronics Dept., TELECOM ParisTech, Paris, and the Signals and Systems Lab., CNRS/Supélec, France, pursuing his PhD degree. From May 2006 to September 2010, he was at École Polytechnique de Louvain, Université Catholique de Louvain, Belgium, working as a senior researcher. Dr. Zaidi was a "Research Visitor" at the University of Notre Dame, Indiana, USA, during 2007 and 2008, the Technical University of Munich during Summer 2014, and the Ecole Polytechnique Federale de Lausanne, EPFL, Switzerland. He is Associate Professor at Université Paris-Est, France, and has been on leave since Jan. 2015 at the Mathematics and Algorithmic Sciences Lab., France Research Center, Huawei France. His research interests lie broadly in network information theory and its interactions with other fields, including communication and coding, statistics, security and privacy, with application to diverse problems of data transmission and compression in networks. Dr. Zaidi is an IEEE Senior Member. From 2013 to 2016 he served as Associate Editor for the Eurasip Journal on Wireless Communications and Networking (EURASIP JWCN) and, since 2016, as Associate Editor for the IEEE Transactions on Wireless Communications. He is the co-recipient (jointly with Shlomo Shamai (Shitz)) of the N# Best Paper Award, as well as the French Excellence in Research Award since 2011.

Deep Learning on FPGA, a journey in mixed RTL/OpenCL design

Alban Bourge (Bull Atos Technologies)

FPGAs are booming. Artificial intelligence is a prime application for this technology, which is remarkably mature and continues to progress in many sectors. On the other hand, convolutional neural networks (CNNs) are extremely demanding and require a considerable amount of computation. Very efficient for this application but power-hungry, GPUs are not necessarily the ideal solution for accelerating CNN execution in embedded systems. After a short presentation of the specific features of FPGAs, we will look at the affinities that can be found between these devices and CNNs, and at the extent to which an FPGA is preferable to a GPU when energy consumption is the concern. We will then look at the solution envisaged at Atos for accelerating neural networks in an embedded context.

Accelerating the Inference of CNNs on FPGA-based Smart Cams

Kamel Abdelouahab (Institut Pascal)

The exponential growth of raw data sources such as video streams, images and speech sequences motivates innovative methods for automatically extracting information as close as possible to the sensors. This is especially true for smart cameras integrated into wireless networks, for which the low bandwidth of the network forces the cameras to process video streams on the fly in order to extract high-level semantic information from a given scene. Among the methods proposed to extract this knowledge, Deep Convolutional Neural Networks (CNNs) have rapidly become the de facto standard in many machine-vision tasks, ranging from image classification to scene segmentation and object detection. However, CNNs are particularly computationally intensive, especially for the limited processing capabilities of a smart camera node. As a result, implementing CNNs on smart cameras under real-time constraints is a challenging task. To address this challenge, the use of dedicated and tailored hardware, such as FPGAs, is advocated. FPGAs are reconfigurable and energy-efficient hardware platforms that are well suited to support the streaming nature of CNNs and the fine-grain parallelism they exhibit. In this presentation, we detail state-of-the-art FPGA-based accelerators for CNNs and describe how recent trends in CNN development make FPGAs even more suitable for accelerating deep learning workloads on embedded smart cameras.

Designing neural networks for embedded systems

Nicolas Ventroux (CEA - Global Sensing)

Artificial intelligence, and especially machine learning, has recently gained a lot of interest from industry. Indeed, a new generation of neural networks built with a large number of successive computing layers enables a large number of new applications and services, implemented from smart sensors to data centers. These Deep Neural Networks (DNNs) can interpret signals to recognize objects or situations to drive decision processes. However, their integration into embedded systems remains challenging due to their high computing needs. This presentation will focus on the latest work of CEA LIST on DNNs, and specifically on solutions to ease the integration of DNNs in embedded systems. We will present our N2D2 DNN design framework and its RTL library DNeuro, optimized for FPGAs, as well as PNeuro, a scalable and energy-efficient SIMD processor.

Study of CNN architectures for object detection in a video-surveillance context

Heng Zhang, Hatem Belhassen, Virginie Fresse (Laboratoire Hubert Curien)

With the development of deep learning, object detection techniques have evolved rapidly. Today there are already approaches that are reliable and robust to occlusion and to changes in lighting and scale, but these approaches differ in processing time and detection accuracy. The aim of the presentation is to introduce the concepts and architectures of convolutional neural networks (CNNs), as well as the reference image datasets in the field. Finally, a study examines the speed (in FPS) versus accuracy (in mAP) trade-offs of CNNs in order to identify the most appropriate CNN architecture for intelligent video analysis in a given industrial context.

Algorithm-Architecture Matching of convolutional neural networks on FPGA

Lucien Del Bosque, Diego Martins, Virginie Fresse (Laboratoire Hubert Curien)

To port convolutional neural network (CNN) algorithms to FPGA as effectively as possible, achieving maximum performance while making the best use of the available resources, it is necessary to study different CNN models with respect to the amount of memory used on the FPGA and to performance in terms of accuracy and computation time. The work involves building a platform for evaluating and simulating CNN models that makes it possible to test accuracy and loss, and to measure the memory footprint of a CNN model, as a function of:

Quantization of the model layers (weights, biases) according to the best-suited dynamic range (fixed point, 16 bits to 6 bits), with a 32-bit floating-point model as the reference.
Quantization of the input image and of internal data (intermediate results and activation maps).
Hyperparameters of the model, such as the size of the convolution windows.

After the tests, it is possible to extract information about which layers of the network contribute most to its accuracy at inference. It is also possible to choose the most suitable quantization of the model (maximum efficiency) by considering the ratio between the accuracy obtained and the amount of memory used by the quantized model in the FPGA. A study of the performance and resource usage of an IP core written in VHDL is also presented.
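The memory versus precision trade-off described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation platform itself: the weights are made up, the bit split (one sign bit, one integer bit, the rest fractional) is an arbitrary choice, and the maximum absolute rounding error is used as a crude stand-in for the accuracy and loss measurements mentioned in the abstract.

```python
def quantize(values, total_bits, frac_bits):
    """Quantize floats to signed fixed point and return the dequantized values."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    out = []
    for v in values:
        q = max(lo, min(hi, round(v * scale)))  # saturate to the representable range
        out.append(q / scale)
    return out


def memory_bytes(n_params, total_bits):
    """Storage for n_params values of total_bits each, rounded up to whole bytes."""
    return (n_params * total_bits + 7) // 8


def max_abs_error(reference, quantized):
    """Largest deviation between reference and quantized values (accuracy proxy)."""
    return max(abs(r - q) for r, q in zip(reference, quantized))


if __name__ == "__main__":
    weights = [0.731, -0.052, 0.250, -1.4, 0.009]  # hypothetical layer weights
    print("float32 reference:", memory_bytes(len(weights), 32), "bytes")
    for bits in (16, 8, 6):
        frac = bits - 2  # assumed split: 1 sign bit, 1 integer bit, rest fractional
        q = quantize(weights, bits, frac)
        print(bits, "bits:", memory_bytes(len(weights), bits), "bytes,",
              "max error", max_abs_error(weights, q))
```

In a real study the error column would be replaced by the accuracy of the quantized network on a validation set, which is what the ratio of accuracy to memory usage refers to.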

Algorithm Level Timing Speculation for Convolutional Neural Network Forward Pass Accelerators

Thibaut Marty, Tomofumi Yuki, Steven Derrien (IRISA, Rennes 1)

We propose a new technique to improve the efficiency of hardware accelerators for the convolution layers of convolutional neural networks (CNNs), based on timing speculation (overclocking). We developed a lightweight error-detection mechanism, inspired by algorithm-based fault tolerance (ABFT), to protect against timing errors and enable aggressive overclocking. Because it operates at the algorithm level, this mechanism can easily be implemented with high-level synthesis tools. Using a set of Zybo boards, we showed experimentally that the clock frequency can be increased by 17 to 36% with a low probability of error, and that these rare errors are detected with negligible overhead (1000 LUTs).
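The abstract does not detail the detection mechanism, so the Python below is only a generic ABFT-style sketch under assumptions of our own: a 1-D integer convolution whose output sum is compared against a checksum predicted directly from the inputs and the filter. Because convolution is linear, the two values must match when no error occurred. All function names are hypothetical.

```python
def conv1d(x, w):
    """Valid 1-D correlation: y[i] = sum_j w[j] * x[i + j]."""
    n_out = len(x) - len(w) + 1
    return [sum(w[j] * x[i + j] for j in range(len(w))) for i in range(n_out)]


def output_checksum(x, w):
    """Predict sum(y) from window sums of x, without computing y itself."""
    n_out = len(x) - len(w) + 1
    return sum(w[j] * sum(x[j:j + n_out]) for j in range(len(w)))


def outputs_are_consistent(y, x, w):
    """Return True when the computed outputs agree with the predicted checksum."""
    return sum(y) == output_checksum(x, w)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]   # hypothetical integer input
    w = [1, 0, -1]        # hypothetical integer filter
    y = conv1d(x, w)
    print("fault-free run consistent:", outputs_are_consistent(y, x, w))

    y[1] ^= 1 << 3        # simulate a timing error that flips one output bit
    print("faulty run consistent:", outputs_are_consistent(y, x, w))
```

In an overclocked accelerator, a mismatch could for instance trigger a recomputation at a safe frequency; that recovery policy is an assumption here, not something stated in the abstract.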

Machine learning-based embedded systems for autonomous cars

Smaïl Niar, Ihsen Alouani, Ayoub Neggaz, Yazid Lachachi and Abdelmalik Taleb-Ahmed (LAMIH/CNRS, Université Polytechnique Hauts-de-France)

In the domain of embedded systems for automotive applications, Machine Learning (ML) techniques such as Deep Neural Networks have increasingly been catching industrial and academic attention. Autonomous Driving (AD) is the most popular application. In these systems, ML algorithms have shown high performance in scene understanding and object/obstacle classification. Currently, General Purpose Graphics Processing Units (GPGPUs) are the most commonly used devices to implement ML, and several studies have demonstrated the capability of GPUs to offer high performance. However, GPUs suffer from high power consumption and relative inefficiency in fine-grain customization. For this reason, the debate around the optimal platform is still open. A potential alternative to GPUs is FPGAs, whose characteristics allow trade-offs between flexibility, performance and power consumption. We will present our ongoing work on the implementation, on an FPGA platform, of two important functionalities in AD: road extraction and object recognition. The objective is to use the information given by road extraction to increase the accuracy of object recognition. Exploiting these two units makes the ML implementation in AD less complex and more efficient.

Neural networks assisted OCR for automated exploitation of eavesdropping interception

Erwan Nogues, Florent Montreuil (MI, DGA)

Eavesdropping attacks have seen renewed interest with the introduction of fully digital video display interfaces (e.g. HDMI) in modern IT systems. In this talk, we present how an artificial intelligence network connected to a Software Defined Radio (SDR) system can restore the intercepted image of a leaking air-gapped IT system. Finally, connected to an Optical Character Recognition (OCR) stage, it leads to improved automated text recognition and detection of vulnerable IT systems.
