Menaces de sécurité sur l'apprentissage profond

Nous vous rappelons que, afin de garantir l'accès de tous les inscrits aux salles de réunion, l'inscription aux réunions est gratuite mais obligatoire.

Inscriptions closes à cette réunion.

Inscriptions

48 personnes membres du GdR ISIS, et 64 personnes non membres du GdR, sont inscrits à cette réunion.
Capacité de la salle : 120 personnes.

Annonce

Nous avons choisi la plateforme LiveStorm qui gère jusqu'à 1000 participants connectés via un navigateur internet (cela fonctionne avec la plupart d'entre eux, normalement...).

Les personnes déjà inscrites sur le site du GdR recevront bientôt par mail un lien LiveStorm.

Les personnes qui n'ont pas pu s'inscrire (car la jauge était atteinte) peuvent maintenant le faire directement sur LiveStorm : https://app.livestorm.co/inria-14/journees-gdr-isis

*** Attention. La journée est divisée en deux sessions LiveStorm (une le matin, une l'après-midi). Il faut s'incrire pour les deux sessions. C'est une limitation inhérente à cet outil... Désolé !***

Les GdR ISIS et Sécurité Informatique organisent une journée commune sur le thème "menaces de sécurité sur l'apprentissage profond".

2012 marque l'avènement de l'intelligence artificielle dans la communauté 'vision par ordinateur' avec le réseau de neurones profond AlexNet, gagnant de la compétition ImageNet. A peine deux ans plus tard, Szegedi et al. découvrent des vulnérabilités : l'ajout d'une perturbation bien choisie sur l'image d'entrée leurre le réseau de neurones. C'est la naissance des exemples adverses (adversarial examples). La perturbation étant souvent de petite amplitude, un humain ne la perçoit pas. Les exemples adverses remettent ainsi en cause la dénomination 'intelligence artificielle' attribuée trop tôt à ce genre d'algorithmes. Comment les qualifier d'intelligents alors qu'ils se trompent aussi lourdement ?

Depuis, le nombre d'articles portant sur ce sujet ne cesse de croître, formant une littérature très riche proposant des attaques (en boite blanche, noire ou grise) sur des données diverses (image, son, parole, texte, série temporelle) et des défenses plus ou moins validées. Le jeu du gendarme et du voleur semble sans fin.

Les menaces se sont aussi diversifiées et frappent maintenant aussi la phase d'apprentissage. Les anglais parlent de poisoning, de back door, et de trojaning. En perturbant les données d'entraînement, l'attaquant modifie le comportement du réseau à l'insu du défenseur. Les menaces concernent aussi l'effet mémoire implicite du réseau. Ce dernier garde une trace des données d'entraînement, si bien qu'un attaquant peut profiter de ces fuites d'information. Cela met en danger des jeux de données sensibles et pose question sur leur confidentialité.

La journée mêlera exposés invités et exposés courts proposant des mises en pratique d' attaques / défenses sur tout type de données ou des études plus théoriques.

Nous aurons le plaisir d'accueillir deux orateurs invités :

Elvis Dohmatob (Criteo) sur les aspects théoriques des exemples adverses.
Alexandre Sablayrolles (Facebook) sur les menaces liées à la mémoire implicite des réseaux.

Contact : Teddy Furon (teddy.furon@inria.fr) et Caroline Fontaine (caroline.fontaine@lsv.fr).

Programme

9h - 10h Exposé Invité

Elvis Dohmatob (Criteo), Adversarial Examples: The Good, The Bad, and The Ugly!

10h - 11h Session #1

Ihsen ALOUANI (Insa Hauts de France), Defensive Approximation: Securing CNNs through Approximate Computing
Rémi Bernhard (CEA), Luring of transferable adversarial perturbations in the black-box paradigm
Alexandre Araujo (Paris Dauphine), On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory

11h - 11h20 Pause

11h20 - 12h20 Session #2

Thibault Maho (Inria Rennes), Fast, Real, Black and White Attacks
Wassim Hamidouche (Insa Rennes), Detect and Defense Against Adversarial Examples in Deep Learning using Natural Scene Statistics and Adaptive Denoising
Pierre-Yves Lagrave (Thales), Robustness and Vulnerability of Lie Group-Equivariant Neural Networks

12h20 - 14h Déjeuner

14h-14h40 Session #3

Ahmed Aldahdooh (INSA Rennes), SFAD: Selective and Feature based Adversarial Detection
Rafael Pinot (Paris Dauphine), Randomization matters. How to defend against strong adversarial attacks?

14h40 - 15h40 Exposé Invité

Alexandre Sablayrolles (Facebook), Privacy and data tracing in machine learning models

15h40 - 16h Pause

16h - 17h20 Session #4

Adrien Chan Hon Tong (Onera), Adversarial poisoning against deep models
Samuel Tap (ZAMA.ai), Inférence homomorphe de réseaux profonds
Arnaud Grivet-Sébert (LIST - CEA), SPEED: Secure, PrivatE, and Efficient Deep learning
Katarzyna KAPUSTA (Thales), Watermarking de modèle comme moyen de vérification de vol de propriété intellectuelle

Résumés des contributions

Elvis Dohmatob (Criteo), Adversarial Examples: The Good, The Bad, and The Ugly!

Abstract: Have you heard of the flying pig? You can add a small perturbation to the image of a pig and fool a deep neural network into classifying it as an aircraft, with high confidence! The existence of these so-called adversarial examples for machine-learning algorithms is at least one reason why AI systems will never be deployable in a closed-loop fashion (aka no self-driving cars !). In this talk, I will explain why in general, these adversarial examples are unavoidable, a consequence of the peculiar geometry of high-dimensional probability.

Ihsen ALOUANI (Insa Hauts de France), Defensive Approximation: Securing CNNs through Approximate Computing

Abstract: In the past few years, an increasing number of machine-learning and deep learning structures, such as Convolutional Neural Networks (CNNs), have been applied to solving a wide range of real-life problems. However, these architectures are vulnerable to adversarial attacks. In this paper, we propose for the first time to use hardware-supported approximate computing to improve the robustness of machine learning classifiers. We show that our approximate computing implementation achieves robustness across a wide range of attack scenarios. Specifically, for black-box and grey-box attack scenarios, we show that successful adversarial attacks against the exact classifier have poor transferability to the approximate implementation. Surprisingly, the robustness advantages also apply to white-box attacks where the attacker has access to the internal implementation of the approximate classifier. We explain some of the possible reasons for this robustness through analysis of the internal operation of the approximate implementation. Furthermore, our approximate computing model maintains the same level in terms of classification accuracy, does not require retraining, and reduces resource utilization and energy consumption of the CNN. We conducted extensive experiments on a set of strong adversarial attacks; We empirically show that the proposed implementation increases the robustness of a LeNet-5 and an Alexnet CNNs by up to 99% and 87%, respectively for strong grey-box adversarial attacks along with up to 67% saving in energy consumption due to the simpler nature of the approximate logic. We also show that a white-box attack requires a remarkably higher noise budget to fool the approximate classifier, causing an average of 4db degradation of the PSNR of the input image relative to the images that succeed in fooling the exact classifier

Rémi Bernhard (LETI-CEA), Luring of transferable adversarial perturbations in the black-box paradigm

Co-auteurs : Rémi Bernhard, Pierre-Alain Moëllic, Jean-Max Dutertre
Abstract: The growing interest for adversarial examples has resulted in many defenses intended to detect them, render them inoffensive or make the model more robust against them. We pave the way towards a new approach to improve the robustness of a model against black-box transfer attacks. Our method aims at tricking the adversary into choosing false directions to fool the target model thanks to a removable additional neural network that is included in the target model and designed to induce what we called ?the luring effect?. Training the additional model is achieved thanks to a loss function acting on the logits sequence order. Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set. We explain the luring effect thanks to the notion of robust and non-robust useful features and perform experiments on MNIST, SVHN and CIFAR10 to characterize and evaluate this phenomenon. We verify experimentally that our approach can be used as a defense to efficiently thwart an adversary using state-of-the-art attacks and allowed to perform large perturbations.

Alexandre Araujo (Paris Dauphine), On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory

Co-auteurs : Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif
Abstract: This paper tackles the problem of Lipschitz regularization of Convolutional Neural Networks. Lipschitz regularity is now established as a key property of modern deep learning with implications in training stability, generalization, robustness against adversarial examples, etc. However, computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard. Recent attempts from the literature introduce upper bounds to approximate this constant that are either efficient but loose or accurate but computationally expensive. In this work, by leveraging the theory of Toeplitz matrices, we introduce a new upper bound for convolutional layers that is both tight and easy to compute. Based on this result we devise an algorithm to train Lipschitz regularized Convolutional Neural Networks.

Thibault Maho (Inria Rennes), Fast, Real, Black and White Attacks

Co-auteurs : Thibault Maho, Benoit Bonnet, Hanwei Zhang, Erwan Le Merrer, Patrick Bas, Yannis Avrithis, Laurent Amsaleg, et Teddy Furon.
Résumé : Nous présentons deux atttaques, l'une en boite noire l'autre en boite blanche, qui sont rapides (nombre limité d'accès à la boite noire, nombre limité de calcul de gradient), réelles (les images adverses sont quantifiées sur des entiers codés en 8 bits) et qui offrent des distortions en norme L2 similaires ou plus petites que l'état de l'art sur ImageNet.

Wassim Hamidouche (Insa Rennes), Detect and Defense Against Adversarial Examples in Deep Learning using Natural Scene Statistics and Adaptive Denoising

Co-auteurs : Anouar Kherchouche, Sid Ahmed Fezza, et Wassim Hamidouche
Abstract: Despite the enormous performance of Deep Neural Networks (DNNs), recent studies have shown their vulnerability to Adversarial Examples (AEs), i.e., carefully perturbed inputs designed to fool the targeted DNN. Currently, the literature is rich with many effective attacks to craft such AEs. Meanwhile, many defenses strategies have been developed to mitigate this vulnerability. However, these latter showed their effectiveness against specific attacks and does not generalize well to different attacks. In this project, we propose a framework for defending DNN classifier against adversarial samples. The proposed method includes a separate detector and a denoising block. The detector aims to detect AEs by characterizing them through the use of natural scene statistics (NSS), where we demonstrate that these statistical features are altered by the presence of adversarial perturbations. The denoiser is based on Block Matching 3D (BM3D) filter fed by a threshold estimated by a Convolutional Neural Network (CNN) to project back the samples detected as AEs into their data manifold. We conduct a complete evaluation on three standards datasets namely MNIST, CIFAR-10 and Tiny-ImageNet comparing with state-of-the-art defenses. Our experimental results have shown that the proposed detector achieve a high detection accuracy while providing a low false positive accuracy. Additionally, we outperform the state-of-the-art-defense techniques by improving the robustness of DNN's.

Pierre-Yves Lagrave (Thales) : Robustness and Vulnerability of Lie Group-Equivariant Neural Networks

Abstract: For image processing tasks, it has been shown that adversarial samples could be generated by modifying the input data with geometrical transforms sometimes as simple as small isometries, therefore motivating the need for increasing the neural networks robustness to the actions of Lie groups such as SE(2). To do so, a natural approach consists in using data augmentation but this has been shown inefficient for worst-case attacks. Building on the success of CNN for which local robustness to the translation group is achieved through their convolution layers, Group-Convolutional Neural Networks (G-CNN) have been introduced as an alternative by leveraging on group-based convolution kernels and are achieving state-of-the-art accuracies. Despite their increased robustness, G-CNN are in practice still not immune to adversarial attacks and it is therefore natural to analyze the impact of the embedded equivariance property on the type and severity of existing attacks. After introducing some general background about G-CNN, this talk will provide initial results and discussions with respect to the vulnerabilities of G-CNN to usual attacks and to the applicability of corresponding defense strategies. We will in particular discuss the robustness of the adversarial samples, for which the lack can be used as a natural defense for usual CNN.

Ahmed Aldahdooh (INSA Rennes), SFAD: Selective and Feature based Adversarial Detection

Co-auteurs : Ahmed Aldahdooh, Wassim Hamidouche
Abstract: Security-sensitive applications that relay on Deep Neu-ral Networks (DNNs) are vulnerable to small perturba-tions crafted to generate Adversarial Examples (AEs) that are imperceptible to human and cause DNN to misclassify them. Many defense and detection techniques have been proposed. The state-of-the-art detection techniques have been designed for specific attacks or broken by others, need knowledge about the attacks, are not consistent, increase model parameters overhead, are time-consuming, or have latency in inference time. To trade off these factors, we propose a novel unsupervised detection mechanism that uses the selective prediction, processing model layers outputs,and knowledge transfer concepts in a multi-task learning setting. It is called Selective and Feature based Adversarial Detection (SFAD). Experimental results show that the proposed approach achieves comparable results to the state-of-the-art methods against tested attacks in white, black, and gray boxes scenarios. Moreover, results show that SFAD is fully robust against High Confidence Attacks (HCAs) for MNIST and partially robust for CIFAR-10 datasets.

Rafael Pinot (Paris Dauphine), Randomization matters. How to defend against strong adversarial attacks?

Co-auteurs : Rafael Pinot, Raphael Ettedgui, Geovani Rizk, Yann Chevaleyre, Jamal Atif
Abstract: Is there a classifier that ensures optimal robustness against all adversarial attacks? This paper answers this question by adopting a game-theoretic point of view. We show that adversarial attacks and defenses form an infinite zero-sum game where classical results (e.g. Sion theorem) do not apply. We demonstrate the non-existence of a Nash equilibrium in our game when the classifier and the Adversary are both deterministic, hence giving a negative answer to the above question in the deterministic regime. Nonetheless, the question remains open in the randomized regime. We tackle this problem by showing that, undermild conditions on the dataset distribution, any deterministic classifier can be outperformed by a randomized one. This gives arguments for using randomization, and leads us to a new algorithm for building randomized classifiers that are robust to strong adversarial attacks. Empirical results validate our theoretical analysis, and show that our defense method considerably outperforms Adversarial Training against state-of-the-art attacks.

Alexandre Sablayrolles (Facebook), Privacy and data tracing in machine learning models

Abstract: What quantity of information do machine learning models reveal about their training data? Models trained with differential privacy (DP) provably limit such leakage, but the question remains open for non-DP models. In this talk, we present multiple techniques for membership inference, which estimates if a given data sample is in the training set of a model. In particular, we derive the Bayes-optimal strategy for passive membership inference, and show that approximations of this strategy leads to state-of-the-art methods. We also introduce a new approach for active tracking of privacy leakage: radioactive data. This approach modifies data marginally so that any model trained on it bears an identifiable mark.

Adrien Chan Hon Tong (Onera), Adversarial poisoning against deep models

Abstract: In data poisoning, a hacker changes some training data to modify the testing behaviour of a model expected to be learnt on those data. Previous works have offered poisoning attacks against deep networks. But these attacks have focused on large modifications of few samples. Now, combining adversarial example sensibility and data poisoning can give rise to attacks which can not be detected even with a careful review of the training data.

Samuel Tap (ZAMA.ai), Inférence homomorphe de réseaux profonds

Résumé : Le machine learning sur données réelles se heurte à des questions de protection de la vie privée des utilisateurs lorsque ces données sont à caractère personnel, sensibles ou encore stratégiques. Afin de protéger ces dernières, la cryptographie propose de multiples mécanismes assurant diverses propriétés telles que la confidentialité, mais qui limitent générlalement leur exploitation. Récemment, le chiffrement dit totalement homomorphe permet de résoudre cette apparente contradiction en offrant la possibilité d?exécuter des algorithmes arbitraires sur données chiffrés. Bridé pendant longtemps par la profondeur multiplicative des algorithmes appliqués, l?inférence homomorphe de réseaux de neurones profonds est désormais envisageable.

Arnaud Grivet-Sébert (LIST - CEA), SPEED: Secure, PrivatE, and Efficient Deep learning

Co-auteurs : Arnaud Grivet Sébert, Rafael Pinot, Martin Zuber, Cédric Gouy-Pailler, Renaud Sirdey
Résumé : Nous introduisons un protocole d'apprentissage profond fonctionnant sous fortes contraintes de confidentialité. Fondée sur l'apprentissage collaboratif, la confidentialité différentielle et le chiffrement homomorphe, l'approche proposée étend l'état de l'art de l'apprentissage profond confidentiel à un spectre de menaces plus large, en particulier à l'hypothèse du serveur honnête mais curieux. Nous répondons aux menaces provenant à la fois du serveur, du modèle global et de propriétaires de données (professeurs) qui pourraient colluder. Utilisant la confidentialité différentielle distribuée et l'opérateur d'argmax homomorphe, notre méthode est spécialement conçue pour limiter les coûts de communication et optimiser l'efficacité. Des résultats théoriques appuient la méthode proposée. Nous fournissons des garanties de confidentialité différentielle du point de vue d'une entité quelconque ayant accès au modèle global, y compris des professeurs qui colludent, en fonction de la proportion de professeurs honnêtes. Notre méthode est par conséquent applicable à des scénarios réels où les professeurs ne font confiance ni à une tierce partie qui aurait accès à leurs bases de données, ni aux autres professeurs. Un point décisif est que la charge computationnelle de l'approche reste raisonnable et celle-ci est donc adaptée à l'apprentissage profond. Afin d'évaluer la pertinence pratique de notre protocole, des expériences ont été menées sur des banques d'images, dans un contexte de classification. Nous présentons des résultats numériques qui montrent que la procédure d'apprentissage a un taux élevé de bonne classification tout en préservant la confidentialité des données.

Katarzyna KAPUSTA (Thales), Watermarking de modèle comme moyen de vérification de vol de propriété intellectuelle

Co-auteurs : Katarzyna KAPUSTA et Vincent THOUVENOT (Thalès)
Abstract: Model watermarking is an emerging research track aiming at protecting the intellectual property of ML actors. It is motivated by the growing demand for pre-trained models and the fast development of extraction attacks that are able to copy a model hidden behind an API of a MLaaS. By inserting an unusual change in the look or behaviour of a model, watermarking enables model traceability. Various watermarking solutions were proposed as countermeasures to different types of attacks, such as model extraction, watermark removal or falsification, and ownership check evasion. We give an overview of these techniques along with recommendations on their usage.

Identification