TL;DR: A notebook explaining the principle of adversarial attacks and their defences
Abstract:
Deep neural network models have been widely successful in many applications such as natural language processing, computer vision, speech recognition, and reinforcement learning. These models sometimes even outperform humans on several benchmarks. However, they are often seen as black boxes, and their adoption in critical systems such as medicine and autonomous driving remains limited. In addition, an intriguing and worrying property of neural networks is their brittleness to adversarial attacks and out-of-distribution inputs. An adversarial attack can cause a model with high accuracy to misclassify an image by applying a perturbation so small that it is imperceptible to humans.
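To make the idea concrete, here is a minimal sketch of one classic attack, the Fast Gradient Sign Method (FGSM), written in PyTorch. The names `model`, `x`, `y`, and the budget `epsilon` are illustrative assumptions (a classifier, a batch of images in [0, 1], their labels, and the perturbation size), not artifacts of this notebook.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Illustrative FGSM sketch: nudge the input by a tiny signed step
    that increases the classification loss, bounded by `epsilon`."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss of the current prediction
    loss.backward()                       # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * x.grad.sign()   # imperceptibly small perturbation
    return x_adv.clamp(0, 1).detach()     # keep pixel values in a valid range
```

Even with `epsilon` as small as a few hundredths of the pixel range, such a perturbed image often flips the prediction of an otherwise accurate model while looking unchanged to a human observer.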