A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.

Overview

2021: A Year Full of Amazing AI papers- A Review
📌 [work in progress...]

A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.

While the world is still recovering, research hasn't slowed its frenetic pace, especially in the field of artificial intelligence. More, many important aspects were highlighted this year, like the ethical aspects, important biases, governance, transparency and much more. Artificial intelligence and our understanding of the human brain and its link to AI are constantly evolving, showing promising applications improving our life's quality in the near future. Still, we ought to be careful with which technology we choose to apply.

"Science cannot tell us what we ought to do, only what we can do."
- Jean-Paul Sartre, Being and Nothingness

Here are the most interesting research papers of the year, in case you missed any of them. In short, it is curated list of the latest breakthroughs in AI and Data Science by release date with a clear video explanation, link to a more in-depth article, and code (if applicable). Enjoy the read!

The complete reference to each paper is listed at the end of this repository.
This is a work in progress... Star this repository to stay up to date! ⭐️

Maintainer: louisfb01

Subscribe to my newsletter - The latest updates in AI explained every week.

Feel free to message me any interesting paper I may have missed to add to this repository.

Tag me on Twitter @Whats_AI or LinkedIn @Louis (What's AI) Bouchard if you share the list!

Missed last year? Check this out: 2020: A Year Full of Amazing AI papers- A Review

The Full List


DALL·E: Zero-Shot Text-to-Image Generation from OpenAI [1]

OpenAI successfully trained a network able to generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.

VOGUE: Try-On by StyleGAN Interpolation Optimization [2]

Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try-on any pants or shirts you want using only an image of yourself.

Taming Transformers for High-Resolution Image Synthesis [3]

Tl;DR: They combined the efficiency of GANs and convolutional approaches with the expressivity of transformers to produce a powerful and time-efficient method for semantically-guided high-quality image synthesis.

Thinking Fast And Slow in AI [4]

Drawing inspiration from Human Capabilities Towards a more general and trustworthy AI & 10 Questions for the AI Research Community.

Automatic detection and quantification of floating marine macro-litter in aerial images [5]

Odei Garcia-Garin et al. from the University of Barcelona have developed a deep learning-based algorithm able to detect and quantify floating garbage from aerial images. They also made a web-oriented application allowing users to identify these garbages, called floating marine macro-litter, or FMML, within images of the sea surface.

ShaRF: Shape-conditioned Radiance Fields from a Single View [6]

Just imagine how cool it would be to just take a picture of an object and have it in 3D to insert in the movie or video game you are creating or in a 3D scene for an illustration.

Generative Adversarial Transformers [7]

They basically leverage transformers’ attention mechanism in the powerful StyleGAN2 architecture to make it even more powerful!

Subscribe to my weekly newsletter and stay up-to-date with new publications in AI for 2022!

We Asked Artificial Intelligence to Create Dating Profiles. Would You Swipe Right? [8]

Would you swipe right on an AI profile? Can you distinguish an actual human from a machine? This is what this study reveals using AI-made-up people on dating apps.

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [9]

Will Transformers Replace CNNs in Computer Vision? In less than 5 minutes, you will know how the transformer architecture can be applied to computer vision with a new paper called the Swin Transformer.

IMAGE GANS MEET DIFFERENTIABLE RENDERING FOR INVERSE GRAPHICS AND INTERPRETABLE 3D NEURAL RENDERING [10]

This promising model called GANverse3D only needs an image to create a 3D figure that can be customized and animated!

Deep nets: What have they ever done for vision? [11]

"I will openly share everything about deep nets for vision applications, their successes, and the limitations we have to address."

Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [12]

The next step for view synthesis: Perpetual View Generation, where the goal is to take an image to fly into it and explore the landscape!

Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control [13]

With this AI-powered nerve interface, the amputee can control a neuroprosthetic hand with life-like dexterity and intuitiveness.

Total Relighting: Learning to Relight Portraits for Background Replacement [14]

Properly relight any portrait based on the lighting of the new background you add. Have you ever wanted to change the background of a picture but have it look realistic? If you’ve already tried that, you already know that it isn’t simple. You can’t just take a picture of yourself in your home and change the background for a beach. It just looks bad and not realistic. Anyone will just say “that’s photoshopped” in a second. For movies and professional videos, you need the perfect lighting and artists to reproduce a high-quality image, and that’s super expensive. There’s no way you can do that with your own pictures. Or can you?

LASR: Learning Articulated Shape Reconstruction from a Monocular Video [15]

Generate 3D models of humans or animals moving from only a short video as input. This is a new method for generating 3D models of humans or animals moving from only a short video as input. Indeed, it actually understands that this is an odd shape, that it can move, but still needs to stay attached as this is still one "object" and not just many objects together...

Enhancing Photorealism Enhancement [16]

This AI can be applied live to the video game and transform every frame to look much more natural. The researchers from Intel Labs just published this paper called Enhancing Photorealism Enhancement. And if you think that this may be "just another GAN," taking a picture of the video game as an input and changing it following the style of the natural world, let me change your mind. They worked on this model for two years to make it extremely robust. It can be applied live to the video game and transform every frame to look much more natural. Just imagine the possibilities where you can put a lot less effort into the game graphic, make it super stable and complete, then improve the style using this model...

DefakeHop: A Light-Weight High-Performance Deepfake Detector [17]

How to Spot a Deep Fake in 2021. Breakthrough US Army technology using artificial intelligence to find deepfakes.

While they seem like they’ve always been there, the very first realistic deepfake didn’t appear until 2017. It went from the first-ever resembling fake images automatically generated to today’s identical copy of someone on videos, with sound.

The reality is that we cannot see the difference between a real video or picture and a deepfake anymore. How can we tell what’s real from what isn’t? How can audio files or video files be used in court as proof if an AI can entirely generate them? Well, this new paper may provide answers to these questions. And the answer here may again be the use of artificial intelligence. The saying “I’ll believe it when I’ll see it” may soon change for “I’ll believe it when the AI tells me to believe it…”

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network [18]

Apply any style to your 4K image in real-time using this new machine learning-based approach!

Barbershop: GAN-based Image Compositing using Segmentation Masks [19]

This article is not about a new technology in itself. Instead, it is about a new and exciting application of GANs. Indeed, you saw the title, and it wasn’t clickbait. This AI can transfer your hair to see how it would look like before committing to the change…

TextStyleBrush: Transfer of text aesthetics from a single example [20]

This new Facebook AI model can translate or edit text directly in the image in your own language, following the same style!

Imagine you are on vacation in another country where you do not speak the language. You want to try out a local restaurant, but their menu is in the language you don’t speak. I think this won’t be too hard to imagine as most of us already faced this situation whether you see menu items or directions and you can’t understand what’s written. Well, in 2020, you would take out your phone and google translate what you see. In 2021 you don’t even need to open google translate anymore and try to write what you see one by one to translate it. Instead, you can simply use this new model by Facebook AI to translate every text in the image in your own language…

If you’d like to read more research papers as well, I recommend you read my article where I share my best tips for finding and reading more research papers.

Animating Pictures with Eulerian Motion Fields [21]

This model takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while conserving the rest of the picture entirely still creating amazing-looking videos like this one...

CVPR 2021 Best Paper Award: GIRAFFE - Controllable Image Generation [22]

Using a modified GAN architecture, they can move objects in the image without affecting the background or the other objects!

GitHub Copilot & Codex: Evaluating Large Language Models Trained on Code [23]

Find out how this new model from OpenAI Generates Code From Words!

Apple: Recognizing People in Photos Through Private On-Device Machine Learning [24]

Using multiple machine learning-based algorithms running privately on your device, Apple allows you to accurately curate and organize your images and videos on iOS 15.

Image Synthesis and Editing with Stochastic Differential Equations [25]

Say goodbye to complex GAN and transformer architectures for image generation! This new method by Chenling Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based inputs. Even people like me with zero artistic skills can now generate beautiful images or modifications out of quick sketches...

Sketch Your Own GAN [26]

Make GANs training easier for everyone by generating Images following a sketch! Indeed, whit this new method, you can control your GAN’s outputs based on the simplest type of knowledge you could provide it: hand-drawn sketches.

Tesla's Autopilot Explained [27]

If you wonder how a Tesla car can not only see but navigate the roads with other vehicles, this is the video you were waiting for. A couple of days ago was the first Tesla AI day where Andrej Karpathy, the Director of AI at Tesla, and others presented how Tesla’s autopilot works from the image acquisition through their eight cameras to the navigation process on the roads.

Styleclip: Text-driven manipulation of StyleGAN imagery [28]

AI could generate images, then, using a lot of brainpower and trial and error, researchers could control the results following specific styles. Now, with this new model, you can do that using only text!

TimeLens: Event-based Video Frame Interpolation [29]

TimeLens can understand the movement of the particles in-between the frames of a video to reconstruct what really happened at a speed even our eyes cannot see. In fact, it achieves results that our intelligent phones and no other models could reach before!

Diverse Generation from a Single Video Made Possible [30]

Have you ever wanted to edit a video?

Remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it. For those of you who already ran advertisement campaigns, you certainly wanted to have variations of your videos for AB testing and see what works best. Well, this new research by Niv Haim et al. can help you do all of these out of a single video and in HD!

Indeed, using a simple video, you can perform any tasks I just mentioned in seconds or a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs in all ways and doesn’t use any deep learning fancy research nor requires a huge and impractical dataset! And the best thing is that this technique is scalable to high-resolution videos.

Skillful Precipitation Nowcasting using Deep Generative Models of Radar [31]

DeepMind just released a Generative model able to outperform widely-used nowcasting methods in 89% of situations for its accuracy and usefulness assessed by more than 50 expert meteorologists! Their model focuses on predicting precipitations in the next 2 hours and achieves that surprisingly well. It is a generative model, which means that it will generate the forecasts instead of simply predicting them. It basically takes radar data from the past to create future radar data. So using both time and spatial components from the past, they can generate what it will look like in the near future.

You can see this as the same as Snapchat filters, taking your face and generating a new face with modifications on it. To train such a generative model, you need a bunch of data from both the human faces and the kind of face you want to generate. Then, using a very similar model trained for many hours, you will have a powerful generative model. This kind of model often uses GANs architectures for training purposes and then uses the generator model independently.

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks [32]

Have you ever tuned in to a video or a TV show and the actors were completely inaudible, or the music was way too loud? Well, this problem, also called the cocktail party problem, may never happen again. Mitsubishi and Indiana University just published a new model as well as a new dataset tackling this task of identifying the right soundtrack. For example, if we take the same audio clip we just ran with the music way too loud, you can simply turn up or down the audio track you want to give more importance to the speech than the music.

The problem here is isolating any independent sound source from a complex acoustic scene like a movie scene or a youtube video where some sounds are not well balanced. Sometimes you simply cannot hear some actors because of the music playing or explosions or other ambient sounds in the background. Well, if you successfully isolate the different categories in a soundtrack, it means that you can also turn up or down only one of them, like turning down the music a bit to hear all the other actors correctly. This is exactly what the researchers achieved.

ADOP: Approximate Differentiable One-Pixel Point Rendering [33]

Imagine you want to generate a 3D model or simply a fluid video out of a bunch of pictures you took. Well, it is now possible! I don't want to give out too much, but the results are simply amazing and you need to check it out by yourself!


If you would like to read more papers and have a broader view, here is another great repository for you covering 2020: 2020: A Year Full of Amazing AI papers- A Review

Tag me on Twitter @Whats_AI or LinkedIn @Louis (What's AI) Bouchard if you share the list!


Paper references

[1] A. Ramesh et al., Zero-shot text-to-image generation, 2021. arXiv:2102.12092

[2] Lewis, Kathleen M et al., (2021), VOGUE: Try-On by StyleGAN Interpolation Optimization.

[3] Taming Transformers for High-Resolution Image Synthesis, Esser et al., 2020.

[4] Thinking Fast And Slow in AI, Booch et al., (2020), https://arxiv.org/abs/2010.06002.

[5] Odei Garcia-Garin et al., Automatic detection and quantification of floating marine macro-litter in aerial images: Introducing a novel deep learning approach connected to a web application in R, Environmental Pollution, https://doi.org/10.1016/j.envpol.2021.116490.

[6] Rematas, K., Martin-Brualla, R., and Ferrari, V., “ShaRF: Shape-conditioned Radiance Fields from a Single View”, (2021), https://arxiv.org/abs/2102.08860

[7] Drew A. Hudson and C. Lawrence Zitnick, Generative Adversarial Transformers, (2021)

[8] Sandra Bryant et al., “We Asked Artificial Intelligence to Create Dating Profiles. Would You Swipe Right?”, (2021), UNSW Sydney blog.

[9] Liu, Z. et al., 2021, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”, arXiv preprint https://arxiv.org/abs/2103.14030v1

[10] Zhang, Y., Chen, W., Ling, H., Gao, J., Zhang, Y., Torralba, A. and Fidler, S., 2020. Image gans meet differentiable rendering for inverse graphics and interpretable 3d neural rendering. arXiv preprint arXiv:2010.09125.

[11] Yuille, A.L., and Liu, C., 2021. Deep nets: What have they ever done for vision?. International Journal of Computer Vision, 129(3), pp.781–802, https://arxiv.org/abs/1805.04025.

[12] Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N. and Kanazawa, A., 2020. Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image, https://arxiv.org/pdf/2012.09855.pdf

[13] Nguyen & Drealan et al. (2021) A Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control: https://arxiv.org/abs/2103.13452

[14] Pandey et al., 2021, Total Relighting: Learning to Relight Portraits for Background Replacement, doi: 10.1145/3450626.3459872, https://augmentedperception.github.io/total_relighting/total_relighting_paper.pdf.

[15] Gengshan Yang et al., (2021), LASR: Learning Articulated Shape Reconstruction from a Monocular Video, CVPR, https://lasr-google.github.io/.

[16] Richter, Abu AlHaija, Koltun, (2021), "Enhancing Photorealism Enhancement", https://intel-isl.github.io/PhotorealismEnhancement/.

[17] DeepFakeHop: Chen, Hong-Shuo, et al., (2021), “DefakeHop: A Light-Weight High-Performance Deepfake Detector.” ArXiv abs/2103.06929.

[18] Liang, Jie and Zeng, Hui and Zhang, Lei, (2021), "High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network", https://export.arxiv.org/pdf/2105.09188.pdf.

[19] Peihao Zhu et al., (2021), Barbershop, https://arxiv.org/pdf/2106.01505.pdf.

[20] Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, and Tal Hassner, Facebook AI, (2021), ”TextStyleBrush: Transfer of text aesthetics from a single example”.

[21] Holynski, Aleksander, et al. “Animating Pictures with Eulerian Motion Fields.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[22] Michael Niemeyer and Andreas Geiger, (2021), "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields", Published in CVPR 2021.

[23] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.D.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G. and Ray, A., 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.

[24] Apple, “Recognizing People in Photos Through Private On-Device Machine Learning”, (2021), https://machinelearning.apple.com/research/recognizing-people-photos

[25] Meng, C., Song, Y., Song, J., Wu, J., Zhu, J.Y. and Ermon, S., 2021. Sdedit: Image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073.

[26] Wang, S.Y., Bau, D. and Zhu, J.Y., 2021. Sketch Your Own GAN. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 14050-14060).

[27] “Tesla AI Day”, Tesla, August 19th 2021, https://youtu.be/j0z4FweCy4M

[28] Patashnik, Or, et al., (2021), “Styleclip: Text-driven manipulation of StyleGAN imagery.”, https://arxiv.org/abs/2103.17249

[29] Stepan Tulyakov*, Daniel Gehrig*, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, TimeLens: Event-based Video Frame Interpolation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2021, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf

[30] Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2021). Diverse Generation from a Single Video Made Possible, https://arxiv.org/abs/2109.08591.

[31] Ravuri, S., Lenc, K., Willson, M., Kangin, D., Lam, R., Mirowski, P., Fitzsimons, M., Athanassiadou, M., Kashem, S., Madge, S. and Prudden, R., 2021. Skillful Precipitation Nowcasting using Deep Generative Models of Radar, https://www.nature.com/articles/s41586-021-03854-z

[32] Petermann, D., Wichern, G., Wang, Z., & Roux, J.L. (2021). The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks. https://arxiv.org/pdf/2110.09958.pdf.

[33] Rückert, D., Franke, L. and Stamminger, M., 2021. ADOP: Approximate Differentiable One-Pixel Point Rendering, https://arxiv.org/pdf/2110.06635.pdf.

Owner
Louis-François Bouchard
Master student, AI Research Scientist, and speaker on YouTube.
Louis-François Bouchard
Awesome Video Datasets

Awesome Video Datasets

Yunhua Zhang 402 Aug 12, 2022
Video Visual Relation Detection (VidVRD) tracklets generation. also for ACM MM Visual Relation Understanding Grand Challenge

VidVRD-tracklets This repository contains codes for Video Visual Relation Detection (VidVRD) tracklets generation based on MEGA and deepSORT. These tr

null 23 Jun 7, 2022
Python code for "Machine learning: a probabilistic perspective" (2nd edition)

Python code for "Machine learning: a probabilistic perspective" (2nd edition)

Probabilistic machine learning 5k Aug 13, 2022
Django application and library for importing and exporting data with admin integration.

django-import-export django-import-export is a Django application and library for importing and exporting data with included admin integration. Featur

null 2.5k Aug 11, 2022
BitcartCC is a platform for merchants, users and developers which offers easy setup and use.

BitcartCC is a platform for merchants, users and developers which offers easy setup and use.

BitcartCC 159 Aug 8, 2022
Ajenti Core and stock plugins

Ajenti is a Linux & BSD modular server admin panel. Ajenti 2 provides a new interface and a better architecture, developed with Python3 and AngularJS.

Ajenti Project 6.8k Aug 7, 2022
Simple and extensible administrative interface framework for Flask

Flask-Admin The project was recently moved into its own organization. Please update your references to [email protected]:flask-admin/flask-admin.git. Int

Flask-Admin 5.1k Aug 15, 2022
Real-time monitor and web admin for Celery distributed task queue

Flower Flower is a web based tool for monitoring and administrating Celery clusters. Features Real-time monitoring using Celery Events Task progress a

Mher Movsisyan 5.3k Aug 13, 2022
Freqtrade is a free and open source crypto trading bot written in Python

Freqtrade is a free and open source crypto trading bot written in Python. It is designed to support all major exchanges and be controlled via Telegram. It contains backtesting, plotting and money management tools as well as strategy optimization by machine learning.

null 18.9k Aug 13, 2022
Simple and extensible administrative interface framework for Flask

Flask-Admin The project was recently moved into its own organization. Please update your references to [email protected]:flask-admin/flask-admin.git. Int

Flask-Admin 4.6k Feb 7, 2021
FastAPI Admin Dashboard based on FastAPI and Tortoise ORM.

FastAPI ADMIN 中文文档 Introduction FastAPI-Admin is a admin dashboard based on fastapi and tortoise-orm. FastAPI-Admin provide crud feature out-of-the-bo

long2ice 1.4k Aug 12, 2022
Nginx UI allows you to access and modify the nginx configurations files without cli.

nginx ui Table of Contents nginx ui Introduction Setup Example Docker UI Authentication Configure the auth file Configure nginx Introduction We use ng

David Schenk 4.2k Aug 5, 2022
xarray: N-D labeled arrays and datasets

xarray is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

Python for Data 2.6k Aug 10, 2022
A high-level app and dashboarding solution for Python

Panel provides tools for easily composing widgets, plots, tables, and other viewable objects and controls into custom analysis tools, apps, and dashboards.

HoloViz 2k Aug 12, 2022
With Django Hijack, admins can log in and work on behalf of other users without having to know their credentials.

Django Hijack With Django Hijack, admins can log in and work on behalf of other users without having to know their credentials. Docs See http://django

null 1.1k Aug 13, 2022
PyMMO is a Python-based MMO game framework using sockets and PyGame.

PyMMO is a Python framework/template of a MMO game built using PyGame on top of Python's built-in socket module.

Luis Souto Maior 52 Aug 6, 2022
Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use

Lazymux is a tool installer that is specially made for termux user which provides a lot of tool mainly used tools in termux and its easy to use, Lazymux install any of the given tools provided by it from itself with just one click, and its often get updated.

DedSecTL 1.6k Aug 15, 2022
WebVirtCloud is virtualization web interface for admins and users

WebVirtCloud is a virtualization web interface for admins and users. It can delegate Virtual Machine's to users. A noVNC viewer presents a full graphical console to the guest domain. KVM is currently the only hypervisor supported.

Anatoliy Guskov 1.2k Aug 10, 2022
Extends the Django Admin to include a extensible dashboard and navigation menu

django-admin-tools django-admin-tools is a collection of extensions/tools for the default django administration interface, it includes: a full feature

Django Admin Tools 697 Jul 25, 2022