Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the description of the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select random action a_t
        otherwise select a_t = argmax_a Q(s_t,a; θ_i)
        Execute action a_t in emulator and observe r_t and s_(t+1)
        Store transition (s_t,a_t,r_t,s_(t+1)) in D
        Sample a minibatch of transitions (s_j,a_j,r_j,s_(j+1)) from D
        Set y_j:=
            r_j for terminal s_(j+1)
            r_j+γ*max_(a^' )  Q(s_(j+1),a'; θ_i) for non-terminal s_(j+1)
        Perform a gradient step on (y_j-Q(s_j,a_j; θ_i))^2 with respect to θ
    end for
end for
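
As a minimal illustration of the target y_j in the pseudo-code above (a sketch in plain NumPy, not the repository's code; the discount factor γ = 0.99 is an assumed value):

import numpy as np

GAMMA = 0.99  # discount factor (assumed; not specified in the pseudo-code)

def q_targets(rewards, q_next, terminals):
    """Compute y_j for a minibatch of transitions.

    rewards:   r_j, shape (batch,)
    q_next:    Q(s_{j+1}, a'; theta) for every action, shape (batch, n_actions)
    terminals: 1.0 where s_{j+1} is terminal, else 0.0, shape (batch,)
    """
    # y_j = r_j                                      for terminal s_{j+1}
    # y_j = r_j + gamma * max_a' Q(s_{j+1}, a')      for non-terminal s_{j+1}
    return rewards + GAMMA * (1.0 - terminals) * np.max(q_next, axis=1)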

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (see the sketch after this list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack last 4 frames to produce an 80x80x4 input array for network
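
A rough sketch of this preprocessing with OpenCV and NumPy (function names are illustrative, not the repository's own):

import cv2
import numpy as np

def preprocess(frame):
    # Convert a raw RGB game frame to grayscale and resize it to 80x80
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (80, 80))

def initial_state(frame):
    # At the first time step, stack the same frame 4 times -> 80x80x4 input
    x = preprocess(frame)
    return np.stack([x, x, x, x], axis=2)

def next_state(state, frame):
    # Drop the oldest frame and append the newest one
    x = preprocess(frame).reshape((80, 80, 1))
    return np.append(state[:, :, 1:], x, axis=2)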

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

The final output layer has the same dimensionality as the number of valid actions which can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network selects an action with an ϵ-greedy policy: with probability 1 - ϵ it performs the action with the highest Q value, otherwise a random action.
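
A hedged sketch of this architecture and the ϵ-greedy action selection, written in TensorFlow 1.x style (layer sizes follow the description above; padding, variable names, and helper functions are assumptions, not the repository's exact code):

import numpy as np
import tensorflow as tf

NUM_ACTIONS = 2  # flap or do nothing

def build_network(s):
    # s: input tensor of shape [None, 80, 80, 4] (stacked grayscale frames)
    h = tf.layers.conv2d(s, 32, 8, strides=4, padding='same', activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, 2, 2, padding='same')
    h = tf.layers.conv2d(h, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, 2, 2, padding='same')
    h = tf.layers.conv2d(h, 64, 3, strides=1, padding='same', activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, 2, 2, padding='same')
    h = tf.layers.flatten(h)
    h = tf.layers.dense(h, 256, activation=tf.nn.relu)
    return tf.layers.dense(h, NUM_ACTIONS)  # Q(s, a) for each valid action

def select_action(q_values, epsilon):
    # epsilon-greedy: random action with probability epsilon, otherwise greedy
    if np.random.rand() < epsilon:
        return np.random.randint(NUM_ACTIONS)
    return int(np.argmax(q_values))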

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ to 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, keeping itself at the top of the game screen and eventually bumping into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since it only starts to explore other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.

During training, at each time step the network samples minibatches of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely with ϵ fixed at 0.0001.
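
Putting these training details together, a minimal sketch of the replay memory, ϵ annealing, and the optimization step (hyperparameters follow the text above; names and the exact loss wiring are assumptions):

import random
from collections import deque

REPLAY_MEMORY = 50000     # maximum number of stored experiences
BATCH_SIZE = 32
OBSERVE = 10000           # steps with random actions before training starts
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
EXPLORE = 3000000         # frames over which epsilon is annealed

D = deque(maxlen=REPLAY_MEMORY)   # replay memory of (s, a, r, s', terminal)

def anneal_epsilon(epsilon, t):
    # Linearly decay epsilon from 0.1 to 0.0001 after the observation phase
    if t > OBSERVE and epsilon > FINAL_EPSILON:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    return epsilon

def sample_minibatch():
    # Draw 32 random transitions from the replay memory
    return random.sample(D, BATCH_SIZE)

# Squared-error loss on the Bellman target, optimized with Adam (lr = 1e-6):
# loss = tf.reduce_mean(tf.square(y - q_of_selected_action))
# train_step = tf.train.AdamOptimizer(1e-6).minimize(loss)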

FAQ

Checkpoint not found

Change first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529-533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is based heavily on the following repos:

  1. [sourabhv/FlapPyBird](https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames
Comments
  • Crashes on launch on Mac

    Hi,

    DeepLearningFlappyBird crashes on launch on Mac OS X El Capitan; here is the error log:

    tensorflow.python.framework.errors.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bird-dqn-30000

    opened by imikay 10
  • How are the 1 and -1 rewards used?

    I see from here that all the rewards are added into the deque. We need to sample the 1 and -1 rewards from the deque to use them, so do you think it may be slow?

    In other words: the cases with reward 1 and -1 are also all put into the deque, so isn't the probability of sampling a 1 or -1 reward very low, making the feedback very slow?

    Thank you @yenchenlin

    opened by guotong1988 5
  • Why is deep learning used in this game?

    As I found on the internet, this game can be built without the use of deep learning (https://github.com/chncyhn/flappybird-qlearning-bot), so can you help me understand what is more beneficial about using deep learning in this game rather than simply using Q-learning?

    opened by gaurav2695 4
  • import Tensorflow as tf, some error occurred

    Hey there, if anyone else is getting the same error, please uninstall the TensorFlow version on Windows using pip uninstall tensorflow and then re-install TensorFlow. You might use other versions of TensorFlow too, if it's not working in 1.5.0. You can also downgrade with pip install tensorflow==1.1. BTW, amazing stuff man, kudos!

    opened by kunaldhariwal 3
  • Readme's Chinese Version

    I am a student from China and I am familiar with the relevant background knowledge and terminology, so I am able to translate your great readme content into Chinese precisely and make this project more accessible to Chinese programmers and students who are interested in this field.

    opened by littleheap 2
  • are those calculations right??

    In the figure, I wonder whether your math is valid.

    I mean,

    1. input 80 x 80 x 4 -- conv. w/ 8 x 8 x 4 x 32, stride 4 --> output 19 x 19 x 32 (because (80 - 8) / 4 + 1) => your result was 20 x 20 x 32

    2. input 10 x 10 x 32 -- conv. w/ 4 x 4 x 32 x 64, stride 2 --> output 4 x 4 x 64 (because (10 - 4) / 2 + 1) => your result was 5 x 5 x 64

    3. and...
      input 3 x 3 x 64 -- conv. w/ 3 x 3 x 64 x 64 --> your result was 3 x 3 x 64 (is this possible?)

    Am I wrong? Since I am a newbie in this area, if I misunderstood, please teach me.

    opened by jbsgbmr 2
  • Update deep_q_network.py for Tensorflow 1.0

    "tf.mul, tf.sub and tf.neg are deprecated in favor of tf.multiply, tf.subtract and tf.negative." - see https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md for changes related to "Release 1.0.0".

    opened by jgyllinsky 2
  • Why use pooling?

    Could you enlighten me on the reason you use pooling in this architecture? As far as I know, pooling might result in a network that is insensitive to the location of an object in the image. Thanks in advance.

    opened by aryopg 2
  • libpng warning: iCCP: known incorrect sRGB profile

    I want to know what's wrong with this, thanks very much.

    libpng warning: iCCP: known incorrect sRGB profile (repeated many times)
    Traceback (most recent call last):
      File "deep_q_network.py", line 215, in <module>
        main()
      File "deep_q_network.py", line 212, in main
        playGame()
      File "deep_q_network.py", line 209, in playGame
        trainNetwork(s, readout, h_fc1, sess)
      File "deep_q_network.py", line 82, in trainNetwork
        readout_action = tf.reduce_sum(tf.multiply(readout, a), reduction_indices=1)
    AttributeError: 'module' object has no attribute 'multiply'

    opened by ExcaliburAir 1
  • parse_card: can't find card 0

    
    Traceback (most recent call last):
      File "deep_q_network.py", line 8, in <module>
        import wrapped_flappy_bird as game
      File "game/wrapped_flappy_bird.py", line 19, in <module>
        IMAGES, SOUNDS, HITMASKS = flappy_bird_utils.load()
      File "game/flappy_bird_utils.py", line 42, in load
        SOUNDS['die']    = pygame.mixer.Sound('assets/audio/die' + soundExt)
    MemoryError
    
    
    opened by lazywhite 1
  • Why use tf.multiply?

    https://github.com/yenchenlin/DeepLearningFlappyBird/blob/master/deep_q_network.py#L82-L83

    I cannot find the math that supports the multiply operation.

    opened by guotong1988 1
  • Update to Tensorflow 2.0

    I updated this to TensorFlow 2.5 with the automatic upgrade tool from Google. I also removed the models that come with it, increased its FPS, and removed the rendering.

    opened by Anonymous-Ol 2
  • AttributeError

    After py deep_q_network.py:

    AttributeError: module 'tensorflow' has no attribute 'InteractiveSession'

    Then pip install --upgrade tensorflow==0.7

    It returns:

    ERROR: Could not find a version that satisfies the requirement tensorflow==0.7 (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
    
    ERROR: No matching distribution found for tensorflow==0.7
    

    How can I run this?

    opened by Setembru 2
Owner
Yen-Chen Lin
PhD student at MIT CSAIL