An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

Overview

AlphaZero-Gomoku

This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) from pure self-play training. Gomoku is much simpler than Go or chess, so we can focus on the training scheme of AlphaZero and obtain a reasonably good AI model on a single PC within a few hours.

References:

  1. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
  2. AlphaGo Zero: Mastering the game of Go without human knowledge

Update 2018.2.24: supports training with TensorFlow!

Update 2018.1.17: supports training with PyTorch!

Example Games Between Trained Models

  • Each move with 400 MCTS playouts (animation: playout400)

Requirements

To play with the trained AI models, you only need:

  • Python >= 2.7
  • Numpy >= 1.11

To train the AI model from scratch, you additionally need one of the following:

  • Theano >= 0.7 and Lasagne >= 0.1
    or
  • PyTorch >= 0.2.0
    or
  • TensorFlow

Note: if your Theano version is greater than 0.7, please follow this issue to install Lasagne;
otherwise, force pip to downgrade Theano to 0.7: pip install --upgrade theano==0.7.0

If you would like to train the model using other DL frameworks, you only need to rewrite policy_value_net.py.
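If you want a starting point, here is a rough, hypothetical skeleton of the interface such a policy_value_net.py is expected to provide. The method names and signatures below are assumptions inferred from how train.py and the MCTS player use the network; check the existing policy_value_net*.py files for the exact ones.

    import numpy as np

    class PolicyValueNet:
        """Framework-agnostic skeleton of the policy-value network wrapper."""

        def __init__(self, board_width, board_height, model_file=None):
            self.board_width = board_width
            self.board_height = board_height
            # build the actual network with your DL framework of choice here

        def policy_value_fn(self, board):
            """Return (action, probability) pairs for the legal moves of `board`
            and a scalar value estimate for the current player."""
            legal_moves = list(board.availables)  # assumes a Board-like object
            # placeholder: uniform probabilities and a neutral value
            probs = np.ones(len(legal_moves)) / max(len(legal_moves), 1)
            return zip(legal_moves, probs), 0.0

        def train_step(self, state_batch, mcts_probs, winner_batch, lr):
            """Run one gradient step on (state, MCTS probs, game outcome) data
            and return the loss and the policy entropy."""
            raise NotImplementedError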

Getting Started

To play with the provided models, run the following script from the project root directory:

python human_play.py  

You may modify human_play.py to try the different provided models or the pure MCTS player; a rough example of the kind of change is sketched below.
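Trying another model or the pure MCTS player usually comes down to changing a few lines like the following. This is only a sketch: the module, class, and argument names are assumptions, so verify them against human_play.py and the mcts_* modules.

    from mcts_alphaZero import MCTSPlayer
    from mcts_pure import MCTSPlayer as MCTS_Pure
    from policy_value_net import PolicyValueNet

    model_file = 'best_policy_8_8_5.model'   # swap in another provided model here
    best_policy = PolicyValueNet(8, 8, model_file=model_file)
    mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)

    # or use the pure MCTS player instead (much weaker; needs more playouts)
    # mcts_player = MCTS_Pure(c_puct=5, n_playout=1000)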

To train the AI model from scratch with Theano and Lasagne, simply run:

python train.py

With PyTorch or TensorFlow, first modify train.py: comment out the line

from policy_value_net import PolicyValueNet  # Theano and Lasagne

and uncomment the line

# from policy_value_net_pytorch import PolicyValueNet  # Pytorch
or
# from policy_value_net_tensorflow import PolicyValueNet # Tensorflow

and then execute python train.py. (To use the GPU with PyTorch, set use_gpu=True; if your PyTorch version is greater than 0.5, also change the return statement of the train_step function in policy_value_net_pytorch.py to return loss.item(), entropy.item(), as sketched below.)
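The PyTorch >= 0.5 change mentioned above boils down to reading the scalar loss and entropy with .item() instead of indexing a 0-dim tensor. A minimal standalone illustration (the stand-in tensors below are hypothetical, not the repo's actual variables):

    import torch

    loss = torch.tensor(2.5)      # stand-in for the computed loss
    entropy = torch.tensor(1.3)   # stand-in for the computed policy entropy

    # PyTorch < 0.5 code typically indexed 0-dim tensors (e.g. loss.data[0]);
    # on newer versions, use .item() to extract the Python float instead
    print(loss.item(), entropy.item())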

The models (best_policy.model and current_policy.model) will be saved every few updates (every 50 by default).

Note: the 4 provided models were trained with Theano/Lasagne; to use them with PyTorch, please refer to issue 5.

Tips for training:

  1. It is good to start with a 6 * 6 board and 4 in a row. In this case, we can obtain a reasonably good model within 500~1000 self-play games in about 2 hours.
  2. For an 8 * 8 board and 5 in a row, it may take 2000~3000 self-play games to get a good model, which can take about 2 days on a single PC. (A sketch of the relevant board settings follows this list.)
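The board settings referenced in these tips live near the top of train.py, in the training pipeline's constructor. The dictionary below is only a sketch of the values involved; the attribute names are assumptions, so check train.py for the real ones.

    # illustrative quick-start configuration (names assumed; verify in train.py)
    config = {
        "board_width": 6,     # use 8 for the 8 * 8 experiment
        "board_height": 6,
        "n_in_row": 4,        # use 5 for five-in-a-row
        "n_playout": 400,     # MCTS playouts per self-play move
        "check_freq": 50,     # evaluate and save the models every 50 updates
    }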

Further reading

My article (in Chinese) describing some implementation details: https://zhuanlan.zhihu.com/p/32089487

Comments
  •  No module named 'numpy.core.multiarray\r'

    Traceback (most recent call last):
      File "human_play.py", line 75, in <module>
        run()
      File "human_play.py", line 59, in run
        policy_param = pickle.load(open('best_policy_8_8_5.model', 'rb'))
    ImportError: No module named 'numpy.core.multiarray\r'

    opened by initial-h 6
  • TF NCHW->NHWC

    conv2d in TensorFlow now only supports data_format='channels_last', so just change "channels_first" to "channels_last" and add tf.transpose(self.input_states, [0, 2, 3, 1]) for the input placeholder.

    opened by Observerspy 5
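A minimal sketch of the change described in the comment above, assuming a TF 1.x-style graph with an NCHW input placeholder named input_states (4 feature planes on an 8 * 8 board):

    import tensorflow as tf

    # NCHW placeholder as in the original network definition
    input_states = tf.placeholder(tf.float32, shape=[None, 4, 8, 8])

    # transpose NCHW -> NHWC so conv2d can run with channels_last
    input_nhwc = tf.transpose(input_states, [0, 2, 3, 1])

    conv1 = tf.layers.conv2d(inputs=input_nhwc, filters=32, kernel_size=[3, 3],
                             padding="same", data_format="channels_last",
                             activation=tf.nn.relu)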
  • state_dict in pytorch isn't compatible with params with theano

    state_dict in PyTorch is a dict, while the params trained with Theano were dumped as a list. When you want to retrain a model trained with Theano, it seems the model can't be loaded properly. Is there any way to solve this?

    opened by GeneZC 5
  • How to change the current Net into a ResNet

    I have already played 4500 self-play games on a 15x15 board, and the AI still plays very, very poorly (it takes the four corners in turn and does not even block my stones). I have tried changing the noise, the temperature, and the learning rate, with no effect. As for augmenting the self-play data by flipping the board, I don't think that is feasible. I would like to ask how to change the current net into a ResNet; I looked up some references but am completely lost. Also, if an AI trained on a smaller board such as 7x7 or 8x8 were applied to a larger one such as 15x15, would training be much faster? (Although at the moment it seems mismatched board sizes cannot be used that way.)

    opened by liunian321 4
  • Keras support

    Dear Junxiao: Thank you very much for sharing your code on GitHub. I have learned a lot from your work.

    I am a fan of deep reinforcement learning and I'm still learning it, and your work helped me understand the mechanism of AlphaZero. Meanwhile, I am studying Keras, so I rewrote policy_value_net.py with Keras. I have tested my code and it passed the test under Keras 2.0.5 with tensorflow-gpu 1.2.1 as the backend.

    I really hope I can contribute to this project, and I sincerely hope you can accept this pull request. I'm looking forward to your reply.

    Yours, Mingxu Zhang

    opened by MingxuZhang 4
  • A beginner's guess, please advise

    The difference between AlphaGo Zero and AlphaZero is that the eval step is removed; self-play and optimization stay the same. In AlphaGo Zero everything proceeds step by step: on a single PC we first run self-play to generate data, then optimize (train) on that data, then evaluate. We cannot run these three steps at the same time because they don't connect seamlessly. Even though AlphaZero drops the eval step, it still has to go one step at a time. What should be done so that the self-play data automatically turns into the optimized model? My idea is that once the self-play data is finished it directly becomes the model, and that model immediately does self-play again; wouldn't that be faster? That way we could sleep soundly and wake up the next day to a better model. For example: https://github.com/chncyhn/flappybird-qlearning-bot

    opened by 1715509415 3
  • About self.data_buffer.extend(play_data)

    If play_data = [1, 2] and data_buffer is an empty deque, the desired result of self.data_buffer.extend(play_data) is data_buffer = [1, 2] (where 1 and 2 are training samples [s, probability matrix, z]), but wouldn't the actual result be data_buffer = [[1, 2]]? Shouldn't each item in play_data be appended to data_buffer individually?

    opened by huyp182 2
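For reference, a standalone check of the behavior asked about above (not part of the original thread): deque.extend adds each element of the iterable individually, so the buffer ends up as [1, 2] rather than [[1, 2]].

    from collections import deque

    buf = deque(maxlen=10000)
    play_data = [("s1", "probs1", 1), ("s2", "probs2", -1)]  # two training samples

    buf.extend(play_data)    # each sample is stored as its own element
    print(len(buf))          # -> 2

    buf.append(play_data)    # append would add the whole list as one element
    print(len(buf))          # -> 3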
  • Why -leaf_value at mcts_alphaZero.py line 66?

        def update_recursive(self, leaf_value):
            """Like a call to update(), but applied recursively for all ancestors.
            """
            # If it is not root, this node's parent should be updated first.
            if self._parent:
                self._parent.update_recursive(-leaf_value)
            self.update(leaf_value)
    
    opened by wmpeng 2
  • Support tensorflow

    Adds TensorFlow support; the changes specific to TensorFlow are already commented out.

    Though I am still a noob, I really like this repo; it is a super clear and easy learning material :)

    opened by Kelvin-Zhong 2
  • Support gomocup protocol

    The source code for the protocol can be found here: https://github.com/stranskyjan/pbrain-pyrandom/blob/master/pisqpipe.py

    Then it can be used with the Piskvork gomoku manager to compare with other engines like http://www.aiexp.info/pages/yixin.html (which is presently the top gomoku engine)

    opened by tianshuo 2
  • After reading the code, I think this implementation has a problem; corrections welcome

    I read through the author's code and think there is a problem in one place. As I understand it, the author simply assigns the same z value to every position of a game's replay: a single line of play is followed to the end, and if White wins that game, every state in that game is labeled with a White win.

    However, none of the AlphaGo papers does it this way, including the AlphaGo Lee paper. From the very beginning they aggregate statistics over a single search (possibly thousands or tens of thousands of finished games) to obtain a value for the current position or a supervision signal for the action. I think this may be why the AI trained by this project is not very strong.

    For how the training data and labels should actually be generated, see this article: https://medium.com/applied-data-science/how-to-build-your-own-deepmind-muzero-in-python-part-2-3-f99dad7a7ad

    opened by ylf11235 1
  • About PyTorch training failing to converge

    Hello, and thank you very much for providing this excellent AlphaGo Zero implementation, and for taking time out of your busy schedule to look at my question: when I train with PyTorch, the loss never goes down and keeps hovering around 6-7. Have you encountered a similar problem? Is there any way to solve it? (I have also run into the explain_val=0 issue you mentioned in an earlier issue, but in most cases it does have a value.)

    According to your earlier answers, it seems you only fully trained the network with Theano. After using this network I got a loss curve similar to yours, so I suspect the difference between PyTorch and Theano is why the PyTorch training does not converge. I converted the Theano-trained weights into a PyTorch weight file and used it, but the results were still unsatisfactory, so could there already be a difference in the forward pass between the two? I am currently using pytorch==1.12.0 and have tried both CPU and GPU; I also tried pytorch==0.4.1, but the problem remains. Tuning the hyperparameters does not seem to solve it either.

    opened by hxb123622 6
  • Why are the input features reversed in 'play_data'?

    Why are the input features reversed in play_data?

    play_data 1:

    [[0,0,0], [0,1,0], [0,0,0]] # white
    [[0,0,0], [0,0,0], [0,0,0]] # black
    [[0,0,0], [0,1,0], [0,0,0]] # action
    [[0,0,0], [0,0,0], [0,0,0]] # player
    

    play_data 2:

    [[1,0,0], [0,0,0], [0,0,0]] # black
    [[0,0,0], [0,1,0], [0,0,0]] # white
    [[1,0,0], [0,0,0], [0,0,0]] # action
    [[1,1,1], [1,1,1], [1,1,1]] # player
    
    opened by Michi-123 0