An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

Overview

AlphaZero-Gomoku

This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) from pure self-play training. Gomoku is much simpler than Go or chess, so we can focus on the training scheme of AlphaZero and obtain a reasonably good AI model on a single PC within a few hours.

References:

  1. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
  2. AlphaGo Zero: Mastering the game of Go without human knowledge

Update 2018.2.24: supports training with TensorFlow!

Update 2018.1.17: supports training with PyTorch!

Example Games Between Trained Models

  • Each move with 400 MCTS playouts (animation: playout400)

Requirements

To play with the trained AI models, you only need:

  • Python >= 2.7
  • Numpy >= 1.11

To train the AI model from scratch, you additionally need one of the following:

  • Theano >= 0.7 and Lasagne >= 0.1
    or
  • PyTorch >= 0.2.0
    or
  • TensorFlow

Note: if your Theano version is greater than 0.7, please follow this issue to install Lasagne;
otherwise, force pip to downgrade Theano to 0.7: pip install --upgrade theano==0.7.0

If you would like to train the model using other DL frameworks, you only need to rewrite policy_value_net.py.
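If you want a starting point, here is a rough, hypothetical skeleton of the interface such a policy_value_net.py is expected to provide. The method names and signatures below are assumptions inferred from how train.py and the MCTS player use the network; check the existing policy_value_net*.py files for the exact ones.

    import numpy as np

    class PolicyValueNet:
        """Framework-agnostic skeleton of the policy-value network wrapper."""

        def __init__(self, board_width, board_height, model_file=None):
            self.board_width = board_width
            self.board_height = board_height
            # build the actual network with your DL framework of choice here

        def policy_value_fn(self, board):
            """Return (action, probability) pairs for the legal moves of `board`
            and a scalar value estimate for the current player."""
            legal_moves = list(board.availables)  # assumes a Board-like object
            # placeholder: uniform probabilities and a neutral value
            probs = np.ones(len(legal_moves)) / max(len(legal_moves), 1)
            return zip(legal_moves, probs), 0.0

        def train_step(self, state_batch, mcts_probs, winner_batch, lr):
            """Run one gradient step on (state, MCTS probs, game outcome) data
            and return the loss and the policy entropy."""
            raise NotImplementedError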

Getting Started

To play with the provided models, run the following script from the project root directory:

python human_play.py  

You may modify human_play.py to try the different provided models or the pure MCTS player; a rough example of the kind of change is sketched below.
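Trying another model or the pure MCTS player usually comes down to changing a few lines like the following. This is only a sketch: the module, class, and argument names are assumptions, so verify them against human_play.py and the mcts_* modules.

    from mcts_alphaZero import MCTSPlayer
    from mcts_pure import MCTSPlayer as MCTS_Pure
    from policy_value_net import PolicyValueNet

    model_file = 'best_policy_8_8_5.model'   # swap in another provided model here
    best_policy = PolicyValueNet(8, 8, model_file=model_file)
    mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)

    # or use the pure MCTS player instead (much weaker; needs more playouts)
    # mcts_player = MCTS_Pure(c_puct=5, n_playout=1000)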

To train the AI model from scratch with Theano and Lasagne, simply run:

python train.py

With PyTorch or TensorFlow, first modify train.py: comment out the line

from policy_value_net import PolicyValueNet  # Theano and Lasagne

and uncomment the line

# from policy_value_net_pytorch import PolicyValueNet  # Pytorch
or
# from policy_value_net_tensorflow import PolicyValueNet # Tensorflow

and then execute python train.py. (To use the GPU with PyTorch, set use_gpu=True; if your PyTorch version is greater than 0.5, also change the return statement of the train_step function in policy_value_net_pytorch.py to return loss.item(), entropy.item(), as sketched below.)
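The PyTorch >= 0.5 change mentioned above boils down to reading the scalar loss and entropy with .item() instead of indexing a 0-dim tensor. A minimal standalone illustration (the stand-in tensors below are hypothetical, not the repo's actual variables):

    import torch

    loss = torch.tensor(2.5)      # stand-in for the computed loss
    entropy = torch.tensor(1.3)   # stand-in for the computed policy entropy

    # PyTorch < 0.5 code typically indexed 0-dim tensors (e.g. loss.data[0]);
    # on newer versions, use .item() to extract the Python float instead
    print(loss.item(), entropy.item())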

The models (best_policy.model and current_policy.model) will be saved every few updates (every 50 by default).

Note: the 4 provided models were trained with Theano/Lasagne; to use them with PyTorch, please refer to issue 5.

Tips for training:

  1. It is good to start with a 6 * 6 board and 4 in a row. In this case, we can obtain a reasonably good model within 500~1000 self-play games in about 2 hours.
  2. For an 8 * 8 board and 5 in a row, it may take 2000~3000 self-play games to get a good model, which can take about 2 days on a single PC. (A sketch of the relevant board settings follows this list.)
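The board settings referenced in these tips live near the top of train.py, in the training pipeline's constructor. The dictionary below is only a sketch of the values involved; the attribute names are assumptions, so check train.py for the real ones.

    # illustrative quick-start configuration (names assumed; verify in train.py)
    config = {
        "board_width": 6,     # use 8 for the 8 * 8 experiment
        "board_height": 6,
        "n_in_row": 4,        # use 5 for five-in-a-row
        "n_playout": 400,     # MCTS playouts per self-play move
        "check_freq": 50,     # evaluate and save the models every 50 updates
    }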

Further reading

My article (in Chinese) describing some implementation details: https://zhuanlan.zhihu.com/p/32089487

Comments
  •  No module named 'numpy.core.multiarray\r'

    Traceback (most recent call last):
      File "human_play.py", line 75, in <module>
        run()
      File "human_play.py", line 59, in run
        policy_param = pickle.load(open('best_policy_8_8_5.model', 'rb'))
    ImportError: No module named 'numpy.core.multiarray\r'

    opened by initial-h 6
  • TF NCHW->NHWC

    conv2d in TensorFlow now only supports data_format='channels_last', so just change "channels_first" to "channels_last" and add tf.transpose(self.input_states, [0, 2, 3, 1]) for the input placeholder.

    opened by Observerspy 5
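A minimal sketch of the change described in the comment above, assuming a TF 1.x-style graph with an NCHW input placeholder named input_states (4 feature planes on an 8 * 8 board):

    import tensorflow as tf

    # NCHW placeholder as in the original network definition
    input_states = tf.placeholder(tf.float32, shape=[None, 4, 8, 8])

    # transpose NCHW -> NHWC so conv2d can run with channels_last
    input_nhwc = tf.transpose(input_states, [0, 2, 3, 1])

    conv1 = tf.layers.conv2d(inputs=input_nhwc, filters=32, kernel_size=[3, 3],
                             padding="same", data_format="channels_last",
                             activation=tf.nn.relu)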
  • state_dict in pytorch isn't compatible with params with theano

    state_dict in PyTorch is a dict, while the params trained with Theano were dumped as a list. When you want to retrain a model trained with Theano, it seems the model can't be loaded properly. Is there any way to solve this?

    opened by GeneZC 5
  • How to change the current Net into a ResNet

    I have already played 4500 self-play games on a 15x15 board, and the AI still plays very, very poorly (it takes the four corners in turn and does not even block my stones). I have tried changing the noise, the temperature, and the learning rate, with no effect. As for augmenting the self-play data by flipping the board, I don't think that is feasible. I would like to ask how to change the current net into a ResNet; I looked up some references but am completely lost. Also, if an AI trained on a smaller board such as 7x7 or 8x8 were applied to a larger one such as 15x15, would training be much faster? (Although at the moment it seems mismatched board sizes cannot be used that way.)

    opened by liunian321 4
  • Keras support

    Dear Junxiao: Thank you very much for sharing your code on GitHub. I have learned a lot from your work.

    I am a fan of deep reinforcement learning and I'm still learning it, and your work helped me understand the mechanism of AlphaZero. Meanwhile, I am studying Keras, so I rewrote policy_value_net.py with Keras. I have tested my code and it passed the test under Keras 2.0.5 with tensorflow-gpu 1.2.1 as the backend.

    I really hope I can contribute to this project, and I sincerely hope you can accept this pull request. I'm looking forward to your reply.

    Yours, Mingxu Zhang

    opened by MingxuZhang 4
  • A beginner's guess, please advise

    The difference between AlphaGo Zero and AlphaZero is that the eval step is removed; self-play and optimization stay the same. In AlphaGo Zero everything proceeds step by step: on a single PC we first run self-play to generate data, then optimize (train) on that data, then evaluate. We cannot run these three steps at the same time because they don't connect seamlessly. Even though AlphaZero drops the eval step, it still has to go one step at a time. What should be done so that the self-play data automatically turns into the optimized model? My idea is that once the self-play data is finished it directly becomes the model, and that model immediately does self-play again; wouldn't that be faster? That way we could sleep soundly and wake up the next day to a better model. For example: https://github.com/chncyhn/flappybird-qlearning-bot

    opened by 1715509415 3
  • About self.data_buffer.extend(play_data)

    If play_data = [1, 2] and data_buffer is an empty deque, the desired result of self.data_buffer.extend(play_data) is data_buffer = [1, 2] (where 1 and 2 are training samples [s, probability matrix, z]), but wouldn't the actual result be data_buffer = [[1, 2]]? Shouldn't each item in play_data be appended to data_buffer individually?

    opened by huyp182 2
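For reference, a standalone check of the behavior asked about above (not part of the original thread): deque.extend adds each element of the iterable individually, so the buffer ends up as [1, 2] rather than [[1, 2]].

    from collections import deque

    buf = deque(maxlen=10000)
    play_data = [("s1", "probs1", 1), ("s2", "probs2", -1)]  # two training samples

    buf.extend(play_data)    # each sample is stored as its own element
    print(len(buf))          # -> 2

    buf.append(play_data)    # append would add the whole list as one element
    print(len(buf))          # -> 3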
  • Why -leaf_value at mcts_alphaZero.py line 66?

        def update_recursive(self, leaf_value):
            """Like a call to update(), but applied recursively for all ancestors.
            """
            # If it is not root, this node's parent should be updated first.
            if self._parent:
                self._parent.update_recursive(-leaf_value)
            self.update(leaf_value)
    
    opened by wmpeng 2
  • Support tensorflow

    Adds TensorFlow support; the changes specific to TensorFlow are already commented out.

    Though I am still a noob, I really like this repo; it is a super clear and easy learning material :)

    opened by Kelvin-Zhong 2
  • Support gomocup protocol

    The source code for the protocol can be found here: https://github.com/stranskyjan/pbrain-pyrandom/blob/master/pisqpipe.py

    Then it can be used with the Piskvork gomoku manager to compare with other engines like http://www.aiexp.info/pages/yixin.html (which is presently the top gomoku engine)

    opened by tianshuo 2
  • After reading the code, I think this implementation has a problem; corrections welcome

    I read through the author's code and think there is a problem in one place. As I understand it, the author simply assigns the same z value to every position of a game's replay: a single line of play is followed to the end, and if White wins that game, every state in that game is labeled with a White win.

    However, none of the AlphaGo papers does it this way, including the AlphaGo Lee paper. From the very beginning they aggregate statistics over a single search (possibly thousands or tens of thousands of finished games) to obtain a value for the current position or a supervision signal for the action. I think this may be why the AI trained by this project is not very strong.

    For how the training data and labels should actually be generated, see this article: https://medium.com/applied-data-science/how-to-build-your-own-deepmind-muzero-in-python-part-2-3-f99dad7a7ad

    opened by ylf11235 1
  • About PyTorch training failing to converge

    Hello, and thank you very much for providing this excellent AlphaGo Zero implementation, and for taking time out of your busy schedule to look at my question: when I train with PyTorch, the loss never goes down and keeps hovering around 6-7. Have you encountered a similar problem? Is there any way to solve it? (I have also run into the explain_val=0 issue you mentioned in an earlier issue, but in most cases it does have a value.)

    According to your earlier answers, it seems you only fully trained the network with Theano. After using this network I got a loss curve similar to yours, so I suspect the difference between PyTorch and Theano is why the PyTorch training does not converge. I converted the Theano-trained weights into a PyTorch weight file and used it, but the results were still unsatisfactory, so could there already be a difference in the forward pass between the two? I am currently using pytorch==1.12.0 and have tried both CPU and GPU; I also tried pytorch==0.4.1, but the problem remains. Tuning the hyperparameters does not seem to solve it either.

    opened by hxb123622 6
  • Why are the input features reversed in 'play_data'?

    Why are the input features reversed in play_data?

    play_data 1:

    [[0,0,0], [0,1,0], [0,0,0]] # white
    [[0,0,0], [0,0,0], [0,0,0]] # black
    [[0,0,0], [0,1,0], [0,0,0]] # action
    [[0,0,0], [0,0,0], [0,0,0]] # player
    

    play_data 2:

    [[1,0,0], [0,0,0], [0,0,0]] # black
    [[0,0,0], [0,1,0], [0,0,0]] # white
    [[1,0,0], [0,0,0], [0,0,0]] # action
    [[1,1,1], [1,1,1], [1,1,1]] # player
    
    opened by Michi-123 0