# Off-Belief Learning
## Introduction
This repo contains the implementation of the algorithm proposed in Off-Belief Learning, ICML 2021.
## Environment Setup
We have been using pytorch-1.5.1, cuda-10.1, and cudnn-v7.6.5 in our development environment. Other settings may also work, but we have not tested them extensively under different configurations. We also use conda/miniconda to manage environments.

There are known issues when using this repo with newer versions of pytorch, such as a reported illegal move issue.
```bash
conda create -n hanabi python=3.7
conda activate hanabi

# install pytorch 1.5.1
# note that newer versions may cause compilation issues
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install other dependencies
pip install psutil

# install a newer cmake if the current version is < 3.15
conda install -c conda-forge cmake
```
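Before compiling, it can help to confirm that the intended PyTorch build is active. This quick check is our suggestion, not part of the original instructions:

```python
# Sanity-check the installed PyTorch build before compiling the C++ code.
import torch

print(torch.__version__)          # expect 1.5.1
print(torch.version.cuda)         # expect 10.1
print(torch.cuda.is_available())  # expect True on a working CUDA setup
```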
To help cmake find the proper libraries (e.g. libtorch), please either add the following lines to your `.bashrc`, or put them in a separate file and `source` it before you start working on the project.
```bash
# activate the conda environment
conda activate hanabi

# set path
CONDA_PREFIX=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CPATH=${CONDA_PREFIX}/include:${CPATH}
export LIBRARY_PATH=${CONDA_PREFIX}/lib:${LIBRARY_PATH}
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}

# avoid tensor operations using all cpu cores
export OMP_NUM_THREADS=1
```
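If cmake still cannot locate libtorch, a common fallback (our suggestion, not a step from the original instructions) is to pass PyTorch's bundled cmake prefix explicitly:

```python
# Print the directory containing Torch's cmake config files; it can be
# passed to cmake as -DCMAKE_PREFIX_PATH=... if the environment
# variables above are not picked up.
import torch.utils

print(torch.utils.cmake_prefix_path)
```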
Finally, to compile this repo:
```bash
# under project root
mkdir build
cd build
cmake ..
make -j10
```
## Code Structure
For an overview of the training infrastructure, please refer to Figure 5 of the Off-Belief Learning paper.
`hanabi-learning-environment` is a modified version of the original HLE from DeepMind. Notable modifications include:

- The card knowledge part of the observation encoding is changed to v0-belief, i.e. card knowledge normalized by the remaining public card count (see the sketch after this list).
- Functions to reset the game state with sampled hands.
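To make the v0-belief normalization concrete, here is a minimal numpy sketch. The function name and array layout are hypothetical and do not mirror the repo's encoder:

```python
import numpy as np

def v0_belief(knowledge_mask, public_count):
    """Hypothetical helper: v0-belief over card types for each card in hand.

    knowledge_mask: (hand_size, num_card_types) 0/1 array derived from hints
        (1 means the card type is still consistent with the hints received).
    public_count: (num_card_types,) count of copies of each card type that
        remain unseen from the public perspective.
    """
    weighted = knowledge_mask * public_count        # rule out impossible types
    totals = weighted.sum(axis=1, keepdims=True)
    return weighted / np.maximum(totals, 1e-8)      # normalize to a distribution
```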
`rela` (REinforcement Learning Assembly) is a set of tools for efficient batched neural network inference, written in C++ with multi-threading.
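The core pattern is roughly the following; this is a toy Python sketch of the idea only, the actual implementation is multi-threaded C++ and the class below is hypothetical:

```python
# Toy sketch of batched inference: many actors enqueue observations, and a
# runner stacks them into one batch, runs a single forward pass, and
# scatters the results back to each caller.
import torch

class BatchRunner:
    def __init__(self, model):
        self.model = model
        self.pending = []  # list of (observation, result slot)

    def enqueue(self, obs):
        # each caller gets a slot that is filled after the next flush
        slot = {}
        self.pending.append((obs, slot))
        return slot

    def flush(self):
        if not self.pending:
            return
        batch = torch.stack([obs for obs, _ in self.pending])
        with torch.no_grad():
            out = self.model(batch)  # one forward pass serves every caller
        for i, (_, slot) in enumerate(self.pending):
            slot["result"] = out[i]
        self.pending.clear()
```

In the real `rela`, this batching is driven by dedicated threads and futures rather than an explicit flush call.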
`rlcc` implements the core of various algorithms. For example, the logic of fictitious transitions is implemented in `r2d2_actor.cc`. It also contains implementations of baselines such as other-play, VDN, and IQL.
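In pseudocode, a fictitious transition replaces the true environment state with one sampled from a belief over hands, replays the same action, and bootstraps from a fixed policy. The following is a hedged Python sketch of the idea, not the repo's API; every name is illustrative:

```python
# Hypothetical sketch of a fictitious transition (the real logic is C++ in
# rlcc/r2d2_actor.cc).
def fictitious_target(belief, fixed_policy, env, public_history, action, gamma=0.99):
    # resample the private hands from a belief conditioned on public info
    sampled_state = belief.sample(public_history)
    # reset the game to that sampled state (the modified HLE supports
    # resetting with sampled hands) and replay the same action
    env.reset_with_sampled_hands(sampled_state)
    reward, next_state, done = env.step(action)
    if done:
        return reward
    # bootstrap with the value of the fixed lower-level policy,
    # not the policy currently being trained
    return reward + gamma * fixed_policy.value(next_state)
```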
`pyhanabi` is the main entry point of the repo. It contains implementations of the Q-network, recurrent DQN training, the belief network and its training, as well as tools to analyze trained models.
## Run the Code
Please refer to the README in `pyhanabi` for detailed instructions on how to train a model.
## Download Models
To download the trained models used in the paper, go to the `models` folder and run:

```bash
sh download.sh
```
Due to an agreement with BoardGameArena and Facebook policies, we are unable to release the "Clone Bot" models trained on the game data, nor the datasets themselves.
## Copyright
Copyright (c) Facebook, Inc. and its affiliates. All rights reserved.
This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.