Deep Reinforcement Learning based Trading Agent for Bitcoin

Overview

Deep Trading Agent

Deep Reinforcement Learning based Trading Agent for Bitcoin using DeepSense Network for Q function approximation.

(model diagram)
For complete details of the dataset, preprocessing, network architecture and implementation, refer to the Wiki of this repository.

Requirements

  • Python 2.7
  • TensorFlow (version 1.1.0 is used; see Implementation)
  • Pandas (for preprocessing the Bitcoin price series)
  • tqdm (for displaying training progress)

To set up an Ubuntu virtual machine with all the dependencies needed to run the code, refer to assets/vm.

Run with Docker

Pull the prebuilt Docker image directly from Docker Hub and run it:

docker pull samre12/deep-trading-agent:latest
docker run -p 6006:6006 -it samre12/deep-trading-agent:latest

OR

Build the Docker image locally by executing the command below and then run the image:

docker build -t deep-trading-agent .
docker run -p 6006:6006 -it deep-trading-agent

This will set up the repository for training the agent and:

  • mount the current directory into /deep-trading-agent in the container

  • during the image build, pull the latest transaction history from the exchange and sample it to create a per-minute dataset of Bitcoin prices, placed at /deep-trading-agent/data/btc.csv

  • to initiate training of the agent, specify suitable parameters in a config file (an example config file is provided at /deep-trading-agent/code/config/config.cfg) and run the code using /deep-trading-agent/code/main.py (a minimal config-loading sketch follows this list)

  • training supports logging and monitoring through TensorBoard

  • vim and screen are installed in the container to edit the configuration files and run TensorBoard

  • port 6006 of the container is bound to port 6006 of the host machine (via -p 6006:6006) so that training can be monitored with TensorBoard
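
Since several of the issues reported further below involve the config file, here is a minimal, illustrative sketch of how an INI-style config like the example config.cfg can be loaded with Python's ConfigParser. The section and key names follow the example config quoted in the comments; this is not the repository's exact loading code.

import sys
from ConfigParser import ConfigParser  # Python 2.7; the module is 'configparser' on Python 3

def load_config(path):
    # ConfigParser with allow_no_value mirrors the style of the get_config_parser
    # snippet quoted in the comments below.
    parser = ConfigParser(allow_no_value=True)
    if not parser.read(path):
        # read() returns an empty list when the file cannot be opened; a wrong
        # path is a common cause of the "No section: 'global'" error reported below.
        sys.exit('Could not read config file: {}'.format(path))
    return parser

if __name__ == '__main__':
    config = load_config('/deep-trading-agent/code/config/config.cfg')
    print config.get('global', 'PARENT_DIR')
    print config.get('dataset', 'BATCH_SIZE')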

Support

Please give a ⭐ to this repository to support the project 😄.

ToDo

Docker Support

  • Add Docker support for a fast and easy start with the project

Improve Model performance

  • Extract the highest and lowest prices and the volume of Bitcoin traded within a given time interval in the Preprocessor
  • Use the closing, highest and lowest prices and the traded volume as input channels to the model (removing the features calculated from closing prices alone)
  • Normalize the price tensors using the price of the previous time step
  • For the complete state representation, also input the number of remaining trades to the model
  • Use separate diff price blocks to calculate the unrealized PnL
  • Use an exponentially decayed weighted unrealized PnL as the reward function to incorporate the current state of the investment and stabilize the learning of the agent (a minimal sketch follows this list)
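
The last item is easiest to see in code. The following is only an illustrative sketch under assumed conventions (a single unit of exposure, position encoded as +1/-1/0, decay as the exponential weight); it is not the repository's reward implementation:

import numpy as np

def unrealized_pnl(entry_price, current_price, position):
    # position: +1 for long, -1 for short, 0 for neutral (assumed convention)
    return position * (current_price - entry_price) / entry_price

def decayed_pnl_reward(prices_since_entry, entry_price, position, decay=0.9):
    # Exponentially decayed weighted unrealized PnL: the most recent prices get
    # the largest weights, so the reward reflects the current state of the
    # investment while smoothing out single-step noise.
    pnls = np.array([unrealized_pnl(entry_price, p, position) for p in prices_since_entry])
    weights = decay ** np.arange(len(pnls))[::-1]  # newest price gets weight 1.0
    return float(np.sum(weights * pnls) / np.sum(weights))

# Example: the agent went long at 100 and the price has drifted up to 103.
print decayed_pnl_reward([100.5, 101.0, 102.0, 103.0], entry_price=100.0, position=+1)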

Trading Model

The trading model is inspired by Deep Q-Trading, which solves a simplified trading problem for a single asset.
For each trading unit, only one of three actions is allowed: neutral (1), long (2) or short (3), and a reward is obtained depending on the current position of the agent. A Deep Q-Learning agent is trained to maximize the total accumulated reward.
The current Deep Q-Trading model is modified by using the DeepSense architecture for Q function approximation.
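
To make the action and reward convention concrete, here is a minimal sketch of the per-unit step logic under the assumptions above (one asset, the three actions listed, reward driven by the position and the next price change); the function name and exact formula are illustrative and not taken from the repository:

NEUTRAL, LONG, SHORT = 1, 2, 3

def step_reward(action, price_t, price_t_plus_1):
    # Relative price change over one trading unit (one minute in this project).
    ret = (price_t_plus_1 - price_t) / price_t
    if action == LONG:
        return ret        # profit when the price rises
    elif action == SHORT:
        return -ret       # profit when the price falls
    return 0.0            # neutral: no exposure, no reward

# The Deep Q-Learning agent is trained to maximize the (discounted) sum of such rewards.
print step_reward(LONG, 100.0, 101.0)   # 0.01
print step_reward(SHORT, 100.0, 101.0)  # -0.01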

Dataset

The per-minute Bitcoin price series is obtained by modifying the procedure described in this repository. Transactions on the Coinbase exchange are sampled to generate the Bitcoin price series.
Refer to assets/dataset to download the dataset.
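
The sampling idea can be sketched with Pandas as follows (the column names here are assumptions for illustration; the project's own preprocessing lives in code/preprocess.py):

import pandas as pd

# Illustrative sketch: turn a raw transaction history (timestamp, price, volume)
# into a per-minute Bitcoin price series.
transactions = pd.read_csv('coinbaseUSD.csv', names=['Timestamp', 'price', 'volume'])
transactions['Timestamp'] = pd.to_datetime(transactions['Timestamp'], unit='s')
transactions = transactions.set_index('Timestamp')

per_minute = transactions['price'].resample('1T').last()  # last trade in each minute
per_minute.to_csv('btc.csv', header=True)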

Preprocessing

Basic Preprocessing
Missing values are completely ignored and removed from the dataset, and blocks of continuous values are accumulated using the timestamps of the prices.
Accumulated blocks with fewer timestamps than the combined history length of the state and the horizon of the agent are then filtered out, since they cannot be used for training the agent.
In the current implementation, the past 3 hours (180 minutes) of per-minute Bitcoin prices are used to generate the representation of the agent's current state.
With the dataset available at the time of writing, preprocessing generates the following logs (see the block-splitting sketch after the logs):

INFO:root:Number of blocks of continuous prices found are 58863
INFO:root:Number of usable blocks obtained from the dataset are 887
INFO:root:Number of distinct episodes for the current configuration are 558471
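
The block-splitting step can be pictured as the following simplified sketch (column names, the 60-second spacing check and the helper name are assumptions; the 180-minute default corresponds to the 3-hour history mentioned above, while the real threshold also includes the agent's horizon):

import pandas as pd

def usable_blocks(prices, min_length=180):
    # prices: DataFrame with per-minute 'Timestamp' (in seconds) and 'price'
    # columns (assumed names). Rows with missing prices are dropped, then
    # consecutive timestamps (60 s apart) are grouped into continuous blocks.
    prices = prices.dropna(subset=['price']).sort_values('Timestamp')
    new_block = prices['Timestamp'].diff() != 60   # True wherever a minute is missing
    block_id = new_block.cumsum()
    blocks = [block for _, block in prices.groupby(block_id)]
    # Keep only blocks long enough to cover the agent's history plus horizon.
    return [block for block in blocks if len(block) >= min_length]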

Advanced Preprocessing
Process missing values and concatenate smaller blocks to increase the sizes of the continuous price blocks.
A standard technique in the literature for filling missing values without significantly affecting model performance is exponential filling with no decay.
(To be implemented)

Implementation

TensorFlow version 1.1.0 is used for the implementation of the DeepSense network.

Deep Sense

The implementation is adapted from this GitHub repository, with a few simplifications to the network architecture to support learning over a single time series of Bitcoin data.

Deep Q Trading

The implementation and preprocessing are inspired by this Medium post. The actual implementation of the Deep Q Network is adapted from DQN-tensorflow.

Comments
  • ERROR: No section: 'global'

    Hello, I tried running the code with the config file shown below and I am still getting that error. But the 'global' section is at the beginning of the file.

    >sudo cat config/config.cfg
    [global]
    PARENT_DIR = /home/mmnkl9/deep-trading-agent/code/
    
    [logging]
    LOG_FILE = logs/run.log
    SAVE_DIR = logs/saved_models/
    TENSORBOARD_LOG_DIR = logs/tensorboard/
    
    [preprocessing]
    DATASET_PATH = data/btc.csv
    
    [dataset]
    BATCH_SIZE = 32
    HISTORY_LENGTH = 180
    
    opened by cTatu 5
  • Price gap between historical dataset and live API requests

    For Kaggle's btceUSD dataset we get these last 5 lines, in which the BTC(USD) weighted price is 2130–2139:

    1496188560,2130,2130,2129.999,2129.999,2.35330472,5012.53827,2129.999667
    1496188620,2130,2134.8,2130,2134.8,12.03238662,25648.897767,2131.6550554
    1496188680,2133.2,2134.8,2133.2,2134.8,1.40799439,3005.4756141,2134.5792536
    1496188740,2134.8,2143.79,2134.8,2134.801,3.19062001,6824.2637131,2138.8519133
    1496188800,2139.999,2140,2134.35,2139.999,6.73707917,14413.324651,2139.4025938
    

    When I make a request to the CryptoCompare website for BTC(USD) I get this, where the price suddenly jumps to 7656.98 USD at the same timestamp:

    {"RAW":{"BTC":{"USD":{"TYPE":"5","MARKET":"CCCAGG","FROMSYMBOL":"BTC","TOSYMBOL":"USD","FLAGS":"1","PRICE":7656.98,"LASTUPDATE":1511013600,"LASTVOLUME":0.26175057,"LASTVOLUMETO":2002.3918605000001,"LASTTRADEID":"26754557","VOLUMEDAY":48775.74066467763,"VOLUMEDAYTO":372532531.85840243,"VOLUME24HOUR":89903.71924385922,"VOLUME24HOURTO":693640577.3317167,"OPENDAY":7699.95,"HIGHDAY":7798.12,"LOWDAY":7458.9,"OPEN24HOUR":7829.71,"HIGH24HOUR":7982.5,"LOW24HOUR":7446.1,"LASTMARKET":"Bitstamp","CHANGE24HOUR":-172.73000000000047,"CHANGEPCT24HOUR":-2.2060842611029075,"CHANGEDAY":-42.970000000000255,"CHANGEPCTDAY":-0.5580555717894304,"SUPPLY":16686375,"MKTCAP":127767239647.5,"TOTALVOLUME24H":480649.677315126,"TOTALVOLUME24HTO":3685574563.3642445}}}

    What am I missing here sir?

    question 
    opened by mvrozanti 4
  • TypeError: unhashable type: 'numpy.ndarray'

    Hello, I am running main.py and got the following error:

    Traceback (most recent call last):
      File "main.py", line 36, in <module>
        main(vars(args)['file_path'])
      File "main.py", line 27, in main
        agent.train()
      File "deep-trading-agent/code/model/agent.py", line 70, in train
        screen, reward, terminal = self.env.act(action)
      File "deep-trading-agent/code/model/environment.py", line 62, in act
        if self.action_dict[action] is LONG:
    TypeError: unhashable type: 'numpy.ndarray'
    

    Could you tell me what goes wrong? Thanks

    bug 
    opened by trungtv 3
  • Running the docker container throws `killed` in preprocess.py

    Removing intermediate container 3d5347bf9610
     ---> a84360b64f80
    Step 11/16 : RUN gunzip /deep-trading-agent/data/coinbaseUSD.csv.gz
     ---> Running in 8b963696a5b5
    Removing intermediate container 8b963696a5b5
     ---> 78e1cb2c844f
    Step 12/16 : RUN python /deep-trading-agent/code/preprocess.py --transactions /deep-trading-agent/data/coinbaseUSD.csv --dataset /deep-trading-agent/data/btc.csv
     ---> Running in a2fc04a918b0
    Killed
    The command '/bin/sh -c python /deep-trading-agent/code/preprocess.py --transactions /deep-trading-agent/data/coinbaseUSD.csv --dataset /deep-trading-agent/data/btc.csv' returned a non-zero code: 137
    
    bug 
    opened by archienorman11 2
  • enhancement - risk reward / position sizing

    It seems that without a sense of risk/reward or position sizing, the trained model is a glorified heads-or-tails flip calculator based on previous data.

    It may be beyond the scope of the project, but to build an autonomous agent that can take a variety of actions, this needs to be incorporated somehow to win the trading game. Perhaps there should be two trained models.

    https://github.com/m-esgana/poor-trader-py/blob/097fbb10c217abe0db5a2cd55d73e0ad2990acd9/poor_trader/trading.py

    opened by johndpope 2
  • ModuleNotFoundError: No module named 'ConfigParser'

    Hi! I tried to run your agent but found an error - "ModuleNotFoundError: No module named 'ConfigParser'" - it seems that this file was not committed... ah, okay, it's Python 2.7.

    opened by philipshurpik 2
  • Add trading activity as an input

    I am a beginner in ML and so have little understanding of what this model is capable of. I only want to make a suggestion and, hopefully, get feedback.

    If we analyze only raw exchange data, such as orders in the order book and closed trades, we may have:

    • current buy and sell prices
    • buy and sell volumes in some period of time
    • the number of buy and sell orders closed (they are not always correlated with volume)
    • the buy and sell order book (for example, the volume needed to move the price by a particular percentage, or a measure of activity in the order book)

    As post-processing of these values, we can divide them by an SMA or measure the speed at which they are changing.

    opened by nanvel 1
  • AttributeError: 'Agent' object has no attribute 'q_gru_keep_prob'

    Problems with Python 3.6 - Unable to get past this point

    2018-05-21 08:58:16.038248: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2018-05-21 08:58:16.278040: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties: name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.4425 pciBusID: 0000:02:00.0 totalMemory: 4.00GiB freeMemory: 3.29GiB
    2018-05-21 08:58:16.278375: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
    2018-05-21 08:58:16.779930: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-05-21 08:58:16.780127: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
    2018-05-21 08:58:16.780253: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
    2018-05-21 08:58:16.780467: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3025 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
    INFO: Making new env: UnRealizedPnLEnv-v0
    Traceback (most recent call last):
      File "D:/deep-trading-agent-dev/code/main.py", line 44, in <module>
        main(vars(args)['file_path'])
      File "D:/deep-trading-agent-dev/code/main.py", line 34, in main
        agent = Agent(sess, logger, config, env)
      File "D:\deep-trading-agent-dev\code\model\agent.py", line 252, in __init__
        self.build_dqn(params)
      File "D:\deep-trading-agent-dev\code\model\agent.py", line 448, in build_dqn
        print (self.q_gru_keep_prob)
    AttributeError: 'Agent' object has no attribute 'q_gru_keep_prob'

    opened by SdxHex 1
  • InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 80640 values, but the requested shape has 115200

    Hi, cool project.

    Getting a weird error during training when using the same config as example.cfg:

    KeyError: KeyError(<weakref at 0x7f3801e2a628; to 'tqdm' at 0x7f38019d88d0>,) in <bound method tqdm.__del__ of   0%|                    | 24866/25000000 [00:05<1:24:55, 4901.45it/s]> ignored
    Traceback (most recent call last):
      File "main.py", line 38, in <module>
        main(vars(args)['file_path'])
      File "main.py", line 29, in main
        agent.train()
      File "/home/ted/projects/deep-trading-agent/code/model/agent.py", line 71, in train
        self.observe(screen, reward, action, terminal, trade_rem)
      File "/home/ted/projects/deep-trading-agent/code/model/agent.py", line 170, in observe
        self.q_learning_mini_batch()
      File "/home/ted/projects/deep-trading-agent/code/model/agent.py", line 186, in q_learning_mini_batch
        self.t_trade_rem_t: trade_rem_t_plus_1
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
        options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80640 values, but the requested shape has 115200
             [[Node: t_q_network/Reshape_1 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](t_q_network/conv_layers/conv_layer_2/conv_2/Relu, t_q_network/Reshape_1/shape)]]
             [[Node: t_q_network/q_values/BiasAdd/_139 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_274_t_q_network/q_values/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

    Caused by op u't_q_network/Reshape_1', defined at:
      File "main.py", line 38, in <module>
        main(vars(args)['file_path'])
      File "main.py", line 28, in main
        agent = Agent(sess, logger, config, env)
      File "/home/ted/projects/deep-trading-agent/code/model/agent.py", line 44, in __init__
        self.build_dqn(params)
      File "/home/ted/projects/deep-trading-agent/code/model/agent.py", line 254, in build_dqn
        self.t_q.build_model((self.t_s_t, self.t_trade_rem_t))
      File "/home/ted/projects/deep-trading-agent/code/model/deepsense.py", line 150, in build_model
        self.params.window_size * self.params.filter_sizes[-1]
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3903, in reshape
        "Reshape", tensor=tensor, shape=shape, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 80640 values, but the requested shape has 115200
             [[Node: t_q_network/Reshape_1 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](t_q_network/conv_layers/conv_layer_2/conv_2/Relu, t_q_network/Reshape_1/shape)]]
             [[Node: t_q_network/q_values/BiasAdd/_139 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_274_t_q_network/q_values/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


    Any idea for what this is from?

    bug 
    opened by txizzle 1
  • dev

    This branch contains recent code updates for improved model performance and Docker support for an easy jump start with running the training of the agent.

    opened by samre12 0
  • Unable to understand the Reward Function

    Hi, I am having difficulty understanding the reward function. More specifically, I am not able to understand what "amount of long currency" and "amount of short currency" mean.

    I would appreciate it if you could elaborate on the reward function and the terms used in it.

    Thanks.

    opened by aayushdua007 0
  • Some Issues - Pycharm W10

    Hi, I have some problems... the first is in utils/util

    def get_config_parser(filename):
        config = ConfigParser(allow_no_value=True)
        config.read(filename)
        return config

    filename is a NoneType object; I don't understand which file I must pass in.

    I tried with "config" instead of "filename" but I receive this error:

    Traceback (most recent call last):
      File "C:\Users\vadil\Anaconda3\envs\Test5\lib\configparser.py", line 1138, in _unify_values
        sectiondict = self._sections[section]
    KeyError: 'global'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:/Users/vadil/PycharmProjects/Test5/main.py", line 38, in <module>
        main(vars(args)['file_path'])
      File "C:/Users/vadil/PycharmProjects/Test5/main.py", line 21, in main
        config = get_config(config_parser)
      File "C:\Users\vadil\PycharmProjects\Test5\utils\config.py", line 17, in get_config
        config[PARENT_DIR] = config_parser.get(GLOBAL, PARENT_DIR)
      File "C:\Users\vadil\Anaconda3\envs\Test5\lib\configparser.py", line 781, in get
        d = self._unify_values(section, vars)
      File "C:\Users\vadil\Anaconda3\envs\Test5\lib\configparser.py", line 1141, in _unify_values
        raise NoSectionError(section)
    configparser.NoSectionError: No section: 'global'

    Process finished with exit code 1

    I love this project; at my university no one teaches RL, so I started studying this on my own.

    opened by ghost 0
  • ERROR: INVALID_TIMESTEP

    Traceback (most recent call last):
      File "/home/arunavo/PycharmProjects/samre12Trading/trader/main.py", line 47, in <module>
        main("/home/arunavo/PycharmProjects/samre12Trading/trader/agent/config/config.cfg")
      File "/home/arunavo/PycharmProjects/samre12Trading/trader/main.py", line 37, in main
        agent.train()
      File "/home/arunavo/PycharmProjects/samre12Trading/trader/agent/model/train.py", line 69, in train
        self.replay_memory.set_history(state, supplementary)
      File "/home/arunavo/PycharmProjects/samre12Trading/trader/agent/model/replay_memory.py", line 107, in set_history
        self.add(state[length - 1], 0.0, 0, False, supp)
      File "/home/arunavo/PycharmProjects/samre12Trading/trader/agent/model/replay_memory.py", line 53, in add
        raise ValueError(INVALID_TIMESTEP)
    ValueError: INVALID_TIMESTEP
    ERROR:deep_trading_agent:Invalid Timestep with shapes (4,), (1,)

    Process finished with exit code 1

    In file train.py:

        def train(self):
            ............
            state, supplementary = self.env.reset()
            self.history.set_history(state)
            self.replay_memory.set_history(state, supplementary)

    env.reset() (in cryptoenv.py) returns:

        return self.historical_prices[self.current - self.history_length:self.current], np.array([1.0])

    so supplementary is an np.array of shape (1,), whereas it needs 27 according to the config file.

    P.S: for branch adaptive_normalization

    bug 
    opened by arunavo4 1
  • Nan in summary histogram for: q_network/avg_q_summary/q/2

    Hi, I tried using other datasets, one with a forex pair and another with the BTC/USD pair, but it gives me that error for every dataset. With the original dataset (btc.csv) this error doesn't happen, and the other ones have exactly the same column names and data types. The only thing I did was change the config file to point to the new dataset.

    Here's a look of my custom BTC dataset.

    | DateTime_UTC        | Timestamp  | price_high | price_low | price_close | price_open | volume |
    | ------------------- | ---------- | ---------- | --------- | ----------- | ---------- | ------ |
    | 2017-08-05 00:00:00 | 1501891200 | 1601.0     | 1554.0    | 1592.0      | 1555.0     | 304.0  |
    | 2017-08-05 00:01:00 | 1501891260 | 1592.0     | 1591.0    | 1591.0      | 1592.0     | 248.0  |

    Also, people on the internet say that I should decrease the learning rate, so I did, but the problem persists. I also tried messing around with the parameters in model/baseagent.py, but it's the same. I ask here because I tried everything, including changing the order of the columns in the dataset.

    Thank you!

    0%|                   | 49900/100000000 [00:15<8:40:07, 3202.78it/s]Traceback (most recent call last):
      File "main.py", line 38, in <module>
        main(vars(args)['file_path'])
      File "main.py", line 29, in main
        agent.train()
      File "/deep-trading-agent/code/model/agent.py", line 71, in train
        self.observe(screen, reward, action, terminal, trade_rem)
      File "/deep-trading-agent/code/model/agent.py", line 170, in observe
        self.q_learning_mini_batch()
      File "/deep-trading-agent/code/model/agent.py", line 204, in q_learning_mini_batch
        self.learning_rate_step: self.step
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
        feed_dict_string, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
        target_list, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: q_network/avg_q_summary/q/2
             [[Node: q_network/avg_q_summary/q/2 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](q_network/avg_q_summary/q/2/tag, q_network/avg_q_summary/strided_slice_2)]]
    
    Caused by op u'q_network/avg_q_summary/q/2', defined at:
      File "main.py", line 38, in <module>
        main(vars(args)['file_path'])
      File "main.py", line 28, in main
        agent = Agent(sess, logger, config, env)
      File "/deep-trading-agent/code/model/agent.py", line 44, in __init__
        self.build_dqn(params)
      File "/deep-trading-agent/code/model/agent.py", line 237, in build_dqn
        self.q.build_model((self.s_t, self.trade_rem_t))
      File "/deep-trading-agent/code/model/deepsense.py", line 205, in build_model
        self._avg_q_summary.append(tf.summary.histogram('q/{}'.format(idx), avg_q[idx]))
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 209, in histogram
        tag=scope.rstrip('/'), values=values, name=scope)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
        self._traceback = _extract_stack()
    
    InvalidArgumentError (see above for traceback): Nan in summary histogram for: q_network/avg_q_summary/q/2
             [[Node: q_network/avg_q_summary/q/2 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](q_network/avg_q_summary/q/2/tag, q_network/avg_q_summary/strided_slice_2)]]
    
    bug 
    opened by cTatu 10
  • training issue

    Hi! Have a question about training.

    After 16 hours of training, I still get an average reward of 0. I'd be happy if you could explain what could be wrong. Maybe it's a problem with the default setup parameters?

    25%|███▊      | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
    26%|███▊      | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
    26%|███▊      | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
    26%|███▊      | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
    26%|███▉      | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
    26%|███▉      | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
    26%|███▉      | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
    26%|███▉      | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
    26%|███▉      | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
    26%|███▉      | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000

    bug help wanted 
    opened by philipshurpik 5
Owner

Kartikay Garg (Major in Mathematics and Computing)