Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

Last update: Dec 13, 2022

Related tags

Deep Learning hybrid-cp-rl-solver

Overview

Hybrid solving process for combinatorial optimization problems

Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems.
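To make this growth concrete, the short sketch below counts the distinct tours of a symmetric travelling salesman instance, (n-1)!/2, for a few sizes. The function name is illustrative only and is not part of the repository.

```python
import math


def count_tours(n_city: int) -> int:
    """Number of distinct tours visiting n_city cities exactly once.

    Assumes a symmetric instance: the start city is fixed and a tour and
    its reverse are counted as the same tour.
    """
    if n_city < 3:
        return 1
    return math.factorial(n_city - 1) // 2


for n in (5, 10, 20, 50):
    print(f"{n:>3} cities: {count_tours(n):.3e} tours")
```

Going from 5 to 10 cities multiplies the count from 12 to 181440, which is why exhaustive enumeration stops being practical very quickly.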

In recent years, Deep Reinforcement Learning (DRL) has shown promise for designing good heuristics for NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot easily be extended to other problems, and (2) they only provide an approximate solution, with no systematic way to improve it or to prove optimality.

In another context, Constraint Programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution given a large enough execution time. A critical design choice, which makes CP non-trivial to use in practice, is the branching decision, which directs how the search space is explored.

In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems formulated as a Dynamic Programming (DP) model. The core of our approach is this DP formulation, which acts as a bridge between both techniques. In the related paper, we show experimentally that our solver is efficient at solving two challenging problems: the Travelling Salesman Problem with Time Windows (TSPTW) and the 4-moments Portfolio Optimization Problem, which includes the means, deviations, skewnesses, and kurtosis of the assets. The results show that the framework outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
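As an illustration of what a DP formulation of the TSPTW can look like, here is a minimal textbook-style sketch. It is not the model shipped in this repository: the state (cities still to visit, current city, current time), the function name, and the assumption that travel time equals travel cost are all choices made for this example.

```python
from functools import lru_cache
import math


def solve_tsptw(dist, windows):
    """Minimum travel cost of a tour from city 0 through all cities and back.

    dist[i][j] is used as both the travel cost and the travel time from i to j.
    windows[i] = (earliest, latest) allowed visit time for city i.
    Returns math.inf when no feasible tour exists.
    """
    n = len(dist)

    @lru_cache(maxsize=None)
    def best(must_visit, city, time):
        # DP state: cities still to visit, current city, current time.
        if not must_visit:
            return dist[city][0]  # close the tour at the depot
        result = math.inf
        for nxt in must_visit:
            earliest, latest = windows[nxt]
            arrival = time + dist[city][nxt]
            if arrival > latest:
                continue  # time window violated: transition not allowed
            start = max(arrival, earliest)  # wait if we arrive too early
            cost = dist[city][nxt] + best(must_visit - {nxt}, nxt, start)
            result = min(result, cost)
        return result

    return best(frozenset(range(1, n)), 0, 0)


dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
loose = [(0, 100)] * 4
tight = [(0, 100), (0, 100), (0, 8), (0, 100)]
print(solve_tsptw(dist, loose))  # 18
print(solve_tsptw(dist, tight))  # 21
```

The same state and transition can be read in two ways: as an RL environment (the state is the observation, choosing the next city is the action) and as a CP model (the transition becomes a constraint, the choice of next city becomes a branching decision).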

Please be aware that this project is still at research level.

Content of the repository

For each problem that we have considered, you can find:

A DP model serving as a basis for the RL environment and the CP model.
The RL environment and the CP model.
An RL training algorithm based on Deep Q-Learning (DQN).
An RL training algorithm based on Proximal Policy Optimization (PPO).
The models that we trained, together with the hyperparameters used.
Three CP solving algorithms leveraging the learned models: Depth-First Branch-and-Bound (BaB), Iterative Limited Discrepancy Search (ILDS), and Restart Based Search (RBS).
A random instance generator for training the model and evaluating the solver.
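The role of the branching heuristic in a depth-first branch-and-bound can be sketched on a small TSP as follows. This is a simplified pure-Python illustration, not the Gecode-based solver of this repository; `value_order` is a hypothetical hook that stands in for the learned model, and a hand-written nearest-city ordering is used in its place.

```python
import math


def branch_and_bound(dist, value_order):
    """Depth-first branch-and-bound for a small TSP with non-negative costs.

    value_order(city, candidates) returns the candidates sorted from most to
    least promising; it plays the role of the branching heuristic.
    Returns (best_cost, best_tour, explored_nodes).
    """
    n = len(dist)
    best = {"cost": math.inf, "tour": None, "nodes": 0}

    def explore(tour, cost, remaining):
        best["nodes"] += 1
        if cost >= best["cost"]:
            return  # bound: this branch cannot improve the incumbent
        if not remaining:
            total = cost + dist[tour[-1]][0]
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour + [0]
            return
        for nxt in value_order(tour[-1], sorted(remaining)):
            explore(tour + [nxt], cost + dist[tour[-1]][nxt], remaining - {nxt})

    explore([0], 0, frozenset(range(1, n)))
    return best["cost"], best["tour"], best["nodes"]


def nearest_first(dist):
    # Hand-written heuristic: try the closest city first.
    return lambda city, candidates: sorted(candidates, key=lambda c: dist[city][c])


dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
cost, tour, nodes = branch_and_bound(dist, nearest_first(dist))
print(cost, tour, nodes)
```

The search is complete whatever the ordering, so the optimum is always found eventually. The ordering only changes how early a good incumbent is found, and therefore how much of the tree the bound can prune.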

.
├── conda_env.yml  # configuration file for the conda environment
├── run_training_x_y.sh  # script for running the training. It is where you have to enter the parameters 
├── trained_models/  # directory where the models that you train will be saved
├── selected_models/  # models that we used for our experiments
└── src/
    ├── architecture/  # implementation of the NN used
    ├── util/  # utility code (such as the replay memory)
    └── problem/  # problems that we have implemented
        ├── tsptw/
        │   ├── main_training_x_y.py  # main file for training a model for the problem y using algorithm x
        │   ├── baseline/  # methods that are used for comparison
        │   ├── environment/  # the generator and the DP model, which also acts as the RL environment
        │   ├── training/  # PPO and DQN training algorithms
        │   └── solving/  # CP model and solving algorithm
        └── portfolio/

Installation instructions

1. Importing the repository

git clone https://github.com/qcappart/hybrid-cp-rl-solver.git

2. Setting up the conda virtual environment

conda env create -f conda_env.yml

Note: install a DGL version compatible with your CUDA installation.

3. Building Gecode

Please refer to the setup instructions available on the official website.

4. Compiling the solver

A makefile is available at the root of the repository. First, modify it by adding your Python path. Then, you can compile the project as follows:

make [problem] # e.g. make tsptw

It will create the executable solver_tsptw.

Basic use

1. Training a model

(Does not require Gecode)

./run_training_ppo_tsptw.sh # for PPO
./run_training_dqn_tsptw.sh # for DQN

2. Solving the problem

(Requires Gecode)

# For TSPTW
./solver_tsptw --model=rl-ilds-dqn --time=60000 --size=20 --grid_size=100 --max_tw_size=100 --max_tw_gap=10 --d_l=5000 --cache=1 --seed=1  # Solve with ILDS-DQN
./solver_tsptw --model=rl-bab-dqn --time=60000 --size=20 --grid_size=100 --max_tw_size=100 --max_tw_gap=10 --cache=1 --seed=1 # Solve with BaB-DQN
./solver_tsptw --model=rl-rbs-ppo --time=60000 --size=20 --grid_size=100 --max_tw_size=100 --max_tw_gap=10 --cache=1 --luby=1 --temperature=1 --seed=1 # Solve with RBS-PPO
./solver_tsptw --model=nearest --time=60000 --size=20 --grid_size=100 --max_tw_size=100 --max_tw_gap=10 --d_l=5000 --seed=1 # Solve with a nearest neighbour heuristic (no learning)

# For Portfolio
./solver_portfolio --model=rl-ilds-dqn --time=60000 --size=50 --capacity_ratio=0.5 --lambda_1=1 --lambda_2=5 --lambda_3=5 --lambda_4=5  --discrete_coeffs=0 --cache=1 --seed=1

For learning-based methods, the model selected by default is the one located in the corresponding directory of selected models. For instance:

selected-models/ppo/tsptw/n-city-20/grid-100-tw-10-100/
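When running many configurations, it can be convenient to assemble the solver command line from Python. The helper below is a hypothetical convenience wrapper, not part of the repository: it only formats the flags shown in the examples above and does not check them against what the solver actually accepts.

```python
def build_solver_command(problem, model, **options):
    """Assemble an argument list like the solver invocations shown above.

    Option names are passed through unchanged as --name=value.
    """
    command = [f"./solver_{problem}", f"--model={model}"]
    command += [f"--{name}={value}" for name, value in options.items()]
    return command


cmd = build_solver_command(
    "tsptw", "rl-bab-dqn",
    time=60000, size=20, grid_size=100,
    max_tw_size=100, max_tw_gap=10, cache=1, seed=1,
)
print(" ".join(cmd))
# To launch it from the repository root (requires the compiled solver):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting issues when it is passed to `subprocess.run`.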

Example of results

The tables below summarise the solutions obtained for an instance generated with a seed of 0 and a timeout of 60 seconds. Bold results indicate that the solver was able to prove the optimality of the solution, and a dash indicates that no solution was found within the time limit.

Tour cost for the TSPTW

Model name	20 cities	50 cities	100 cities
DQN	959	-	-
PPO (beam-width=16)	959	-	-
CP-nearest	959	-	-
BaB-DQN	959	2432	4735
ILDS-DQN	959	2432	-
RBS-PPO	959	2432	4797

./benchmarking/tsptw_bmk.sh 0 20 60000 # Arguments: [seed] [n_city] [timeout - ms]
./benchmarking/tsptw_bmk.sh 0 50 60000
./benchmarking/tsptw_bmk.sh 0 100 60000
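To compare two entries of the TSPTW table, one simple measure is the relative excess of a tour cost over the best cost found. The sketch below applies it to the two 100-city results; the function name is illustrative, and the reference is the best tour found within the time limit, not necessarily the optimum.

```python
def relative_gap(cost, reference):
    """Relative excess of cost over reference for a minimisation problem, in percent."""
    return 100.0 * (cost - reference) / reference


# Tour costs for 100 cities, copied from the TSPTW table.
costs_100 = {"BaB-DQN": 4735, "RBS-PPO": 4797}
best = min(costs_100.values())
for name, cost in costs_100.items():
    print(f"{name}: {relative_gap(cost, best):.2f}% above the best tour found")
```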

Profit for Portfolio Optimization

Model name	20 items	50 items	100 items
DQN	247.40	1176.94	2223.09
PPO (beam-width=16)	264.49	1257.42	2242.67
BaB-DQN	273.04	1228.03	2224.44
ILDS-DQN	273.04	1201.53	2235.89
RBS-PPO	267.05	1265.50	2258.65

./benchmarking/portfolio_bmk.sh 0 20 60000 # Arguments: [seed] [n_item] [timeout - ms]
./benchmarking/portfolio_bmk.sh 0 50 60000
./benchmarking/portfolio_bmk.sh 0 100 60000

Technologies and tools used

The code, with the exception of the CP model, is implemented in Python 3.7.
The CP model is implemented in C++ and is solved using Gecode. We made this design choice because no CP solver in Python met our requirements.
The graph neural network architecture is implemented in PyTorch together with DGL.
The set embedding is based on SetTransformer.
The interface between the C++ and Python code uses Pybind11.

Currently implemented problems

At the moment, only the travelling salesman problem with time windows and the 4-moments portfolio optimization problem are present in this repository. However, we also have the TSP and the 0-1 Knapsack problem available. If there is demand for these problems, I will add them to this repository. Feel free to open an issue for that, or if you want to add another problem.

Cite

Please use this reference:

@misc{cappart2020combining,
    title={Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization},
    author={Quentin Cappart and Thierry Moisan and Louis-Martin Rousseau and Isabeau Prémont-Schwarz and Andre Cire},
    year={2020},
    eprint={2006.01610},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}

Licence

This work is under the MIT licence (https://choosealicense.com/licenses/mit/). It is a short, simple, and very permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Comments

Error when I run 'run_training_dqn_tsptw.sh'

$ ./run_training_dqn_tsptw.sh

[INFO] TRAINING ON RANDOM INSTANCES: TSPTW
[INFO] n_city: 20
[INFO] grid_size: 100
[INFO] max_tw_gap: 10
[INFO] max_tw_size: 100
[INFO] seed: 1

[INFO] TRAINING PARAMETERS
[INFO] algorithm: DQN
[INFO] batch_size: 32
[INFO] learning_rate: 0.000100
[INFO] hidden_layer: 2
[INFO] latent_dim: 32
[INFO] softmax_beta: 10
[INFO] n_step: -1

Using backend: pytorch
/data/yjiang/anaconda3/lib/python3.7/site-packages/dgl/base.py:45: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
  return warnings.warn(message, category=category, stacklevel=1)

[INFO] NUMBER OF FEATURES
[INFO] n_node_feat: 6
[INFO] n_edge_feat: 5

Traceback (most recent call last):
  File "src/problem/tsptw/main_training_dqn_tsptw.py", line 67, in <module>
    trainer.run_training()
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_dqn.py", line 85, in run_training
    self.initialize_memory()
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_dqn.py", line 137, in initialize_memory
    self.run_episode(0, memory_initialization=True)
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_dqn.py", line 175, in run_episode
    graph = env.make_nn_input(cur_state, self.args.mode)
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/environment/environment.py", line 72, in make_nn_input
    g.edata['e_feat'] = self.edge_feat_tensor
  File "/data/yjiang/anaconda3/lib/python3.7/site-packages/dgl/view.py", line 197, in __setitem__
    self._graph._set_e_repr(self._etid, self._edges, {key : val})
  File "/data/yjiang/anaconda3/lib/python3.7/site-packages/dgl/heterograph.py", line 3899, in _set_e_repr
    ' Got %d and %d instead.' % (nfeats, num_edges))
dgl._ffi.base.DGLError: Expect number of features to match number of edges. Got 380 and 0 instead.

I installed the version of DGL with CUDA 10.2.

opened by jiang-yuan 7

AssertionError: Torch not compiled with CUDA enabled

Last time, I created the environment from the .yml file you provide. It installs PyTorch 1.4.0, and I ran the code successfully in CPU mode. But when I change the mode to 'gpu', with both CUDA 10.1 and 10.2, I get the same error:

    trainer = TrainerDQN(args)
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_dqn.py", line 57, in __init__
    self.brain = BrainDQN(self.args, self.num_node_feats, self.num_edge_feats)
  File "/data/yjiang/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/brain_dqn.py", line 39, in __init__
    self.model.cuda()
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
    _check_driver()
  File "/data/yjiang/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/cuda/__init__.py", line 94, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

opened by jiang-yuan 6

Error on GPU mode

When I try to run it on GPU mode, I got this error. Can you help to identify the problem?

 File "src/problem/tsptw/main_training_ppo_tsptw.py", line 71, in <module>
    trainer.run_training()
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 77, in run_training
    self.run_episode()
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 147, in run_episode
    self.brain.update(self.memory)
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/brain_ppo.py", line 81, in update
    old_availables)
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/actor_critic.py", line 83, in evaluate
    out = self.action_layer(state_for_action, graph_pooling=False)
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 160, in forward
    
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 107, in forward
    return g
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/home/dora/Programs/miniconda3/envs/dp-solver-gpu-env/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/home/dora/vince/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 81, in message_function
    return {'attention_logits': attention_logits,
RuntimeError: mat1 dim 1 must match mat2 dim 0

opened by Vincent-bin 4

Redundant code in graph_attention_network.py

for l, layer in enumerate(self.embedding_layer[:-1]):
    g = layer(g)
    g.ndata["n_feat"] = torch.relu(g.ndata["n_feat"])
    g.edata["e_feat"] = torch.relu(g.edata["e_feat"])

last_layer = self.embedding_layer[-1]
g = last_layer(g)
g.ndata["n_feat"] = torch.relu(g.ndata["n_feat"])
g.edata["e_feat"] = torch.relu(g.edata["e_feat"])

I found this in the file src/architecture/graph_attention_network.py and I am confused. I think the last_layer operation can be merged into the for-loop. I guess you used it for debugging and forgot to revert it back?
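The equivalence suggested here can be checked with stand-in layers. The sketch below uses plain Python functions on numbers instead of the real DGL layers, so it only illustrates the control flow; `run_split`, `run_merged`, and the toy layers are hypothetical names made up for this example.

```python
def relu(x):
    return max(0.0, x)


def run_split(layers, g):
    # Same structure as the snippet above: all layers but the last, then the last.
    for layer in layers[:-1]:
        g = relu(layer(g))
    g = relu(layers[-1](g))
    return g


def run_merged(layers, g):
    # Single loop over every layer, applying the same activation each time.
    for layer in layers:
        g = relu(layer(g))
    return g


# Toy "layers": simple affine functions standing in for the embedding layers.
toy_layers = [lambda x: 2 * x - 1, lambda x: -x + 3, lambda x: 0.5 * x]
for value in (-2.0, 0.0, 1.5, 4.0):
    assert run_split(toy_layers, value) == run_merged(toy_layers, value)
print("split and merged loops agree on the toy inputs")
```

The two forms only differ when the list of layers is empty: the split version raises an IndexError, while the merged version returns its input unchanged.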

opened by Vincent-bin 1

I get an error when I try to run make tsptw

Hi, I'm a student in Japan and I'm planning to use this solver in my research. So, when I try to run make tsptw, I get an error saying that there are no files or directories like cxxopts.hpp and that the recipe for target 'tsptw' failed.

Environment: WSL, Ubuntu 16.04, Python 3.7.7, anaconda3

(error)
(dp-solver-env) riko@LAPTOP-Q28HUGQ8:/mnt/c/Users/Riko/hybrid-cp-rl-solver$ make tsptw
rm -rf src/problem/tsptw/solving/build
cmake -Hsrc/problem/tsptw/solving -Bsrc/problem/tsptw/solving/build -DPYTHON_EXECUTABLE:FILEPATH=/home/riko/anaconda3/envs/dp-solver-env/bin/python3 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_BUILD_TYPE:STRING="Release" -G "Unix Makefiles" -DPYTHON_EXECUTABLE:FILEPATH=/home/riko/anaconda3/envs/dp-solver-env/bin/python3
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PythonInterp: /home/riko/anaconda3/envs/dp-solver-env/bin/python3 (found version "3.7.7")
-- Found PythonLibs: /home/riko/anaconda3/envs/dp-solver-env/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- Found Gecode: /usr/local/include (found suitable version "6.2.0", minimum required is "6.2") found components: Driver Float Int Kernel Minimodel Search Set Support
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build
cmake --build src/problem/tsptw/solving/build --config Release --target all -- -j VERBOSE=1
make[1]: Entering directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
/usr/bin/cmake -H/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving -B/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/progress.marks
/usr/bin/make -f CMakeFiles/Makefile2 all
make[2]: Entering directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
/usr/bin/make -f CMakeFiles/solver_tsptw.dir/build.make CMakeFiles/solver_tsptw.dir/depend
make[3]: Entering directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
cd /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/solver_tsptw.dir/DependInfo.cmake --color=
Dependee "/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/solver_tsptw.dir/DependInfo.cmake" is newer than depender "/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/solver_tsptw.dir/depend.internal".
Dependee "/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build/CMakeFiles/solver_tsptw.dir/depend.internal".
Scanning dependencies of target solver_tsptw
make[3]: Leaving directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
/usr/bin/make -f CMakeFiles/solver_tsptw.dir/build.make CMakeFiles/solver_tsptw.dir/build
make[3]: Entering directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
[ 50%] Building CXX object CMakeFiles/solver_tsptw.dir/solver.cpp.o
/usr/bin/c++ -isystem /home/riko/anaconda3/envs/dp-solver-env/include -isystem /home/riko/anaconda3/envs/dp-solver-env/include/python3.7m -isystem /usr/local/include -O3 -DNDEBUG -std=c++14 -o CMakeFiles/solver_tsptw.dir/solver.cpp.o -c /mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/solver.cpp
/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/solver.cpp:18:23: fatal error: cxxopts.hpp: No such file or directory
compilation terminated.
CMakeFiles/solver_tsptw.dir/build.make:62: recipe for target 'CMakeFiles/solver_tsptw.dir/solver.cpp.o' failed
make[3]: *** [CMakeFiles/solver_tsptw.dir/solver.cpp.o] Error 1
make[3]: Leaving directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/solver_tsptw.dir/all' failed
make[2]: *** [CMakeFiles/solver_tsptw.dir/all] Error 2
make[2]: Leaving directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
Makefile:83: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/mnt/c/Users/Riko/hybrid-cp-rl-solver/src/problem/tsptw/solving/build'
Makefile:5: recipe for target 'tsptw' failed
make: *** [tsptw] Error 2

opened by Ririkosann 0

Error when I run './run_training_ppo_tsptw.sh' and './run_training_dqn_tsptw.sh'

$ ./run_training_ppo_tsptw.sh # for PPO

[INFO] TRAINING ON RANDOM INSTANCES: TSPTW
[INFO] n_city: 20
[INFO] grid_size: 100
[INFO] max_tw_gap: 10
[INFO] max_tw_size: 100
[INFO] seed: 1

[INFO] TRAINING PARAMETERS
[INFO] Algorithm: PPO
[INFO] learning rate: 0.000100
[INFO] eps_clip: 0.100000
[INFO] entropy_value: 0.001000
[INFO] hidden_layer: 4
[INFO] k_epochs: 3
[INFO] batch_size: 64
[INFO] update_timestep: 2048
[INFO] latent_dim: 128

[INFO] NUMBER OF FEATURES
[INFO] n_node_feat: 6
[INFO] n_edge_feat: 5

[INFO] iter time avg_reward_learning
[DATA] 0 3.6 10.56674003178788
[DATA] 10 7.66 10.551267996718227
[DATA] 20 11.14 9.419938157245602
[DATA] 30 14.93 9.583740556219418
[DATA] 40 18.88 10.58884094988236
[DATA] 50 22.87 10.428705262645517
[DATA] 60 26.76 10.010971459541096
[DATA] 70 30.59 10.054244118399101
[DATA] 80 34.71 10.791883246641037
[DATA] 90 38.68 10.583356318858371
[DATA] 100 42.84 10.893410571071767
[DATA] 200 50.51 10.33913003624206
[DATA] 300 57.77 9.89661327599596
[DATA] 400 64.85 9.780417441888613
[DATA] 500 71.91 10.225113229968612
Using backend: pytorch
Traceback (most recent call last):
  File "src/problem/tsptw/main_training_ppo_tsptw.py", line 71, in <module>
    trainer.run_training()
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 77, in run_training
    self.run_episode()
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 147, in run_episode
    self.brain.update(self.memory)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/brain_ppo.py", line 81, in update
    old_availables)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/actor_critic.py", line 83, in evaluate
    out = self.action_layer(state_for_action, graph_pooling=False)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 158, in forward
    g = layer(g)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 105, in forward
    g.update_all(message_func=self.message_function, reduce_func=self.new_node_features)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 79, in message_function
    attention_logits = self.prelu(torch.matmul(x, self.W_a))
RuntimeError: size mismatch, m1: [23940 x 130], m2: [17 x 128] at /tmp/pip-req-build-m1jla2kc/aten/src/TH/generic/THTensorMath.cpp:136

$ ./run_training_ppo_tsptw.sh # for PPO

[INFO] TRAINING ON RANDOM INSTANCES: TSPTW
[INFO] n_city: 20
[INFO] grid_size: 100
[INFO] max_tw_gap: 10
[INFO] max_tw_size: 100
[INFO] seed: 1

[INFO] TRAINING PARAMETERS
[INFO] Algorithm: PPO
[INFO] learning rate: 0.000100
[INFO] eps_clip: 0.100000
[INFO] entropy_value: 0.001000
[INFO] hidden_layer: 4
[INFO] k_epochs: 3
[INFO] batch_size: 64
[INFO] update_timestep: 2048
[INFO] latent_dim: 128

[INFO] NUMBER OF FEATURES
[INFO] n_node_feat: 6
[INFO] n_edge_feat: 5

[INFO] iter time avg_reward_learning
[DATA] 0 3.66 10.528531539911066
[DATA] 10 7.49 10.254481712808198
[DATA] 20 11.37 10.417858130552686
[DATA] 30 15.35 10.099412208456542
[DATA] 40 19.35 10.83385508918823
[DATA] 50 23.26 9.75215836513569
[DATA] 60 27.24 10.312330784342953
[DATA] 70 31.23 10.44216600914973
[DATA] 80 35.49 10.532509094649615
[DATA] 90 39.37 10.255611674771005
[DATA] 100 43.34 10.197485549657278
[DATA] 200 50.57 10.8156113531605
[DATA] 300 57.86 10.023542230992511
[DATA] 400 64.91 9.808304311776478
[DATA] 500 72.37 10.955673825704611
Using backend: pytorch
Traceback (most recent call last):
  File "src/problem/tsptw/main_training_ppo_tsptw.py", line 71, in <module>
    trainer.run_training()
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 77, in run_training
    self.run_episode()
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/trainer_ppo.py", line 147, in run_episode
    self.brain.update(self.memory)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/brain_ppo.py", line 81, in update
    old_availables)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/problem/tsptw/learning/actor_critic.py", line 83, in evaluate
    out = self.action_layer(state_for_action, graph_pooling=False)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 158, in forward
    g = layer(g)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 105, in forward
    g.update_all(message_func=self.message_function, reduce_func=self.new_node_features)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/home/mdclab/anaconda3/envs/dp-solver-env/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/home/mdclab/Documents/yilin/hybrid-cp-rl-solver/src/problem/tsptw/../../../src/architecture/graph_attention_network.py", line 79, in message_function
    attention_logits = self.prelu(torch.matmul(x, self.W_a))
RuntimeError: size mismatch, m1: [23940 x 130], m2: [17 x 128] at /tmp/pip-req-build-m1jla2kc/aten/src/TH/generic/THTensorMath.cpp:136

How to solve the error? Thanks.

opened by yilin0830 0

Owner

GitHub https://arxiv.org/abs/2006.01610

Filtering variational quantum algorithms for combinatorial optimization

Current gate-based quantum computers have the potential to provide a computational advantage if algorithms use quantum hardware efficiently.

1 Feb 9, 2022

Robot Reinforcement Learning on the Constraint Manifold

Implementation of "Robot Reinforcement Learning on the Constraint Manifold"

31 Dec 5, 2022

Rainbow: Combining Improvements in Deep Reinforcement Learning

Rainbow Rainbow: Combining Improvements in Deep Reinforcement Learning [1]. Results and pretrained models can be found in the releases. DQN [2] Double

1.4k Dec 29, 2022

ISBI 2022: Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image.

Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Introduction This repository contains the PyTorch implem

25 Nov 9, 2022

Ankou: Guiding Grey-box Fuzzing towards Combinatorial Difference

Ankou Ankou is a source-based grey-box fuzzer. It intends to use a more rich fitness function by going beyond simple branch coverage and considering t

54 Dec 24, 2022

Official code for the paper: Deep Graph Matching under Quadratic Constraint (CVPR 2021)

QC-DGM This is the official PyTorch implementation and models for our CVPR 2021 paper: Deep Graph Matching under Quadratic Constraint. It also contain

55 Nov 14, 2022

Reusable constraint types to use with typing.Annotated

annotated-types PEP-593 added typing.Annotated as a way of adding context-specific metadata to existing types, and specifies that Annotated[T, x] shou

125 Dec 26, 2022

Constraint-based geometry sketcher for blender

Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like tangent, distance,

1.7k Dec 31, 2022

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

pytorch-a2c-ppo-acktr Update (April 12th, 2021) PPO is great, but Soft Actor Critic can be better for many continuous control tasks. Please check out

3k Jan 9, 2023

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

3k Dec 31, 2022

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

40 Dec 14, 2022

Code for the paper "A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

PyTorch implementation of UAGAN (U-net Attention Generative Adversarial Networks). This repository contains the source code for the paper "A High-precis

8 Apr 25, 2022

Conservative Q Learning for Offline Reinforcement Learning in JAX

CQL-JAX This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). Implementation is built on

8 Nov 7, 2022

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

DSE 314/614: Reinforcement Learning This repository containing reinforcement lea

4 Apr 15, 2022

Code and datasets for the paper "Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction" (RA-L, 2021)

Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction This is the code for the paper Combining E

69 Dec 26, 2022

Code for the paper "Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level Sentiment Classification"

DLCF-DCA codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification. submitted t

15 Aug 30, 2022

