Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. It includes the official implementation of the Soft Actor-Critic algorithm.

Overview

Softlearning

Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is fairly thin and primarily optimized for our own development purposes. It utilizes the tf.keras modules for most of the model classes (e.g. policies and value functions). We use Ray for the experiment orchestration. Ray Tune and Autoscaler implement several neat features that let us seamlessly take the same experiment scripts we use for local prototyping and launch them as large-scale experiments on any chosen cloud service (e.g. GCP or AWS), and intelligently parallelize and distribute training for effective resource allocation.

This implementation uses TensorFlow. For a PyTorch implementation of soft actor-critic, take a look at rlkit.

Getting Started

Prerequisites

The environment can be run either locally using conda or inside a docker container. For the conda installation, you need to have Conda installed. For the docker installation, you will need to have Docker and Docker Compose installed. Also, most of our environments currently require a MuJoCo license.

Conda Installation

  1. Download and install MuJoCo 1.50 and 2.00 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default locations (~/.mujoco/mjpro150 and ~/.mujoco/mujoco200_{platform}). Unfortunately, gym and dm_control expect different paths for the MuJoCo 2.00 installation, which is why you will need to have it available both in ~/.mujoco/mujoco200_{platform} and ~/.mujoco/mujoco200. The easiest way is to create a symlink from ~/.mujoco/mujoco200_{platform} -> ~/.mujoco/mujoco200 with: ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200.

  2. Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.

  3. Clone softlearning

git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}
  4. Create and activate the conda environment, and install softlearning to enable the command line interface.
cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}
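
If the install succeeded, the softlearning command line interface should be available inside the activated environment; a quick sanity check (assuming the conda environment is still active):

softlearning --help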

The environment should be ready to run. See the examples section below for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation

docker-compose

To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate

You can access the container with the typical Docker exec-command, i.e.

docker exec -it softlearning bash

See the examples section below for how to train and simulate the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes

Examples

Training and simulating an agent

  1. To train the agent:
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
  2. To simulate the resulting policy: first, find the absolute path that the checkpoint is saved to. By default (i.e. without specifying the log-dir argument to the previous script), the data is saved under ~/ray_results/<universe>/<domain>/<task>/<datetimestamp>-<exp-name>/<trial-id>/<checkpoint-id>. For example: ~/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-sac-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/. The next command assumes that this path is stored in the ${SAC_CHECKPOINT_DIR} environment variable.
python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'

examples.development.main contains several different environments, and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag: python ./examples/development/main.py --help

optional arguments:
  -h, --help            show this help message and exit
  --universe {robosuite,dm_control,gym}
  --domain DOMAIN
  --task TASK
  --checkpoint-replay-pool CHECKPOINT_REPLAY_POOL
                        Whether a checkpoint should also save the replay
                        pool. If set, takes precedence over
                        variant['run_params']['checkpoint_replay_pool']. Note
                        that the replay pool is saved (and constructed) piece
                        by piece so that each experience is saved only once.
  --algorithm ALGORITHM
  --policy {gaussian}
  --exp-name EXP_NAME
  --mode MODE
  --run-eagerly RUN_EAGERLY
                        Whether to run tensorflow in eager mode.
  --local-dir LOCAL_DIR
                        Destination local folder to save training results.
  --confirm-remote [CONFIRM_REMOTE]
                        Whether or not to query yes/no on remote run.
  --video-save-frequency VIDEO_SAVE_FREQUENCY
                        Save frequency for videos.
  --cpus CPUS           Cpus to allocate to ray process. Passed to `ray.init`.
  --gpus GPUS           Gpus to allocate to ray process. Passed to `ray.init`.
  --resources RESOURCES
                        Resources to allocate to ray process. Passed to
                        `ray.init`.
  --include-webui INCLUDE_WEBUI
                        Boolean flag indicating whether to start the web UI,
                        which is a Jupyter notebook. Passed to `ray.init`.
  --temp-dir TEMP_DIR   If provided, it will specify the root temporary
                        directory for the Ray process. Passed to `ray.init`.
  --resources-per-trial RESOURCES_PER_TRIAL
                        Resources to allocate for each trial. Passed to
                        `tune.run`.
  --trial-cpus TRIAL_CPUS
                        CPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for CPUs. Passed to
                        `tune.run`.
  --trial-gpus TRIAL_GPUS
                        GPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for GPUs. Passed to
                        `tune.run`.
  --trial-extra-cpus TRIAL_EXTRA_CPUS
                        Extra CPUs to reserve in case the trials need to
                        launch additional Ray actors that use CPUs.
  --trial-extra-gpus TRIAL_EXTRA_GPUS
                        Extra GPUs to reserve in case the trials need to
                        launch additional Ray actors that use GPUs.
  --num-samples NUM_SAMPLES
                        Number of times to repeat each trial. Passed to
                        `tune.run`.
  --upload-dir UPLOAD_DIR
                        Optional URI to sync training results to (e.g.
                        s3://<bucket> or gs://<bucket>). Passed to `tune.run`.
  --trial-name-template TRIAL_NAME_TEMPLATE
                        Optional string template for trial name. For example:
                        '{trial.trial_id}-seed={trial.config[run_params][seed]
                        }' Passed to `tune.run`.
  --checkpoint-frequency CHECKPOINT_FREQUENCY
                        How many training iterations between checkpoints. A
                        value of 0 (default) disables checkpointing. If set,
                        takes precedence over
                        variant['run_params']['checkpoint_frequency']. Passed
                        to `tune.run`.
  --checkpoint-at-end CHECKPOINT_AT_END
                        Whether to checkpoint at the end of the experiment. If
                        set, takes precedence over
                        variant['run_params']['checkpoint_at_end']. Passed to
                        `tune.run`.
  --max-failures MAX_FAILURES
                        Try to recover a trial from its last checkpoint at
                        least this many times. Only applies if checkpointing
                        is enabled. Passed to `tune.run`.
  --restore RESTORE     Path to checkpoint. Only makes sense to set if running
                        1 trial. Defaults to None. Passed to `tune.run`.
  --server-port SERVER_PORT
                        Port number for launching TuneServer. Passed to
                        `tune.run`.

Resume training from a saved checkpoint

This feature is currently broken!

In order to resume training from a previous checkpoint, run the original example main script with an additional --restore flag. For example, the previous example can be resumed as follows:

softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_PATH}

References

The algorithms are based on the following papers:

Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja*, Aurick Zhou*, Kristian Hartikainen*, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. arXiv preprint, 2018.
paper | videos

Latent Space Policies for Hierarchical Reinforcement Learning.
Tuomas Haarnoja*, Kristian Hartikainen*, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Composable Deep Reinforcement Learning for Robotic Manipulation.
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine. International Conference on Robotics and Automation (ICRA), 2018.
paper | videos

Reinforcement Learning with Deep Energy-Based Policies.
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine. International Conference on Machine Learning (ICML), 2017.
paper | videos

If Softlearning helps you in your academic research, you are encouraged to cite our paper. Here is an example BibTeX entry:

@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}
Comments
  • Add support for SQL

    @hartikainen This is my progress so far. However, I am encountering an issue with policies:

    Traceback (most recent call last):
      File "/Users/bohan/miniconda2/envs/softlearning/lib/python3.6/site-packages/ray/tune/function_runner.py", line 80, in run
        self._entrypoint(*self._entrypoint_args)
      File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/examples/multi_goal/main.py", line 53, in run_experiment
        plotter=plotter,
      File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/utils.py", line 36, in get_algorithm_from_variant
        variant, *args, **algorithm_kwargs, **kwargs)
      File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/utils.py", line 15, in create_SQL_algorithm
        algorithm = SQL(*args, **kwargs)
      File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/sql.py", line 117, in __init__
        self._create_svgd_update()
      File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/sql.py", line 219, in _create_svgd_update
        actions = self.policy.actions_for(
    AttributeError: 'FeedforwardGaussianPolicy' object has no attribute 'actions_for'
    

    I've been trying to use actions_np, but it seems to have other issues.

    opened by bohan-zhang 25
  • Unable to reproduce result on HalfCheetah-v2

    I am unable to obtain the result as reported in the paper on the openai environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c83b82b376ceed5e95df5a422113741468, which isn't too long ago. The result is averaged over 5 random initial seeds.

    [learning-curve plot: halfcheetah]

    Do you know what might be causing this issue? Thank you!

    I am able to obtain the result as reported (or close to it) in the paper on the remaining environments, posted here for reference.

    [learning-curve plots: ant, walker, humanoid, hopper]

    opened by quanvuong 18
  • Module 'gym' has no attribute 'register' on MacOS Mojave 10.14.4

    Hi All, when I tried to run a reward learning task (https://github.com/avisingh599/reward-learning-rl) with softlearning environment, the following error occurred: "AttributeError: module 'gym' has no attribute 'register'"

    However, when I run import gym and gym.register() in a separate Python script in PyCharm, it works fine, i.e. it is able to find the register function in gym. I had a look at the previous issues posted for Softlearning and think this is a gym adapter issue, but I am not sure how to manually add this environment/task to gym_adapter in the Softlearning package. Many thanks for your help!

    opened by weijiafeng 8
  • Question on initialization of alpha and entropy

    Question 1: From heuristic_target_entropy (linked here), I see that the initialization of alpha is related to action_dim, but I can't figure out why it should be. Theoretically, would it also work to just set target_entropy to a hard-coded number, like 0.1? (Practically, it seems to work, but I am not sure.)
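
    For reference, a minimal sketch of what such a dimension-based heuristic looks like (my own illustration, assuming a Box action space; not necessarily the repository's exact code):

    import numpy as np

    def heuristic_target_entropy(action_space):
        # target entropy = -dim(A), i.e. minus the action dimensionality
        return -np.prod(action_space.shape)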

    Question 2: According to your second SAC paper, the entropy of a state-action pair should NEVER be lower than target_entropy, but during training, after each learning round, I found that the entropy of a state-action pair would sometimes drop below target_entropy! Pseudo-code is like this (I use PyTorch):

      alpha_loss = torch.tensor(0.).to(self.device)
      alpha_tlogs = torch.tensor(self.alpha)  # for TensorboardX logs

      # check whether the per-sample entropy estimate -log_pi ever drops below target_entropy
      for each_ele in -log_pi:
          if each_ele < self.target_entropy:
              print("error, -log_pi < target_entropy!!!!")
    

    Does it mean I coded it wrong?

    Question 3: Alpha sometimes goes higher than 1 during learning; is that correct?

    opened by dbsxdbsx 7
  • Implementation of automatic entropy temperature tuning (alpha loss)

    https://github.com/rail-berkeley/softlearning/blob/46f14436f62465a02b99f431bbcf57a7fa0fd09d/softlearning/algorithms/sac.py#L254-L255 The implementation of the alpha loss seems to deviate from the formula definition (formula 18 in Haarnoja et al., 2019). Is this a bug?
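
    For context, formula 18 in that paper defines the temperature objective as J(alpha) = E_{a~pi}[-alpha * log pi(a|s) - alpha * target_entropy]. A minimal sketch of that formula in code (my own illustration, with the log-probs treated as constants w.r.t. alpha; not the repository's exact implementation):

    import tensorflow as tf

    def alpha_loss(alpha, log_pis, target_entropy):
        # J(alpha) = E[-alpha * (log_pi + target_entropy)], with log_pis held fixed
        return -tf.reduce_mean(alpha * tf.stop_gradient(log_pis + target_entropy))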

    opened by Maggern3 6
  • Multiple conflicts in requirements.txt

    When trying to create the conda environment as described in the README.md, pip finds multiple conflicts between the selected packages (these problems also occur with the docker image). I haven't been able to successfully test my installation, so I am not sure if this list is complete, but at least the following packages (some of which are explicitly "required" at that version) conflict:

    ERROR: requests 2.20.1 has requirement urllib3<1.25,>=1.21.1, but you'll have urllib3 1.25.1 which is incompatible.
    ERROR: botocore 1.12.130 has requirement urllib3<1.25,>=1.20; python_version >= "3.4", but you'll have urllib3 1.25.1 which is incompatible.
    ERROR: awscli 1.16.140 has requirement PyYAML<=3.13,>=3.10, but you'll have pyyaml 5.1 which is incompatible.
    ERROR: gym 0.15.4 has requirement cloudpickle~=1.2.0, but you'll have cloudpickle 1.1.1 which is incompatible.
    ERROR: robosuite 0.1.0 has requirement mujoco-py<1.50.2,>=1.50.1, but you'll have mujoco-py 2.0.2.8 which is incompatible.
    ERROR: dm-tree 0.1.2 has requirement six>=1.12.0, but you'll have six 1.11.0 which is incompatible.
    ERROR: Cannot uninstall 'ruamel-yaml'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
    
    opened by johannespitz 6
  • Bound std of Gaussian policy via beta-sigmoid ?

    If I understand correctly, the logstd of the Gaussian policy is clipped to a min/max range and the std is retrieved by exponentiation.

    I am curious whether using a beta-sigmoidal function to model the logstd would be a tiny bit more stable, because it allows a smooth lower/upper bound and a less sharp gradient for larger magnitudes.

    e.g.

    logvar = ...  # raw network output
    var = 1/(1 + self.beta*torch.exp(-logvar))      # beta-sigmoid squashing to (0, 1)
    var = min_var + (max_var - min_var)*var         # rescale to [min_var, max_var]
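
    For comparison, the clip-and-exponentiate scheme described above looks roughly like the following (my own sketch with illustrative bound values and a stand-in network output; not the repository's exact code):

    import torch

    LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0   # illustrative bounds
    log_std_raw = torch.randn(32, 6)        # stand-in for the raw network output
    log_std = torch.clamp(log_std_raw, LOG_STD_MIN, LOG_STD_MAX)
    std = torch.exp(log_std)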
    
    opened by zuoxingdong 6
  • Local dir cli

    This pull request adds a --local-dir argument to the command line interface and resolves this issue: https://github.com/rail-berkeley/softlearning/issues/128. I also fixed an issue with pip install -e . that re-installs an older incompatible version of ray.

    opened by brandontrabucco 5
  • Policy weights and output becomes NaN after some iterations

    Issue Overview

    After some training iterations, the policy starts outputting NaN and all the policy weights become NaN.

    Package versions

    • Latest commit of softlearning
    • tensorflow version 2.2.0rc2
    • tfp-nightly version 0.11.0.dev20200424

    Preliminary debugging

    I think the issue might be caused by one of these three things: the Tanh bijector, a large learning rate (unlikely, since I am using the default 3e-4), or the alpha training (most likely alpha).

    Tanh

    The policy sometimes outputs actions that are exactly 1 or -1 (I checked and it never outputs values greater than 1 in magnitude), which may cause the inverse to become +inf or -inf. This may or may not be a problem, because I don't know if the inverse is ever used (edit: inverse is called when calculating log_prob https://github.com/tensorflow/probability/blob/dd3a555ef37fc31c6ad04f3236942e3dbc0f4228/tensorflow_probability/python/distributions/transformed_distribution.py#L509). The problem could also be https://github.com/tensorflow/probability/issues/840, but that is apparently fixed by computing the action and log prob together.
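
    To illustrate the boundary issue (a standalone sketch, not the repository's code): the inverse of tanh is only finite strictly inside (-1, 1), so a log-prob computed via the inverse blows up for boundary actions.

    import numpy as np

    print(np.arctanh(0.999999))  # large but finite
    print(np.arctanh(1.0))       # inf (with a runtime warning) -> non-finite log_prob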

    Alpha

    This is most likely the issue. From my logging diagnostics inside of _do_training_repeats https://github.com/rail-berkeley/softlearning/blob/84d7589fd5852aff9aa46debda6de39acaec2e0b/softlearning/algorithms/rl_algorithm.py#L336 a few training steps before the policy failed look like this:

    diagnostics: OrderedDict([('Q_value-mean', 3.2876506), ('Q_loss-mean', 0.04909911), ('policy_loss-mean', -3.1032994), ('alpha', nan), ('alpha_loss-mean', -inf)])
    diagnostics: OrderedDict([('Q_value-mean', 3.2876506), ('Q_loss-mean', 0.04909911), ('policy_loss-mean', -3.1032994), ('alpha', nan), ('alpha_loss-mean', -inf)])
    diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])
    diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])
    diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])
    

    We can see that alpha was the first to fail, which then propagated to the Q functions and policy. I also noticed that during training, sometimes alpha would become negative, and from my understanding of automatic entropy adjustment, alpha should always be non-negative.

    After digging through the SAC training step, I noticed this line https://github.com/rail-berkeley/softlearning/blob/84d7589fd5852aff9aa46debda6de39acaec2e0b/softlearning/algorithms/sac.py#L247-L252 which is different from the old tf1 implementation that uses log_alpha instead https://github.com/rail-berkeley/softlearning/blob/bd30e33f22a7418b3e6d659908938b8bb500e6f1/softlearning/algorithms/sac.py#L210-L211

    The SAC paper uses alpha, rather than log_alpha, as the multiplier in the loss function, so the old implementation might be an oversight? However, the old code did store log_alpha as the training variable.
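
    For what it's worth, one way to guarantee a positive temperature (which the old log_alpha training variable hints at) is to optimize log_alpha and exponentiate it; a minimal sketch of that idea, not necessarily what either implementation does:

    import tensorflow as tf

    log_alpha = tf.Variable(0.0)   # unconstrained variable that gets optimized
    alpha = tf.exp(log_alpha)      # alpha > 0 by construction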

    Or the issue might be something else, for example what caused the alpha loss to be -inf in the first place? Perhaps log_pis became -inf, which means actions_and_log_probs was the problem? I don't know enough about the implementation to decide for sure.

    Let me know if you want more logs, the program's output, or anything else. This was run on my own environment in a fork of this repo:

    https://github.com/externalhardrive/mobilemanipulation-tf2/blob/efe8161c2692d4747ad0623cfd8218cdbf4211d2/softlearning/environments/gym/locobot/nav_grasp_envs.py#L113

    opened by charlesjsun 5
  • tf2 support

    This is still work in progress. Things to do include at least:

    • Fix ExperimentRunner checkpointing. It would be nice to completely refactor this in order to make it easier to extend and understand.
    • Verify that SAC and SQL performance hasn't degraded, at least in the default gym benchmarks.
    • Update other pip requirements as needed.

    Refactors SAC code to run on tensorflow>=2.0. The change is fairly large, as all the tf1 placeholders, sessions, graphs, etc. are now gone, and everything uses tf.keras.Models and tf.functions instead.

    The new tf.functions allow functions to be executed in the same manner as with the old tf1 session.run, i.e. larger parts of the code can be optimized to run as a graph, so there's no slowdown from using eager mode. In fact, based on preliminary runs, the new tf2 implementation seems to be ~30% faster than our old tf1 implementation.

    The tf.functions can be disabled using the debug mode (softlearning run_example_debug ...) which sets tf.config.experimental_run_functions_eagerly(True) https://github.com/hartikainen/softlearning/blob/89f0f90d127feec430800bb6dd7c792527116dde/examples/instrument.py#L261. For more info about tf.config.experimental_run_functions_eagerly, see https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/config/experimental_run_functions_eagerly.
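
    For example, outside of the CLI the same toggle can be flipped directly with the TF 2.0-era API mentioned above (a minimal sketch for debugging):

    import tensorflow as tf

    # Run every tf.function body eagerly (easier to step through, but slower).
    tf.config.experimental_run_functions_eagerly(True)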

    This also cleans some of the model code that used to be a bit clumsy due to some edge cases not working in tf1. For example, all the models should now accept any nested tuples/arrays/dictionaries as inputs, making usage of e.g. gym dictionary observation spaces and goal-conditioned policies much cleaner. Here's an example of how our policy and Q-function calls have changed:

    Before with tf1:

    observations = {
        'joint_position': np.random.uniform(size=(batch_size, 7)),
        'joint_velocity': np.random.uniform(size=(batch_size, 7)),
    }
    flat_observations = flatten_input_structure(observations)
    actions = policy.actions(flat_observations)
    log_pis = policy.log_pis(flat_observations, actions)
    
    Q_inputs = flatten_input_structure({**observations, 'actions': actions})
    Q_values = Q(Q_inputs)
    

    Now with tf2:

    actions = policy.actions(observations)
    log_pis = policy.log_pis(observations, actions)
    Q_values = Q.values(observations, actions)
    

    As mentioned, the inputs to these models can be arbitrarily nested, as long as the structure of the input remains the same across calls.

    opened by hartikainen 5
  • trials did not complete error

    The Complete Error Message is:

    ==========================================================

    (softlearning) surabhi@surabhi-Vostro-3559:~/Downloads/github/softlearning$ softlearning run_example_local examples.development --universe=gym --domain=HalfCheetah --task=v3 --exp-name=my-sac-experiment-1 --checkpoint-frequency=1000

    /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! RequestsDependencyWarning)

    WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

    • https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

    • https://github.com/tensorflow/addons

    If you depend on functionality not listed there, please file an issue.

    WARNING: Logging before flag parsing goes to stderr. I0615 04:36:45.514044 140073955813120 init.py:34] MuJoCo library version is: 200 2019-06-15 04:36:45,621 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs. 2019-06-15 04:36:45,731 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:51785 to respond... 2019-06-15 04:36:45,856 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:43790 to respond... 2019-06-15 04:36:45,860 INFO services.py:806 -- Starting Redis shard with 0.81 GB max memory. 2019-06-15 04:36:45,898 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs. 2019-06-15 04:36:45,899 INFO services.py:1442 -- Starting the Plasma object store with 1.21 GB memory using /dev/shm. 2019-06-15 04:36:46,022 INFO tune.py:65 -- Did not find checkpoint file in /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1. 2019-06-15 04:36:46,022 INFO tune.py:232 -- Starting a new experiment. 2019-06-15 04:36:46,027 INFO web_server.py:241 -- Starting Tune Server...

    == Status ==

    Using FIFO scheduling algorithm. Resources requested: 0/4 CPUs, 0/0 GPUs Memory usage on this node: 2.6/4.0 GB

    2019-06-15 04:36:46,779 WARNING util.py:64 -- The start_trial operation took 0.7297773361206055 seconds to complete, which may be a performance bottleneck.

    == Status ==

    Using FIFO scheduling algorithm. Resources requested: 4/4 CPUs, 0/0 GPUs Memory usage on this node: 2.7/4.0 GB Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1 Number of trials: 1 ({'RUNNING': 1}) RUNNING trials:

    • id=14eb5e74-seed=221: RUNNING

    (pid=9082) /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! (pid=9082) RequestsDependencyWarning) (pid=9082) (pid=9082) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. (pid=9082) For more information, please see: (pid=9082) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

    (pid=9082) * https://github.com/tensorflow/addons

    (pid=9082) If you depend on functionality not listed there, please file an issue. (pid=9082) (pid=9082) Using seed 221 (pid=9082) 2019-06-15 04:36:50.109066: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA (pid=9082) 2019-06-15 04:36:50.114858: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz (pid=9082) 2019-06-15 04:36:50.115111: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562f555f1110 executing computations on platform Host. Devices: (pid=9082) 2019-06-15 04:36:50.115134: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , (pid=9082) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9082) Instructions for updating: (pid=9082) Colocations handled automatically by placer. (pid=9082) WARNING: Logging before flag parsing goes to stderr. (pid=9082) W0615 04:36:50.157908 140696037508864 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9082) Instructions for updating: (pid=9082) Colocations handled automatically by placer. 2019-06-15 04:36:50,263 ERROR trial_runner.py:487 -- Error processing event. Traceback (most recent call last): File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial result = self.trial_executor.fetch_result(trial) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result result = ray.get(trial_future[0]) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get raise value ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9082, host=surabhi-Vostro-3559) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train result = self._train() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train self._build() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build variant, training_environment) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant return get_policy_from_params(policy_params, *args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy policy = FeedforwardGaussianPolicy(*args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in init self._Serializable__initialize(locals()) AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

    2019-06-15 04:36:50,264 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads. 2019-06-15 04:36:50,266 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint. (pid=9083) /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! (pid=9083) RequestsDependencyWarning) (pid=9083) (pid=9083) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. (pid=9083) For more information, please see:

    (pid=9083) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

    (pid=9083) * https://github.com/tensorflow/addons

    (pid=9083) If you depend on functionality not listed there, please file an issue. (pid=9083) (pid=9083) 2019-06-15 04:36:53.790710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA (pid=9083) 2019-06-15 04:36:53.794906: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz (pid=9083) 2019-06-15 04:36:53.795049: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557721f5e310 executing computations on platform Host. Devices: (pid=9083) 2019-06-15 04:36:53.795068: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , (pid=9083) Using seed 221 (pid=9083) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9083) Instructions for updating: (pid=9083) Colocations handled automatically by placer. (pid=9083) WARNING: Logging before flag parsing goes to stderr. (pid=9083) W0615 04:36:53.833971 139846760371968 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9083) Instructions for updating: (pid=9083) Colocations handled automatically by placer. 2019-06-15 04:36:53,940 ERROR trial_runner.py:487 -- Error processing event. Traceback (most recent call last): File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial result = self.trial_executor.fetch_result(trial) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result result = ray.get(trial_future[0]) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get raise value ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9083, host=surabhi-Vostro-3559) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train result = self._train() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train self._build() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build variant, training_environment) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant return get_policy_from_params(policy_params, *args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy policy = FeedforwardGaussianPolicy(*args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in init self._Serializable__initialize(locals()) AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize' 2019-06-15 04:36:53,941 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads. 2019-06-15 04:36:53,943 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.

    == Status ==

    Using FIFO scheduling algorithm. Resources requested: 4/4 CPUs, 0/0 GPUs Memory usage on this node: 2.9/4.0 GB Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1 Number of trials: 1 ({'RUNNING': 1}) RUNNING trials:

    • id=14eb5e74-seed=221: RUNNING, 2 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-36-53.txt

    (pid=9081) /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! (pid=9081) RequestsDependencyWarning) (pid=9081) (pid=9081) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. (pid=9081) For more information, please see:

    (pid=9081) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

    (pid=9081) * https://github.com/tensorflow/addons

    (pid=9081) If you depend on functionality not listed there, please file an issue. (pid=9081) (pid=9081) Using seed 221 (pid=9081) 2019-06-15 04:36:57.425650: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA (pid=9081) 2019-06-15 04:36:57.429647: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz (pid=9081) 2019-06-15 04:36:57.429862: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b3ac7c78a0 executing computations on platform Host. Devices: (pid=9081) 2019-06-15 04:36:57.429886: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , (pid=9081) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9081) Instructions for updating: (pid=9081) Colocations handled automatically by placer. (pid=9081) WARNING: Logging before flag parsing goes to stderr. (pid=9081) W0615 04:36:57.472656 140634258609920 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9081) Instructions for updating: (pid=9081) Colocations handled automatically by placer. 2019-06-15 04:36:57,574 ERROR trial_runner.py:487 -- Error processing event. Traceback (most recent call last): File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial result = self.trial_executor.fetch_result(trial) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result result = ray.get(trial_future[0]) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get raise value ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9081, host=surabhi-Vostro-3559) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train result = self._train() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train self._build() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build variant, training_environment) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant return get_policy_from_params(policy_params, *args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy policy = FeedforwardGaussianPolicy(*args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in init self._Serializable__initialize(locals()) AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

    2019-06-15 04:36:57,575 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads. 2019-06-15 04:36:57,576 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint. (pid=9084) /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! (pid=9084) RequestsDependencyWarning) (pid=9084) (pid=9084) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. (pid=9084) For more information, please see:

    (pid=9084) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

    (pid=9084) * https://github.com/tensorflow/addons

    (pid=9084) If you depend on functionality not listed there, please file an issue. (pid=9084) (pid=9084) Using seed 221 (pid=9084) 2019-06-15 04:37:00.981560: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA (pid=9084) 2019-06-15 04:37:00.987048: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz (pid=9084) 2019-06-15 04:37:00.987274: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x561e70f53660 executing computations on platform Host. Devices: (pid=9084) 2019-06-15 04:37:00.987293: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , (pid=9084) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9084) Instructions for updating: (pid=9084) Colocations handled automatically by placer. (pid=9084) WARNING: Logging before flag parsing goes to stderr. (pid=9084) W0615 04:37:01.021606 140204585965312 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9084) Instructions for updating: (pid=9084) Colocations handled automatically by placer. 2019-06-15 04:37:01,131 ERROR trial_runner.py:487 -- Error processing event. Traceback (most recent call last): File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial result = self.trial_executor.fetch_result(trial) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result result = ray.get(trial_future[0]) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get raise value ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9084, host=surabhi-Vostro-3559) File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train result = self._train() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train self._build() File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build variant, training_environment) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant return get_policy_from_params(policy_params, *args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy policy = FeedforwardGaussianPolicy(*args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in init self._Serializable__initialize(locals()) AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

    2019-06-15 04:37:01,132 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.

    == Status ==

    Using FIFO scheduling algorithm. Resources requested: 0/4 CPUs, 0/0 GPUs Memory usage on this node: 2.9/4.0 GB Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1 Number of trials: 1 ({'ERROR': 1}) ERROR trials:

    • id=14eb5e74-seed=221: ERROR, 4 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-37-01.txt

    == Status ==

    Using FIFO scheduling algorithm. Resources requested: 0/4 CPUs, 0/0 GPUs Memory usage on this node: 2.9/4.0 GB Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1 Number of trials: 1 ({'ERROR': 1}) ERROR trials:

    • id=14eb5e74-seed=221: ERROR, 4 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-37-01.txt

    Traceback (most recent call last): File "/home/surabhi/anaconda3/envs/softlearning/bin/softlearning", line 11, in load_entry_point('softlearning', 'console_scripts', 'softlearning')() File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 202, in main return cli() File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 764, in call return self.main(*args, **kwargs) File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 717, in main rv = self.invoke(ctx) File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke return ctx.invoke(self.callback, **ctx.params) File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke return callback(*args, **kwargs) File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd return run_example_local(example_module_name, example_argv) File "/home/surabhi/Downloads/github/softlearning/examples/instrument.py", line 224, in run_example_local reuse_actors=True)

    File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/tune.py", line 272, in run raise

    TuneError("Trials did not complete", errored_trials)

    ray.tune.error.TuneError: ('Trials did not complete', [id=14eb5e74-seed=221])

    trial not cmplt

    opened by surbhi1944 5
  • Bump wheel from 0.36.2 to 0.38.1

    Bumps wheel from 0.36.2 to 0.38.1.

    Changelog

    Sourced from wheel's changelog.

    Release Notes

    UNRELEASED

    • Updated vendored packaging to 22.0

    0.38.4 (2022-11-09)

    • Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

    0.38.3 (2022-11-08)

    • Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

    0.38.2 (2022-11-05)

    • Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags

    0.38.1 (2022-11-04)

    • Removed install dependency on setuptools
    • The future-proof fix in 0.36.0 for converting PyPy's SOABI into a abi tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

    0.38.0 (2022-10-21)

    • Dropped support for Python < 3.7
    • Updated vendored packaging to 21.3
    • Replaced all uses of distutils with setuptools
    • The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.
    • Fixed potential DoS attack via the WHEEL_INFO_RE regular expression
    • Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.

    0.37.1 (2021-12-22)

    • Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)
    • Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

    0.37.0 (2021-08-09)

    • Added official Python 3.10 support
    • Updated vendored packaging library to v20.9

    ... (truncated)

    Commits
    • 6f1608d Created a new release
    • cf8f5ef Moved news item from PR #484 to its proper place
    • 9ec2016 Removed install dependency on setuptools (#483)
    • 747e1f6 Fixed PyPy SOABI parsing (#484)
    • 7627548 [pre-commit.ci] pre-commit autoupdate (#480)
    • 7b9e8e1 Test on Python 3.11 final
    • a04dfef Updated the pypi-publish action
    • 94bb62c Fixed docs not building due to code style changes
    • d635664 Updated the codecov action to the latest version
    • fcb94cd Updated version to match the release
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump certifi from 2020.12.5 to 2022.12.7

    Bumps certifi from 2020.12.5 to 2022.12.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



    dependencies 
    opened by dependabot[bot] 0
  • why target entropy is -dim(A)?

    I read Soft Actor-Critic Algorithms and Applications and have a question. In the paper's appendix, the target entropy (is this the desired minimum expected entropy?) is set to -dim(A). Why is the target entropy -dim(A)?

    opened by mittu1008 0
  • Bump pillow from 7.2.0 to 9.3.0

    Bumps pillow from 7.2.0 to 9.3.0.

    Release notes

    Sourced from pillow's releases.

    9.3.0

    https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

    Changes

    ... (truncated)

    Changelog

    Sourced from pillow's changelog.

    9.3.0 (2022-10-29)

    • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

    • Initialize libtiff buffer when saving #6699 [radarhere]

    • Inline fname2char to fix memory leak #6329 [nulano]

    • Fix memory leaks related to text features #6330 [nulano]

    • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

    • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

    • Fixed set_variation_by_name offset #6445 [radarhere]

    • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

    • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

    • Added ExifTags enums #6630 [radarhere]

    • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

    • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

    • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

    • Added GPS TIFF tag info #6661 [radarhere]

    • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

    • Do not attempt normalization if mode is already normal #6644 [radarhere]

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



    dependencies 
    opened by dependabot[bot] 0
  • Bump tensorflow from 2.4.1 to 2.9.3

    Bumps tensorflow from 2.4.1 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view
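    Since this bump jumps several minor versions (2.4.x to 2.9.x), it is worth confirming that the environment actually resolves to a patched release after reinstalling the requirements. A minimal sanity check, assuming the bumped TensorFlow has been installed into the softlearning environment:

        # Minimal sanity check, assuming tensorflow==2.9.3 (the bumped version) is
        # installed; verifies that the old 2.4.1 pin is no longer in use.
        import tensorflow as tf

        print(tf.__version__)
        assert tf.__version__.startswith("2.9."), (
            f"Expected a patched 2.9.x release, got {tf.__version__}"
        )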


    dependencies 
    opened by dependabot[bot] 0
  • Bump joblib from 1.0.0 to 1.2.0

    Bumps joblib from 1.0.0 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327 (see the sketch after this excerpt)

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliably spawning the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)
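    The eval(pre_dispatch) fix above only changes how the pre_dispatch argument is parsed; documented usage is unchanged. A minimal sketch, assuming joblib >= 1.2.0, with pre_dispatch given as a simple numeric expression in n_jobs (the form that remains supported after the fix):

        # Minimal sketch, assuming joblib >= 1.2.0 (the bumped version).
        # pre_dispatch is now restricted to simple numeric expressions such as
        # "2 * n_jobs" instead of being passed through an unrestricted eval().
        from joblib import Parallel, delayed

        results = Parallel(n_jobs=2, pre_dispatch="2 * n_jobs")(
            delayed(pow)(i, 2) for i in range(8)
        )
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]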

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
Owner
Robotic AI & Learning Lab Berkeley
Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Preference-Planning-Deep-IRL Introduction Check my portfolio post Dependencies Gym stable-baselines3 PyTorch Usage Take Demonstration python3 record.

Tianyu Li 9 Oct 26, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).


Ilya Kostrikov 3k Dec 31, 2022
PyTorch implementation of Advantage async actor-critic Algorithms (A3C) in PyTorch

Advantage async actor-critic Algorithms (A3C) in PyTorch @inproceedings{mnih2016asynchronous, title={Asynchronous methods for deep reinforcement lea

LEI TAI 111 Dec 8, 2022
Advantage Actor Critic (A2C): jax + flax implementation

Advantage Actor Critic (A2C): jax + flax implementation Current version supports only environments with continuous action spaces and was tested on muj

Andrey 3 Jan 23, 2022
Using deep actor-critic model to learn best strategies in pair trading

Deep-Reinforcement-Learning-in-Stock-Trading Using deep actor-critic model to learn best strategies in pair trading Abstract Partially observed Markov

null 281 Dec 9, 2022
Asynchronous Advantage Actor-Critic in PyTorch

Asynchronous Advantage Actor-Critic in PyTorch This is PyTorch implementation of A3C as described in Asynchronous Methods for Deep Reinforcement Learn

Reiji Hatsugai 38 Dec 12, 2022
Implementation of fast algorithms for Maximum Spanning Tree (MST) parsing that includes fast ArcMax+Reweighting+Tarjan algorithm for single-root dependency parsing.

Fast MST Algorithm Implementation of fast algorithms for (Maximum Spanning Tree) MST parsing that includes fast ArcMax+Reweighting+Tarjan algorithm fo

Miloš Stanojević 11 Oct 14, 2022
Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [Project website] [Paper] This project is a PyTorch i

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 6 Feb 28, 2022
PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

Maximum Entropy Generators for Energy-Based Models All experiments have tensorboard visualizations for samples / density / train curves etc. To run th

Rithesh Kumar 135 Oct 27, 2022
PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

Code for On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models This repository will reproduce the main results from our pape

Mitch Hill 32 Nov 25, 2022
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

郭飞 3.7k Jan 3, 2023
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 6, 2022
MLOps will help you to understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project.

Sample (python; azure, azure-machine-learning-service, azure-devops): Code which demonstrates how to set up and ope

null 1 Nov 1, 2021
Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Legged Robots that Keep on Learning Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World, whic

Laura Smith 70 Dec 7, 2022
Custom TensorFlow2 implementations of forward and backward computation of soft-DTW algorithm in batch mode.

Batch Soft-DTW(Dynamic Time Warping) in TensorFlow2 including forward and backward computation Custom TensorFlow2 implementations of forward and backw

null 19 Aug 30, 2022
This is the code for our KILT leaderboard submission to the T-REx and zsRE tasks. It includes code for training a DPR model then continuing training with RAG.

KGI (Knowledge Graph Induction) for slot filling This is the code for our KILT leaderboard submission to the T-REx and zsRE tasks. It includes code fo

International Business Machines 72 Jan 6, 2023
Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"

DSP Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation". Accepted by ACM Multimedia 2021. Authors

null 20 Oct 24, 2022