Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. It includes the official implementation of the Soft Actor-Critic algorithm.

Robotic AI & Learning Lab Berkeley

Last update: Dec 30, 2022

Overview

Softlearning

Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is fairly thin and primarily optimized for our own development purposes. It uses tf.keras modules for most of the model classes (e.g. policies and value functions). We use Ray for experiment orchestration. Ray Tune and the Ray Autoscaler let us use the same experiment scripts for local prototyping and for launching large-scale experiments on any chosen cloud service (e.g. GCP or AWS), and they intelligently parallelize and distribute training for effective resource allocation.

This implementation uses TensorFlow. For a PyTorch implementation of soft actor-critic, take a look at rlkit.

Getting Started

Prerequisites

The environment can be run either locally using conda or inside a Docker container. For the conda installation, you need to have Conda installed. For the Docker installation, you will need to have Docker and Docker Compose installed. Also, most of our environments currently require a MuJoCo license.

Conda Installation

Download and install MuJoCo 1.50 and 2.00 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default locations (~/.mujoco/mjpro150 and ~/.mujoco/mujoco200_{platform}). Unfortunately, gym and dm_control expect different paths for the MuJoCo 2.00 installation, so you will need it available at both ~/.mujoco/mujoco200_{platform} and ~/.mujoco/mujoco200. The easiest way is to create a symlink ~/.mujoco/mujoco200 that points to ~/.mujoco/mujoco200_{platform}: ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200.
Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.
Clone softlearning

git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}

Create and activate the conda environment, then install softlearning to enable the command line interface.

cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}
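Before running anything, it can help to check that the MuJoCo layout described above is in place. The following is a minimal sketch, not part of softlearning: `ensure_mujoco_layout` and `mujoco_platform_suffix` are hypothetical helpers, and the `linux`/`macos` values for the {platform} suffix are an assumption based on the MuJoCo 2.00 archive names. It creates the mujoco200 symlink (the Python equivalent of the ln -s command above) and returns any expected paths that are still missing.

```python
import sys
from pathlib import Path


def mujoco_platform_suffix(platform=sys.platform):
    """Map sys.platform to the {platform} suffix of mujoco200_{platform}.

    Assumption: the MuJoCo 2.00 archives unpack to mujoco200_linux / mujoco200_macos.
    """
    if platform.startswith("linux"):
        return "linux"
    if platform == "darwin":
        return "macos"
    raise ValueError(f"unsupported platform: {platform}")


def ensure_mujoco_layout(root="~/.mujoco", suffix=None, create_symlink=True):
    """Create the mujoco200 symlink if possible and return still-missing paths."""
    root = Path(root).expanduser()
    suffix = suffix or mujoco_platform_suffix()
    versioned = root / f"mujoco200_{suffix}"
    alias = root / "mujoco200"  # the unsuffixed path that one of the libraries expects
    if create_symlink and versioned.is_dir() and not alias.exists():
        # Equivalent to: ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200
        alias.symlink_to(versioned, target_is_directory=True)
    expected = [root / "mjpro150", versioned, alias, root / "mjkey.txt"]
    return [path for path in expected if not path.exists()]
```

Calling `ensure_mujoco_layout()` with no arguments checks ~/.mujoco for the current platform; an empty list means every expected path exists.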

The environment should be ready to run. See the Examples section for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation

docker-compose

To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate

You can access the container with the usual docker exec command, i.e.:

docker exec -it softlearning bash

See the Examples section for how to train and simulate the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes

Examples

Training and simulating an agent

To train the agent:

softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
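To launch several variants of the training command above from a script (for example, when sweeping settings), it can be convenient to build the argument list in Python. This is a minimal sketch: `build_train_command` is a hypothetical helper, not part of softlearning, and it only maps keyword names to CLI flags; pass only flags that appear in the --help output in this README.

```python
import shlex


def build_train_command(algorithm="SAC", universe="gym", domain="HalfCheetah",
                        task="v3", exp_name="my-sac-experiment-1",
                        checkpoint_frequency=1000, **extra_flags):
    """Return the argv list for `softlearning run_example_local examples.development`.

    Keyword names map to CLI flags by replacing underscores with dashes,
    e.g. num_samples=5 -> ["--num-samples", "5"].
    """
    flags = {
        "algorithm": algorithm,
        "universe": universe,
        "domain": domain,
        "task": task,
        "exp_name": exp_name,
        "checkpoint_frequency": checkpoint_frequency,
        **extra_flags,
    }
    argv = ["softlearning", "run_example_local", "examples.development"]
    for name, value in flags.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


if __name__ == "__main__":
    # Print the shell equivalent (shlex.join needs Python 3.8+).
    print(shlex.join(build_train_command(num_samples=5)))
```

Passing the list to `subprocess.run(cmd, check=True)` launches the run without any shell quoting; building argv as a list rather than a string is the usual way to avoid quoting bugs.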

To simulate the resulting policy, first find the absolute path that the checkpoint is saved to. By default (i.e. without specifying the --local-dir argument to the previous command), the data is saved under ~/ray_results/<universe>/<domain>/<task>/<datetimestamp>-<exp-name>/<trial-id>/<checkpoint-id>. For example: ~/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-sac-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/. The next command assumes that this path is stored in the ${SAC_CHECKPOINT_DIR} environment variable.
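Finding the checkpoint by hand gets tedious once an experiment has several runs. The sketch below assumes the default directory layout just described. `find_latest_checkpoint` is a hypothetical helper, not a softlearning function. It relies on the experiment directories starting with a sortable timestamp. Note that the glob `*-my-sac-experiment-1*` would also match an experiment named `my-sac-experiment-10`.

```python
from pathlib import Path


def checkpoint_number(path):
    """Parse N from a 'checkpoint_N' directory name (-1 if it is not numeric)."""
    suffix = path.name.rsplit("_", 1)[-1]
    return int(suffix) if suffix.isdigit() else -1


def find_latest_checkpoint(universe, domain, task, exp_name,
                           results_dir="~/ray_results"):
    """Return the newest checkpoint directory for an experiment, or None.

    Assumed layout:
    <results_dir>/<universe>/<domain>/<task>/<timestamp>-<exp-name>.../<trial-id>/checkpoint_<N>
    Experiment directory names start with a timestamp, so sorting them sorts
    runs chronologically; within the newest run, the highest N wins.
    """
    task_dir = Path(results_dir).expanduser() / universe / domain / task
    candidates = [
        p for p in task_dir.glob(f"*-{exp_name}*/*/checkpoint_*") if p.is_dir()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.parent.parent.name, checkpoint_number(p)))
```

When it returns a path, exporting `str(path)` as SAC_CHECKPOINT_DIR makes it available to the command below.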

python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'
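The --render-kwargs value above is a JSON object wrapped in single quotes, which is easy to mis-quote when scripting. This is a sketch that builds the same command as a Python list, assuming (as the quoted example suggests) that the script parses this flag as JSON. `build_simulate_command` is a hypothetical helper, not part of softlearning.

```python
import json
import os
import shlex


def build_simulate_command(checkpoint_dir, max_path_length=1000,
                           num_rollouts=1, render_kwargs=None):
    """Return the argv list for the simulate_policy command shown above."""
    if render_kwargs is None:
        render_kwargs = {"mode": "human"}
    return [
        "python", "-m", "examples.development.simulate_policy",
        str(checkpoint_dir),
        "--max-path-length", str(max_path_length),
        "--num-rollouts", str(num_rollouts),
        # json.dumps yields '{"mode": "human"}'; as a list element it needs no shell quoting.
        "--render-kwargs", json.dumps(render_kwargs),
    ]


if __name__ == "__main__":
    checkpoint_dir = os.environ.get("SAC_CHECKPOINT_DIR", "<checkpoint-dir>")
    print(shlex.join(build_simulate_command(checkpoint_dir)))
```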

examples.development.main contains several different environments, and more example scripts are available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag: python ./examples/development/main.py --help

optional arguments:
  -h, --help            show this help message and exit
  --universe {robosuite,dm_control,gym}
  --domain DOMAIN
  --task TASK
  --checkpoint-replay-pool CHECKPOINT_REPLAY_POOL
                        Whether a checkpoint should also save the replay
                        pool. If set, takes precedence over
                        variant['run_params']['checkpoint_replay_pool']. Note
                        that the replay pool is saved (and constructed) piece
                        by piece so that each experience is saved only once.
  --algorithm ALGORITHM
  --policy {gaussian}
  --exp-name EXP_NAME
  --mode MODE
  --run-eagerly RUN_EAGERLY
                        Whether to run tensorflow in eager mode.
  --local-dir LOCAL_DIR
                        Destination local folder to save training results.
  --confirm-remote [CONFIRM_REMOTE]
                        Whether or not to query yes/no on remote run.
  --video-save-frequency VIDEO_SAVE_FREQUENCY
                        Save frequency for videos.
  --cpus CPUS           Cpus to allocate to ray process. Passed to `ray.init`.
  --gpus GPUS           Gpus to allocate to ray process. Passed to `ray.init`.
  --resources RESOURCES
                        Resources to allocate to ray process. Passed to
                        `ray.init`.
  --include-webui INCLUDE_WEBUI
                        Boolean flag indicating whether to start the web UI,
                        which is a Jupyter notebook. Passed to `ray.init`.
  --temp-dir TEMP_DIR   If provided, it will specify the root temporary
                        directory for the Ray process. Passed to `ray.init`.
  --resources-per-trial RESOURCES_PER_TRIAL
                        Resources to allocate for each trial. Passed to
                        `tune.run`.
  --trial-cpus TRIAL_CPUS
                        CPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for CPUs. Passed to
                        `tune.run`.
  --trial-gpus TRIAL_GPUS
                        GPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for GPUs. Passed to
                        `tune.run`.
  --trial-extra-cpus TRIAL_EXTRA_CPUS
                        Extra CPUs to reserve in case the trials need to
                        launch additional Ray actors that use CPUs.
  --trial-extra-gpus TRIAL_EXTRA_GPUS
                        Extra GPUs to reserve in case the trials need to
                        launch additional Ray actors that use GPUs.
  --num-samples NUM_SAMPLES
                        Number of times to repeat each trial. Passed to
                        `tune.run`.
  --upload-dir UPLOAD_DIR
                        Optional URI to sync training results to (e.g.
                        s3://<bucket> or gs://<bucket>). Passed to `tune.run`.
  --trial-name-template TRIAL_NAME_TEMPLATE
                        Optional string template for trial name. For example:
                        '{trial.trial_id}-seed={trial.config[run_params][seed]
                        }' Passed to `tune.run`.
  --checkpoint-frequency CHECKPOINT_FREQUENCY
                        How many training iterations between checkpoints. A
                        value of 0 (default) disables checkpointing. If set,
                        takes precedence over
                        variant['run_params']['checkpoint_frequency']. Passed
                        to `tune.run`.
  --checkpoint-at-end CHECKPOINT_AT_END
                        Whether to checkpoint at the end of the experiment. If
                        set, takes precedence over
                        variant['run_params']['checkpoint_at_end']. Passed to
                        `tune.run`.
  --max-failures MAX_FAILURES
                        Try to recover a trial from its last checkpoint at
                        least this many times. Only applies if checkpointing
                        is enabled. Passed to `tune.run`.
  --restore RESTORE     Path to checkpoint. Only makes sense to set if running
                        1 trial. Defaults to None. Passed to `tune.run`.
  --server-port SERVER_PORT
                        Port number for launching TuneServer. Passed to
                        `tune.run`.
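The --trial-name-template example reads like a Python str.format template evaluated against a trial object. The help text only says that the value is passed to tune.run. Assuming it is rendered that way, this sketch shows what the example template expands to. The SimpleNamespace stand-in is hypothetical, and its values are illustrative. It provides only the two attributes the template reads.

```python
from types import SimpleNamespace

template = "{trial.trial_id}-seed={trial.config[run_params][seed]}"

# Hypothetical stand-in for a Ray Tune trial object.
trial = SimpleNamespace(trial_id="14eb5e74", config={"run_params": {"seed": 221}})

# str.format resolves attribute access (.config) and non-numeric item
# lookups ([run_params], [seed]) as string dictionary keys.
print(template.format(trial=trial))  # 14eb5e74-seed=221
```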

Resume training from a saved checkpoint

This feature is currently broken!

In order to resume training from a previous checkpoint, run the original example command with an additional --restore flag. For example, the previous example can be resumed as follows:

softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_DIR}

References

The algorithms are based on the following papers:

Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja*, Aurick Zhou*, Kristian Hartikainen*, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. arXiv preprint, 2018.

Latent Space Policies for Hierarchical Reinforcement Learning.
Tuomas Haarnoja*, Kristian Hartikainen*, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.

Composable Deep Reinforcement Learning for Robotic Manipulation.
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine. International Conference on Robotics and Automation (ICRA), 2018.

Reinforcement Learning with Deep Energy-Based Policies.
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2017.

If Softlearning helps you in your academic research, you are encouraged to cite our paper. Here is an example BibTeX entry:

@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}

Comments

Add support for SQL

@hartikainen This is my progress so far. However, I am encountering an issue with policies:

Traceback (most recent call last):
  File "/Users/bohan/miniconda2/envs/softlearning/lib/python3.6/site-packages/ray/tune/function_runner.py", line 80, in run
    self._entrypoint(*self._entrypoint_args)
  File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/examples/multi_goal/main.py", line 53, in run_experiment
    plotter=plotter,
  File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/utils.py", line 36, in get_algorithm_from_variant
    variant, *args, **algorithm_kwargs, **kwargs)
  File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/utils.py", line 15, in create_SQL_algorithm
    algorithm = SQL(*args, **kwargs)
  File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/sql.py", line 117, in __init__
    self._create_svgd_update()
  File "/Users/bohan/Documents/EngSci/4-1/research/softlearning/softlearning/algorithms/sql.py", line 219, in _create_svgd_update
    actions = self.policy.actions_for(
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute 'actions_for'

I've been trying to use actions_np, but it seems to have other issues.

opened by bohan-zhang 25

Unable to reproduce result on HalfCheetah-v2

I am unable to reproduce the result reported in the paper on the OpenAI Gym environment HalfCheetah-v2. The commit used to obtain this result is 1f6147c83b82b376ceed5e95df5a422113741468, which isn't very old. The result is averaged over 5 random seeds.

Do you know what might be causing this issue? Thank you!

I am able to obtain the result as reported (or close to it) in the paper on the remaining environments, posted here for reference.

opened by quanvuong 18
Module 'gym' has no attribute 'register' on MacOS Mojave 10.14.4

Hi all, when I tried to run a reward learning task (https://github.com/avisingh599/reward-learning-rl) with the softlearning environment, the following error occurred: "AttributeError: module 'gym' has no attribute 'register'"

However, when I ran import gym and gym.register() in a separate Python script in PyCharm, it worked fine, i.e. it was able to find register in gym. I had a look at the previous issues posted for Softlearning and think this is a gym adapter issue? But I am not sure how to manually add this environment/task to gym_adapter in the Softlearning package. Many thanks for your help!!

opened by weijiafeng 8
Question on initialization of alpha and entropy
Question 1: From heuristic_target_entropy, I see that the initialization is related to action_dim, and I don't understand why it should depend on action_dim. Theoretically, would it also work to just set target_entropy to a hard-coded number, like 0.1? (In practice it seems to work, but I am not sure.)

Question 2: According to your SAC v2 paper, the entropy of a state-action pair should NEVER be lower than target_entropy, but during training, after each learning round, I found that the entropy of a state-action pair was sometimes lower than target_entropy! The pseudocode looks like this (I use PyTorch):

alpha_loss = torch.tensor(0.).to(self.device)
alpha_tlogs = torch.tensor(self.alpha)  # For TensorboardX logs
for each_ele in -log_pi:
    if each_ele < self.target_entropy:
        print("error,-logpi<target_entropy!!!!")

Does this mean I coded it wrong?

Question 3: Alpha sometimes goes higher than 1 during learning. Is that correct?
opened by dbsxdbsx 7
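Regarding Question 2: in the SAC v2 paper the entropy constraint is stated as an expectation, E[-log π(a|s)] ≥ target, rather than as a bound on every sampled pair. So individual values of -log π can fall below the target even when the constraint holds. The sketch below uses plain Python and an illustrative 1-D Gaussian, not SAC's actual policy. It shows that the sample mean of -log π matches the entropy, while about 68% of the individual samples lie below it (exactly those with |z| < 1).

```python
import math
import random


def gaussian_neg_log_prob(x, mean=0.0, std=1.0):
    """-log N(x; mean, std^2) for a 1-D Gaussian."""
    z = (x - mean) / std
    return 0.5 * math.log(2 * math.pi) + math.log(std) + 0.5 * z * z


def gaussian_entropy(std=1.0):
    """Differential entropy of N(mean, std^2): 0.5 * log(2 * pi * e * std^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * std ** 2)


rng = random.Random(0)
std = 1.0
samples = [gaussian_neg_log_prob(rng.gauss(0.0, std), std=std) for _ in range(100_000)]

entropy = gaussian_entropy(std)
mean_neg_log_pi = sum(samples) / len(samples)
frac_below = sum(s < entropy for s in samples) / len(samples)
print(f"entropy={entropy:.3f} mean(-log pi)={mean_neg_log_pi:.3f} below={frac_below:.2f}")
```

A per-sample check like the one in the question will therefore fire regularly. The quantity to compare against target_entropy is the batch mean of -log_pi.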
Implementation of automatic entropy temperature tuning (alpha loss)

https://github.com/rail-berkeley/softlearning/blob/46f14436f62465a02b99f431bbcf57a7fa0fd09d/softlearning/algorithms/sac.py#L254-L255 The implementation of the alpha loss seems to differ from its definition in formula 18 of the paper (Haarnoja et al., 2019). Is this a bug?

opened by Maggern3 6
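Formula 18 of the paper defines the temperature objective as in the first line below, with target entropy $\bar{\mathcal{H}}$. The short derivation that follows is ours, not taken from the repository. It shows how the variants discussed in these issues relate. Whether the gradient is taken with respect to α, with respect to ℓ = log α, or with ℓ used directly as the multiplier, its sign is the same. All three vanish when the expected entropy equals the target, so they differ only in step size.

```latex
J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \right]

% Gradient with respect to alpha (log pi does not depend on alpha):
\frac{\partial J}{\partial \alpha} = \mathbb{E}_{a_t \sim \pi_t}\left[ -\log \pi_t(a_t \mid s_t) \right] - \bar{\mathcal{H}}

% Parameterize alpha = e^{\ell}, \ell = \log\alpha, which keeps alpha > 0:
\frac{\partial J}{\partial \ell} = \alpha \left( \mathbb{E}_{a_t \sim \pi_t}\left[ -\log \pi_t(a_t \mid s_t) \right] - \bar{\mathcal{H}} \right)

% Variant that uses \ell directly as the multiplier:
\tilde{J}(\ell) = \mathbb{E}_{a_t \sim \pi_t}\left[ -\ell \log \pi_t(a_t \mid s_t) - \ell \bar{\mathcal{H}} \right],
\qquad
\frac{\partial \tilde{J}}{\partial \ell} = \mathbb{E}_{a_t \sim \pi_t}\left[ -\log \pi_t(a_t \mid s_t) \right] - \bar{\mathcal{H}}
```

Because α > 0, ∂J/∂ℓ and ∂J̃/∂ℓ always share a sign. When the expected entropy is below the target, gradient descent raises α, and when it is above, α falls.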

Multiple conflicts in requirements.txt

When I try to create the conda environment as described in the README.md, pip finds multiple conflicts with the selected packages (these problems also occur with the Docker image). I haven't been able to successfully test my installation, so I am not sure whether this list is complete, but at least the following packages (some of which are explicitly "required" at that version) conflict:

ERROR: requests 2.20.1 has requirement urllib3<1.25,>=1.21.1, but you'll have urllib3 1.25.1 which is incompatible.
ERROR: botocore 1.12.130 has requirement urllib3<1.25,>=1.20; python_version >= "3.4", but you'll have urllib3 1.25.1 which is incompatible.
ERROR: awscli 1.16.140 has requirement PyYAML<=3.13,>=3.10, but you'll have pyyaml 5.1 which is incompatible.
ERROR: gym 0.15.4 has requirement cloudpickle~=1.2.0, but you'll have cloudpickle 1.1.1 which is incompatible.
ERROR: robosuite 0.1.0 has requirement mujoco-py<1.50.2,>=1.50.1, but you'll have mujoco-py 2.0.2.8 which is incompatible.
ERROR: dm-tree 0.1.2 has requirement six>=1.12.0, but you'll have six 1.11.0 which is incompatible.
ERROR: Cannot uninstall 'ruamel-yaml'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

opened by johannespitz 6

Bound std of Gaussian policy via beta-sigmoid?
If I understand correctly, the logstd of the Gaussian policy is clipped to a min/max range and the std is recovered by exponentiation.

I am curious whether using a beta-sigmoid function to model the logstd would be a bit more stable, because it gives smooth lower/upper bounds and less abrupt gradients at large magnitudes.

e.g.

logvar = network output
var = 1/(1 + self.beta*torch.exp(-logvar))
var = min_var + (max_var - min_var)*var
opened by zuoxingdong 6
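The two bounding schemes in this issue can be compared side by side in plain Python. This is an illustrative sketch, not softlearning's policy code. `clipped_std` follows the clip-then-exponentiate description. `sigmoid_bounded_var` follows the issue's formula. Both function names and the example bounds are hypothetical. The sigmoid is computed in an overflow-safe form. Clipping has zero gradient once the log std leaves its range. The sigmoid bound keeps a nonzero gradient in exact arithmetic, although in floating point it saturates at the bounds for large inputs.

```python
import math


def clipped_std(log_std, log_std_min, log_std_max):
    """Hard bound: clip log std to [min, max], then exponentiate.

    The gradient with respect to log_std is zero outside the range.
    """
    return math.exp(min(max(log_std, log_std_min), log_std_max))


def sigmoid_bounded_var(raw, min_var, max_var, beta=1.0):
    """Smooth bound from the issue: min + (max - min) / (1 + beta * exp(-raw)).

    Both branches compute the same value; the split avoids overflow in
    exp(-raw) for very negative raw.
    """
    if raw >= 0:
        s = 1.0 / (1.0 + beta * math.exp(-raw))
    else:
        e = math.exp(raw)
        s = e / (e + beta)
    return min_var + (max_var - min_var) * s
```

For example, `clipped_std(5.0, -20.0, 2.0)` returns `exp(2.0)`, the same value as for any input above 2. In contrast, `sigmoid_bounded_var` still changes with `raw` well before saturation.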
Local dir cli

This pull request adds a --local-dir argument to the command line interface and resolves this issue: https://github.com/rail-berkeley/softlearning/issues/128. I also fixed an issue with pip install -e . that re-installs an older incompatible version of ray.

opened by brandontrabucco 5
Policy weights and output becomes NaN after some iterations
Issue Overview

After some training iterations, the policy starts outputting NaN, all the policy weights become NaN.

Package versions

Latest commit of softlearning

tensorflow version 2.2.0rc2

tfp-nightly version 0.11.0.dev20200424

Preliminary debugging

I think the issue might be caused by one of these three things: the Tanh bijector, a large learning rate (unlikely, since I use the default 3e-4), or the alpha training (most likely alpha).

Tanh

The policy sometimes outputs actions that are exactly 1 or -1 (I checked, and it never output values greater than 1 in magnitude). This may cause the inverse to become +inf or -inf, which may or may not be a problem because I don't know whether the inverse is ever used (edit: the inverse is called when calculating log_prob https://github.com/tensorflow/probability/blob/dd3a555ef37fc31c6ad04f3236942e3dbc0f4228/tensorflow_probability/python/distributions/transformed_distribution.py#L509). The problem could also be due to https://github.com/tensorflow/probability/issues/840, but that is apparently fixed by computing the action and its log prob together.
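To make the ±1 problem concrete, consider recovering the pre-squash value from a = tanh(u) with atanh(a). This is infinite once tanh(u) rounds to exactly 1.0. Keeping u and computing the log-density from it avoids the inverse entirely. This plain-Python sketch uses the identity log(1 - tanh(u)^2) = 2(log 2 - u - softplus(-2u)), which TensorFlow Probability's Tanh bijector also uses for its log-det Jacobian. The function names are hypothetical, and this is not softlearning's code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_log_det_jacobian(u):
    """log(1 - tanh(u)^2), computed from the pre-squash value u.

    Finite even when tanh(u) rounds to exactly +/-1.
    """
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def squashed_gaussian_log_prob(u, mean, std):
    """log pi(a) for a = tanh(u), u ~ N(mean, std^2), by change of variables (1-D)."""
    z = (u - mean) / std
    log_prob_u = -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)
    return log_prob_u - tanh_log_det_jacobian(u)
```

This is the sense in which computing the action and its log prob together helps. The log prob comes from the same sample u that produced the action, so atanh is never evaluated.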

Alpha

This is most likely the issue. From my logging inside _do_training_repeats https://github.com/rail-berkeley/softlearning/blob/84d7589fd5852aff9aa46debda6de39acaec2e0b/softlearning/algorithms/rl_algorithm.py#L336, the diagnostics from a few training steps before the policy failed look like this:

diagnostics: OrderedDict([('Q_value-mean', 3.2876506), ('Q_loss-mean', 0.04909911), ('policy_loss-mean', -3.1032994), ('alpha', nan), ('alpha_loss-mean', -inf)])
diagnostics: OrderedDict([('Q_value-mean', 3.2876506), ('Q_loss-mean', 0.04909911), ('policy_loss-mean', -3.1032994), ('alpha', nan), ('alpha_loss-mean', -inf)])
diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])
diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])
diagnostics: OrderedDict([('Q_value-mean', 3.3472314), ('Q_loss-mean', nan), ('policy_loss-mean', nan), ('alpha', nan), ('alpha_loss-mean', nan)])

We can see that alpha was the first to fail, which then propagated to the Q functions and policy. I also noticed that during training, sometimes alpha would become negative, and from my understanding of automatic entropy adjustment, alpha should always be non-negative.

After digging through the SAC training step, I noticed this line https://github.com/rail-berkeley/softlearning/blob/84d7589fd5852aff9aa46debda6de39acaec2e0b/softlearning/algorithms/sac.py#L247-L252 which is different from the old tf1 implementation that uses log_alpha instead https://github.com/rail-berkeley/softlearning/blob/bd30e33f22a7418b3e6d659908938b8bb500e6f1/softlearning/algorithms/sac.py#L210-L211

The SAC paper uses alpha as the multiplier instead of log_alpha in the loss function, so the old implementation might be an oversight? However, the old code did store log alpha as the training variable.

Or the issue might be something else, for example what caused the alpha loss to be -inf in the first place? Perhaps log_pis became -inf, which means actions_and_log_probs was the problem? I don't know enough about the implementation to decide for sure.

Let me know if you want more logs, the program's output, or anything else. This was run on my own environment in a fork of this repo:

https://github.com/externalhardrive/mobilemanipulation-tf2/blob/efe8161c2692d4747ad0623cfd8218cdbf4211d2/softlearning/environments/gym/locobot/nav_grasp_envs.py#L113
opened by charlesjsun 5
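According to the issue, the older tf1 code stores log(alpha) as the training variable and uses log_alpha in the loss. The sketch below shows that parameterization in plain Python. It is a hypothetical helper, not the softlearning implementation, and it is not a confirmed fix for this bug. Because the variable being updated is log(alpha), alpha = exp(log_alpha) can shrink toward zero but never becomes negative. An update applied directly to alpha, by contrast, can step below zero. The finiteness guard shows one way to stop a -inf log-prob (compare the alpha_loss-mean of -inf above) from propagating into alpha.

```python
import math


def alpha_step(log_alpha, mean_log_pi, target_entropy, lr=3e-4):
    """One gradient-descent step on the temperature, parameterized by log(alpha).

    Loss with log_alpha as the multiplier: J = -log_alpha * (mean_log_pi + target_entropy).
    dJ/dlog_alpha = -(mean_log_pi + target_entropy) = entropy - target_entropy.
    alpha = exp(log_alpha) is never negative (it may underflow to 0.0).
    """
    if not (math.isfinite(mean_log_pi) and math.isfinite(log_alpha)):
        raise FloatingPointError("non-finite log_pi or log_alpha; refusing to update alpha")
    grad = -(mean_log_pi + target_entropy)
    return log_alpha - lr * grad
```

For example, with entropy 10 above a target of -6 and lr=1.0, ten steps take log_alpha from 0 to -160, so alpha becomes tiny but still positive. A direct update `alpha - lr * (entropy - target)` from alpha = 1 would go negative after one step.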
tf2 support
This is still a work in progress. Things to do include at least:

Fix ExperimentRunner checkpointing. It would be nice to completely refactor this in order to make it easier to extend and understand.

Verify that SAC and SQL performance hasn't degraded, at least in the default gym benchmarks.

Update other pip requirements as needed.

Refactors SAC code to run on tensorflow>=2.0. The change is fairly large, as all the tf1 placeholders, sessions, graphs, etc. are now gone, and everything uses tf.keras.Models and tf.functions instead.

The new tf.functions allow code to be executed in the same manner as the old tf1 session.run, i.e. larger parts of the code can be optimized to run as a graph, so there's no slowdown from using eager mode. In fact, based on preliminary runs, the new tf2 implementation seems to be ~30% faster than our old tf1 implementation.

The tf.functions can be disabled using the debug mode (softlearning run_example_debug ...) which sets tf.config.experimental_run_functions_eagerly(True) https://github.com/hartikainen/softlearning/blob/89f0f90d127feec430800bb6dd7c792527116dde/examples/instrument.py#L261. For more info about tf.config.experimental_run_functions_eagerly, see https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/config/experimental_run_functions_eagerly.

This also cleans some of the model code that used to be a bit clumsy due to some edge cases not working in tf1. For example, all the models should now accept any nested tuples/arrays/dictionaries as inputs, making usage of e.g. gym dictionary observation spaces and goal-conditioned policies much cleaner. Here's an example of how our policy and Q-function calls have changed:

Before with tf1:

observations = {
    'joint_position': np.random.uniform(size=(batch_size, 7)),
    'joint_velocity': np.random.uniform(size=(batch_size, 7)),
}
flat_observations = flatten_input_structure(observations)
actions = policy.actions(flat_observations)
log_pis = policy.log_pis(flat_observations, actions)
Q_inputs = flatten_input_structure({**observations, 'actions': actions})
Q_values = Q(Q_inputs)

Now with tf2:

actions = policy.actions(observations)
log_pis = policy.log_pis(observations, actions)
Q_values = Q.values(observations, actions)

As said, the inputs to these models can be arbitrarily nested, as long as the structure of the input remains the same across calls.
opened by hartikainen 5
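The tf1 snippet in this PR needs an explicit flatten_input_structure call before every model call. To make that step concrete, here is a hypothetical pure-Python stand-in. `flatten_structure` is illustrative and is not softlearning's implementation. It visits dict keys in sorted order, which matches how tf.nest.flatten orders dictionaries. It also shows why the input structure has to stay the same across calls: a different set of keys produces a different flat ordering.

```python
def flatten_structure(structure):
    """Flatten nested dicts/lists/tuples into a list of leaves.

    Dict keys are visited in sorted order, so the same structure always
    flattens to the same order regardless of insertion order.
    """
    if isinstance(structure, dict):
        return [leaf for key in sorted(structure) for leaf in flatten_structure(structure[key])]
    if isinstance(structure, (list, tuple)):
        return [leaf for item in structure for leaf in flatten_structure(item)]
    return [structure]  # anything else (e.g. an array) is a leaf


observations = {"joint_velocity": "v", "joint_position": "p"}
print(flatten_structure(observations))                      # ['p', 'v']
print(flatten_structure({**observations, "actions": "a"}))  # ['a', 'p', 'v']
```

Adding the 'actions' key shifts every observation leaf by one position. A model that consumes flat inputs would therefore silently misread them, which is what the tf2 models avoid by accepting the nested structure directly.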
trials did not complete error
The complete error message is:

==========================================================

(softlearning) surabhi@surabhi-Vostro-3559:~/Downloads/github/softlearning$ softlearning run_example_local examples.development --universe=gym --domain=HalfCheetah --task=v3 --exp-name=my-sac-experiment-1 --checkpoint-frequency=1000

/home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

https://github.com/tensorflow/addons

If you depend on functionality not listed there, please file an issue.

WARNING: Logging before flag parsing goes to stderr.
I0615 04:36:45.514044 140073955813120 __init__.py:34] MuJoCo library version is: 200
2019-06-15 04:36:45,621 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs.
2019-06-15 04:36:45,731 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:51785 to respond...
2019-06-15 04:36:45,856 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:43790 to respond...
2019-06-15 04:36:45,860 INFO services.py:806 -- Starting Redis shard with 0.81 GB max memory.
2019-06-15 04:36:45,898 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs.
2019-06-15 04:36:45,899 INFO services.py:1442 -- Starting the Plasma object store with 1.21 GB memory using /dev/shm.
2019-06-15 04:36:46,022 INFO tune.py:65 -- Did not find checkpoint file in /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1.
2019-06-15 04:36:46,022 INFO tune.py:232 -- Starting a new experiment.
2019-06-15 04:36:46,027 INFO web_server.py:241 -- Starting Tune Server...

== Status ==

Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.6/4.0 GB

2019-06-15 04:36:46,779 WARNING util.py:64 -- The start_trial operation took 0.7297773361206055 seconds to complete, which may be a performance bottleneck.

== Status ==

Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.7/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:

id=14eb5e74-seed=221: RUNNING

(pid=9082) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9082)   RequestsDependencyWarning)
(pid=9082)
(pid=9082) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9082) For more information, please see:
(pid=9082) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9082) * https://github.com/tensorflow/addons
(pid=9082) If you depend on functionality not listed there, please file an issue.
(pid=9082)
(pid=9082) Using seed 221
(pid=9082) 2019-06-15 04:36:50.109066: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9082) 2019-06-15 04:36:50.114858: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9082) 2019-06-15 04:36:50.115111: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562f555f1110 executing computations on platform Host. Devices:
(pid=9082) 2019-06-15 04:36:50.115134: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9082) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9082) Instructions for updating:
(pid=9082) Colocations handled automatically by placer.
(pid=9082) WARNING: Logging before flag parsing goes to stderr.
(pid=9082) W0615 04:36:50.157908 140696037508864 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9082) Instructions for updating:
(pid=9082) Colocations handled automatically by placer.
2019-06-15 04:36:50,263 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9082, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

2019-06-15 04:36:50,264 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads. 2019-06-15 04:36:50,266 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint. (pid=9083) /home/surabhi/.local/lib/python3.6/site-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version! (pid=9083) RequestsDependencyWarning) (pid=9083) (pid=9083) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. (pid=9083) For more information, please see:

(pid=9083) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md

(pid=9083) * https://github.com/tensorflow/addons

(pid=9083) If you depend on functionality not listed there, please file an issue. (pid=9083) (pid=9083) 2019-06-15 04:36:53.790710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA (pid=9083) 2019-06-15 04:36:53.794906: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz (pid=9083) 2019-06-15 04:36:53.795049: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557721f5e310 executing computations on platform Host. Devices: (pid=9083) 2019-06-15 04:36:53.795068: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): , (pid=9083) Using seed 221 (pid=9083) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9083) Instructions for updating: (pid=9083) Colocations handled automatically by placer. (pid=9083) WARNING: Logging before flag parsing goes to stderr. (pid=9083) W0615 04:36:53.833971 139846760371968 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. (pid=9083) Instructions for updating: (pid=9083) Colocations handled automatically by placer. 2019-06-15 04:36:53,940 ERROR trial_runner.py:487 -- Error processing event. 
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9083, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'
2019-06-15 04:36:53,941 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-06-15 04:36:53,943 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.9/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
  id=14eb5e74-seed=221: RUNNING, 2 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-36-53.txt

(pid=9081) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9081)   RequestsDependencyWarning)
(pid=9081)
(pid=9081) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9081) For more information, please see:
(pid=9081)   * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9081)   * https://github.com/tensorflow/addons
(pid=9081) If you depend on functionality not listed there, please file an issue.
(pid=9081)
(pid=9081) Using seed 221
(pid=9081) 2019-06-15 04:36:57.425650: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9081) 2019-06-15 04:36:57.429647: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9081) 2019-06-15 04:36:57.429862: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b3ac7c78a0 executing computations on platform Host. Devices:
(pid=9081) 2019-06-15 04:36:57.429886: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9081) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9081) Instructions for updating:
(pid=9081) Colocations handled automatically by placer.
(pid=9081) WARNING: Logging before flag parsing goes to stderr.
(pid=9081) W0615 04:36:57.472656 140634258609920 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9081) Instructions for updating:
(pid=9081) Colocations handled automatically by placer.
2019-06-15 04:36:57,574 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9081, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

2019-06-15 04:36:57,575 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-06-15 04:36:57,576 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.
(pid=9084) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9084)   RequestsDependencyWarning)
(pid=9084)
(pid=9084) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9084) For more information, please see:
(pid=9084)   * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9084)   * https://github.com/tensorflow/addons
(pid=9084) If you depend on functionality not listed there, please file an issue.
(pid=9084)
(pid=9084) Using seed 221
(pid=9084) 2019-06-15 04:37:00.981560: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9084) 2019-06-15 04:37:00.987048: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9084) 2019-06-15 04:37:00.987274: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x561e70f53660 executing computations on platform Host. Devices:
(pid=9084) 2019-06-15 04:37:00.987293: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9084) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9084) Instructions for updating:
(pid=9084) Colocations handled automatically by placer.
(pid=9084) WARNING: Logging before flag parsing goes to stderr.
(pid=9084) W0615 04:37:01.021606 140204585965312 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9084) Instructions for updating:
(pid=9084) Colocations handled automatically by placer.
2019-06-15 04:37:01,131 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9084, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'

2019-06-15 04:37:01,132 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.9/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
  id=14eb5e74-seed=221: ERROR, 4 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-37-01.txt

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.9/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
  id=14eb5e74-seed=221: ERROR, 4 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-37-01.txt

Traceback (most recent call last):
  File "/home/surabhi/anaconda3/envs/softlearning/bin/softlearning", line 11, in <module>
    load_entry_point('softlearning', 'console_scripts', 'softlearning')()
  File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 202, in main
    return cli()
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
    return run_example_local(example_module_name, example_argv)
  File "/home/surabhi/Downloads/github/softlearning/examples/instrument.py", line 224, in run_example_local
    reuse_actors=True)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/tune.py", line 272, in run
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [id=14eb5e74-seed=221])
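The AttributeError that ends every trial's traceback involves Python's private-name mangling: a method written as __initialize inside a class named Serializable is stored on that class as _Serializable__initialize, which is why gaussian_policy.py spells the mangled name out. The lookup can only succeed if the Serializable base class actually installed in the environment defines __initialize. The minimal sketch below uses hypothetical stand-in classes (not softlearning's policies and not any real Serializable package) to reproduce both outcomes. Whether a mismatched Serializable dependency is what caused this particular report is an assumption; the log alone does not confirm it.

```python
# Hypothetical stand-in classes illustrating Python private-name mangling.
# They are NOT softlearning's policies or any real Serializable package.


class Serializable:
    """Base class that defines a private, name-mangled initializer."""

    def __initialize(self, locals_):
        # Written as __initialize here, but stored on the class as
        # _Serializable__initialize because of name mangling.
        self._serializable_args = dict(locals_)


class SerializableWithoutInitialize:
    """A stand-in base class that lacks __initialize entirely."""


class PolicyOK(Serializable):
    def __init__(self, hidden_layer_sizes=(256, 256)):
        # Outside the Serializable class body, the mangled name must be
        # spelled out explicitly.
        self._Serializable__initialize(locals())


class PolicyBroken(SerializableWithoutInitialize):
    def __init__(self, hidden_layer_sizes=(256, 256)):
        # Same call, but no base class provides _Serializable__initialize.
        self._Serializable__initialize(locals())


print(hasattr(Serializable, "_Serializable__initialize"))  # True
print(hasattr(Serializable, "__initialize"))  # False: hasattr does not mangle

policy = PolicyOK()
print(policy._serializable_args["hidden_layer_sizes"])  # (256, 256)

try:
    PolicyBroken()
except AttributeError as exc:
    # 'PolicyBroken' object has no attribute '_Serializable__initialize'
    print(exc)
```

The same call line behaves differently depending only on which base class is present, which is why this kind of error usually points at the environment's installed dependencies rather than at the calling code.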
opened by surbhi1944 5
Bump wheel from 0.36.2 to 0.38.1
Bumps wheel from 0.36.2 to 0.38.1.

Changelog

Sourced from wheel's changelog.

Release Notes

UNRELEASED

Updated vendored packaging to 22.0

0.38.4 (2022-11-09)

Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

0.38.3 (2022-11-08)

Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

0.38.2 (2022-11-05)

Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags

0.38.1 (2022-11-04)

Removed install dependency on setuptools

The future-proof fix in 0.36.0 for converting PyPy's SOABI into an ABI tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

0.38.0 (2022-10-21)

Dropped support for Python < 3.7

Updated vendored packaging to 21.3

Replaced all uses of distutils with setuptools

The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.

Fixed potential DoS attack via the WHEEL_INFO_RE regular expression

Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.

0.37.1 (2021-12-22)

Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)

Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

0.37.0 (2021-08-09)

Added official Python 3.10 support

Updated vendored packaging library to v20.9

... (truncated)

Commits

6f1608d Created a new release

cf8f5ef Moved news item from PR #484 to its proper place

9ec2016 Removed install dependency on setuptools (#483)

747e1f6 Fixed PyPy SOABI parsing (#484)

7627548 [pre-commit.ci] pre-commit autoupdate (#480)

7b9e8e1 Test on Python 3.11 final

a04dfef Updated the pypi-publish action

94bb62c Fixed docs not building due to code style changes

d635664 Updated the codecov action to the latest version

fcb94cd Updated version to match the release

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump certifi from 2020.12.5 to 2022.12.7
Bumps certifi from 2020.12.5 to 2022.12.7.

Commits

9e9e840 2022.12.07

b81bdb2 2022.09.24

939a28f 2022.09.14

aca828a 2022.06.15.2

de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...

b8eb5e9 2022.06.15.1

47fb7ab Fix deprecation warning on Python 3.11 (#199)

b0b48e0 fixes #198 -- update link in license

9d514b4 2022.06.15

4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Why is the target entropy -dim(A)?

I read Soft Actor-Critic Algorithms and Applications and have a question. In the paper's appendix, the target entropy (is this the desired minimum expected entropy?) is set to -dim(A). Why is the target entropy -dim(A)?
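For context, here is a minimal sketch of where a target entropy enters SAC's automatic temperature adjustment. It assumes the log-alpha parameterization, a batch of log pi(a|s) values for actions sampled from the current policy, and a plain gradient-descent step; the function names are hypothetical and are not softlearning's API. The target acts as a lower bound on the policy's expected entropy, estimated as -mean(log pi(a|s)): alpha is pushed up when the estimate falls below the target and down when it is above. Read this way, -dim(A) is a budget of -1 nat per action dimension, so the bound scales with the number of action dimensions (for example, -6 for a 6-dimensional action space such as HalfCheetah's).

```python
import numpy as np


def heuristic_target_entropy(action_shape):
    """Target entropy -dim(A): minus the number of action dimensions."""
    return -float(np.prod(action_shape))


def alpha_loss_and_grad(log_alpha, log_pis, target_entropy):
    """Temperature loss in the log-alpha parameterization (assumed here):

        L(log_alpha) = -log_alpha * mean(log_pis + target_entropy)

    log_pis are treated as constants (no gradient flows to the policy), so
    dL/d(log_alpha) = -mean(log_pis + target_entropy).
    """
    gap = float(np.mean(log_pis + target_entropy))
    return -log_alpha * gap, -gap


def update_log_alpha(log_alpha, log_pis, target_entropy, lr=3e-4):
    """One plain gradient-descent step on log_alpha (hypothetical helper)."""
    _, grad = alpha_loss_and_grad(log_alpha, log_pis, target_entropy)
    return log_alpha - lr * grad


target = heuristic_target_entropy((6,))  # 6-D action space -> -6.0

# Entropy estimate -mean(log_pis) = 2.0 is above the target of -6.0:
# the policy is more random than required, so alpha should shrink.
log_pis_high_entropy = np.full(256, -2.0)
# Entropy estimate -mean(log_pis) = -10.0 is below the target of -6.0:
# the policy is too deterministic, so alpha should grow.
log_pis_low_entropy = np.full(256, 10.0)

print(update_log_alpha(0.0, log_pis_high_entropy, target) < 0.0)  # True
print(update_log_alpha(0.0, log_pis_low_entropy, target) > 0.0)  # True
```

Because the entropy of a continuous action distribution can be negative, a negative target is meaningful; the sketch only shows the direction in which alpha moves relative to the target, not why the paper chose this particular value.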

opened by mittu1008 0
Bump pillow from 7.2.0 to 9.3.0
Bumps pillow from 7.2.0 to 9.3.0.

Release notes

Sourced from pillow's releases.

9.3.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

Changes

Initialize libtiff buffer when saving #6699 [@radarhere]

Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [@wiredfool]

Inline fname2char to fix memory leak #6329 [@nulano]

Fix memory leaks related to text features #6330 [@nulano]

Use double quotes for version check on old CPython on Windows #6695 [@hugovk]

GHA: replace deprecated set-output command with GITHUB_OUTPUT file #6697 [@nulano]

Remove backup implementation of Round for Windows platforms #6693 [@cgohlke]

Upload fribidi.dll to GitHub Actions #6532 [@nulano]

Fixed set_variation_by_name offset #6445 [@radarhere]

Windows build improvements #6562 [@nulano]

Fix malloc in _imagingft.c:font_setvaraxes #6690 [@cgohlke]

Only use ASCII characters in C source file #6691 [@cgohlke]

Release Python GIL when converting images using matrix operations #6418 [@hmaarrfk]

Added ExifTags enums #6630 [@radarhere]

Do not modify previous frame when calculating delta in PNG #6683 [@radarhere]

Added support for reading BMP images with RLE4 compression #6674 [@npjg]

Decode JPEG compressed BLP1 data in original mode #6678 [@radarhere]

pylint warnings #6659 [@marksmayo]

Added GPS TIFF tag info #6661 [@radarhere]

Added conversion between RGB/RGBA/RGBX and LAB #6647 [@radarhere]

Do not attempt normalization if mode is already normal #6644 [@radarhere]

Fixed seeking to an L frame in a GIF #6576 [@radarhere]

Consider all frames when selecting mode for PNG save_all #6610 [@radarhere]

Don't reassign crc on ChunkStream close #6627 [@radarhere]

Raise a warning if NumPy failed to raise an error during conversion #6594 [@radarhere]

Only read a maximum of 100 bytes at a time in IMT header #6623 [@radarhere]

Show all frames in ImageShow #6611 [@radarhere]

Allow FLI palette chunk to not be first #6626 [@radarhere]

If first GIF frame has transparency for RGB_ALWAYS loading strategy, use RGBA mode #6592 [@radarhere]

Round box position to integer when pasting embedded color #6517 [@radarhere]

Removed EXIF prefix when saving WebP #6582 [@radarhere]

Pad IM palette to 768 bytes when saving #6579 [@radarhere]

Added DDS BC6H reading #6449 [@ShadelessFox]

Added support for opening WhiteIsZero 16-bit integer TIFF images #6642 [@JayWiz]

Raise an error when allocating translucent color to RGB palette #6654 [@jsbueno]

Moved mode check outside of loops #6650 [@radarhere]

Added reading of TIFF child images #6569 [@radarhere]

Improved ImageOps palette handling #6596 [@PososikTeam]

Defer parsing of palette into colors #6567 [@radarhere]

Apply transparency to P images in ImageTk.PhotoImage #6559 [@radarhere]

Use rounding in ImageOps contain() and pad() #6522 [@bibinhashley]

Fixed GIF remapping to palette with duplicate entries #6548 [@radarhere]

Allow remap_palette() to return an image with less than 256 palette entries #6543 [@radarhere]

Corrected BMP and TGA palette size when saving #6500 [@radarhere]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.3.0 (2022-10-29)

Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

Initialize libtiff buffer when saving #6699 [radarhere]

Inline fname2char to fix memory leak #6329 [nulano]

Fix memory leaks related to text features #6330 [nulano]

Use double quotes for version check on old CPython on Windows #6695 [hugovk]

Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

Fixed set_variation_by_name offset #6445 [radarhere]

Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

Added ExifTags enums #6630 [radarhere]

Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

Added GPS TIFF tag info #6661 [radarhere]

Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

Do not attempt normalization if mode is already normal #6644 [radarhere]

... (truncated)

Commits

d594f4c Update CHANGES.rst [ci skip]

909dc64 9.3.0 version bump

1a51ce7 Merge pull request #6699 from hugovk/security-libtiff_buffer

2444cdd Merge pull request #6700 from hugovk/security-samples_per_pixel-sec

744f455 Added release notes

0846bfa Add to release notes

799a6a0 Fix linting

00b25fd Hide UserWarning in logs

05b175e Tighter test case

13f2c5a Prevent DOS with large SAMPLESPERPIXEL in Tiff IFD

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump tensorflow from 2.4.1 to 2.9.3
Bumps tensorflow from 2.4.1 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes a CHECK failure in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes a CHECK failure in AvgPool3DGrad (CVE-2022-35959)

Fixes a CHECK failure in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)
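Several fixes in both releases (for example the overflows in FusedResizeAndPadConv2D, ImageProjectiveTransformV2 and ResizeNearestNeighborGrad) concern integer overflow in size or index arithmetic. The release notes do not say how each one was fixed; as an illustration only, here is a common defensive pattern: computing a tensor's element count with an explicit bound check on every partial product. The limit of a signed 64-bit count and the name `checked_num_elements` are assumptions for this sketch.

```python
INT64_MAX = 2**63 - 1  # assumed limit: a signed 64-bit element count


def checked_num_elements(shape):
    """Return the number of elements in `shape`, refusing counts above INT64_MAX.

    Python integers never overflow, so the bound is enforced explicitly to
    mirror the check a C++ kernel needs before sizing a buffer.
    """
    for dim in shape:
        if dim < 0:
            raise ValueError(f"negative dimension {dim} in shape {tuple(shape)}")
    if 0 in shape:
        return 0  # an empty tensor has no elements, whatever the other dimensions are
    total = 1
    for dim in shape:
        # For dim > 0: total * dim > INT64_MAX  <=>  total > INT64_MAX // dim
        if total > INT64_MAX // dim:
            raise OverflowError(f"shape {tuple(shape)} exceeds {INT64_MAX} elements")
        total *= dim
    return total


print(checked_num_elements((2, 3, 4)))  # 24
```

Testing with integer division before multiplying means the check itself can never overflow, which matters in languages with fixed-width integers.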

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot]
Bump joblib from 1.0.0 to 1.2.0
Bumps joblib from 1.0.0 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

Vendor loky 3.3.0 which fixes several bugs including:

robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

avoiding leaking worker processes in case of nested loky parallel calls;

reliably spawning the correct number of reusable workers.
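The 1.2.0 security fix stops passing the `pre_dispatch` string to `eval()`. The changelog says only basic numerics are now supported but does not show the replacement, so the sketch below illustrates the general technique rather than joblib's code: parse the string with Python's `ast` module and evaluate only a whitelist of node types. The function name `eval_pre_dispatch` and the exact set of allowed operators are assumptions.

```python
import ast
import operator

# Arithmetic operators this sketch allows; every other node type is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def eval_pre_dispatch(expr, n_jobs):
    """Evaluate a pre_dispatch-style string such as '2*n_jobs' without eval().

    Only integer literals, the name `n_jobs`, and + - * // are accepted.
    """
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id == "n_jobs":
            return n_jobs
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Calls, attribute access, other names, etc. never get evaluated.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expr, mode="eval"))


print(eval_pre_dispatch("2*n_jobs", n_jobs=4))  # 8
```

Because the walker only recognises the listed nodes, a string such as `"__import__('os').getcwd()"` is rejected before anything runs, instead of being executed as it would be by `eval()`.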

Release 1.1.0

Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)
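The 1.1.0 byte-order fix means arrays are converted to the loading system's byte order rather than keeping the byte order of the machine that wrote the pickle. joblib does this for numpy arrays; the standard-library sketch below only illustrates the idea for a flat buffer of float64 values, and assumes the writer's byte order is known. The helper `load_float64_to_native` is hypothetical.

```python
import array
import struct
import sys


def load_float64_to_native(payload, stored_byteorder):
    """Decode raw float64 bytes written with `stored_byteorder` ('little' or 'big')
    into an array.array that uses this machine's byte order."""
    if stored_byteorder not in ("little", "big"):
        raise ValueError(f"unknown byte order {stored_byteorder!r}")
    values = array.array("d")
    values.frombytes(payload)  # interpreted in native order at first
    if stored_byteorder != sys.byteorder:
        values.byteswap()  # convert once, at load time, to native order
    return values


# Simulate data written on a big-endian machine.
big_endian_bytes = struct.pack(">3d", 1.0, 2.5, -4.0)
print(list(load_float64_to_native(big_endian_bytes, "big")))  # [1.0, 2.5, -4.0]
```

Converting at load time means downstream code never has to know which machine produced the file, which is the behaviour the changelog describes.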

Commits

5991350 Release 1.2.0

3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)

cea26ff CI test the future loky-3.3.0 branch (#1338)

8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)

067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)

ac4ebd5 MAINT add back pytest warnings plugin (#1337)

a23427d Test child raises parent exits cleanly more reliable on macos (#1335)

ac09691 [MAINT] various test updates (#1334)

4a314b1 Vendor loky 3.2.0 (#1333)

bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...

Additional commits viewable in compare view

dependencies
opened by dependabot[bot]

Related tags

Overview

Softlearning

Getting Started

Prerequisites

Conda Installation

Docker Installation

docker-compose

Examples

Training and simulating an agent

Resume training from a saved checkpoint

This feature is currently broken!

References

Comments

Issue Overview

Package versions

Preliminary debugging

Tanh

Alpha

Release Notes

9.3.0

Changes

9.3.0 (2022-10-29)

TensorFlow 2.9.3

Release 2.9.3

TensorFlow 2.9.2

Release 2.9.2

Release 2.9.3

Release 2.8.4

Release 1.2.0

Release 1.1.0

Owner

Robotic AI & Learning Lab Berkeley

Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Asynchronous Advantage Actor-Critic (A3C) algorithms

Advantage Actor Critic (A2C): jax + flax implementation

Using deep actor-critic model to learn best strategies in pair trading

Asynchronous Advantage Actor-Critic in PyTorch

Implementation of fast algorithms for Maximum Spanning Tree (MST) parsing that includes fast ArcMax+Reweighting+Tarjan algorithm for single-root dependency parsing.

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP (Traveling Salesman Problem)

Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

MLOps will help you to understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project.

Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Custom TensorFlow2 implementations of forward and backward computation of soft-DTW algorithm in batch mode.

This is the code for our KILT leaderboard submission to the T-REx and zsRE tasks. It includes code for training a DPR model then continuing training with RAG.

TorchPQ is a python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using Product Quantization (PQ) algorithm.

Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"