NovelD: A Simple yet Effective Exploration Criterion

Last update: Dec 5, 2022

Overview

Intro

This is an implementation of the methods proposed in "NovelD: A Simple yet Effective Exploration Criterion" and "BeBold: Exploration Beyond the Boundary of Explored Regions".

Citation

If you use this code in your own work, please cite our papers:

@article{zhang2021noveld,
  title={NovelD: A Simple yet Effective Exploration Criterion},
  author={Zhang, Tianjun and Xu, Huazhe and Wang, Xiaolong and Wu, Yi and Keutzer, Kurt and Gonzalez, Joseph E and Tian, Yuandong},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@article{zhang2020bebold,
  title={BeBold: Exploration Beyond the Boundary of Explored Regions},
  author={Zhang, Tianjun and Xu, Huazhe and Wang, Xiaolong and Wu, Yi and Keutzer, Kurt and Gonzalez, Joseph E and Tian, Yuandong},
  journal={arXiv preprint arXiv:2012.08621},
  year={2020}
}

Installation

# Install Instructions
conda create -n noveld python=3.7
conda activate noveld
git clone git@github.com:tianjunz/NovelD.git
cd NovelD
pip install -r requirements.txt

Train NovelD on MiniGrid

OMP_NUM_THREADS=1 python main.py --model bebold --env MiniGrid-ObstructedMaze-2Dlhb-v0 --total_frames 500000000 --intrinsic_reward_coef 0.05 --entropy_cost 0.0005
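
For orientation, the criterion described in the NovelD paper rewards a transition only when the next state is more novel than the current one and is visited for the first time in the episode. The following is a minimal sketch of that rule, not the repository's code: the function name `noveld_rewards`, its arguments, and the default `alpha=0.5` are assumptions made for illustration, and the novelty values are taken as given (the paper estimates them with a learned novelty measure).

```python
from collections import Counter
from typing import Hashable, List, Sequence


def noveld_rewards(
    novelty: Sequence[float],
    states: Sequence[Hashable],
    alpha: float = 0.5,  # assumed scaling factor, chosen for illustration
) -> List[float]:
    """Return one intrinsic reward per transition of a single episode.

    novelty[t] is a precomputed novelty estimate for states[t].
    The reward for the step t -> t+1 is
        max(novelty[t+1] - alpha * novelty[t], 0)
    and it is kept only if states[t+1] is seen for the first time
    in this episode.
    """
    if len(novelty) != len(states):
        raise ValueError("novelty and states must have the same length")

    visits = Counter()
    if states:
        visits[states[0]] += 1  # the start state counts as visited

    rewards = []
    for t in range(len(states) - 1):
        next_state = states[t + 1]
        visits[next_state] += 1
        # Clipped novelty difference between consecutive states.
        gain = max(novelty[t + 1] - alpha * novelty[t], 0.0)
        # Episodic first-visit gate.
        first_visit = 1.0 if visits[next_state] == 1 else 0.0
        rewards.append(gain * first_visit)
    return rewards


if __name__ == "__main__":
    print(noveld_rewards([1.0, 3.0, 0.2, 4.0], ["a", "b", "a", "c"]))
```

In this toy episode the step back to the already visited state "a" earns no bonus. How the training script scales this bonus and combines it with the environment reward (for example through --intrinsic_reward_coef) is not covered by the sketch.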

Acknowledgements

Our vanilla RL algorithm is based on RIDE.

License

This code is under the CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International) license.

Comments

MiniGrid results appear to be using the fully observable space as opposed to the partially observable one

First off, thanks for releasing the code. I ran into this paper not too long ago and found it pretty interesting.

I was under the impression that the MiniGrid experiments in your NovelD paper use the agent's partial observation, but it appears that you are using the fully observable one, per these lines:

https://github.com/tianjunz/NovelD/blob/master/src/utils.py#L112-L114

Can you clarify if this is true? A lot of users of MiniGrid assume that you use the partially observable view from the agent, so this seems like an important detail that needs to be mentioned...

opened by vlawhern 0

Cannot reproduce result on ObstructedMaze-2Dlhb

Hi @tianjunz

I ran your code with the command you suggested:

OMP_NUM_THREADS=1 python main.py --model bebold --env MiniGrid-ObstructedMaze-2Dlhb-v0 --total_frames 500000000 --intrinsic_reward_coef 0.05 --entropy_cost 0.0005

However, the obtained "mean episode return" is still 0 even after 60M frames, which differs from Fig. 4 in the NovelD paper. My log: logs.csv. Could you check it or share your result (log)?

FYI, MultiRoom and KeyCorridor tasks seem to be reproduced. I used the following versions: pytorch(1.10.0), gym(0.15.4), gym_minigrid(1.0.2).

Thanks, Sungwoong.

opened by sungwoong 2
