Exploration-Exploitation Dilemma Solving Methods

Aman Mishra

Last update: Jan 25, 2022

Related tags

Deep Learning Thompson-Greedy-Comparison-for-MultiArmed-Bandits

Overview

Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In ths repo I implemented two techniques for tackling mentioned tradeoff. Methods Include:-

Epsilon Greedy (With different epsilons)
Thompson Sampling(also known as posterior sampling)

The reason for choosing these two only is to show the upper and lower bounds as epsilons are a starting point in dealing with these tradeoffs and Thompson Sampling is considered a recent state of the Art in this field.

ENV SPECIFICATIONS - A 10 arm testbed is simulated as same demonstrated in Sutton-Barto Book.
True Reward distribution (Here Action-2 is best)

Comparison Greedy(or Epsilon Greedies and TS

we used three different epsilons here for testing i.e:

epsilon = 0 => Greedy Agent
epsilon = 0.01 => exploration with 1% probability
epsilon = 0.1 => exploration with 10% probability

and TS

Averaged Over 2500 independent runs with 1500 timesteps

Comparison

Percentage Actions selected for epsilon = 0.01 and TS

Conclusion -> epsilon = 0.01 can be considered best for eps-greedies as it is increasing but pretty slow and the percentage Optimal Actions for it is Around 80% in later stages, on the other hand Thomsan Sampling shows a significant improvement in these results as it quickly explores and then exploit the optimal one with percentage goes upto almost 100 even very early!!.

In case you want to know more about TS visit this Reference.

An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

373 Dec 31, 2022

RE3: State Entropy Maximization with Random Encoders for Efficient Exploration

State Entropy Maximization with Random Encoders for Efficient Exploration (RE3) (ICML 2021) Code for State Entropy Maximization with Random Encoders f

47 Nov 29, 2022

Systemic Evolutionary Chemical Space Exploration for Drug Discovery

SECSE SECSE: Systemic Evolutionary Chemical Space Explorer Chemical space exploration is a major task of the hit-finding process during the pursuit of

64 Dec 16, 2022

A repository with exploration into using transformers to predict DNA ↔ transcription factor binding

Transcription Factor binding predictions with Attention and Transformers A repository with exploration into using transformers to predict DNA ↔ transc

62 Dec 20, 2022

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

8 Sep 14, 2022

Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Exploration-Exploitation Dilemma Solving Methods

Related tags

Overview

Exploration-Exploitation Dilemma Solving Methods

Comparison Greedy(or Epsilon Greedies and TS

You might also like...

An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

RE3: State Entropy Maximization with Random Encoders for Efficient Exploration

Systemic Evolutionary Chemical Space Exploration for Drug Discovery

A repository with exploration into using transformers to predict DNA ↔ transcription factor binding

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning.

Solving reinforcement learning tasks which require language and vision

A module for solving and visualizing Schrödinger equation.

Owner

Aman Mishra

Breaking the Dilemma of Medical Image-to-image Translation

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Detector for Log4Shell exploitation attempts

[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

Code for the paper Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration

[Preprint] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models