An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

You might also like...
python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

Imia is an authentication library for Starlette and FastAPI (python 3.8+).

Imia Imia (belarussian for "a name") is an authentication library for Starlette and FastAPI (python 3.8+). Production status The library is considered

python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

A Python tool to generate and refresh Amazon access tokens.

amazon_auth A Python tool to generate and refresh Amazon access tokens. Description This tool generates and outputs Amazon access and refresh tokens f

A Python library to create and validate authentication tokens

handshake A Python library to create and validate authentication tokens. handshake is used to generate and validate arbitrary authentication tokens th

This app makes it extremely easy to build Django powered SPA's (Single Page App) or Mobile apps exposing all registration and authentication related functionality as CBV's (Class Base View) and REST (JSON)

Welcome to django-rest-auth Repository is unmaintained at the moment (on pause). More info can be found on this issue page: https://github.com/Tivix/d

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

A host-guest based app in which host can CREATE the room. and guest can join room with room code and vote for song to skip. User is authenticated using Spotify API

Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.
Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

Easy and secure implementation of Azure AD for your FastAPI APIs 🔒 Single- and multi-tenant support.

A full Rest-API With Oauth2 and JWT for request & response a JSON file Using FastAPI and SQLAlchemy 🔑
A full Rest-API With Oauth2 and JWT for request & response a JSON file Using FastAPI and SQLAlchemy 🔑

Pexon-Rest-API A full Rest-API for request & response a JSON file, Building a Simple WorkFlow that help you to Request a JSON File Format and Handling

Owner
Yu Shen
UCLA '24
Yu Shen
Simple two factor authemtication system, made by me.

Simple two factor authemtication system, made by me. Honestly, i don't even know How 2FAs work I just used my knowledge and did whatever i could. Send

Refined 5 Jan 4, 2022
Complete Two-Factor Authentication for Django providing the easiest integration into most Django projects.

Django Two-Factor Authentication Complete Two-Factor Authentication for Django. Built on top of the one-time password framework django-otp and Django'

Bouke Haarsma 1.3k Jan 4, 2023
Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator.

Django Admin Two-Factor Authentication Django Admin Two-Factor Authentication, allows you to login django admin with google authenticator. Why Django

Iman Karimi 9 Dec 7, 2022
Toolkit for Pyramid, a Pylons Project, to add Authentication and Authorization using Velruse (OAuth) and/or a local database, CSRF, ReCaptcha, Sessions, Flash messages and I18N

Apex Authentication, Form Library, I18N/L10N, Flash Message Template (not associated with Pyramid, a Pylons project) Uses alchemy Authentication Authe

null 95 Nov 28, 2022
Login-python - Login system made in Python, using native libraries

login-python Sistema de login feito 100% em Python, utilizando bibliotecas nativ

Nicholas Gabriel De Matos Leal 2 Jan 28, 2022
The ultimate Python library in building OAuth, OpenID Connect clients and servers. JWS,JWE,JWK,JWA,JWT included.

Authlib The ultimate Python library in building OAuth and OpenID Connect servers. JWS, JWK, JWA, JWT are included. Authlib is compatible with Python2.

Hsiaoming Yang 3.4k Jan 4, 2023
Python module for generating and verifying JSON Web Tokens

python-jwt Module for generating and verifying JSON Web Tokens. Note: From version 2.0.1 the namespace has changed from jwt to python_jwt, in order to

David Halls 210 Dec 24, 2022
Django CAS 1.0/2.0/3.0 client authentication library, support Django 2.0, 2.1, 2.2, 3.0 and Python 3.5+

django-cas-ng django-cas-ng is Django CAS (Central Authentication Service) 1.0/2.0/3.0 client library to support SSO (Single Sign On) and Single Logou

django-cas-ng 347 Dec 18, 2022
A Python library for OAuth 1.0/a, 2.0, and Ofly.

Rauth A simple Python OAuth 1.0/a, OAuth 2.0, and Ofly consumer library built on top of Requests. Features Supports OAuth 1.0/a, 2.0 and Ofly Service

litl 1.6k Dec 8, 2022
The ultimate Python library in building OAuth, OpenID Connect clients and servers. JWS,JWE,JWK,JWA,JWT included.

Authlib The ultimate Python library in building OAuth and OpenID Connect servers. JWS, JWK, JWA, JWT are included. Authlib is compatible with Python2.

Hsiaoming Yang 2.3k Feb 17, 2021