Markov Decision Processes (MDPs) for Reinforcement Learning (RL)
The Markov Decision Process (MDP) is one of the most important models in reinforcement learning. It lets researchers model an environment with a finite set of states, a finite set of controls, a reward function that tells the agent which controls are good and which are bad in any given state, a transition matrix that stores the probability of moving from each state to every other state under each control, and a discount factor "gamma" that tells the agent how much weight to give future rewards when choosing a control in the present.
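To make these parts concrete, here is a minimal sketch of a finite discounted MDP in NumPy. The state and control counts, the arrays P and R, and the value of gamma are all illustrative assumptions chosen for this example, not part of any standard library API.

```python
import numpy as np

# Illustrative sizes: 3 states, 2 controls (assumptions for this sketch).
n_states, n_controls = 3, 2

# P[a, s, s'] = probability of landing in state s' after taking
# control a in state s. Each row P[a, s, :] must sum to 1.
P = np.array([
    [[0.9, 0.1, 0.0],   # control 0
     [0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9]],
    [[0.5, 0.5, 0.0],   # control 1
     [0.3, 0.0, 0.7],
     [0.0, 0.2, 0.8]],
])

# R[s, a] = immediate reward for taking control a in state s.
R = np.array([
    [ 0.0,  1.0],
    [-1.0,  2.0],
    [ 0.5,  0.0],
])

gamma = 0.95  # discount factor: how much future rewards matter today

# Sanity check: every transition distribution is a valid probability vector.
assert np.allclose(P.sum(axis=2), 1.0)
```

Storing the transition probabilities as one array indexed by (control, state, next state) keeps the whole model in a single object, which is convenient when later computing expected values over next states.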
Finite Discounted MDP Model
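The components above are conventionally collected into a single tuple; one standard way to write it (with symbol names chosen here for illustration) is

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle,$$

where $\mathcal{S}$ is the finite state set, $\mathcal{A}$ the finite control set, $\mathcal{P}(s' \mid s, a)$ the transition probabilities, $\mathcal{R}(s, a)$ the reward function, and $\gamma \in [0, 1)$ the discount factor.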
The MDP model presented above provides a mathematically clear picture of all of the parts required to model an RL problem. The agent's task is to maximize its expected long-term reward by learning an optimal policy for this MDP. The expected return is a finite sum when the horizon is finite, but in general it is an infinite summation of rewards.
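In the infinite-horizon discounted case, the return the agent seeks to maximize is conventionally written as (notation chosen here for illustration)

$$G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1,$$

where the discount factor $\gamma < 1$ keeps the infinite sum finite whenever the rewards are bounded, which is what makes the infinite-horizon objective well defined.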