Reinforcement Learning: Fundamentals

Rahul Kumar
6 min read · Oct 6, 2024
An agent interacting with its environment.

Table of Contents:
1. Overview
2. Multi-armed Bandits
3. Markov Decision Process
4. Returns and episodes
5. Value Functions
6. Bellman Equation
7. Policy Iteration

Overview

Reinforcement learning is learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. The main sub-elements of a reinforcement learning system are the policy, reward signal, value function, agent, and environment. A policy is a mapping from perceived states of the environment to the actions to be taken in those states. A reward signal defines the goal of a reinforcement learning problem. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. We seek actions that bring about states of highest value, not highest reward. A state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards. The learner and decision maker is called the agent, and everything else is the environment. The agent exists in an environment described by some set of possible states S (shown in the above figure). It can perform any of a set of…
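To make the difference between reward and value concrete, here is a minimal sketch on a made-up three-state chain. The state names, the rewards, the `TRANSITIONS` table, and the `state_value` helper are all hypothetical and chosen only for illustration. The `gamma` discount parameter is also an assumption; it defaults to 1.0 so that the value is simply the total reward collected from a state onward.

```python
# Hypothetical toy chain: each state maps to (immediate reward, next state).
# A next state of None marks the end of the episode.
TRANSITIONS = {
    "A": (0.0, "B"),    # no immediate reward, but leads to B
    "B": (10.0, None),  # large reward, then the episode ends
    "D": (1.0, None),   # small reward, then the episode ends
}


def state_value(state, gamma=1.0):
    """Sum the (optionally discounted) rewards collected from `state` onward."""
    total, discount = 0.0, 1.0
    while state is not None:
        reward, state = TRANSITIONS[state]
        total += discount * reward
        discount *= gamma
    return total


values = {s: state_value(s) for s in TRANSITIONS}

for s, v in values.items():
    print(f"state {s}: immediate reward {TRANSITIONS[s][0]}, value {v}")
```

In this sketch, state A has a lower immediate reward than state D but a higher value, because A is followed by the high-reward state B. An agent that looked only at immediate reward would prefer D; one that looks at value would prefer A.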


Rahul Kumar

Full Stack Data Scientist