Reinforcement learning is a type of machine learning in which a machine learns which actions to take in an environment so as to maximize the cumulative reward it receives.
Terminology
- Agent - The agent (A) takes actions that affect the environment. For example, a program learning to play chess is the agent.
- Action - An action (a) is a single operation or move the agent can make. In each state, the agent decides which action to take from the set of all possible actions, which here is discrete.
- Environment - Every action the reinforcement learning agent takes directly affects the environment. Here, the chessboard is the environment. The environment takes the agent's current state and action as input and returns a reward and a new state to the agent.
For example, a move made by the bot has a positive or negative effect on the game as a whole and changes the arrangement of the board. This determines the next state of the board and, in turn, the agent's next action.
- State - A state (s) is a particular situation in which the agent finds itself. In chess, this is the current arrangement of the pieces on the board.
- Reward (R) - The reward is the feedback the environment gives, by which we judge how good the agent’s action was in a given state. It is crucial in reinforcement learning, where we want the machine to learn by itself and the only critic guiding it is the feedback/reward it receives.
- Discount factor (γ) - The discount factor, a number between 0 and 1, controls how much future rewards count relative to immediate ones. Because the future is uncertain, rewards expected further ahead are weighted less, which also reduces the variance of the value estimates. The discount factor thus reduces the degree to which future rewards affect our value function estimates.
- Policy (π) - The policy decides which action to take in a given state, with the aim of maximizing the reward.
- Value (V) - The value measures how good a specific state is. It is the expected discounted reward that the agent collects from that state onward when following a specific policy.
- Q-value or action-value (Q) - The Q-value is the overall expected reward if the agent (A) is in state (s), takes action (a), and then plays until the end of the episode according to some policy (π).
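
To make these terms concrete, here is a minimal Python sketch. It does not use chess; instead it assumes a made-up toy environment, a corridor of five cells in which the agent starts on the left and receives a reward of 1 on reaching the rightmost cell. All names (`step`, `always_right`, `rollout`, `value`, `q_value`) and the discount factor of 0.9 are hypothetical choices for illustration. Because this toy environment is deterministic, a single rollout gives the value exactly; in a stochastic environment one would average over many episodes.

```python
from typing import Callable, List, Optional, Tuple

GOAL = 4            # assumed terminal state of a 5-cell corridor (cells 0..4)
ACTIONS = (-1, +1)  # the discrete action set: move left or move right
GAMMA = 0.9         # assumed discount factor


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Environment: takes the state and action, returns (new state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def always_right(state: int) -> int:
    """Policy: maps a state to an action (here it always moves right)."""
    return +1


def discounted_return(rewards: List[float], gamma: float = GAMMA) -> float:
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def rollout(state: int, policy: Callable[[int], int],
            first_action: Optional[int] = None, max_steps: int = 50) -> List[float]:
    """Agent-environment loop: collect rewards until the episode ends."""
    rewards = []
    action = policy(state) if first_action is None else first_action
    for _ in range(max_steps):
        state, reward, done = step(state, action)
        rewards.append(reward)
        if done:
            break
        action = policy(state)
    return rewards


def value(state: int, policy: Callable[[int], int]) -> float:
    """V(s): discounted return from a non-terminal state when following the policy."""
    return discounted_return(rollout(state, policy))


def q_value(state: int, action: int, policy: Callable[[int], int]) -> float:
    """Q(s, a): take the given action first, then follow the policy."""
    return discounted_return(rollout(state, policy, first_action=action))


if __name__ == "__main__":
    print("V(0)      =", value(0, always_right))
    print("Q(3, +1)  =", q_value(3, +1, always_right))
    print("Q(3, -1)  =", q_value(3, -1, always_right))
```

From the starting cell the goal is four steps away, so the single reward is discounted three times and V(0) comes out at roughly 0.9³ ≈ 0.729. In cell 3, moving right has a higher Q-value than moving left, because moving left delays the reward by two steps.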