SNAKE GAME & REINFORCEMENT LEARNING

What is Q-Learning?

Q-learning is a reinforcement learning method that teaches an agent to perform a task by rewarding good behavior and punishing bad behavior. In Snake, for example, eating the food is good and hitting its own body is bad. At each point in the game, the agent chooses the action with the highest expected reward.

This process is illustrated in the image above. At time t, the agent (here, the snake) is in state S_t and has to choose an action A_t (up, down, right, left). Taking this action leads to a new state S_{t+1} and a reward R_{t+1}, which is associated with the pair (S_t, A_t). The Q-value of this pair is then updated from the reward as follows:
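
Assuming the standard Q-learning rule (the symbols α for the learning rate and γ for the discount factor are the conventional ones, not taken from this page), the update can be written as:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```

With α = 0 the bracketed term is ignored and Q never changes; with γ = 0 only the immediate reward R_{t+1} enters the update.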

- The learning rate (commonly written α), set between 0 and 1. A value of 0 means that the Q-values are never updated, so nothing is learned. A high value such as 0.9 means that learning can occur quickly.
- The discount factor γ determines how much the agent cares about rewards in the distant future relative to those in the immediate future. If γ=0, the agent is completely myopic and only learns about actions that produce an immediate reward.
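
The roles of the two parameters above can be sketched in a few lines of Python. This is a minimal tabular sketch assuming the standard Q-learning rule; the names `make_q_table`, `greedy_action`, and `q_update`, and the default values of `alpha` and `gamma`, are illustrative and not taken from the game's source.

```python
from collections import defaultdict

# The four moves available to the snake.
ACTIONS = ("up", "down", "right", "left")


def make_q_table():
    """Return a Q-table mapping each state to one Q-value per action (all 0.0)."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def greedy_action(q_table, state):
    """Pick the action with the highest Q-value (first one wins on ties)."""
    return max(ACTIONS, key=lambda action: q_table[state][action])


def q_update(q_table, state, action, reward, next_state, alpha=0.9, gamma=0.5):
    """Apply one Q-learning update and return the new Q-value.

    alpha is the learning rate, gamma the discount factor (assumed defaults).
    """
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


if __name__ == "__main__":
    q = make_q_table()
    q_update(q, state=0, action="right", reward=1.0, next_state=1)
    print(greedy_action(q, 0))
```

Calling `q_update` with `alpha=0` leaves the table unchanged, and with `gamma=0` the value of the next state is ignored, which matches the two limiting cases described in the list.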

Rules of the game

The particularity of my game is that the snake can only see a short distance in front of it and to its sides (no further, for computational reasons). This means that the snake only knows where the apple is when the apple is in its field of view. This is not the best configuration, since the snake learns nothing while it detects nothing, but the goal here is not to give away the position of the apple to the snake; it is to see how the snake behaves out of curiosity. The snake is rewarded when it detects the apple and moves towards it, or when it eats the apple, though not when it does so inadvertently. It is punished if it touches its tail and dies.

The state table below the game board represents the 108 states the snake can be in. A state is shown as a red square when the snake did not make the right decision in it, black when the snake has not yet been in that state, and blue when it made a good decision.
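
The reward rules and the color coding of the state table can be sketched as follows. This is only an illustration: the numeric reward values, the function names, and the reading of "inadvertently" as "the apple was not in view before the move" are assumptions, not details taken from the game's source.

```python
# Hypothetical reward values; the text does not give actual numbers.
REWARD_EAT = 1.0
REWARD_APPROACH = 0.1
PENALTY_DEATH = -1.0

NUM_STATES = 108  # number of states stated in the text


def reward(died, ate_apple, apple_was_visible, moved_towards_apple):
    """Return the reward for one move under the rules described above."""
    if died:
        return PENALTY_DEATH
    if ate_apple:
        # Assumption: eating an apple the snake had not seen is "inadvertent"
        # and earns nothing.
        return REWARD_EAT if apple_was_visible else 0.0
    if apple_was_visible and moved_towards_apple:
        return REWARD_APPROACH
    return 0.0


def state_color(last_decision_good):
    """Map a state's last outcome to its color in the state table.

    None  -> "black" (state never visited)
    True  -> "blue"  (good decision)
    False -> "red"   (wrong decision)
    """
    if last_decision_good is None:
        return "black"
    return "blue" if last_decision_good else "red"


if __name__ == "__main__":
    # Every state starts unvisited, so the whole table is black.
    table = [None] * NUM_STATES
    table[3] = True
    table[7] = False
    print([state_color(table[i]) for i in (0, 3, 7)])
```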

You can have fun watching the behavior of the AI by pressing play (it is better to increase the speed at the beginning). If you want to have fun changing or improving the game configuration, you can access my GitHub at the top left of the page.

BOARD GAME
