Artificial Intelligence: Project Report "Implementation of Snake Game Using Deep Q-Learning Algorithm"
Project Report
Submitted To:
Sir Fahad
Submitted By:
Mahak Urooj
EE-38-B
Snake Game using Deep Q-Learning
ABSTRACT
The project focuses on designing a snake game using artificial intelligence. In this
project, an agent will be trained using the Deep Q-Learning algorithm until it becomes
smart enough to make the decisions required in the snake game on its own. The first
snake game was programmed in 1978, and after that different versions of the game
were developed and played by humans. Now we use artificial intelligence and neural
networks to achieve the same goal.
OBJECTIVES
The main objective of this project is to design a snake game using the Deep Q-Learning
algorithm. In the snake game, we aim to control a snake that is able to move forward,
left and right, and that eats an apple to gain score. The objective of the game is to
maximize the score by eating as many apples as possible.
GAME RULES
The snake moves continuously across the playing field and, at each step, can keep going
straight or turn left or right. Eating an apple increases the score and makes the snake
grow longer. The game ends when the snake collides with the boundary wall or with its
own body.
Q-LEARNING
Q-Learning is a reinforcement learning technique built around five elements:
Agent: the learner (here, the snake) that takes actions.
State: the current situation of the agent in the game.
Action: a move the agent can make in a given state.
Reward: the feedback the agent receives after performing an action.
Environment: the playing field with which the agent interacts.
The Q-Learning algorithm builds a matrix (the Q-Table) that lists, for every state, a value
for each possible action: an estimate of the reward that action is expected to earn. Every
time the agent moves around in the environment, the matrix gets updated and new values
are assigned to the state-action pairs. The action with the biggest value is considered the
best action to take. However, when the number of states and actions increases, it becomes
difficult for the algorithm to store and approximate good values for all of them.
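As a concrete illustration, the sketch below shows how such a table and its update could
look in Python for a tabular agent. The state encoding, the learning-rate and discount
constants, and the helper names are assumptions made for illustration, not taken from the
project code.

import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (illustrative value)
GAMMA = 0.9    # discount factor (illustrative value)
ACTIONS = ["up", "down", "left", "right"]

# Q-Table: one Q-value per action for every state the agent has visited.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state, epsilon=0.1):
    # With probability epsilon explore randomly, otherwise take the best-known action.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    # Bellman update for one (state, action, reward, next_state) step.
    best_next = max(q_table[next_state].values())
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])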
DEEP Q-LEARNING
In our snake game, we cannot use a Q-Table (matrix) to manage the states and rewards,
because the state depends on the position of the snake's head, the position of the apple
and the position of every segment of the snake's body, and the body keeps growing as the
game progresses. Because of this, the number of possible state-action pairs becomes far
too large to enumerate, which makes a plain Q-Table impractical. In order to mitigate this
problem of the Q-Learning algorithm, Deep Q-Learning is practiced. In this algorithm, a
neural network is used to approximate the Q-value of each action in a given state, helping
the agent to maximize the score.
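A minimal sketch of such a Q-network is given below, using PyTorch. The hidden-layer
sizes are assumptions for illustration; the 24 inputs match the state described later in
the report, and the 3 outputs correspond to the forward, left and right actions from the
objectives.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximates Q-values: input is the state vector, output is one value per action.
    def __init__(self, state_size=24, hidden_size=128, num_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_actions),
        )

    def forward(self, state):
        return self.net(state)

model = QNetwork()
q_values = model(torch.zeros(1, 24))   # shape (1, 3): one Q-value per action for a dummy state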
LOSS
The deep neural network optimizes its answer (the action) to a specific input (the state),
trying to maximize the reward. The value that expresses how good the prediction is, is
called the loss. The job of the neural network is to minimize the loss, reducing the
difference between the real target and the predicted value. In our case, the loss is the
squared error between the Bellman target and the predicted Q-value:

loss = (reward + gamma * max_a' Q(new_state, a') - Q(state, action))^2

When the loss is minimized, the performance of the agent (the snake, in our case) will
improve, and it will take less time to calculate which path is the most suitable for
reaching the apple in order to increase the score.
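The short sketch below shows how this loss could be computed for a single transition,
reusing the QNetwork defined above; the function and variable names are illustrative.

import torch
import torch.nn.functional as F

GAMMA = 0.9  # discount factor (illustrative value)

def compute_loss(model, state, action, reward, next_state, done):
    # Squared error between the Bellman target and the predicted Q-value.
    q_pred = model(state)[0, action]               # Q(state, action) predicted by the network
    with torch.no_grad():
        q_next = model(next_state).max().item()    # max over actions of Q(new_state, a')
    target = reward if done else reward + GAMMA * q_next
    return F.mse_loss(q_pred, torch.tensor(target, dtype=torch.float32))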
WORKING
The neural network in the Deep Q-Learning algorithm takes the current state of the agent
as input and outputs the action to be taken. As the agent interacts with the environment,
the parameters of the neural network are updated so that it favours the actions which give
the agent a more positive reward. The values are updated according to the Bellman
equation, given as follows:

Q(state, action) <- Q(state, action) + alpha * (reward + gamma * max_a' Q(new_state, a') - Q(state, action))

where alpha is the learning rate and gamma is the discount factor.
The working of the algorithm can be listed step by step as follows:
1. When the game starts, the values of the variables (the network weights) are randomly initialized.
2. After that, the current state (S) of the agent is taken by the system as input.
3. Then the agent executes an action that depends on the neural network. In the
initial stage of training, the agent takes mostly random actions to explore the
2-D playing field, but after some training it starts to rely more on the
information provided by the neural network.
4. After the action is performed, the reward for the action and the new state are
returned, and the approximated Q-values are updated using the Bellman equation
mentioned above.
5. The data describing the current state of the agent, the action performed by the
agent, the reward and the new state are all stored. The stored data is then sampled
to train the neural network.
6. Steps 2 to 5 are repeated again and again until a certain condition is fulfilled,
for example the end of a game. A code sketch of this loop is given after the list.
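The sketch below illustrates one pass of this loop with a replay memory, building on the
QNetwork and compute_loss sketches above. It assumes a game environment object whose
step() method returns the new state, the reward and a done flag; the memory size, batch
size and epsilon value are illustrative.

import random
from collections import deque

import torch

memory = deque(maxlen=100_000)     # stores (state, action, reward, next_state, done) tuples
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
BATCH_SIZE = 64

def play_step(env, state, epsilon):
    # Step 3: mostly random actions early in training, network-driven actions later.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        with torch.no_grad():
            action = model(state).argmax().item()

    # Step 4: the environment returns the reward and the new state.
    next_state, reward, done = env.step(action)

    # Step 5: store the transition, then train on a random sample of past transitions.
    memory.append((state, action, reward, next_state, done))
    if len(memory) >= BATCH_SIZE:
        batch = random.sample(memory, BATCH_SIZE)
        loss = torch.stack([compute_loss(model, *t) for t in batch]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return next_state, done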
BLOCK DIAGRAM
TRAINING OF AGENT
In order to train the agent, an incremental approach is used. First, a small environment
is used for the game, so that the snake bumps into the food (the apple) more often and
realizes that eating the apple is the main purpose of its life. Then the environment is
made bigger and bigger, so that the snake has to take more calculated steps to reach the
apple, and by doing so it learns to play the game.
The snake is able to see in a total of 8 directions. In each of these directions, the
snake makes 3 different kinds of measurements, so the neural network receives
3 measurements x 8 directions = 24 inputs.
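The sketch below shows one common way to build such a 24-element state vector, assuming
the 3 measurements per direction are the distances to the wall, to the apple, and to the
snake's own body; these particular measurements and the function signature are assumptions
for illustration.

# The 8 viewing directions as (dx, dy) steps on the grid.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]

def get_state(head, apple, body, grid_size):
    # Returns 24 values: for each of 8 directions, distance to wall, apple and own body.
    state = []
    for dx, dy in DIRECTIONS:
        dist_apple = dist_body = 0.0
        x, y = head
        steps = 0
        while 0 <= x < grid_size and 0 <= y < grid_size:
            x, y = x + dx, y + dy
            steps += 1
            if (x, y) == apple and dist_apple == 0.0:
                dist_apple = 1.0 / steps        # a closer apple gives a stronger signal
            if (x, y) in body and dist_body == 0.0:
                dist_body = 1.0 / steps         # a closer body segment gives a stronger signal
        dist_wall = 1.0 / max(steps, 1)         # distance to the wall in this direction
        state.extend([dist_wall, dist_apple, dist_body])
    return state                                # 8 directions x 3 measurements = 24 inputs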
MAIN FUNCTIONS
The value in the Q-Table is updated using the Bellman equation once both the new state and
the reward are known. After the table is updated, the next action is selected as the one
with the highest Q-value in the current state. If the maximum value is 0, the snake instead
takes a random action, which can be 'up', 'down', 'left' or 'right'.
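A small sketch of this selection rule is shown below, reusing the q_table from the earlier
tabular example; the tie-handling is illustrative.

import random

ACTIONS = ["up", "down", "left", "right"]

def next_action(state):
    # Pick the action with the highest Q-value; fall back to a random move if the best value is 0.
    values = q_table[state]            # one Q-value per action for this state
    best = max(values, key=values.get)
    if values[best] == 0:              # nothing learned yet for this state
        return random.choice(ACTIONS)
    return best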
REWARDS
The agent receives a positive reward when it eats an apple and a negative reward when it
collides with the wall or with its own body.
PSEUDO CODE
The pseudo code for the Q-learning algorithm, which forms the basis for the snake game
implementation, is as follows:
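The listing below is a representative sketch rather than the project's exact code: it
reuses the choose_action and update helpers from the tabular example above and assumes a
game environment object whose reset() and step() methods return states and rewards; the
number of episodes is illustrative.

NUM_EPISODES = 1_000    # illustrative number of training games

for episode in range(NUM_EPISODES):
    state = env.reset()                              # start a new game and get the initial state
    done = False
    while not done:
        action = choose_action(state, epsilon=0.1)   # explore sometimes, otherwise exploit
        next_state, reward, done = env.step(action)  # act, then observe reward and new state
        update(state, action, reward, next_state)    # Bellman update of Q(state, action)
        state = next_state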
CONCLUSION
The traditional snake game, developed many years ago, is implemented here using artificial
intelligence: the Deep Q-Learning algorithm is used to train a neural network so that the
snake becomes able to take decisions on its own. Using this algorithm, the snake
successfully masters the skill of maximizing the score by eating apples while avoiding
collisions with itself and with the wall, which makes Deep Q-Learning a very reliable
algorithm for this task.