
ARTIFICIAL INTELLIGENCE

Project Report

“Implementation of Snake Game using

Deep Q-Learning Algorithm”

Submitted To:
Sir Fahad
Submitted By:
Mahak Urooj
EE-38-B
Snake Game using Deep Q-Learning

ABSTRACT

This project focuses on designing a snake game using artificial intelligence. An agent
will be trained using the Deep Q-Learning algorithm until it becomes smart enough to
make the decisions required in the snake game on its own. The first snake game was
programmed in 1978, and after that different versions of the game were developed and
played by humans. Now we will use Artificial Intelligence and Neural Networks to
achieve the same goal.

OBJECTIVES

The main objective of this project is to design a snake game using the Deep Q-Learning
algorithm. In the snake game we aim to control a snake that can move forward, left and
right, and eats apples to gain score. The objective of the game is to maximize the score
by eating as many apples as possible.

GAME RULES

The rules of the game designed are as follows:

 The playing field for the snake is a 2-Dimensional area.


 The snake can move in 4 directions on the grid, i.e. up, down, left and right, but it
cannot move in the direction opposite to its head (towards its tail).
 The snake should avoid biting itself, otherwise the game will end.
 The snake should eat as many apples as possible to maximize the score.
 When the snake eats an apple, the length of the snake will grow by one unit.
 The increase in length appears at the tail of the snake.
 After the apple is eaten, another apple is placed at a random position in the
environment, where each square has the same probability of receiving the apple (a
minimal placement sketch follows these rules).
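
As an illustration of the apple-placement rule, the following is a minimal sketch in
Python; the grid and snake representation are assumptions made for illustration and
are not taken from the report.

# Minimal sketch of uniform random apple placement: every free square has the
# same probability of receiving the apple.
import random

def place_apple(grid_size, snake_body):
    # snake_body: set of (row, column) squares currently occupied by the snake.
    free_squares = [(y, x)
                    for y in range(grid_size)
                    for x in range(grid_size)
                    if (y, x) not in snake_body]
    return random.choice(free_squares)   # uniform over all free squares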

Q-LEARNING

Q-Learning is a reinforcement learning algorithm that involves an agent and an environment.


The agent interacts with the environment by performing an Action, and the environment
returns a Reward for that action together with the next State. The goal of the agent is to
maximize its cumulative reward, which eventually leads it to play the game well. The
problem is modelled as a Markov Decision Process, which is used to control the snake
and make decisions about the future.

So we can see that the algorithm consists of 5 main components:

 Agent
 Environment
 State
 Action
 Reward

The Q-Learning algorithm creates a matrix (the Q-Table) in which, for every state, a
value is stored for each possible action in that state. Every time the agent moves around
in the environment, the matrix is updated and new values are assigned to the affected
entries. In a given state, the action with the biggest value is considered the best action
to take. When the number of states and actions increases, it becomes difficult for this
tabular approach to store and approximate all of the values; a minimal sketch of the
tabular update is shown below.
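
The following is a rough illustration of the tabular update; the learning rate, discount
factor and exploration rate are assumed example values and are not specified in the
report.

# Tabular Q-Learning sketch: Q[(state, action)] holds the value of taking
# 'action' in 'state'; the Bellman update moves it towards the observed target.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1      # assumed hyper-parameters
ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)                      # unseen pairs default to 0.0

def choose_action(state):
    # Epsilon-greedy: mostly the best known action, occasionally a random one.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])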

DEEP Q-LEARNING

In our snake game, we cannot use a Q-Table (matrix) to manage the states and rewards
because:

 The food can be placed randomly anywhere in the environment.
 The snake can be in any random configuration or position and it can be of any
length.

Due to the above mentioned points, the number of possible state-action pairs becomes
practically unbounded, which makes a tabular implementation infeasible. In order to
mitigate this problem of the tabular Q-Learning algorithm, Deep Q-Learning is used. In
this algorithm, a neural network is used to approximate the Q-value of each state-action
pair, which helps the agent to maximize the score.
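
A minimal sketch of such a Q-network is shown below, assuming PyTorch and the
24-input / 4-output layout described later in the report; the hidden-layer size is an
assumption, not a value given in the report.

# Q-network sketch: maps a 24-value state vector to one Q-value per action.
import torch
import torch.nn as nn

q_network = nn.Sequential(
    nn.Linear(24, 64),   # 24 inputs: 3 distances in each of 8 directions
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 4),    # 4 outputs: one Q-value per possible move
)

state = torch.rand(1, 24)              # dummy state vector for illustration
q_values = q_network(state)            # estimated Q-value for each action
best_action = q_values.argmax(dim=1)   # index of the action to take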

LOSS

The deep neural network optimizes the answer (action) to a specific input (state), trying
to maximize the reward. The value that expresses how far the prediction is from the
target is called the Loss. The job of the neural network is to minimize the loss, reducing
the difference between the real target and the predicted one. In our case, the loss is the
squared difference between the Bellman target and the current prediction:

loss = ( reward + γ · max_a' Q(s', a') - Q(s, a) )²

When the loss is minimized, the performance of the agent (the snake in our case) will
improve, and it will take less time to calculate which path is the most suitable for
reaching the apple in order to increase the score.
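
Under the same assumptions as the network sketch above (PyTorch, one transition at a
time), the loss can be computed roughly as follows; this is an illustrative sketch, not the
exact implementation used in the project.

# DQN loss sketch: squared difference between the Bellman target and Q(s, a).
import torch
import torch.nn.functional as F

GAMMA = 0.9   # assumed discount factor

def dqn_loss(q_network, state, action, reward, next_state, done):
    q_pred = q_network(state)[0, action]                 # predicted Q(s, a)
    with torch.no_grad():
        q_next = q_network(next_state).max().item()      # max_a' Q(s', a')
    target = reward + (0.0 if done else GAMMA * q_next)  # Bellman target
    return F.mse_loss(q_pred, torch.tensor(target))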

WORKING

The neural network in the Deep Q-Learning algorithm takes the current state of the
agent as input and outputs the action to be taken by the agent. As the agent interacts
with the environment, the parameters of the neural network are updated so that it
outputs the actions which give a more positive reward to the agent. The values are
updated according to the Bellman equation, given as follows:

Q_new(s, a) = Q(s, a) + α · [ reward + γ · max_a' Q(s', a') - Q(s, a) ]

The working of the algorithm step by step can be listed down as follows:

1. When the game starts, the values of the variables are randomly initialized.
2. After that, the current state (S) of the agent is taken by the system as an input.
3. Then the agent executes an action chosen with the help of the neural network. In the
initial stage of training, the agent takes more random actions to explore the 2-D
playing field, but after a certain amount of training it starts to rely more on the
information provided by the neural network.
4. After the action is performed, the reward for the action and the new state are
returned, and the approximated values are updated in the Q-Table using the
Bellman equation mentioned above.
5. The data of the current state of the agent, the action performed by the agent, the
reward and the new state are all stored. The stored data is then sampled to train the
neural network while the agent keeps playing.
6. Steps 2 to 5 are repeated until a certain stopping condition is fulfilled (a compact
sketch of this training loop is given below).
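
The following sketch follows these steps and reuses the q_network and dqn_loss
sketches above; SnakeGame, its reset()/step() methods (returning states as (1, 24)
tensors) and the hyper-parameter values are assumptions made for illustration, not
details taken from the report.

# Training-loop sketch following steps 1-6: act, observe, store, sample, train.
import random
from collections import deque
import torch

game = SnakeGame()                                  # hypothetical environment wrapper
memory = deque(maxlen=10_000)                       # stored transitions (step 5)
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)
EPSILON, BATCH_SIZE = 0.1, 32

state = game.reset()                                # steps 1-2: initial state
for step in range(50_000):                          # step 6: stopping condition
    # Step 3: random action early on, network-driven action later.
    if random.random() < EPSILON:
        action = random.randrange(4)
    else:
        action = q_network(state).argmax().item()

    # Step 4: perform the action, receive the reward and the new state.
    next_state, reward, done = game.step(action)

    # Step 5: store the transition and train on a sampled mini-batch.
    memory.append((state, action, reward, next_state, done))
    if len(memory) >= BATCH_SIZE:
        batch = random.sample(memory, BATCH_SIZE)
        loss = sum(dqn_loss(q_network, *t) for t in batch) / BATCH_SIZE
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    state = game.reset() if done else next_state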

BLOCK DIAGRAM

The block diagram of the Deep Q-Learning Algorithm is as follows:

TRAINING OF AGENT

In order to train the agent, an incremental approach is used. First, a small environment
is used for the game, so that the snake bumps into the food (apple) more often and
realizes that eating the apple is the main purpose of its life. Then the environment is
made bigger and bigger, so that the snake has to take more calculated steps to reach the
apple, and by doing so it learns to play the game.
The snake is able to see in a total of 8 directions. In each of these directions, the snake
makes 3 different kinds of calculations:

 Distance of snake head to food


 Distance of snake head to its own body
 Distance of snake head to the wall

So 3 measurements x 8 directions = 24

Inputs = 24

Outputs = 4, as the snake can move in 4 directions (a sketch of building this 24-value
input vector is given below).
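
The grid representation and the choice of inverse-distance features in this sketch are
assumptions made for illustration and may differ from the project's actual code.

# State-vector sketch: for each of the 8 directions around the head, record
# (inverse) distance to the food, to the snake's own body, and to the wall.
import numpy as np

DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]   # 8 directions of vision

def state_vector(head, body, apple, grid_size):
    features = []
    for dy, dx in DIRECTIONS:
        dist_food = dist_body = 0.0
        y, x = head
        steps = 0
        while 0 <= y < grid_size and 0 <= x < grid_size:
            y, x = y + dy, x + dx                    # walk one square further
            steps += 1
            if (y, x) == apple and dist_food == 0.0:
                dist_food = 1.0 / steps
            if (y, x) in body and dist_body == 0.0:
                dist_body = 1.0 / steps
        dist_wall = 1.0 / steps                      # inverse distance to the wall
        features += [dist_food, dist_body, dist_wall]
    return np.array(features, dtype=np.float32)      # shape: (24,)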

MAIN FUNCTIONS

The main functions of the algorithm can be listed as follows:

Function Name                       Working done by the Function

s = game.state()                    Get the current state s

act = best_action( Q(s) )           Execute the best action act given state s

rew = game.reward()                 Receive the reward rew for that action

s' = game.state()                   Get the next state s'

Q(s, act) = update_QTable()         Update the Q-Table value of (s, act)

The value in the Q-Table is updated using the Bellman equation once the new state and
the reward are both known. After the table is updated, the next best action is selected on
the basis of the biggest action value in that particular state. If the maximum value is 0,
the snake takes a random action, which can be ‘up’, ‘down’, ‘left’ or ‘right’ (a small
sketch of this selection rule is shown below).
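
A small sketch of this selection rule, with hypothetical names chosen for illustration:

# Pick the action with the largest Q-value; fall back to a random move when
# the maximum value is 0, as described above.
import random

ACTIONS = ["up", "down", "left", "right"]

def best_action(q_values):
    # q_values: list of 4 Q-values for the current state, one per action.
    if max(q_values) == 0:
        return random.choice(ACTIONS)
    return ACTIONS[q_values.index(max(q_values))]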

REWARDS

The rewards decided in the algorithm are as follows:


Action                                           Reward

Snake catches the apple                          +1

Snake bites itself                               -1

Snake hits the wall                              -1

Snake explores the environment without dying     +0.1
The above mentioned rewards are used for evaluating the actions taken by the snake in
each state (a minimal reward-function sketch is given below).
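
This sketch uses the reward values listed in the table above; the event names are
hypothetical and chosen only for illustration.

# Reward function sketch: +1 for the apple, -1 for dying, +0.1 for surviving
# a step while exploring the environment.
def reward_for(event):
    if event == "ate_apple":
        return +1.0
    if event in ("hit_self", "hit_wall"):
        return -1.0
    return +0.1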

PSEUDO CODE

The pseudo code for the Q-Learning algorithm, which forms the basis of the snake game
implementation, is as follows:

Initialize Q_ø(s, a) for all pairs (s, a)

s = initial state
k = 0
while (convergence is not achieved)
{
    simulate action a and reach state s'
    if (s' is a terminal state)
    {
        target = R(s, a, s')
    }
    else
    {
        target = R(s, a, s') + γ · max_a' Q_k(s', a')
    }

    ø_(k+1) = ø_k - α · ∇_ø E_(s' ~ P(s' | s, a)) [ (Q_ø(s, a) - target(s'))² ] | ø = ø_k

    s = s'
    k = k + 1
}
CONCLUSION

The traditional snake game, developed many years ago, is implemented using artificial
intelligence, where we used the Deep Q-Learning algorithm to train the neural network
so that the snake becomes able to take decisions on its own. Using this algorithm, the
snake successfully masters the skill of maximizing the score by eating apples while
avoiding collisions with itself and the wall, which makes it a very reliable approach for
this task.
