Artificial Intelligence: Project Report "Implementation of Snake Game Using Deep Q-Learning Algorithm"
Project Report
Submitted To:
Sir Fahad
Submitted By:
Mahak Urooj
EE-38-B
Snake Game using Deep Q-Learning
ABSTRACT
The project focuses on designing a snake game using artificial intelligence. In this
project, an agent will be trained using the Deep Q-Learning algorithm until it becomes
smart enough to make the decisions required in the snake game on its own. The first
snake game was programmed in 1978, and after that different versions of the game
were developed and played by humans. Now we use artificial intelligence and neural
networks to achieve the same goal.
OBJECTIVES
The main objective of this project is to design a snake game using the Deep Q-Learning
algorithm. In the snake game, we aim to control a snake that is able to move forward,
left and right, and that eats an apple to gain score. The objective of the game is to
maximize the score by eating as many apples as possible.
GAME RULES
The snake moves continuously across the playing field and, at each step, can keep going
straight or turn left or right. Eating an apple increases the score and makes the snake
grow longer. The game ends when the snake collides with the boundary wall or with its
own body.
Q-LEARNING
Q-Learning is a reinforcement learning technique built around five elements:
Agent: the learner (here, the snake) that takes actions.
State: the current situation of the agent in the game.
Action: a move the agent can make in a given state.
Reward: the feedback the agent receives after performing an action.
Environment: the playing field with which the agent interacts.
The Q-Learning algorithm builds a matrix (the Q-Table) that lists, for every state, a value
for each possible action: an estimate of the reward that action is expected to earn. Every
time the agent moves around in the environment, the matrix gets updated and new values
are assigned to the state-action pairs. The action with the biggest value is considered the
best action to take. However, when the number of states and actions increases, it becomes
difficult for the algorithm to store and approximate good values for all of them.
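As a concrete illustration, the sketch below shows how such a table and its update could
look in Python for a tabular agent. The state encoding, the learning-rate and discount
constants, and the helper names are assumptions made for illustration, not taken from the
project code.

import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (illustrative value)
GAMMA = 0.9    # discount factor (illustrative value)
ACTIONS = ["up", "down", "left", "right"]

# Q-Table: one Q-value per action for every state the agent has visited.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state, epsilon=0.1):
    # With probability epsilon explore randomly, otherwise take the best-known action.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    # Bellman update for one (state, action, reward, next_state) step.
    best_next = max(q_table[next_state].values())
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])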
DEEP Q-LEARNING
In our snake game, we cannot use a Q-Table (matrix) to manage the states and rewards,
because the state depends on the position of the snake's head, the position of the apple
and the position of every segment of the snake's body, and the body keeps growing as the
game progresses. Because of this, the number of possible state-action pairs becomes far
too large to enumerate, which makes a plain Q-Table impractical. In order to mitigate this
problem of the Q-Learning algorithm, Deep Q-Learning is practiced. In this algorithm, a
neural network is used to approximate the Q-value of each action in a given state, helping
the agent to maximize the score.
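A minimal sketch of such a Q-network is given below, using PyTorch. The hidden-layer
sizes are assumptions for illustration; the 24 inputs match the state described later in
the report, and the 3 outputs correspond to the forward, left and right actions from the
objectives.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximates Q-values: input is the state vector, output is one value per action.
    def __init__(self, state_size=24, hidden_size=128, num_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_actions),
        )

    def forward(self, state):
        return self.net(state)

model = QNetwork()
q_values = model(torch.zeros(1, 24))   # shape (1, 3): one Q-value per action for a dummy state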
LOSS
The deep neural network optimizes its answer (the action) to a specific input (the state),
trying to maximize the reward. The value that expresses how good the prediction is, is
called the loss. The job of the neural network is to minimize the loss, reducing the
difference between the real target and the predicted value. In our case, the loss is the
squared error between the Bellman target and the predicted Q-value:

loss = (reward + gamma * max_a' Q(new_state, a') - Q(state, action))^2

When the loss is minimized, the performance of the agent (the snake, in our case) will
improve, and it will take less time to calculate which path is the most suitable for
reaching the apple in order to increase the score.
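The short sketch below shows how this loss could be computed for a single transition,
reusing the QNetwork defined above; the function and variable names are illustrative.

import torch
import torch.nn.functional as F

GAMMA = 0.9  # discount factor (illustrative value)

def compute_loss(model, state, action, reward, next_state, done):
    # Squared error between the Bellman target and the predicted Q-value.
    q_pred = model(state)[0, action]               # Q(state, action) predicted by the network
    with torch.no_grad():
        q_next = model(next_state).max().item()    # max over actions of Q(new_state, a')
    target = reward if done else reward + GAMMA * q_next
    return F.mse_loss(q_pred, torch.tensor(target, dtype=torch.float32))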
WORKING
The neural network in the Deep Q-Learning algorithm takes the current state of the agent
as input and outputs the action to be taken. As the agent interacts with the environment,
the parameters of the neural network are updated so that it favours the actions which give
the agent a more positive reward. The values are updated according to the Bellman
equation, given as follows:

Q(state, action) <- Q(state, action) + alpha * (reward + gamma * max_a' Q(new_state, a') - Q(state, action))

where alpha is the learning rate and gamma is the discount factor.
The working of the algorithm can be listed step by step as follows:
1. When the game starts, the values of the variables (the network weights) are randomly initialized.
2. After that, the current state (S) of the agent is taken by the system as input.
3. Then the agent executes an action that depends on the neural network. In the
initial stage of training, the agent takes mostly random actions to explore the
2-D playing field, but after some training it starts to rely more on the
information provided by the neural network.
4. After the action is performed, the reward for the action and the new state are
returned, and the approximated Q-values are updated using the Bellman equation
mentioned above.
5. The data describing the current state of the agent, the action performed by the
agent, the reward and the new state are all stored. The stored data is then sampled
to train the neural network.
6. Steps 2 to 5 are repeated again and again until a certain condition is fulfilled,
for example the end of a game. A code sketch of this loop is given after the list.
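The sketch below illustrates one pass of this loop with a replay memory, building on the
QNetwork and compute_loss sketches above. It assumes a game environment object whose
step() method returns the new state, the reward and a done flag; the memory size, batch
size and epsilon value are illustrative.

import random
from collections import deque

import torch

memory = deque(maxlen=100_000)     # stores (state, action, reward, next_state, done) tuples
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
BATCH_SIZE = 64

def play_step(env, state, epsilon):
    # Step 3: mostly random actions early in training, network-driven actions later.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        with torch.no_grad():
            action = model(state).argmax().item()

    # Step 4: the environment returns the reward and the new state.
    next_state, reward, done = env.step(action)

    # Step 5: store the transition, then train on a random sample of past transitions.
    memory.append((state, action, reward, next_state, done))
    if len(memory) >= BATCH_SIZE:
        batch = random.sample(memory, BATCH_SIZE)
        loss = torch.stack([compute_loss(model, *t) for t in batch]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return next_state, done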
BLOCK DIAGRAM
TRAINING OF AGENT
In order to train the agent, an incremental approach is used. First, a small environment
is used for the game, so that the snake bumps into the food (the apple) more often and
realizes that eating the apple is the main purpose of its life. Then the environment is
made bigger and bigger, so that the snake has to take more calculated steps to reach the
apple, and by doing so it learns to play the game.
The snake is able to see in a total of 8 directions. In each of these directions, the
snake makes 3 different kinds of measurements, so the neural network receives
3 measurements x 8 directions = 24 inputs.
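The sketch below shows one common way to build such a 24-element state vector, assuming
the 3 measurements per direction are the distances to the wall, to the apple, and to the
snake's own body; these particular measurements and the function signature are assumptions
for illustration.

# The 8 viewing directions as (dx, dy) steps on the grid.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]

def get_state(head, apple, body, grid_size):
    # Returns 24 values: for each of 8 directions, distance to wall, apple and own body.
    state = []
    for dx, dy in DIRECTIONS:
        dist_apple = dist_body = 0.0
        x, y = head
        steps = 0
        while 0 <= x < grid_size and 0 <= y < grid_size:
            x, y = x + dx, y + dy
            steps += 1
            if (x, y) == apple and dist_apple == 0.0:
                dist_apple = 1.0 / steps        # a closer apple gives a stronger signal
            if (x, y) in body and dist_body == 0.0:
                dist_body = 1.0 / steps         # a closer body segment gives a stronger signal
        dist_wall = 1.0 / max(steps, 1)         # distance to the wall in this direction
        state.extend([dist_wall, dist_apple, dist_body])
    return state                                # 8 directions x 3 measurements = 24 inputs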
MAIN FUNCTIONS
The value in the Q-Table is updated using the Bellman equation once both the new state and
the reward are known. After the table is updated, the next action is selected as the one
with the highest Q-value in the current state. If the maximum value is 0, the snake instead
takes a random action, which can be 'up', 'down', 'left' or 'right'.
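A small sketch of this selection rule is shown below, reusing the q_table from the earlier
tabular example; the tie-handling is illustrative.

import random

ACTIONS = ["up", "down", "left", "right"]

def next_action(state):
    # Pick the action with the highest Q-value; fall back to a random move if the best value is 0.
    values = q_table[state]            # one Q-value per action for this state
    best = max(values, key=values.get)
    if values[best] == 0:              # nothing learned yet for this state
        return random.choice(ACTIONS)
    return best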
REWARDS
The agent receives a positive reward when it eats an apple and a negative reward when it
collides with the wall or with its own body.
PSEUDO CODE
The pseudo code for the Q-learning algorithm, which forms the basis for the snake game
implementation, is as follows:
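The listing below is a representative sketch rather than the project's exact code: it
reuses the choose_action and update helpers from the tabular example above and assumes a
game environment object whose reset() and step() methods return states and rewards; the
number of episodes is illustrative.

NUM_EPISODES = 1_000    # illustrative number of training games

for episode in range(NUM_EPISODES):
    state = env.reset()                              # start a new game and get the initial state
    done = False
    while not done:
        action = choose_action(state, epsilon=0.1)   # explore sometimes, otherwise exploit
        next_state, reward, done = env.step(action)  # act, then observe reward and new state
        update(state, action, reward, next_state)    # Bellman update of Q(state, action)
        state = next_state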
CONCLUSION
The traditional snake game, developed many years ago, is implemented here using artificial
intelligence: the Deep Q-Learning algorithm is used to train a neural network so that the
snake becomes able to take decisions on its own. Using this algorithm, the snake
successfully masters the skill of maximizing the score by eating apples while avoiding
collisions with itself and with the wall, which makes Deep Q-Learning a very reliable
algorithm for this task.