B38DF_LS2b_performance

The document discusses computer architecture focusing on parallelism to improve performance, detailing instruction-level and processor-level parallelism. It explains concepts like pipelining, Amdahl's Law, and performance analysis with examples illustrating the impact of system upgrades on execution speed. The document emphasizes the limitations of parallel computing and the importance of identifying which parts of a task can benefit from improvements.

B38DF

Computer Architecture and Embedded Systems

Alexander Belyaev

Heriot-Watt University
School of Engineering & Physical Sciences
Electrical, Electronic and Computer Engineering

E-mail: a.belyaev@hw.ac.uk
Office: EM2.29

Based on the slides prepared by Dr. Mustafa Suphi Erden


Processors – Part-1
Parallelism – Improving Performance

Two forms of parallelism:

• Instruction-level parallelism
Parallelism within individual instructions to get more
instructions/sec

• Processor-level parallelism
Multiple CPUs work together on the same problem

Instruction-level Parallelism

• A major bottleneck in instruction execution speed:
Fetching instructions from memory

• Prefetch buffer registers (first-step solution)
  • Instructions are fetched from memory in advance into the prefetch registers
  • Instruction execution is split into two parts: fetching + actual execution

• Pipelining (profound solution)
  • Instruction execution is split into many (a dozen or more) parts

Pipelining

[Figure: (a) A five-stage pipeline. (b) The state of each stage as a function of time; nine clock cycles are illustrated.]
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.

Pipelining – Processing Rate Computation

• Clock cycle time: 2 nsec (assumed)

• Instruction processing time: 5 × 2 nsec = 10 nsec

• Without pipelining:
Processing rate = 1 instruction / 10 nsec
= 100 MIPS (millions of instructions per second)

• With pipelining:
Processing rate = 5 × (1 instruction / 10 nsec)
= 500 MIPS (millions of instructions per second)

Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.
Latency and Processor Bandwidth
• Latency:

How long it takes to execute an instruction

• Processor bandwidth
How many MIPS (millions of instructions per second) the CPU has

Let the clock cycle time be $T$ nanoseconds (nsec) and let the pipeline have $n$ stages.

Latency = $nT$ nsec

One instruction completes every clock cycle, so:

Number of instructions executed per second = $10^9 / T$

Bandwidth = $(10^9 / T) / 10^6$ MIPS = $1000 / T$ MIPS
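
The two formulas above can be checked numerically with a minimal sketch. The function name `pipeline_metrics` and its parameters are hypothetical, and the code assumes, as the slide does, an ideal pipeline that completes one instruction every clock cycle.

```python
def pipeline_metrics(cycle_time_ns: float, stages: int) -> tuple[float, float]:
    """Return (latency in nsec, bandwidth in MIPS) for an ideal pipeline.

    Assumes one instruction completes every clock cycle (no stalls).
    """
    latency_ns = stages * cycle_time_ns             # Latency = nT
    instructions_per_second = 1e9 / cycle_time_ns   # 10^9 / T
    bandwidth_mips = instructions_per_second / 1e6  # = 1000 / T
    return latency_ns, bandwidth_mips


# Values from the processing-rate slide: T = 2 nsec, n = 5 stages.
latency, bandwidth = pipeline_metrics(cycle_time_ns=2.0, stages=5)
print(f"latency = {latency} nsec, bandwidth = {bandwidth} MIPS")
# prints: latency = 10.0 nsec, bandwidth = 500.0 MIPS
```

Note that adding stages at a fixed cycle time raises the latency but leaves the bandwidth unchanged, since bandwidth depends only on T.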

Pipelining – Processing Rate Computation
• Clock cycle time: 2 nsec (assumed)

• Instruction processing time: 5 × 2 nsec = 10 nsec

• Without pipelining (latency):
Processing rate = 1 instruction / 10 nsec
= 100 MIPS (millions of instructions per second)

• With pipelining (throughput, i.e. processor bandwidth):
Processing rate = 5 × (1 instruction / 10 nsec)
= 500 MIPS (millions of instructions per second)

Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.
Performance Analysis
• How much does the performance of a computing system improve when its components are improved?

• A task executed by a system whose resources are improved, compared to an initial similar system, can be split into two parts:
  • a part that does not benefit from the improvement;
  • a part that benefits from the improvement.

Source: Wikipedia

Performance Analysis

Example:
• A computer program that processes files from disk.
• A part of that program may scan the directory of the disk and
create a list of files internally in memory.
• Another part of the program passes each file to a separate
thread for processing.
• The part that scans the directory and creates the file list cannot be sped up on a parallel computer, but the part that processes the files can.

Source: Wikipedia

Amdahl’s Law and its Derivation
Amdahl's law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, who presented it in 1967.

Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors.

$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

Source: Wikipedia

Amdahl’s Law and its Derivation
For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason, parallel computing with many processors is useful only for highly parallelizable programs.

$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

Source: Wikipedia
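
To see the 20× ceiling emerge, the sketch below evaluates the formula for p = 0.95 with a growing number of processors. It assumes that the parallelizable part speeds up linearly with the processor count (s equals the number of processors), which is an idealization not stated in the slides. The name `amdahl_speedup` is a hypothetical helper.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is sped up s times."""
    return 1.0 / ((1.0 - p) + p / s)


p = 0.95  # 19 of the 20 hours can be parallelized
for processors in (1, 2, 8, 64, 1024, 65536):
    # Assumption: the parallel part scales perfectly, so s = processors.
    print(f"{processors:>6} processors: speedup {amdahl_speedup(p, processors):.2f}")

# Upper bound as s grows without limit: 1 / (1 - p).
print(f"limit: {1.0 / (1.0 - p):.2f}")  # prints: limit: 20.00
```

However many processors are added, the printed speedup stays below the limit, because the serial hour is never shortened.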
Amdahl’s Law and its Derivation
$S_{\text{latency}}$ is the theoretical speedup of the execution of the whole task;
$s$ is the speedup of the part of the task that benefits from improved system resources;
$p$ is the proportion of execution time that the part benefiting from improved resources originally occupied.

$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

$$S_{\text{latency}}(s) < S_{\text{latency}}(\infty) = \frac{1}{1 - p}$$

Source: Wikipedia

Amdahl’s Law and its Derivation
$T$ is the execution time of the whole task before the improvement.
$p$ is the proportion of execution time that the part benefiting from improved resources originally occupied.

Before speedup:

$$T = (1 - p)T + pT$$

After speedup:

$$T(s) = (1 - p)T + \frac{p}{s}T = \left(1 - p + \frac{p}{s}\right)T$$

Therefore:

$$S_{\text{latency}}(s) = \frac{T}{T(s)} = \frac{1}{1 - p + \frac{p}{s}}$$

Source: Wikipedia
Example 1
If 30% of the execution time may be the subject of a speedup, p will be 0.3.

If the improvement makes the affected part twice as fast, s = 2.

Amdahl's law states that the overall speedup of applying the improvement will be:

$$S_{\text{latency}}(s) = \frac{1}{1 - 0.3 + \frac{0.3}{2}} = \frac{1}{0.85} \approx 1.18$$

Source: Wikipedia
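
The same result can be reached through execution times, following the derivation T(s) = (1 − p)T + (p/s)T. This is a minimal sketch: the 100-second baseline and the name `time_after_speedup` are illustrative assumptions. The speedup does not depend on the baseline chosen.

```python
def time_after_speedup(total_time: float, p: float, s: float) -> float:
    """T(s) = (1 - p) * T + (p / s) * T."""
    unaffected = (1.0 - p) * total_time  # part that does not benefit
    improved = (p / s) * total_time      # part that runs s times faster
    return unaffected + improved


T = 100.0  # hypothetical original execution time, in seconds
T_s = time_after_speedup(T, p=0.3, s=2.0)
print(f"T(s) = {T_s:.1f} s, speedup = {T / T_s:.2f}")
# prints: T(s) = 85.0 s, speedup = 1.18
```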

Example 2
We are given a serial task which is split into four consecutive parts, whose percentages of execution time are
p1 = 0.11, p2 = 0.18, p3 = 0.23, p4 = 0.48.

We are told that:
• the 1st part is not sped up, so s1 = 1,
• the 2nd part is sped up 5 times, so s2 = 5,
• the 3rd part is sped up 20 times, so s3 = 20,
• the 4th part is sped up 1.6 times, so s4 = 1.6.

Before the speedup:

$$T = T_1 + T_2 + T_3 + T_4 = p_1 T + p_2 T + p_3 T + p_4 T$$

After the speedup:

$$T(s) = \frac{p_1}{s_1}T + \frac{p_2}{s_2}T + \frac{p_3}{s_3}T + \frac{p_4}{s_4}T$$

$$\frac{T}{T(s)} = \frac{1}{\frac{p_1}{s_1} + \frac{p_2}{s_2} + \frac{p_3}{s_3} + \frac{p_4}{s_4}}$$

By using Amdahl's law, the overall speedup is

$$S_{\text{latency}} = \frac{1}{\frac{p_1}{s_1} + \frac{p_2}{s_2} + \frac{p_3}{s_3} + \frac{p_4}{s_4}} = \frac{1}{\frac{0.11}{1} + \frac{0.18}{5} + \frac{0.23}{20} + \frac{0.48}{1.6}} = \frac{1}{0.4575} \approx 2.19$$

Notice how the 5 times and 20 times speedups on the 2nd and 3rd parts respectively don't have much effect on the overall speedup when the 4th part (48% of the execution time) is accelerated by only 1.6 times.

Source: Wikipedia

(run Maple code)
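
As an alternative to the Maple code mentioned on the slide (which is not reproduced here), the multi-part form of Amdahl's law can be sketched in Python. The function name `overall_speedup` and the input check are assumptions made for illustration.

```python
def overall_speedup(fractions: list[float], speedups: list[float]) -> float:
    """Multi-part Amdahl's law: 1 / sum(p_i / s_i)."""
    if len(fractions) != len(speedups):
        raise ValueError("each part needs exactly one speedup factor")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions of execution time must sum to 1")
    return 1.0 / sum(p / s for p, s in zip(fractions, speedups))


fractions = [0.11, 0.18, 0.23, 0.48]
speedups = [1.0, 5.0, 20.0, 1.6]
print(f"overall speedup = {overall_speedup(fractions, speedups):.2f}")
# prints: overall speedup = 2.19

# Leaving parts 2 and 3 alone and speeding up only part 4 by 1.6 times
# already accounts for a large share of that gain.
only_part4 = overall_speedup(fractions, [1.0, 1.0, 1.0, 1.6])
print(f"part 4 alone    = {only_part4:.2f}")
```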
Example 3
A serial program consists of two parts A and B for which $T_A = 3$ s and $T_B = 1$ s, so part A takes 75% of the execution time and part B takes 25%.

If part B is made to run 5 times faster, then

$$S_{\text{latency},B} = \frac{1}{0.75 + \frac{0.25}{5}} = 1.25$$

If part A is made to run 2 times faster, then

$$S_{\text{latency},A} = \frac{1}{\frac{0.75}{2} + 0.25} = 1.6$$

Therefore, making part A run 2 times faster is better than making part B run 5 times faster.

Source: Wikipedia
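
The comparison can also be computed directly from the part times rather than from fractions. In this minimal sketch, the name `speedup_from_times` and the dictionary layout are hypothetical choices for illustration.

```python
def speedup_from_times(times: dict[str, float], factors: dict[str, float]) -> float:
    """Overall speedup of a serial program given per-part speedup factors.

    Parts not listed in `factors` are left unchanged (factor 1).
    """
    before = sum(times.values())
    after = sum(t / factors.get(name, 1.0) for name, t in times.items())
    return before / after


times = {"A": 3.0, "B": 1.0}  # seconds, from the example
print(f"B 5x faster: {speedup_from_times(times, {'B': 5.0}):.2f}")  # prints: B 5x faster: 1.25
print(f"A 2x faster: {speedup_from_times(times, {'A': 2.0}):.2f}")  # prints: A 2x faster: 1.60
```

The smaller factor wins here because it is applied to the part that dominates the run time.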
Percentage Improvement
The percentage improvement in speed can be calculated as

$$PI = 100\left(1 - \frac{1}{S_{\text{latency}}}\right)$$

Improving part A by a factor of 2 will increase overall program speed by a factor of 1.60, which makes it

$$PI = 100\left(1 - \frac{1}{1.60}\right) = 37.5\%$$

faster than the original computation.

Improving part B by a factor of 5 will increase overall program speed by a factor of 1.25, which makes it

$$PI = 100\left(1 - \frac{1}{1.25}\right) = 20\%$$

faster than the original computation.

Source: Wikipedia
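
A short sketch of the percentage-improvement formula follows. The function name `percentage_improvement` is hypothetical. The quantity it returns is the percentage by which the execution time shrinks for a given overall speedup.

```python
def percentage_improvement(speedup: float) -> float:
    """PI = 100 * (1 - 1 / S): percentage reduction in execution time."""
    return 100.0 * (1.0 - 1.0 / speedup)


print(f"part A twice as fast (S = 1.60): {percentage_improvement(1.60):.1f}%")
# prints: part A twice as fast (S = 1.60): 37.5%
print(f"part B 5x as fast    (S = 1.25): {percentage_improvement(1.25):.1f}%")
# prints: part B 5x as fast    (S = 1.25): 20.0%
```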

Example 4
Amdahl’s Law gives us a handy way to estimate the performance
improvement we can expect when we upgrade a system component.

• On a large system, suppose we can upgrade a CPU to make it 50% faster for £10,000 or upgrade its disk drives for £7,000 to make them 150% faster.

• Processes spend 70% of their time running in the CPU and 30% of their time waiting for disk service.

• An upgrade of which component would offer the greater benefit for the lesser cost?

Example 4
• The processor option offers a 30% speedup:

$$S = \frac{1}{1 - 0.7 + \frac{0.7}{1.5}} \approx 1.30$$

• And the disk drive option gives a 22% speedup:

$$S = \frac{1}{1 - 0.3 + \frac{0.3}{2.5}} \approx 1.22$$

• Each 1% of improvement for the processor costs £333, and for the disk a 1% improvement costs £318.

Should price/performance be your only concern?

If your disks are nearing the end of their expected life, or if you're running out of disk space, you might consider the disk upgrade even if it were to cost more than the processor upgrade.
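
The price/performance figures in this example can be reproduced with the sketch below. It follows the slide in reading "50% faster" as s = 1.5 and "150% faster" as s = 2.5, and in rounding the overall gain to a whole percent before dividing the cost. The function names are hypothetical.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is sped up s times."""
    return 1.0 / ((1.0 - p) + p / s)


def cost_per_percent(p: float, s: float, cost_gbp: float) -> tuple[int, int]:
    """Return (overall gain in whole percent, cost in GBP per 1% of gain)."""
    gain_percent = round(100.0 * (amdahl_speedup(p, s) - 1.0))
    return gain_percent, round(cost_gbp / gain_percent)


# (fraction of time affected, component speedup, upgrade cost in GBP)
options = {
    "processor": (0.7, 1.5, 10_000),
    "disk": (0.3, 2.5, 7_000),
}
for name, (p, s, cost) in options.items():
    gain, per_percent = cost_per_percent(p, s, cost)
    print(f"{name}: {gain}% overall gain, about £{per_percent} per 1%")
# prints: processor: 30% overall gain, about £333 per 1%
# prints: disk: 22% overall gain, about £318 per 1%
```

The gain used here, 100(S − 1), is the increase in speed. It is a different measure from the percentage improvement PI = 100(1 − 1/S) defined earlier, so the two should not be mixed when comparing options.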
