B38DF_LS2b_performance
Alexander Belyaev
Heriot-Watt University
School of Engineering & Physical Sciences
Electrical, Electronic and Computer Engineering
E-mail: a.belyaev@hw.ac.uk
Office: EM2.29
• Instruction-level parallelism
Parallelism within individual instructions to get more
instructions/sec
• Processor-level parallelism
Multiple CPUs work together on the same problem
Instruction-level Parallelism
Pipelining
a) A five-stage pipeline
b) The state of each stage as a function of time. Nine clock cycles are
illustrated
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.
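The stage-by-time picture described in the caption can be reproduced with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction k enters stage 1 in cycle k and moves forward one stage per cycle. The names `pipeline_schedule` and `format_schedule` and the stage labels S1 to S5 are illustrative, not taken from the figure.

```python
def pipeline_schedule(num_stages=5, num_cycles=9):
    """Return, for each clock cycle, the instruction number held by each stage.

    Entry [c][s] is the instruction in stage s+1 during cycle c+1, or None
    if that stage is still empty. Assumes an ideal pipeline (no stalls).
    """
    schedule = []
    for cycle in range(1, num_cycles + 1):
        row = []
        for stage in range(1, num_stages + 1):
            # Instruction k occupies stage s during cycle k + s - 1.
            instr = cycle - stage + 1
            row.append(instr if instr >= 1 else None)
        schedule.append(row)
    return schedule


def format_schedule(schedule):
    """Render the schedule as a text table, one row per stage."""
    num_cycles = len(schedule)
    num_stages = len(schedule[0])
    lines = ["Cycle: " + " ".join(f"{c:>2}" for c in range(1, num_cycles + 1))]
    for s in range(num_stages):
        cells = " ".join(
            f"{schedule[c][s]:>2}" if schedule[c][s] is not None else " ."
            for c in range(num_cycles)
        )
        lines.append(f"S{s + 1}:    " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_schedule(pipeline_schedule()))
```

In this idealised model the first instruction leaves the last stage in cycle 5, and one further instruction completes in every cycle after that.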
Pipelining – Processing Rate Computation
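• Clock cycle time:
2 nsec (assume)
• Without pipelining (five stages, so one instruction completes every 5 × 2 nsec = 10 nsec):
Processing Rate = 1 instruction / 10 nsec
= 100 MIPS (Millions of Instructions Per Second)
• With pipelining (once the pipeline is full, one instruction completes every 2 nsec):
Processing Rate = 5 × (1 instruction / 10 nsec) = 1 instruction / 2 nsec
= 500 MIPS (Millions of Instructions Per Second)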
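The two rates can be checked with a few lines of Python. This is a minimal sketch under the slide's assumptions (2 nsec clock cycle, five stages, pipeline kept full); the helper name `mips` and the constant names are made up for illustration.

```python
def mips(ns_per_instruction):
    """Convert the time between completed instructions (in nsec) to MIPS."""
    instructions_per_second = 1e9 / ns_per_instruction
    return instructions_per_second / 1e6


CYCLE_TIME_NS = 2.0  # clock cycle time assumed on the slide
NUM_STAGES = 5       # five-stage pipeline

# Without pipelining, an instruction occupies the CPU for all five stages.
unpipelined_mips = mips(NUM_STAGES * CYCLE_TIME_NS)

# With a full pipeline, one instruction completes in every clock cycle.
pipelined_mips = mips(CYCLE_TIME_NS)

if __name__ == "__main__":
    print(f"without pipelining: {unpipelined_mips:.0f} MIPS")
    print(f"with pipelining:    {pipelined_mips:.0f} MIPS")
```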
Latency and Processor Bandwidth
• Latency:
How long it takes to execute one instruction
• Processor bandwidth:
How many MIPS (millions of instructions per second) the CPU has
For a pipeline with n stages and a (clock) cycle time of T nanoseconds (nsec):
Latency = nT nsec
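The relation between latency and bandwidth can be explored with the sketch below. It assumes an ideal pipeline without stalls, so that N instructions need n + N − 1 cycles in total: n cycles for the first instruction and one more cycle for each later one. The function names are illustrative.

```python
def latency_ns(n_stages, cycle_ns):
    """Latency of one instruction: it must pass through all n stages."""
    return n_stages * cycle_ns


def total_time_ns(n_stages, cycle_ns, n_instructions):
    """Time to finish n_instructions in an ideal pipeline (no stalls)."""
    return (n_stages + n_instructions - 1) * cycle_ns


def effective_mips(n_stages, cycle_ns, n_instructions):
    """Average rate over the run; 1 instruction/nsec equals 1000 MIPS."""
    return n_instructions * 1000.0 / total_time_ns(n_stages, cycle_ns, n_instructions)


if __name__ == "__main__":
    n, T = 5, 2.0  # values from the pipelining example
    print("latency:", latency_ns(n, T), "nsec")
    for count in (1, 5, 100, 10**6):
        print(count, "instructions:", round(effective_mips(n, T, count), 2), "MIPS")
```

With n = 5 and T = 2 nsec, a single instruction runs at the unpipelined rate, while a long instruction stream approaches one instruction per cycle.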
Performance Analysis
• How much does the performance of a computing system
improve when its components are improved?
Source: Wikipedia
Performance Analysis
Example:
• A computer program that processes files from disk.
• A part of that program may scan the directory of the disk and
create a list of files internally in memory.
• Another part of the program passes each file to a separate
thread for processing.
• The part that scans the directory and creates the file list
cannot be sped up on a parallel computer, but the part that
processes the files can.
Source: Wikipedia
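The structure of such a program might look like the following sketch, which uses Python's standard `concurrent.futures` thread pool. The per-file work (`process_file`, here just counting bytes) is a placeholder chosen for illustration, and `scan_directory` and `process_directory` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def scan_directory(path):
    """Serial part: build the in-memory list of files."""
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


def process_file(filename):
    """Parallelisable part: placeholder per-file work (count the bytes)."""
    with open(filename, "rb") as handle:
        return len(handle.read())


def process_directory(path, workers=4):
    files = scan_directory(path)  # runs once, on a single thread
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(process_file, files))  # one task per file
    return dict(zip(files, sizes))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        for i in range(3):
            with open(os.path.join(demo_dir, f"file{i}.txt"), "w") as f:
                f.write("x" * (i + 1))
        print(process_directory(demo_dir))
```

Adding workers shortens only the `pool.map` step; the directory scan takes the same time however many threads are available. In CPython, threads mainly help when the per-file work is I/O-bound.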
Amdahl’s Law and its Derivation
Amdahl's law is a formula that gives the theoretical speedup in
latency of the execution of a task at fixed workload that can be expected of
a system whose resources are improved. It is named after computer
scientist Gene Amdahl, who presented it in 1967.
$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$
Source: Wikipedia
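As a minimal sketch, the formula translates directly into a small Python function. The name `amdahl_speedup` and the input checks are illustrative choices; here p is the fraction of the original execution time that benefits from the improvement and s is the speedup of that fraction.

```python
def amdahl_speedup(p, s):
    """Theoretical overall speedup S_latency(s) = 1 / ((1 - p) + p / s)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Half of the run time is made twice as fast.
    print(round(amdahl_speedup(0.5, 2), 3))
    # Nothing benefits (p = 0): no overall speedup.
    print(amdahl_speedup(0.0, 10))
```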
Amdahl’s Law and its Derivation
$S_{\text{latency}}$ is the theoretical speedup of the execution of the whole task;
$s$ is the speedup of the part of the task that benefits from improved
system resources;
$p$ is the proportion of execution time that the part benefiting from
improved resources originally occupied.
$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$
$$S_{\text{latency}}(s) < S_{\text{latency}}(\infty) = \frac{1}{1 - p}$$
Source: Wikipedia
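The upper bound can be seen numerically. In the sketch below the value p = 0.9 is an arbitrary illustrative choice, and the function names are hypothetical; however large s becomes, the overall speedup stays below 1/(1 − p).

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time is sped up by s."""
    return 1.0 / ((1.0 - p) + p / s)


def speedup_limit(p):
    """Limit of the overall speedup as s grows without bound (p < 1)."""
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.9  # illustrative value, not taken from the slides
    for s in (2, 10, 100, 1000, 10**6):
        print(f"s = {s:>7}: S = {amdahl_speedup(p, s):.4f}")
    print(f"limit:       S = {speedup_limit(p):.4f}")
```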
Amdahl’s Law and its Derivation
T is the execution time of the whole task before the improvement.
p is the proportion of execution time that the part benefiting from
improved resources originally occupied.
Before speedup:
$$T = (1 - p)\,T + p\,T$$
After speedup:
$$T(s) = (1 - p)\,T + \frac{p}{s}\,T = \left(1 - p + \frac{p}{s}\right) T$$
Hence
$$S_{\text{latency}}(s) = \frac{T}{T(s)} = \frac{1}{1 - p + \frac{p}{s}}$$
Source: Wikipedia
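The derivation can be checked numerically by computing the execution time before and after the improvement and taking their ratio. The values T = 100, p = 0.3 and s = 2 below are purely illustrative, and `time_after_speedup` is a hypothetical helper name.

```python
def time_after_speedup(total_time, p, s):
    """T(s): the unaffected part keeps its time, the improved part shrinks by s."""
    unaffected = (1.0 - p) * total_time
    improved = p * total_time / s
    return unaffected + improved


if __name__ == "__main__":
    T, p, s = 100.0, 0.3, 2.0  # illustrative values
    T_s = time_after_speedup(T, p, s)
    from_times = T / T_s                        # S = T / T(s)
    from_formula = 1.0 / ((1.0 - p) + p / s)    # closed form
    print(T_s, round(from_times, 4), round(from_formula, 4))
```

The ratio does not depend on T, which cancels, so only p and s appear in the final formula.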
Example 1
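If 30% of the execution time may be the subject of a
speedup, p will be 0.3. If the improvement makes that part twice as fast,
s will be 2, and
$$S_{\text{latency}}(2) = \frac{1}{1 - 0.3 + \frac{0.3}{2}} = \frac{1}{0.85} \approx 1.18$$
Source: Wikipedia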
Percentage Improvement
The percentage improvement in speed can be calculated as
$$PI = 100 \left(1 - \frac{1}{S_{\text{latency}}}\right)$$
Improving part A by a factor of 2 will increase overall program speed by a factor of 1.60,
which makes it
$$PI = 100 \left(1 - \frac{1}{1.60}\right) = 37.5\%$$
faster than the original computation.
Improving part B by a factor of 5 will increase overall program speed by a factor of 1.25,
which makes it
$$PI = 100 \left(1 - \frac{1}{1.25}\right) = 20\%$$
faster than the original computation.
Source: Wikipedia
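The two cases can be reproduced with the sketch below. The time fractions are not stated above; the code assumes part A takes 75% and part B takes 25% of the original run time, because those values give the quoted overall factors of 1.60 and 1.25. The function names are illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time is sped up by s."""
    return 1.0 / ((1.0 - p) + p / s)


def percentage_improvement(speedup):
    """PI = 100 * (1 - 1/S): share of the original run time that is saved."""
    return 100.0 * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Assumed fractions: part A = 0.75, part B = 0.25 of the run time.
    cases = {"A": (0.75, 2.0), "B": (0.25, 5.0)}
    for name, (p, s) in cases.items():
        overall = amdahl_speedup(p, s)
        print(f"part {name}: S = {overall:.2f}, PI = {percentage_improvement(overall):.1f}%")
```

Under these assumptions, the modest factor-2 improvement of the larger part helps more than the factor-5 improvement of the smaller part.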
Example 4
Amdahl’s Law gives us a handy way to estimate the performance
improvement we can expect when we upgrade a system component.
• Processes spend 70% of their time running in the CPU and 30% of their
time waiting for disk service.
• An upgrade of which component would offer the greater benefit for the
lesser cost?
Example 4
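• The processor option (s = 1.5 applied to the 70% of time spent in the CPU) offers a 30% speedup:
$$S = \frac{1}{1 - 0.7 + \frac{0.7}{1.5}} \approx 1.30$$
• The disk option (s = 2.5 applied to the 30% of time spent waiting for disk service) offers a 22% speedup:
$$S = \frac{1}{1 - 0.3 + \frac{0.3}{2.5}} \approx 1.22$$
• Each 1% of improvement for the processor costs £333, and for the disk
a 1% improvement costs £318.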
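The comparison can be laid out in a short sketch. It takes the fractions, the component speedups and the cost per 1% of improvement from the example; the total upgrade cost is derived here as rounded gain times cost per percent, which is an assumption of this sketch and not a figure from the slides. The names are illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time is sped up by s."""
    return 1.0 / ((1.0 - p) + p / s)


# (fraction of time, component speedup, cost in pounds per 1% of improvement)
OPTIONS = {
    "processor": (0.7, 1.5, 333),
    "disk": (0.3, 2.5, 318),
}


def summarise(p, s, cost_per_percent):
    overall = amdahl_speedup(p, s)
    gain_percent = round((overall - 1.0) * 100.0)  # e.g. 1.30 -> 30%
    # Derived estimate (assumption): total cost = gain * cost per percent.
    return overall, gain_percent, gain_percent * cost_per_percent


if __name__ == "__main__":
    for name, (p, s, cost) in OPTIONS.items():
        overall, gain, total = summarise(p, s, cost)
        print(f"{name}: S = {overall:.2f}, gain = {gain}%, "
              f"£{cost} per 1%, about £{total} in total")
```

On these numbers the processor upgrade gives the larger overall gain, while the disk upgrade is slightly cheaper per percentage point of improvement.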