
B38DF

Computer Architecture and Embedded Systems

Alexander Belyaev

Heriot-Watt University
School of Engineering & Physical Sciences
Electrical, Electronic and Computer Engineering

E-mail: a.belyaev@hw.ac.uk
Office: EM2.29

Based on the slides prepared by Dr. Mustafa Suphi Erden

B38DF: Computer Architecture and Embedded Systems 1/20


Processors – Part-1
Parallelism – Improving Performance

Two forms of parallelism:

• Instruction-level parallelism
Parallelism within individual instructions to get more
instructions/sec

• Processor-level parallelism
Multiple CPUs work together on the same problem

2/20
Instruction-level Parallelism

• A major bottleneck in instruction execution speed:
  fetching of instructions from memory

• Prefetch buffer registers (first-step solution)
  • Fetch instructions from memory in advance into the prefetch registers
  • Instruction execution in two parts: fetching + actual execution

• Pipelining (the more profound solution)
  • Instruction execution divided into many (a dozen or more) stages

3/20
Pipelining

Figure: (a) A five-stage pipeline. (b) The state of each stage as a
function of time; nine clock cycles are illustrated.
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.

4/20
Pipelining – Processing Rate Computation

• Clock cycle time: 2 nsec (assumed)

• Instruction processing time: 5 stages × 2 nsec = 10 nsec

• Without pipelining:
  Processing Rate = 1 instruction / 10 nsec
                  = 100 MIPS (Millions of Instructions Per Second)

• With pipelining (one instruction completes every clock cycle):
  Processing Rate = 5 × (1 instruction / 10 nsec)
                  = 500 MIPS (Millions of Instructions Per Second)
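As a quick sanity check, here is a minimal Python sketch of the two rates
above (the variable names are mine; the numbers are the slide's assumptions):

# Processing rate with and without pipelining.
CYCLE_TIME_NS = 2.0      # clock cycle time in nanoseconds (assumed)
STAGES = 5               # pipeline stages

instr_time_ns = STAGES * CYCLE_TIME_NS           # 10 ns per instruction

# Without pipelining: one instruction finishes every 10 ns.
serial_mips = 1e9 / instr_time_ns / 1e6          # 100.0 MIPS

# With pipelining: one instruction finishes every clock cycle (2 ns).
pipelined_mips = 1e9 / CYCLE_TIME_NS / 1e6       # 500.0 MIPS

print(serial_mips, pipelined_mips)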
5/20
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.
Latency and Processor Bandwidth
• Latency:
  How long it takes to execute an instruction

• Processor bandwidth:
  How many MIPS (millions of instructions per second) the CPU delivers
------------------------------------------------------------------------------------------------------
(Clock) cycle time: T nanoseconds (nsec)

n stages in the pipeline

Latency = nT nsec

One instruction completes every clock cycle, so the number of
instructions executed per second = 10^9 / T

Bandwidth = (10^9 / T) / 10^6 MIPS = 1000/T MIPS
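A minimal Python sketch of these two relations (T in nanoseconds, n pipeline
stages; function names are mine):

def latency_ns(n_stages: int, cycle_ns: float) -> float:
    # Each instruction spends one cycle in each of the n stages: latency = nT.
    return n_stages * cycle_ns

def bandwidth_mips(cycle_ns: float) -> float:
    # One instruction completes per cycle: (10^9 / T) / 10^6 = 1000/T MIPS.
    return 1000.0 / cycle_ns

print(latency_ns(5, 2.0))    # 10.0 nsec
print(bandwidth_mips(2.0))   # 500.0 MIPS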

6/20
Pipelining – Processing Rate Computation
• Clock cycle time: 2 nsec (assumed)

• Instruction processing time: 5 stages × 2 nsec = 10 nsec

• Without pipelining: Latency
  Processing Rate = 1 instruction / 10 nsec
                  = 100 MIPS (Millions of Instructions Per Second)

• With pipelining: Throughput (processor bandwidth)
  Processing Rate = 5 × (1 instruction / 10 nsec)
                  = 500 MIPS (Millions of Instructions Per Second)

7/20
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.
Performance Analysis
• How much does the performance of a computing system
improve when its components are improved?

• A task executed by a system whose resources are improved, compared to
  an initial similar system, can be split up into two parts:
  • a part that does not benefit from the improvement;
  • a part that benefits from the improvement.

Source: Wikipedia

8/20
Performance Analysis

Example:
• A computer program that processes files from disk.
• A part of that program may scan the directory of the disk and
create a list of files internally in memory.
• Another part of the program passes each file to a separate
thread for processing.
• The part that scans the directory and creates the file list cannot be
  sped up on a parallel computer, but the part that processes the files can.

Source: Wikipedia

9/20
Amdahl’s Law and its Derivation
Amdahl's law is a formula which gives the theoretical speedup in latency of
the execution of a task at fixed workload that can be expected of a system
whose resources are improved. It is named after computer scientist Gene
Amdahl, who presented it in 1967.

Amdahl's law is often used in parallel computing to predict the theoretical
speedup when using multiple processors.

S_latency(s) = 1 / ((1 − p) + p/s)
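The law is straightforward to turn into code; here is a minimal Python
sketch (the function name amdahl_speedup is mine, not from the slides):

def amdahl_speedup(p: float, s: float) -> float:
    # Overall speedup when the fraction p of execution time
    # is accelerated by a factor s (Amdahl's law).
    return 1.0 / ((1.0 - p) + p / s)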

Source: Wikipedia 10/20


Amdahl’s Law and its Derivation
For example, if a program needs 20 hours using a single processor core,
and a particular part of the program which takes one hour to execute
cannot be parallelized, while the remaining 19 hours (p = 0.95) of
execution time can be parallelized, then regardless of how many processors
are devoted to a parallelized execution of this program, the minimum
execution time cannot be less than that critical one hour. Hence, the
theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this
reason, parallel computing with many processors is useful only for highly
parallelizable programs.

S_latency(s) = 1 / ((1 − p) + p/s)
Source: Wikipedia 11/20
Amdahl’s Law and its Derivation
S_latency is the theoretical speedup of the execution of the whole task;
s is the speedup of the part of the task that benefits from improved
system resources;
p is the proportion of execution time that the part benefiting from
improved resources originally occupied.

S_latency(s) = 1 / ((1 − p) + p/s)

S_latency(s) < S_latency(∞) = 1 / (1 − p)

Source: Wikipedia
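For the 20-hour example above (p = 0.95), a quick numeric check of this
bound in plain Python, with the formula restated inline:

p = 0.95  # 19 of 20 hours parallelizable, as in the example above
for s in (2, 10, 100, 10_000):
    print(s, 1.0 / ((1.0 - p) + p / s))
# The speedup grows with s but never exceeds the limit:
print(1.0 / (1.0 - p))   # 20.0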

12/20
Amdahl’s Law and its Derivation
T is the execution time of the whole task before the improvement.
p is the proportion of execution time that the part benefiting from
improved resources originally occupied.

T = (1 − p)T + pT                    Before speedup

T(s) = (1 − p)T + (p/s)T             After speedup

T(s) = (1 − p + p/s) T

S_latency(s) = T / T(s) = 1 / (1 − p + p/s)
Source: Wikipedia 13/20
Example 1
If 30% of the execution time may be the subject of a
speedup, p will be 0.3.

If the improvement makes the affected part twice as fast: s=2.

Amdahl's law states that the overall speedup of applying the improvement
will be:

S_latency(s) = 1 / (1 − 0.3 + 0.3/2) = 1 / 0.85 ≈ 1.18
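Plugging these numbers into the formula (a one-line Python check):

p, s = 0.3, 2.0
print(1.0 / ((1.0 - p) + p / s))   # 1.1764... ≈ 1.18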

Source: Wikipedia 14/20


Example 2
We are given a serial task which is split into four consecutive parts, whose
proportions of execution time are
p1 = 0.11, p2 = 0.18, p3 = 0.23, p4 = 0.48.

We are told that:
  the 1st part is not sped up, so s1 = 1,
  the 2nd part is sped up 5 times, so s2 = 5,
  the 3rd part is sped up 20 times, so s3 = 20,
  the 4th part is sped up 1.6 times, so s4 = 1.6.

T = T1 + T2 + T3 + T4 = p1·T + p2·T + p3·T + p4·T

T(s) = (p1/s1)·T + (p2/s2)·T + (p3/s3)·T + (p4/s4)·T

By using Amdahl's law, the overall speedup is

S_latency = 1 / (p1/s1 + p2/s2 + p3/s3 + p4/s4) ≈ 2.2

Notice how the 5× and 20× speedups of the 2nd and 3rd parts respectively
don't have much effect on the overall speedup when the 4th part (48% of
the execution time) is accelerated by only 1.6 times.

Source: Wikipedia
15/20
(run Maple code)
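The slides refer to Maple code that is not included here; a rough Python
equivalent of the same computation is sketched below:

# Generalized Amdahl's law for a task split into consecutive parts:
# S = 1 / (p1/s1 + p2/s2 + ...), where the proportions p_i sum to 1.
def overall_speedup(proportions, speedups):
    return 1.0 / sum(p / s for p, s in zip(proportions, speedups))

p = [0.11, 0.18, 0.23, 0.48]
s = [1.0, 5.0, 20.0, 1.6]
print(overall_speedup(p, s))   # 2.186..., the ≈ 2.2 quoted above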
Example 3
A serial program consists of two parts A and B for which TA = 3 s and
TB = 1 s, so pA = 0.75 and pB = 0.25.

If part B is made to run 5 times faster, then
S_latency,B = 1 / (0.75 + 0.25/5) = 1.25

If part A is made to run 2 times faster, then
S_latency,A = 1 / (0.75/2 + 0.25) = 1.6

Therefore, making part A run 2 times faster is better than making
part B run 5 times faster.

16/20
Source: Wikipedia
Percentage Improvement
 1 
The percentage improvement in 
PI = 100 1 − 
speed can be calculated as
 Sl 
Improving part A by a factor 2 will increase overall program speed by factor 1.60,
which makes it
 1 
PI = 100 1 −  = 37%
 1.60 
faster than the original computation

Improving part B by a factor 5 will increase overall program speed by factor 1.25,
which makes it
 1 
PI = 100 1 −  = 20%
 1.25 
faster than the original computation
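The same comparison in a small Python sketch (restating the two speedups
from Example 3; the function name is mine):

def percent_improvement(speedup: float) -> float:
    # PI = 100 * (1 - 1/S): percentage faster than the original run.
    return 100.0 * (1.0 - 1.0 / speedup)

print(percent_improvement(1.60))   # 37.5  (part A made 2x faster)
print(percent_improvement(1.25))   # 20.0  (part B made 5x faster)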
Source: Wikipedia

17/20
Example 4
Amdahl’s Law gives us a handy way to estimate the performance
improvement we can expect when we upgrade a system component.

• On a large system, suppose we can upgrade a CPU for £10,000 to make it
  50% faster, or upgrade its disk drives for £7,000 to make them 150% faster.

• Processes spend 70% of their time running in the CPU and 30% of their
time waiting for disk service.

• An upgrade of which component would offer the greater benefit for the
lesser cost?

18/20
Example 4
• The processor option offers a 30% speedup:

  S = 1 / ((1 − 0.7) + 0.7/1.5) ≈ 1.30

• And the disk drive option gives a 22% speedup:

  S = 1 / ((1 − 0.3) + 0.3/2.5) ≈ 1.22

• Each 1% of improvement costs £333 for the processor (£10,000/30) and
  £318 for the disk (£7,000/22).
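A small Python sketch of this price/performance comparison (function name
mine; prices and time fractions as given on the slide):

def speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

cpu_gain  = (speedup(0.7, 1.5) - 1.0) * 100    # ≈ 30.4 (% faster)
disk_gain = (speedup(0.3, 2.5) - 1.0) * 100    # ≈ 22.0 (% faster)

# Cost per 1% of overall improvement:
print(10_000 / cpu_gain)   # ≈ 329; the slide rounds the gain to 30%, giving £333
print(7_000 / disk_gain)   # ≈ 319; the slide quotes £318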

Should price/performance be your only concern?

19/20
Example 4
• The processor option offers a 30% speedup:

  S = 1 / ((1 − 0.7) + 0.7/1.5) ≈ 1.30

• And the disk drive option gives a 22% speedup:

  S = 1 / ((1 − 0.3) + 0.3/2.5) ≈ 1.22

• Each 1% of improvement costs £333 for the processor and £318 for the disk.

Should price/performance be your only concern?


If your disks are nearing the end of their expected life, or if you’re running out
of disk space, you might consider the disk upgrade even if it were to cost
more than the processor upgrade. 20/20
