Lec10 Performance
Spring 2022
Performance Metrics
Purchasing perspective
Given a collection of machines, which has the
• best performance?
• least cost?
• best cost/performance?
Design perspective
Faced with design options, which has the
• best performance improvement?
• least cost?
• best cost/performance?
Performance Metrics
Our Goal
Is to understand what factors in the architecture contribute to overall system performance
and the relative importance (and cost) of these factors
Throughput vs. Response Time
Throughput (bandwidth)
• The total amount of work done in a given time
• Important to data center managers
Response Time Matters
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$
EX-1
If computer A runs a program in 10 seconds and computer B runs the same program in 15
seconds, how much faster is A than B?
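A worked check of EX-1, as a minimal Python sketch. The function name `speedup` is an illustrative choice, not from the lecture. It applies the ratio definition from the previous slide: performance is the reciprocal of execution time, so the speedup of A over B is B's time divided by A's.

```python
def speedup(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    performance = 1 / execution_time, so
    performance_X / performance_Y = time_Y / time_X.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# EX-1: A takes 10 s and B takes 15 s for the same program.
n = speedup(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")  # prints: A is 1.5 times faster than B
```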
Performance Factors
• CPU execution time (CPU time): time the CPU spends working on a task
• Does not include time waiting for I/O or running other programs
Review: Machine Clock Rate
Clock rate (clock cycles per second, in MHz or GHz) is the inverse of clock cycle time
(clock period)
$$\text{CC} = \frac{1}{\text{CR}}$$
EX-2: Improving Performance Example
A program runs in 10 seconds on computer A, which has a 2 GHz clock. What clock rate must
computer B have to run this program in 6 seconds? Unfortunately, to accomplish this,
computer B will require 1.2 times as many clock cycles as computer A to run the program.
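One way to work EX-2, sketched in Python. It assumes only that execution time equals clock cycles divided by clock rate. The helper name `required_clock_rate` and the constant `GHZ` are illustrative, not from the lecture.

```python
GHZ = 1e9  # cycles per second in one gigahertz


def required_clock_rate(cycles: float, target_seconds: float) -> float:
    """Clock rate in Hz needed to finish `cycles` clock cycles in `target_seconds`."""
    return cycles / target_seconds


# Computer A: 10 s at 2 GHz, so total cycles = time * clock rate.
cycles_a = 10.0 * 2 * GHZ
# Computer B needs 1.2 times as many cycles as A for the same program.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, target_seconds=6.0)

print(f"A executes {cycles_a:.3g} cycles, B executes {cycles_b:.3g} cycles")
print(f"B needs a {rate_b / GHZ:.1f} GHz clock")
```

Under these numbers B must run at twice A's clock rate: the 10/6 reduction in time is compounded by the 1.2 times increase in cycle count.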
Clock Cycles per Instruction
Effective (Average) CPI
$$\text{Overall effective CPI} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$$
• The overall effective CPI is computed by looking at the different types of
instructions and their individual cycle counts, then averaging
• The overall effective CPI varies by instruction mix, a measure of the dynamic
frequency of instructions across one or many programs
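To illustrate how the effective CPI depends on the instruction mix, here is a minimal Python sketch. It treats each IC_i as the fraction of executed instructions in class i, which is an assumption about the notation. The two mixes and the per-class cycle counts are hypothetical values chosen for illustration.

```python
def effective_cpi(mix):
    """Weighted-average CPI.

    `mix` maps an instruction class to (fraction_of_instructions, cpi).
    The fractions are expected to sum to 1.
    """
    total = sum(frac for frac, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(frac * cpi for frac, cpi in mix.values())


# Hypothetical per-class cycle counts, identical for both programs.
ALU_CPI, LOAD_CPI = 1, 4

# Same machine, two hypothetical programs with different instruction mixes.
compute_heavy = {"alu": (0.9, ALU_CPI), "load": (0.1, LOAD_CPI)}
memory_heavy = {"alu": (0.5, ALU_CPI), "load": (0.5, LOAD_CPI)}

print(f"compute-heavy mix: CPI = {effective_cpi(compute_heavy):.2f}")
print(f"memory-heavy mix:  CPI = {effective_cpi(memory_heavy):.2f}")
```

The per-class cycle counts never change between the two runs; only the mix does, and that alone moves the effective CPI.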
Basic Performance Equation
CPI varies by instruction type and by ISA implementation, so determining it requires
knowing the implementation details
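For reference, a sketch of the equation in its standard textbook form. It assumes IC here denotes the total instruction count of the program, CPI the effective cycles per instruction, and CC and CR the clock cycle time and clock rate from the clock rate review:

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{CC} = \frac{\text{IC} \times \text{CPI}}{\text{CR}}$$

The two forms are equivalent because CC = 1/CR.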
EX-3: Using the Performance Equation
Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps
and an effective CPI of 2.0 for some program and computer B has a clock cycle time of 500
ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how
much?
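EX-3 can be worked with a short Python sketch, assuming the standard form CPU time = instruction count × CPI × clock cycle time. Because both computers implement the same ISA and run the same program, the instruction count is the same on both and cancels in the ratio; the value of `ic` below is an arbitrary placeholder.

```python
PS = 1e-12  # seconds per picosecond


def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


ic = 1_000_000  # arbitrary placeholder; identical for A and B, so it cancels

time_a = cpu_time(ic, cpi=2.0, cycle_time_s=250 * PS)
time_b = cpu_time(ic, cpi=1.2, cycle_time_s=500 * PS)
ratio = time_b / time_a

print(f"A: {time_a / ic / PS:.0f} ps per instruction")
print(f"B: {time_b / ic / PS:.0f} ps per instruction")
print(f"A is {ratio:.2f} times faster than B")
```

A spends 500 ps per instruction on average and B spends 600 ps, so A is faster despite its higher CPI.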
Determinants of CPU Performance
                       Instruction count   CPI   Clock cycle
Programming language           X            X
Compiler                       X            X
ISA                            X            X         X
Core organization                           X         X
Technology                                            X
EX-4
Op       Freq   CPI_i   Freq × CPI_i
ALU      50%    1
Load     20%    5
Store    10%    3
Branch   20%    2
                        Σ =
1. How much faster would the machine be if a better data cache reduced the average
load time to 2 cycles?
2. How does this compare with using branch prediction to shave a cycle off the branch
time?
3. What if two ALU instructions could be executed at once?
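One possible way to work EX-4, as a Python sketch under stated assumptions: the instruction count and clock rate are unchanged by each modification, so the speedup is the ratio of old to new effective CPI; and "two ALU instructions at once" is modeled as halving the ALU CPI to 0.5. The helper names are illustrative.

```python
# Instruction mix from the EX-4 table: op -> (frequency, cycles per instruction).
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def effective_cpi(mix):
    """Sum of Freq x CPI_i over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix, op, new_cpi):
    """Return a copy of `mix` in which `op` takes `new_cpi` cycles."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cpi)
    return changed


base = effective_cpi(BASE_MIX)
scenarios = {
    "1. load takes 2 cycles": with_cpi(BASE_MIX, "Load", 2),
    "2. branch takes 1 cycle": with_cpi(BASE_MIX, "Branch", 1),
    "3. two ALU ops per cycle": with_cpi(BASE_MIX, "ALU", 0.5),  # modeling assumption
}

print(f"base CPI = {base:.2f}")
for name, mix in scenarios.items():
    cpi = effective_cpi(mix)
    print(f"{name}: CPI = {cpi:.2f}, speedup = {base / cpi:.3f}")
```

With this mix, improving loads helps most: loads are only 20% of instructions but account for 1.0 of the base 2.2 cycles per instruction.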
Workloads and Benchmarks
Benchmarks
A set of programs that form a “workload” specifically chosen to measure performance
SPEC CINT2006 on Barcelona (CC = 0.4 × 10^-9 s)
$$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{SPEC ratio}_i}$$
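The geometric mean of a set of SPEC ratios can be sketched in Python as follows. The ratios used here are hypothetical placeholders, not the Barcelona measurements, and the function name is illustrative. The sketch averages logarithms instead of multiplying directly, a common choice that avoids overflow for long lists.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios."""
    ratios = list(ratios)
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need at least one ratio, all positive")
    # exp(mean(log r)) equals the n-th root of the product of the r values.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical SPEC ratios for three benchmarks (not real measurements).
hypothetical_ratios = [2.0, 8.0, 4.0]
print(f"GM = {geometric_mean(hypothetical_ratios):.2f}")
```

Unlike the arithmetic mean, the geometric mean of ratios gives the same relative ranking of machines regardless of which machine is used as the reference.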
Other Performance Metrics
Power Consumption
• Especially in the embedded market where battery life is important
• For power-limited applications, the most important metric is energy efficiency
Highest Clock Rate of Intel Processors