Lecture 2: Metrics To Evaluate Systems

1. The document discusses various metrics for evaluating computer system performance, including power, reliability, cost, benchmark suites, and methods for summarizing performance such as arithmetic mean (AM), geometric mean (GM), and harmonic mean (HM).
2. It provides examples of how to calculate total power consumption and energy usage, and how to compare performance between systems using metrics like AM, GM, and execution time.
3. Key factors that affect power consumption and reliability over time are the increasing number of transistors, leakage power, and memory trends, while techniques like dynamic voltage and frequency scaling can help reduce energy usage.

Uploaded by

Tahsin Arik Tusan

Lecture 2: Metrics to Evaluate Systems

• Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM

• Sign up for the class mailing list!

  - Video 1: Using AM as a performance summary
  - Video 2: GM, Performance Equation
  - Video 3: AM vs. HM vs. GM

Power Consumption Trends

• Dyn power ∝ activity x capacitance x voltage² x frequency

• Capacitance per transistor and voltage are decreasing, but number of transistors is increasing at a faster rate; hence clock frequency must be kept steady

• Leakage power is also rising; it is a function of transistor count, leakage current, and supply voltage

• Power consumption is already between 100-150W in high-performance processors today

• Energy = power x time = (dynpower + lkgpower) x time

Problem 1

• For a processor running at 100% utilization at 100 W, 20% of the power is attributed to leakage. What is the total power dissipation when the processor is running at 50% utilization?

Total power = dynamic power + leakage power
= 80W x 50% + 20W
= 60W
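
The arithmetic in this answer can be sketched in a few lines of Python. The function name `total_power` and its parameters are illustrative, and the model assumes (as the worked answer does) that dynamic power scales linearly with utilization while leakage stays constant.

```python
def total_power(peak_power_w, leakage_fraction, utilization):
    """Total power under a simple model (an assumption, not a general law):
    leakage is constant, dynamic power is proportional to utilization."""
    leakage_w = peak_power_w * leakage_fraction
    dynamic_w = (peak_power_w - leakage_w) * utilization
    return dynamic_w + leakage_w


# Problem 1: 100 W at full utilization, 20% leakage, now at 50% utilization.
print(f"{total_power(100.0, 0.20, 0.50):.1f} W")  # 60.0 W
```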

Power Vs. Energy

• Energy is the ultimate metric: it tells us the true “cost” of performing a fixed task

• Power (energy/time) poses constraints; can only work fast enough to max out the power delivery or cooling solution

• If processor A consumes 1.2x the power of processor B, but finishes the task in 30% less time, its relative energy is 1.2 x 0.7 = 0.84; Proc-A is better, assuming that 1.2x power can be supported by the system

Problem 2

• If processor A consumes 1.4x the power of processor B, but finishes the task in 20% less time, which processor would you pick:
(a) if you were constrained by power delivery constraints?
Proc-B
(b) if you were trying to minimize energy per operation?
Proc-B: Proc-A uses 1.4 x 0.8 = 1.12 times the energy of Proc-B
(c) if you were trying to minimize response times?
Proc-A is faster, but we could scale up the frequency (and power) of Proc-B and match Proc-A’s response time (while still doing better in terms of power and energy)
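
Since energy = power x time, the relative-energy comparisons used here and in the Power Vs. Energy example reduce to one multiplication. A minimal sketch, with `relative_energy` as an illustrative helper name:

```python
def relative_energy(power_ratio, time_ratio):
    """Energy of processor A relative to B, given A's power and time
    as multiples of B's (energy = power x time)."""
    return power_ratio * time_ratio


# Power Vs. Energy example: 1.2x power, 30% less time.
print(f"{relative_energy(1.2, 0.7):.2f}")  # 0.84
# Problem 2: 1.4x power, 20% less time.
print(f"{relative_energy(1.4, 0.8):.2f}")  # 1.12
```

A result below 1 means processor A uses less energy for the task; above 1 means it uses more.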

Reducing Power and Energy

• Can gate off transistors that are inactive (reduces leakage)

• Design for typical case and throttle down when activity exceeds a threshold

• DFS: Dynamic frequency scaling -- only reduces frequency and dynamic power, but hurts energy (the program runs longer while leakage power is unchanged)

• DVFS: Dynamic voltage and frequency scaling -- can reduce voltage and frequency by (say) 10%; can slow a program by (say) 8%, but reduce dynamic power by 27%, reduce total power by (say) 23%, reduce total energy by 17%
(Note: voltage drop → slow transistor → freq drop)
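
The 27% and 17% figures follow from the dynamic-power relation (proportional to voltage² x frequency) and from energy = power x time. A rough sketch: the 23% total-power reduction and 8% slowdown are the "(say)" values from the bullet above, not derived quantities, and the helper name is illustrative.

```python
def dynamic_power_scale(voltage_scale, frequency_scale):
    """Dynamic power is proportional to voltage^2 x frequency."""
    return voltage_scale ** 2 * frequency_scale


# Voltage and frequency both reduced by 10%.
dyn_scale = dynamic_power_scale(0.9, 0.9)
print(f"dynamic power reduction: {(1 - dyn_scale) * 100:.0f}%")  # 27%

# Assumed values from the slide: total power down 23%, run time up 8%.
energy_scale = (1 - 0.23) * 1.08
print(f"total energy reduction: {(1 - energy_scale) * 100:.0f}%")  # 17%
```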

Problem 3

• Processor-A at 3 GHz consumes 80 W of dynamic power and 20 W of static power. It completes a program in 20 seconds.

What is the energy consumption if I scale frequency down by 20%?
New dynamic power = 64W; New static power = 20W
New execution time = 25 secs (assuming CPU-bound)
Energy = 84 W x 25 secs = 2100 Joules

What is the energy consumption if I scale frequency and voltage down by 20%?
New DP = 41W; New static power = 16W;
New exec time = 25 secs; Energy = 1425 Joules
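
Both answers can be reproduced with one small function. This is a sketch under the same assumptions the answers use: dynamic power scales with voltage² x frequency, static power scales linearly with voltage, and the program is CPU-bound so run time scales with 1/frequency. The name `scaled_energy` is illustrative.

```python
def scaled_energy(dyn_w, static_w, time_s, freq_scale, volt_scale=1.0):
    """Energy in joules after scaling frequency and (optionally) voltage."""
    new_dyn = dyn_w * freq_scale * volt_scale ** 2  # dyn power ~ V^2 x f
    new_static = static_w * volt_scale              # assumed: leakage ~ V
    new_time = time_s / freq_scale                  # assumed: CPU-bound
    return (new_dyn + new_static) * new_time


print(f"{scaled_energy(80, 20, 20, 0.8):.0f} J")       # 2100 J (frequency only)
print(f"{scaled_energy(80, 20, 20, 0.8, 0.8):.0f} J")  # 1424 J (frequency and voltage)
```

The second result is 1424 J rather than 1425 J because the answer above rounds the new dynamic power of 40.96 W up to 41 W.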

Other Technology Trends

• DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases

• Disk density improves by 100% every year, latency improvement similar to DRAM

• Emergence of NVRAM technologies that can provide a bridge between DRAM and hard disk drives

• Also, growing concerns over reliability (since transistors are smaller, operating at low voltages, and there are so many of them)

Defining Reliability and Availability

• A system toggles between
  - Service accomplishment: service matches specifications
  - Service interruption: service deviates from specs

• The toggle is caused by failures and restorations

• Reliability measures continuous service accomplishment and is usually expressed as mean time to failure (MTTF)

• Availability measures fraction of time that service matches specifications, expressed as MTTF / (MTTF + MTTR)
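
The availability formula is easy to evaluate directly. In the sketch below the MTTF and MTTR values are hypothetical numbers chosen only for illustration; they do not come from the lecture.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the service matches its specification:
    MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails every 1000 hours on average, 10 hours to repair.
print(f"{availability(1000.0, 10.0):.4f}")  # 0.9901
```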

Cost

• Cost is determined by many factors: volume, yield, manufacturing maturity, processing steps, etc.

• One important determinant: area of the chip

• Small area → more chips per wafer

• Small area → one defect leads us to discard a small-area chip, i.e., yield goes up

• Roughly speaking, half the area → one-third the cost

Measuring Performance

• Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time)

• To optimize throughput, must ensure that there is minimal waste of resources

Benchmark Suites

• Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user

  - SPEC CPU 2006: cpu-oriented programs (for desktops)
  - SPECweb, TPC: throughput-oriented (for servers)
  - EEMBC: for embedded processors/workloads

Summarizing Performance

• Consider 25 programs from a benchmark set – how do we capture the behavior of all 25 programs with a single number?

         P1   P2   P3
Sys-A    10    8   25
Sys-B    12    9   20
Sys-C     8    8   30

  - Sum of execution times (AM)
  - Sum of weighted execution times (AM)
  - Geometric mean of execution times (GM)

Problem 4

• Consider 3 programs from a benchmark set. Assume that system-A is the reference machine. How does the performance of system-C compare against that of system-B (for all 3 metrics)?

         P1   P2   P3   S.E.T   S.W.E.T   GM
Sys-A     5   10   20      35       3.0   10
Sys-B     6    8   18      32       2.9   9.5
Sys-C     7    9   14      30       3.0   9.6

(S.E.T = sum of execution times; S.W.E.T = sum of weighted execution times, each normalized to Sys-A; GM = geometric mean of execution times)

  - Relative to C, B provides a speedup of 1.03 (S.W.E.T) or 1.01 (GM) or 0.94 (S.E.T)
  - Relative to C, B reduces execution time by 3.3% (S.W.E.T) or 1% (GM) or -6.7% (S.E.T)
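
The three summary columns can be recomputed from the raw execution times. The sketch below uses illustrative function names and takes Sys-A as the reference machine for the weighted sum, as the problem states.

```python
import math

times = {
    "Sys-A": [5, 10, 20],
    "Sys-B": [6, 8, 18],
    "Sys-C": [7, 9, 14],
}
reference = times["Sys-A"]


def sum_exec_times(t):
    return sum(t)


def sum_weighted_exec_times(t, ref):
    # Each program's time is normalized to the reference machine's time.
    return sum(x / r for x, r in zip(t, ref))


def geometric_mean(t):
    return math.prod(t) ** (1.0 / len(t))


for name, t in times.items():
    print(name,
          sum_exec_times(t),
          round(sum_weighted_exec_times(t, reference), 2),
          round(geometric_mean(t), 2))

# Speedup of B relative to C under each metric (time of C / time of B).
b, c = times["Sys-B"], times["Sys-C"]
print(round(sum_exec_times(c) / sum_exec_times(b), 2))
print(round(sum_weighted_exec_times(c, reference)
            / sum_weighted_exec_times(b, reference), 2))
print(round(geometric_mean(c) / geometric_mean(b), 2))
```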

Sum of Weighted Exec Times – Example

• We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second

• The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2

• With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X – perhaps, not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal #cycles)
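
The 1.1 figure is the arithmetic mean of the per-program execution times on Y, each normalized to its time on X. A minimal sketch with an illustrative helper name:

```python
def am_normalized(new_times, ref_times):
    """Arithmetic mean of execution times normalized to a reference machine."""
    ratios = [n / r for n, r in zip(new_times, ref_times)]
    return sum(ratios) / len(ratios)


x_times = [1.0, 1.0, 1.0, 1.0]  # reference machine X: 1 second per program
y_times = [0.8, 1.1, 0.5, 2.0]  # new machine Y
print(f"{am_normalized(y_times, x_times):.2f}")  # 1.10
```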

GM Example

        Computer-A   Computer-B   Computer-C
P1      1 sec        10 secs      20 secs
P2      1000 secs    100 secs     20 secs

Conclusion with GMs: (i) A=B
(ii) C is ~1.6 times faster

• For (i) to be true, P1 must occur 100 times for every occurrence of P2

• With the above assumption, (ii) is no longer true

Hence, GM can lead to inconsistencies
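
The inconsistency can be checked numerically: the GMs say A equals B and C is faster, but on the one workload mix where A and B really do tie (100 runs of P1 per run of P2), C is the slowest machine. The helper names below are illustrative.

```python
import math

# (P1 time, P2 time) in seconds for each computer.
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


def workload_time(values, p1_runs, p2_runs):
    """Total time for a workload that runs P1 and P2 the given number of times."""
    return values[0] * p1_runs + values[1] * p2_runs


for name, t in times.items():
    print(name, round(geometric_mean(t), 1), workload_time(t, 100, 1))
```

With 100 runs of P1 per run of P2, A and B both take 1100 seconds while C takes 2020 seconds, even though C has the smallest GM.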

Summarizing Performance

• GM: does not require a reference machine, but does not predict performance very well
  - So we multiplied execution times and determined that sys-A is 1.2x faster…but on what workload?

• AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine
  - Every year or so, the reference machine will have to be updated

CPU Performance Equation

• Clock cycle time = 1 / clock speed

• CPU time = clock cycle time x cycles per instruction x number of instructions

• Influencing factors for each:
  - clock cycle time: technology and pipeline
  - CPI: architecture and instruction set design
  - instruction count: instruction set design and compiler

• CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically

Problem 5

• My new laptop has an IPC that is 20% worse than my old laptop. It has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. What speedup is my new laptop providing?

Exec time = cycle time * CPI * instrs
Perf = clock speed * IPC / instrs
Speedup = new perf / old perf
= (new clock speed * new IPC) / (old clock speed * old IPC)
= 1.3 * 0.8 = 1.04
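
The same calculation as a small function, expressed in terms of new/old ratios. The name `speedup` and the optional `instr_ratio` parameter are illustrative; running the same binaries means the instruction count ratio is 1.

```python
def speedup(clock_ratio, ipc_ratio, instr_ratio=1.0):
    """Speedup = new perf / old perf, where perf = clock speed * IPC / instrs.
    All arguments are new/old ratios."""
    return clock_ratio * ipc_ratio / instr_ratio


# 30% higher clock speed, 20% worse IPC, same binaries.
print(f"{speedup(1.3, 0.8):.2f}")  # 1.04
```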

An Alternative Perspective - I

• Each program is assumed to run for an equal number of cycles, so we’re fair to each program

• The number of instructions executed per cycle is a measure of how well a program is doing on a system

• The appropriate summary measure is sum of IPCs or AM of IPCs = 1.2 instr/cyc + 1.8 instr/cyc + 0.5 instr/cyc

• This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

An Alternative Perspective - II

• Each program is assumed to run for an equal number of instructions, so we’re fair to each program

• The number of cycles required per instruction is a measure of how well a program is doing on a system

• The appropriate summary measure is sum of CPIs or AM of CPIs = 0.8 cyc/instr + 0.6 cyc/instr + 2.0 cyc/instr

• This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

AM and HM

• Note that AM of IPCs = 1 / HM of CPIs and AM of CPIs = 1 / HM of IPCs

• So if the programs in a benchmark suite are weighted such that each runs for an equal number of cycles, then AM of IPCs or HM of CPIs are both appropriate measures

• If the programs in a benchmark suite are weighted such that each runs for an equal number of instructions, then AM of CPIs or HM of IPCs are both appropriate measures
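
The AM/HM identities can be verified numerically with Python's standard `statistics` module. The IPC values below are the ones used in the first Alternative Perspective slide; each CPI is simply the reciprocal of the corresponding IPC.

```python
import math
from statistics import harmonic_mean, mean

ipcs = [1.2, 1.8, 0.5]
cpis = [1.0 / ipc for ipc in ipcs]  # CPI is the reciprocal of IPC

# AM of IPCs = 1 / HM of CPIs
print(math.isclose(mean(ipcs), 1.0 / harmonic_mean(cpis)))  # True
# AM of CPIs = 1 / HM of IPCs
print(math.isclose(mean(cpis), 1.0 / harmonic_mean(ipcs)))  # True
```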

AM vs. GM

• GM of IPCs = 1 / GM of CPIs

• AM of IPCs represents throughput for a workload where each program runs sequentially for 1 cycle each; but high-IPC programs contribute more to the AM

• GM of IPCs does not represent run-time for any real workload (what does it mean to multiply instructions?); but every program’s IPC contributes equally to the final measure

Problem 6

• My new laptop has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. Their IPCs are listed below. I run the binaries such that each binary gets an equal share of CPU time. What speedup is my new laptop providing?

           P1    P2    P3
Old-IPC   1.2   1.6   2.0
New-IPC   1.6   1.6   1.6

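AM of IPCs is the right measure, since each binary gets an equal share of CPU time (an equal number of cycles). Could have also used GM.

Both laptops have an AM of IPCs of 1.6, so the speedup with AM would be 1.3, coming entirely from the higher clock speed.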
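
A short sketch of the calculation using the IPC table from the problem. The GM-based number is included only to show how the alternative summary would differ; the variable and function names are illustrative.

```python
import math

old_ipc = [1.2, 1.6, 2.0]
new_ipc = [1.6, 1.6, 1.6]
clock_ratio = 1.3  # new clock speed / old clock speed


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


speedup_am = clock_ratio * arithmetic_mean(new_ipc) / arithmetic_mean(old_ipc)
speedup_gm = clock_ratio * geometric_mean(new_ipc) / geometric_mean(old_ipc)
print(f"{speedup_am:.2f}")  # 1.30
print(f"{speedup_gm:.2f}")  # 1.33
```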

Speedup Vs. Percentage

• “Speedup” is a ratio = old exec time / new exec time

• “Improvement”, “Increase”, “Decrease” usually refer to percentage relative to the baseline
= (new perf - old perf) / old perf

• A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop
  - What is the speedup? (1/70) / (1/100) ≈ 1.43
  - What is the percentage increase in performance? ( 1/70 - 1/100 ) / (1/100) ≈ 43%
  - What is the reduction in execution time? 30%
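
The three quantities differ only in which baseline they are measured against, which a few lines of Python make explicit. The function names are illustrative; performance is taken as 1 / execution time.

```python
def speedup(old_time, new_time):
    """Ratio of old execution time to new execution time."""
    return old_time / new_time


def perf_increase_pct(old_time, new_time):
    """Percentage increase in performance, with perf = 1 / exec time."""
    old_perf, new_perf = 1.0 / old_time, 1.0 / new_time
    return (new_perf - old_perf) / old_perf * 100.0


def time_reduction_pct(old_time, new_time):
    """Percentage reduction in execution time relative to the old time."""
    return (old_time - new_time) / old_time * 100.0


print(f"{speedup(100, 70):.2f}")              # 1.43
print(f"{perf_increase_pct(100, 70):.1f}%")   # 42.9%
print(f"{time_reduction_pct(100, 70):.1f}%")  # 30.0%
```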