Performance Issues
FACULTY OF SCIENCES
DEPARTMENT OF COMPUTER SCIENCES
ICT4D L3
Performance Issues in
Computer Organization
Author:
CHE SWANSEN S.
5 Performance Laws
5.1 Amdahl’s Law
5.2 Little’s Law
6 Benchmarking
6.1 What is Benchmarking?
6.2 Why is Benchmarking Important?
6.3 Types of Benchmarks
6.3.1 Synthetic Benchmarks
6.3.2 Real-World Benchmarks
6.3.3 Component-Specific Benchmarks
6.4 Key Benchmarking Metrics
6.5 Benchmarking Tools and Suites
6.6 Challenges in Benchmarking
6.7 SPEC Benchmarks: An Example
6.8 Calculating the Mean
6.8.1 Arithmetic Mean
6.8.2 Harmonic Mean
6.8.3 Geometric Mean
6.8.4 Comparison of Means
7 Exercises
7.1 Basic Measures of Computer Performance
7.2 Factors Affecting Processor Performance
7.3 Instruction Pipelining
7.4 Benchmarking and Performance Evaluation
7.5 Advanced Topics: Amdahl’s and Little’s Laws
7.6 Designing for Performance
7.7 Comprehensive Problem
1 Introduction to Performance Issues
Computer architecture plays a crucial role in determining the performance
of computing systems. Performance issues in computer architecture in-
volve understanding the various factors that impact the speed, efficiency,
and cost of executing programs. These factors range from the hardware
design, such as processor architecture and memory organization, to the
software design, including algorithms and compilers. Understanding these
performance issues is essential not only for designing efficient systems but
also for evaluating the trade-offs involved in improving specific aspects of
performance.
The key question in performance analysis is: *How can we design sys-
tems that perform tasks faster and more efficiently while balancing power
consumption, cost, and scalability?* Addressing this question requires a
deep dive into metrics, benchmarks, and architectural innovations that
optimize system performance.
2.2 Performance Balance
Achieving performance balance is critical because improving one aspect
of the system often exposes bottlenecks in another. For example, increas-
ing processor speed without addressing memory access delays can result
in the processor waiting for data. This issue, commonly referred to as
the "memory wall," demonstrates the need to balance processor speed,
memory bandwidth, and I/O performance.
One approach to achieving balance is through the use of caches. Caches
reduce the time required to access frequently used data, bridging the gap
between processor speed and memory latency. Another example is opti-
mizing the instruction set architecture (ISA) to ensure instructions can be
executed efficiently by the processor.
3.1 Cache
The cache is a small, high-speed memory located within or close to the
processor. Its purpose is to store frequently accessed data and instructions,
reducing the time the processor spends fetching data from the slower main
memory (RAM). Caches are organized hierarchically into levels:
• L1 Cache: The smallest and fastest cache is located directly on the
processor core. It stores critical data and instructions.
• L2 Cache: Larger and slower than L1, shared among cores in some
architectures.
• L3 Cache: Even larger and slower, typically shared across all cores
in a multicore processor.
not always indicative of real-world performance. Factors like the number
of instructions executed per cycle (IPC) and memory access delays also
significantly impact overall performance.
3.6 Address Bus Width
The address bus width defines the maximum amount of memory the pro-
cessor can address. For instance:
• A 64-bit address bus can address $2^{64}$ memory locations, which
translates to 16 exabytes ($16 \times 2^{60}$ bytes) of byte-addressable memory.
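As a quick check of this arithmetic, the sketch below computes the addressable memory for a given address bus width. It assumes byte-addressable memory (one byte per address); the function name `addressable_bytes` is illustrative only.

```python
def addressable_bytes(bus_width_bits: int) -> int:
    """Number of distinct addresses for a given address bus width.

    Assumes byte-addressable memory, so addresses == bytes.
    """
    return 2 ** bus_width_bits


GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte (binary exabyte)

print(addressable_bytes(32) // GIB, "GiB")  # 4 GiB
print(addressable_bytes(64) // EIB, "EiB")  # 16 EiB
```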
Figure 2: Instruction Pipelining
However, software must be optimized for multicore systems to realize
their full potential. Tasks that cannot be parallelized, such as sequential
code, may not benefit significantly from additional cores.
5 Performance Laws
5.1 Amdahl’s Law
Amdahl’s Law states that the maximum speedup of a system is limited by
the fraction of the task that cannot be parallelized. It is given by:
\[
\text{Speedup} = \frac{1}{(1 - P) + \frac{P}{N}}
\]
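To see how quickly the serial fraction limits the gain, the formula can be evaluated for a few processor counts. This is a minimal sketch, taking P as the parallelizable fraction and N as the number of processors; the value P = 0.9 is an illustrative assumption, not a figure from these notes.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's Law.

    p: fraction of the task that can be parallelized (0 <= p <= 1)
    n: number of processors (n >= 1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# Illustrative assumption: 90% of the task is parallelizable.
P = 0.9
for n in (1, 2, 4, 8, 16, 1024):
    print(f"N = {n:4d}: speedup = {amdahl_speedup(P, n):.2f}")

# As N grows, the speedup approaches 1 / (1 - P), which is 10 here.
```

Even with 1024 processors the speedup stays below 10, because the 10% of the work that is serial still has to run on a single processor.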
L = 10 × 0.2 = 2 requests.
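The line above is an application of Little's Law, L = λ × W, where L is the average number of requests in the system, λ is the average arrival rate, and W is the average time a request spends in the system. The sketch below reproduces that arithmetic; reading 10 as an arrival rate in requests per second and 0.2 as a time in seconds is an assumption, since the units are not stated here.

```python
def littles_law(arrival_rate: float, time_in_system: float) -> float:
    """Average number of items in the system: L = lambda * W."""
    return arrival_rate * time_in_system


# Assumed units: 10 requests/second arriving, 0.2 seconds per request.
L = littles_law(10, 0.2)
print(f"L = {L:g} requests")
```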
6 Benchmarking
Benchmarking is the process of evaluating the performance of a computer
system, component, or software application by running a set of standard
tests. These tests are designed to measure key performance metrics, such
as speed, throughput, and efficiency. Benchmarking allows comparisons
between systems, identifies performance bottlenecks, and ensures that a
system meets the required performance standards.
numerical results that can be used to compare performance across differ-
ent systems or configurations. The results provide valuable insights into
the strengths and weaknesses of a processor, memory subsystem, or entire
system.
For example, benchmarking a processor may involve tests to measure:
• Multithreaded performance.
• SPEC CPU: Measures CPU performance for integer and floating-
point operations.
• Linpack: Evaluates floating-point computation performance, com-
monly used to rank supercomputers.
6.5 Benchmarking Tools and Suites
Several tools and suites are commonly used for benchmarking:
• SPEC (Standard Performance Evaluation Corporation): A
widely recognized organization that provides benchmarks for CPUs,
memory, and entire systems.
• PassMark: A benchmarking tool that evaluates overall system per-
formance, including CPU, memory, and disk speeds.
• Geekbench: A cross-platform tool that tests single-core and mul-
ticore performance for CPUs and GPUs.
• Cinebench: A popular GPU and CPU benchmark that tests ren-
dering performance.
averages, or means, are used. Each type has specific use cases and impli-
cations. Below are the key types of means and their detailed explanations:
Formula:
\[
\text{Arithmetic Mean} = \frac{\sum_{i=1}^{n} x_i}{n}
\]
Formula:
\[
\text{Harmonic Mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
\]
Example: Consider three tasks with execution rates of 50 tasks/sec, 100 tasks/sec,
and 150 tasks/sec. The harmonic mean of the rates is:
\[
\text{Harmonic Mean} = \frac{3}{\frac{1}{50} + \frac{1}{100} + \frac{1}{150}} = \frac{3}{0.02 + 0.01 + 0.00667} \approx 81.82 \text{ tasks/sec.}
\]
When each rate applies to the same amount of work, this value reflects the
true average rate more accurately than the arithmetic mean (100 tasks/sec
here), especially when the rates vary significantly.
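The calculation above can be checked numerically. The sketch below uses `statistics.harmonic_mean` from the Python standard library and also tests the interpretation behind it: if each rate applies to the same amount of work, the harmonic mean equals total work divided by total time. The workload of 300 tasks per phase is an arbitrary illustrative choice.

```python
from statistics import harmonic_mean, mean

rates = [50, 100, 150]  # tasks/sec, from the example

hm = harmonic_mean(rates)
am = mean(rates)

# Interpretation: run the same amount of work at each rate.
work_per_phase = 300  # tasks; arbitrary illustrative value
total_work = work_per_phase * len(rates)
total_time = sum(work_per_phase / r for r in rates)  # seconds
effective_rate = total_work / total_time

print(f"harmonic mean   = {hm:.2f} tasks/sec")
print(f"arithmetic mean = {am:.2f} tasks/sec")
print(f"effective rate  = {effective_rate:.2f} tasks/sec")
```

The effective rate matches the harmonic mean rather than the arithmetic mean, because the slowest phase takes the most time and therefore carries the most weight.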
6.8.3 Geometric Mean
The geometric mean is ideal for summarizing ratios or benchmark scores
across systems. It is particularly useful when the data involves
multiplicative relationships or growth rates. Because it is less sensitive
than the arithmetic mean to a few very large values, it reduces the
distortion that outliers can cause.
Formula:
\[
\text{Geometric Mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}
\]
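One reason the geometric mean suits benchmark ratios is that the comparison between two systems does not depend on which machine is used as the reference for normalization. The sketch below illustrates this with a log-based implementation; the run times and the function name `geometric_mean` are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times in seconds on three benchmarks.
times_a = [10.0, 20.0, 40.0]
times_b = [20.0, 10.0, 50.0]
times_ref = [30.0, 30.0, 30.0]  # hypothetical reference machine

# Scores normalized to the reference (higher is better).
score_a = geometric_mean([r / t for r, t in zip(times_ref, times_a)])
score_b = geometric_mean([r / t for r, t in zip(times_ref, times_b)])

# The ratio of scores equals the ratio of raw geometric means,
# so the choice of reference machine cancels out.
print(score_a / score_b)
print(geometric_mean(times_b) / geometric_mean(times_a))
```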
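• Arithmetic Mean:
\[
\text{System A: } \frac{90 + 110 + 130}{3} = 110, \qquad \text{System B: } \frac{100 + 120 + 140}{3} = 120.
\]
• Harmonic Mean:
\[
\text{System A: } \frac{3}{\frac{1}{90} + \frac{1}{110} + \frac{1}{130}} \approx 107.5, \qquad \text{System B: } \frac{3}{\frac{1}{100} + \frac{1}{120} + \frac{1}{140}} \approx 117.8.
\]
• Geometric Mean:
\[
\text{System A: } \sqrt[3]{90 \cdot 110 \cdot 130} \approx 108.8, \qquad \text{System B: } \sqrt[3]{100 \cdot 120 \cdot 140} \approx 118.9.
\]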
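The three means for both systems can be computed with the `statistics` module in the Python standard library (`geometric_mean` requires Python 3.8 or later). The scores are the ones used above; the dictionary layout is just one convenient way to organize them.

```python
from statistics import geometric_mean, harmonic_mean, mean

scores = {
    "System A": [90, 110, 130],
    "System B": [100, 120, 140],
}

for name, xs in scores.items():
    print(
        f"{name}: arithmetic = {mean(xs):.1f}, "
        f"harmonic = {harmonic_mean(xs):.1f}, "
        f"geometric = {geometric_mean(xs):.1f}"
    )
```

For any set of positive values the harmonic mean is at most the geometric mean, which is at most the arithmetic mean. All three rank System B ahead of System A here.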
7 Exercises
7.1 Basic Measures of Computer Performance
1. A computer runs at a clock speed of 3.0 GHz. If a program requires
6 billion instructions to execute and the average CPI is 2.5, calculate
the total execution time of the program in seconds.
2. Explain the difference between clock speed and CPI. Why do these
two factors together determine the performance of a processor?
3. List three factors other than clock speed that affect the performance
of a processor and briefly explain each.
• Data bus width
• Address bus width
2. A processor has SPEC scores of 100, 110, and 120 for three bench-
marks. Calculate the geometric mean of these scores.
7.6 Designing for Performance
1. A system designer must choose between increasing clock speed from
2.5 GHz to 3.0 GHz or adding a second processor core. Briefly explain
the trade-offs involved in these decisions.
System A: SPEC scores = [120, 130, 140], System B: SPEC scores = [110, 140, 150].
1. Calculate the geometric mean for both systems and determine which
performs better.