High performance computing

Lectures
BY ME @😁🪄
Introduction to HPC
🪄

What is HPC?
• The ability to process data and perform complex calculations efficiently, reliably, and at high speed.
• Uses supercomputers and computer clusters to solve advanced computation problems.
• The practice of using parallel data processing to improve computing performance and perform complex calculations.

Why is HPC important?
• As technologies like the Internet of Things (IoT), artificial intelligence (AI), and 3-D imaging evolve (three HPC application areas), the size and amount of data that organizations have to work with grows exponentially.

Top HPC Industries
• Healthcare, urban planning
• Engineering, finance, and business
• Aerospace

What is a Supercomputer?
• A computer with a high level of performance compared to a general-purpose computer.
• The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS): a measure of a computer's performance based on the number of floating-point arithmetic calculations that the processor can perform within a second.
• Since 2017, there have existed supercomputers that can perform over 10^17 FLOPS.
• For comparison, a desktop computer has performance in the range of hundreds of giga-FLOPS (10^11) to tens of tera-FLOPS (10^13).

Fields using Supercomputers
• Computational science, weather forecasting, climate research, oil and gas exploration, molecular modeling, nuclear weapons, nuclear fusion.

What is a Cluster?
• A computer cluster is a group of two or more computers, or nodes, that run in parallel to achieve a common goal.
• A cluster is a group of interconnected computers or hosts that work together to support applications.
• Note: big companies prefer using clusters over supercomputers because they have a large number of users.
• The performance of a program depends on the effectiveness of:
1. Algorithms.
2. The software system (OS/compiler).
3. The computer that executes the machine instructions (processor and memory).
4. I/O systems.

- Algorithms: An algorithm is a step-wise representation of a solution to a given problem.


• Searching Algorithms:
1- Linear Search:
• A very basic and simple search algorithm.
• In linear search, we search for an element or value in a given array by traversing the array from the start until the desired element or value is found.
2- Binary Search:
• It is mandatory for the target array to be sorted.
• In binary search, we first determine the middle of the array using the formula mid = low + (high - low) / 2, and then decide which half to search next based on whether the target value is greater or less than the value at mid (see the sketch below).
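For illustration, here is a minimal C sketch of both searches (not from the lecture; the example array, its length n, and the target value are made-up placeholders):

#include <stdio.h>

/* Linear search: scan from the start until the value is found. */
int linear_search(const int a[], int n, int target) {
    for (int i = 0; i < n; i++)
        if (a[i] == target) return i;
    return -1;                                   /* not found */
}

/* Binary search: the array must already be sorted. */
int binary_search(const int a[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;        /* formula from the notes */
        if (a[mid] == target) return mid;
        else if (a[mid] < target) low = mid + 1; /* target is in the upper half */
        else high = mid - 1;                     /* target is in the lower half */
    }
    return -1;
}

int main(void) {
    int a[] = {2, 5, 8, 12, 16, 23, 38};         /* sorted example data */
    int n = sizeof a / sizeof a[0];
    printf("linear: %d, binary: %d\n", linear_search(a, n, 23), binary_search(a, n, 23));
    return 0;
}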
• Sorting Algorithms:
1- The selection sort:
• For ascending order, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in its proper location.
Advantages:
▪ Implementation is very easy.
▪ Suitable for ordering small lists.
▪ Sorts in place; it needs no extra memory, since it swaps numbers within their original locations.
Disadvantages:
▪ Slow.
▪ Blind algorithm: even if the list becomes sorted partway through, it does not terminate early and still performs all the steps.
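A minimal C sketch of the variant described above (each pass selects the largest remaining value); an illustration, not taken from the slides:

/* Selection sort, ascending: each pass finds the largest remaining value
   and swaps it into its final position at the end of the unsorted part. */
void selection_sort(int a[], int n) {
    for (int end = n - 1; end > 0; end--) {
        int max_idx = 0;
        for (int i = 1; i <= end; i++)
            if (a[i] > a[max_idx]) max_idx = i;
        int tmp = a[max_idx];                    /* place the pass's largest value */
        a[max_idx] = a[end];                     /* in its proper location         */
        a[end] = tmp;
    }
}

Note that the passes run unconditionally, which is exactly the "blind" behaviour listed as a disadvantage.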

2- The bubble sort:
• Makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order.
Advantages:
▪ Simple, suitable for ordering small lists.
▪ Sorts in place.
▪ Not blind: if a whole pass makes no swap, the list is already in the right order and the algorithm stops.
Disadvantages:
▪ Slow.
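A matching C sketch (again an illustration, not from the slides) showing the early exit that makes bubble sort "not blind":

/* Bubble sort, ascending, with early termination:
   if a full pass performs no exchange, the list is sorted and we stop. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped) break;                     /* no exchange: already in the right order */
    }
}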
Set of Comparisons:
Algorithm: Linear Search   Binary Search   Selection Sort   Bubble Sort
Big-O:     O(n)            O(log n)        O(n^2)           O(n^2)

Main types of HPC

The HPC clusters:
• Clusters of servers interconnected using a high-speed connection.
• Their sizes range from tens of servers to tens of thousands of servers (10-10,000).
• Solve problems by dividing them into smaller problems (divide and conquer).
• The most popular form of HPC.

Supercomputer:
• Provides high-performance chips.
• Provides processors that work on large amounts of data (vectorization).
• Provides more than one CPU on the same chip.
• CPUs on the same chip share the address space (memory).
• Solves problems faster.
Sequential and Parallel computing
🪄

Sequential computing:
Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU).
• A problem is broken into a discrete series of instructions.
• Instructions are executed one after another.
• Only one instruction may execute at any moment in time.

Parallel computing:
A parallel computer is a computer consisting of:
• two or more processors that can cooperate and communicate to solve a large problem fast,
• one or more memory modules,
• an interconnection network that connects the processors with each other and/or with the memory modules.

Example of parallelizable problems


• Calculate the potential energy of each of several thousand independent conformations of a molecule. When done, find the minimum-energy conformation. (Each conformation here is independent, similar to an item of a matrix.)
o The problem can be solved in parallel: each molecular conformation is determined independently.
o The calculation of the minimum-energy conformation is also a parallelizable problem.

Example of non-parallelizable problems


• Calculation of the Fibonacci series (1, 1, 2, 3, 5, 8, 13, 21, …) using the formula:
F(k+2) = F(k+1) + F(k)
• This is a non-parallelizable problem because the calculation, as shown, entails dependent rather than independent calculations: each term needs the two terms computed before it (see the sketch below).
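To make the dependency concrete, here is a minimal C loop (an illustration, not from the slides); an iteration cannot start until the two previous results exist, so the iterations cannot run concurrently:

#include <stdio.h>

int main(void) {
    long long F[50];
    F[0] = 1; F[1] = 1;                    /* the series from the notes: 1, 1, 2, 3, 5, ... */
    for (int k = 0; k + 2 < 50; k++)
        F[k + 2] = F[k + 1] + F[k];        /* each step depends on the two previous iterations */
    printf("F[49] = %lld\n", F[49]);
    return 0;
}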
Amdahl’s Law
_________________________________________________________________________________

• Used to predict the maximum speedup obtainable when using multiple processors.

Let f = fraction of work performed sequentially, so that
(1 − f) = fraction of work that is parallelizable, and
P = number of processors.

On 1 CPU:          T_1 = f + (1 − f) = 1
On P processors:   T_P = f + (1 − f) / P
Speedup:           S = T_1 / T_P = 1 / ( f + (1 − f) / P ) < 1 / f

Foster’s Design methodology


_________________________________________________________________________________

Problem → Partition → Communicate → Agglomerate → Map


Parallel computer memory Architecture
_________________________________________________________________________________

Shared memory:
• Multiple processors can operate independently but share the same memory resources.
• Changes in a memory location made by one processor are visible to all other processors (global address space).
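For concreteness, a minimal shared-memory sketch in C. OpenMP is used here as one common shared-memory programming model; it is an assumption of this example, not something named in the notes. All threads read and write the same array in the shared address space:

#include <omp.h>
#include <stdio.h>

int main(void) {
    double a[1000];

    /* Every thread sees the same array 'a' (global address space);
       each thread fills a different part of it independently. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a[i] = i * 0.5;

    /* The update made by whichever thread ran i = 999 is visible here. */
    printf("%f\n", a[999]);
    return 0;
}

(Compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp.)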

Distributed memory:
• Requires a communication network to connect the processors.
• Each processor has its own local memory, so:
✓ it operates independently;
✓ changes it makes to its local memory have no effect on the memory of other processors.
• When a processor needs to access data in another processor, it is the task of the programmer to define how and when.

Shared memory:
• Global address space.
• Lack of scalability between memory and the number of CPUs connected to it.
• Adding more CPUs increases traffic.
• The programmer is responsible for specifying synchronization (read & write).

Distributed memory:
• Memory is scalable with the number of processors.
• Each processor can rapidly access its own memory.
• Needs a strong security system.

Hybrid Distributed-Shared Memory(DSM)


• Increased scalability is an important advantage.
• Increased programmer complexity is an important
disadvantage.
Data Parallelism – Flynn's Classical Taxonomy
_________________________________________________________________________________

• Flynn’s taxonomy distinguishes multi-processor computer architectures according to:


o how they can be classified along the two independent dimensions of Instruction Stream and Data Stream.

Single Instruction, Single Data (SISD):
• A serial (non-parallel) computer.
• Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
• Single Data: only one data stream is being used as input during any one clock cycle.
• Deterministic execution.
• This is the oldest type of computer.

Single Instruction, Multiple Data (SIMD):
• A type of parallel computer.
• Single Instruction: all processing units execute the same instruction at any given clock cycle.
• Multiple Data: each processing unit can operate on a different data element.

Multiple Instruction, Single Data (MISD):
• A type of parallel computer.
• Multiple Instruction: each processing unit operates on the data independently via separate instruction streams.
• Single Data: a single data stream is fed into multiple processing units.

Multiple Instruction, Multiple Data (MIMD):
• A type of parallel computer.
• Multiple Instruction: every processor may be executing a different instruction stream.
• Multiple Data: every processor may be working with a different data stream.
Amdahl’s law
🪄

• If the program has a given percentage of parallelism (part serial and part parallel):

speedup = 1 / ( (1 − p) + p / n )

o n → number of cores
o p → fraction of the program's work that is parallel
o hint: (1 − p) → fraction of the program's work that is serial
Ex.1 on Amdahl's law
• Using 3 cores and a 75% parallelized program:
o (1 − p) → fraction of the program's work that is serial = 1 − 0.75 = 0.25
o p → 0.75, n → 3
o speedup = 1 / ( 0.25 + 0.75 / 3 ) = 1 / 0.5 = 2

• The parallelizable portion runs 3 times faster with 3 cores compared to 1 core.
• The non-parallelizable portion runs at the same speed as on 1 core.
• Combining the two parts, the overall speedup is 2×. The 25% of the work that cannot be sped up by adding more cores limits the overall speedup; even with unlimited cores, this program could never exceed 1/0.25 = 4× speedup.
Ex.2: Which is better?
• A 10% parallelized program on 90 cores, or a 90% parallelized program on 10 cores?
o Scenario 1 (10% parallelized program on 90 cores): speedup S1 = 1 / ( 0.90 + 0.10/90 ) ≈ 1.11
o Scenario 2 (90% parallelized program on 10 cores): speedup S2 = 1 / ( 0.10 + 0.90/10 ) ≈ 5.26

• Given the choice between more parallelism in the program and more cores, prefer the higher degree of parallelization: a mostly serial program cannot benefit from extra cores, whereas a highly parallel program makes good use of even a modest number of cores.
• So the smart choice is to focus on maximizing the degree of parallelization in your program, in order to make the most of the available computational resources and achieve better performance.
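A small C helper (a sketch, not part of the notes) that evaluates the formula above; running it reproduces the numbers from Ex.1 and Ex.2:

#include <stdio.h>

/* Amdahl's law: speedup = 1 / ((1 - p) + p / n),
   where p is the parallel fraction and n the number of cores. */
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    printf("75%% parallel on  3 cores: %.2f\n", amdahl_speedup(0.75, 3));   /* 2.00  */
    printf("10%% parallel on 90 cores: %.2f\n", amdahl_speedup(0.10, 90));  /* ~1.11 */
    printf("90%% parallel on 10 cores: %.2f\n", amdahl_speedup(0.90, 10));  /* ~5.26 */
    return 0;
}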
Effectiveness of parallel processing
_________________________________________________________________________________

• p → number of processors
• W(p) → number of operations (total work) performed using p processors
• T(p) → execution time using p processors

Speedup:      S(p) = T(1) / T(p)   (serial time / parallel time)
Efficiency:   E(p) = T(1) / (p · T(p)) = S(p) / p
Redundancy:   R(p) = W(p) / W(1)
Utilization:  U(p) = W(p) / (p · T(p))
Quality:      Q(p) = T(1)^3 / (p · T(p)^2 · W(p))
Example of measuring efficiency
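Since the worked example itself is not reproduced in these notes, the following C sketch shows how the five measures would be computed; the measured values (T(1)=100 and T(4)=30 time units, W(1)=100 and W(4)=120 operations on p=4 processors) are hypothetical, made up purely for illustration:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical measurements, not from the lecture: */
    double T1 = 100.0, Tp = 30.0;   /* serial and parallel execution times          */
    double W1 = 100.0, Wp = 120.0;  /* operations performed serially / in parallel  */
    int    p  = 4;                  /* number of processors                         */

    double S = T1 / Tp;                          /* speedup     */
    double E = S / p;                            /* efficiency  */
    double R = Wp / W1;                          /* redundancy  */
    double U = Wp / (p * Tp);                    /* utilization */
    double Q = pow(T1, 3) / (p * Tp * Tp * Wp);  /* quality     */

    printf("S=%.2f  E=%.2f  R=%.2f  U=%.2f  Q=%.2f\n", S, E, R, U, Q);
    return 0;
}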
Multiprocessor Interconnection networks
🪄

Mode of operation
Synchronous:
• A single global clock is used by all components in the system (lock-step manner).
Asynchronous:
• No global clock is required.
• Handshaking signals are used to coordinate the operation of asynchronous systems.

Control strategy
Centralized:
• One central control unit is used to control the operations of the components of the system.
Decentralized:
• The control function is distributed among the different components in the system.

Switching Techniques
Circuit switching:
• A complete path has to be established prior to the start of communication between a source and a destination.
Packet switching:
• Communication between a source and a destination takes place via messages divided into smaller entities, called packets.
Topology
• Describes how processors and memories are connected to other processors and memories.
Static:
• Direct fixed links are established among the nodes to form a fixed network.
• Paths are fixed; links may be uni-directional or bi-directional between processors.
• For a completely connected network: number of links O(N^2), delay complexity O(1).
Dynamic:
• Connections are established when needed.

Static INs

Dynamic INs

Bus-Based Dynamic INs

Single Bus:
- Simplest way to connect multiprocessor systems.
- The use of local caches reduces the processor-memory traffic.
- The size of such a system varies between 2 and 50 processors.
- Single bus multiprocessors are inherently limited by:
1. The bandwidth of the bus.
2. Only one processor can access the bus at a time.
3. Only one memory access can take place at any given time.

Multiple Bus:
- Several parallel buses interconnect multiple processors and multiple memory modules.
- Many connection schemes are possible:
• Multiple Bus with Full Bus-Memory Connection (MBFBMC).
• Multiple Bus with Single Bus-Memory Connection (MBSBMC).
• Multiple Bus with Partial Bus-Memory Connection (MBPBMC).
• Multiple Bus with Class-based Bus-Memory Connection (MBCBMC).

Switch-Based Dynamic INs

Crossbar:
• Provides simultaneous connections among all its inputs and all its outputs.
• A Switching Element (SE) is at the intersection of any two lines extended horizontally or vertically inside the switch.
• It is a non-blocking network, allowing multiple input-output connection patterns to be achieved simultaneously.
Symmetric and asymmetric multiprocessors
Symmetric:
• All processors have equal access to all peripheral devices.
• All processors are identical.
Asymmetric:
• One processor (the master) executes the operating system.
• Other processors may be of different types and may be dedicated to special tasks.

Parallel Computer Memory Architectures


• Two (three) categories of parallel computers are distinguished on the basis of their memory organization:
o Shared Memory:
1) Uniform Memory Access (UMA)
2) Non-Uniform Memory Access (NUMA)
3) Cache Only Memory Access (COMA)
o Distributed Memory
o Hybrid Memory

Shared Memory
General Characteristics:
o Shared memory parallel computers vary widely, but generally have in common the ability
for all processors to access all memory as global address space.
o Multiple processors can operate independently but share the same memory resources.
o Changes in a memory location effected by one processor are visible to all other processors.
Uniform Memory Access (UMA):
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines.
• Identical processors, with equal access and equal access times to memory.
• Sometimes called Cache Coherent UMA (CC-UMA).
• Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update.
• Cache coherence is accomplished at the hardware level.

Non-Uniform Memory Access (NUMA):
• Often made by physically linking two or more SMPs.
• One SMP can directly access the memory of another SMP.
• Not all processors have equal access times to all memories.
• Memory access across the link is slower.
• If cache coherency is maintained, it may also be called CC-NUMA.

Cache Only Memory Access (COMA):
• The Cache-Only Memory Architecture increases the chances of data being available locally, because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it.
• Each memory module acts as a huge cache memory in which each block has a tag with the address and the state.
• Data can be migrated or replicated in the various memory banks of the central main memory.
MIMD processing

Tightly coupled multiprocessors:
• Shared global memory address space.
• Traditional multiprocessing: symmetric multiprocessing (SMP).
• Programming model similar to uniprocessors.
• Operations on shared data require synchronization.

Loosely coupled multiprocessors:
• No shared global memory address space.
• Multicomputer network: a "network-based multiprocessor".
• Usually programmed via message passing: explicit calls (send, receive) for communication.
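To illustrate the message-passing style mentioned above, here is a minimal MPI sketch in C; MPI is assumed here as a typical message-passing library, since the notes only speak of explicit send/receive calls:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                                      /* lives in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit send to processor 1   */
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* explicit receive from rank 0   */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Because there is no shared address space, the only way rank 1 can see rank 0's data is through this explicit communication.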

Why the Sequential Bottleneck?


• Parallel machines have a sequential bottleneck.
• Main cause: non-parallelizable operations on data (e.g. non-parallelizable loops):

for (i = 1; i < N; i++)
    A[i] = (A[i] + A[i-1]) / 2;   /* loop-carried dependency: iteration i needs the result of iteration i-1 */

• A single thread prepares the data and spawns the parallel tasks; that part is usually sequential.

Task Assignments

Static Assignment:
• No movement of tasks between processors.
• Inefficient: it underutilizes resources when the load is not balanced, because it assumes all processors execute roughly the same number of instructions.
• Ex: multiplying matrix blocks that are all zeros or ones can be faster than multiplying arbitrary values, so equally sized tasks may take unequal time.

Dynamic Assignment:
• Efficient: it better utilizes resources when the load is not balanced.
• Ex: computing a histogram (distribution) of a large set of values, or counting the number of occurrences of specific words in a book.
• Each task could count the occurrences of each word in a set of pages; some pages may be empty or contain few sentences, so those tasks will finish faster (see the scheduling sketch below).
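As a concrete illustration (assumed, not from the notes), OpenMP exposes both policies through its schedule clause; with uneven work per iteration, dynamic scheduling keeps all threads busy:

#include <omp.h>
#include <stdio.h>

/* Simulated uneven work: some "pages" take much longer than others. */
static long work(int page) {
    long s = 0;
    for (long i = 0; i < (page % 7) * 1000000L; i++) s += i;
    return s;
}

int main(void) {
    long total = 0;

    /* Static assignment: iterations are divided among the threads once, up front. */
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int page = 0; page < 1000; page++) total += work(page);

    /* Dynamic assignment: an idle thread grabs the next chunk of iterations,
       which balances the load when iterations take unequal time. */
    #pragma omp parallel for schedule(dynamic, 8) reduction(+:total)
    for (int page = 0; page < 1000; page++) total += work(page);

    printf("%ld\n", total);
    return 0;
}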
Parallel speedup example
R = a4·x^4 + a3·x^3 + a2·x^2 + a1·x + a0

• Assume each operation takes 1 second and there is no communication cost.


• How fast is this with a single processor?
• How fast is this with 3 processors?
