
Parallel and Distributed Computing

Flynn’s Taxonomy
Agenda

• A Quick Review
• Flynn’s Taxonomy
  ◦ SISD
  ◦ MISD
  ◦ SIMD
  ◦ MIMD
• Physical Organization of Parallel Platforms
  ◦ PRAM
• Routing Techniques and Costs
Quick Review of the Previous Lecture

• Amdahl’s Law of Parallel Speedup
  ◦ Purpose, derivation, and examples
• Karp-Flatt Metric
  ◦ Finding the sequential fraction in a given parallel setup
• Types of Parallelism
  ◦ Data parallelism: the same operation applied to different data elements
  ◦ Functional parallelism: independent tasks that perform different operations on different data elements can run in parallel
  ◦ Pipelining: overlapping the stages of successive instructions (or computations) so that different stages execute concurrently
• Multiprocessor
  ◦ Centralized multiprocessor
  ◦ Distributed multiprocessor
  ◦ Shared address space (NUMA) vs. shared memory (UMA)
• Multicomputer
  ◦ Asymmetrical
  ◦ Symmetrical
  ◦ Cluster vs. network of workstations
• Assigned Reading
  ◦ Cache coherence and snooping
  ◦ Branch prediction and the issues it raises in pipelining

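As a refresher on the first two review items, the sketch below implements the two formulas. It assumes the usual definitions: f is the serial fraction, p the number of processors, and psi the measured speedup. The function names are illustrative, not from the lecture.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Amdahl's law: upper bound on speedup with serial fraction f on p processors."""
    if not 0.0 <= f <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    return 1.0 / (f + (1.0 - f) / p)


def karp_flatt(psi: float, p: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction from measured speedup psi."""
    if p < 2:
        raise ValueError("the metric is defined for p >= 2")
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    bound = amdahl_speedup(0.1, 8)  # 10% serial code on 8 processors
    print(f"Amdahl bound: {bound:.2f}")  # about 4.71
    # Feeding the ideal speedup back in recovers the serial fraction (about 0.1).
    print(f"Karp-Flatt e: {karp_flatt(bound, 8):.3f}")
```

In practice, psi would be a measured speedup. If e grows as p increases, the cause is parallel overhead rather than inherently serial code.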
CS3006 - Fall 2021

Flynn’s Taxonomy

• Widely used architectural classification scheme
• Classifies architectures into four types
• The classification is based on the number of concurrent instruction streams and data streams (single or multiple) that the architecture can process

SISD (Single Instruction Single Data)

• Refers to the traditional computer: a serial architecture
• Includes single-core computers
• Only one instruction stream is in execution at a given time
• Similarly, only one data stream is active at any time

[Figure: Example of SISD]

SIMD (Single Instruction Multiple Data)

• Refers to a parallel architecture with multiple processing elements (cores) driven by a single control unit
• All processing elements execute the same instruction stream in lockstep, but each operates on a different data stream
• Well suited to scientific workloads that require large matrix-vector operations
• Vector computers (e.g., Cray vector processing machines) and Intel’s MMX SIMD instruction-set extension fall under this category
• Used for array operations, image processing, and graphics
[Figure: Example of SIMD]
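The sketch below models a toy SIMD machine in plain Python to make the lockstep idea concrete. A single control unit broadcasts each instruction, and every processing element (one "lane") applies it to its own data. The class and its methods are hypothetical. Real SIMD hardware performs each step as one vector instruction, not as a Python loop.

```python
from typing import Callable, List


class ToySIMDMachine:
    """Hypothetical SIMD model: one control unit, many processing elements (lanes)."""

    def __init__(self, data: List[float]) -> None:
        # Each element of `data` is the local operand held by one processing element.
        self.lanes = list(data)

    def broadcast(self, op: Callable[[float], float]) -> None:
        # The single control unit issues ONE instruction; every lane executes it
        # on its own data in the same step (simulated here by a loop).
        self.lanes = [op(x) for x in self.lanes]


machine = ToySIMDMachine([1.0, 2.0, 3.0, 4.0])
# The same instruction stream (multiply by 2, then add 1) is applied to all lanes.
for instruction in (lambda x: x * 2.0, lambda x: x + 1.0):
    machine.broadcast(instruction)
print(machine.lanes)  # [3.0, 5.0, 7.0, 9.0]
```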

MISD (Multiple Instructions Single Data)

• Multiple instruction streams and a single data stream
• A pipeline of multiple independently executing functional units
  ◦ Each operates on a single stream of data and forwards its results to the next
• Rarely used in practice
• E.g., systolic arrays: networks of primitive processing elements that pump data from one element to the next
[Figure: Example of MISD]
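The MISD idea can be sketched as a cycle-by-cycle simulation of a linear pipeline. Several functional units each apply a different operation to a single data stream and forward their results. This is a simplification of a systolic array, which is usually two-dimensional. The stage functions are invented for illustration, and the simulation is sequential: it shows the data flow, not real hardware concurrency.

```python
from typing import Callable, List, Optional

Stage = Callable[[float], float]


def run_pipeline(stream: List[float], stages: List[Stage]) -> List[float]:
    """Cycle-by-cycle simulation of a linear pipeline of functional units.

    Each stage applies its own operation (its own instruction stream) and
    forwards the result to the next stage; one new item enters per cycle.
    """
    latches: List[Optional[float]] = [None] * len(stages)  # output latch of each stage
    out: List[float] = []
    pending = list(stream)
    # n items need n + (number of stages) cycles to drain completely.
    for _ in range(len(stream) + len(stages)):
        if latches[-1] is not None:
            out.append(latches[-1])
        # Update from the back so each stage consumes its predecessor's previous output.
        for i in range(len(stages) - 1, 0, -1):
            prev = latches[i - 1]
            latches[i] = stages[i](prev) if prev is not None else None
        latches[0] = stages[0](pending.pop(0)) if pending else None
    return out


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline([1.0, 2.0, 3.0], stages))  # [1.0, 3.0, 5.0]
```

After the pipeline fills, three different operations are in progress on three different items during every cycle.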

MIMD (Multiple Instructions Multiple Data)

• Multiple instruction streams and multiple data streams
• Different CPUs can simultaneously execute different instruction streams operating on different data
• Most modern parallel architectures fall under this category, e.g., multiprocessor and multicomputer architectures
• Many MIMD machines also support SIMD execution within each processor (e.g., vector instructions)
[Figure: Example of MIMD]
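On a shared-memory MIMD machine, separate threads or processes can run entirely different code on different data at the same time. The sketch below uses Python's standard `concurrent.futures` module to launch two unrelated tasks. The task functions are hypothetical. In CPython, the global interpreter lock limits CPU parallelism for threads, so `ProcessPoolExecutor` would be the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sum_of_squares(values: List[int]) -> int:
    # Instruction stream 1: arithmetic on its own data.
    return sum(v * v for v in values)


def count_vowels(text: str) -> int:
    # Instruction stream 2: a completely different computation on different data.
    return sum(ch in "aeiou" for ch in text.lower())


with ThreadPoolExecutor(max_workers=2) as pool:
    # Each submission is an independent (instruction stream, data stream) pair.
    f1 = pool.submit(sum_of_squares, [1, 2, 3, 4])
    f2 = pool.submit(count_vowels, "Multiple Instructions Multiple Data")
    results = (f1.result(), f2.result())

print(results)  # (30, 12)
```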

[Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b).]

SIMD-MIMD Comparison

• SIMD computers require less hardware than MIMD computers because they have a single control unit.
• However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
• Not all applications are naturally suited to SIMD processors.
• In contrast, platforms supporting the SPMD (Single Program Multiple Data) paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
• SPMD is a close variant of MIMD: every processor runs the same program, but each can follow a different path through it on its own data.
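In SPMD style, every worker runs the same program and uses its rank to decide which part of the data to handle. The sketch below imitates this with threads in plain Python. The names `rank` and `size` follow the MPI convention, but no MPI library is used, and `spmd_program` is a hypothetical name.

```python
import threading
from typing import List


def spmd_program(rank: int, size: int, data: List[int], partial: List[int]) -> None:
    """The single program every worker executes; its behaviour depends only on rank."""
    # Block distribution: worker `rank` owns one contiguous slice of the data.
    chunk = (len(data) + size - 1) // size
    lo, hi = rank * chunk, min((rank + 1) * chunk, len(data))
    # Each worker writes only its own slot, so no locking is needed.
    partial[rank] = sum(data[lo:hi])


data = list(range(1, 101))  # 1..100
size = 4
partial = [0] * size
workers = [threading.Thread(target=spmd_program, args=(r, size, data, partial)) for r in range(size)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(partial, sum(partial))  # [325, 950, 1575, 2200] 5050
```

With MPI, the partial sums would be combined with a reduction rather than read from a shared list.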
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer

Parallel Random Access Machine (PRAM)

• An extension of the ideal sequential model, the random access machine (RAM)
• A PRAM consists of p processors and a global memory that is
  ◦ of unbounded size
  ◦ uniformly accessible to all processors, with the same address space
• Processors share a common clock but may execute different instructions in each cycle.
• PRAMs can be further classified according to how they handle simultaneous memory accesses.

[Figure: Graphical representation of PRAM]

• PRAMs can be divided into four subclasses:
1. Exclusive-read, exclusive-write (EREW) PRAM
   ◦ No two processors can access the same memory location concurrently, whether reading or writing
   ◦ Weakest PRAM model; provides the least memory-access concurrency
2. Concurrent-read, exclusive-write (CREW) PRAM
   ◦ Multiple processors can read a memory location concurrently, but no two can write to it at the same time
   ◦ Multiple write accesses to a memory location are serialized
3. Exclusive-read, concurrent-write (ERCW) PRAM
   ◦ Multiple processors can write to a memory location concurrently, but reads are exclusive
4. Concurrent-read, concurrent-write (CRCW) PRAM
   ◦ Both concurrent reads and concurrent writes are allowed
   ◦ Most powerful PRAM model
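The four subclasses can be compared by checking whether a single PRAM step is legal under each model. The sketch below describes a step by the addresses read and the addresses written by the participating processors. As a simplification, it assumes reads and writes happen in separate phases of the step. The function and this encoding are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def step_is_legal(model: str, reads: List[int], writes: List[int]) -> bool:
    """Return True if one synchronous PRAM step is allowed under `model`.

    `reads` / `writes` hold one memory address per processor access in this step.
    Reads and writes are assumed to occur in separate phases of the step.
    """
    concurrent_read = any(c > 1 for c in Counter(reads).values())
    concurrent_write = any(c > 1 for c in Counter(writes).values())
    # model -> (concurrent reads allowed?, concurrent writes allowed?)
    allowed: Dict[str, Tuple[bool, bool]] = {
        "EREW": (False, False),
        "CREW": (True, False),
        "ERCW": (False, True),
        "CRCW": (True, True),
    }
    cr_ok, cw_ok = allowed[model]
    return (cr_ok or not concurrent_read) and (cw_ok or not concurrent_write)


# Three processors read address 7; two processors write to distinct addresses 3 and 4.
print({m: step_is_legal(m, reads=[7, 7, 7], writes=[3, 4]) for m in ("EREW", "CREW", "ERCW", "CRCW")})
# {'EREW': False, 'CREW': True, 'ERCW': False, 'CRCW': True}
```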

• Concurrent reads do not create any semantic inconsistencies.
• But what about concurrent writes?
• An arbitration (mediation) mechanism is needed to resolve concurrent write accesses to the same memory location.

• Commonly used arbitration protocols:
  ◦ Common: the write succeeds only if all the values that the processors are attempting to write are identical.
  ◦ Arbitrary: an arbitrarily chosen processor writes its data, and the rest fail.
  ◦ Priority: processors follow a predetermined priority order; the processor with the highest priority succeeds, and the rest fail.
  ◦ Sum: the sum of the data items in all the write requests is written. This sum-based conflict-resolution model can be extended to any associative operator defined on the data being written.
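The four protocols can be expressed as functions that decide what ends up in one memory cell when several processors write to it in the same step. The sketch below is illustrative. It assumes that a lower processor id means higher priority, although the slide leaves the order unspecified. It also uses `random.choice` to stand in for "arbitrary", which means that any single winner must be acceptable.

```python
import operator
import random
from functools import reduce
from typing import Callable, List, Optional, Tuple

Request = Tuple[int, int]  # (processor id, value it wants to write)


def resolve_write(protocol: str, requests: List[Request],
                  combine: Callable[[int, int], int] = operator.add) -> Optional[int]:
    """Value stored after concurrent writes to one cell, or None if the write fails."""
    if not requests:
        raise ValueError("no write requests")
    values = [v for _, v in requests]
    if protocol == "common":
        # Succeeds only when every processor writes the same value.
        return values[0] if len(set(values)) == 1 else None
    if protocol == "arbitrary":
        # Any single processor may win; a correct algorithm must tolerate every choice.
        return random.choice(values)
    if protocol == "priority":
        # Assumption: the lowest processor id has the highest priority.
        return min(requests)[1]
    if protocol == "sum":
        # Combine all values with an associative operator (addition by default).
        return reduce(combine, values)
    raise ValueError(f"unknown protocol: {protocol}")


reqs = [(2, 5), (0, 3), (1, 5)]
for name in ("common", "priority", "sum"):
    print(name, resolve_write(name, reqs))  # common None, priority 3, sum 13
print("max", resolve_write("sum", reqs, combine=max))  # another associative operator: 5
```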
Physical Complexity of an Ideal Parallel Computer

• Processors and memories are connected via switches.
• Since these switches must operate in O(1) time at the level of words, a system of p processors and m words requires a switch complexity of O(mp).
• Clearly, for meaningful values of p and m, a true PRAM is not realizable.
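A quick worked example shows why O(mp) rules out a literal PRAM. The sizes are illustrative assumptions, not figures from the lecture: p = 1024 processors and a memory of m = 2^30 words.

```latex
p = 2^{10}, \quad m = 2^{30} \;\Rightarrow\; mp = 2^{40} \approx 1.1 \times 10^{12} \text{ word-level switches}
```

Even this modest configuration would need on the order of a trillion switches, each operating in constant time.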
Communication Costs in Parallel Machines

• Along with idling (doing nothing) and contention (conflicts, e.g., over shared resources), communication is a major overhead in parallel programs.
• The communication cost usually depends on a number of features, including the following:
  ◦ Programming model for communication
  ◦ Network topology
  ◦ Data handling and routing
  ◦ Associated network protocols
• Distributed systems usually incur significant communication overheads.
Message Passing Costs in Parallel Computers

• The total time to transfer a message over a network comprises the following:
  ◦ Startup time ($t_s$): time spent at the sending and receiving nodes (preparing the message [adding headers, trailers, and parity information], executing the routing algorithm, establishing the interface between the node and the router, etc.).
  ◦ Per-hop time ($t_h$): the time taken by the message header to travel between two directly connected nodes; it includes switch latencies, network delays, etc. Also known as node latency. The total hop cost grows with the number of hops traversed.
  ◦ Per-word transfer time ($t_w$): all overheads determined by the length of the message, such as link bandwidth and buffering overheads. For a link with a bandwidth of r words per second, $t_w = 1/r$.
Store-and-Forward Routing
• A message traversing multiple hops is completely received at an intermediate hop before being forwarded to the next hop.
• The total communication cost for a message of size m words to traverse l communication links is

  $t_{\text{comm}} = t_s + (m t_w + t_h)\, l$

• In most platforms, $t_h$ is small, and the above expression can be approximated by

  $t_{\text{comm}} = t_s + m l t_w$
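The two store-and-forward expressions can be evaluated directly to see how small the dropped $t_h$ term is. The parameter values below are hypothetical and use arbitrary time units. They are not measurements of any real network.

```python
def store_and_forward_time(ts: float, th: float, tw: float, m: int, l: int) -> float:
    """Exact store-and-forward cost: t_s + (m * t_w + t_h) * l."""
    return ts + (m * tw + th) * l


def store_and_forward_approx(ts: float, tw: float, m: int, l: int) -> float:
    """Approximation when t_h is small: t_s + m * l * t_w."""
    return ts + m * l * tw


# Hypothetical parameters: ts = 50, th = 0.5, tw = 0.01; 10,000 words over 4 links.
exact = store_and_forward_time(50.0, 0.5, 0.01, 10_000, 4)
approx = store_and_forward_approx(50.0, 0.01, 10_000, 4)
print(exact, approx)  # roughly 452 vs. 450: the l * t_h term contributes only 2 units
```

Note that the cost is proportional to the product m·l. Each extra hop repeats the full m·t_w transfer, and this is the inefficiency that packet routing addresses.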
Packet Routing
• Store-and-forward makes poor use of communication resources.
• Packet routing breaks messages into packets and pipelines them through the network.
• Since packets may take different paths, each packet must carry routing information, error checking, sequencing, and other related header information.
• The total communication time for packet routing is approximated by

  $t_{\text{comm}} = t_s + t_h l + t_w m$

• Here, the factor $t_w$ also accounts for the overheads of packet headers.
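The benefit of pipelining packets can be shown with a small simulation. It uses a deliberately simplified, hypothetical model: no startup or header cost, equal-sized packets, one packet per link at a time, and each packet crossing a link in (m/k)·$t_w$. Under these assumptions, a message split into k packets finishes in (k + l − 1) packet-times rather than the k·l packet-times of sending the whole message store-and-forward, because different packets occupy different links at the same moment.

```python
from typing import List


def pipelined_transfer_time(m: int, l: int, k: int, tw: float) -> float:
    """Simulate k equal packets of an m-word message crossing l links.

    Simplifying assumptions (hypothetical): no startup or header cost, each link
    carries one packet at a time, and a packet takes (m / k) * tw per link.
    """
    if m % k:
        raise ValueError("assume k divides m so that packets are equal-sized")
    per_link = (m // k) * tw
    done: List[float] = [0.0] * l  # done[j]: time link j finished its previous packet
    for _ in range(k):
        ready = 0.0  # every packet is available at the source at time 0
        for j in range(l):
            start = max(ready, done[j])  # wait for the packet and for a free link
            done[j] = start + per_link
            ready = done[j]  # packet fully received at the next node
    return done[-1]


# m = 1000 words over l = 4 links with tw = 1 (arbitrary units).
print([pipelined_transfer_time(1000, 4, k, 1.0) for k in (1, 10, 100)])
# [4000.0, 1300.0, 1030.0]: k = 1 is plain store-and-forward (m * l * tw)
```

As k grows, the time approaches m·$t_w$ plus a small per-hop term. This matches the form of the packet-routing expression above.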


Cut-Through Routing
• Takes the concept of packet routing to an extreme by further dividing messages into basic units called flits (flow control digits).
• Since flits are typically small, the header information must be minimized.
• This is done by forcing all flits to take the same path, in sequence.
  ◦ A tracer message first programs all intermediate routers; all flits then take the same route.
  ◦ Error checks are performed on the entire message, as opposed to individual flits.
  ◦ No sequence numbers are needed.
• The total communication time for cut-through routing is approximated by

  $t_{\text{comm}} = t_s + l t_h + t_w m$

• This has the same form as the packet-routing expression; however, $t_w$ is typically much smaller.
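Evaluating the store-and-forward and cut-through expressions side by side shows how each responds to the hop count l. The parameter values are illustrative assumptions in arbitrary time units. The same $t_w$ is used for both schemes, which ignores the smaller $t_w$ of cut-through and isolates the effect of l.

```python
def store_and_forward(ts: float, th: float, tw: float, m: int, l: int) -> float:
    # t_comm = t_s + (m t_w + t_h) l
    return ts + (m * tw + th) * l


def cut_through(ts: float, th: float, tw: float, m: int, l: int) -> float:
    # t_comm = t_s + l t_h + t_w m
    return ts + l * th + tw * m


# Illustrative parameters (assumptions, not measurements).
TS, TH, TW, M = 100.0, 2.0, 1.0, 1000
for hops in (1, 4, 16):
    print(hops, store_and_forward(TS, TH, TW, M, hops), cut_through(TS, TH, TW, M, hops))
# 1 1102.0 1102.0
# 4 4108.0 1108.0
# 16 16132.0 1132.0
```

Store-and-forward cost grows roughly linearly in l, while cut-through adds only $t_h$ per extra hop.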
[Figure: Message transfer (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing.]
Simplified Cost Model for Communicating Messages

• The cost of communicating a message between two nodes l hops away using cut-through routing is given by

  $t_{\text{comm}} = t_s + l t_h + t_w m$

• In this expression, $t_h$ is typically smaller than $t_s$ and $t_w$. For this reason, the second term on the right-hand side ($l t_h$) is negligible, particularly when m is large.
• Hence, we can approximate the cost of message transfer by

  $t_{\text{comm}} = t_s + t_w m$
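The effect of dropping the $l t_h$ term can be checked numerically. The sketch below computes the relative error of the simplified model compared with the full cut-through expression. All parameter values are hypothetical and in arbitrary time units.

```python
def full_cost(ts: float, th: float, tw: float, m: int, l: int) -> float:
    # Full cut-through model: t_s + l t_h + t_w m
    return ts + l * th + tw * m


def simplified_cost(ts: float, tw: float, m: int) -> float:
    # Simplified model: t_s + t_w m (the l t_h term is dropped)
    return ts + tw * m


def relative_error(ts: float, th: float, tw: float, m: int, l: int) -> float:
    full = full_cost(ts, th, tw, m, l)
    return (full - simplified_cost(ts, tw, m)) / full


# Hypothetical parameters: ts = 100, th = 2, tw = 1, l = 8 hops.
for m in (10, 1_000, 100_000):
    print(m, f"{relative_error(100.0, 2.0, 1.0, m, 8):.5f}")
```

The error falls from about 13% for a 10-word message to well under 0.1% for large messages. A side effect of the simplified model is that the cost no longer depends on l, so the placement of communicating nodes stops mattering.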


• Note that the original expression for communication time is valid only for uncongested networks.
• Different communication patterns congest different networks to varying extents.
• It is important to understand this and to account for it in the communication time accordingly.