
3 Efficient Sorting

Sebastian Wild
18 February 2020

version 2020-02-24 15:38


Outline

3 Efficient Sorting
3.1 Mergesort
3.2 Quicksort
3.3 Comparison-Based Lower Bound
3.4 Integer Sorting
3.5 Parallel computation
3.6 Parallel primitives
3.7 Parallel sorting
Why study sorting?
▶ fundamental problem of computer science that is still not fully solved
  (e. g., which algorithm needs the optimal number of comparisons in the worst case?)
▶ building block of many more advanced algorithms
  ▶ for preprocessing
  ▶ as a subroutine
▶ playground of manageable complexity to practice algorithmic techniques

Here:
▶ “classic” fast sorting methods
▶ parallel sorting
Part I
The Basics

Rules of the game
▶ Given:
  ▶ array 𝐴[0..𝑛 − 1] of 𝑛 objects
  ▶ a total order relation ≤ among 𝐴[0], . . . , 𝐴[𝑛 − 1] (a comparison function)
▶ Goal: rearrange (= permute) elements within 𝐴, so that 𝐴 is sorted, i. e., 𝐴[0] ≤ 𝐴[1] ≤ · · · ≤ 𝐴[𝑛 − 1]
▶ for now: 𝐴 stored in main memory (internal sorting), single processor (sequential sorting)
3.1 Mergesort
Clicker Question

How does mergesort work?

A Split elements around median, then recurse on small / large elements.
B Recurse on left / right half, then combine sorted halves. ✓
C Grow sorted part on left, repeatedly add next element to sorted range.
D Repeatedly choose 2 elements and swap them if they are out of order.
E Don’t know.

pingo.upb.de/622222
Merging sorted lists

[Animation: two sorted runs run1 and run2 are merged into result by repeatedly moving the smaller of the two front elements to the output.]
Mergesort
1 procedure mergesort(𝐴[𝑙..𝑟])             // recursive procedure; divide & conquer
2   𝑛 := 𝑟 − 𝑙 + 1
3   if 𝑛 ≤ 1 then return
4   𝑚 := 𝑙 + ⌊𝑛/2⌋
5   mergesort(𝐴[𝑙..𝑚 − 1])
6   mergesort(𝐴[𝑚..𝑟])
7   merge(𝐴[𝑙..𝑚 − 1], 𝐴[𝑚..𝑟], buf )      // merging needs temporary storage buf of the same size as the merged runs
8   copy buf to 𝐴[𝑙..𝑟]                    // each element is read and written twice (once for merging, once for copying back)

Analysis: count “element visits” (read and/or write)

𝐶(𝑛) = 0 for 𝑛 ≤ 1, and 𝐶(𝑛) = 𝐶(⌊𝑛/2⌋) + 𝐶(⌈𝑛/2⌉) + 2𝑛 for 𝑛 ≥ 2   (same for best and worst case!)

Simplification 𝑛 = 2^𝑘:
𝐶(2^𝑘) = 0 for 𝑘 ≤ 0, and for 𝑘 ≥ 1:
𝐶(2^𝑘) = 2 · 𝐶(2^(𝑘−1)) + 2 · 2^𝑘 = 2 · 2^𝑘 + 2² · 2^(𝑘−1) + 2³ · 2^(𝑘−2) + · · · + 2^𝑘 · 2¹ = 2𝑘 · 2^𝑘

so 𝐶(𝑛) = 2𝑛 lg(𝑛) = Θ(𝑛 log 𝑛),
and for arbitrary 𝑛 we have 𝐶(𝑛) ≤ 𝐶(next larger power of 2) ≤ 4𝑛 lg(𝑛) + 2𝑛 = Θ(𝑛 log 𝑛)
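
For concreteness, a minimal Python sketch of this procedure (an illustration, not the notes’ official code; it uses list slicing for the buffer, which matches the Θ(𝑛) extra space discussed below):

    def merge(run1, run2):
        """Merge two sorted lists into one sorted list (two-pointer scan)."""
        result, i, j = [], 0, 0
        while i < len(run1) and j < len(run2):
            if run1[i] <= run2[j]:        # <= keeps the sort stable
                result.append(run1[i]); i += 1
            else:
                result.append(run2[j]); j += 1
        result.extend(run1[i:])           # one run is exhausted:
        result.extend(run2[j:])           # append the rest of the other
        return result

    def mergesort(A, l=0, r=None):
        """Sort A[l..r] in place; divide & conquer as in the pseudocode."""
        if r is None:
            r = len(A) - 1
        n = r - l + 1
        if n <= 1:
            return
        m = l + n // 2
        mergesort(A, l, m - 1)
        mergesort(A, m, r)
        buf = merge(A[l:m], A[m:r+1])     # merging needs Θ(n) temporary storage
        A[l:r+1] = buf                    # copy back (second visit of each element)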
Mergesort – Discussion

✓ optimal time complexity of Θ(𝑛 log 𝑛) in the worst case
✓ stable sorting method, i. e., retains relative order of equal-key items
✓ memory access is sequential (scans over arrays)
✗ requires Θ(𝑛) extra space
  (there are in-place merging methods, but they are substantially more complicated and not (widely) used)
3.2 Quicksort
Clicker Question

How does quicksort work?

A Split elements around median, then recurse on small / large elements. ✓
B Recurse on left / right half, then combine sorted halves.
C Grow sorted part on left, repeatedly add next element to sorted range.
D Repeatedly choose 2 elements and swap them if they are out of order.
E Don’t know.

pingo.upb.de/622222
Partitioning around a pivot

[Animation: two pointers scan inward from the ends of the array; an element > 𝑝 found on the left is swapped with an element < 𝑝 found on the right, until the array is split into a left part with all elements < 𝑝 and a right part with all elements > 𝑝, with the pivot in between.]

▶ no extra space needed
▶ visits each element once
▶ returns rank/position of pivot
Partitioning – Detailed code
Beware: details easy to get wrong; use this code!

1 procedure partition(𝐴, 𝑏)
2   // input: array 𝐴[0..𝑛 − 1], position of pivot 𝑏 ∈ [0..𝑛 − 1]
3   swap(𝐴[0], 𝐴[𝑏])
4   𝑖 := 0, 𝑗 := 𝑛
5   while true do
6     do 𝑖 := 𝑖 + 1 while 𝑖 < 𝑛 and 𝐴[𝑖] < 𝐴[0]
7     do 𝑗 := 𝑗 − 1 while 𝑗 ≥ 1 and 𝐴[𝑗] > 𝐴[0]
8     if 𝑖 ≥ 𝑗 then break (goto 11)
9     else swap(𝐴[𝑖], 𝐴[𝑗])
10  end while
11  swap(𝐴[0], 𝐴[𝑗])
12  return 𝑗

Loop invariant (lines 5–10): the pivot 𝑝 sits at 𝐴[0]; everything left of 𝑖 is ≤ 𝑝, everything right of 𝑗 is ≥ 𝑝, and the region between 𝑖 and 𝑗 is still unexplored:

𝐴 = [ 𝑝 | ≤ 𝑝 ... 𝑖 ... ? ... 𝑗 ... ≥ 𝑝 ]
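
A direct Python transcription of this pseudocode, as a sketch (it partitions a whole list, with the pivot given by its index b as above):

    def partition(A, b):
        """Partition list A around pivot A[b]; returns the pivot's final position."""
        n = len(A)
        A[0], A[b] = A[b], A[0]              # move pivot to the front
        i, j = 0, n
        while True:
            i += 1                           # do-while: scan right for an element >= pivot
            while i < n and A[i] < A[0]:
                i += 1
            j -= 1                           # do-while: scan left for an element <= pivot
            while j >= 1 and A[j] > A[0]:
                j -= 1
            if i >= j:
                break
            A[i], A[j] = A[j], A[i]          # both out of place: swap them
        A[0], A[j] = A[j], A[0]              # move pivot to its final position
        return j

For example, partition([3, 1, 4, 1, 5], 0) rearranges the list around pivot 3 and returns its final index (here 2).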
Quicksort
1 procedure quicksort(𝐴[𝑙..𝑟])     // recursive procedure; divide & conquer
2   if 𝑙 ≥ 𝑟 then return
3   𝑏 := choosePivot(𝐴[𝑙..𝑟])      // choice of pivot can be:
4   𝑗 := partition(𝐴[𝑙..𝑟], 𝑏)     //   a fixed position (dangerous!)
5   quicksort(𝐴[𝑙..𝑗 − 1])         //   random
6   quicksort(𝐴[𝑗 + 1..𝑟])         //   more sophisticated, e. g., median of 3
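
Combining the two, a runnable sketch (it assumes the partition function from the previous sketch and picks a random pivot, anticipating the worst-case discussion below; the subarray copy is for clarity only, a real implementation would pass index bounds into partition):

    import random

    def quicksort(A, l=0, r=None):
        """Sort A[l..r] using partition() from the previous sketch."""
        if r is None:
            r = len(A) - 1
        if l >= r:
            return
        b = random.randint(l, r)      # choosePivot: random
        sub = A[l:r+1]                # copy the subarray for clarity;
        j = partition(sub, b - l)     # partition it around the chosen pivot
        A[l:r+1] = sub
        quicksort(A, l, l + j - 1)
        quicksort(A, l + j + 1, r)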
Quicksort & Binary Search Trees

Quicksort on 7 4 2 9 1 3 8 5 6 (always using the first element as pivot):
partitioning around 7 gives   4 2 1 3 5 6 | 7 | 9 8
partitioning the two subarrays gives   2 1 3 | 4 | 5 6   and   8 | 9
and so on, down to subarrays of size ≤ 1.

Binary Search Tree (BST) from successively inserting 7, 4, 2, 9, 1, 3, 8, 5, 6:
7 becomes the root; 4 and 9 its children; 2 and 5 the children of 4;
1 and 3 the children of 2; 6 the right child of 5; 8 the left child of 9.

▶ recursion tree of quicksort = binary search tree from successive insertion
▶ comparisons in quicksort = comparisons to build the BST
▶ comparisons in quicksort ≈ comparisons to search each element in the BST
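
This correspondence can be checked numerically. The sketch below assumes an idealized quicksort that compares each non-pivot element to the pivot exactly once per partitioning step (the detailed partition code above may do a few extra comparisons where the pointers cross):

    def qs_comparisons(xs):
        """Pivot comparisons of idealized quicksort (first element as pivot)."""
        if len(xs) <= 1:
            return 0
        pivot, rest = xs[0], xs[1:]
        small = [x for x in rest if x < pivot]   # one pivot comparison
        large = [x for x in rest if x > pivot]   # per element of rest
        return len(rest) + qs_comparisons(small) + qs_comparisons(large)

    def bst_comparisons(xs):
        """Comparisons made when inserting xs one by one into an unbalanced BST."""
        root, count = None, 0
        for x in xs:                             # nodes are [left, right, key]
            parent, node = None, root
            while node is not None:              # walk down, one comparison per node
                count += 1
                parent = node
                node = node[0] if x < node[2] else node[1]
            if parent is None:
                root = [None, None, x]
            elif x < parent[2]:
                parent[0] = [None, None, x]
            else:
                parent[1] = [None, None, x]
        return count

    keys = [7, 4, 2, 9, 1, 3, 8, 5, 6]
    print(qs_comparisons(keys), bst_comparisons(keys))   # both 17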
Quicksort – Worst Case
▶ Problem: BSTs can degenerate (e. g., inserting 1, 2, 3, 4, 5, 6 in order yields a path)
▶ Cost to search for 𝑘 is 𝑘 − 1
▶ Total cost: Σ_{𝑘=1}^{𝑛} (𝑘 − 1) = 𝑛(𝑛 − 1)/2 ∼ ½ 𝑛²
▶ quicksort worst-case running time is in Θ(𝑛²)   terribly slow!

But, we can fix this:

Randomized quicksort:
▶ choose a random pivot in each step
▶ same as randomly shuffling input before sorting
Randomized Quicksort – Analysis
▶ 𝐶(𝑛) = element visits (as for mergesort)
▶ quicksort needs ∼ 2 ln(2) · 𝑛 lg 𝑛 ≈ 1.39 𝑛 lg 𝑛 in expectation
▶ also: very unlikely to be much worse:
  e. g., one can prove: Pr[cost > 10 𝑛 lg 𝑛] = 𝑂(𝑛^−2.5)
  the distribution of costs is “concentrated around the mean”
▶ intuition: one would have to be constantly unlucky with the pivot choice
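
A quick empirical sanity check of the 1.39 𝑛 lg 𝑛 figure (a sketch; it counts pivot comparisons, whose expected total is likewise ∼ 2 ln(2) · 𝑛 lg 𝑛):

    import math, random

    def qs_cost(xs):
        """Pivot comparisons of quicksort with a uniformly random pivot."""
        if len(xs) <= 1:
            return 0
        p = xs[random.randrange(len(xs))]
        small = [x for x in xs if x < p]
        large = [x for x in xs if x > p]
        return len(xs) - 1 + qs_cost(small) + qs_cost(large)

    n = 10_000
    avg = sum(qs_cost(random.sample(range(10 * n), n)) for _ in range(20)) / 20
    print(avg / (n * math.log2(n)))   # hovers around 1.39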
Quicksort – Discussion

✓ fastest general-purpose method
✓ Θ(𝑛 log 𝑛) average case
✓ works in-place (no extra space required)
✓ memory access is sequential (scans over arrays)
✗ Θ(𝑛²) worst case (although extremely unlikely)
✗ not a stable sorting method

Open problem: Simple algorithm that is fast, stable and in-place.
3.3 Comparison-Based Lower Bound
Lower Bounds
▶ Lower bound: mathematical proof that no algorithm can do better.
  ▶ very powerful concept: a bulletproof impossibility result, ≈ conservation of energy in physics
  ▶ (unique?) feature of computer science: for many problems, solutions are known that (asymptotically) achieve the lower bound
  ▶ so we can speak of “optimal algorithms”

▶ To prove a statement about all algorithms, we must precisely define what an algorithm is!
▶ we already know one option: the word-RAM model
▶ Here: we use a simpler, more restricted model.
The Comparison Model
▶ In the comparison model, data can only be accessed in two ways:
  ▶ comparing two elements
  ▶ moving elements around (e. g. copying, swapping)
▶ Cost: number of these operations.

That’s good! It keeps algorithms general:
▶ this makes very few assumptions on the kind of objects we are sorting
▶ Mergesort and Quicksort work in the comparison model.

▶ Every comparison-based sorting algorithm corresponds to a decision tree:
  ▶ we only model comparisons and ignore data movement
  ▶ nodes = comparisons the algorithm does
  ▶ the next comparison can depend on earlier outcomes → different subtrees
  ▶ child links = outcomes of a comparison
  ▶ leaf = unique initial input permutation compatible with the comparison outcomes
Comparison Lower Bound
Example: comparison tree for a sorting method for 𝐴[0..2]:

[Figure: a decision tree. The root compares 𝐴[0] : 𝐴[1]; inner nodes continue with 𝐴[1] : 𝐴[2] or 𝐴[0] : 𝐴[2]; the six leaves are the six input permutations 1,2,3 / 1,3,2 / 3,1,2 / 2,1,3 / 2,3,1 / 3,2,1.]

▶ Execution = follow a path in the comparison tree.
▶ height of comparison tree = worst-case # comparisons
▶ comparison trees are binary trees
  → with ℓ leaves, the height is ≥ ⌈lg(ℓ)⌉
▶ a comparison tree for a sorting method must have ≥ 𝑛! leaves
  → height ≥ lg(𝑛!) ∼ 𝑛 lg 𝑛
  more precisely: lg(𝑛!) = 𝑛 lg 𝑛 − lg(𝑒) 𝑛 + 𝑂(log 𝑛), where lg(𝑒) ≈ 1.4427

▶ Mergesort achieves ∼ 𝑛 lg 𝑛 comparisons → asymptotically comparison-optimal!
▶ Open (theory) problem: Can we sort with 𝑛 lg 𝑛 − lg(𝑒) 𝑛 + 𝑜(𝑛) comparisons?
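
The bound is easy to evaluate numerically; the short sketch below compares ⌈lg(𝑛!)⌉ with the cruder 𝑛 lg 𝑛 estimate (for 𝑛 = 3 it gives 3, matching the tree of height 3 above):

    import math

    def comparison_lower_bound(n):
        """ceil(lg(n!)), computed via log-gamma to avoid huge integers."""
        return math.ceil(math.lgamma(n + 1) / math.log(2))

    for n in (3, 10, 100, 1000):
        print(n, comparison_lower_bound(n), round(n * math.log2(n)))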
Clicker Question

Does the comparison tree from the previous slide correspond to a worst-case optimal sorting method?

A Yes ✓        B No

pingo.upb.de/622222
3.4 Integer Sorting
How to beat a lower bound
▶ Does the above lower bound mean that sorting always takes time Ω(𝑛 log 𝑛)?
▶ Not necessarily; it only holds in the comparison model!
  → Lower bounds show where to change the model!

▶ Here: sort 𝑛 integers
  ▶ we can do a lot with integers: add them up, compute averages, . . . (full power of the word-RAM)
  ▶ we are not working in the comparison model
  → the above lower bound does not apply!
▶ but: a priori it is unclear how much arithmetic helps for sorting . . .
Counting sort
▶ Important parameter: size/range of numbers
  ▶ numbers in range [0..𝑈) = {0, . . . , 𝑈 − 1}; typically 𝑈 = 2^𝑏 for 𝑏-bit binary numbers
▶ We can sort 𝑛 integers in Θ(𝑛 + 𝑈) time and Θ(𝑈) space when 𝑏 ≤ 𝑤 (the word size):

1 procedure countingSort(𝐴[0..𝑛 − 1])
2   // 𝐴 contains integers in range [0..𝑈).
3   𝐶[0..𝑈 − 1] := new integer array, initialized to 0
4   // Count occurrences
5   for 𝑖 := 0, . . . , 𝑛 − 1
6     𝐶[𝐴[𝑖]] := 𝐶[𝐴[𝑖]] + 1
7   𝑖 := 0 // Produce sorted list
8   for 𝑘 := 0, . . . , 𝑈 − 1
9     for 𝑗 := 1, . . . , 𝐶[𝑘]
10      𝐴[𝑖] := 𝑘; 𝑖 := 𝑖 + 1

▶ count how often each possible value occurs, then produce the sorted result directly from the counts
▶ circumvents the lower bound by using integers as array index / pointer offset

▶ Can sort 𝑛 integers in range [0..𝑈) with 𝑈 = 𝑂(𝑛) in time and space Θ(𝑛).
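
The pseudocode in Python, as a sketch:

    def counting_sort(A, U):
        """Sort a list of integers in range [0..U) in Θ(n + U) time, Θ(U) space."""
        C = [0] * U
        for x in A:                 # count occurrences of each value
            C[x] += 1
        i = 0
        for k in range(U):          # write value k exactly C[k] times
            for _ in range(C[k]):
                A[i] = k
                i += 1

For example, counting_sort(lst, max(lst) + 1) sorts a list of non-negative integers.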
Integer Sorting – State of the art
▶ 𝑂(𝑛) time sorting is also possible for numbers in range 𝑈 = 𝑂(𝑛^𝑐) for constant 𝑐:
  radix sort with radix 2^𝑤 (see the sketch below)

▶ algorithm theory:
  ▶ suppose 𝑈 = 2^𝑤, but 𝑤 can be an arbitrary function of 𝑛
  ▶ how fast can we sort 𝑛 such 𝑤-bit integers on a 𝑤-bit word-RAM?
  ▶ for 𝑤 = 𝑂(log 𝑛): linear time (radix/counting sort)
  ▶ for 𝑤 = Ω(log^(2+𝜀) 𝑛): linear time (signature sort)
  ▶ for 𝑤 in between: can do 𝑂(𝑛 lg lg 𝑛) (very complicated algorithm);
    we don’t know if that is best possible!

∗ ∗ ∗

▶ for the rest of this unit: back to the comparison model!
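
For illustration, a minimal LSD radix-sort sketch (an assumption of this sketch is that the radix is a power of two, so digits can be extracted with shifts and masks; the notes themselves only name the technique):

    def radix_sort(A, U, radix=256):
        """LSD radix sort: repeated stable bucketing by one digit at a time."""
        mask = radix - 1
        digit_bits = mask.bit_length()      # lg(radix), since radix is a power of two
        shift = 0
        while (1 << shift) < U:             # one pass per digit
            buckets = [[] for _ in range(radix)]
            for x in A:
                buckets[(x >> shift) & mask].append(x)   # stable within buckets
            A[:] = [x for b in buckets for x in b]
            shift += digit_bits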
Part II
Sorting with many processors
3.5 Parallel computation
Types of parallel computation
£££ can’t buy you more time, but it can buy more computers!
→ Challenge: Algorithms for parallel computation.

There are two main forms of parallelism:

1. shared-memory parallel computer   ← focus of today
   ▶ 𝑝 processing elements (PEs, processors) working in parallel
   ▶ single big memory, accessible from every PE
   ▶ communication via shared memory
   → think: a big server, 128 CPU cores, a terabyte of main memory

2. distributed computing
   ▶ 𝑝 PEs working in parallel
   ▶ each PE has private memory
   ▶ communication by sending messages via a network
   → think: a cluster of individual machines
PRAM – Parallel RAM
▶ extension of the RAM model (recall Unit 1)
▶ the 𝑝 PEs are identified by ids 0, . . . , 𝑝 − 1
  ▶ like 𝑤 (the word size), 𝑝 is a parameter of the model that can grow with 𝑛
  ▶ 𝑝 = Θ(𝑛) is not unusual   (maaany processors!)
▶ the PEs all independently run a RAM-style program (they can use their id there)
▶ each PE has its own registers, but MEM is shared among all PEs
▶ computation runs in synchronous steps: in each time step, every PE executes one instruction
PRAM – Conflict management
Problem: What if several PEs simultaneously overwrite a memory cell?
▶ EREW-PRAM (exclusive read, exclusive write)
  any parallel access to the same memory cell is forbidden (crash if it happens)
▶ CREW-PRAM (concurrent read, exclusive write)
  parallel write access to the same memory cell is forbidden, but reading is fine
▶ CRCW-PRAM (concurrent read, concurrent write)
  concurrent access is allowed; we need a rule for write conflicts:
  ▶ common CRCW-PRAM: all concurrent writes to the same cell must write the same value
  ▶ arbitrary CRCW-PRAM: some unspecified concurrent write wins
  ▶ (more exist . . . )

▶ no single model is always adequate, but our default is CREW
PRAM – Execution costs
Cost metrics in PRAMs:
▶ space: total amount of accessed memory
▶ time: number of steps till all PEs finish (assuming sufficiently many PEs!)
  sometimes called depth or span
▶ work: total # instructions executed on all PEs

Holy grail of PRAM algorithms:
▶ minimal time
▶ work (asymptotically) no worse than the running time of the best sequential algorithm
  → work-efficient algorithm: work in the same Θ-class as the best sequential algorithm
The number of processors
Hold on, my computer does not have Θ(𝑛) processors! Why should I care for span and work?

Theorem 3.1 (Brent’s Theorem):
If an algorithm has span 𝑇 and work 𝑊 (for an arbitrarily large number of processors), it can
be run on a PRAM with 𝑝 PEs in time 𝑂(𝑇 + 𝑊/𝑝) (and using 𝑂(𝑊) work). □

Proof: schedule the parallel steps in round-robin fashion on the 𝑝 PEs.

▶ span and work give a guideline for any number of processors
  (e. g., an algorithm with span 𝑇 = 𝑂(log 𝑛) and work 𝑊 = 𝑂(𝑛) runs in 𝑂(𝑛/𝑝 + log 𝑛) time on 𝑝 PEs)
3.6 Parallel primitives
Prefix sums
Before we come to parallel sorting, we study some useful building blocks.

Prefix-sum problem (also: cumulative sums, running totals)
▶ Given: array 𝐴[0..𝑛 − 1] of numbers
▶ Goal: compute all prefix sums 𝐴[0] + · · · + 𝐴[𝑖] for 𝑖 = 0, . . . , 𝑛 − 1
  (may be done “in-place”, i. e., by overwriting 𝐴)

Example:
input:  3 0 0 5 7 0 0 2 0 0 0 4 0 8 0 1
output: 3 3 3 8 15 15 15 17 17 17 17 21 21 29 29 30
Clicker Question

What is the sequential running time achievable for prefix sums?

A 𝑂(𝑛³)          D 𝑂(𝑛) ✓
B 𝑂(𝑛²)          E 𝑂(√𝑛)
C 𝑂(𝑛 log 𝑛)     F 𝑂(log 𝑛)

pingo.upb.de/622222
Prefix sums – Sequential

1 procedure prefixSum(𝐴[0..𝑛 − 1])
2   for 𝑖 := 1, . . . , 𝑛 − 1 do
3     𝐴[𝑖] := 𝐴[𝑖 − 1] + 𝐴[𝑖]

▶ the sequential solution does 𝑛 − 1 additions
▶ but: we cannot parallelize them; each step depends on the previous one (data dependencies!)
▶ need a different approach
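
As a preview, one classic way around the dependency chain is “doubling” (Hillis–Steele). This is a sketch under the assumption that each round’s list comprehension models one synchronous PRAM step in which all PEs add in parallel; it is not necessarily the exact scheme these notes develop next:

    def parallel_prefix_sum(A):
        """Doubling prefix sums: O(log n) span, O(n log n) work.
        In round k, position i adds the value at distance k = 2^0, 2^1, ... to its left."""
        n = len(A)
        k = 1
        while k < n:
            # on a PRAM, all n additions of this round run in one parallel step
            A = [A[i] + (A[i - k] if i >= k else 0) for i in range(n)]
            k *= 2
        return A

    print(parallel_prefix_sum([3, 0, 0, 5, 7, 0, 0, 2]))  # [3, 3, 3, 8, 15, 15, 15, 17]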
