Unit 04 Sorting
Have you seen a telephone bill or mobile postpaid bill? Did you notice any kind
of sorting?
Have you seen your bank statement or credit card bill for one month? How is it
organized?
Have you registered for an email account? Did you notice any kind of sorting?
How will you find the largest file on your desktop? How will you find the
most recently created file, or the oldest file, on your desktop? Do you see any kind of
sorting?
Let's say you are searching for the occurrence of a particular word in a big
textbook. What will you do?
Examples of software where sorting appears:
MS Office
Windows
Web applications like:
Yahoo
Gmail
Comparison Sorting
We find the smallest element from the unsorted sublist and swap it with the element at the
beginning of the unsorted data.
Each time we move one element from the unsorted sublist to the sorted sublist, we say that we have
completed a sort pass.
Selection Sort Algorithm
Alg.: SELECTION-SORT(A)
    n ← length[A]
    for j ← 1 to n - 1
        do smallest ← j
           for i ← j + 1 to n
               do if A[i] < A[smallest]
                      then smallest ← i
           exchange A[j] ↔ A[smallest]
Example-
Consider the following array of int values.
[23, 78, 45, 8, 32, 56]
Sort the array using selection sort
(elements left of the bar are sorted; those right of it are unsorted)
23 78 45  8 32 56        Original list
 8 | 78 45 23 32 56      After pass 1
 8 23 | 45 78 32 56      After pass 2
 8 23 32 | 78 45 56      After pass 3
 8 23 32 45 | 78 56      After pass 4
 8 23 32 45 56 | 78      After pass 5
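The pseudocode above maps almost line for line onto a short program. Here is a minimal Python sketch (the function name is illustrative, not from the slides), run on the example array:

def selection_sort(a):
    """Sort the list a in place using selection sort."""
    n = len(a)
    for j in range(n - 1):
        # Find the index of the smallest element in the unsorted part a[j..n-1].
        smallest = j
        for i in range(j + 1, n):
            if a[i] < a[smallest]:
                smallest = i
        # Move it to the front of the unsorted part, completing one sort pass.
        a[j], a[smallest] = a[smallest], a[j]

data = [23, 78, 45, 8, 32, 56]
selection_sort(data)
print(data)   # [8, 23, 32, 45, 56, 78]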
Analysis of Selection Sort
Alg.: SELECTION-SORT(A)                     cost    times
    n ← length[A]                            c1      1
    for j ← 1 to n − 1                       c2      n
        do smallest ← j                      c3      n − 1
           for i ← j + 1 to n                c4      Σ_{j=1}^{n−1} (n − j + 1)
               do if A[i] < A[smallest]      c5      Σ_{j=1}^{n−1} (n − j)    ≈ n²/2 comparisons
                      then smallest ← i      c6      Σ_{j=1}^{n−1} (n − j)
           exchange A[j] ↔ A[smallest]       c7      n − 1                    ≈ n exchanges

Number of comparisons:
    Σ_{j=1}^{n−1} Σ_{i=j+1}^{n} 1 = Σ_{j=1}^{n−1} (n − (j + 1) + 1) = Σ_{j=1}^{n−1} (n − j)
      = Σ_{j=1}^{n−1} n − Σ_{j=1}^{n−1} j = n(n − 1) − (n − 1)n/2 = (n − 1)n/2
    ⇒ Θ(n²)
What will be the space complexity of Selection Sort
Algorithm?
[Figure: bubble sort trace on the array 8 4 6 9 2 3 1, showing indices i = 1 and j]
Bubble-Sort Running Time
Alg.: BUBBLESORT(A)                                   cost
    for i ← 1 to length[A]                             c1
        do for j ← length[A] downto i + 1              c2
               do if A[j] < A[j − 1]                    c3    ≈ n²/2 comparisons
                      then exchange A[j] ↔ A[j − 1]     c4    ≈ n²/2 exchanges (worst case)
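As a concrete rendering of this pseudocode, here is a minimal Python sketch (function name illustrative), using the trace array shown above:

def bubble_sort(a):
    """Sort the list a in place; small elements bubble toward the front."""
    n = len(a)
    for i in range(n - 1):
        # Walk j from the last index down to i + 1, swapping out-of-order neighbours.
        for j in range(n - 1, i, -1):
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]

data = [8, 4, 6, 9, 2, 3, 1]
bubble_sort(data)
print(data)   # [1, 2, 3, 4, 6, 8, 9]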
Bubble sort runtime
Running time (number of comparisons) for input size n:
    Σ_{i=0}^{n−1} Σ_{j=1}^{n−1−i} 1 = Σ_{i=0}^{n−1} (n − 1 − i)
      = Σ_{i=0}^{n−1} (n − 1) − Σ_{i=0}^{n−1} i = n(n − 1) − (n − 1)n/2 = (n − 1)n/2
    ⇒ Θ(n²)
The number of actual swaps performed depends on the data: heavily out-of-order data performs many swaps, while nearly sorted data performs few.
Exercise: repeat, for bubble sort, all the questions we discussed for the selection sort algorithm.
Insertion Sort
In insertion sort, after each iteration one element moves from the unsorted portion to
the sorted portion, until all the elements in the list are sorted.
1. Assume that the first element in the list is in the sorted portion of the list
and all remaining elements are in the unsorted portion.
2. Take the first element from the unsorted portion and insert it
into the sorted portion at the correct position.
3. Repeat the above steps until all the elements have moved from the unsorted portion
to the sorted portion.
Insertion Sort
Complexity of the Insertion Sort Algorithm
INSERTION-SORT(A)
    for j ← 2 to length[A]
        do key ← A[j]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i + 1] ← A[i]
                  i ← i − 1
           A[i + 1] ← key
• Insertion sort – sorts the elements in place
Insertion Sort Algorithm
1. FOR i ← 1 TO length[A] − 1          // 0-indexed array; A[0] starts the sorted portion
2.     DO item ← A[i]
3.        place ← i
4.        WHILE place > 0 && A[place − 1] > item
5.            DO A[place] ← A[place − 1]
6.               place ← place − 1
7.        A[place] ← item
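The same algorithm in Python, 0-indexed as in the listing above (the function name is illustrative):

def insertion_sort(a):
    """Sort the list a in place using insertion sort."""
    for i in range(1, len(a)):
        item = a[i]
        place = i
        # Shift larger elements of the sorted prefix one slot to the right.
        while place > 0 and a[place - 1] > item:
            a[place] = a[place - 1]
            place -= 1
        a[place] = item   # insert the item into its correct position

data = [23, 78, 45, 8, 32, 56]
insertion_sort(data)
print(data)   # [8, 23, 32, 45, 56, 78]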
Insertion sort runtime
worst case: reverse-ordered elements in the array.
    Σ_{i=1}^{n−1} i = 1 + 2 + 3 + … + (n − 1) = (n − 1)n/2  ⇒ Θ(n²)
best case: array is already in ascending sorted order.
    Σ_{i=1}^{n−1} 1 = n − 1  ⇒ Θ(n)
average case: each element is about halfway in order.
    Σ_{i=1}^{n−1} i/2 = (1 + 2 + 3 + … + (n − 1))/2 = (n − 1)n/4  ⇒ Θ(n²)
Advantages of Insertion Sort
It is easy to implement and efficient on small data sets.
It is efficient on data sets that are already substantially sorted.
It performs better than simple algorithms like selection sort and bubble sort: it is
over twice as fast as bubble sort and almost 40 percent faster than selection sort.
It is also simpler than shell sort, with only a small trade-off in efficiency.
It requires little memory (only O(1) of additional memory space).
It is said to be online, as it can sort a list as and when it receives new elements.
Comparing sorts
We've seen "simple" sorting algorithms so far: selection sort, bubble sort, and
insertion sort.
They all use nested loops and perform approximately n² comparisons.
They are relatively inefficient.
Sorting practice problem
Consider the following array of int values.
(a) Write the contents of the array after 3 passes of the outermost loop of bubble sort.
(b) Write the contents of the array after 5 passes of the outermost loop of insertion sort.
(c) Write the contents of the array after 4 passes of the outermost loop of selection sort.
What is the running time of the best and worst cases
of the following sorting algorithms?

              Insertion Sort   Selection Sort   Bubble Sort
Best Case     O(n)             O(n²)            O(n)
Worst Case    O(n²)            O(n²)            O(n²)

Selection Sort: the input order doesn't matter; all lists take the same
number of steps.
Quick Sort
Quick sort is a highly efficient sorting algorithm based on partitioning
an array of data into smaller arrays.
Quick sort is a divide-and-conquer algorithm.
The basic idea of quick sort is as follows:
Quick sort works by selecting an element, called the pivot, from the given
input array.
The array is divided into three subarrays such that:
The left subarray contains elements which are less than or equal to the pivot element.
The middle subarray contains the pivot.
The right subarray contains elements which are greater than or equal to the pivot element.
The left and right subarrays are then sorted recursively.
Quick Sort
There are many different versions of quick sort that pick the pivot element in
different ways:
Pick the first element as pivot.
Pick the last element as pivot.
Pick a random element as pivot.
Pick the median as pivot.
Arranging the array elements around the pivot p generates two smaller sorting problems:
sort the left section of the array, and sort the right section of the array.
When these two smaller sorting problems are solved recursively, the bigger sorting
problem is solved.
Quick Sort Algorithm
Quicksort(A, p, r)
1: if p ≥ r then return
2: q = Partition(A, p, r)
3: Quicksort(A, p, q − 1)
4: Quicksort(A, q + 1, r)

Partition(A, p, r)
    x ← A[r]
    i ← p − 1
    for j ← p to r − 1 do
        if A[j] ≤ x then
            i ← i + 1
            exchange A[i] and A[j]
    exchange A[i + 1] and A[r]
    return i + 1

[Figure: Partition splits A[p..r] into A[p..q − 1], the pivot at A[q], and A[q + 1..r]]
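A minimal Python sketch of this Quicksort/Partition pair, using the last element as the pivot exactly as in the pseudocode (names are illustrative):

def partition(a, p, r):
    """Place the pivot a[r] into its final position and return that index."""
    x = a[r]          # pivot
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort(a, p=0, r=None):
    """Recursively sort a[p..r] in place."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)

data = [2, 5, 8, 3, 9, 4, 1, 7, 10, 6]
quicksort(data)
print(data)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]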
Example
initially: 2 5 8 3 9 4 1 7 10 6    note: pivot (x) = 6
[Figure: successive positions of i and j as Partition scans A[p..r], maintaining the regions A[p..i] ≤ x and A[i+1..j−1] > x]
Correctness of Partition
Case 2: A[j] ≤ x. Increment i, exchange A[i] ↔ A[j], then increment j. Condition 2 is maintained.
[Figure: the regions A[p..i] ≤ x and A[i+1..j−1] > x before and after the exchange]
Correctness of Partition
Termination:
When the loop terminates, j = r, so all elements in A are partitioned into one of the
three cases:
A[p..i] ≤ pivot
A[i+1..j − 1] > pivot
A[r] = pivot
Partition(A, p, r)
    x, i := A[r], p − 1;
    for j := p to r − 1 do
        if A[j] ≤ x then
            i := i + 1;
            A[i] ↔ A[j]
    A[i + 1] ↔ A[r];
    return i + 1
Quicksort Overview
To sort a[left...right]:
1. if left < right:
1.1. Partition a[left...right] such that:
all a[left...p-1] are less than a[p], and
all a[p+1...right] are >= a[p]
1.2. Quicksort a[left...p-1]
1.3. Quicksort a[p+1...right]
2. Terminate
Partitioning in Quicksort
A key step in the Quicksort algorithm is partitioning the array
We choose some (any) number p in the array to use as a pivot
We partition the array into three parts: elements less than the pivot, the pivot itself, and elements greater than or equal to the pivot.
search: 4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
swap:   4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
search: 4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
swap:   4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
Continued:
search: 4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
swap:   4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
search: 4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
We note that
Each partition is linear over its subarray
All the partitions at one level cover the array
Partitioning at various levels
[Figure: recursive partitioning of the array at successive levels]
Best Case Analysis
We cut the array size in half each time
So the depth of the recursion is log₂ n
At each level of the recursion, all the partitions at that level do work that is linear
in n
O(log₂ n) * O(n) = O(n log₂ n)
Hence in the best case, quicksort has time complexity O(n log₂ n)
What about the worst case?
Worst case
In the worst case, partitioning always divides the size n array into these three
parts:
A length one part, containing the pivot itself
A length zero part, and
A length n-1 part, containing everything else
We don't recurse on the zero-length part
Recursing on the length n−1 part requires (in the worst case) recursion to depth
n−1
Worst case partitioning
Worst case for quicksort
In the worst case, recursion may be n levels deep (for an array of size n)
But the partitioning work done at each level is still n
O(n) * O(n) = O(n²)
So the worst case for Quicksort is O(n²)
When does this happen?
There are many arrangements that could make this happen
Here are two common cases:
When the array is already sorted
When the array is inversely sorted (sorted in the opposite order)
Typical case for quicksort
If the array is sorted to begin with, Quicksort is terrible: O(n²)
It is possible to construct other bad cases
However, Quicksort is usually O(n log2n)
The constants are so good that Quicksort is generally the faster algorithm.
Most real-world sorting is done by Quicksort.
Picking a better pivot
Before, we picked the first element of the subarray to use as a pivot
If the array is already sorted, this results in O(n²) behavior
It’s no better if we pick the last element
We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a
pivot value that exactly cuts the array in half
Such a value is called a median: half of the values in the array are larger, half are
smaller
Quicksort for Small Arrays
For very small arrays (N ≤ 20), quicksort does not perform as well as insertion
sort
A good cutoff range is N = 10
Switching to insertion sort for small arrays can save about 15% in the running time
Quick Sort Example
[Figure: step-by-step quicksort trace]
Merge(A, p, q, r)
Take the smaller of the two frontmost elements of the
sequences A[p..q] and A[q+1..r] and put it into the
resulting sequence. Repeat this until both sequences
are empty. Copy the resulting sequence into A[p..r].
Merge Sort
MergeSort(A, left, right) {
if (left < right) {
mid = floor((left + right) / 2);
MergeSort(A, left, mid);
MergeSort(A, mid+1, right);
Merge(A, left, mid, right);
}
}
A = {10, 5, 7, 6, 1, 4, 8, 3, 2, 9};
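A minimal Python sketch of MergeSort together with a Merge that follows the description above (the temporary buffer is an implementation choice, since the slides describe Merge only in words):

def merge(a, left, mid, right):
    """Merge the sorted runs a[left..mid] and a[mid+1..right]."""
    merged = []
    i, j = left, mid + 1
    while i <= mid and j <= right:
        # Take the smaller of the two frontmost elements.
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid + 1])      # leftovers of the left run
    merged.extend(a[j:right + 1])    # leftovers of the right run
    a[left:right + 1] = merged       # copy the result back into A[p..r]

def merge_sort(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left < right:
        mid = (left + right) // 2
        merge_sort(a, left, mid)
        merge_sort(a, mid + 1, right)
        merge(a, left, mid, right)

A = [10, 5, 7, 6, 1, 4, 8, 3, 2, 9]
merge_sort(A)
print(A)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]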
What is the running time in the worst, best, and average
cases of the following sorting algorithms?

Answer:
              Insertion Sort   Selection Sort   Bubble Sort   Quick Sort    Merge Sort
Worst Case    O(n²)            O(n²)            O(n²)         O(n²)         O(n log n)
Best Case     O(n)             O(n²)            O(n)          O(n log n)    O(n log n)
Average Case  O(n²)            O(n²)            O(n²)         O(n log n)    O(n log n)
Linear-time (non-comparison) sorting algorithms:
Counting sort
Bucket sort
Radix sort
Counting sort
Assumptions:
n records
Each record contains keys and data
All keys are in the range of 1 to k
Space
The unsorted list is stored in A, the sorted list will be stored in an additional array B
Uses an additional array C of size k
Counting sort
Main idea:
1) For each key value i, i = 1, …, k, count the number of times the key i occurs in the unsorted
input array A.
2) Store the results in an auxiliary array, C.
3) Use these counts to compute offsets. offset_i is used to calculate the location where the
records with key value i will be stored in the sorted output list B.
4) The offset_i value gives the location of the last occurrence of key i in the sorted output.
A = [4, 1, 3, 4, 3, 4], k = 4, length = 6

Counting-Sort(A, B, k)
1. for i ← 1 to k
2.     do C[i] ← 0
3. for j ← 1 to length[A]
4.     do C[A[j]] ← C[A[j]] + 1
5. for i ← 2 to k
6.     do C[i] ← C[i] + C[i − 1]
7. for j ← length[A] downto 1
8.     do B[C[A[j]]] ← A[j]
9.        C[A[j]] ← C[A[j]] − 1

Initially (steps 1-2):               C = [0, 0, 0, 0]
After counting (steps 3-4):          C = [1, 0, 2, 3]
After computing offsets (steps 5-6): C = [1, 1, 3, 6]
Placing records right to left (steps 7-9) fills B:
    B = [1, 3, 3, 4, 4, 4]    (B[1] holds the 1, B[2..3] the 3s, B[4..6] the 4s)
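A minimal Python sketch of the complete Counting-Sort; the output list B is 0-indexed, so positions taken from C are shifted by one (the function signature is illustrative):

def counting_sort(A, k):
    """Stable sort of list A whose keys are integers in the range 1..k."""
    n = len(A)
    C = [0] * (k + 1)           # C[i] counts occurrences of key i (index 0 unused)
    B = [0] * n                 # output array
    for key in A:               # steps 3-4: count each key
        C[key] += 1
    for i in range(2, k + 1):   # steps 5-6: turn counts into offsets (prefix sums)
        C[i] += C[i - 1]
    for j in range(n - 1, -1, -1):   # steps 7-9: scan right to left for stability
        B[C[A[j]] - 1] = A[j]
        C[A[j]] -= 1
    return B

print(counting_sort([4, 1, 3, 4, 3, 4], k=4))   # [1, 3, 3, 4, 4, 4]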
Counting Sort Example
Analysis:
O(k + n) time.
What if k = O(n)? Then counting sort runs in O(n) time overall.
Requires k + n extra storage.
This is a stable sort: it preserves the original order of equal keys.
Clearly no good for sorting 32-bit values directly (it would need k = 2³² counters).
Radix Sort
The radix sort algorithm is different from the other sorting algorithms we have discussed:
It does not use key comparisons to sort an array.
It can be used to sort both integers and strings.
It is also an extension of the bucket sort algorithm.
Radix sort:
Treats each data item as a group of digits or a character string.
First it groups data items according to their rightmost character (or digit), and puts these groups
into order with respect to that rightmost character.
Then it combines these groups.
We repeat these grouping and combining operations for all other character positions in
the data items, from the rightmost to the leftmost character position.
At the end, the sort operation is complete.
Radix Sort
Strings of names: the radix is 26, i.e., 26 buckets (a, b, …, z)
Integers: the radix is 10, i.e., 10 buckets (0, 1, …, 9)
Example-
Sort the numbers given below using radix sort.
345, 654, 924, 123, 567, 472, 555, 808, 911
In the first pass, the numbers are sorted according
to the digit at the ones place.
After this pass, the numbers are collected bucket by
bucket. The new list thus formed is used as an input
for the next pass. In the second pass, the numbers
are sorted according to the digit at the tens place.
In the third pass, the numbers are sorted according
to the digit at the hundreds place.
The numbers are collected bucket by bucket. The new list thus formed is the final sorted
result. After the third pass, the list can be given as
123, 345, 472, 555, 567, 654, 808, 911, 924
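A minimal Python sketch of this least-significant-digit radix sort; the number of passes (digits) is assumed to be 3 for these three-digit keys:

def radix_sort(nums, digits=3):
    """Sort non-negative integers of at most `digits` digits, least significant digit first."""
    for d in range(digits):                 # ones, tens, hundreds, ...
        buckets = [[] for _ in range(10)]   # one bucket per digit value 0..9
        for x in nums:
            buckets[(x // 10 ** d) % 10].append(x)
        # Collect bucket by bucket; the new list is the input for the next pass.
        nums = [x for bucket in buckets for x in bucket]
    return nums

print(radix_sort([345, 654, 924, 123, 567, 472, 555, 808, 911]))
# [123, 345, 472, 555, 567, 654, 808, 911, 924]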
Example-
Sort the numbers given below using radix sort.
189, 986, 205, 421, 97, 192, 535
Radix Sort – Example
mom, dad, god, fat, bad, cat, mad, pat, bar, him original list
(bad,bar) (cat) (dad) (fat) (god) (him) (mad,mom) (pat) group strings by first letter
Shell Sort - example (1)
Initial Gap = 4
80 93 60 12 42 30 68 85 10
10 30 60 12 42 93 68 85 80
Shell Sort - example (2)
Resegmenting Gap = 2
10 30 60 12 42 93 68 85 80
10 12 42 30 60 85 68 93 80
Shell Sort - example (3)
Resegmenting Gap = 1
10 12 42 30 60 85 68 93 80
10 12 30 42 60 68 80 85 93
Example-2
Sort the elements given below using shell sort.
63, 19, 7, 90, 81, 36, 54, 45, 72, 27, 22, 9, 41, 59, 33
Shell Sort Algorithm
Analysis
Shellsort's worst-case performance using Hibbard's
increments is Θ(n^(3/2)).
The average performance is thought to be about O(n^(5/4)).
The exact complexity of this algorithm is still being debated.
Animations:
http://www.sorting-algorithms.com/shell-sort
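A minimal Python sketch of shell sort, assuming the halving gap sequence used in the examples above (9 elements give gaps 4, 2, 1); the function name is illustrative:

def shell_sort(a):
    """Sort the list a in place using shell sort with halving gaps."""
    gap = len(a) // 2
    while gap > 0:
        # Gapped insertion sort: every sublist of stride `gap` becomes sorted.
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2

data = [80, 93, 60, 12, 42, 30, 68, 85, 10]
shell_sort(data)
print(data)   # [10, 12, 30, 42, 60, 68, 80, 85, 93]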
Tree Sort
Tree sort makes full use of the properties of a binary search tree.
The tree sort algorithm first builds a binary search tree
from the elements of the array to be sorted, and
then performs an in-order traversal so that the numbers are
retrieved in sorted order.
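A minimal Python sketch of tree sort (the Node class and insert helper are illustrative; duplicates are sent to the right subtree by assumption):

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def tree_sort(values):
    root = None
    for v in values:               # build the binary search tree
        root = insert(root, v)
    result = []
    def inorder(node):             # in-order traversal yields sorted order
        if node:
            inorder(node.left)
            result.append(node.key)
            inorder(node.right)
    inorder(root)
    return result

print(tree_sort([60, 20, 10, 40, 30, 50, 70]))   # [10, 20, 30, 40, 50, 60, 70]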
Saving Binary Search Tree in a File
An initially empty binary search tree after the insertion of 60, 20, 10, 40, 30, 50,
and 70
Saving Binary Search Tree in a File- Tree Sort
A full tree saved in a file by using inorder traversal
[Figure: example tree annotated with node heights and levels, e.g. height of root = 3, height of node (2) = 1, level of node (10) = 2]
The Heap Data Structure
Def: A heap is a nearly complete binary tree with the following two properties:
Structural property: all levels are full, except possibly the last one, which is filled
from left to right
Order (heap) property: for any node x, Parent(x) ≥ x (in a max-heap)
From the heap property, it follows that the root is the
maximum (max-heap) or minimum (min-heap) element of the heap.
[Figure: max-heap with root 8, children 7 and 4, and leaves 5 and 2]
Heap
A heap is a binary tree that is filled in level order, left to right
Array Representation of Heaps
A heap can be stored as an array A.
Root of tree is A[1]
Left child of A[i] = A[2i]
Right child of A[i] = A[2i + 1]
Parent of A[i] = A[⌊i/2⌋]
Heapsize[A] ≤ length[A]
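In code these index rules are one-liners; a small Python sketch using the same 1-indexed convention:

def left(i):   return 2 * i         # left child of A[i]
def right(i):  return 2 * i + 1     # right child of A[i]
def parent(i): return i // 2        # floor(i / 2)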
Max Heap Example
[Figure: max-heap with root 19, children 12 and 16, and leaves 1, 4, 7]
Array A: 19 12 16 1 4 7

Min Heap Example
[Figure: min-heap with root 1, children 4 and 16, and leaves 7, 12, 19]
Array A: 1 4 16 7 12 19
Heapify
Heapify picks the largest child key and compares it to the parent key. If the parent
key is larger, heapify quits; otherwise it swaps the parent key with the
largest child key, so that the parent becomes larger than its children.
Heapify(A, i)
{
    l ← left(i)
    r ← right(i)
    if l <= heapsize[A] and A[l] > A[i]
        then largest ← l
        else largest ← i
    if r <= heapsize[A] and A[r] > A[largest]
        then largest ← r
    if largest != i
        then swap A[i] ↔ A[largest]
             Heapify(A, largest)
}
Build Heap
We can use the procedure Heapify in a bottom-up fashion to convert an array
A[1..n] into a heap. Since the elements in the subarray A[⌊n/2⌋+1..n] are all
leaves, the procedure BUILD-HEAP goes through the remaining nodes of the
tree and runs Heapify on each one. The bottom-up order of processing
guarantees that the subtrees rooted at a node's children are heaps before Heapify is run at
that node.
Buildheap(A)
{
    heapsize[A] ← length[A]
    for i ← ⌊length[A]/2⌋ downto 1
        do Heapify(A, i)
}
Heap Sort Algorithm
The heap sort algorithm starts by using procedure BUILD-HEAP to build a heap
on the input array A[1..n]. Since the maximum element of the array is stored at the
root A[1], it can be put into its correct final position by exchanging it with A[n]
(the last element in A). If we now discard node n from the heap, the
remaining elements can be made into a heap. Note that the new element at the root
may violate the heap property; all that is needed to restore it is one call to Heapify(A, 1).
Heapsort(A)
{
    Buildheap(A)
    for i ← length[A] downto 2
        do swap A[1] ↔ A[i]
           heapsize[A] ← heapsize[A] − 1
           Heapify(A, 1)
}
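A minimal Python sketch combining Heapify, Buildheap, and Heapsort. It is 0-indexed, so the child formulas become 2i + 1 and 2i + 2 (names are illustrative):

def heapify(a, i, heapsize):
    """Sink a[i] down until the max-heap property holds below index heapsize."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heapsize and a[l] > a[i] else i
    if r < heapsize and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, heapsize)

def build_heap(a):
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))

def heapsort(a):
    build_heap(a)
    for i in range(len(a) - 1, 0, -1):
        a[0], a[i] = a[i], a[0]    # move the current maximum to its final slot
        heapify(a, 0, i)           # restore the heap on the shrunken prefix

data = [16, 4, 7, 1, 12, 19]
heapsort(data)
print(data)   # [1, 4, 7, 12, 16, 19]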
Example: Convert the following array to a heap
16 4 7 1 12 19

Step 1: Max-Heapify the last internal node (7, whose child is 19): swap 7 and 19
    16 4 19 1 12 7
Step 2: Max-Heapify the node holding 4 (children 1 and 12): swap 4 and 12
    16 12 19 1 4 7
Step 3: Max-Heapify the root (16, children 12 and 19): swap 16 and 19; 16 now has child 7, so it stops
    19 12 16 1 4 7

The resulting max-heap is 19 12 16 1 4 7.
Heap Sort
The heapsort algorithm consists of two phases:
- build a heap from an arbitrary array
- use the heap to sort the data
[Figure: the max-heap 19, 12, 16, 1, 4, 7 built in phase one]
Example of Heap Sort
Start from the max-heap built above; the sorted region (shown after the bar) grows from the right.

    Array A: 19 12 16 1 4 7       Sorted: (empty)

Take out the biggest (19); move the last element (7) to the root:
    Array A: 7 12 16 1 4 | 19
HEAPIFY(): swap 7 with its larger child 16:
    Array A: 16 12 7 1 4 | 19

Take out the biggest (16); move the last element (4) to the root:
    Array A: 4 12 7 1 | 16 19
HEAPIFY(): swap 4 with 12:
    Array A: 12 4 7 1 | 16 19

Take out the biggest (12); move the last element (1) to the root:
    Array A: 1 4 7 | 12 16 19
HEAPIFY(): swap 1 with 7:
    Array A: 7 4 1 | 12 16 19

Take out the biggest (7); move the last element (1) to the root:
    Array A: 1 4 | 7 12 16 19
HEAPIFY(): swap 1 with 4:
    Array A: 4 1 | 7 12 16 19

Take out the biggest (4); move the last element (1) to the root:
    Array A: 1 | 4 7 12 16 19

Take out the biggest (1):
    Sorted: 1 4 7 12 16 19
Time Analysis
The Build Heap algorithm runs in O(n) time.
There are n − 1 calls to Heapify; each call requires O(log n)
time.
The heap sort program combines the Build Heap program and
Heapify; therefore it has a running time of O(n log n).
Total time complexity: O(n log n)
Comparison with Quick Sort and Merge Sort
Quick sort is typically somewhat faster, due to better cache behavior and other factors,
but the worst-case running time for quick sort is O(n²), which is unacceptable for
large data sets and can be deliberately triggered given enough knowledge of the
implementation, creating a security risk.
The quick sort algorithm also requires Ω(log n) extra storage space, making it not a
strictly in-place algorithm. This typically does not pose a problem except on the
smallest embedded systems, or on systems where memory allocation is highly
restricted. Constant-space (in-place) variants of quick sort are possible to construct,
but are rarely used in practice due to their extra complexity.
Comparison with Quick Sort and Merge Sort (cont)
Thus, because of the O(n log n) upper bound on heap sort’s running time and constant
upper bound on its auxiliary storage, embedded systems with real-time constraints or
systems concerned with security often use heap sort.
Heap sort also competes with merge sort, which has the same time bounds, but
requires Ω(n) auxiliary space, whereas heap sort requires only a constant amount.
Heap sort also typically runs more quickly in practice. However, merge sort is simpler
to understand than heap sort, is a stable sort, parallelizes better, and can be easily
adapted to operate on linked lists and very large lists stored on slow-to-access media
such as disk storage or network attached storage. Heap sort shares none of these
benefits; in particular, it relies strongly on random access.
Possible Application
When we want to know which task carries the highest priority, given a large
number of things to do.
Interval scheduling: when we have a list of tasks with start and finish
times and we want to complete as many tasks as possible.
[Figure: max-heap illustrating a priority queue, with levels 26; 24, 20; 18, 17, 19, 13; 12, 14, 11]
THANK YOU