BCA 403 (File & Data Structure)
Data Structure
Introduction to Data Structure
Definition:
Data Structure is a way of collecting and organising data in such a way that we can perform
operations on these data in an effective way. Data Structures is about rendering data elements
in terms of some relationship, for better organization and storage. These are the main part of
many computer science algorithms as they enable the programmers to handle the data in an
efficient manner. It plays a vital role in enhancing the performance of a software or a program
as the main function of the software is to store and retrieve the user's data as fast as possible.
Data Structure may be represented as:
Data Structure= Organized Data + Allowed Operations
General data structure types include the array, the file, the record, the table, the tree, and so on.
As applications grow more complex and the amount of data increases day by day, the
following problems may arise:
a. Processor speed: Handling very large amounts of data requires high-speed
processing, but as data grows day by day to billions of records, the processor
may fail to deal with that much data.
b. Data Search: Consider an inventory of 10^6 items in a store. If our application
needs to search for a particular item, it has to traverse all 10^6 items every time,
which slows down the search process.
c. Multiple requests: If thousands of users are searching the data simultaneously on a
web server, then even a very large server may fail during that process.
In order to solve the above problems, data structures are used. Data is organized to form a
data structure in such a way that all items are not required to be searched and required data
can be searched instantly.
Data structures are classified as Linear and Non-Linear. Linear data structures are further
divided into Static (the array) and Dynamic (linked list, stack, queue); Non-Linear data
structures include the tree and the graph.
1. Primitive Data Type
Primitive data types are the basic data types available in most programming languages.
They are used to represent a single value.
Data type – Description
Integer – Used to represent a number without a decimal point.
Float – Used to represent a number with a decimal point.
Character – Used to represent a single character.
Boolean – Used to represent logical values, either true or false.
A linear data structure traverses its data elements sequentially; from any element, only
one other element can be reached directly. Linear data structures include the array,
linked list, stack and queue.
2.1.1 Static Data Structure
In a static data structure the size of the structure is fixed. The content of the data structure
can be modified, but without changing the memory space allocated to it.
2.1.2 Dynamic Data Structure
In a dynamic data structure the size of the structure is not fixed and can be modified during
the operations performed on it. Dynamic data structures are designed to facilitate changes
to the data structure at run time.
Non-Linear data structure is opposite to linear data structure. In non-linear data structure, the
data values are not arranged in order and a data item is connected to several other data items.
It uses memory efficiently. Free contiguous memory is not required for allocating data items.
It includes trees and graphs.
If the size of the data structure is n, then at most n data elements can be stored in it;
trying to insert into a full data structure causes overflow.
iii. Traversing: Every data structure contains the set of data elements. Traversing the
data structure means visiting each element of the data structure in order to perform
some specific operation like searching or sorting.
iv. Deletion: The process of removing an element from the data structure is called
Deletion. We can delete an element from the data structure at any random location. If
we try to delete an element from an empty data structure then underflow occurs.
v. Searching: The process of finding the location of an element within the data structure
is called Searching. There are two algorithms to perform searching, Linear Search and
Binary Search. We will discuss each one of them later in this tutorial.
vi. Sorting: The process of arranging the data structure in a specific order is known as
Sorting. There are many algorithms that can be used to perform sorting, for example,
insertion sort, selection sort, bubble sort, etc.
vii. Merging: When two lists, List A of size M and List B of size N, containing similar
types of elements, are joined to produce a third list, List C of size (M + N), then
this process is called merging.
viii. Destroying: This must be the last operation performed on a data structure; it is
applied when the data structure is no longer needed.
ADT stands for Abstract Data Type. It is an abstraction of a data structure. Abstract data type
is a mathematical model of a data structure. It describes a container which holds a finite number
of objects where the objects may be associated through a given binary relationship. It is a
logical description of how we view the data and the operations allowed without regard to how
they will be implemented. ADT concerns only with what the data is representing and not with
how it will eventually be constructed.
The definition of ADT only mentions what operations are to be performed but not how these
operations will be implemented. It does not specify how data will be organized in memory and
what algorithms will be used for implementing the operations. It is called “abstract” because it
gives an implementation independent view. The process of providing only the essentials and
hiding the details is known as abstraction.
It is a set of objects and operations. For example, List, Insert, Delete, Search, Sort.
ADT consists of data, operation and error.
i. Data describes the structure of the data used in the ADT.
ii. Operation describes valid operations for the ADT. It describes its interface.
iii. Error describes how to deal with the errors that can occur.
List ADT
A list contains elements of the same type arranged in sequential order, and the following
operations can be performed on the list.
get() – Return an element from the list at any given position.
insert() – Insert an element at any position of the list.
remove() – Remove the first occurrence of any element from a non-empty list.
removeAt() – Remove the element at a specified location from a non-empty list.
replace() – Replace an element at any position by another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
Stack ADT
A Stack contains elements of the same type arranged in sequential order. All operations take
place at a single end, the top of the stack, and the following operations can be performed:
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return the element at the top of the stack without removing it, if the stack is not
empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
Queue ADT
A Queue contains elements of the same type arranged in sequential order. Operations take
place at both ends: insertion is done at the rear and deletion is done at the front. The
following operations can be performed:
enqueue() – Insert an element at the end of the queue.
dequeue() – Remove and return the first element of queue, if the queue is not empty.
peek() – Return the element of the queue without removing it, if the queue is not empty.
size() – Return the number of elements in the queue.
Advantages of ADT
i. ADT is reusable and ensures robust data structure.
ii. It reduces coding efforts.
iii. Encapsulation ensures that data cannot be corrupted.
iv. ADT is based on principles of Object Oriented Programming (OOP) and Software
Engineering (SE).
v. It specifies error conditions associated with operations.
Algorithm
Algorithm:
An algorithm is a finite set of logic or instructions, written in order to accomplish a certain
predefined task. It is not the complete program or code; it is just the solution (logic) of a
problem, which can be represented as an informal description. Algorithms are generally
independent of the underlying languages.
An algorithm is any well-defined computational procedure that takes some values or set of
values as input and produces some value or set of values, as output. An algorithm is thus a
sequence of computational steps that transform the input into the output.
Characteristics of an algorithm:
An algorithm is a tool for solving a well-specified computational problem. Every algorithm
must satisfy the following properties:
i. Input: It takes zero or more quantities as input.
ii. Output: It produces at least one output.
iii. Definiteness: Each instruction is clear and unambiguous.
iv. Finiteness: It terminates after a finite number of steps.
v. Effectiveness: Every instruction is basic enough to be carried out exactly.
Example 1: Design an algorithm to multiply the two numbers num1 and num2 and display
the result in res.
Step 1: START
Step 2: Read the two numbers num1 and num2
Step 3: SET res = num1 * num2
Step 4: Display res
Step 5: STOP
1. Start from the leftmost element of arr[] and one by one compare x with each element
of arr[].
2. If x matches with an element, return the index.
3. If x doesn’t match with any of the elements, return -1.
Compile time is constant when the same program is executed many times; hence the run time
of a program (tp), which depends upon the instance characteristics, is the important measure.
All the above criteria differ from installation to installation. Therefore, for a priori
estimates, the frequency count of each statement is the most important factor while analysing
an algorithm or program.
Time and Space tradeoffs in algorithms : In general, while solving the problem, for any
algorithm, computation time required will be more when space required is less and vice-a-
versa.
Asymptotic Notation:
The main idea of asymptotic analysis is to have a measure of efficiency of algorithms that doesn’t
depend on machine specific constants, and doesn’t require algorithms to be implemented and time
taken by programs to be compared. Asymptotic notations are mathematical tools to represent time
complexity of algorithms for asymptotic analysis.
It is used to mathematically calculate the running time of any operation inside an algorithm.
Example: Asymptotic analysis refers to computing the running time of any operation in mathematical
units of computation. For example, the running time of one operation may be computed as f(n) = n
and that of another operation as g(n) = n². This means the running time of the first operation will
increase linearly with the increase in n, while the running time of the second operation will increase
quadratically as n increases. Similarly, the running times of both operations will be nearly the same
if n is significantly small.
• Worst case: It defines the input for which the algorithm takes the longest time.
• Average case: It defines the average time taken for program execution.
• Best case: It defines the input for which the algorithm takes the least time.
The commonly used asymptotic notations for calculating the running time complexity of an
algorithm are given below:
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running
time. It measures the worst case time complexity or the longest amount of time an algorithm
can possibly take to complete.
When expressing complexity using Big O notation, constant multipliers are ignored; so an
O(4n) algorithm is equivalent to O(n), which is how it should be written.
If f(n) and g(n) are functions defined on the positive integers n, then
f(n) = O(g(n))
That is, f of n is Big–O of g of n if and only if there exist positive constants c and n0
such that f(n) ≤ c·g(n) for all n ≥ n0.
It means that for large amounts of data, f(n) will grow by no more than a constant factor
of g(n). Hence, g provides an upper bound.
Limitations of Big O Notation
There are certain limitations to the Big O notation of expressing the complexity of
algorithms. These limitations are as follows:
i. Many algorithms are simply too hard to analyse mathematically.
ii. There may not be sufficient information to calculate the behaviour of the algorithm in
the average case.
iii. Big O analysis only tells us how the algorithm grows with the size of the problem,
not how efficient it is, as it does not consider the programming effort.
iv. It ignores important constants. For example, if one algorithm takes O(n²) time to
execute and the other takes O(100000n²) time to execute, then as per Big O, both
algorithms have equal time complexity. In real-time systems, this may be a serious
consideration.
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running
time. It measures the best case time complexity, or the least amount of time an algorithm
can possibly take to complete.
For example, for a function f(n),
Ω(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≥ c·f(n) for all n > n0 }
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of
an algorithm's running time. It is represented as follows −
θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) for all n > n0 }
Asymptotic Notations
Following is a list of some common asymptotic notations −
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
Algorithm Efficiency
If a function is linear (without any loops or recursions), the efficiency of that algorithm or the
running time of that algorithm can be given as the number of instructions it contains.
However, if an algorithm contains loops, then the efficiency of that algorithm may vary
depending on the number of loops and the running time of each loop in the algorithm.
Let us consider different cases in which loops determine the efficiency of an algorithm.
Linear Loops
To calculate the efficiency of an algorithm that has a single loop, we need to first determine
the number of times the statements in the loop will be executed. This is because the number
of iterations is directly proportional to the loop factor: the greater the loop factor, the
greater the number of iterations. For example, consider the loop given below:
for(i=0;i<100;i++)
Statement Block;
Here, 100 is the loop factor. We have already said that efficiency is directly proportional to
the number of iterations. Hence, the general formula in the case of linear loops may be given
as
f(n) = n
However, calculating efficiency is not always as simple as shown in the above example.
Consider the loop given below:
for(i=0;i<100;i+=2)
Statement Block;
Here, the number of iterations is half the number of the loop factor. So, here the efficiency
can be given as
f(n) = n/2
Logarithmic Loops
We have seen that in linear loops, the loop updation statement either adds or subtracts the
loop-controlling variable. However, in logarithmic loops, the loop-controlling variable is
either multiplied or divided during each iteration of the loop.
For example, look at the loops given below:
for(i=1;i<1000;i*=2)          for(i=1000;i>=1;i/=2)
statement block;              statement block;
Consider the first for loop in which the loop-controlling variable i is multiplied by 2. The
loop will be executed only 10 times and not 1000 times because in each iteration the value of
i doubles. Now, consider the second loop in which the loop-controlling variable i is divided
by 2. In this case also, the loop will be executed 10 times. Thus, the number of iterations is a
function of the number by which the loop-controlling variable is divided or multiplied. In the
examples discussed, it is 2. That is, when n = 1000, the number of iterations can be given by
log 1000 which is approximately equal to 10. Therefore, putting this analysis in general
terms, we can conclude that the efficiency of loops in which iterations divide or multiply the
loop-controlling variables can be given as
f(n) = log n
Nested Loops
Loops that contain loops are known as nested loops. In order to analyse nested loops, we
need to determine the number of iterations each loop completes. The total is then obtained
as the product of the number of iterations in the inner loop and the number of iterations in
the outer loop. In this case, we analyse the efficiency of the algorithm based on whether it
is a linear logarithmic, quadratic, or dependent quadratic nested loop.
Linear logarithmic loop
Consider the following code in which the loop-controlling variable of the inner loop is
multiplied after each iteration. The number of iterations in the inner loop is log 10. This
inner loop is controlled by an outer loop which iterates 10 times. Therefore, according to
the formula, the number of iterations for this code can be given as 10 log 10.
for(i=0;i<10;i++)
for(j=1; j<10;j*=2)
statement block;
Quadratic loop
In a quadratic loop, the number of iterations in the inner loop is equal to the number of
iterations in the outer loop. Consider the following code in which the outer loop executes 10
times and for each iteration of the outer loop, the inner loop also executes 10 times.
Therefore, the efficiency here is 100.
for(i=0;i<10;i++)
for(j=0; j<10;j++)
statement block;
The generalized formula for a quadratic loop can be given as f(n) = n².
Dependent quadratic loop
In a dependent quadratic loop, the number of iterations of the inner loop depends on the
value of the outer loop's controlling variable, for example:
for(i=0;i<10;i++)
for(j=0; j<=i;j++)
statement block;
In this code, the inner loop will execute just once in the first iteration, twice in the second
iteration, thrice in the third iteration, and so on. In this way, the number of iterations
can be calculated as
1 + 2 + 3 + ... + 10 = 55
If we calculate the average of this loop (55/10 = 5.5), we will observe that it is equal to the
number of iterations in the outer loop (10 plus 1) divided by 2. In general terms, the inner
loop iterates (n + 1)/2 times on average. Therefore, the efficiency of such a code can be
given as
f(n) = n (n + 1)/2
An algorithm is said to run in polynomial time if its running time is bounded by a polynomial
in the complexity of the input. Polynomial-time algorithms are said to be "fast." Most familiar
mathematical operations such as addition, subtraction, multiplication, and division, as well as
computing square roots, powers, and logarithms, can be performed in polynomial time.
Computing the digits of most interesting mathematical constants, including pi and e, can also
be done in polynomial time.
All basic arithmetic operations (i.e., addition, subtraction, multiplication, division),
comparison operations, and sort operations are considered polynomial-time algorithms.
Algorithms which have exponential time complexity grow much faster than polynomial
algorithms.
Equations that show a polynomial time complexity have variables in the bases of their terms.
Example: n³ + 2n² + 1. Notice n is in the base, NOT the exponent. In exponential equations,
the variable is in the exponent. Example: 2^n. As said before, exponential time grows much
faster. If n is equal to 1000 (a reasonable input for an algorithm), then notice 1000³ is 1
billion, while 2^1000 is simply huge! For reference, there are about 2^80 hydrogen atoms in
the sun, which is already much more than 1 billion, and 2^1000 is vastly larger still.
Recursion
The process in which a function calls itself directly or indirectly is called recursion, and
the corresponding function is called a recursive function. Using a recursive algorithm,
certain problems can be solved quite easily. Examples of such problems are Towers of Hanoi
(TOH), Inorder/Preorder/Postorder Tree Traversals, DFS of Graph, etc.
Properties
A recursive function can go infinite like a loop. To avoid infinite running of recursive function,
there are two properties that a recursive function must have −
Base criteria − There must be at least one base criterion or condition such that, when
this condition is met, the function stops calling itself recursively.
Progressive approach − The recursive calls should progress in such a way that each
time a recursive call is made, it comes closer to the base criteria.
Many programming languages implement recursion by means of stacks. Generally, whenever
a function (caller) calls another function (callee) or itself as callee, the caller function
transfers execution control to the callee. This transfer process may also involve some data to
be passed from the caller to the callee.
In recursive program, the solution to base case is provided and solution of bigger problem is
expressed in terms of smaller problems.
int fact(int n)
{
if (n <= 1) // base case
return 1;
else
return n*fact(n-1);
}
In the above example, the base case for n <= 1 is defined, and a larger value of the number
can be solved by converting it to a smaller one until the base case is reached.
Types of recursion:
1. Direct Recursion: A function fun() is called direct recursive if it calls the same
function fun().
int function(int x)
{
if(x <= 0)
return x;
else
return function(x-1);
}
2. Indirect Recursion: A function fun() is called indirect recursive if it calls
another function say fun_new() and fun_new() calls fun directly or indirectly
int func1(int x)
{
if(x <= 0)
return x;
else
return func2(x);
}
int func2(int y)
{
return func1(y-1);
}
3. Tail Recursion: A Recursive function is said to be Tail recursive if there are
no pending operations to be performed on return from a recursive call. A Tail
Recursive Function is said to return the value of the last recursive call as the
value of the function. Tail Recursion is very desirable because the amount of
information which must be stored during the computation is independent of the
number of recursive calls.
// An example of a tail recursive function
void print(int n)
{
if (n < 0) return;
cout << " " << n;
// the recursive call is the last statement executed
print(n-1);
}
Removal of Recursion
The method of converting a recursive function into an iterative one with the help of looping
is called removal of recursion. It involves two steps:
1. Convert the recursive function to a tail recursive one.
2. Convert the tail recursive function to an iterative one.
Tower of Hanoi problem
Tower of Hanoi is a mathematical puzzle which consists of three towers (pegs) and more than
one ring. The rings are of different sizes and are stacked in ascending order, i.e. the
smaller one sits over the larger one.
There are other variations of the puzzle where the number of disks increases, but the tower
count remains the same.
Rules
The mission is to move all the disks to another tower without violating the sequence of
arrangement. A few rules to be followed for Tower of Hanoi are −
1. Only one disk can be moved among the towers at any given time.
2. Only the "top" disk can be removed.
3. No large disk can sit over a small disk.
A Tower of Hanoi puzzle with n disks can be solved in a minimum of 2^n − 1 steps.
Algorithm
To write an algorithm for Tower of Hanoi, we mark the three towers with the names source,
destination and aux (the latter only to help in moving the disks). If we have only one disk,
it can easily be moved from the source to the destination peg.
If we have n disks −
Step 1 − Move n-1 disks from source to aux
Step 2 − Move nth disk from source to dest
Step 3 − Move n-1 disks from aux to dest
Example: If there are 3 disks, then the 7 steps are:
1. Move disk 1 from source to dest
2. Move disk 2 from source to aux
3. Move disk 1 from dest to aux
4. Move disk 3 from source to dest
5. Move disk 1 from aux to source
6. Move disk 2 from aux to dest
7. Move disk 1 from source to dest
Array
An array is a collection of similar data elements. These data elements have the same data type.
The elements of the array are stored in consecutive memory locations and are referenced by an
index (also known as the subscript). The subscript is an ordinal number which is used to
identify an element of the array.
An array must be declared before being used. Declaring an array means specifying the
following:
Data type—the kind of values it can store, for example, int, char, float, double.
Name—to identify the array.
Size—the maximum number of values that the array can hold.
Arrays are declared using the following syntax:
type name[size];
Dimension of array:
The dimension of an array is the number of indices needed to select an element. Thus, if the
array is seen as a function on a set of possible index combinations, it is the dimension of the
space of which its domain is a discrete subset. Thus a one-dimensional array is a list of data, a
two-dimensional array a rectangle of data, a three-dimensional array a block of data, etc.
The length of an array is given by the number of elements stored in it. The general formula to
calculate the length of an array is
Length = upper_bound – lower_bound + 1
where upper_bound is the index of the last element and lower_bound is the index of the first
element in the array.
Calculating address of data element in the single dimensional array:
Since an array stores all its data elements in consecutive memory locations, storing just the
base address, that is the address of the first element in the array, is sufficient. The address of
other data elements can simply be calculated using the base address. The formula to perform
this calculation is,
Address of data element, A[k] = BA(A) + w(k – lower_bound)
Here, A is the array, k is the index of the element of which we have to calculate the address,
BA is the base address of the array A, and w is the size of one element in memory
Array Operations:
Following are the basic operations supported by an array.
Traverse − print all the array elements one by one.
Traversing Array:
Traversing the data elements of an array A can include printing every element, counting the
total number of elements, or performing any process on these elements. Since an array is a
linear data structure (all its elements form a sequence), traversing its elements is very
simple and straightforward.
Algorithm:
Step 1: [INITIALIZATION] SET I = lower_bound
Step 2: Repeat Steps 3 to 4 while I <= upper_bound
Step 3: Apply Process to A[I]
Step 4: SET I = I + 1
[END OF LOOP]
Step 5: EXIT
Insertion Operation
Insert operation is to insert one or more data elements into an array. Based on the requirement,
a new element can be added at the beginning, end, or any given index of array.
Algorithm
Let Array be a linear unordered array of MAX elements.
Let A be a Linear Array (unordered) with N elements and K is a positive integer such that
K<=N. Following is the algorithm where ITEM is inserted into the Kth position of A
1. Start
2. Set J = N
3. Set N = N+1
4. Repeat steps 5 and 6 while J >= K
5. Set A[J+1] = A[J]
6. Set J = J-1
7. Set A[K] = ITEM
8. Stop
Deletion Operation
Deletion refers to removing an existing element from the array and re-organizing all elements
of an array.
Algorithm
Consider A is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to delete an element available at the Kth position of A.
1. Start
2. Set J = K
3. Repeat steps 4 and 5 while J < N
4. Set A[J] = A[J+1]
5. Set J = J+1
6. Set N = N-1
7. Stop
Search Operation
You can perform a search for an array element based on its value or its index.
Algorithm
Consider A is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to find an element with a value of ITEM using sequential search.
1. Start
2. Set J = 0
3. Repeat steps 4 and 5 while J < N
4. IF A[J] is equal to ITEM THEN GOTO STEP 6
5. Set J = J +1
6. PRINT J, ITEM
7. Stop
Update Operation
Update operation refers to updating an existing element from the array at a given index.
Algorithm
Consider A is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to update an element available at the Kth position of A.
1. Start
2. Set A[K-1] = ITEM
3. Stop
Multidimensional Array:
A two-dimensional array is specified using two subscripts where the first subscript
denotes the row and the second denotes the column. The C compiler treats a two-dimensional
array as an array of one-dimensional arrays.
Any array must be declared before being used. The declaration statement tells the compiler the
name of the array, the data type of each element in the array, and the size of each dimension.
A two-dimensional array is declared as:
data_type array_name[row_size][column_size];
We know that when we declare and initialize a one-dimensional array in C programming
simultaneously, we don't need to specify the size of the array. However, this will not work
with 2D arrays; we will have to define at least the second dimension of the array.
The size of a two-dimensional array is equal to the product of the number of rows and the
number of columns present in the array. We need to map the two-dimensional array onto a
one-dimensional array in order to store it in memory.
Consider a 3 x 3 two-dimensional array. This array needs to be mapped to a one-dimensional
array in order to store it into memory.
Representation of 2D array in memory:
Two-dimensional array can be represented in memory in two ways:-
a. Row-major Order
b. Column-major Order
Row Major Order:
It stores data in memory row by row, i.e. the first row of the array is stored first, then
the second row, then the next row, and so on.
The memory location for an element in two-dimensional array in row major order can be given
by:
Address(A[I][J]) = Base_Address + w{N ( I – 1) + (J – 1)}
where w is the number of bytes required to store one element, N is the number of columns, M
is the number of rows, and I and J are the subscripts of the array element
Column Major Order:
It stores data in memory column by column, i.e. the first column of the array is stored
first, then the second column, and so on.
The memory location for an element in a two-dimensional array in column major order can be
given by:
Address(A[I][J]) = Base_Address + w{M ( J – 1) + (I – 1)}
where w is the number of bytes required to store one element, N is the number of columns, M
is the number of rows, and I and J are the subscripts of the array element.
Time and space complexity of various array operations are described in the following table.
Operation – Average Case – Worst Case
Access – O(1) – O(1)
Search – O(n) – O(n)
Insertion – O(n) – O(n)
Deletion – O(n) – O(n)
Space Complexity: O(n)
Advantages of Array
o An array provides a single name for a group of variables of the same type; therefore,
it is easy to remember the names of all the elements of an array.
o Traversing an array is a very simple process, we just need to increment the base
address of the array in order to visit each element one by one.
o Any element in the array can be directly accessed by using the index.
Application of Array:
Arrays are used to implement mathematical vectors and matrices, as well as other kinds
of rectangular tables. Many databases, small and large, consist of (or include) one-
dimensional arrays whose elements are records.
Arrays are used to implement other data structures, such as lists, heaps, hash tables,
deques, queues, stacks, strings, and VLists. Array-based implementations of other data
structures are frequently simple and space-efficient (implicit data structures), requiring
little space overhead, but may have poor space complexity, particularly when modified,
compared to tree-based data structures (compare a sorted array to a search tree).
One or more large arrays are sometimes used to emulate in-program dynamic memory
allocation, particularly memory pool allocation. Historically, this has sometimes been
the only way to allocate "dynamic memory" portably.
Linked List
A linked list is a sequence of data structures, which are connected together via links. Linked
list is the second most-used data structure after array.
A linked list can be defined as a collection of objects called nodes that are randomly stored
in memory. A node contains two fields, i.e. the data stored at that particular address and a
pointer which contains the address of the next node in memory. The last node of the list
contains a pointer to null.
A linked list, in simple terms, is a linear collection of data elements. These data elements are
called nodes. Linked list is a data structure which in turn can be used to implement other data
structures. Thus, it acts as a building block to implement data structures such as stacks, queues,
and their variations. A linked list can be perceived as a train or a sequence of nodes in which
each node contains one or more data fields and a pointer to the next node.
Representation of Linked List
Since in a linked list, every node contains a pointer to another node which is of the same type,
it is also called a self-referential data type. Linked lists contain a pointer variable START
that stores the address of the first node in the list. The entire list can be traversed by using
START which contains the address of the first node; the next part of the first node in turn
stores the address of its succeeding node.
Using this technique, the individual nodes of the list will form a chain of nodes. If START =
NULL, then the linked list is empty and contains no nodes.
In C, we can implement a linked list using the following code:
struct node
{
int data; /* data stored in the node */
struct node *next; /* address of the next node */
};
In the figure, we can see that the variable START is used to store the address of the first
node. Here, in this example, START = 1, so the first data is stored at address 1, which is H.
The corresponding NEXT stores the address of the next node, which is 4. So, we will look at
address 4 to fetch the next data item. The second data element obtained from address 4 is E.
Again, we see the corresponding NEXT to go to the next node. From the entry in the NEXT,
we get the next address, that is 7, and fetch L as the data. We repeat this procedure until we
reach a position where the NEXT entry contains –1 or NULL, as this would denote the end of
the linked list.
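The chain-following procedure just described can be sketched in C, reusing the node structure shown earlier; the traverse function name and its visited-node count are illustrative additions, not part of the text:

```c
#include <stdio.h>

/* Node layout from the text: a data part and a pointer to the next node. */
struct node
{
    int data;
    struct node *next;
};

/* Walk the chain from START, visiting each node until NEXT is NULL. */
int traverse(struct node *start)
{
    int count = 0;
    struct node *ptr = start;
    while (ptr != NULL)
    {
        printf("%d ", ptr->data);   /* visit the node */
        count++;
        ptr = ptr->next;            /* follow the NEXT pointer */
    }
    return count;                   /* number of nodes visited */
}
```

The loop stops when the NEXT field is NULL, which plays the role of the -1 sentinel in the example above.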
Why use linked list over array?
An array has several advantages and disadvantages which must be known in order to
decide which data structure will be used throughout the program.
Array contains following limitations:
1. The size of an array must be known in advance, before using it in the program.
2. Increasing the size of an array is a time-consuming process. It is almost impossible to
expand the size of an array at run time.
3. All the elements of an array need to be stored contiguously in memory. Inserting an
element in the array needs shifting of all its successors.
A linked list is a data structure which can overcome all the limitations of an array. Using a
linked list is useful because:
1. It allocates memory dynamically. All the nodes of a linked list are stored non-contiguously
in memory and linked together with the help of pointers.
2. Sizing is no longer a problem, since we do not need to define the size at the time of
declaration. The list grows as per the program's demand and is limited only by the
available memory space.
(Figure: a singly linked list of four nodes; START points to the first node and the last node's
pointer field, shown as x, is NULL.)
Insertion
The insertion into a singly linked list can be performed at different positions. Based on the
position of the new node being inserted, the insertion is categorized into the following
categories.
1. Insertion at beginning: It involves inserting the new node before the first node of the
list; the new node then becomes the new START node.
2. Insertion at end of the list: It involves insertion at the end of the linked list. The new
node can be inserted as the only node in the list or it can be inserted as the last one.
Different logic is implemented in each scenario.
3. Insertion after specified node: It involves insertion after the specified node of the linked
list. We need to skip the desired number of nodes in order to reach the node after which
the new node will be inserted.
Algorithm to insert a new node at the beginning of the list:
First we check whether memory is available for the new node. If the free memory is
exhausted, then an OVERFLOW message is printed. Otherwise, if a free memory cell is
available, we allocate space for the new node. Its DATA part is set to the given VAL and
its NEXT part is initialized with the address of the first node of the list, which is stored in
START. Since the new node is added as the first node of the list, it will now be known
as the START node, that is, the START pointer variable will now hold the address of the
NEW_NODE.
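As a minimal sketch of the algorithm above (the function name insert_beginning and the use of malloc failure as the OVERFLOW condition are assumptions for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

/* Insert VAL at the front of the list and return the new START.
   If no free memory cell is available, report OVERFLOW and
   leave the list unchanged. */
struct node *insert_beginning(struct node *start, int val)
{
    struct node *new_node = (struct node *)malloc(sizeof(struct node));
    if (new_node == NULL)
    {
        printf("OVERFLOW\n");
        return start;
    }
    new_node->data = val;    /* set the DATA part to VAL */
    new_node->next = start;  /* NEXT points to the old first node */
    return new_node;         /* the new node becomes START */
}
```
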
1. Deletion at beginning: It involves deleting the first node of the list; START is then made
to point to the next node.
2. Deletion at the end of the list: It involves deleting the last node of the list. The list can
either be empty or full. Different logic is implemented for the different scenarios.
3. Deletion after specified node: It involves deleting the node after the specified node in the
list. We need to skip the desired number of nodes to reach the node after which the node
will be deleted. This requires traversing the list.
2. Insertion at end: Adding the node to the end of the linked list.
3. Insertion after specified node: Adding the node into the linked list after the specified
node.
5. Deletion at the end: Removing the node from the end of the list.
6. Deletion of the node having given data: Removing the node which is present just after
the node containing the given data.
Algorithm
Case 2: The new node is inserted at the end.
Algorithm:
Case 1: The first node is deleted.
2. Insertion at the end: Adding a node into the circular singly linked list at the end.
1. Deletion at beginning: Removing the node from the circular singly linked list at the
beginning.
2. Deletion at the end: Removing the node from the circular singly linked list at the end.
3. Searching: Compare each node's data with the given item and return the location at
which the item is present in the list; otherwise return NULL.
4. Traversing: Visiting each element of the list at least once in order to perform some
specific operation.
Case 1: The new node is inserted at the beginning of the circular linked list.
Case 2: The new node is inserted at the end of the circular linked list.
Deletion Operation:
Case 1: The first node is deleted
Stack
Stack is an abstract data type with a bounded (predefined) capacity. It is a simple data
structure that allows adding and removing elements in a particular order. Every time an
element is added, it goes on the top of the stack and the only element that can be removed is
the element that is at the top of the stack, just like a pile of objects.
Stack ADT allows all data operations at one end only. At any given time, we can only access
the top element of a stack.
This feature makes it a LIFO data structure. LIFO stands for Last-In-First-Out. Here, the
element which is placed (inserted or added) last is accessed first. In stack terminology, the
insertion operation is called PUSH and the removal operation is called POP.
Applications of Stack
The simplest application of a stack is to reverse a word. You push a given word to stack -
letter by letter - and then pop letters from the stack.
There are other uses also like:
1. Parsing
2. Expression Conversion (Infix to Postfix, Postfix to Prefix, etc.)
To use a stack efficiently, we need to check the status of stack as well. For the same purpose,
the following functionality is added to stacks −
peek() − get the top data element of the stack, without removing it.
isFull() − check if stack is full.
isEmpty() − check if stack is empty.
Algorithm of peek() function −
begin procedure peek
   return stack[top]
end procedure
Applications of Stack:
1. Recursion
2. Expression evaluations and conversions
3. Parsing
4. Browsers
5. Editors
6. Tree Traversals
7. Reversing a list
In the array implementation, the stack is formed by using an array, and all the operations on
the stack are performed using that array. Let's see how each operation can be implemented on
the stack using the array data structure.
Adding an element onto the stack (push operation)
Adding an element into the top of the stack is referred to as push operation. Push operation
involves following two steps.
1. Increment the variable Top so that it can now refer to the next memory location.
2. Add element at the position of incremented top. This is referred to as adding new
element at the top of the stack.
Stack overflow occurs when we try to insert an element into a completely filled stack;
therefore, our program must always check for the overflow condition before pushing.
begin
   if top = n then stack full
   top = top + 1
   stack(top) := item
end
OPERATIONS ON A STACK
A stack supports three basic operations: push, pop, and peek. The push operation adds an
element to the top of the stack and the pop operation removes the element from the top of the
stack. The peek operation returns the value of the topmost element of the stack.
Push Operation
The push operation is used to insert an element into the stack. The new element is added at the
topmost position of the stack. However, before inserting the value, we must first check if
TOP = MAX - 1, because if that is the case, then the stack is full and no more insertions can be
done. If an attempt is made to insert a value in a stack that is already full, an OVERFLOW
message is printed.
Pop Operation
The pop operation is used to delete the topmost element from the stack. However, before
deleting the value, we must first check if TOP=NULL because if that is the case, then it means
the stack is empty and no more deletions can be done. If an attempt is made to delete a value
from a stack that is already empty, an UNDERFLOW message is printed.
Peek Operation
Peek is an operation that returns the value of the topmost element of the stack without deleting
it from the stack
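The push, pop, and peek operations described above can be sketched with an array-based stack in C; the struct stack layout, the MAX value, and the return-code convention are illustrative assumptions, not the textbook's own code:

```c
#include <stdio.h>

#define MAX 10

/* Array-based stack: TOP is -1 when empty and MAX - 1 when full. */
struct stack
{
    int items[MAX];
    int top;
};

void init(struct stack *s)    { s->top = -1; }
int  is_full(struct stack *s) { return s->top == MAX - 1; }
int  is_empty(struct stack *s){ return s->top == -1; }

/* Push: check for overflow, then increment TOP and store the item. */
int push(struct stack *s, int item)
{
    if (is_full(s))
        return 0;                  /* OVERFLOW */
    s->items[++s->top] = item;
    return 1;
}

/* Pop: return the topmost item and decrement TOP.
   The caller must check is_empty first (UNDERFLOW otherwise). */
int pop(struct stack *s)
{
    return s->items[s->top--];
}

/* Peek: return the topmost item without removing it. */
int peek(struct stack *s)
{
    return s->items[s->top];
}
```
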
Queue
Queue is an abstract data structure, somewhat similar to Stacks. Unlike stacks, a queue is open at both
its ends. One end is always used to insert data (enqueue) and the other is used to remove data
(dequeue). Queue follows First-In-First-Out methodology, i.e., the data item stored first will be
accessed first.
A queue can be defined as an ordered list which enables insert operations to be performed at
one end called REAR and delete operations to be performed at another end called FRONT.
For example, people waiting in line for a rail ticket form a queue.
Queue Representation
The following diagram explains queue representation as a data structure −
As in stacks, a queue can also be implemented using Arrays, Linked-lists, Pointers and
Structures. For the sake of simplicity, we shall implement queues using one-dimensional
array.
Basic Operations
Queue operations may involve initializing or defining the queue, utilizing it, and then
completely erasing it from the memory. Here we shall try to understand the basic operations
associated with queues −
enqueue() − add (store) an item to the queue.
dequeue() − remove (access) an item from the queue.
A few more functions are required to make the above-mentioned queue operations efficient.
These are −
peek() − Gets the element at the front of the queue without removing it.
isfull() − Checks if the queue is full.
isempty() − Checks if the queue is empty.
In a queue, we always dequeue (or access) the data pointed to by the front pointer, and while
enqueuing (or storing) data in the queue we take the help of the rear pointer.
Check if the queue is already full by comparing rear to MAX - 1. If so, return an overflow
error.
If the item is to be inserted as the first element in the list, in that case set the value of front
and rear to 0 and insert the element at the rear end.
Otherwise keep increasing the value of rear and insert each element one by one having rear as
the index.
Algorithm
Step 1: IF REAR = MAX - 1
Write OVERFLOW
Go to step
[END OF IF]
Step 2: IF FRONT = -1 and REAR = -1
SET FRONT = REAR = 0
ELSE
SET REAR = REAR + 1
[END OF IF]
Step 3: Set QUEUE[REAR] = NUM
Step 4: EXIT
If the value of front is -1 or the value of front is greater than rear, write an underflow message
and exit.
Otherwise, keep increasing the value of front and return the item stored at the front end of the
queue at each time.
Algorithm
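A C sketch of the enqueue steps (Steps 1 to 4 above) and the dequeue rule just described, assuming a fixed-size array queue; the struct and function names are illustrative:

```c
#define MAX 10

/* Linear queue: FRONT and REAR both start at -1 (empty queue). */
struct queue
{
    int items[MAX];
    int front, rear;
};

void q_init(struct queue *q) { q->front = q->rear = -1; }

/* Enqueue: fail when REAR = MAX - 1, otherwise advance REAR
   (setting FRONT = REAR = 0 for the very first element). */
int enqueue(struct queue *q, int num)
{
    if (q->rear == MAX - 1)
        return 0;                        /* OVERFLOW */
    if (q->front == -1 && q->rear == -1)
        q->front = q->rear = 0;
    else
        q->rear = q->rear + 1;
    q->items[q->rear] = num;
    return 1;
}

/* Dequeue: fail when front is -1 or greater than rear,
   otherwise return the item at FRONT and advance FRONT. */
int dequeue(struct queue *q, int *out)
{
    if (q->front == -1 || q->front > q->rear)
        return 0;                        /* UNDERFLOW */
    *out = q->items[q->front];
    q->front = q->front + 1;
    return 1;
}
```
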
Although the technique of creating a queue this way is easy, there are some drawbacks of using
this technique to implement a queue.
Memory wastage: The space of the array which is used to store queue elements can never
be reused to store further elements of that queue, because elements can only be inserted at
the rear end while the value of front might be so high that all the space before it can never
be filled.
Deciding the array size: One of the most common problems with the array implementation is
the size of the array, which must be declared in advance. Because the queue can grow at
runtime depending upon the problem, extending the array size is a time-consuming process
and almost impossible to perform at runtime, since a lot of reallocations would take place.
For this reason, we can declare the array large enough to store as many queue elements as
possible, but the main problem with this declaration is that most of the array slots (nearly
half) may never be reused. This again leads to memory wastage.
Due to the drawbacks discussed above, the array implementation cannot be used for
large-scale applications where queues are needed. One alternative to the array
implementation is the linked list implementation of a queue.
The storage requirement of the linked representation of a queue with n elements is O(n), while
the time requirement for each operation is O(1).
In a linked queue, each node of the queue consists of two parts i.e. data part and the link part.
Each element of the queue points to its immediate next element in the memory.
In the linked queue, there are two pointers maintained in the memory i.e. front pointer and
rear pointer. The front pointer contains the address of the starting element of the queue while
the rear pointer contains the address of the last element of the queue.
Insertion and deletions are performed at rear and front end respectively. If front and rear both
are NULL, it indicates that the queue is empty.
There are two basic operations which can be implemented on the linked queues. The
operations are Insertion and Deletion.
Insert operation
The insert operation appends to the queue by adding an element at the end of the queue. The
new element becomes the last element of the queue.
Algorithm:
Deletion
The deletion operation removes the element that was inserted first among all the queue
elements. Firstly, we need to check whether the list is empty. The condition front == NULL
becomes true if the list is empty; in this case, we simply write UNDERFLOW on the console
and exit.
Algorithm:
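A minimal C sketch of both linked-queue operations, assuming node and queue layouts as described above (names such as lq_insert and the return-code convention are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

/* Each node has a data part and a link part, as described above. */
struct qnode
{
    int data;
    struct qnode *next;
};

/* Front and rear pointers of the linked queue; both NULL when empty. */
struct lqueue
{
    struct qnode *front, *rear;
};

/* Insert at the rear: the new node becomes the last element. */
int lq_insert(struct lqueue *q, int val)
{
    struct qnode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;                  /* OVERFLOW */
    n->data = val;
    n->next = NULL;
    if (q->rear == NULL)           /* empty queue: single node */
        q->front = q->rear = n;
    else
    {
        q->rear->next = n;
        q->rear = n;
    }
    return 1;
}

/* Delete from the front: removes the earliest-inserted element. */
int lq_delete(struct lqueue *q, int *out)
{
    if (q->front == NULL)
        return 0;                  /* UNDERFLOW */
    struct qnode *tmp = q->front;
    *out = tmp->data;
    q->front = tmp->next;
    if (q->front == NULL)          /* queue became empty */
        q->rear = NULL;
    free(tmp);
    return 1;
}
```
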
TYPES OF QUEUES
A queue data structure can be classified into the following types:
1. Circular Queue
2. Deque
3. Priority Queue
4. Multiple Queue
Circular Queue:
In the circular queue, the first index comes right after the last index. Deletions and insertions
can only be performed at front and rear end respectively, as far as linear queue is concerned.
The circular queue will be full only when front = 0 and rear = Max – 1. A circular queue is
implemented in the same manner as a linear queue is implemented. The only difference will
be in the code that performs insertion and deletion operations.
For insertion, we have to check for the following three conditions:
If front = 0 and rear = MAX – 1, then the circular queue is full.
If rear != MAX – 1, then rear will be incremented and the value will be inserted.
If front != 0 and rear = MAX – 1, then it means that the queue is not full. So, set rear
= 0 and insert the new element there.
Algorithm to insert an element in circular queue:
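A C sketch of the three insertion conditions listed above. It follows the text's simplified full test (front = 0 and rear = MAX - 1) rather than the more general (rear + 1) mod MAX = front test, and the struct and function names are illustrative:

```c
#define MAX 5

/* Circular queue: FRONT and REAR are -1 when the queue is empty. */
struct cqueue
{
    int items[MAX];
    int front, rear;
};

/* Insert using the three conditions above:
   full when front = 0 and rear = MAX - 1;
   wrap rear to 0 when rear = MAX - 1 but front != 0. */
int cq_insert(struct cqueue *q, int val)
{
    if (q->front == 0 && q->rear == MAX - 1)
        return 0;                   /* queue is full */
    if (q->front == -1)             /* first element */
        q->front = q->rear = 0;
    else if (q->rear == MAX - 1)
        q->rear = 0;                /* wrap around to index 0 */
    else
        q->rear = q->rear + 1;
    q->items[q->rear] = val;
    return 1;
}
```
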
Deques
A deque (pronounced as ‘deck’ or ‘dequeue’) is a list in which the elements can be inserted or
deleted at either end. It is also known as a head-tail linked list because elements can be added
to or removed from either the front (head) or the back (tail) end.
There are two variants of a double-ended queue:
Input restricted deque: In this deque, insertions can be done only at one of the ends,
while deletions can be done from both ends.
Output restricted deque: In this deque, deletions can be done only at one of the ends,
while insertions can be done at both ends.
Priority Queues
A priority queue is a data structure in which each element is assigned a priority. The priority
of the element will be used to determine the order in which the elements will be processed. The
general rules of processing the elements of a priority queue are
An element with higher priority is processed before an element with a lower priority.
Two elements with the same priority are processed on a first-come-first-served (FCFS)
basis.
A priority queue can be thought of as a modified queue in which when an element has to be
removed from the queue, the one with the highest-priority is retrieved first. The priority of the
element can be set based on various factors. Priority queues are widely used in operating
systems to execute the highest priority process first. The priority of the process may be set
based on the CPU time it requires to get executed completely.
Sparse Matrix:
In computer programming, a matrix can be defined with a 2-dimensional array. Any array with
'm' rows and 'n' columns represents an m x n matrix. There may be a situation in which a matrix
contains more ZERO values than NON-ZERO values. Such a matrix is known as a
sparse matrix.
Why use a sparse matrix instead of a simple matrix?
Storage: There are fewer non-zero elements than zeros, and thus less memory is needed
to store only those elements.
Computing time: Computing time can be saved by logically designing a data structure
that traverses only the non-zero elements.
Sparse Matrix Representations
A sparse matrix can be represented by using TWO representations, those are as follows...
1. Triplet Representation (Array Representation)
2. Linked Representation
Triplet Representation (Array Representation)
In this representation, we consider only non-zero values along with their row and column index
values. In this representation, the 0th row stores total number of rows, total number of columns
and total number of non-zero values in the sparse matrix.
A 2D array is used to represent a sparse matrix in which there are three rows named as
Row, Column, and Value.
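The triplet layout described above can be sketched in C; the fixed 4 x 5 matrix size and the function name to_triplet are assumptions for illustration:

```c
#include <stdio.h>

#define ROWS 4
#define COLS 5

/* Build the triplet (3-column) form of a sparse matrix:
   row 0 holds [total rows, total columns, non-zero count];
   each later row holds [row index, column index, value].
   Returns the number of triplet rows written (non-zeros + 1). */
int to_triplet(int m[ROWS][COLS], int t[][3])
{
    int i, j, k = 1;                 /* k = next free triplet row */
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (m[i][j] != 0)
            {
                t[k][0] = i;         /* row index */
                t[k][1] = j;         /* column index */
                t[k][2] = m[i][j];   /* non-zero value */
                k++;
            }
    t[0][0] = ROWS;
    t[0][1] = COLS;
    t[0][2] = k - 1;                 /* number of non-zero values */
    return k;
}
```
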
Tree
Tree Data Structure:
o A Tree is a recursive data structure containing a set of one or more data nodes, where
one node is designated as the root of the tree while the remaining nodes are called the
children of the root.
o The nodes other than the root node are partitioned into non-empty sets, each of which
is called a sub-tree.
o Nodes of a tree either maintain a parent-child relationship between them or they are
sibling nodes.
o In a general tree, a node can have any number of child nodes but only a single
parent.
Basic terminology
o Root Node :- The root node is the topmost node in the tree hierarchy. In other words,
the root node is the one which doesn't have any parent.
o Sub Tree :- If the root node is not null, the tree T1, T2 and T3 is called sub-trees of the
root node.
o Leaf Node :- The node of tree, which doesn't have any child node, is called leaf node.
Leaf node is the bottom most node of the tree. There can be any number of leaf nodes
present in a general tree. Leaf nodes can also be called external nodes.
o Path :- The sequence of consecutive edges is called path. In the tree shown in the above
image, path to the node E is A→ B → E.
o Ancestor node :- An ancestor of a node is any predecessor node on the path from the
root to that node. The root node doesn't have any ancestors. In the tree shown in the
above image, the node F has the ancestors B and A.
o Degree :- The degree of a node is equal to the number of children it has. In the tree
shown in the above image, the degree of node B is 2. The degree of a leaf node is always
0, while in a complete binary tree the degree of each internal node is 2.
o Level Number :- Each node of the tree is assigned a level number in such a way that
each node is present at one level higher than its parent. Root node of the tree is always
present at level 0.
General Tree
A general tree stores the elements in a hierarchical order in which the top-level element is
always present at level 0 as the root element. All the nodes except the root node are present at
subsequent levels. The nodes which are present on the same level are called siblings, while
the nodes which are present on different levels exhibit a parent-child relationship among them.
A node may contain any number of sub-trees. A tree in which each node contains 3 sub-trees
is called a ternary tree.
Forests
A forest can be defined as a set of disjoint trees which can be obtained by deleting the root
node and the edges which connect the root node to the first-level nodes.
Binary Tree
A binary tree is a data structure in which each node can have at most 2 children. The node
present at the topmost level is called the root node. A node with 0 children is called a leaf node.
Binary Trees are used in the applications like expression evaluation and many more. We will
discuss binary tree in detail, later in this tutorial.
Binary search tree is an ordered binary tree. All the elements in the left sub-tree are less than
the root while elements present in the right sub-tree are greater than or equal to the root node
element. Binary search trees are used in most of the applications of computer science domain
like searching, sorting, etc.
Expression Tree
Expression trees are used to evaluate the simple arithmetic expressions. Expression tree is
basically a binary tree where internal nodes are represented by operators while the leaf nodes
are represented by operands. Expression trees are widely used to solve algebraic expressions
like (a+b)*(a-b). Consider the following example.
Binary Tree
Binary Tree is a special type of generic tree in which, each node can have at most two children.
A binary tree is generally partitioned into three disjoint subsets:
1. The root node
2. The left sub-tree, which is itself a binary tree
3. The right sub-tree, which is itself a binary tree
In a strictly binary tree, every non-leaf node contains non-empty left and right sub-trees. In
other words, the degree of every non-leaf node is always 2. A strictly binary tree with n leaves
has (2n - 1) nodes.
A binary tree is said to be a complete binary tree if all of its leaves are located at the same
level d. A complete binary tree contains exactly 2^l nodes at each level l between level 0 and
d. The total number of nodes in a complete binary tree with depth d is 2^(d+1) - 1, of which
2^d are leaf nodes and 2^d - 1 are non-leaf nodes.
1. Pre-order Traversal: Traverse the root first, then traverse the left sub-tree and the right
sub-tree respectively. This procedure is applied to each sub-tree of the tree recursively.
2. In-order Traversal: Traverse the left sub-tree first, then traverse the root and the right
sub-tree respectively. This procedure is applied to each sub-tree of the tree recursively.
3. Post-order Traversal: Traverse the left sub-tree, then traverse the right sub-tree and the
root respectively. This procedure is applied to each sub-tree of the tree recursively.
Pre-order traversal
Steps
o Visit the root node
o traverse the left sub-tree in pre-order
o traverse the right sub-tree in pre-order
Algorithm
o Step 1: Repeat Steps 2 to 4 while TREE != NULL
o Step 2: Write TREE -> DATA
o Step 3: PREORDER(TREE -> LEFT)
o Step 4: PREORDER(TREE -> RIGHT)
[END OF LOOP]
o Step 5: END
o Since the traversal scheme we are using is pre-order traversal, the first element to be
printed is 18.
o Traverse the left sub-tree recursively. The root node of the left sub-tree is 211; print it
and move to the left.
o The left is empty, therefore print the right child (90) and move to the right sub-tree of
the root.
o 20 is the root of the sub-tree, therefore print it and move to its left. Since the left sub-tree
is empty, move to the right and print the only element present there, i.e. 190.
o Therefore, the printing sequence will be 18, 211, 90, 20, 190.
In-order traversal
Steps
o Traverse the left sub-tree in in-order
o Visit the root
o Traverse the right sub-tree in in-order
Algorithm
o Step 1: Repeat Steps 2 to 4 while TREE != NULL
o Step 2: INORDER(TREE -> LEFT)
o Step 3: Write TREE -> DATA
o Step 4: INORDER(TREE -> RIGHT)
[END OF LOOP]
o Step 5: END
Example
o print the left most node of the left sub-tree i.e. 23.
o print the root of the left sub-tree i.e. 211.
o print the right child i.e. 89.
o print the root node of the tree i.e. 18.
o Then, move to the right sub-tree of the binary tree and print the left most node i.e. 10.
o print the root of the right sub-tree i.e. 20.
o print the right child i.e. 32.
o hence, the printing sequence will be 23, 211, 89, 18, 10, 20, 32.
Post-order traversal
Steps
o Traverse the left sub-tree in post-order
o Traverse the right sub-tree in post-order
o visit the root
Algorithm
o Step 1: Repeat Steps 2 to 4 while TREE != NULL
o Step 2: POSTORDER(TREE -> LEFT)
o Step 3: POSTORDER(TREE -> RIGHT)
o Step 4: Write TREE -> DATA
[END OF LOOP]
o Step 5: END
Example
o Print the left child of the left sub-tree of binary tree i.e. 23.
o print the right child of the left sub-tree of binary tree i.e. 89.
o print the root node of the left sub-tree i.e. 211.
o Now, before printing the root node, move to right sub-tree and print the left child i.e.
10.
o print 32 i.e. right child.
o Print the root node 20.
o Now, at the last, print the root of the tree i.e. 18.
The printing sequence will be 23, 89, 211, 10, 32, 20, 18.
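All three traversal schemes can be sketched in C. Collecting the visited values into an array, rather than printing them, is an illustrative choice so the orders are easy to compare; the node layout and function names are assumptions:

```c
#include <stdio.h>

struct tnode
{
    int data;
    struct tnode *left, *right;
};

/* Each routine appends visited values to out[] and advances *n.
   Pre-order:  root, left, right.
   In-order:   left, root, right.
   Post-order: left, right, root. */
void preorder(struct tnode *t, int out[], int *n)
{
    if (t == NULL) return;
    out[(*n)++] = t->data;          /* visit the root first */
    preorder(t->left, out, n);
    preorder(t->right, out, n);
}

void inorder(struct tnode *t, int out[], int *n)
{
    if (t == NULL) return;
    inorder(t->left, out, n);
    out[(*n)++] = t->data;          /* root between the sub-trees */
    inorder(t->right, out, n);
}

void postorder(struct tnode *t, int out[], int *n)
{
    if (t == NULL) return;
    postorder(t->left, out, n);
    postorder(t->right, out, n);
    out[(*n)++] = t->data;          /* root last */
}
```
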
In this representation, the binary tree is stored in memory in the form of a linked list, where
the nodes are stored at non-contiguous memory locations and linked together through the
parent-child relationship, like a tree. Every node contains three parts: a pointer to the left
node, the data element, and a pointer to the right node. Each binary tree has a root pointer
which points to the root node of the binary tree. In an empty binary tree, the root pointer
points to NULL.
In the above figure, a tree is seen as the collection of nodes where each node contains three
parts : left pointer, data element and right pointer. Left pointer stores the address of the left
child while the right pointer stores the address of the right child. The leaf node contains null in
its left and right pointers.
The following image shows how the memory will be allocated for the binary tree by
using linked representation. There is a special pointer maintained in the memory which points
to the root node of the tree. Every node in the tree contains the address of its left and right child.
Leaf node contains null in its left and right pointers.
2. Sequential Representation
This is the simplest memory allocation technique to store the tree elements but it is an
inefficient technique since it requires a lot of space to store the tree elements. A binary tree is
shown in the following figure along with its memory allocation.
In this representation, an array is used to store the tree elements. Size of the array will be equal
to the number of nodes present in the tree. The root node of the tree will be present at the
1st index of the array. If a node is stored at ith index then its left and right children will be stored
at 2i and 2i+1 location. If the 1st index of the array i.e. tree[1] is 0, it means that the tree is
empty.
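The index arithmetic of the sequential representation can be sketched directly, using the same 1-based indexing as the text (the function names are illustrative):

```c
/* 1-based array representation of a binary tree:
   the node at index i has its left child at 2i, its right child
   at 2i + 1, and its parent at i / 2 (integer division). */
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }
int parent(int i)      { return i / 2; }
```

For example, the root at index 1 has children at indices 2 and 3, and the node at index 3 has children at indices 6 and 7.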
Binary Search Tree
Binary Search tree can be defined as a class of binary trees, in which the nodes are arranged in
a specific order. This is also called ordered binary tree. In a binary search tree, the value of all
the nodes in the left sub-tree is less than the value of the root. Similarly, value of all the nodes
in the right sub-tree is greater than or equal to the value of the root. This rule will be recursively
applied to all the left and right sub-trees of the root.
Given the constraint applied to a BST, we can see that the root node 30 doesn't contain any
value greater than or equal to 30 in its left sub-tree, and it also doesn't contain any value less
than 30 in its right sub-tree.
The process of creating BST by using the given elements, is shown in the image below.
There are many operations which can be performed on a binary search tree.
Inserting a value in the correct position is similar to searching, because we try to maintain the
rule that the left subtree is less than the root and the right subtree is greater than the root. We
keep going to either the right or the left subtree depending on the value, and when we reach
a point where the left or right subtree is null, we put the new node there.
Algorithm:
struct node *insert(struct node *node, int data)
{
    if (node == NULL)
        return createNode(data);
    if (data < node->data)
        node->left = insert(node->left, data);
    else if (data > node->data)
        node->right = insert(node->right, data);
    return node;
}
1) Node to be deleted is a leaf: simply remove it from the tree.

        50                            50
       /  \         delete(20)       /  \
     30    70       --------->     30    70
    /  \   / \                       \   / \
  20   40 60  80                     40 60  80

2) Node to be deleted has only one child: copy the child to the node and delete the child.

        50                            50
       /  \         delete(30)       /  \
     30    70       --------->     40    70
       \   / \                           / \
       40 60  80                        60  80

3) Node to be deleted has two children: find the in-order successor of the node, copy the
contents of the in-order successor to the node, and delete the in-order successor. Note that
the in-order predecessor can also be used.

        50                            60
       /  \         delete(50)       /  \
     40    70       --------->     40    70
           / \                            \
          60  80                           80
The important thing to note is, inorder successor is needed only when right child is not empty.
In this particular case, inorder successor can be obtained by finding the minimum value in right
child of the node.
Algorithm:
Height (TREE)
Step 1: IF TREE = NULL
Return 0
ELSE
SET LeftHeight = Height(TREE ->LEFT)
SET RightHeight = Height(TREE-> RIGHT)
IF LeftHeight > RightHeight
Return LeftHeight + 1
ELSE
Return RightHeight + 1
[END OF IF]
[END OF IF]
Step 2: END
Algorithm:
totalNodes(TREE)
Step 1: IF TREE = NULL
Return 0
ELSE
Return totalNodes(TREE-> LEFT) + totalNodes(TREE ->RIGHT) + 1
[END OF IF]
Step 2: END
A. J. Perlis and C. Thornton proposed a new binary tree called the "Threaded Binary Tree",
which makes use of NULL pointers to improve the traversal process. In a threaded binary tree,
NULL pointers are replaced by references to other nodes in the tree. These extra references
are called threads.
A threaded binary tree is a binary tree in which every left child pointer that is NULL (in the
linked list representation) points to the node's in-order predecessor, and every right child
pointer that is NULL points to the node's in-order successor. If there is no in-order
predecessor or in-order successor, then it points to the root node.
Example:
(Figure: a binary tree with root A and nodes B, C, D, E, F, G, H, I, J at the lower levels.)
To convert the above example binary tree into threaded binary tree, first find the in-order
traversal of that tree.
When we represent the above binary tree using the linked list representation, the left child
pointers of nodes H, I, E, F, J and G are NULL. Each of these NULL pointers is replaced by the
address of the node's in-order predecessor (I to D, E to B, F to A, J to F and G to C); node H
does not have an in-order predecessor, so its pointer points to the root node A. Similarly, the
right child pointers of nodes H, I, E, J and G are NULL. These NULL pointers are replaced by
the address of each node's in-order successor (H to D, I to B, E to A and J to C); node G does
not have an in-order successor, so its pointer points to the root node A.
AVL Tree
The balance factor of a node is the difference between the heights of the left and right subtrees
of that node. The balance factor of a node is calculated as either
Balance Factor = height of left subtree - height of right subtree
(OR) height of right subtree - height of left subtree
Insertion in an AVL Tree
Step 1 - Insert the new element into the tree using Binary Search Tree insertion logic.
Step 2 - After insertion, check the Balance Factor of every node.
Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.
Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then that tree is said to be
imbalanced. In this case, perform suitable Rotation to make it balanced and go for next
operation.
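The balance-factor rule above can be checked with a short sketch; the Node class and function names are illustrative, not from the text.

```python
# A node is AVL-balanced when its balance factor is -1, 0 or 1,
# and both of its subtrees are balanced as well.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def balance_factor(n):
    # height of left subtree minus height of right subtree
    return height(n.left) - height(n.right)

def is_avl_balanced(n):
    if n is None:
        return True
    return (abs(balance_factor(n)) <= 1
            and is_avl_balanced(n.left)
            and is_avl_balanced(n.right))

balanced = Node(2, Node(1), Node(3))   # all balance factors are 0
skewed = Node(3, Node(2, Node(1)))     # root's balance factor is 2
```

A tree like `skewed` would trigger a rotation in Step 4 above.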
B Tree
B-Tree is a self-balancing search tree. In most of the other self-balancing search trees (like
AVL and Red-Black Trees), it is assumed that everything is in main memory. To understand
the use of B-Trees, we must think of the huge amount of data that cannot fit in main memory.
When the number of keys is high, the data is read from disk in the form of blocks. Disk access
time is very high compared to main memory access time. The main idea of using B-Trees is to
reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max,
min, etc.) require O(h) disk accesses, where h is the height of the tree. A B-tree is a fat tree. The
height of B-Trees is kept low by putting maximum possible keys in a B-Tree node. Generally,
a B-Tree node size is kept equal to the disk block size. Since h is low for B-Tree, total disk
accesses for most of the operations are reduced significantly compared to balanced Binary
Search Trees like AVL Tree, Red-Black Tree, etc.
Properties of B-Tree
1) All leaves are at the same level.
2) A B-Tree is defined by the term minimum degree 't'. The value of t depends upon the disk
block size.
3) Every node except the root must contain at least t-1 keys. The root may contain a minimum
of 1 key.
4) All nodes (including the root) may contain at most 2t - 1 keys.
5) The number of children of a node is equal to the number of keys in it plus 1.
6) All keys of a node are sorted in increasing order. The child between two keys k1 and k2
contains all keys in the range between k1 and k2.
7) A B-Tree grows and shrinks from the root, which is unlike a Binary Search Tree. Binary
Search Trees grow and shrink downward.
8) Like other balanced Binary Search Trees, the time complexity to search, insert and delete is
O(log n).
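As a small illustration of properties 5 and 6, here is a hedged sketch of searching a B-Tree; the node layout is assumed for illustration, not taken from the text.

```python
# Each node holds a sorted key list; child i sits between keys i-1 and i,
# so the search descends into the child whose range covers the key.
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted list of keys
        self.children = children or []  # len(children) == len(keys) + 1 unless leaf

def btree_search(node, key):
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:               # reached a leaf: key is absent
        return False
    return btree_search(node.children[i], key)

# A small two-level example tree.
root = BTreeNode([10, 20],
                 [BTreeNode([5]), BTreeNode([14, 15]), BTreeNode([25, 30])])
```

On disk, each `btree_search` call would correspond to one block read, which is why keeping the tree shallow matters.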
Deletion process:
Deletion from a B-tree is more complicated than insertion, because we can delete a key from
any node, not just a leaf, and when we delete a key from an internal node, we have to
rearrange the node's children.
We sketch how deletion works with various cases of deleting keys from a B-tree.
1. If the key k is in node x and x is a leaf, delete the key k from x.
2. If the key k is in node x and x is an internal node, do the following:
a) If the child y that precedes k in node x has at least t keys, then find the predecessor k0 of
k in the subtree rooted at y. Recursively delete k0, and replace k by k0 in x. (We can find k0
and delete it in a single downward pass.)
b) If y has fewer than t keys, then, symmetrically, examine the child z that follows k in node
x. If z has at least t keys, then find the successor k0 of k in the subtree rooted at z. Recursively
delete k0, and replace k by k0 in x. (We can find k0 and delete it in a single downward pass.)
c) Otherwise, if both y and z have only t-1 keys, merge k and all of z into y, so that x loses
both k and the pointer to z, and y now contains 2t-1 keys. Then free z and recursively delete
k from y.
3. If the key k is not present in internal node x, determine the root x.c(i) of the appropriate
subtree that must contain k, if k is in the tree at all. If x.c(i) has only t-1 keys, execute step 3a
or 3b as necessary to guarantee that we descend to a node containing at least t keys. Then finish
by recursing on the appropriate child of x.
a) If x.c(i) has only t-1 keys but has an immediate sibling with at least t keys, give x.c(i) an
extra key by moving a key from x down into x.c(i), moving a key from x.c(i) ’s immediate
left or right sibling up into x, and moving the appropriate child pointer from the sibling into
x.c(i).
b) If x.c(i) and both of x.c(i)’s immediate siblings have t-1 keys, merge x.c(i) with one
sibling, which involves moving a key from x down into the new merged node to become the
median key for that node.
Graph
A Graph is a non-linear data structure consisting of nodes and edges. A graph is a pictorial
representation of a set of objects where some pairs of objects are connected by links. The
interconnected objects are represented by points termed as vertices, and the links that connect
the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices.
Graph Terminology
Adjacency: A vertex is said to be adjacent to another vertex if there is an edge connecting them.
Vertices 2 and 3 are not adjacent because there is no edge between them.
Path: A sequence of edges that allows you to go from vertex A to vertex B is called a path. 0-
1, 1-2 and 0-2 are paths from vertex 0 to vertex 2.
Directed Graph: A graph in which an edge (u,v) doesn't necessarily mean that there is an edge
(v, u) as well. The edges in such a graph are represented by arrows to show the direction of the
edge.
Graph Representation
Graphs are commonly represented in two ways:
1. Adjacency Matrix
An adjacency matrix is a 2D array of size V x V, where V is the number of vertices. Each row
and column represents a vertex. If the value of an element a[i][j] is 1, it represents that there is
an edge connecting vertex i and vertex j.
Since it is an undirected graph, for edge (0,2), we also need to mark edge (2,0); making the
adjacency matrix symmetric about the diagonal.
Edge lookup(checking if an edge exists between vertex A and vertex B) is extremely fast in
adjacency matrix representation but we have to reserve space for every possible link between
all vertices(V x V), so it requires more space.
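A sketch of the matrix representation, assuming for illustration an undirected graph on 3 vertices with edges (0,1), (0,2) and (1,2).

```python
# Build a V x V adjacency matrix; for an undirected graph, marking (u,v)
# also marks (v,u), keeping the matrix symmetric about the diagonal.
V = 3
matrix = [[0] * V for _ in range(V)]

def add_edge(u, v):
    matrix[u][v] = 1
    matrix[v][u] = 1

for u, v in [(0, 1), (0, 2), (1, 2)]:
    add_edge(u, v)

def has_edge(u, v):
    # O(1) edge lookup, at the cost of O(V*V) space
    return matrix[u][v] == 1
```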
2. Adjacency List
An adjacency list represents a graph as an array of linked lists. The index of the array represents
a vertex, and each element in its linked list represents the other vertices that form an edge with
that vertex.
The adjacency list for the graph we made in the first example is as follows:
An adjacency list is efficient in terms of storage because we only need to store the values for
the edges. For a graph with millions of vertices, this can mean a lot of saved space.
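A sketch of the list representation for the same illustrative triangle graph (edges (0,1), (0,2), (1,2)), with Python lists standing in for the linked lists.

```python
# Only the edges are stored: adj[u] holds the neighbours of vertex u.
V = 3
adj = [[] for _ in range(V)]

def add_edge(u, v):
    adj[u].append(v)
    adj[v].append(u)   # undirected: record the edge at both endpoints

for u, v in [(0, 1), (0, 2), (1, 2)]:
    add_edge(u, v)
```

Space is proportional to V + E rather than V * V, which is the saving described above.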
Graph Operations
The most common graph operations are traversing the graph, checking whether an element is
present in it, and adding or deleting vertices and edges.
There are two graph traversal techniques:
1. Depth First Search (DFS)
2. Breadth First Search (BFS)
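The two standard traversal techniques, Breadth First Search (BFS) and Depth First Search (DFS), can be sketched as follows; the small example graph is assumed for illustration.

```python
from collections import deque

def bfs(adj, start):
    # Visit vertices level by level using a FIFO queue.
    visited, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order

def dfs(adj, u, visited=None):
    # Visit vertices by going as deep as possible before backtracking.
    if visited is None:
        visited = set()
    visited.add(u)
    order = [u]
    for v in adj[u]:
        if v not in visited:
            order += dfs(adj, v, visited)
    return order

# Assumed example graph: 0-1, 0-2, 1-3.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
```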
Dijkstra's Shortest Path Algorithm
Algorithm Steps:
1. Set all vertices' distances = infinity, except for the source vertex; set the source distance
= 0.
2. Push the source vertex into a min-priority queue in the form (distance, vertex), so that
the comparison in the min-priority queue is according to the vertices' distances.
3. Pop the vertex with the minimum distance from the priority queue (at first the popped
vertex = source).
4. Update the distances of the vertices connected to the popped vertex: if "current
vertex distance + edge weight < next vertex distance", push that vertex with the new
distance into the priority queue.
5. If the popped vertex has been visited before, just continue without using it.
6. Apply the same steps again until the priority queue is empty.
The set sptSet is initially empty and distances assigned to vertices are {0, INF, INF, INF, INF,
INF, INF, INF} where INF indicates infinite. Now pick the vertex with minimum distance
value. The vertex 0 is picked, include it in sptSet. So sptSet becomes {0}. After including 0 to
sptSet, update distance values of its adjacent vertices. Adjacent vertices of 0 are 1 and 7. The
distance values of 1 and 7 are updated as 4 and 8. Following subgraph shows vertices and their
distance values, only the vertices with finite distance values are shown. The vertices included
in SPT are shown in green colour.
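The steps above can be sketched with Python's heapq as the min-priority queue; the small weighted graph below is assumed for illustration.

```python
import heapq

def dijkstra(adj, source):
    dist = {u: float('inf') for u in adj}
    dist[source] = 0
    pq = [(0, source)]                 # (distance, vertex) pairs
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:               # popped before: skip it
            continue
        visited.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:        # relax the edge u-v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# Assumed example: edges 0-1 (4), 0-7 (8), 1-2 (8), 1-7 (11).
adj = {0: [(1, 4), (7, 8)],
       1: [(0, 4), (2, 8), (7, 11)],
       2: [(1, 8)],
       7: [(0, 8), (1, 11)]}
dist = dijkstra(adj, 0)
```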
A spanning tree is a subset of Graph G which has all the vertices covered with the minimum
possible number of edges. Hence, a spanning tree does not have cycles and cannot be
disconnected.
By this definition, we can draw a conclusion that every connected and undirected Graph G has
at least one spanning tree. A disconnected graph does not have any spanning tree, as it cannot
be spanned to all its vertices.
A minimum spanning tree of a weighted graph can be found with either of two greedy algorithms:
1. Kruskal's Algorithm
2. Prim's Algorithm
Kruskal’s Algorithm:
Kruskal's algorithm finds the minimum cost spanning tree using the greedy approach. This
algorithm treats the graph as a forest and every node it has as an individual tree. A tree connects
to another if and only if it has the least cost among all available options and does not violate
the MST properties.
In this method, the emphasis is on the choice of edges of minimum weight from amongst all
the available edges, of course, subject to the condition that chosen edges do not form a cycle.
The connectivity of the chosen edges, at any stage, in the form of a subtree, which was
emphasized in Prim’s algorithm, is not essential.
Kruskal's algorithm finds a minimal spanning tree of a given weighted and connected graph
as follows:
(i) First of all, order all the edges by weight in increasing order. Then repeat
the following two steps till the set of selected edges contains all the vertices
of the given graph.
(ii) Choose the edge whose weight is the minimum among the weights of the
edges not considered so far.
(iii) If the new edge forms a cycle with any subset of the earlier selected edges,
then drop it; else, add the edge to the set of selected edges.
Example:
The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will be
having (9 – 1) = 8 edges.
After sorting:
Weight Src Dest
1 7 6
2 8 2
2 6 5
4 0 1
4 2 5
6 8 6
7 2 3
7 7 8
8 0 7
8 1 2
9 3 4
10 5 4
11 1 7
14 3 5
Now pick all edges one by one from the sorted list of edges.
1. Pick edge 7-6: No cycle is formed, include it.
2. Pick edge 8-2: No cycle is formed, include it.
3. Pick edge 6-5: No cycle is formed, include it.
4. Pick edge 0-1: No cycle is formed, include it.
5. Pick edge 2-5: No cycle is formed, include it.
6. Pick edge 8-6: Since including this edge results in cycle, discard it.
7. Pick edge 2-3: No cycle is formed, include it.
8. Pick edge 7-8: Since including this edge results in cycle, discard it.
9. Pick edge 0-7: No cycle is formed, include it.
10. Pick edge 1-2: Since including this edge results in cycle, discard it.
11. Pick edge 3-4: No cycle is formed, include it.
Since the number of edges included equals (V – 1), the algorithm stops here.
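The procedure can be sketched in Python on the sorted edge list from the table above, with a simple union-find to detect cycles (the union-find helpers are mine, not from the text).

```python
# Edges of the example graph as (weight, src, dest) tuples.
edges = [(1, 7, 6), (2, 8, 2), (2, 6, 5), (4, 0, 1), (4, 2, 5),
         (6, 8, 6), (7, 2, 3), (7, 7, 8), (8, 0, 7), (8, 1, 2),
         (9, 3, 4), (10, 5, 4), (11, 1, 7), (14, 3, 5)]

parent = list(range(9))          # each vertex starts as its own tree

def find(x):
    # Root of x's tree in the forest.
    while parent[x] != x:
        x = parent[x]
    return x

mst, total = [], 0
for w, u, v in sorted(edges):
    ru, rv = find(u), find(v)
    if ru != rv:                 # different trees: no cycle, include the edge
        parent[ru] = rv
        mst.append((u, v))
        total += w
    if len(mst) == 8:            # V - 1 edges selected: done
        break
```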
Prim's Algorithm:
Like Kruskal’s algorithm, Prim’s algorithm is also a Greedy algorithm. It starts with an empty
spanning tree. The idea is to maintain two sets of vertices. The first set contains the vertices
already included in the MST, the other set contains the vertices not yet included. At every step,
it considers all the edges that connect the two sets, and picks the minimum weight edge from
these edges. After picking the edge, it moves the other endpoint of the edge to the set containing
MST.
A group of edges that connects two sets of vertices in a graph is called a cut in graph theory. So,
at every step of Prim's algorithm, we find a cut (of two sets, one containing the vertices already
included in the MST and the other containing the rest of the vertices), pick the minimum weight
edge from the cut, and include its endpoint in the MST set (the set of already included vertices).
The idea behind Prim’s algorithm is simple, a spanning tree means all vertices must be
connected. So the two disjoint subsets (discussed above) of vertices must be connected to make
a Spanning Tree. And they must be connected with the minimum weight edge to make it a
Minimum Spanning Tree.
Algorithm
1) Create a set mstSet that keeps track of vertices already included in MST.
2) Assign a key value to all vertices in the input graph. Initialize all key values as INFINITE.
Assign key value as 0 for the first vertex so that it is picked first.
3) While mstSet doesn’t include all vertices
a) Pick a vertex u which is not there in mstSet and has minimum key value.
b) Include u to mstSet.
c) Update key value of all adjacent vertices of u. To update the key values, iterate
through all adjacent vertices. For every adjacent vertex v, if weight of edge u-v is less
than the previous key value of v, update the key value as weight of u-v
Let us understand with the following example:
The set mstSet is initially empty and keys assigned to vertices are {0, INF, INF, INF, INF,
INF, INF, INF} where INF indicates infinite. Now pick the vertex with minimum key value.
The vertex 0 is picked, include it in mstSet. So mstSet becomes {0}. After including 0
to mstSet, update key values of adjacent vertices. Adjacent vertices of 0 are 1 and 7. The key
values of 1 and 7 are updated as 4 and 8. Following subgraph shows vertices and their key
values, only the vertices with finite key values are shown. The vertices included in MST are
shown in green color.
Pick the vertex with minimum key value and not already included in MST (not in mstSET).
The vertex 1 is picked and added to mstSet. So mstSet now becomes {0, 1}. Update the key
values of adjacent vertices of 1. The key value of vertex 2 becomes 8.
Pick the vertex with minimum key value and not already included in MST (not in mstSET).
We can either pick vertex 7 or vertex 2; let vertex 7 be picked. So mstSet now becomes {0, 1,
7}. Update the key values of adjacent vertices of 7. The key values of vertices 6 and 8 become
1 and 7 respectively.
Pick the vertex with minimum key value and not already included in MST (not in mstSET).
Vertex 6 is picked. So mstSet now becomes {0, 1, 7, 6}. Update the key values of adjacent
vertices of 6. The key value of vertex 5 and 8 are updated.
We repeat the above steps until mstSet includes all vertices of given graph. Finally, we get the
following graph.
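A hedged sketch of Prim's algorithm using heapq for the minimum-key selection; the adjacency list below encodes the same example graph as the Kruskal edge table.

```python
import heapq

# adj[u] lists (neighbour, edge weight) pairs.
adj = {0: [(1, 4), (7, 8)],
       1: [(0, 4), (2, 8), (7, 11)],
       2: [(1, 8), (3, 7), (5, 4), (8, 2)],
       3: [(2, 7), (4, 9), (5, 14)],
       4: [(3, 9), (5, 10)],
       5: [(2, 4), (3, 14), (4, 10), (6, 2)],
       6: [(5, 2), (7, 1), (8, 6)],
       7: [(0, 8), (1, 11), (6, 1), (8, 7)],
       8: [(2, 2), (6, 6), (7, 7)]}

mst_set, total = set(), 0
pq = [(0, 0)]                        # (key, vertex); the source's key is 0
while pq:
    key, u = heapq.heappop(pq)
    if u in mst_set:                 # already in the MST: skip
        continue
    mst_set.add(u)
    total += key
    for v, w in adj[u]:
        if v not in mst_set:
            heapq.heappush(pq, (w, v))
```

This "lazy" variant pushes every crossing edge and discards stale entries on pop, rather than decreasing keys in place; the resulting MST weight is the same.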
Sorting
Sorting refers to arranging data in a particular format. Sorting algorithm specifies the way to
arrange data in a particular order. Most common orders are in numerical or lexicographical
order.
The importance of sorting lies in the fact that data searching can be optimized to a very high
level, if data is stored in a sorted manner. Sorting is also used to represent data in more
readable formats.
In-place Sorting and Not-in-place Sorting
Sorting algorithms may require some extra space for comparison and temporary storage of a
few data elements. Algorithms that do not require any extra space beyond this are said to sort
in-place, that is, within the array itself. This is called in-place sorting. Bubble sort is an
example of in-place sorting.
However, in some sorting algorithms, the program requires space which is more than or equal
to the space occupied by the elements being sorted. Sorting which uses equal or more space is
called not-in-place sorting. Merge sort is an example of not-in-place sorting.
Stable and Not Stable Sorting
If a sorting algorithm, after sorting the contents, does not change the relative order of equal
elements, it is called stable sorting.
If a sorting algorithm, after sorting the contents, changes the relative order of equal elements,
it is called unstable sorting.
Stability of an algorithm matters when we wish to maintain the original order of equal
elements, for example when sorting tuples by one field.
Selection Sort
In selection sort, the smallest value among the unsorted elements of the array is selected in
every pass and moved to its appropriate position in the array.
First, find the smallest element of the array and place it on the first position. Then, find the
second smallest element of the array and place it on the second position. The process
continues until we get the sorted array.
An array with n elements is sorted by using n-1 passes of the selection sort algorithm.
o In the 1st pass, the smallest element of the array is found along with its index pos. Then,
swap A[0] and A[pos]. Thus A[0] is sorted, and we are left with n-1 elements to be
sorted.
o In the 2nd pass, the position pos of the smallest element in the sub-array A[1..n-1] is
found. Then, swap A[1] and A[pos]. Thus A[0] and A[1] are sorted, and we are left with
n-2 unsorted elements.
o In the (n-1)th pass, the position pos of the smaller element between A[n-2] and A[n-1]
is found. Then, swap A[n-2] and A[pos].
Therefore, by following the above explained process, the elements A[0], A[1], A[2],...., A[n-
1] are sorted.
Example
Consider the following array with 6 elements, A = {10, 2, 3, 90, 43, 56}. Sort the elements of
the array by using selection sort.
Pass  POS   A[0]  A[1]  A[2]  A[3]  A[4]  A[5]
  1     1     2    10     3    90    43    56
  2     2     2     3    10    90    43    56
  3     2     2     3    10    90    43    56
  4     4     2     3    10    43    90    56
  5     5     2     3    10    43    56    90
Algorithm
SELECTION SORT(ARR, N)
o Step 1: Repeat Steps 2 and 3 for K = 1 to N-1
o Step 2: CALL SMALLEST(ARR, K, N, POS)
o Step 3: SWAP ARR[K] with ARR[POS]
[END OF LOOP]
o Step 4: EXIT
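The passes described above can be sketched in Python on the example array; `smallest` from the pseudocode is inlined as the inner loop.

```python
def selection_sort(arr):
    n = len(arr)
    for k in range(n - 1):
        pos = k                        # index of the smallest in arr[k..n-1]
        for i in range(k + 1, n):
            if arr[i] < arr[pos]:
                pos = i
        arr[k], arr[pos] = arr[pos], arr[k]   # move it into place
    return arr
```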
Bubble Sort
In Bubble sort, each element of the array is compared with its adjacent element. The
algorithm processes the list in passes. A list with n elements requires n-1 passes for sorting.
Consider an array A of n elements whose elements are to be sorted by using Bubble sort. The
algorithm processes like following.
1. In Pass 1, A[0] is compared with A[1], A[1] is compared with A[2], A[2] is compared
with A[3] and so on. At the end of pass 1, the largest element of the list is placed at
the highest index of the list.
2. In Pass 2, A[0] is compared with A[1], A[1] is compared with A[2] and so on. At the
end of Pass 2 the second largest element of the list is placed at the second highest
index of the list.
3. In pass n-1, A[0] is compared with A[1], A[1] is compared with A[2] and so on. At
the end of this pass, the smallest element of the list is placed at the first index of the
list.
Algorithm :
o Step 1: Repeat Step 2 For I = 0 to N-2
o Step 2: Repeat For J = 0 to N-I-2
o Step 3: IF A[J] > A[J+1]
SWAP A[J] and A[J+1]
[END OF INNER LOOP]
[END OF OUTER LOOP]
o Step 4: EXIT
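The adjacent-compare-and-swap passes can be sketched as follows, with an early exit when a pass makes no swap (the point made at the end of the example below).

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):                 # pass i
        swapped = False
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:        # compare adjacent elements
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                    # no swap: already sorted
            break
    return arr
```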
Example:
Bubble sort starts with the very first two elements, comparing them to check which one is
greater. In this case, value 33 is greater than 14, so they are already in sorted order. Next, we
compare 33 with 27.
We find that 27 is smaller than 33 and these two values must be swapped.
Next we compare 33 and 35. We find that both are in already sorted positions.
We then compare 35 and 10, and find that 10 is smaller than 35. Hence they are not sorted.
We swap these values. We find that we have reached the end of the array. After one iteration,
the array should look like this −
To be precise, we are now showing how an array should look like after each iteration. After
the second iteration, it should look like this −
Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort knows that the array is completely sorted.
Insertion Sort
This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which
is always sorted. For example, the lower part of an array is maintained to be sorted. An element
which is to be 'insert'ed in this sorted sub-list, has to find its appropriate place and then it has
to be inserted there. Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and inserted into the sorted
sub-list (in the same array). This algorithm is not suitable for large data sets as its average and
worst case complexity are of Ο(n2), where n is the number of items.
Example:
It finds that both 14 and 33 are already in ascending order. For now, 14 is in sorted sub-list.
It swaps 33 with 27. It also checks with all the elements of sorted sub-list. Here we see that
the sorted sub-list has only one element 14, and 27 is greater than 14. Hence, the sorted sub-
list remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.
So we swap them.
We swap them again. By the end of third iteration, we have a sorted sub-list of 4 items.
This process goes on until all the unsorted values are covered in a sorted sub-list.
Algorithm
o Step 1: Repeat Steps 2 to 5 for K = 1 to N-1
o Step 2: SET TEMP = ARR[K]
o Step 3: SET J = K - 1
o Step 4: Repeat while J >= 0 AND TEMP < ARR[J]
SET ARR[J + 1] = ARR[J]
SET J = J - 1
[END OF INNER LOOP]
o Step 5: SET ARR[J + 1] = TEMP
[END OF LOOP]
o Step 6: EXIT
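A Python sketch of the algorithm above: each element is lifted out, the larger sorted elements are shifted right, and the element is dropped into its place.

```python
def insertion_sort(arr):
    for k in range(1, len(arr)):
        temp = arr[k]                       # element to insert
        j = k - 1
        while j >= 0 and arr[j] > temp:
            arr[j + 1] = arr[j]             # shift larger elements right
            j -= 1
        arr[j + 1] = temp
    return arr
```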
Merge Sort
Merge sort is a sorting technique based on divide and conquer technique. With worst-case
time complexity being Ο(n log n), it is one of the most respected algorithms.
Merge sort first divides the array into equal halves and then combines them in a sorted manner.
Example −
Algorithm (merging the sorted sub-arrays ARR[BEG..MID] and ARR[MID+1..END]):
Step 1: [INITIALIZE] SET I = BEG, J = MID + 1, INDEX = 0
Step 2: Repeat while (I <= MID) AND (J<=END)
IF ARR[I] < ARR[J]
SET TEMP[INDEX] = ARR[I]
SET I = I + 1
ELSE
SET TEMP[INDEX] = ARR[J]
SET J = J + 1
[END OF IF]
SET INDEX = INDEX + 1
[END OF LOOP]
Step 3: [Copy the remaining
elements of right sub-array, if
any]
IF I > MID
Repeat while J <= END
SET TEMP[INDEX] = ARR[J]
SET INDEX = INDEX + 1, SET J = J + 1
[END OF LOOP]
[Copy the remaining elements of
left sub-array, if any]
ELSE
Repeat while I <= MID
SET TEMP[INDEX] = ARR[I]
SET INDEX = INDEX + 1, SET I = I + 1
[END OF LOOP]
[END OF IF]
Step 4: [Copy the contents of TEMP back to ARR] SET K = 0
Step 5: Repeat while K < INDEX
SET ARR[BEG + K] = TEMP[K]
SET K = K + 1
[END OF LOOP]
Step 6: Exit
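A sketch of merge sort following the divide-and-merge steps above; for brevity this version returns a new list instead of merging through a TEMP array in place.

```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])           # divide into halves and recurse
    right = merge_sort(arr[mid:])
    temp = []                              # merge the two sorted halves
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            temp.append(left[i]); i += 1
        else:
            temp.append(right[j]); j += 1
    # copy the remaining elements of whichever half is not exhausted
    return temp + left[i:] + right[j:]
```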
Shell Sort
Shell sort is the generalization of insertion sort which overcomes the drawbacks of insertion sort by
comparing elements separated by a gap of several positions. In general, Shell sort performs the
following steps.
o Step 1: Arrange the elements in the tabular form and sort the columns by using
insertion sort.
o Step 2: Repeat Step 1; each time with smaller number of longer columns in such a
way that at the end, there is only one column of data to be sorted.
Algorithm
Shell_Sort(Arr, N)
o Step 1: SET FLAG = 1, GAP_SIZE = N
o Step 2: Repeat Steps 3 to 6 while FLAG = 1 OR GAP_SIZE > 1
o Step 3: SET FLAG = 0
o Step 4: SET GAP_SIZE = (GAP_SIZE + 1) / 2
o Step 5: Repeat Step 6 for I = 0 to I < (N - GAP_SIZE)
o Step 6: IF Arr[I] > Arr[I + GAP_SIZE]
SWAP Arr[I], Arr[I + GAP_SIZE]
SET FLAG = 1
o Step 7: END
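A sketch of shell sort; for simplicity this version halves the gap each round and runs a gap-insertion sort, a common variant rather than the swap-flag scheme in the box above.

```python
def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # insertion sort over elements that are gap positions apart
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
        gap //= 2                      # shrink the gap each round
    return arr
```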
Quick Sort
Quick sort is a widely used sorting algorithm that makes n log n comparisons in the average
case for sorting an array of n elements. This algorithm follows the divide and conquer
approach. The algorithm processes the array in the following way.
1. Set the first index of the array to the left and loc variables, and the last index of the
array to the right variable, i.e. left = 0, loc = 0, right = n - 1, where n is the length of the array.
2. Start from the right of the array and scan the complete array from right to beginning
comparing each element of the array with the element pointed by loc.
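A hedged sketch of the partitioning approach described above: the first element is the pivot, and scanning alternates from the right and left ends until the pivot's final location (loc) is found; each half is then sorted recursively.

```python
def quick_sort(arr, low=0, high=None):
    if high is None:
        high = len(arr) - 1
    if low < high:
        pivot = arr[low]                   # first element as the pivot
        i, j = low, high
        while i < j:
            while i < j and arr[j] >= pivot:
                j -= 1                     # scan from the right for a smaller value
            arr[i] = arr[j]
            while i < j and arr[i] <= pivot:
                i += 1                     # scan from the left for a larger value
            arr[j] = arr[i]
        arr[i] = pivot                     # loc: the pivot's final position
        quick_sort(arr, low, i - 1)        # sort the left partition
        quick_sort(arr, i + 1, high)       # sort the right partition
    return arr
```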
Heap Sort
Heap sort processes the elements by creating the min heap or max heap using the elements of
the given array. Min heap or max heap represents the ordering of the array in which root
element represents the minimum or maximum element of the array. At each step, the root
element of the heap gets deleted and stored into the sorted array and the heap will again be
heapified.
Heap sort is a comparison-based sorting technique based on the Binary Heap data structure. It
is similar to selection sort, where we first find the maximum element and place the maximum
element at the end. We repeat the same process for the remaining elements.
The heap sort basically performs two main operations:
o Build a heap H using the elements of the array.
o Repeatedly delete the root element of the heap and heapify the remaining elements.
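A sketch of heap sort using a max heap, as described above: build the heap, then repeatedly move the root (the maximum) to the end and re-heapify the prefix.

```python
def heapify(arr, n, i):
    # Sift arr[i] down so the subtree rooted at i is a max heap.
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):    # build the max heap
        heapify(arr, n, i)
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]    # move the maximum to the end
        heapify(arr, i, 0)                 # re-heapify the remaining prefix
    return arr
```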
Searching
Searching means to find whether a particular value is present in an array or not. If the value is
present in the array, then searching is said to be successful and the searching process gives the
location of that value in the array. However, if the value is not present in the array, the searching
process displays an appropriate message and in this case searching is said to be unsuccessful.
Searching Techniques
To search an element in a given array, it can be done in following ways:
1. Sequential Search
2. Binary Search
3. Interpolation Search
Sequential Search:
Sequential search is also called as Linear Search. Sequential search starts at the beginning of
the list and checks every element of the list. It is a basic and simple search algorithm. Sequential
search compares the element with all the other elements given in the list. If the element is
matched, it returns the value index, else it returns -1.
The worst case occurs when the value is not present in the array or is equal to the last element
of the array. In both cases, n comparisons will have to be made. The search can be improved
by using a sorted array.
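A sketch of sequential (linear) search, returning the index of the matched element or -1 on an unsuccessful search:

```python
def linear_search(arr, value):
    # Check every element from the beginning of the list.
    for index, element in enumerate(arr):
        if element == value:
            return index      # successful search
    return -1                 # unsuccessful search

data = [10, 20, 30, 40]
```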
Binary Search:
Binary Search is used for searching an element in a sorted array. It is a fast search algorithm
with run-time complexity of O(log n). Binary search works on the principle of divide and
conquer. This searching technique looks for a particular element by comparing the middle most
element of the collection. It is useful when there are large number of elements in an array.
Example:
We can say that, in order to locate a particular value in the array, the total number of
comparisons f(n) that will be made is given as
2^f(n) > n, i.e. f(n) = log2 n
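The halving of the search space can be sketched as follows, on a sorted array:

```python
def binary_search(arr, value):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2        # middle-most element
        if arr[mid] == value:
            return mid
        if arr[mid] < value:
            low = mid + 1              # discard the left half
        else:
            high = mid - 1             # discard the right half
    return -1

data = [2, 5, 8, 12, 16, 23, 38]
```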
Interpolation Search:
Interpolation search, also known as extrapolation search, is a searching technique that finds a
specified value in a sorted array. The concept of interpolation search is similar to how we
search for names in a telephone book or for keys by which a book’s entries are ordered.
For example, when looking for a name “Bharat” in a telephone directory, we know that it will
be near the extreme left, so applying a binary search technique by dividing the list in two halves
each time is not a good idea. We must start scanning the extreme left in the first pass itself.
In each step of interpolation search, the remaining search space for the value to be found is
calculated. The calculation is done based on the values at the bounds of the search space and
the value to be searched. The value found at this estimated position is then compared with the
value being searched for. If the two values are equal, then the search is complete.
However, in case the values are not equal then depending on the comparison, the remaining
search space is reduced to the part before or after the estimated position. Thus, we see that
interpolation search is similar to the binary search technique. However, the important
difference between the two techniques is that binary search always selects the middle value of
the remaining search space. It discards half of the values based on the comparison between the
value found at the estimated position and the value to be searched. But in interpolation search,
interpolation is used to find an item near the one being searched for, and then linear search is
used to find the exact item.
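The position estimate described above can be sketched as follows; the linear-interpolation formula is the standard one, assumed here since the algorithm box itself is not reproduced at this point.

```python
def interpolation_search(arr, value):
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= value <= arr[hi]:
        if arr[lo] == arr[hi]:                 # avoid division by zero
            return lo if arr[lo] == value else -1
        # Estimate the position from the values at the bounds of the
        # remaining search space.
        pos = lo + (value - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == value:
            return pos
        if arr[pos] < value:
            lo = pos + 1                       # search the part after pos
        else:
            hi = pos - 1                       # search the part before pos
    return -1

data = [10, 12, 13, 16, 18, 19, 20, 21, 22, 23, 24, 33, 35, 42, 47]
```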
Algorithm of Interpolation Search: