Unit 1

DSU notes

Uploaded by

ketankotane70
Copyright
© © All Rights Reserved

Unit 1

Introduction to Data Structures


Concepts and need of data structures
Introduction

A data structure can be defined as a group of data elements together with an efficient way of
storing and organising them in the computer so that the data can be used efficiently. Some
examples of data structures are arrays, linked lists, stacks and queues. Data structures are
widely used in almost every area of computer science, e.g. operating systems, compiler design,
artificial intelligence, graphics and many more.

Data structures are the main part of many computer science algorithms, as they enable
programmers to handle data in an efficient way. They play a vital role in enhancing the
performance of a program, since a main function of most software is to store and retrieve the
user's data as fast as possible.

Basic Terminology

Data structures are the building blocks of any program or software. Choosing the appropriate
data structure for a program is one of the most difficult tasks for a programmer. The following
terminology is used as far as data structures are concerned.

Data: Data can be defined as an elementary value or a collection of values, for example, a
student's name and id are the data about the student.

Group Items: Data items which have subordinate data items are called group items, for
example, the name of a student can have a first name and a last name.

Record: Record can be defined as the collection of various data items, for example, if we talk
about the student entity, then its name, address, course and marks can be grouped together to
form the record for the student.

File: A file is a collection of various records of one type of entity, for example, if there are 60
employees in an organization, then there will be 60 records in the related file, where each
record contains the data about one employee.

Attribute and Entity: An entity represents a class of certain objects. It contains various
attributes. Each attribute represents a particular property of that entity.

Field: A field is a single elementary unit of information representing an attribute of an entity.

Need of Data Structures

As applications are getting more complex and the amount of data is increasing day by day, the
following problems may arise:

Processor speed: To handle a very large amount of data, high-speed processing is required, but as
the data grows day by day to billions of files per entity, the processor may fail to deal with
that much data.

Data Search: Consider an inventory of 10^6 items in a store. If our application needs to search
for a particular item, it may have to traverse all 10^6 items every time, which slows down the
search process.

Multiple requests: If thousands of users search the data simultaneously on a web server, even a
very large server may fail under that load.

Data structures are used to solve the above problems. Data is organized into a data structure in
such a way that not every item has to be examined, so the required data can be found quickly.

Advantages of Data Structures

Efficiency: The efficiency of a program depends upon the choice of data structures. For example,
suppose we have some data and we need to search for a particular record. If we organize our data
in an unordered array, we will have to search sequentially, element by element. Hence, a plain
array may not be very efficient here. Other data structures make the search more efficient, such
as an ordered array, a binary search tree or a hash table.

Reusability: Data structures are reusable, i.e. once we have implemented a particular data
structure, we can use it at any other place. Implementation of data structures can be compiled
into libraries which can be used by different clients.

Abstraction: A data structure is specified by its abstract data type (ADT), which provides a level
of abstraction. The client program uses the data structure through its interface only, without
getting into the implementation details.

Data Structure Classification

Linear Data Structures: A data structure is called linear if all of its elements are arranged in a
linear order. In linear data structures, the elements are stored in a non-hierarchical way where
each element has a successor and a predecessor, except the first element (no predecessor) and
the last element (no successor).

Types of Linear Data Structures are given below:

Arrays: An array is a collection of data items of the same type, and each data item is called an
element of the array. The data type of the elements may be any valid data type like char, int,
float or double.

The elements of an array share the same variable name, but each one carries a different index
number known as the subscript. An array can be one-dimensional, two-dimensional or
multidimensional.

For example, the individual elements of an array age with 100 elements are:

age[0], age[1], age[2], age[3], ..., age[98], age[99].

Linked List: Linked list is a linear data structure which is used to maintain a list in the memory. It
can be seen as the collection of nodes stored at non-contiguous memory locations. Each node
of the list contains a pointer to its adjacent node.
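
As a minimal sketch of the idea, a singly linked list in C could look like the following. The names Node, push_front, list_length and free_list are illustrative choices, not part of these notes, and the sketch assumes each node points only to the next node.

```c
#include <stdlib.h>

/* One node of a singly linked list: a value plus a pointer to the next node. */
struct Node {
    int data;
    struct Node *next;
};

/* Insert a new node at the front of the list.
   Returns the new head, or the unchanged head if allocation fails. */
struct Node *push_front(struct Node *head, int value)
{
    struct Node *node = malloc(sizeof *node);
    if (node == NULL)
        return head;
    node->data = value;
    node->next = head;
    return node;
}

/* Count the nodes by following the next pointers until NULL is reached. */
int list_length(const struct Node *head)
{
    int count = 0;
    while (head != NULL) {
        count++;
        head = head->next;
    }
    return count;
}

/* Release every node of the list. */
void free_list(struct Node *head)
{
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Because every node is allocated separately, the nodes need not sit next to each other in memory; only the pointers hold the list together.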

Stack: A stack is a linear list in which insertions and deletions are allowed only at one end,
called the top.

A stack is an abstract data type (ADT) that can be implemented in most programming languages.
It is named a stack because it behaves like a real-world stack, for example a pile of plates or
a deck of cards.
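
The behaviour described above can be sketched with a fixed-size array in C. This is only one possible implementation; the capacity of 4 and the names Stack, stack_init, stack_push and stack_pop are assumptions made for illustration.

```c
#include <stdbool.h>

#define STACK_CAPACITY 4        /* illustrative capacity */

struct Stack {
    int items[STACK_CAPACITY];
    int top;                    /* index of the top element, -1 when empty */
};

void stack_init(struct Stack *s)
{
    s->top = -1;
}

/* Push a value on the top. Returns false on overflow (stack already full). */
bool stack_push(struct Stack *s, int value)
{
    if (s->top == STACK_CAPACITY - 1)
        return false;
    s->items[++s->top] = value;
    return true;
}

/* Pop the top value into *out. Returns false on underflow (stack empty). */
bool stack_pop(struct Stack *s, int *out)
{
    if (s->top == -1)
        return false;
    *out = s->items[s->top--];
    return true;
}
```

Both operations touch only the top index, so the element pushed last is always the first one popped.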

Queue: A queue is a linear list in which elements can be inserted only at one end, called the
rear, and deleted only at the other end, called the front.

It is an abstract data type, similar to the stack. A queue is open at both ends: elements enter
at the rear and leave at the front, so it follows the First-In-First-Out (FIFO) methodology for
storing the data items.
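
A small array-based sketch of a FIFO queue in C is given below. The circular-array layout, the capacity of 4 and the names Queue, queue_init, enqueue and dequeue are assumptions for illustration; a linked-list-based queue would serve equally well.

```c
#include <stdbool.h>

#define QUEUE_CAPACITY 4        /* illustrative capacity */

struct Queue {
    int items[QUEUE_CAPACITY];
    int front;                  /* index of the oldest element */
    int count;                  /* number of elements currently stored */
};

void queue_init(struct Queue *q)
{
    q->front = 0;
    q->count = 0;
}

/* Insert at the rear. Returns false if the queue is full. */
bool enqueue(struct Queue *q, int value)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    int rear = (q->front + q->count) % QUEUE_CAPACITY;
    q->items[rear] = value;
    q->count++;
    return true;
}

/* Remove from the front into *out. Returns false if the queue is empty. */
bool dequeue(struct Queue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Elements leave in the same order in which they entered, which is exactly the FIFO behaviour.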

Non Linear Data Structures: This data structure does not form a sequence i.e. each item or
element is connected with two or more other items in a non-linear arrangement. The data
elements are not arranged in sequential structure.

Types of Non Linear Data Structures are given below:

Trees: Trees are multilevel data structures with a hierarchical relationship among their
elements, known as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the
topmost node is called the root node. Each node contains pointers to its adjacent nodes.

The tree data structure is based on the parent-child relationship among the nodes. Each node in
the tree can have more than one child, except the leaf nodes, which have none, whereas each node
has at most one parent, and the root node has none. Trees can be classified into many categories,
which will be discussed later in this tutorial.

Graphs: A graph can be defined as a pictorial representation of a set of elements (represented
by vertices) connected by links known as edges. A graph differs from a tree in that a graph can
contain cycles, while a tree cannot.

Operations on data structure


1) Traversing: Every data structure contains a set of data elements. Traversing the data
structure means visiting each element of the data structure in order to perform some specific
operation like searching or sorting.

Example: If we need to calculate the average of the marks obtained by a student in 6 different
subjects, we need to traverse the complete array of marks and calculate the total sum, then
divide that sum by the number of subjects, i.e. 6, in order to find the average.
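
The traversal in this example can be sketched in C as follows. The function name average and the sample marks mentioned afterwards are illustrative assumptions.

```c
#include <assert.h>

/* Traverse the whole array once, add up the marks, then divide by the count. */
double average(const int marks[], int n)
{
    assert(n > 0);              /* an average of zero subjects is undefined */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += marks[i];      /* visit each element exactly once */
    return (double)total / n;
}
```

For the marks 70, 80, 90, 60, 50 and 100, the total is 450, so average(marks, 6) gives 75.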

2) Insertion: Insertion can be defined as the process of adding an element to the data
structure at any location.

If the data structure has a fixed capacity of n elements and is already full, a further
insertion causes overflow.

3) Deletion: The process of removing an element from the data structure is called deletion. We
can delete an element from the data structure at any location.

If we try to delete an element from an empty data structure, underflow occurs.

4) Searching: The process of finding the location of an element within the data structure is
called searching. Two common algorithms for searching are Linear Search and Binary Search. We
will discuss each of them later in this tutorial.
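
As a brief preview, both searches can be sketched in C as below. The function names are illustrative, and the binary search assumes the array is already sorted in ascending order.

```c
/* Linear search: check every element in turn. Returns the index or -1. */
int linear_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

/* Binary search on an ascending sorted array: halve the range each step.
   Returns the index or -1. */
int binary_search(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            low = mid + 1;      /* key can only be in the right half */
        else
            high = mid - 1;     /* key can only be in the left half */
    }
    return -1;
}
```

Linear search may inspect all n elements, while binary search discards half of the remaining range after each comparison.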

5) Sorting: The process of arranging the data structure in a specific order is known as Sorting.
There are many algorithms that can be used to perform sorting, for example, insertion sort,
selection sort, bubble sort, etc.

6) Merging: When two lists, List A and List B, of sizes M and N respectively and with elements
of a similar type, are combined to produce a third list, List C, of size (M+N), the process is
called merging.
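
A minimal sketch of merging in C follows. It assumes, beyond what the definition above requires, that both input lists are already sorted in ascending order, so that the result is sorted as well; the name merge_sorted is illustrative.

```c
/* Merge sorted a[0..m-1] and sorted b[0..n-1] into c.
   The caller must provide room for m + n elements in c. */
void merge_sorted(const int a[], int m, const int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;

    /* Repeatedly copy the smaller of the two front elements. */
    while (i < m && j < n)
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];

    /* Copy whatever is left in either list. */
    while (i < m)
        c[k++] = a[i++];
    while (j < n)
        c[k++] = b[j++];
}
```

Each element of A and B is copied exactly once, so List C ends up with M+N elements.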

What is abstract data type?

An abstract data type is an abstraction of a data structure that provides only the interface to
which the data structure must adhere. The interface does not give any specific details about
how something should be implemented or in what programming language.

In other words, abstract data types are entities that define data and operations but do not have
implementation details. We know the data that we are storing and the operations that can be
performed on the data, but we don't know the implementation details. The reason for not having
implementation details is that every programming language has a different implementation
strategy; for example, a C data structure is implemented using structures, while a C++ data
structure is implemented using objects and classes.

For example, a List is an abstract data type that can be implemented using a dynamic array or a
linked list. A Queue can be implemented as a linked-list-based queue, an array-based queue or a
stack-based queue. A Map can be implemented using a tree map, a hash map or a hash table.

Abstract data type model

Before knowing about the abstract data type model, we should know about abstraction and
encapsulation.

Abstraction: It is a technique of hiding the internal details from the user and showing only the
necessary details to the user.

Encapsulation: It is a technique of combining the data and the member functions in a single
unit.

The above figure shows the ADT model. The ADT model contains two types of functions, i.e.,
public functions and private functions. The ADT model also contains the data structures that we
are using in a program. In this model, encapsulation is performed first, i.e., all the data is
wrapped in a single unit, the ADT. Then abstraction is performed, which means showing only the
operations that can be performed on the data structure and which data structures we are using
in a program.

Let's understand the abstract data type with a real-world example.

Consider a smartphone. We look at the high-level specifications of the smartphone, such as:
• 4 GB RAM
• Snapdragon 2.2 GHz processor
• 5 inch LCD screen
• Dual camera
• Android 8.0

The above specifications of the smartphone are the data, and we can also perform the
following operations on the smartphone:

• call(): We can call through the smartphone.
• text(): We can text a message.
• photo(): We can click a photo.
• video(): We can also make a video.

The smartphone is an entity whose data (specifications) and operations are given above. This
data together with these operations forms the abstract or logical view of a smartphone.

The implementation view of the above abstract/logical view is given below:

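#include <string>
using std::string;

class Smartphone
{
private:
    // data: the specifications of the smartphone
    int ramSize;
    string processorName;
    float screenSize;
    int cameraCount;
    string androidVersion;

public:
    // operations that can be performed on the smartphone
    void call();
    void text();
    void photo();
    void video();
};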

The above code is the implementation of the specifications and operations that can be
performed on the smartphone. The implementation view can differ because the syntax of
programming languages is different, but the abstract/logical view of the data structure would
remain the same. Therefore, we can say that the abstract/logical view is independent of the
implementation view.

Note: We know the operations that can be performed on the predefined data types such as int, float,
char, etc., but we don't know the implementation details of the data types. Therefore, we can say that
the abstract data type is considered as the hidden box that hides all the internal details of the data type.

Data structure example

Suppose we have an integer array of size 4, with index locations 0, 1, 2 and 3. An array is a
data structure whose elements are stored in contiguous memory locations. The memory address of
the first element is 1000, of the second 1004, of the third 1008 and of the fourth 1012. Since
the array is of integer type, each element occupies 4 bytes here, so the addresses of
consecutive elements differ by 4 bytes. The values stored in the array are 10, 20, 30 and 40.
These values, index positions and memory addresses belong to the implementation view.
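
The address arithmetic above can be checked with a small C sketch. The helper name element_address is hypothetical, and the base address 1000 and element size 4 are simply the figures used in this example, not values a real program would choose.

```c
/* Address of element `index` in an array that starts at `base`,
   when every element occupies `element_size` bytes. */
unsigned long element_address(unsigned long base,
                              unsigned long index,
                              unsigned long element_size)
{
    return base + index * element_size;
}
```

With base 1000 and 4-byte elements, indexes 0 to 3 map to 1000, 1004, 1008 and 1012. On a real machine the same rule holds with the actual base address and sizeof(int) as the element size.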

The abstract or logical view of the integer array can be stated as:

• It stores a set of elements of integer type.
• It reads the elements by position, i.e., index.
• It modifies the elements by index.
• It performs sorting.

The implementation view of the integer array:


int a[4] = {10, 20, 30, 40};   // store four integers
cout << a[2];                  // read by index: prints 30
a[3] = 50;                     // modify by index

Time Complexity

What is Time complexity?


Every algorithm requires some amount of computer time to execute its instructions and perform
its task. This computer time is called the time complexity.
The time complexity of an algorithm can be defined as follows...

The time complexity of an algorithm is the total amount of time required by the algorithm to
complete its execution.

Generally, the running time of an algorithm depends upon the following...

1. Whether it is running on a single-processor machine or a multi-processor machine.
2. Whether it is a 32-bit machine or a 64-bit machine.
3. Read and write speed of the machine.
4. The amount of time required by the algorithm to perform arithmetic operations, logical
operations, return value and assignment operations, etc.
5. Input data

Note - When we calculate the time complexity of an algorithm, we consider only the input data
and ignore the remaining things, as they are machine dependent. We check only how our program
behaves for different input values when performing all its operations, such as arithmetic,
logical, return value and assignment operations.

Calculating the time complexity of an algorithm based on the system configuration is a very
difficult task, because the configuration changes from one system to another. To solve this
problem, we assume a model machine with a specific configuration, so that we are able to
calculate a generalized time complexity according to that model machine.

To calculate the time complexity of an algorithm, we need to define a model machine. Let us
assume a machine with the following configuration...

1. It is a single-processor machine
2. It is a 32-bit operating system machine
3. It performs sequential execution
4. It requires 1 unit of time for arithmetic and logical operations
5. It requires 1 unit of time for assignment and return value
6. It requires 1 unit of time for read and write operations

Now, we calculate the time complexity of the following example code by using the above-defined
model machine...

Consider the following piece of code...

Example 1
int sum(int a, int b)
{
    return a + b;
}

In the above sample code, it requires 1 unit of time to calculate a+b and 1 unit of time to
return the value. That means it takes 2 units of time in total to complete its execution, and
this does not change with the input values of a and b. For all input values, it requires the
same amount of time, i.e. 2 units.

If a program requires a fixed amount of time for all input values, then its time complexity is
said to be Constant Time Complexity.

Consider the following piece of code...

Example 2
int sum(int A[], int n)
{
    int sum = 0, i;
    for (i = 0; i < n; i++)
        sum = sum + A[i];
    return sum;
}

For the above code, time complexity can be calculated as follows...

In the calculation:
Cost is the amount of computer time required for a single execution of each operation in a line.
Repetition is the number of times that operation is executed.
Total is the amount of computer time required by the operation over all its repetitions, i.e.
cost multiplied by repetition.

So the above code requires '4n+4' units of computer time to complete the task. Here the exact
time is not fixed; it changes with the value of n. If we increase n, the time required also
increases linearly.
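
One way to arrive at '4n+4' under the model machine is to charge 1 unit for every assignment, comparison, arithmetic operation and return. The C sketch below is an instrumented copy of Example 2 written under that assumption; the function name sum_time_units and the units counter are additions for illustration and not part of the original example.

```c
/* Runs the loop of Example 2 and returns the number of time units
   charged by the model machine instead of the sum itself. */
long sum_time_units(const int A[], int n)
{
    long units = 0;
    int sum = 0, i;
    units += 1;             /* sum = 0: one assignment, executed once */

    i = 0;
    units += 1;             /* i = 0: one assignment, executed once */
    for (;;) {
        units += 1;         /* i < n: one comparison, evaluated n + 1 times */
        if (!(i < n))
            break;
        sum = sum + A[i];
        units += 2;         /* one addition and one assignment, n times */
        i++;
        units += 1;         /* i++: one unit, n times */
    }

    units += 1;             /* return: one unit, executed once */
    (void)sum;              /* the sum is not needed for the count */
    return units;
}
```

Adding the contributions gives 1 + 1 + (n + 1) + 2n + n + 1 = 4n + 4, so an empty array costs 4 units and an array of 5 elements costs 24 units.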

In total it takes '4n+4' units of time to complete its execution, and this is Linear Time
Complexity.

If the amount of time required by an algorithm increases in proportion to the size of the
input, then that time complexity is said to be Linear Time Complexity.

Space Complexity

What is Space complexity?


When we design an algorithm to solve a problem, it needs some computer memory to complete
its execution. For any algorithm, memory is required for the following purposes...

1. To store program instructions.
2. To store constant values.
3. To store variable values.
4. And for a few other things like function calls, jumping statements, etc.

Space complexity of an algorithm can be defined as follows...

Total amount of computer memory required by an algorithm to complete its execution is called
as space complexity of that algorithm.

Generally, when a program is under execution, it uses the computer memory for THREE reasons.
They are as follows...

1. Instruction Space: It is the amount of memory used to store the compiled version of the
instructions.
2. Environmental Stack: It is the amount of memory used to store information about partially
executed functions at the time of a function call.
3. Data Space: It is the amount of memory used to store all the variables and constants.

Note - When we want to analyse an algorithm based on its space complexity, we consider only the
Data Space and ignore the Instruction Space as well as the Environmental Stack. That means we
calculate only the memory required to store variables, constants, structures, etc.

To calculate the space complexity, we must know the memory required to store values of
different data types (according to the compiler). The exact sizes depend on the compiler and
machine; for example, an older 16-bit C compiler typically requires the following...

1. 2 bytes to store an integer value.
2. 4 bytes to store a floating point value.
3. 1 byte to store a character value.
4. 8 bytes to store a double value.

Consider the following piece of code...

Example 1
int square(int a)
{
    return a * a;
}

In the above piece of code, it requires 2 bytes of memory to store the variable 'a' and another
2 bytes of memory for the return value.

That means it requires 4 bytes of memory in total to complete its execution, and these 4 bytes
are fixed for any input value of 'a'. This space complexity is said to be Constant Space
Complexity.

If an algorithm requires a fixed amount of space for all input values, then that space
complexity is said to be Constant Space Complexity.

Consider the following piece of code...

Example 2
int sum(int A[ ], int n)
{
    int sum = 0, i;
    for (i = 0; i < n; i++)
        sum = sum + A[i];
    return sum;
}

In the above piece of code it requires

'n*2' bytes of memory to store the array variable 'A[ ]'
2 bytes of memory for the integer parameter 'n'
4 bytes of memory for the local integer variables 'sum' and 'i' (2 bytes each)
2 bytes of memory for the return value.

That means it requires '2n+8' bytes of memory in total to complete its execution. Here, the
total amount of memory required depends on the value of 'n'. As 'n' increases, the space
required also increases proportionately. This type of space complexity is said to be Linear
Space Complexity.

If the amount of space required by an algorithm increases in proportion to the size of the
input, then that space complexity is said to be Linear Space Complexity.
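
The '2n+8' count can be reproduced with a small C helper. This is only a sketch of the bookkeeping used in these notes: the function name sum_space_bytes is hypothetical, and the size of an integer is passed in as a parameter because it differs between compilers.

```c
#include <stddef.h>

/* Data space of Example 2 for an array of n integers,
   where each integer occupies int_size bytes. */
size_t sum_space_bytes(size_t n, size_t int_size)
{
    size_t array_bytes  = n * int_size;     /* the array A[ ] */
    size_t param_bytes  = int_size;         /* the parameter n */
    size_t local_bytes  = 2 * int_size;     /* the locals sum and i */
    size_t return_bytes = int_size;         /* the return value */
    return array_bytes + param_bytes + local_bytes + return_bytes;
}
```

With 2-byte integers this gives 2n+8 bytes, for example 28 bytes for n = 10. Passing sizeof(int) instead of 2 gives the corresponding figure for the current compiler; the growth is linear in n either way.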

What is an Algorithm?

An algorithm is a process or a set of rules required to perform calculations or some other
problem-solving operations, especially by a computer. Formally, an algorithm is a finite set of
instructions which are carried out in a specific order to perform a specific task. It is not
the complete program or code; it is just the solution (logic) of a problem, which can be
represented as an informal description, a flowchart or pseudocode.

Characteristics of an Algorithm

The following are the characteristics of an algorithm:

• Input: An algorithm takes zero or more input values.
• Output: An algorithm produces one or more outputs at the end.
• Unambiguity: An algorithm should be unambiguous, which means that the instructions in an
algorithm should be clear and simple.
• Finiteness: An algorithm should have finiteness. Here, finiteness means that the algorithm
should contain a limited number of instructions, i.e., the instructions should be countable,
and that it finishes after a finite number of steps.
• Effectiveness: An algorithm should be effective, as each instruction in an algorithm affects
the overall process.
• Language independent: An algorithm must be language-independent, so that the instructions in
an algorithm can be implemented in any language with the same output.

Dataflow of an Algorithm

• Problem: A problem can be a real-world problem or any instance of a real-world problem
for which we need to create a program or a set of instructions. The set of instructions is
known as an algorithm.
• Algorithm: An algorithm, which is a step-by-step procedure, is designed for the problem.
• Input: After designing an algorithm, the required and desired inputs are provided to the
algorithm.
• Processing unit: The input is given to the processing unit, and the processing unit
produces the desired output.
• Output: The output is the outcome or the result of the program.

Why do we need Algorithms?

We need algorithms because of the following reasons:


• Scalability: Algorithms help us to understand scalability. When we have a big real-world
problem, we need to break it down into small steps in order to analyze it easily.
• Performance: Real-world problems are not always easily broken down into smaller steps. If a
problem can be easily broken into smaller steps, it means that the problem is feasible.

Let's understand the algorithm through a real-world example. Suppose we want to make lemon
juice; the following are the steps required:

Step 1: First, cut the lemon in half.

Step 2: Squeeze the lemon as much as you can and collect its juice in a container.

Step 3: Add two tablespoons of sugar to it.

Step 4: Stir the container until the sugar gets dissolved.

Step 5: When the sugar has dissolved, add some water and ice to it.

Step 6: Store the juice in a fridge for 5 to minutes.

Step 7: Now, it's ready to drink.

The above real-world example can be directly compared to the definition of an algorithm. We
cannot perform step 3 before step 2; we need to follow the specific order to make lemon juice.
An algorithm likewise requires that each and every instruction is followed in a specific order
to perform a specific task.

Now we will look at an example of an algorithm in programming.

We will write an algorithm to add two numbers entered by the user.

The following are the steps required to add two numbers entered by the user:

Step 1: Start

Step 2: Declare three variables a, b, and sum.

Step 3: Enter the values of a and b.

Step 4: Add the values of a and b and store the result in the sum variable, i.e., sum=a+b.

Step 5: Print sum

Step 6: Stop
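
The six steps can be turned into C roughly as follows. This is a sketch, not the only possible translation: the function names add and run_add_two_numbers and the prompt text are illustrative assumptions.

```c
#include <stdio.h>

/* Step 4: add the values of a and b and store the result in sum. */
int add(int a, int b)
{
    int sum = a + b;
    return sum;
}

/* Steps 1 to 6: read two numbers from standard input and print their sum.
   Returns 0 on success and 1 if the input could not be read. */
int run_add_two_numbers(void)
{
    int a, b, sum;                      /* Step 2: declare the variables */

    printf("Enter two numbers: ");      /* Step 3: enter the values */
    if (scanf("%d %d", &a, &b) != 2)
        return 1;

    sum = add(a, b);                    /* Step 4: sum = a + b */
    printf("Sum = %d\n", sum);          /* Step 5: print sum */
    return 0;                           /* Step 6: stop */
}
```

Calling run_add_two_numbers() from main carries out the whole algorithm; add is kept separate so that the arithmetic step can be checked on its own.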
Big O Notation in Data Structures

Asymptotic analysis is the study of how an algorithm's performance changes as the size of the
input grows. We use big-Θ (Theta) notation to asymptotically bound the growth of a running time
to within constant factors above and below. The amount of time, storage and other resources
required to run an algorithm determines its efficiency, and asymptotic notations are used to
express that efficiency. For different types of inputs, an algorithm's performance may vary,
and it also changes as the input size grows larger.

When the input tends towards a certain value or a limiting value, asymptotic notations are used
to represent how long an algorithm takes to execute. Consider a simple sorting method such as
insertion sort: when the input array is already sorted, the time spent by the method is linear,
which is the best-case scenario.

However, when the input array is in reverse order, the method takes the longest (quadratic)
time to sort the items, which is the worst-case scenario. It takes average time when the input
array is neither sorted nor in reverse order. Asymptotic notations are used to represent these
durations.
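
To make the best and worst case concrete, the C sketch below counts comparisons in insertion sort. Insertion sort is used here only as an assumed example of such a sorting method, and the function name insertion_sort_comparisons is illustrative.

```c
/* Sort a[0..n-1] in ascending order with insertion sort and
   return the number of element comparisons that were made. */
long insertion_sort_comparisons(int a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;          /* compare a[j] with key */
            if (a[j] <= key)
                break;              /* key is already in place */
            a[j + 1] = a[j];        /* shift the larger element right */
            j--;
        }
        a[j + 1] = key;
    }
    return comparisons;
}
```

For an already sorted array of n elements, each new element needs a single comparison, giving n - 1 comparisons in total (linear). For a reversed array, the i-th element is compared with all i elements before it, giving n(n - 1)/2 comparisons (quadratic). With n = 5 these are 4 and 10 comparisons respectively.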

Big O notation classifies functions based on their growth rates: several functions with the
same growth rate can be written using the same O notation. The symbol O is used because a
function's growth rate is also known as the order of the function. A big O description of a
function generally gives only an upper bound on the function's growth rate.

It would be convenient to have a form of asymptotic notation that means "the running time
grows at most this much, but it could grow more slowly." We use "big-O" notation for just such
occasions.

Advantages of Big O Notation

• Asymptotic analysis is useful when examining the efficiency of an algorithm across run-time
inputs. If we instead measure it manually by passing test cases for various inputs, the
observed performance varies as the algorithm's input changes.
• When the algorithm is executed on different computers, its measured performance also varies.
A mathematical representation is independent of the machine and gives a clear understanding of
the upper and lower bounds of an algorithm's run-time, so we can pick an algorithm whose
performance does not degrade much as the input size increases.

Examples

Now let us have a deeper look at the Big O notation of various examples:

O(1):

#include <stdio.h>

void constantTimeComplexity(int arr[])
{
    printf("First element of array = %d", arr[0]);
}

This function runs in O(1) time (or "constant time") relative to its input. The input array could be
1 item or 1,000 items, but this function would still just require one step.

O(n):

#include <stdio.h>

void linearTimeComplexity(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d\n", arr[i]);
    }
}

This function runs in O(n) time (or "linear time"), where n is the number of items in the array. If
the array has 10 items, we have to print 10 times. If it has 1000 items, we have to print 1000
times.

O(n^2):

#include <stdio.h>

void quadraticTimeComplexity(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            printf("%d = %d\n", arr[i], arr[j]);
        }
    }
}

Here we're nesting two loops. If our array has n items, our outer loop runs n times, and our inner
loop runs n times for each iteration of the outer loop, giving us n^2 total prints. If the array has
10 items, we have to print 100 times. If it has 1000 items, we have to print 1000000 times. Thus
this function runs in O(n^2) time (or "quadratic time").

O(2^n):

int fibonacci(int num)
{
    if (num <= 1) return num;
    return fibonacci(num - 2) + fibonacci(num - 1);
}

An example of an O(2^n) function is the recursive calculation of Fibonacci numbers. O(2^n)
denotes an algorithm whose growth doubles with each addition to the input data set. The growth
curve of an O(2^n) function is exponential: starting off very shallow, then rising meteorically.
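
The rapid growth can be observed by counting how many calls the recursive function makes. The C sketch below is an instrumented variant of the function above; the name fibonacci_counted and the calls parameter are additions for illustration.

```c
/* Same recursion as fibonacci(), but every call also increments *calls. */
long fibonacci_counted(int num, long *calls)
{
    (*calls)++;
    if (num <= 1)
        return num;
    return fibonacci_counted(num - 2, calls)
         + fibonacci_counted(num - 1, calls);
}
```

Starting with calls set to 0, computing the 5th Fibonacci number takes 15 calls and computing the 10th takes 177 calls. The count grows exponentially with num and stays below 2^num, which is why O(2^n) is a valid upper bound for this function.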

So, in this article, we understood what Big O Notation in Data Structures is and how we can use
it in our daily practices to understand the time complexity of our routine deliverables.
