
Data Structures and Algorithms

An algorithm is a specification for accomplishing a given task. The program that a
computer executes is an algorithm, albeit a rather long and complex one. In computer
science terms the word algorithm usually refers to a specification for a solution to an
important class of problems: for example, a means of arranging data items in a way
that allows efficient access. These are the sorts of algorithms we will concern ourselves
with. In this context, it becomes important to have a model of the time and resource
requirements that a particular algorithm imposes on the system. That is:
1. how much memory will it require to accomplish its task, and
2. what length of time will we need to wait before we see the results?
These issues are addressed through something known as algorithm analysis.
Algorithms are often naturally associated with a data structure. A good example of this
is the concept of a queue. As we know from experience, people arrive at a queue from
the back and leave from the front. Thus, the model — or abstraction — of a queue is
of a sequence of items which can only be accessed from the back or front. Given this
abstraction, we can design a data structure which represents this behaviour, and then
develop the necessary algorithms to access and manipulate such an entity. The data
structure needs to encapsulate the idea of queueing at the back and dequeuing from the
front. Because we are thinking in abstract terms, it does not matter what sort of items
are actually in the queue: we are simply deciding how it should be represented, and
how to manipulate it. Of course, if we need to create a real queue, we need to know
what it will contain. However, if we know how a queue will operate in this abstract
sense, we immediately know how it will operate for any type of item! Thus, once we
design the data structure itself we can develop algorithms to access and manipulate
it, without any regard for what it will eventually contain. The data structures we will
consider are commonly used throughout computer science, since they are abstractions
of very useful concepts related to the efficient storage and access of data.

1 Algorithm Analysis and Recursion


Before advancing to the description of data structures and their associated algorithms,
we need to spend some time familiarising ourselves with so-called complexity analysis.
Some data structures are inappropriate, from an efficiency point of view, for certain
tasks. For example, the queue referred to in the introduction would not be a good
way of storing items if we wish to search through it to find a specific item.
Algorithm analysis provides the tools we need to estimate both the memory and time
requirements (or complexity) of the algorithms we wish to examine. As part of this
process we will briefly examine the concept of recursion and discuss the perils of using
recursive algorithms arbitrarily.

1.1 Notation

In general, the algorithms we are interested in will manipulate a number of data items,
arranged in some sort of data structure. Let us call this number N. For each of our
proposed algorithms, we can derive a function T(N) which will show how the complexity
(space or time) of that algorithm is tied to the number of data items. For example we
see that, given N items arranged in a list1, we require an average of N/2 steps down this
list to find a given item. We could thus say that the time complexity is

T(N) = N/2.

Of course, the actual computer program that performs this "list traversal" will require
many more instructions to move through the list and perform the various tests that are
required. However, this extra overhead will simply result in a larger value multiplying
N as well as an additional constant. Thus the expression remains a polynomial of order
1 — this is a linear relationship (of the form aN + b).
When we analyse the complexity of an algorithm, we do not wish to consider issues
such as the precise value of the constants referred to above. Usually we wish to see how
the algorithm will perform for very large N, since most of our data structures will need
to accommodate large amounts of data. So far we have only seen a linear expression for
T(N). Unfortunately for most algorithms the complexity is somewhat worse: we may
have a quadratic function, for example. If we wish to compare two different algorithms
for accessing a data structure, for example, we need a way of comparing them sensibly.
Given two complexity estimates, say a linear function T1(N) with large constant
factors and a quadratic function T2(N), which one would be better? The answer: it
depends! For small values of N, T2 may well win out, while for even moderate values
of N, T1 will be better. Intuitively, however, we would expect a linear expression to
give us a lower complexity estimate than a quadratic expression. In particular we note
that, for large values of N:
1. constants become irrelevant;
2. only the highest order term remains significant.
What we want is a way of sensibly comparing two functions so that we can see which is
“best” for our needs. The system we use must provide the correct ordering of functions,
regardless of lower order terms and constants. We can achieve this by introducing the
following notations:
Big-Oh A function f(N) has complexity O(g(N)) (pronounced Big-Oh), written as

f(N) = O(g(N)),

if there are positive constants c and n0 such that f(N) ≤ c·g(N) when N ≥ n0.

1 Imagine scanning through a list of N items written on paper; on average you will need to go through
half the list.

Function    Name
c           constant
log N       logarithmic
log² N      log-squared
N           linear
N²          quadratic
N³          cubic
2^N         exponential

Table 1: Relative ordering of functions. In practice algorithms with complexities above
quadratic are deemed to be too expensive.

Big-Omega A function f(N) has complexity Ω(g(N)) (pronounced Big-Omega), written as

f(N) = Ω(g(N)),

if there are positive constants c and n0 such that f(N) ≥ c·g(N) when N ≥ n0.

Big-Theta A function f(N) has complexity Θ(g(N)) (pronounced Big-Theta), written as

f(N) = Θ(g(N)),

if and only if f(N) = O(g(N)) and f(N) = Ω(g(N)).

Little-Oh f(N) = o(g(N)) if f(N) = O(g(N)) and f(N) ≠ Θ(g(N)).
The first notation, Big-Oh, provides an upper bound when comparing functions: we are
guaranteed that f(N) will always have a lower (or equal) complexity when compared
to the function g(N). In many cases this is all we require, since we wish to estimate the
worst-case performance of our algorithm. If we can show that our algorithm is O(N),
for example, then we know that we can handle very large numbers of data items without
running into trouble. An algorithm with O(N²) complexity, however, is essentially useless unless
we can guarantee that N is very small.
The other notations are less frequently used, but provide other sorts of ordering infor-
mation. Big-Omega provides a lower bound estimate, i.e. our complexity is guaranteed
to either match or exceed g(N), while Big-Theta provides a precise growth rate for
the complexity: g(N) is both an upper and lower bound. Finally, Little-Oh is simply
Big-Oh, but with the requirement that the growth of f(N) is strictly less than g(N).
Table 1 shows a list of functions ordered by growth rate, and is a useful one to remem-
ber.
When you are comparing functions using any of the above definitions note the follow-
ing:

• do not include constants or lower order terms, i.e. write f(N) = O(N) rather than
something like O(2N + 3);

• it is clear that for f(N) = N both f(N) = O(N²) and f(N) = O(N) are
valid. You should choose the tightest (most accurate) bound.
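As a small worked illustration of the Big-Oh definition (our own example, not from the
notes): the function f(N) = 3N + 2 is O(N), since taking c = 4 and n0 = 2 gives
3N + 2 ≤ 4N whenever N ≥ 2. It is also, trivially, O(N²), but O(N) is the tighter bound
and is the one we would quote.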

1.2 Rules for Code Analysis

There are a few formal rules that can be used to aid in the analysis of algorithms, as
well as some less formal, more intuitive rules that make code analysis easy. The formal
rules are:

Rule 1: if T1(N) = O(f(N)) and T2(N) = O(g(N)), then
1. T1(N) + T2(N) = max(O(f(N)), O(g(N)));
2. T1(N) · T2(N) = O(f(N) · g(N));

Rule 2: if T(N) is a polynomial of degree k, then T(N) = Θ(N^k);

Rule 3: logs are sub-linear: log^k N = O(N) for any constant k.
These formal rules help us to arrive at the short-cuts indicated below. We tend to use
these short-cut rules when evaluating the Big-Oh complexity of a piece of code we
have written.
FOR loops: The number of operations completed within a FOR loop is simply N
times the number of operations at each step (N is the loop counter). Such a loop
is thus O(N).

Nested FOR loops: A nested FOR loop executes the instructions in the innermost
loop N^k times, where k is the number of nested loops. It is thus O(N^k). Note
that if you use different variables for the loop counters, the complexity becomes a
little more difficult to determine but is still the number of times instructions are
executed within the innermost loop.

Consecutive statements: The complexity of a sequence of statements is simply the
sum of the complexities of each individual statement;

Branches: for a branch instruction (if/else, say) we look at the two complexities for
each branch and take the largest of these.
An example will help to clarify things. Given the following code snippet, estimate its
Big-Oh complexity:

public static int whatsitdo(int N, int p)
{
    int sum = 0, i, j;
    sum += N; sum = sum + 2;
    if (sum > p)
        for (i = 0; i < N*N; i++) sum += i*i;
    else
        for (i = 0; i < N*N; i++)
            for (j = 0; j < N; j++)
                sum += i + j;
    return sum;
}

We see that we start with a sequence of statements: two lines of code, then a singly
nested loop followed by a doubly nested loop. The number of items is N, so we derive
our time complexity expression using this variable. According to the rules stated above,
a singly nested loop has complexity O(N); however, in this case the loop counter
ranges up to N², and we thus arrive at an upper bound of O(N²) for the first loop.
The doubly nested loop would have complexity O(N²), but now we have O(N² · N)
or O(N³). The first two statements require a constant time independent of N,
giving us O(1). Finally, the branch instruction could go into either the single or double
loops; we thus take the larger part (the else branch). Adding all this together we get
O(1) + max(O(N²), O(N³)) = O(N³). This algorithm is cubic in N and would thus
be a very poor bet for any reasonable size N! We would then go back to our design
and either try to design a better algorithm for our data structure, or create a new data
structure which might better support the operations we wish to perform.

1.3 Recursion

Recursion is widely used in many of the more elegant algorithms required for efficient
manipulation of data items. The basic idea may seem strange at first, but the concept
is mathematically well defined, so even if it bothers you, it does rest on a sound basis!
Let us first consider recursive functions in a mathematical sense. If we have a function
T(N) we say that it is recursive if it can be expressed in terms of itself. That is,

T(N) = F(T(N-1), T(N-2), ...).

While this would seem to lead to a chicken-and-egg type scenario, this is not so, pro-
vided we are very careful when defining the function F. The simplest example of recur-
sion is the factorial function:

T(N) = N · T(N-1) for N > 1, with T(1) = 1.



We see here that a recursive function must have a so-called base case to allow the
recursion (self reference) to terminate. If there is no base case, the function will expand
into an infinite definition. The other point to note is how the self-reference is achieved:
we cannot use T(N) on the right hand side, since we have not calculated it yet. However,
we can use terms that have been calculated — in this example T(N-1)!
To show how one would evaluate such an expression, let us trace the steps to evaluate
T(3):
1. T(3) = 3.T(2); we now need T(2)
2. T(2) = 2.T(1); we need T(1)

3. T(1) = 1, by definition, thus
4. T(3) = 3.(2.(1)) = 6 and we’re done!
The idea, then, is to build up your solution based on previous data you have calculated.
Many functions can be defined recursively. In particular, algorithms designed for data
structures which are self-similar (we’ll see this later) make heavy use of recursion.
For such recursive algorithms we can define recursive relationships to estimate the
complexity, but this is a fairly advanced topic that we will avoid.
The reason we are so interested in recursion stems from the fact that almost all com-
puter languages in use today support recursive function definitions. We can thus design
simple and elegant recursive functions to solve a whole host of problems, secure in the
knowledge that we can code them up directly. A recursive function definition for fac-
torial might look something like:

public int fact(int N)
{
    if (N <= 1) return 1;
    else return N*fact(N-1);
}

The system will arrange for the function fact to be called with different arguments
as it recurses down towards the base case. At that point, we have all the information
we need to start evaluating the partial expressions and as a result we can evaluate the
original request. Just as we did for the mathematical definition, we require a base case
to halt the recursion. If you do not have such a check, the system will go on calling the
function until it eventually runs out of memory.
While recursive functions are “cool” they are not always appropriate. When a computer
implements a recursive function call it needs to save extra information to help it resolve
the call correctly. The more recursive calls we need, the more resources it has to
expend during evaluation. For the case of the factorial function, we can re-write it using
iteration (i.e. a loop), which is computationally equivalent but less resource intensive:

public int fact(int N)
{
    int f = 1;
    if (N <= 1) return f;
    for (int i = 2; i <= N; i++) f = i*f;
    return f;
}

In the case of the factorial function recursion is still a valid, if expensive, alternative.
However, for some functions a recursive definition imposes additional overhead be-
cause data that has already been calculated ends up being recalculated. This is wasteful
and leads to a very slow recursive solution. This is illustrated by the following recursive

function (the Fibonacci series):

T(N) = T(N-1) + T(N-2), with T(1) = T(2) = 1.

In this case a naive recursive implementation on a computer causes the program to slow
to a snail's pace. Consider the function call T(N): from this we need to calculate T(N-1)
and T(N-2), but to calculate T(N-1) we need to calculate T(N-2) and T(N-3). We thus end
up calculating T(N-2) (recursively!) a second time! If you compound this recalculation
over many different function calls it results in an enormous overhead. In this case,
one can again use an iterative approach to calculate the desired number T(N). In fact,
the complexity is O(N) versus the exponential complexity of the recursive case. The
basic lesson we learn from this example is that recursion is only worthwhile if we do
not duplicate work. We shall see a great deal more of recursion in subsequent sections.
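To make the contrast concrete, the following is a sketch (our own illustration, in the same
Java-like style as the other examples) of an iterative Fibonacci routine; it computes each
value exactly once by remembering the two previous terms, giving O(N) work:

public long fib(int N)
{
    if (N <= 2) return 1;            // base cases: T(1) = T(2) = 1
    long prev = 1, curr = 1;         // hold T(i-2) and T(i-1)
    for (int i = 3; i <= N; i++)
    {
        long next = prev + curr;     // each term is computed exactly once
        prev = curr;
        curr = next;
    }
    return curr;
}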

2 Lists
A list is a data structure in which data items are accessed by moving through the col-
lection from start to end. It is thus a linear data structure, since the algorithms used to
traverse and manipulate the list have complexity O(N) for N data items. The list is
an example of an abstract data type (ADT): a data structure (or object in programming
terms) with an associated set of algorithms to manipulate it. The concept of a list does
not impose any particular implementation. However, from a programming point of view
some implementations are better than others, and these have become the norm. We will
examine two of these: singly linked and doubly linked lists.

2.1 Singly-Linked Lists

A linked list is simply a collection of nodes (which will hold item data, at the very
least) which are connected in a special way. The use of a data-bearing node is common
to all the data structures or ADTs that we will see; it is the way they are connected
and accessed that differentiates them. In the case of a singly linked list (linked list for
short) we simply "link" each node to its successor. By successor we mean the node
that logically succeeds a given node: usually we impose an ordering on the data in the
list, so a "lower" value would precede a "higher" value. This is not required by the
definition, however. As far as an implementation is concerned, we have a node
object, within which we store a reference to its successor, as indicated in Figure 1.
Note that a null reference (one that is not pointing to anything) is represented by λ in
the figures.
The node will store the actual data item, and may hold other information too. The
final node in the list will contain a null reference since it has no successor. Finally, to
complete the representation, we need a means to identify the front of the list, which
requires a reference to the node from which we can reach all others. We can call this
the head of the list.

Figure 1: Singly linked List: Each node is linked to the next via a Java reference. We
also store a reference to the first node in the list, called head.

2.1.1 Insertion

Insertion can take place in a number of different ways: internally (within the list) or at
either end. The basic idea is simple enough:
1. create a new node with the given data item
2. find the successor and predecessor of this new node in the list
3. insert the node at this point by resetting the links appropriately
There are unfortunately a number of complications. When we insert at either end of
the list we are missing either a successor or a predecessor. The code that you write has
to check for each of these 3 cases and must link the new node in the correct way.
The different possibilities are indicated in Figure 2. Finding the position to insert the
new node is O(N).
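As an illustrative sketch (the class and method names here are our own choices, not
prescribed by the notes), a node class and an insertion routine for an ordered singly
linked list might look as follows; it covers all 3 cases (front, middle and end):

class ListNode
{
    int data;
    ListNode next;                        // reference to the successor (null for the last node)
    ListNode(int data) { this.data = data; }
}

// Insert x into an ordered list and return the (possibly new) head reference.
ListNode insert(int x, ListNode head)
{
    ListNode newNode = new ListNode(x);
    if (head == null || x < head.data)    // insert at the front (or into an empty list)
    {
        newNode.next = head;
        return newNode;
    }
    ListNode prev = head;                 // find the predecessor of the new node: O(N)
    while (prev.next != null && prev.next.data < x)
        prev = prev.next;
    newNode.next = prev.next;             // link the new node to its successor (may be null)
    prev.next = newNode;                  // link the predecessor to the new node
    return head;
}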

2.1.2 Deletion

Deletion involves identifying a node in the list which is to be removed (an operation
that is O(N)), and then unlinking this node from its successor. The predecessor of the
node is then linked to the node’s successor. As was the case with insertion, there are 3
possible cases to consider: deletion from either end or deletion from within the list.

2.2 Doubly-linked Lists

In many cases it is necessary to move through a list of objects from either the front or
the back of the list. In this case a singly-linked list becomes less attractive. If we wish
to move through the list from the back, we must first identify the end of the list (which
requires N steps from the head reference) and then somehow move towards the front
of the list. This involves keeping track of the predecessor to each node as we step back
towards the head. One can, of course, maintain a reference to the last element of the
list, which we can call the tail. But even in this case we need to maintain a reference to
the predecessor as we move backwards through the list.

Figure 2: Insertion: Insertion into a singly linked list - we consider 3 different cases:
insertion at the front, back and middle.
Figure 3: Doubly linked list: We now have a successor and predecessor link. We also
store a reference to the first item, Head, and a reference to the last item, Tail.

A better approach, which leads to a cleaner, more efficient implementation, is a doubly-
linked list. In this case our node now contains two references in addition to the data: a
reference to the successor and one to the predecessor. When we insert or delete a node
we now have to update more references, but the net result is a list that can be easily
traversed in either direction and has constant time access to the head and tail elements.
This is shown in Figure 3.

2.2.1 Insertion

Insertion for a doubly-linked list is similar to the singly-linked case: we may insert at
either end or in the middle of the list. Insertion at either end is trivial: we update the
head and tail references and make the new node's successor or predecessor point to
the original first/last node. Insertion in the middle of the list requires us to identify
either the successor or predecessor of the new node within the list, and then to up-
date the nodes involved so that their predecessor/successor links are consistent. These
operations are shown in Figure 4.
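A sketch of a doubly linked node, and of linking a new node in front of a known internal
node, might look like this (the names are illustrative only; insertion at either end would
instead update the head or tail reference as described above):

class DNode
{
    int data;
    DNode prev, next;                 // predecessor and successor references
    DNode(int data) { this.data = data; }
}

// Insert newNode immediately before succ; succ is assumed to be an internal node,
// so both succ and succ.prev are non-null.
void insertBefore(DNode newNode, DNode succ)
{
    DNode pred = succ.prev;
    newNode.prev = pred;              // wire up the new node's two links
    newNode.next = succ;
    pred.next = newNode;              // make both neighbours point at the new node
    succ.prev = newNode;
}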

2.2.2 Deletion

Deletion in a doubly-linked list is similar to the singly-linked case: we identify the
node to remove, and then simply update the predecessor and successor references to
reflect the fact that the node has been removed. As before, we need to write code that
will deal with the 3 cases: deletion from either the head or tail of the list, or deletion
from the middle. Some of these operations are shown in Figure 4.

Figure 4: Insertion and Deletion: Some of the cases we need to consider. For deletion
we simply re-attach the links to bypass the node we wish to delete.

3 Stacks and Queues


A stack is a very important ADT which occurs in many different places in computing.
Compilers make heavy use of stacks, and the operating system itself uses stacks to
implement a robust function call mechanism. A stack is a data structure that supports a
LIFO (last in first out) style of data storage. Imagine a stack of books: you would add

a new book to the top of the pile, and also remove books one by one from the top of
the pile. Removing a book from within the stack might cause it to topple over!
Stacks support two basic operations: push and pop. A push operation places a new
item on the top of the stack. The pop operation removes the top item from the stack (it
may also return the value of that item, if desired).
Stacks are most commonly implemented using a linked list: the head reference points
to the item on the “top” of the stack, and this item may be removed with a pop (delete)
operation. After a pop operation the head will point to a new “top of stack”. We simply
use the existing list infrastructure to implement the stack.
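A minimal sketch of a stack built on top of a singly linked node class (such as the
ListNode class sketched in Section 2.1; the naming here is our own):

class Stack
{
    private ListNode top;                 // head of the underlying linked list

    public void push(int x)               // place a new item on top of the stack: O(1)
    {
        ListNode n = new ListNode(x);
        n.next = top;
        top = n;
    }

    public int pop()                      // remove and return the top item: O(1)
    {
        if (top == null) throw new RuntimeException("stack is empty");
        int value = top.data;
        top = top.next;
        return value;
    }
}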
A queue is an ADT which supports a FIFO (first in first out) style of storage and
access: items are inserted at the back of the queue, and are always removed from the
front of the queue. This is the way that queues work in the real world: you leave
before someone who arrived after you! Queues have many applications in areas such as
network modelling and are usually implemented as a simple modification of a linked
list. The head points to the front of the queue, while the tail points to the back. You can
use either a singly or doubly linked list. The enqueue and dequeue operations insert
data at the back of the queue and remove it from the front of the queue, respectively.
These operations are indicated in Figure 5.

Figure 5: A Queue: A queue is an ADT which allows insertion at one end of the list
only (the back) and removal from the other (the front). These operations are illustrated
here; it can thus be implemented using a linked list.
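A corresponding sketch of a queue (again using the ListNode class and our own naming)
maintains both a head and a tail reference so that enqueue and dequeue are O(1):

class Queue
{
    private ListNode head, tail;          // front and back of the queue

    public void enqueue(int x)            // insert at the back
    {
        ListNode n = new ListNode(x);
        if (tail == null) { head = tail = n; }   // queue was empty
        else { tail.next = n; tail = n; }
    }

    public int dequeue()                  // remove from the front
    {
        if (head == null) throw new RuntimeException("queue is empty");
        int value = head.data;
        head = head.next;
        if (head == null) tail = null;    // the queue has become empty
        return value;
    }
}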

4 Trees
Within computing the tree ADT is of fundamental importance. Trees are used when we
desire more efficiency in our data access or manipulation. We have seen that in the case
of a linked list we can achieve O(N) for all the relevant operations. This is very good;
however, if our tree is properly constructed, operations such as insertion and deletion
can be implemented in O(log N) for N data items. This is a vast improvement and
makes sorting and searching algorithms, for example, more useful. The next sections
examine trees in great detail and highlight their benefits and deficiencies.

4.1 What is a Tree?

A tree is a collection of nodes each of which has a number of links/references to other


nodes. Each node will be the child of some other node, and we can thus talk about
parent-child relationships and so on. Figure 6 shows a tree, and it is clear from this
representation how this data structure obtained its name — it has a branching structure,
rather than the simple linear structure of a list. It is this branching that allows us to
achieve the efficiency we seek. But it comes at a price: insertion and deletion into this
structure are more complex than for lists.

Figure 6: A Tree: A tree is a collection of nodes in which each node has one parent
(predecessor) and (possibly) many children (successors). We can move from the root
node to any given node in the tree.
There is one node, the root, which does not have a parent. We can trace a path to any
node in the tree by starting at the root and following the appropriate links. There are a
number of other tree terms which we need to familiarise ourselves with:



sub-tree A tree is a recursive structure — a tree consists of a root node with a number of
other trees hanging off this node. We call these sub-trees. Each of these sub-trees
can again be viewed as a root node to which still smaller sub-trees are attached.
depth The depth of a node is the number of links we have to traverse from the root to
reach that node.
height The height of a node is the length of the longest path from the node to any leaf
node within the sub-tree.
leaf node A leaf node is one which has no children: its child links are all null refer-
ences.
sibling If a node has a parent (the node that points to it), we call any other node also
pointed to by the parent a sibling.
The generalised tree that we have introduced here, in which a node may have an ar-
bitrary number of children, is rather complex to manipulate and implement. For most
applications a binary tree will suffice. A binary tree is a tree in which every node has
at most two children. That is, a node may have 0, 1 or 2 children. The two children are
usually referred to as left and right. We also talk about the left sub-tree, for example, by
which we mean the sub-tree rooted at the left child node. It is important to distinguish
between the left and the right children since the order in which they are accessed will
be important for the algorithms we present below.
The remainder of this section deals with different kinds of binary trees.

4.2 Binary Search Trees

Trees are often used to obtain sub-linear time complexity for operations such as sorting
and searching of data. Database applications in particular make heavy use of complex
tree structures to speed up data queries. In order for data to be efficiently retrieved


from a tree structure we require an ordering property which is consistently applied
when building the tree. The simplest of these pertains to binary trees:

For any node, the data values in its left subtree are smaller than the value stored
within that node, while the data values in its right subtree are larger than this value2.

2 We can have "larger than or equal to" if duplicate data is permitted.

Figure 7: Binary Search Tree: Every node respects an ordering property. The data
contained within its left subtree is less than the data in the node, while the data in the
right subtree is larger. This allows for efficient searching of the tree structure.

This is illustrated in Figure 7. The root node contains the data value 20. All the children
down its right subtree have values larger than 20, while those down the left subtree
have values less than 20. Since a tree is recursive (composed of yet smaller trees, in this
case), and this property is general, we see that the data respects this ordering property
throughout the entire tree! For example, we see that for the node containing the data
value 12, all its left children (only one in this case) are less than 12, while its right
children are greater than 12. The same property holds for the node with element 17.
This ordering property means that we can derive a very simple algorithm to effectively
find a data item within the tree. We usually conduct a search based on a key data value.
Each node will usually store such a search key value as well as some additional data
(which is what we wish to retrieve). In the examples we give here these values will be
one and the same.
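As a sketch of that search (assuming the same Node class, with data, left and right
fields, that the insertion code below uses), a lookup on the key value might be written
as:

public boolean contains(int x, Node root)
{
    if (root == null) return false;       // empty (sub)tree: the key is not present
    if (x == root.data) return true;      // found the key
    if (x < root.data)
        return contains(x, root.left);    // smaller keys live in the left sub-tree
    else
        return contains(x, root.right);   // larger keys live in the right sub-tree
}

Each comparison discards one entire sub-tree, which is the source of the sub-linear
behaviour we are after.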

4.2.1 BST Insertion

Insertion into a BST must maintain the BST ordering property. The algorithm is sur-
prisingly simple. Given a node with data value p that we wish to insert:
1. If the tree is empty, link the root to the given node and exit.

2. If p is less than the current node data, proceed to its left child
3. otherwise, move down to its right child.
4. If the child node does not exist, slot the new node into that position and exit.
5. otherwise return to Step 2.

Figure 8: Insertion (of the values 8, 1, 13, 12, 20): If the tree is empty, we make the root
point to the new node and exit. Otherwise we move down the tree branching left and
right as required until we find an empty slot and insert the new node there.
In other words, we branch left or right as we descend the tree looking for an empty slot
within which we can place our new node. The first empty slot we encounter will then
be filled; this involves linking the new node to the parent with the empty slot. This is
illustrated by Figure 8. We first insert 8, which simply involves linking the new node
to the root. Then, we insert the value 1: we see that 1 is less than 8, hence we move
down from the root node to its left child. This happens to be an empty slot, so we make
the left link of the root node point towards the new node (which is what we mean by
"slotting it in"). We then insert 13. Once more, we start at the root, compare the value
to be inserted, and drop left or right depending on this choice. Since 13 is greater than
8, we drop down to the right child — which happens to be an empty slot again. We
thus set the root node’s right link to point to the new node and we’re done. To insert 12
we need to move further down the tree. Starting at the root, we see that 12 is greater
than 8, so we drop to its right child. On the next step, we note that 12 is less than 13
so we drop to its left child. Once again we have found an empty slot, so we set the left

link of node 13 to point to our new node. Finally, we insert the value 20. Starting at
the root we drop to the right child (node 13) and then drop to the right again, landing
in an empty slot. We then set the right link of node 13 to point to the new node and we
are done.
It is worth emphasising that the link that we use (left or right) when linking a new node
to its parent is not arbitrary: it is the link that we dropped down to get to the empty slot.
In Java programming terms an empty slot is represented using a null reference. After
the insertion, the reference is updated to point to the newly inserted node.
Implementation
In general we use recursive functions to implement tree-based routines. This often
provides a more natural way of looking at trees, since they are recursive structures. In
the case of insertion, this involves finding a base case (to terminate the recursion) and
deciding how we should partition the problem so that we can recursively solve a set of
smaller sub-problems.
To accomplish insertion we can write the following Java-like code, in which we assume
that we have a Node class with left and right fields which contain object references,
and an int field to hold our data:

public Node insert(int x, Node parent)
{
    if (parent == null) return new Node(x);

    if (x < parent.data)
        parent.left = insert(x, parent.left);
    else
        parent.right = insert(x, parent.right);

    return parent;
}

This would be called with a statement like:


root = insert(3,root);
to insert the data value 3 into the tree. Note how a reference to the parent is returned
at every step of the recursion. We do this since the Node link (a Java reference) that
we went down in order to insert the Node will change when we hit an empty slot (a
null reference). By returning the value we pass back the new, updated, value to the
function that called this one. In the cases where no insertion happens the reference will
be unchanged so the assignment statements will not change anything.
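For reference, the Node class assumed by this routine might be sketched as follows (the
constructor is our own addition; the data, left and right fields are the ones used in the
code above):

class Node
{
    int data;              // the data value stored in this node
    Node left, right;      // references to the left and right children (null if absent)
    Node(int x) { data = x; }
}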
We can see from the recursive implementation that
1. We have a base case — the first statement checks to see if the parent is a null ref-
erence (we have arrived at an empty slot). At this point the recursion is terminated
and the function calls start returning.

2. We have recursive function calls (we call the function from inside itself) which
are shrinking the problem at each step. This is required for recursion to be useful.
Here we decide, based on the comparison, whether to insert the data into the left
or right sub-tree. We then call insert again with the new child node as the root of
the tree we wish to insert in. At the next function call, this check and decision
are made again, and this procedure repeats until we come to the base case (null
reference). Then we can start returning the data we need to get our final answer back.
We see that when we insert the first item, a 3 in this case, with a root value of null, then
we return without any recursion at all. The root (which is a Java object reference) will
now point towards a Node with the value 3. If we then proceeded to insert the value 4,
the following would happen:
1. root = insert(4, root)
2. root = insert(4, root) { root.right = insert(4, root.right) }
3. root = insert(4, root) { root.right = insert(4, root.right) { Base case: return NodeRef } }
4. root = insert(4, root) { root.right = NodeRef }
5. root = root
This is meant to illustrate the flow of logic in the recursive function calls: the braces
represent what happens within each function call. We can only return from the recur-
sion when we hit the base case. At that point we have enough information to return
something useful to the function that called us. For every function call we return the
value of the parent reference (which may have been changed). In this example the
only reference that is actually changed is root.right. The root of the tree is only
affected by the first Node we insert into the tree. If we inserted additional Nodes we
could follow the logic in a similar fashion, but it rapidly becomes cumbersome. The
basic point to note is that we can directly translate a recursive function into a Java
recursive function implementation. As long as we have an appropriate base case and
provided we set up the recursive calls correctly, the function will execute as expected.

4.2.2 BST Deletion

Deletion from a BST is somewhat more complicated than insertion. We must ensure
that the deletion preserves the BST ordering property. Unfortunately simply removing
nodes from the tree will cause it to fragment into smaller trees, unless the node is a leaf
node (has no children). The basic strategy is as follows:
1. Identify the node to be removed
2. if it is a leaf node, remove it and update its parent link;
3. If it is a node with one child, re-attach the parent link to the target node's child;

4. If it is a node with two children, replace the node value with the smallest value
in the right sub-tree and then proceed down the tree to delete that node instead.
These cases are illustrated in Figure 9. The third case is the most interesting and bears
some discussion. The smallest value in the right sub-tree of a node is smaller than all
other items in that sub-tree, by definition. By copying this value into the node we wish
to delete, we preserve the BST ordering property for the tree. Furthermore, we know
that the node we now have to delete (the node we searched for down the sub-tree) can
have at most 1 child! If it had two children that would imply there was a node that
contained an even smaller value (down its left child). We thus end up back at one of
the first 2 cases.
Implementation
The following routine shows how one might implement deletion. It assumes the ex-
istence of a function called public int findMin(Node X) which returns the
smallest data item starting from the tree rooted at X, which is found by following the
left link until you hit a node with a null left child.

public Node remove(int x, Node root)
{
    if (root == null)
        System.out.println("Data item is not in tree!");
    else if (x < root.data)
        root.left = remove(x, root.left);
    else if (x > root.data)
        root.right = remove(x, root.right);
    else // we're here! delete the node
    {
        if (root.left == null && root.right == null) // leaf node
            return null; // this unlinks the node by passing null to the parent
        else if (root.left == null || root.right == null)
            // one child; link parent to node's child...
            return (root.left == null ? root.right : root.left);
        else // two children - find min value, copy it over, then delete that node
        {
            root.data = findMin(root.right);
            root.right = remove(root.data, root.right);
        }
    }
    return root;
}

Figure 9: Deletion: There are 3 cases: i) deleting a leaf, ii) deleting a node with one
child, and iii) deleting a node with 2 children. We cannot delete a node with 2 children
directly, so we replace the data of this node with the data of its "smallest successor",
and then proceed down the tree to delete that node.

Once more this function is a direct translation of the algorithm we described: we branch
left or right as we move down the tree looking for our target. Once we find it, we check
whether it has 0, 1 or 2 children and apply the appropriate deletion procedure. The
return statement in the single child case contains a compact way of generating a value
based on a simple "if-else" test. If the data item is not contained in a Node we will
end up going down a null link; in that case we simply report this fact and we are
done.
A proper Java implementation will need additional elements: a class called Node which
will contain the key/data and a class called BST which will define the structure of the
tree (a collection of linked Nodes). Here we have assumed we are inserting integers;
a general Java implementation would insert values of type Object. In that case the
< operator can no longer be used and a Java comparison method would be required to
compare two data values. We also need to store the root object reference within the
BST class.
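The findMin helper itself is not listed in the notes; a plausible sketch, following the
description above (keep following left links until a node has no left child), is:

public int findMin(Node root)
{
    // remove() only calls this on a non-empty right sub-tree, so root is non-null here
    while (root.left != null)
        root = root.left;          // the smallest key lies as far left as possible
    return root.data;
}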

4.2.3 Tree Walks

While insertion and deletion are required to build and maintain the tree, the manner
in which binary search trees store data means that we can traverse them in interesting
ways. We talk about “walking” the tree, which simply means that we wish to visit each
node in the tree starting from the root. Generally we perform some or other useful
operation as we walk the tree, such as printing out the data value of the current node.
We shall call this the action in the following paragraphs. The action can, of course, be
"do nothing", although that would be rather silly!
There are 3 basic ways of traversing the tree. They are all recursive, meaning that when
we refer to the root node, we can actually substitute any valid node within the tree.
Inorder Walk: Starting at the root, we walk the left sub-tree. We then perform the
action on the root node. We then walk the right-subtree. Here, “walk the
left/right sub-tree” means that we apply the same procedure to the tree rooted
at the left/right child of the node we are on. Hence, the Inorder walk is recursive.
Preorder Walk: Starting at the root, we apply the action immediately. We then walk
the left sub-tree. Finally, we walk the right sub-tree.
Postorder Walk: Starting at the root, we walk the left sub-tree, then we walk the right
sub-tree. Finally, we perform the action on the node.
Level Order Walk A level order walk proceeds down the tree, processing nodes at the
same depth, from left to right i.e. level-by-level processing. This is also called a
breadth-first traversal. One can use a queue ADT to implement such a traversal,
but we shall not concern ourselves further with this scheme.
This all sounds rather bizarre, so an example is in order. The fundamental point to
remember is that the “walking” procedure is recursive: we apply the rules listed above

for each node we arrive at, not only the root node. The walking procedure is O(N) for
N nodes3.

3 That is, the number of function calls is proportional to N.

The example tree of Figure 10 has root M with children G and P; G has children A and
I, and P has children N and Z.
Inorder: A G I M N P Z
Preorder: M G A I P N Z
Postorder: A I G N Z P M

Figure 10: Inorder walk: starting at the root, we visit the left sub-tree, perform our
action on the current node, then visit the right sub-tree.
Consider Figure 10 and suppose we perform an inorder walk. The following steps
occur:
1. We start at the root (node M).
2. We “walk the left-subtree” i.e. we drop to the node G.
3. The node G has its own left sub-tree: we “walk” that sub-tree dropping to A
4. We see that A does not have a left-subtree (it is a leaf node), so the “walk left
sub-tree” does nothing. Now, however, we can perform the action, since we
have processed the (empty) left sub-tree in its entirety. In this case, our action
is simply to print out the value of the node, “A”. We must now walk A’s right
sub-tree. But again, this is empty. So we have processed node A completely and
can return from that node.
5. We have now walked G’s left sub-tree (visited every node), so we move onto the
action for that node: we print “G”.
6. We must now walk G’s right sub-tree - we move down to the node I.
7. As before, we have to walk the left sub-tree first: it is empty, so we are done.
Then we can perform the action: we print “I”. We then walk the right (empty)
sub-tree and we’re done with node I.
8. node G has now been fully processed.
9. We have now walked M’s left sub-tree (visiting all the nodes contained therein);
we perform the action: we print “M”.
10. We now walk M’s right sub-tree: we drop to node P.
11. We walk P’s left-subtree: we drop to N.
12. We walk N’s left sub-tree (which is empty); we can now perform the action for
node N: we print “N”; and walk the (empty) right sub-tree, which means we’re
done with node N.
13. We have now processed P's left sub-tree; so we print "P"; and move down to
its right sub-tree, starting at node Z.
14. Since Z is a leaf node, it has an empty left sub-tree; we thus print “Z”; see it has
an empty right sub-tree too and so we are done with node Z.
15. We are also, at this point, done with the entire tree, since we have processed each
node in turn.
We note when we look at the output that the inorder traversal has returned an ordered
(alphabetical in this case) display of the data in the tree. This is very useful. For
example, if the data in each node was a record in a database, we could use an inorder
walk to display an alphabetical list of all the clients’ information.
The pre and postorder traversals work in a similar fashion, but the sub-trees are walked
in a different order, so we will generate different output. In the preorder case, we
print out a node's data as soon as we reach that node, before moving on to recursively
descend its left and then right sub-tree. The printout we get in this case is

M, G, A, I, P, N, Z

which does not seem particularly useful. In the postorder case, we first process the left
and then the right sub-tree (recursively) before printing out the node value:

A, I, G, N, Z, P, M

Again, this does not seem to be of any use. Although it is not clear at all, there are
many uses for these tree traversals. Compilers in particular make heavy use of pre
or postorder walks to handle the correct evaluation of mathematical expressions in a
programming language. Directory tools for operating systems also make use of such
traversals to accumulate disk usage information and so on.
Implementation
The implementation for the traversal schemes is simple, if you view them recursively.
As usual, we require a base case to terminate the recursion, and we must ensure that
each recursive call works on a smaller portion of the input set (the tree nodes in this
case). For the Inorder walk we have the following:

public void inorder(Node root)
{
    if (root == null) return;        // Base case: walk an empty subtree, so do nothing
    inorder(root.left);              // walk the left sub-tree
    System.out.println(root.data);   // perform action (print, in this example)
    inorder(root.right);             // walk the right sub-tree
}

As we can see from the code, we can only perform the action on a node once we have
processed its entire left sub-tree. Having processed both the left and right sub-trees of a
node, we back up to the parent node and carry on working there. We thus move up and
down the tree as required by the recursive function calls. We never perform the action
on a node twice.
The code for the preorder walk is as follows:

public void preorder(Node root)
{
    if (root == null) return;        // Base case: walk an empty subtree, so do nothing
    System.out.println(root.data);   // perform action first
    preorder(root.left);             // walk the left sub-tree
    preorder(root.right);            // walk the right sub-tree
}

Finally, for the postorder walk we have:

public void postorder(Node root)
{
    if (root == null) return;        // Base case: walk an empty subtree, so do nothing
    postorder(root.left);            // walk the left sub-tree
    postorder(root.right);           // walk the right sub-tree
    System.out.println(root.data);   // perform action last
}
While it is confusing and time consuming to follow the logic involved in these recur-
sive function calls, we note that, as before, there is a direct mapping from a mathemat-
ical/intuitive definition onto a programming language. This is one of the benefits of
recursion: it allows us to express powerful algorithms very compactly and in an intu-
itive fashion. Thinking recursively comes with time and practice, so do not be alarmed
if it all seems incomprehensible right now!

4.3 AVL Trees

While the BST is a good attempt at achieving sub-linear data manipulation, it has a
major flaw: it can degenerate into a list! In other words, depending on the data we
insert or the deletions we perform, our tree can end up skewed completely to one
side. When this happens the complexity of tree operations such as insert, delete and
search rises from O(log N) to O(N) and we may as well use a simpler list ADT.

Insert: 1, 2, 3, 4

Figure 11: Degenerate Binary Tree: Inserting ordered data into a BST ensures that it
has depth N for N nodes.

Figure 11 illustrates the problem by inserting an ordered collection of values into a
BST. Note that all the left sub-trees are empty.
Unfortunately we have no guarantee that data we wish to store will be randomized. In
fact, it is fairly likely that there will be some sort of ordering on the input data. Rather
than trying to concoct elaborate schemes to stabilize the tree, we adopt the approach
of imposing a balance constraint on the tree. In other words, we insist that, along with
the BST ordering property, the tree have some scheme to ensure that its height remains
O(log N). Ideally, we would want the tree to be perfectly balanced, i.e. each left and
right subtree would have the same height. This is far too restrictive since we insert
items one at a time and a tree with even 2 items will, by definition4, be unbalanced
under this scheme!

4 It will consist of the root node and either a left or right child node.

A more relaxed scheme, which nonetheless guarantees O(log N) height, was proposed
by Adelson-Velskii and Landis and is called the AVL tree.
In the case of an AVL tree, the height of the left and right sub-trees of any node can
differ by at most one. In this case the tree is considered balanced. One can prove that
this results in a tree which, although deeper than a completely balanced BST, still has
a height/depth of O(log N). This is all we require to make sure that the tree is useful
for data manipulation. An example of an AVL tree is shown in Figure 12; note the
heights of each node (see the earlier definition for tree height), and how it compares
to the corresponding "optimal" binary tree. It is worth noting that there are many
different configurations, depending on the order in which the data is inserted into the
AVL tree.
Figure 12: An AVL Tree: the best possible BST is indicated on the left, with a possible
AVL tree for the data on the right. The height of each node is indicated by h. For an
AVL tree we can have a difference of at most 1 between the heights of the left and right
sub-trees of a node.

Insertion

An AVL tree uses a number of special tree transformations called rotations to ensure

that the correct balance is maintained after an insertion or deletion. These rotations are
guaranteed to respect the ordering properties of the BST. A very basic example of a
rotation is given in Figure 13.
After inserting A and B the tree is still AVL: the left and right sub-trees of the root node
differ by 1. Now we insert C. In this case we break the balance property at the root
node (the node B is still balanced): the left and right sub-trees of the root node now
differ by two. In order to restore balance at the root we “rotate” B towards A, resulting
in the final arrangement of nodes. We note two things:
1. the BST ordering property is preserved, and
2. the height of the root is now as it was prior to the insertion that caused the
imbalance.
This transformation is known as a single (left) rotation about B. We can also say that we
performed a single rotation with the right child of A or even that we rotated B towards
A.
What happens in more complicated scenarios? As it happens there are only 4 possible
transformations we need to consider, regardless of the tree's complexity. Two of these
transformations are symmetric counterparts (mirror images) so in reality there are only
two basic transformations: single and double rotations. We determine which one to use
by examining the path we followed during the insertion step.
Let us look at the single rotation first, Figure 14. As always, we assume that the tree
was balanced prior to the insertion. The figure shows one of the 4 possible scenarios in
which an insertion has violated the balance at the root node N1. Here, we have inserted
into an "outer" sub-tree, T3, and this insertion caused the height of that sub-tree to
grow by 1, resulting in an imbalance at N1.

Figure 13: Basic Rotation: We can rotate the node B “towards” A to fix the imbalance.
This gives us a new root, with a new left and right sub-tree. The tree will now respect
the AVL balance property.

Figure 14: Left Rotation: If an insertion into an "outer" sub-tree caused an imbalance
at node N1, we can perform a single (left) rotation from N2 towards N1 to restore the
AVL property.


Figure 15: Right Rotation: This is the mirror image of the left rotation operation.


Figure 16: Failure of Single Rotation: If the insertion took place in an internal sub-tree,
a single rotation will not restore the balance property at the unbalanced node.

To help us resolve the situation we expose another node, N2: this is the first node on
the path to our insertion point in T3. To fix the imbalance, we rotate N2 towards N1,
which involves rearranging the nodes and their sub-trees as indicated. This single
rotation is the generalisation of the example we introduced above: the sub-trees T1 and
T2 are empty there, while the sub-tree T3 consists of a single node. Observe that the
new sub-tree, rooted at N2, has the height it possessed prior to the insertion. This means
that we do not need to worry about the balance of the tree as a whole: it has been
restored to the (balanced) state it occupied prior to the insertion. We thus see that this
local tree operation ensures that the entire tree remains balanced!
This transformation has a symmetric counterpart (right rotation) in which we simply
flip the figures from left to right. The transformation is shown in Figure 15.
What would happen if we inserted into the sub-tree T2, which then grew in depth by 1
and violated the balance at the root? In this case, we see that a single rotation simply
leaves the height in its illegal state — Figure 16. We need another transformation
— the double rotation. In this case we need to expose more of the tree's structure.
Once more we choose the first few nodes on the path to the insertion point: we expose
3 nodes, which we label N1, N2 and N3.


Figure 17: Double Rotation: Insertion into an "inner" sub-tree can be fixed by exposing
more of the structure of the sub-tree and then performing 2 rotations — one from N3
towards N2, followed by another from N3 towards N1.
Notice that a single rotation can be used in the cases where the insertion has occurred
down the outside of the tree. A double rotation is only required when we have inserted
into an internal sub-tree. The double rotation rearranges the nodes and sub-trees in the
manner indicated — Figure 17 — and also ensures that the balance of the sub-tree (and
thus the whole tree) is returned to its state prior to insertion. A double rotation can be
viewed as two single rotations: one from N3 towards N2, followed by one from N3
towards N1.
Essentially we use the following approach when inserting nodes into the tree:
1. Perform the insertion, noting the path we took to reach the insertion point;
2. Move up the tree from the insertion point and check for an imbalance;
3. If we reach the root and all is well, we’re done;
4. otherwise, we check to see whether the insertion was into an “inner” or “outer”
sub-tree of the left or right child of the unbalanced node.
5. For outer sub-tree insertion, we use a single rotation, otherwise we use a double
rotation.
Consider Figure 18. Note that the order in which we insert the nodes determines the
structure of the resulting AVL tree. Inserting 1 and 7 does not cause any problems.
However, when we insert 12, the root node becomes unbalanced (indicated by a box).
We see that we have inserted into an “outer” sub-tree on the right child: we therefore
require a single rotation from 7 towards 1 in order to fix the imbalance. Note that sub-
trees are indicated with triangles — empty sub-trees are shown when necessary so you
can see which rules we are using. After the rotation we have a balanced sub-tree rooted
at node 7.
We then insert 10 and 11. Insertion of 11 causes an imbalance at node 12. We see
that we inserted into the right sub-tree of a left child node. This “zig-zag” pattern is
characteristic of a double rotation, so we resolve the right sub-tree further. In this case

Figure 18: AVL Insertion: inserting 1, 7, 12, 10, 11 and 15 in turn, showing the single
and double rotations needed along the way.

Figure 19: AVL Insertion (ctd.): inserting 8, 10.5 and 9, with the double rotation needed
after 9 is inserted.

we have the nodes 10, 11 and 12 which will be involved in the double rotation. All
the sub-trees are empty, however. This does not matter! We rotate from 11 towards
10, and then from 11 towards 12, giving us the tree shown on the right. As before, this
transformation restores the local balance of the tree to its state prior to insertion, so we
do not need to fix things further up in the tree.
We now insert 15. This causes an imbalance at the root. We see that we inserted into the
right sub-tree of the right child of the root. Such an “outer” insertion is characteristic of
a single rotation. We rotate 11 towards 7 to fix the tree. We then insert 8 and 10.5 (the
fact that it is a real number is not important). The tree remains balanced until we insert
9. This introduces an imbalance at the node 7. As before, we see that the insertion
took place into a left sub-tree of the right child of 7 — a zig-zag pattern. This tells us
that we need to use a double rotation to fix the imbalance. The nodes involved will be
7, 8 and 10, with the sub-trees as indicated in Figure 19. The resulting tree is shown on the
right-hand side of the diagram.
Implementation
An AVL Node looks much the same as a BST Node, but each node now maintains a
height field, which records the height of that node. By looking at the heights of the left
and right child Nodes one can see whether a Node is unbalanced and restore the tree.
Remember that we only require a single or double rotation to restore the tree completely.
The basic algorithm is as follows:
1. Perform recursive BST tree traversal to find the insertion point (first null Node)
2. Insert the New node
3. Recalculate the height of the sub-tree into which we inserted
4. As we back out recursively, check the heights of each left and right child node
5. If they differ by more than 1, apply the appropriate rotation, and we are done.
Note that once we have fixed the unbalanced sub-tree we are done with our task, but
we still have to return from all the recursive function calls we used during the insertion.
However, we know that the balance at all other nodes will be OK, so we will never have
to do another rotation. It is not considered good programming practice to prematurely
terminate the return of a sequence of recursive calls: if efficiency is critical, you can use
an iterative (non-recursive) implementation, but this will require more complex code.
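As a sketch of how the recursive insertion might look in Java (again using the assumed helper names from the earlier sketches, placed in the same class so the unqualified calls resolve; this is an illustration, not the notes’ reference implementation):

    // Recursive BST insertion followed by a balance check on the way back out.
    static AvlNode insert(AvlNode node, int key) {
        if (node == null) return new AvlNode(key);           // insertion point: first null reference

        if (key < node.key)      node.left  = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        else                     return node;                // duplicate key: ignore it

        // Check the balance at this node as we back out of the recursion.
        if (height(node.left) - height(node.right) > 1) {
            node = (key < node.left.key)
                 ? rotateWithLeftChild(node)                  // "outer" insertion: single rotation
                 : doubleWithLeftChild(node);                 // "inner" insertion: double rotation
        } else if (height(node.right) - height(node.left) > 1) {
            node = (key > node.right.key)
                 ? rotateWithRightChild(node)
                 : doubleWithRightChild(node);
        }
        updateHeight(node);                                   // recompute this node's height
        return node;
    }

For example, the tree of Figure 18 would be built by calling root = insert(root, k) for k = 1, 7, 12, 10, 11 and 15 in turn.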
Deletion
Deletion is significantly more difficult, if you apply the formal algorithm. However,
all you really need do is the following (at least when you’re working out examples by
hand!):
1. apply the Standard BST deletion algorithm to find and delete the node,
2. starting at the deletion point, move towards the root, checking the node balance
at each step,
3. if you find a node out of balance, apply a single or double rotation to fix it; to
determine which to apply,
(a) identify the sub-trees attached to the left and right children of the unbal-
anced node;
(b) note which sub-tree is deeper, an internal or external one (in the former
case, resolve more structure);
(c) identify the Nodes leading to this sub-tree and then apply the fix; continue
upwards.
4. you may have to apply a rotation at each node as you continue towards the root.
The root itself may have to be rebalanced.
As this algorithm implies, in the worst case you would incur O(log n) rotations to
rebalance the tree after a deletion. This is not ideal, since a rotation requires a fair
amount of logic to implement.
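The decision in step 3 can be captured in a small helper that compares the heights of a node’s children and grandchildren and applies the appropriate rotation. The sketch below (assumed names, reusing the earlier rotation helpers) prefers a single rotation whenever either would do, since it is cheaper:

    // Restore the AVL property at a single node after a deletion (or insertion)
    // lower down, choosing between a single and a double rotation.
    static AvlNode rebalance(AvlNode node) {
        if (node == null) return null;
        if (height(node.left) - height(node.right) > 1) {
            // Left side too deep: if the outer (left-left) grandchild sub-tree is at
            // least as deep as the inner one, a single rotation suffices.
            node = (height(node.left.left) >= height(node.left.right))
                 ? rotateWithLeftChild(node)
                 : doubleWithLeftChild(node);
        } else if (height(node.right) - height(node.left) > 1) {
            node = (height(node.right.right) >= height(node.right.left))
                 ? rotateWithRightChild(node)
                 : doubleWithRightChild(node);
        }
        updateHeight(node);
        return node;
    }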
Consider Figure 20. Remember that we apply the standard BST deletion strategy to
remove nodes from an AVL tree. We begin by deleting 10 (which is a leaf so this is
trivial). This causes an imbalance at node 20. To identify the kind of rotation we require
to rebalance the tree, we see that the left-subtree of the right child of 20 is deeper than
the right sub-tree (which is empty). We thus require a double rotation, resulting in a
new sub-tree with a root of 22. Unfortunately, as we continue moving up the tree we
see that our manipulations have unbalanced a node higher up! In this case the root of
the entire tree is the culprit: we can at least be sure that no additional rotations will be
required once we restore the balance of the root. We identify the left and right children
of the root node, and look at the structure of their sub-trees. We see that the outer-most
sub-tree is the deepest, which immediately tells us that a single rotation will suffice. We
thus rotate from 80 towards 30, fixing the balance of the root node and thus terminating
the deletion procedure.
We then wish to delete 30. We see that 30 has 2 children: we thus invoke the BST
deletion strategy and replace 30 with the smallest value down its right sub-tree, before
proceeding down the sub-tree to delete that node. In this case the tree balance has not
been affected. We now remove 70. This is a leaf, so we can delete it easily enough, but
in doing so we unbalance the node 60. Once more, we identify the sub-trees of the left
and right children of this node and see that we can get away with a single rotation from
22 towards 60 (see footnote 5). Note that this rotation does NOT reduce the height of the sub-tree, but
it is sufficient to ensure that the AVL property is restored.
We then delete 25 (a leaf) which does not damage the tree. Deletion of 80 requires that
we replace it with the smallest value down its right sub-tree and proceed to delete that
node. We thus copy 90 into 80 and delete the node 90 in the right sub-tree. As usual,
we use the BST deletion process: we simply link the original 90’s child to its parent,
“bypassing” the node we wish to delete. Deletion of 22 follows the same logic.
5 Here the two sub-trees are of equal depth, so both rotations would work. We use a single rotation since
it is cheaper.
[Figure 20 diagram: deleting 10 unbalances node 20, fixed by the double rotation 22->25->20; this leaves the root 30 unbalanced, fixed by rotating 80 towards 30; 30 is then deleted by copying 60 into it and removing the old 60 node.]
Figure 20: AVL Deletion.
[Figure 21 diagram: deleting 70 unbalances node 60, fixed by rotating 22 towards 60; 25 is deleted without damage; 80 is deleted by copying 90 into it and removing the old 90 node; 22 is deleted by copying 60 into it; finally, deleting 95, 110 and 100 unbalances the root, fixed by rotating 60 towards 90.]
Figure 21: AVL Deletion (ctd.).
Finally, we delete 95, 110 and 100 in succession. The final deletion causes the tree
root to become unbalanced. We see that an outer sub-tree is deeper and thus perform a
single rotation from 60 towards 90.
Implementation
The formal implementation of deletion involves a fairly large number of test cases
according to which you select either a single or double rotation. Unfortunately, unlike
insertion, fixing a local sub-tree after deletion may not return its height to what it was
prior to that deletion. This means that the tree may become unbalanced further up,
requiring still more rotations. The standard implementation is recursive (as one would
expect), and uses the usual branching tests as you move towards the deletion point. It
mirrors the BST deletion code, until you actually perform the deletion. In this case the
various cases for fixes via rotation have to be enumerated, and the fixes applied. We are
guaranteed that after we do this the sub-tree rooted at the node out of balance will be
balanced once more. We can then recompute its height for later use and recurse back
up the tree. One tricky aspect of the code is the way in which you recalculate the node
heights efficiently as you recurse back up the tree.
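A sketch of such a recursive deletion routine, in the same illustrative style as before (assumed names, reusing the helpers above), might look as follows. It mirrors BST deletion and calls rebalance at every node on the way back up, which also recomputes that node’s height:

    // Recursive AVL deletion: standard BST deletion, then rebalance on the way back up.
    static AvlNode remove(AvlNode node, int key) {
        if (node == null) return null;                            // key not found: nothing to do

        if (key < node.key)      node.left  = remove(node.left, key);
        else if (key > node.key) node.right = remove(node.right, key);
        else if (node.left != null && node.right != null) {
            node.key = findMin(node.right).key;                   // copy smallest value of right sub-tree
            node.right = remove(node.right, node.key);            // then delete that node lower down
        } else {
            node = (node.left != null) ? node.left : node.right;  // zero or one child: bypass the node
        }
        return rebalance(node);                                   // may trigger a rotation at this level
    }

    static AvlNode findMin(AvlNode n) {
        while (n.left != null) n = n.left;
        return n;
    }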
There are numerous Java code implementations performing both insertion and deletion
and rather than getting bogged down in details, we refer the reader to those. In practice,
the more efficient Red-Black tree ADT is used to implement a balancing scheme.
Unfortunately this is a rather sophisticated data structure, which falls beyond the scope
of these notes.
4.4 Other Tree Representations
There are a multitude of tree ADTs used for data storage and manipulation. The reason
is obvious: a well balanced tree reduces access times from O(n) (for a sequential
scan) to O(log n). To achieve this we need to constrain the insertion and deletion
procedures. We have already seen one way of doing this — the AVL tree. Two other
popular methods are:
Red-Black Trees A Red-Black tree uses a node colouring property to enforce a bal-
ance constraint. Nodes are coloured either Red or Black as they are inserted, and
the insertion and deletion procedures are modified to ensure that a consistently
coloured tree emerges at each step. A Red-Black tree obeys the following rules:
1. The root node is coloured black;
2. We may not have two consecutive red nodes;
3. every path from the root to a null reference has the same number of black
nodes.
A Red-Black tree tends to be a little deeper than an AVL tree (still O(log n)),
but has the benefit that deletion will not require transformations all the way back
to the root. It is now the preferred data structure for binary tree manipulations.
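As a small illustration of what these three rules mean in code, the sketch below (our own assumed names, not code from these notes) checks whether a given coloured tree satisfies them by computing its black-height:

    enum Colour { RED, BLACK }

    class RbNode {
        int key;
        Colour colour;
        RbNode left, right;
    }

    class RedBlackCheck {
        // Returns the black-height of the sub-tree, or -1 if a rule is violated.
        static int blackHeight(RbNode n) {
            if (n == null) return 0;                         // null references count as black
            int l = blackHeight(n.left), r = blackHeight(n.right);
            if (l < 0 || r < 0 || l != r) return -1;         // rule 3: same black count on every path
            if (n.colour == Colour.RED &&
                ((n.left != null && n.left.colour == Colour.RED) ||
                 (n.right != null && n.right.colour == Colour.RED)))
                return -1;                                   // rule 2: no two consecutive red nodes
            return l + (n.colour == Colour.BLACK ? 1 : 0);
        }

        static boolean isValid(RbNode root) {
            return root == null ||
                   (root.colour == Colour.BLACK              // rule 1: the root is black
                    && blackHeight(root) >= 0);
        }
    }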
B-Trees A B-tree is an M-ary tree which satisfies a set of constraints. An M-ary tree
is one that can have at most M children per node. A B-tree differs from most
trees in that it grows level by level. The tree properties are as follows:
1. data is stored in special leaf nodes;
2. non-leaf nodes contain at most M-1 search keys, and M node references;
3. the root is either a leaf node, or has 2 to M children;
4. other non-leaf nodes have between ⌈M/2⌉ and M children;
5. leaf nodes are at the same depth and contain ⌈L/2⌉ to L data entries.
The values M and L are chosen based on the application, but are usually re-
lated to disk-block size. B-trees are the primary data structure used in large
database systems since they ensure very fast query times. For a B-tree we have
O(log_M n) access time for data nodes, due to the M-way branching.
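As a purely illustrative calculation (the figures below are invented for the example, not taken from these notes): suppose a disk block holds an internal node with up to M = 128 children, and a leaf holds up to L = 32 data entries. Storing n = 10,000,000 records then requires at least 10,000,000 / 32 = 312,500 leaves. Each internal level multiplies the fan-out by at most 128, and since 128^2 = 16,384 < 312,500 <= 128^3 (about 2.1 million), three internal levels above the leaves suffice. Any record can therefore be found by reading about 4 disk blocks, which is the O(log_M n) behaviour claimed above.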