Block 4
1.0 Introduction
1.1 Objectives
1.2 Some Preliminaries to P and NP Class of Problems
1.2.1 Tractable Vs Intractable Problems
1.2.2 Optimization Vs Decision Problems
1.2.3 Deterministic Vs Nondeterministic Algorithms
1.3 Introduction to P, NP, and NP-Complete Problems
1.3.1 P Class
1.3.2 NP Class
1.3.3 NP Complete
1.4 The CNF Satisfiability Problem – The First NP-Complete Problem
1.5 Summary
1.6 Solution to Check Your Progress
1.0 INTRODUCTION
Until now we have studied a large number of problems and developed efficient
solutions using different problem-solving techniques. After developing
solutions to simple problems (sorting, polynomial evaluation, exponent
evaluation, GCD), we developed solutions to more difficult problems (MCST,
single source shortest path, all pairs shortest paths, the Knapsack problem,
chained matrix multiplication, etc.) using the greedy, divide-and-conquer and
dynamic programming techniques. We also formulated some problems as
optimization problems, for example the Knapsack problem and the single source
shortest path problem. But so far we have not made any
serious effort to classify or quantify those problems which cannot be solved
efficiently. In this unit as well as in the subsequent unit we will investigate a
class of computationally hard problems. In fact, there is a large number of such
problems. In the last unit of this block, we will show how a computationally
hard problem can be solved through an approximation algorithm.
1.1 OBJECTIVES
The general view is that problems are hard or intractable if they can be
solved only in exponential or factorial time. The opposite view is that
problems having polynomial time solutions are tractable or easy problems.
Although an exponential time function such as 2^n grows more rapidly than
any polynomial function of the input size n, for small values of n an
algorithm with exponential time complexity can be more efficient than one
with polynomial time complexity. However, in the asymptotic analysis of
algorithm complexity, we always assume that the input size n is very large.
The Knapsack problem and the single source shortest path problem have been
formulated as optimization problems. The former is defined as a maximization
problem (i.e., maximum profit) and the latter as a minimization problem
(minimum cost of a path). Depending upon
the problem, the optimal value is minimum (single source shortest path) or
maximum (Knapsack problem) which is selected among a large number of
candidate solutions. Optimization problems can be formally defined as:
An Optimization problem is one in which we are given a set of input values, which
are required to be either maximized or minimized w.r.t. some constraints or
conditions. Generally an optimization problem has n inputs (call this set the input
domain or candidate set, C), and we are required to obtain a subset of C (call it the
solution set, S, where 𝑆 ⊆ 𝐶) that satisfies the given constraints or conditions. Any subset
S⊆C, which satisfies the given constraints, is called a feasible solution. We need to
find a feasible solution that maximizes or minimizes a given objective function. The
feasible solution that maximizes or minimizes a given objective function, is called an
optimal solution. For example, find the shortest path in a graph, find the minimum
cost spanning tree, 0/1 knapsack problem and fractional knapsack problem.
Decision Problems: A decision problem is one whose answer for every instance is either "Yes" or "No". Many optimization problems have a corresponding decision version.
The 0-1 Knapsack Optimization Problem: Given the number of items, the profit
and weight of each item, and the maximum weight capacity of a knapsack, the 0-1
Knapsack optimization problem determines the maximal total profit of items
that can be placed in the knapsack.
The 0-1 Knapsack Decision Problem: Given the number of items, the profit and weight of each item, the maximum weight capacity of a knapsack, and a target profit P, the 0-1 Knapsack decision problem asks whether there is a subset of items whose total weight does not exceed the capacity and whose total profit is at least P.
A clique in a graph is a subset of vertices such that every two distinct vertices in the subset are adjacent. [Figure: an example graph on the vertices V1, V2, V3, V4 and V5.]
In this example {V2, V3, V5} is a clique, whereas {V2, V3, V4} is not a
clique because V2 is not adjacent to V4. A maximal clique contains the maximum
number of vertices. For example, in the given graph a maximal clique is {V1, V2, V3, V5}.
The clique optimization problem finds the size of a maximal clique in a given
graph. Given a graph and some integer value K, the clique decision problem
determines whether there is a clique containing at least K vertices.
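For small graphs, the clique decision problem can be answered by brute force. The following Python sketch (the function names and the adjacency-set representation are illustrative choices of ours, and the edges involving V4 are assumptions, since the text does not list every edge) checks every subset of K vertices:

from itertools import combinations

def is_clique(adj, vertices):
    # Every pair of vertices in the candidate set must be adjacent.
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def has_clique_of_size(adj, k):
    # Brute-force clique decision: try every subset of k vertices.
    # This takes exponential time in general, which is why the problem is hard.
    return any(is_clique(adj, subset) for subset in combinations(adj, k))

# Example graph, consistent with the discussion above.
adj = {
    'V1': {'V2', 'V3', 'V5'},
    'V2': {'V1', 'V3', 'V5'},
    'V3': {'V1', 'V2', 'V4', 'V5'},
    'V4': {'V3'},
    'V5': {'V1', 'V2', 'V3'},
}
print(has_clique_of_size(adj, 4))   # True: {V1, V2, V3, V5}
print(has_clique_of_size(adj, 5))   # False

The exponential number of subsets examined is exactly why this approach does not scale; no polynomial time algorithm for the problem is known.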
1.3.1 P Class
An algorithm solves a problem in polynomial time if its worst-case time
complexity belongs to O(p(n)), where n is the size of the problem's input and
p(n) is a polynomial of n. Problems can be classified as
tractable and intractable problems. Problems with polynomial time solutions
are called tractable; problems which do not have polynomial time solutions
are called intractable.
Problems with the following worst-case time complexities do not belong to the
polynomial class of problems:
2^n, n!, 2^√n
As we have seen in the previous section, many optimization problems can
be formulated as decision problems, which are easier to work with. For
example, consider the graph coloring problem. Instead of asking for the minimum
number of colors needed to color the vertices of a graph so that no two adjacent
vertices get the same color, we ask whether the graph can be
colored in no more than m colors, where m = 1, 2, …, without coloring
adjacent vertices of the graph in the same color.
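Note that while finding such a coloring is hard, checking a proposed coloring is easy. The short Python sketch below (illustrative code of ours, not taken from the text) verifies in polynomial time that an assignment uses at most m colors and gives different colors to adjacent vertices:

def is_valid_coloring(edges, coloring, m):
    # coloring maps each vertex to a color in {1, ..., m}.
    if any(c < 1 or c > m for c in coloring.values()):
        return False
    # No edge may join two vertices of the same color.
    return all(coloring[u] != coloring[v] for u, v in edges)

edges = [('A', 'B'), ('B', 'C'), ('A', 'C')]                   # a triangle
print(is_valid_coloring(edges, {'A': 1, 'B': 2, 'C': 3}, 3))   # True
print(is_valid_coloring(edges, {'A': 1, 'B': 1, 'C': 2}, 3))   # False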
1.3.2 NP Class
(i) Guessing Stage (Nondeterministic stage): Given a problem instance, in this
stage a simple string S is produced, which can be thought of as a guess
(candidate solution) for the problem instance.
(ii) Verification Stage (Deterministic stage): The problem instance and the string S
are taken as input, and a deterministic algorithm checks in polynomial time
whether S is a correct solution for the instance.
Given an arbitrary program with arbitrary data, will the program terminate
or not? This type of problem is called an undecidable problem. Undecidable
problems cannot be solved by guessing and checking. Although they are
decision problems, they cannot be solved by exhaustively checking
the solution space.
The third string is the correct guess for the tour. Therefore, the
nondeterministic stage of the algorithm is satisfied. In general, the function
verify at the verification stage returns True if the guess for a particular instance
is correct; otherwise the verify function returns False.
Pseudocode
Boolean verify(weighted graph G, d, S)
    if S is a tour of G and the total weight of S is at most d
        return True;
    else
        return False;
The algorithm first checks to see whether S is indeed a tour. If the sum of
weights is no greater than d, it returns “True”.
[Figure: a complete weighted graph on vertices A, B, C and D with edge weights 2, 3, 4, 5, 6 and 7, used as the TSP instance in the example above.]
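A concrete version of the verify function might look like the following Python sketch (the dictionary-based graph representation and the edge weights in the usage example are assumptions for illustration; they are not taken from the text). It runs in time polynomial in the size of the graph:

def verify(weights, d, S):
    # weights: dictionary mapping an undirected edge (u, v) to its positive weight.
    # S: proposed tour, a sequence of vertices starting and ending at the same vertex.
    # Check that S visits every vertex exactly once and returns to the start.
    vertices = {v for edge in weights for v in edge}
    if S[0] != S[-1] or set(S[:-1]) != vertices or len(S) - 1 != len(vertices):
        return False
    # Sum the edge weights along the tour; reject if any edge is missing.
    total = 0
    for u, v in zip(S, S[1:]):
        w = weights.get((u, v)) or weights.get((v, u))
        if w is None:
            return False
        total += w
    return total <= d

weights = {('A', 'B'): 2, ('A', 'C'): 6, ('A', 'D'): 7,
           ('B', 'C'): 3, ('B', 'D'): 4, ('C', 'D'): 5}
print(verify(weights, 15, ['A', 'B', 'D', 'C', 'A']))   # 2 + 4 + 5 + 6 = 17 > 15 -> False
print(verify(weights, 17, ['A', 'B', 'D', 'C', 'A']))   # 17 <= 17 -> True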
What other decision problems are in NP? There are many such problems. To
add a few examples: Knapsack, the graph coloring problem, clique, etc.
Finally, there is a large number of problems that are trivially in NP: every
problem in P is also in NP, as shown in the following diagram (figure 3):
Figure 3: P as a subset of NP
NP-Complete Problems – NP-Complete problems are the most difficult problems, also
called the hardest problems, within the NP class. It has been shown that there is a
large class of problems in NP which are NP-Complete, as shown in the figure below.
[Figure: NP-Complete problems shown as a subset of the NP class]
The set of NP-Complete problems keeps on growing all the time. It includes problems
like the Traveling Salesperson Problem, the Hamiltonian Circuit Problem, the Satisfiability
Problem, etc., and the list is becoming larger. The common feature is that no
polynomial time solution is known for any of these problems. In other
words, NP-Complete problems take super-polynomial or exponential time (or
space) in the worst case.
It is very unlikely that all the NP-Complete problems can be solved in polynomial
time but at the same time no one has claimed that NP-Complete problems cannot be
solved in polynomial time.
The main takeaway from the above discussion is the following. Suppose we have two problems X and
Y, where X is an NP-Complete problem and Y is known to be in NP. Further, suppose X is
polynomially reducible to Y, so that we can solve X using Y. Since X is NP-Complete,
every problem in NP can be reduced to X in polynomial time, and hence (through X) to Y; therefore Y is also NP-Complete.
Let us recall that for a new problem to be an NP-Complete problem, it must be in the NP
class and then a known NP-Complete problem must be polynomially reduced to it. In
this unit we will not show any reduction example, because we are interested in the
general idea; some examples of reductions will be covered in the next unit. But one
might wonder how the first NP-Complete problem was proven to be NP-Complete
and how the reduction was done. The satisfiability problem was the first
NP-Complete problem ever found. The problem can be formulated as follows: given
a Boolean expression, find out whether the expression is satisfiable or not, i.e., whether
there is an assignment to the logical variables that gives the expression the value 1 (True).
A logical variable, also called a Boolean variable, is a variable having only one of the
two values: 1 (True) or 0 (False). A literal is a logical variable or the negation of a logical
variable. A clause combines literals through logical OR (∨) operator(s). A conjunctive
normal form (CNF) expression combines several clauses through logical AND (∧) operator(s), for example:
(X1 ∨ ¬X2 ∨ X3) ∧ (¬X1 ∨ X2 ∨ X3)
The CNF Satisfiability Decision Problem asks, for a given logical expression in CNF,
whether some combination of True and False values for the logical variables makes
the output of the expression True.
For instance, the following logical expression in CNF is satisfiable (the answer to CNF
Satisfiability is Yes), because the assignment X1 = True, X2 = False and X3 = True
makes the Boolean expression True:
(X1 ∨ X2 ∨ ¬X3) ∧ (¬X1 ∨ ¬X2) ∧ X3
But no assignment of True and False values to the logical variables can make the
following Boolean expression True; for example, X1 = True and X2 = True make it False:
(X1 ∨ X2) ∧ ¬X1 ∧ ¬X2
Therefore the answer to CNF Satisfiability for this expression is False.
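Evaluating a CNF expression for a given assignment is easy. The sketch below uses an illustrative encoding of our own (a positive integer i stands for Xi and a negative integer for its negation) and checks an assignment in polynomial time; this is essentially the verification step that places CNF Satisfiability in NP:

def evaluate_cnf(clauses, assignment):
    # clauses: list of clauses, each clause a list of literals;
    # literal i means Xi, literal -i means (not Xi).
    # assignment: dictionary mapping variable index i to True/False.
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value

    # The CNF expression is True when every clause has at least one true literal.
    return all(any(literal_value(lit) for lit in clause) for clause in clauses)

# (X1 v X2 v not X3) and (not X1 v not X2) and (X3)
clauses = [[1, 2, -3], [-1, -2], [3]]
print(evaluate_cnf(clauses, {1: True, 2: False, 3: True}))   # True  -> satisfiable
print(evaluate_cnf(clauses, {1: True, 2: True, 3: True}))    # False for this assignment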
Since it is not difficult to write a polynomial time algorithm that evaluates a Boolean
expression and checks whether the result is True, the CNF Satisfiability problem is in NP.
But to show that CNF satisfiability is NP-Complete, Cook applied the fact that every
problem in NP is solvable in polynomial time by a nondeterministic Turing machine.
Cook demonstrated that the actions of the Turing machine can be simulated by a lengthy
and complicated Boolean expression, constructed in polynomial time. The Boolean
expression would be true if and only if the program being run by the nondeterministic
Turing machine produced a Yes answer for its input.
Several new problems such as Hamiltonian Circuit problem, TSP problem, Graph
Coloring problem, etc. were proven to be NP-Complete after the satisfiability problem
was shown to be NP-Complete.
1.5 Summary
In this unit the following topics were discussed:
Three classes of problems in terms of their worst-case time complexities:
P, NP, NP-Complete.
Relationship between P, NP, NP-Complete
Differences between tractable and intractable problems, optimization
and decision problems and deterministic and nondeterministic
algorithms
CNF Satisfiability Problem
Ans. A famous decision problem which is not in NP is the Halting
problem, which is defined as follows:
Given an arbitrary program with arbitrary data, will the program terminate
or not? This type of problem is called an undecidable problem. Undecidable
problems cannot be solved by guessing and checking. Although they are
decision problems, they cannot be solved by exhaustively checking
the solution space.
Ans. The CNF Satisfiability Decision Problem asks, for a given logical expression
in CNF, whether some combination of True and False values for the logical
variables makes the output of the expression True.
UNIT 2 NP-COMPLETENESS AND NP-HARD
PROBLEMS
Structure
2.0 Introduction
2.1 Objectives
2.2 P Vs NP-Class of Problems
2.3 Polynomial time reduction
2.4 NP-Hard and NP-Complete problem
2.4.1 Reduction
2.4.2 NP-Hard Problem
2.4.3 NP Complete Problem
2.4.4 Relation between P, NP, NP-Complete and NP-Hard
2.5 Some well-known NP-Complete Problems-definitions
2.5.1 Optimization Problems
2.5.2 Decision Problems
2.6 Techniques (Steps) for proving NP-Completeness
2.7 Proving NP-completeness (Decision problems)
2.7.1 SAT (satisfiability) Problem
2.7.2 CLIQUE Problem
2.7.3 Vertex-Cover Problem (VCP)
2.8 Summary
2.9 Solutions/Answers
2.10 Further readings
2.0 INTRODUCTION
First, let us review the topics discussed in the previous unit. In general, the class of
problems can be divided into two parts: solvable and unsolvable problems. Solvable
problems are those for which an algorithm exists, and unsolvable problems are those for
which no algorithm exists, such as the Halting problem of the Turing Machine. Solvable
problems can be further divided into two parts: easy problems and hard problems.
[Figure 1: Classification of the class of problems into solvable and unsolvable problems]
A Problem for which we know the algorithm and can be solved in polynomial time is
called a P-class (or Polynomial -Class) problem, such as Linear search, Binary Search,
Merge sort, Matrix multiplication etc.
There are some problems for which polynomial time algorithm(s) are known. Then there
are some problems for which neither we know any polynomial-time algorithm nor do
scientists believe them to exist. However, exponential time algorithms can be easily
designed for such problems. The latter class of problems is called NP (non-deterministic
polynomial). Some P and NP problems are listed in table-1.
We know that Linear search takes O(n) time and Binary search takes O(log n) time;
researchers keep trying to find an algorithm which takes less time than O(log n), maybe
O(1). We know that the lower bound of any comparison-based sorting algorithm is
Ω(n log n); we keep searching for a faster algorithm than O(n log n), maybe O(n).
Exponential time functions such as 2^n, 3^n, 5^n, … take more time than any polynomial
functions such as n^2, n^3, …, n^k; for example, n^2 or n^3 becomes smaller than 2^n
for sufficiently large values of n. Researchers in mathematics and computer
science are trying to find polynomial time algorithms for those problems for which
no polynomial time algorithm is known so far. This
is the basic framework for NP-Hard and NP-Complete problems. The hierarchy of classes
of problems is illustrated in figure 2.
Figure-2: Hierarchy of classes of problems
2.1 OBJECTIVES
After studying this unit, you should be able to:
prove the NP-completeness of decision problems.
Many problems from graph theory and combinatorics can be defined as language recognition
problems, which require a Yes/No answer for each instance of the problem. A problem
formulated as a language recognition problem can be solved by a finite
automaton or a more advanced theoretical machine like the Turing machine (refer to MCS-
212/Block 2/Unit 2).
Using formal language theory, we say that the language accepted by an algorithm A is
the set of strings L = { x ∈ {0,1}* : A(x) = 1 }. If an
input string x is accepted by the algorithm, then A(x) = 1, and if x is rejected by the
algorithm, then A(x) = 0.
Let us now introduce important classes of languages: the P, NP and NP-Complete classes of
languages.
If there exists a polynomial time algorithm for L, then the problem L is said to be in class P
(Polynomial class); that is, in the worst case L can be solved in O(n^k) time, where n is the input
size of the problem and k is a positive constant.
In other words, P is the class of decision problems (languages) that can be decided by a deterministic algorithm in polynomial time.
Note: A deterministic algorithm is one in which the next step to execute after any step is unambiguously
specified. All algorithms that we encounter in real life are of this form: there is a
sequence of steps which are followed one after another.
Example: Binary Search, Merge sort, Matrix multiplication problem, Shortest path in a
graph (Bellman/Dijkstra’s algorithm) etc.
NP is the set of decision problems (with ‘yes’ or ‘no’ answer) that can be solved by a Non-
deterministic algorithm (or Turing Machine) in Polynomial time.
Let us elaborate Nondeterministic algorithm (NDA). In NDA, certain steps have deliberate
ambiguity; essentially, there is a choice that is made during the runtime and the next few
steps depend on that choice. These steps are called the non-deterministic steps. It should be
noted that a non-deterministic algorithm is an abstract concept and no computer exists that
can run such algorithms.
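The nondeterministic algorithm NDLSearch(A, n, x) referred to in the next paragraph is not reproduced here, so the following Python sketch is our own illustration of the idea (the choose helper stands in for the nondeterministic "magical" step and is simulated by brute force only so that the code runs; the line numbering of the original algorithm may differ):

def choose(candidates, is_correct):
    # Stand-in for the nondeterministic step: a true NDA "guesses" a correct
    # value in one unit of time; here it is simulated deterministically.
    for c in candidates:
        if is_correct(c):
            return c
    return None

def nd_search(A, n, x):
    # Nondeterministically choose an index j in 1..n ...
    j = choose(range(1, n + 1), lambda j: A[j - 1] == x)
    # ... and verify the guess in O(1) time.
    if j is not None and A[j - 1] == x:
        return j
    return "not found"

print(nd_search([7, 3, 9, 3], 4, 9))   # 3
print(nd_search([7, 3, 9, 3], 4, 5))   # not found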
Here we assume that line 2 of NDLSearch(A, n, x) takes 1 unit of time to find the
location of the searched element x (i.e., index j). So the overall time complexity of this
algorithm is O(1). Line 2 of the algorithm is nondeterministic (magical): if j is
chosen correctly, the algorithm will execute correctly, but the exact behaviour can only be
known when the algorithm is running.
A DA (deterministic algorithm) is said to be correct if for every input value its output value is correct.
Since the actual behaviour of an NDA algorithm is only known when it is running, we
define the correctness of an NDA algorithm differently than above. An NDA algorithm is
said to be correct if for every input value there are some correct choices during the non-
deterministic steps for which the output value is correct. In the above example, there is
always some correct value of j in line no. 2 for which the algorithm is correct. Note that j
can depend on the input. For example, if A[1]=x, then j=1 is a correct choice, and if A does
not contain x, then any j is a correct choice. Hence, the above algorithm is a correct NDA
for the search problem.
NP is thus the class of problems solvable by a nondeterministic algorithm (NDA) that
takes polynomial time. It should not be difficult
to view a deterministic algorithm as a non-deterministic algorithm that does not
make any non-deterministic choice. Thus, if there exists a DA for a problem, then the
same algorithm can also be thought of as an NDA; what this means is that if a DA exists
for a problem, then an NDA also exists for that problem. The answer to the converse question
is not always known. Further, if the DA is polynomial-time then the NDA
is also polynomial-time. So, we can say the following relationship holds
between the P and NP classes of problems.
𝑃 ⊆ 𝑁𝑃 ……… (1)
A central question in algorithm design is the P=NP question. We discussed above that if a
problem has a polynomial time DA then it has a (trivial) polynomial-time NDA. The
P=NP question asks the converse direction. Currently we know many problems for which
there are polynomial time NDA. The question asks whether a polynomial-time NDA can
be always converted to a polynomial-time DA. This is effectively asking whether the
non-deterministic choices made by an NDA algorithm can be completely eliminated,
maybe at the cost of a slight polynomial increase in the running time. The scientific
community believes that the answer is no, i.e., P is not equal to NP; in other words, there
are NP problems which cannot be solved in polynomial time. However, the exact answer
is not known even after several decades of research.
In general, any new problem requires a new algorithm. But often we can solve a problem
X using a known algorithm for a related problem Y. That is, we can reduce X to Y in the
following informal sense: A given instance x of X is translated to a suitable instance y of
Y, so that we can use the available algorithm for Y. Eventually the result of this
computation on y is translated back, so that we get the desired result for x. Let us
consider an example to understand the concept of a reduction step.
Suppose we want an algorithm for multiplying two integers, and there is already an
efficient algorithm available that can compute the square of an integer. It needs S(n) time
for an integer of n digits. Can we somehow use it to multiply arbitrary integers a, b
efficiently, without developing a multiplication algorithm? Certainly, squaring and
multiplication are related problems. More precisely, we can use the identity
ab = ((a + b)^2 − (a − b)^2) / 4. We only have to add and subtract the factors in O(n) time, apply our
squaring algorithm in S(n) time, and divide the result by 4, which can be easily done in
O(n) time, since the divisor is a constant. Thus, we have reduced multiplication (problem
X) to squaring (problem Y) as follows. We have taken an instance of X (factors a, b),
transformed it quickly into some instances of Y (namely 𝑎 + 𝑏 and 𝑎 − 𝑏), solved these
instances of Y by the given squaring algorithm, and finally applied another fast
manipulation on the results (addition, division by 4) to get the solution 𝑎𝑏 to the instance
of problem X. It is essential that not only a fast algorithm for Y is available, but the
transformations are fast as well.
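The reduction just described can be written down in a few lines. A Python sketch (the square helper stands in for the assumed fast squaring routine):

def square(x):
    # Stand-in for the given fast squaring algorithm (S(n) time).
    return x * x

def multiply_via_squaring(a, b):
    # Reduce multiplication to squaring using ab = ((a+b)^2 - (a-b)^2) / 4.
    # The extra work (addition, subtraction, division by 4) takes only O(n) time.
    return (square(a + b) - square(a - b)) // 4

print(multiply_via_squaring(1234, 5678))   # 7006652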
A reduction therefore serves two purposes:
1. Solving a problem X with the help of an already existing algorithm for a different problem Y.
2. Comparing the relative difficulty of two problems X and Y without solving either of them.
Note that (1) is of immediate practical value, and even usual business: Ready-to-use
implementations of standard algorithms exist in software packages and algorithm
libraries. One can just apply them as black boxes, using their interfaces only, without
caring about their internal details. This is nothing but a reduction!
Point (2) gives us a way to classify problems by their algorithmic complexity. We can
compare the difficulty of two problems without knowing their “absolute” time
complexity. If Y is at least as difficult as X, then research on improved algorithms should
first concentrate on problem X.
Reductions-Formal definition:
Reduction is a general technique for showing similarity between problems. To show the
similarity between problems we need one base problem. A procedure which is used to
show the relationship (or similarity) between problems is called Reduction step, and
symbolically can be written as
𝐴≼ 𝐵
Let us understand the concept of reduction using mathematical description. Suppose that
there are two problems, A and B. You know (or you strongly believe at least) that it is
impossible to solve problem A in polynomial time. You want to prove that B cannot be
solved in polynomial time. How would you do this?
We want to show that
(𝐴 ∉ 𝑃) ⇒ (𝐵 ∉ 𝑃) ----- (2)
To prove (2), we could prove the contrapositive
(𝐵 ∈ 𝑃) ⇒ (𝐴 ∈ 𝑃) [Note: ¬Q ⇒ ¬P is the contrapositive of P ⇒ Q]
In other words, to show that B is not solvable in polynomial time, we will suppose that
there is an algorithm that solves B in polynomial time, and then derive a contradiction by
showing that A can be solved in polynomial time.
How do we do this? Suppose that we have a subroutine that can solve any instance of
problem B in polynomial time. Then all we need to do is to show that we can use this
subroutine to solve problem A in polynomial time. Thus we have “reduced” problem A to
problem B.
It is important to note here that this supposed subroutine is really a fantasy. We know (or
strongly believe) that A cannot be solved in polynomial time, thus we are essentially
proving that the subroutine cannot exist, implying that B cannot be solved in polynomial
time.
[Figure: A polynomial-time reduction from A to B. An instance α of A is converted by a polynomial-time reduction algorithm into an instance β of B; a polynomial-time algorithm that decides B is run on β, and its yes/no answer is returned as the answer for α. Together, the two boxes form a polynomial-time algorithm that decides A.]
As a base problem we take the SAT (Satisfiability) problem.
SAT problem: Given a set of clauses C1, C2, ..., Cm in CNF form, where each Ci contains
literals from x1, x2, ..., xn, the problem is to check whether all clauses are simultaneously
satisfiable.
Note: Cook-Levin theorem shows that SAT is NP-Complete, which will be discussed
later.
To understand the reduction step, consider a CNF-SAT (or simply SAT) problem and
see how to reduce it to the Independent Set (IS) problem.
f(x1, x2, x3) = (¬x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ x3) ……… (1)
We know that for n variables there are 2^n possible assignments of values. Since there are 3
variables here, there are 2^3 = 8 possible assignments (as shown in the table below). The question asked by the
SAT problem is whether there is some assignment of x1, x2, x3 for which the formula f is satisfied; that is,
out of the 8 possible assignments of x1, x2, x3, does any assignment make the formula f TRUE?
x1  x2  x3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
It can be easily verified that, for example, the assignments (x1, x2, x3) = (0, 0, 0) and
(x1, x2, x3) = (0, 1, 0) evaluate the formula f in (1) to 1
(TRUE). So the formula is satisfiable. We do not know of any polynomial time algorithm
to decide this for arbitrary formulas, and all known algorithms run in exponential time.
In the IS problem, a graph G and an integer k are given, and the question is to find out whether
there is a subset of at least k vertices such that there is no edge between
any two vertices in that subset. Such a subset is known as an independent set.
A reduction from SAT to IS is an algorithm that converts any SAT instance, which is a
Boolean formula (denoted f), to an IS instance, i.e., a graph G and an integer k. The
reduction must run in polynomial time and ensure the following: If the formula is
satisfiable then G must have an independent set with at least k vertices. And if the
formula is not satisfiable then G must not have any independent set with k or more
vertices. The catch is that the reduction must do the above without finding out if f is
satisfiable (the requirement should be obvious since there is no known algorithm to
determine satisfiability that runs in polynomial time).
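One standard construction (sketched in the figure that follows) creates one vertex per literal occurrence, joins the literals of each clause into a triangle, joins every pair of complementary literals, and sets k to the number of clauses. The Python sketch below (our own illustrative encoding: positive integers for variables, negative integers for negated variables) performs this conversion without ever deciding satisfiability:

def sat_to_is(clauses):
    # Build an IS instance (vertices, edges, k) from a CNF formula.
    # Each vertex is a pair (clause index, literal).
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    edges = set()
    for u in vertices:
        for v in vertices:
            if u < v:
                same_clause = (u[0] == v[0])
                complementary = (u[1] == -v[1])
                if same_clause or complementary:
                    edges.add((u, v))
    k = len(clauses)          # one literal must be picked from each clause
    return vertices, edges, k

# (x1 v x2 v x3) and (not x1 v not x2 v x3)
vertices, edges, k = sat_to_is([[1, 2, 3], [-1, -2, 3]])
print(len(vertices), len(edges), k)   # 6 vertices, 8 edges, k = 2

The formula is satisfiable if and only if the resulting graph has an independent set of size k, which is exactly the property a reduction must guarantee.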
Even though it may appear surprising that such a conversion is possible, it is indeed
possible, as shown in Fig 3.8. It is easy to check that the formula on the left is
satisfiable (for example by setting x1=1, x2=0, x3=1, x4=0) and that the graph on the right has an
independent set with 3 vertices (one from each triangle).
[Figure: on the left, a 3-SAT formula f with three clauses over the variables X1, X2, X3, X4; on the right, the graph produced by the reduction, with one vertex per literal occurrence and the three literals of each clause joined into a triangle, together with the integer K = 3.]
Fig 3.8: The reduction algorithm from SAT to IS converts the formula on the left to the
pair consisting of the integer K and the graph shown on the right. It can be verified that the formula is satisfiable if and only if the graph has an independent set with at least K vertices.
The meaning of the above statement is that the SAT problem is polynomial time reducible to the
Independent Set problem. The implication is that if there exists a polynomial time
algorithm for the IS problem, then the SAT problem also has a polynomial time algorithm.
Furthermore, if there is no polynomial time algorithm for SAT, then there cannot be
a polynomial time algorithm for IS. Here the SAT problem is taken as the base problem. There
are similar reductions known between thousands of problems, like CLIQUE, Sum-of-Subsets,
the Vertex Cover problem (VCP), the Travelling Salesman problem (TSP), 0/1-Knapsack,
etc.
𝐸𝑎𝑠𝑦 → 𝑃
𝑀𝑒𝑑𝑖𝑢𝑚 → 𝑁𝑃
𝐻𝑎𝑟𝑑 → 𝑁𝑃 − 𝐶𝑜𝑚𝑝𝑙𝑒𝑡𝑒
𝐻𝑎𝑟𝑑𝑒𝑠𝑡 → 𝑁𝑃 − 𝐻𝑎𝑟𝑑
The following figure 6 shows a relationship between P, NP, NP-C and NP-Hard
problems:
[Figure 6: P (easy) inside NP (medium); NP-Complete (hard) is the intersection of NP and NP-Hard; NP-Hard (hardest) extends beyond NP.]
The diagram assumes that P and NP are not the same set, or, in other words, that
𝑃 ≠ 𝑁𝑃. Another interesting observation from the diagram is that there is an
overlap between NP and NP-Hard problems. We call this overlap the class of NP-Complete
problems (problems belonging to both the NP and NP-Hard sets).
NP-Complete problems are the “hardest” problems to solve among all NP problems, in
that if one of them can be solved in polynomial time then every problem in NP can be
solved in polynomial time. This is where the concept of reduction comes in. There may
be even harder problems to solve that are not in the class of NP, called NP-Hard
problems.
The study of NP Completeness is important: the most cited reference in all of Computer
Science is Garey & Johnson’s (1979) book Computers and Intractability: A Guide to the
Theory of NP-Completeness. (A text book is the second most cited reference in
Computer Science!).
In 1979 Garey & Johnson wrote, “The question of whether or not the NP-complete
problems are intractable is now considered to be one of the foremost open questions of
contemporary mathematics and computer science.”
Over 30 years later, in spite of a million-dollar prize offer and intensive study by many of
the best minds in computer science, this is still true: No one has been able to either
Prove that there are problems in NP that cannot be solved in polynomial time
(which would mean P ≠NP), or
Find a polynomial time solution for a single NP Complete Problem (which would
mean P=NP).
Now, to understand the concept of an NP-Hard problem, let us consider the following
example.
The CNF-SAT problem is a well-known NP-Hard problem (a base problem used to prove that other
problems are NP-Hard); let us call the other exponential-time problems (0/1
knapsack, TSP, VCP, Hamiltonian cycle, etc.) hard problems. We can say that if all
these hard problems are polynomial time reducible to the CNF-SAT problem and CNF-SAT is solved (in
polynomial time), then all these hard problems can also be solved in polynomial time.
Symbolically, L′ ≼p L for all L′ ∈ NP.
Let us consider an example to understand the concept of NP-Hard. It is known that the
CNF-SAT problem is polynomial time reducible to the 0/1 knapsack problem, denoted as
CNF-SAT ≼ 0/1 knapsack --- (1)
Here CNF-SAT is an already known NP-Hard problem, and this known NP-Hard problem is
polynomial time reducible to the given problem L (i.e., the 0/1 knapsack problem). Writing
CNF-SAT ≼ 0/1 knapsack means that "if the 0/1 knapsack problem is solvable in
polynomial time, then so is the CNF-SAT problem", which also means that "if CNF-SAT is
not solvable in polynomial time, then the 0/1 knapsack problem can't be solved in
polynomial time either."
In other words, here we show that any instance (say I1, a formula) of the CNF-SAT problem
can be converted into an instance (say I2) of the 0/1 knapsack problem, and we can say that if the 0/1
knapsack problem is solved in polynomial time by an algorithm A, then the same
algorithm A can also be used to solve the CNF-SAT problem in polynomial time. Note that the
reduction (or conversion) step takes polynomial time.
NP-Complete problems are the “hardest” problems to solve among all NP problems. The
set of NP-complete problems consists of all problems in the complexity class NP for which it is
known that if any one of them is solvable in polynomial time, then they all are, and conversely, if
any one of them is not solvable in polynomial time, then none are. In other words, we can say that if
any problem in NP-Complete is polynomial-time solvable, then P=NP.
A problem (language) L is NP-Complete if:
1. L ∈ NP, and
2. every problem L′ in NP is polynomial-time reducible to L (i.e., L′ ≼p L; every problem in NP can be reduced to L).
Cook proved that there exists a Nondeterministic polynomial time algorithm for the
CIRCUIT-SAT problem (similar to 3CNF-SAT), and also showed that any other problem
for which a similar algorithm exists can be reduced in polynomial-time to the CIRCUIT-
SAT problem, thereby discovering the first NP-complete problem. Very quickly many
other problems, like 3CNF-SAT, were discovered to be NP-complete (see Figure 10).
Figure 10: A few early NP-complete problems. An arrow from A → B indicates that first A
was proved to be NP-complete, and then B was proved NP-complete using a
polynomial-time reduction from A.
From figure 6, it is clear that the intersection of NP-Hard and the NP class (problems for which a
nondeterministic polynomial time algorithm exists) is the class of NP-Complete problems.
P-class means a deterministic polynomial time algorithm exists for the problem; NP-class
means a nondeterministic polynomial time algorithm exists for the problem. Whether
P = NP or not is an open question for computer scientists. So today we can only say that
𝑃 ⊆ 𝑁𝑃.
Q1. Differentiate between P, NP, NP-C and NP-Hard Problems with a suitable diagram.
C. Both A1 and A2 are in NP
D. Both A1 and A2 are in NP hard
Q5: Consider the following statements about NP-Complete and NP-Hard problems.
Write TRUE/FALSE against each of the following statements
1. The first problem that was proved as NP-complete was the circuit satisfiability
problem.
2. NP-complete is a subset of NP Hard problems, i.e. NP-Complete⊆NP-Hard.
3. SAT problem is well-known NP-Hard as well as NP-Complete problem.
Q6 Which of the following statements are TRUE?
1. The problem of determining whether there exists a cycle in an undirected graph is in P.
2. The problem of determining whether there exists a cycle in an undirected graph is in
NP.
3. If a problem A is NP-Complete, there exists a non-deterministic polynomial time
algorithm to solve A.
We have seen many problems that can be solved in polynomial time, such as Binary
search, Merge sort, matrix multiplication, etc. The class of all such problems is the
P-class. The following are some well-known problems that are NP-complete
when expressed as decision problems (with 'YES' or 'NO' answer); some of them were
already defined in the previous unit.
1. SAT
2. 3-CNF SAT
3. 0/1 Knapsack problem
4. Travelling Salesman problem (TSP)
5. Sum-of-Subset problem
6. Clique problem
7. Vertex Cover problem (VCP)
8. Hamiltonian Cycle problem
1. SAT:
Instance: A Boolean formula φ consisting of variables, parentheses, and AND/OR/NOT operators.
Question: Is there an assignment of True/False values to the variables that
makes the formula evaluate to True?
2. 3-SAT:
Instance: A CNF formula with 3 literals in each clause.
Question: Is there an assignment of True/False values to the variables
that makes the formula evaluate to True?
3. 0/1 Knapsack problem:
Given n items, where item i has profit pi and weight wi, a knapsack of capacity W, and a target profit P, is there a subset of items whose total weight is at most W and whose total profit is at least P?
4. Travelling Salesman problem (TSP):
Given a set of cities and the distance between every pair of cities, the problem is to find the
shortest possible route that visits every city exactly once and returns to the starting point.
There is an integer cost C(i, j) to travel from city i to city j, and the salesman wishes to
make the tour whose total cost is minimum, where the total cost is the sum of the individual
costs along the edges of the tour. For example, consider the following graph:
[Figure: a complete graph on the cities A, B, C and D with edge costs.]
A TSP tour in the graph is A-B-D-C-A. The cost of the tour is 10 + 25 + 30 + 15 = 80.
TSP = { ⟨G, C, k⟩ : G = (V, E) is a complete graph, C is a cost function V × V → Z, k ∈ Z, and G has a travelling salesman tour with cost at most k }
TSP is a famous NP-Hard problem. There is no known polynomial time solution for this
problem. There are a few different approaches to solve TSP: the naïve (brute
force) method takes Θ(n!) time, while the dynamic programming approach takes
O(n^2 2^n) time.
Input: An undirected, connected, weighted graph G(V, E) and an integer k.
Question: Does the graph G have a tour of cost at most k? If P ≠ NP, then we cannot find the
minimum-cost tour in polynomial time.
5. Sum-of-Subset problem:
Given a set of positive integers S = {x1, x2, ..., xn} and a target sum K, the decision
problem asks for a subset S′ of S (i.e., S′ ⊆ S) having a sum equal to K.
SUBSET_SUM = { ⟨S, K⟩ : ∃ a subset S′ ⊆ S such that the sum of the elements of S′ equals the given
sum K }.
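A brute-force decision procedure for SUBSET_SUM simply tries every subset and therefore takes exponential time; a short illustrative Python sketch:

from itertools import combinations

def subset_sum(S, K):
    # Try every subset of S; return a witness subset if one sums to K.
    for r in range(len(S) + 1):
        for subset in combinations(S, r):
            if sum(subset) == K:
                return subset          # certificate for the 'yes' answer
    return None                        # 'no' answer

print(subset_sum([1, 4, 6, 9], 10))    # (1, 9)
print(subset_sum([1, 4, 6, 9], 3))     # None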
6. CLIQUE Problem:
In any complete graph Kn with |V| = n vertices, the total number of edges is |E| = n(n − 1)/2.
[Figures (Examples 1–3): Example 1 shows a graph in which a clique of size k = 4 is possible. Example 2 shows a graph with no clique of size k = 5 or k = 4, but with a clique of size k = 3. Example 3 shows a graph with cliques of sizes k = 4, k = 3 and k = 2.]
So,
Decision problem: Does a given graph have a CLIQUE of size k or not? For example,
consider the graph of Example 2. Is there a clique of size k = 4? The answer is NO. Is there a
clique of size k = 3? The answer is YES.
Optimization problem: Find the maximum CLIQUE size of a graph.
7. Vertex Cover Problem (VCP):
A vertex cover of a graph G = (V, E) is a subset V′ ⊆ V of vertices such that every edge (u, v) ∈ E has at least one endpoint in V′. In other words, a vertex cover is a subset of the vertices of a graph such that every
edge in the graph is "covered" by, or incident to, at least one vertex in the vertex cover subset.
The size of the vertex cover is simply the number of vertices in the vertex cover
subset. For example, a vertex cover for the graph shown in figure 5 is
{a, c, d, e} or {a, b, d, e}.
[Figure 5: a graph on the vertices a, b, c, d, e and f.]
VERTEX COVER:
Instance: A graph G and an integer K.
Question: Is there a set of K vertices in G that touches each edge at least
once?
So, the input to VERTEX COVER is a graph G and an integer k. The algorithm returns
either a certificate/witness, or “no such vertex cover exists”.
[Figure: the example graph on vertices a–f shown twice, the second copy with the vertices of a vertex cover highlighted.]
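Verifying a proposed certificate for VERTEX COVER is easy; only finding a small cover is hard. A short Python sketch of the verification step (the edge list below is our reading of the example graph and is only an assumption):

def is_vertex_cover(edges, cover):
    # Every edge must have at least one endpoint inside the cover.
    return all(u in cover or v in cover for u, v in edges)

edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd'),
         ('d', 'e'), ('d', 'f'), ('e', 'f')]
print(is_vertex_cover(edges, {'a', 'c', 'd', 'e'}))   # True for this edge list
print(is_vertex_cover(edges, {'a', 'b'}))             # False: edge (c, d) is uncovered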
8. Hamiltonian Cycle problem:
HAM-CYCLE = { G(V, E) : G has a Hamiltonian cycle }
Or
Instance: An undirected graph G = (V, E).
Question: Does G contain a cycle that visits every vertex exactly once?
A. Given a graph G, is there a cycle that visits every vertex exactly once?
B. Given a graph G, is there a cycle that visits every vertex at least once?
C. Given a graph G, is there a cycle that visits every vertex at most once?
Q.3: Suppose a polynomial time algorithm is discovered that correctly computes the
largest clique in a given graph. In this scenario, which one of the following represents the
correct Venn diagram of the complexity classes P, NP and NP Complete (NPC)?
[Answer options (A), (B), … are Venn diagrams showing different containment relationships among the classes P, NPC and NP.]
2.8 Summary
There are many problems which have both decision and optimization versions, for
example the Traveling Salesman Problem (TSP). Optimization version: find a Hamiltonian
cycle of minimum weight; decision version: is there a Hamiltonian cycle of weight at most k?
P = set of problems that can be solved in polynomial time. For example: Binary
search, Merge sort, Quick sort, matrix multiplication, Dijkstra’s algorithm etc.
2. If L1 ≼p L2 and L2 ∈ NP, then L1 ∈ NP.
𝑆𝐴𝑇 = {𝑓: 𝑓 is a given Boolean formula in CNF with n variables and m clauses, Is this
formula 𝑓 satisfiable?}
3𝐶𝑁𝐹 − 𝑆𝐴𝑇 = {𝑓: 𝑓 is a given Boolean formula in CNF with n variables and m
clauses and at most 3 literals per clause, Is this formula 𝑓 satisfiable?}
TSP = { ⟨G, C, k⟩ : G = (V, E) is a complete graph, C is a cost function V × V → Z, k ∈ Z, and G has a travelling salesman tour with cost at most k }
2.9 Solutions/Answers
Answer1:
P = set of problems that can be solved in polynomial time. For example: Binary search,
Merge sort, Quick sort, matrix multiplication, Dijkstra’s algorithm etc.
NP = set of problems for which a solution can be verified in polynomial time. Examples:
0/1 Knapsack, TSP, 3-CNF SAT, CLIQUE, VCP etc.
(Objective Questions)
Answers 2-B
Answer 3-D
Answer 4-A
Answer 1 : Option A
It is trivial to visit every vertex in the graph and return to the starting vertex if
we are allowed to visit a vertex any number of times.
Answer 2: Option E, F
Any graph cycle detection algorithm can be used to identify if a graph has any
cycle; such algorithms run in polynomial time.
Answer 3: Option D
2.10 FURTHER READINGS
**********************************
UNIT 3 HANDLING INTRACTABILITY
Structure
3.0 Introduction
3.1 Objectives
3.2 Intelligent Exhaustive Search
3.2.1 Backtracking
3.2.2 Branch and Bound
3.3 Approximation Algorithms Basics
3.4 Summary
3.5 Solution to Check Your Progress
3.0 INTRODUCTION
It has been stated in the previous units that a large class of optimization problems
belongs to the NP-hard class of problems. It is widely accepted that NP-hard problems
are intractable problems, i.e., although not yet proven, it appears that such
problems do not have polynomial time solutions; their time complexities are
exponential in the worst case. Some examples of intractable problems are the traveling
salesman problem, the vertex cover problem and the graph coloring problem. Besides these
combinatorial problems, there are several thousand computational problems in
different domains, like biology, data science, finance and operations research, that fall into
the NP-hard category.
An exhaustive search can be applied to solve these combinatorial problems, but it
works only when problem instances are quite small. An optimization technique such as
dynamic programming may be applied to find solutions for some such problems,
but it has a similar limitation to exhaustive search: the
problem instance should be small. Another limitation of this technique is that the
problem must follow the principle of optimality.
Problem solving techniques such as backtracking and branch and bound
perform better in comparison to exhaustive search. Unlike exhaustive
search, these techniques construct the solution step by step (considering only one
element at a time) and evaluate the partial solution. If the partial results
cannot lead to a better solution, further exploration of the remaining elements is
not considered. Backtracking and branch and bound techniques apply some kind of
intelligence to the exhaustive search. They provide efficient solutions if well
designed, but in the worst case they take exponential time. One real advantage of the branch
and bound technique is that it can handle large problem instances.
We can also apply Tabu search or Simulated Annealing which are heuristic local
search methods or Genetic Algorithms and Particle Swarm Optimization which are
meta heuristic techniques to find out solutions to the optimization problems.
But all these methods, in spite of good performance, do not provide rigorous guarantees
on the quality of the solution, i.e., on how far the proposed solution is
from the optimal solution. So we should ask whether it is
possible to find near-optimal (approximate) solutions to combinatorial
optimization problems efficiently. Approximation algorithms have been found to be
efficient algorithms with good quality solutions, where quality is measured in terms of the
maximum distance between the proposed approximate solution and the optimal
solution over all problem instances. What does this mean? It means that an
approximation algorithm always produces a solution quite close to the optimal solution.
The focus of this unit is to discuss a few techniques, such as backtracking,
branch and bound, and approximation algorithms, to handle intractable
problems.
3.1 OBJECTIVES
The main objectives of the unit are to:
3.2.1 Backtracking
Backtracking is a general technique for the design of algorithms. It is applied to solve
problems in which components are selected in sequence from a specified set so that
the selection satisfies some criterion or objective. The backtracking procedure performs a depth first
search of a state space tree, checking whether a node can lead to a solution (a
promising node) or is a dead end (a non-promising node), backtracking
to the parent of a non-promising node, and continuing with the search
process on the next child.
In this section we will use backtracking technique to solve two problems: (i)
Hamiltonian Circuit problem and (ii) Subset Sum problem.
(i) Hamiltonian circuit problem
A Hamiltonian circuit of a graph G is a cycle that visits every vertex of G exactly once and returns to the starting
vertex. V1 is the starting vertex of the cycle, where V1 ∈ G, and V1, V2, ……, Vn+1 are the
vertices in the cycle, all distinct except the vertices V1 and Vn+1, which are equal.
[Figure: an example graph on the vertices V1, V2, V3, V4, V5 and V6, and the state space tree explored by backtracking. Branches that cannot be extended to a Hamiltonian circuit end in dead ends and are abandoned; the path that visits all six vertices and returns to V1 is the final solution.]
In order to explore another Hamiltonian cycle, the process can restart from the leaf
node of the tree through backtracking.
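A compact backtracking implementation of the Hamiltonian circuit search is sketched below in Python (the adjacency lists are an assumption made for illustration, since the figure is not fully reproduced; the recursive function abandons a branch as soon as it reaches a dead end):

def hamiltonian_circuit(adj, start):
    # Depth-first search of the state space tree with backtracking.
    path = [start]

    def extend(v):
        if len(path) == len(adj):                    # all vertices used
            return start in adj[v]                   # promising only if we can return to start
        for w in adj[v]:
            if w not in path:                        # try the next child
                path.append(w)
                if extend(w):
                    return True
                path.pop()                           # dead end: backtrack
        return False

    return path + [start] if extend(start) else None

adj = {'V1': ['V2', 'V3', 'V4'], 'V2': ['V1', 'V3', 'V6'],
       'V3': ['V1', 'V2', 'V5'], 'V4': ['V1', 'V5'],
       'V5': ['V3', 'V4', 'V6'], 'V6': ['V2', 'V5']}
print(hamiltonian_circuit(adj, 'V1'))   # ['V1', 'V3', 'V2', 'V6', 'V5', 'V4', 'V1']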
(ii) Subset Sum Problem
Given a positive integer W and a set S of n positive integer values, i.e., S = { s1, s2, …, sn },
the main objective of the Subset Sum problem is to search for all combinations
of subsets of integers whose sum is equal to W. As an example, let us take S = { 1, 4, 6, 9 }
and W = 10; there are two solution subsets: {1, 9} and {4, 6}. In some cases a
problem instance may not have any solution subset.
We will assume that the elements of the set are in sorted order, i.e., s1 ≤ s2 ≤ ⋯ ≤ sn.
Design of state space tree for subset sum problem:
S = {4,6,7,8} and W = 18
The state space tree is designed as a binary tree. The root of the tree does not represent
any decision about an element; it is simply the starting point of the tree. At every level, the left
subtree includes the element of the set considered at that level (represented by 1) while
the right subtree excludes it (represented by 0).
At level 1, the left branch of the tree includes the first element s1 while the
right branch excludes it. A subset of the given set is represented by a path from the
root node to a leaf node of the tree. A path from the root node of the tree to a node at
the i-th level represents a decision about the inclusion of the first i numbers in the subset.
Let s′ be the sum of the numbers included along the path up to the i-th level. If s′ is equal to
W then the problem has a solution. If we want to find all the subsets, we
backtrack to the parent of the node, or stop if no further subsets are required.
If the sum s′ is not equal to W, the node can be declared a non-promising node when either of the
following two inequalities holds:
(i) s′ + s(i+1) > W (even the smallest remaining element makes the sum too large)
(ii) s′ + (sum of all the remaining elements s(i+1), …, s(n)) < W (even adding all the remaining elements cannot reach W)
[Figure: state space tree for the Subset Sum instance S = {4, 6, 7, 8} and W = 18. At each level the left branch includes the next element and the right branch excludes it; nodes violating one of the two pruning conditions are marked as non-promising (×). The path that includes 4, 6 and 8 reaches the sum 18 and is the promising solution.]
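The pruning rules above translate directly into code. The following Python sketch (our own illustration; variable names follow the text) finds all solution subsets by backtracking:

def subset_sum_backtracking(S, W):
    S = sorted(S)                                   # the text assumes sorted elements
    n = len(S)
    suffix = [0] * (n + 1)                          # suffix[i] = S[i] + S[i+1] + ... + S[n-1]
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + S[i]
    solutions = []

    def explore(i, chosen, s_prime):
        if s_prime == W:                            # promising solution found
            solutions.append(list(chosen))
            return
        if i == n:
            return
        # Pruning conditions from the text: the node is non-promising if
        # (i) s' + S[i] > W, or (ii) s' + (sum of remaining elements) < W.
        if s_prime + S[i] > W or s_prime + suffix[i] < W:
            return
        chosen.append(S[i])                         # left branch: include S[i]
        explore(i + 1, chosen, s_prime + S[i])
        chosen.pop()                                # backtrack
        explore(i + 1, chosen, s_prime)             # right branch: exclude S[i]

    explore(0, [], 0)
    return solutions

print(subset_sum_backtracking([4, 6, 7, 8], 18))    # [[4, 6, 8]]
print(subset_sum_backtracking([1, 4, 6, 9], 10))    # [[1, 9], [4, 6]]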
For the 0/1 knapsack problem solved by branch and bound, the following quantities are maintained at each node of the state space tree:
w – total weight of the items selected at the node
p – total profit of the items selected at the node
bound – an upper bound on the total profit of any subset obtainable by expanding the node
One of the simplest ways to calculate the bound is:
bound = p + (W − w)(p(k+1)/w(k+1)) … (i)
The bound at a node is the sum of the profit p of the already selected items and the product of the
leftover capacity of the knapsack (W − w) with the best profit per unit weight among the remaining items, which is
p(k+1)/w(k+1), where item k+1 is the next item to consider and the items are ordered by non-increasing profit per unit weight.
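The bound computation can be expressed in a few lines of Python (an illustrative sketch of ours; items are assumed to be sorted by non-increasing profit per unit weight, and the example values are the profits and weights used in the worked example that follows):

def bound(p, w, W, items, k):
    # p, w: profit and weight accumulated at the node.
    # items: list of (profit, weight) pairs sorted by profit/weight, descending.
    # k: index of the next item to be considered below this node.
    if w > W:
        return 0                        # infeasible node: no useful bound
    if k >= len(items):
        return p                        # no items left to add
    next_profit, next_weight = items[k]
    return p + (W - w) * (next_profit / next_weight)

items = [(40, 4), (42, 7), (25, 5), (12, 3)]
print(bound(0, 0, 10, items, 0))        # 100.0  (root node)
print(bound(40, 4, 10, items, 1))       # 76.0   (node 1: item 1 included)
print(bound(40, 4, 10, items, 2))       # 70.0   (node 4: item 2 excluded)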
Example: Given a knapsack problem instance, apply branch and bound to find a
subset which gives the maximum profit. The following is the problem instance:
Knapsack capacity W = 10
Item 1: profit = 40, weight = 4; Item 2: profit = 42, weight = 7; Item 3: profit = 25, weight = 5; Item 4: profit = 12, weight = 3 (the items are listed in non-increasing order of profit per unit weight).
The following is the state space tree (figure 3.4) representing the given instance of the
knapsack problem:
[Figure 3.4 shows the state space tree. Node 0 is the root (p = 0, w = 0, bound = 100). Nodes 1 and 2 are its children, with and without item 1; nodes 3 and 4 are children of node 1, with and without item 2; nodes 5 and 6 are children of node 4, with and without item 3; nodes 7 and 8 are children of node 5, with and without item 4. The values of p, w and bound at each node are derived in the walkthrough below.]
Figure 3.4: State space tree for knapsack problem using branch and bound technique
6
The root node indicates that no items have been selected as yet. Therefore p=0 and w= Design and Analysis of
0 and the bound is computed as per equation (i) which is Rs.100( 0+ (10-0) (40/4)). Algorithms
The left branch of the tree includes item 1, which is then the only item in the subset,
while the right branch excludes item 1. The total profit p and weight w are Rs. 40
and 4 respectively, and the bound value is 76 (40 + (10 − 4)(42/7)); all three values are
shown at node 1. Node 2 on the right branch excludes item 1; therefore p = 0 and w =
0, because no item has been selected in the subset at node 2. The bound at this node is 60
(0 + (10 − 0)(42/7)). Compared to node 2, node 1 is more promising for optimization
because it has a larger bound. Node 3 and node 4 are the child nodes of node 1,
representing subsets with item 1 and item 2, and with item 1 but without item 2,
respectively. Let us calculate the total profit (p), total weight (w) and bound at node 3 first.
Since the total weight w of the subset represented by node 3 (4 + 7 = 11) exceeds 10, which
is the capacity of the knapsack, this node is considered non-promising. Since item 2 is not
included at node 4, the values of p and w are the same as at its parent node 1, i.e., p = 40,
w = 4. The bound is Rs. 70 (40 + (10 − 4)(25/5)). Compared to node 2, node 4 is selected for expanding the state space tree
because its bound is larger than node 2. Now we move to node 5 and 6 which
represent subsets including and excluding item 3 respectively. The total profit(p) ,
total weight (w) and bound at node 5 are:
p (item 1 + item 3) = Rs. 40 + Rs. 25 = Rs. 65
w (item 1 + item 3) = 4 + 5 = 9
bound = 65 + (10 − 9)(12/3) = 69
We will repeat the computation of p, w and bound at node 6:
p (item 1) = 40
w (item 1) = 4
bound = 40 + (10 − 4)(12/3) = 64
We will continue with node 5 because of its larger bound. Node 7 and node 8
represent subsets with and without item 4 respectively.
Node 7 is a non-promising node because its total weight, 12 (9 + 3), exceeds the
knapsack capacity.
p and w at node 8 (without item 4) are 65 and 9 respectively, the same as at its parent;
bound = 65 + (10 − 9)(0) = 65.
The subset {1, 3} represented by node 8 is the answer. The other two live nodes, 2 and 6,
have lower bounds than node 8. Hence node 8 gives the final and optimal solution to the
knapsack problem.
Further suppose that the nodes correspond to points in a Euclidean space, e.g.,
a 3D room, and that the distance between any two points is their Euclidean
distance.
There is an algorithm whose solution has the following guarantee:
approxvalue < 2 × optvalue
where optvalue is the value of the optimal solution for the problem and approxvalue is the
value of the approximate solution that the algorithm outputs.
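The surviving text does not state which problem this guarantee refers to, but the 2 × optvalue bound matches the classical minimum-spanning-tree heuristic for the metric travelling salesman problem. Assuming that is the intended example, the following Python sketch builds an MST with Prim's algorithm and visits the vertices in a preorder walk; by the triangle inequality the resulting tour costs less than twice the optimal tour:

import math

def approx_metric_tsp(points):
    # points: list of (x, y) coordinates; distances are Euclidean, so the
    # triangle inequality holds and the tour below costs < 2 * optimal.
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])

    # Prim's algorithm: build a minimum spanning tree rooted at vertex 0.
    in_tree, parent = {0}, {0: None}
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        in_tree.add(j)
        parent[j] = i

    # A preorder walk of the tree gives the approximate tour.
    children = {i: [] for i in range(n)}
    for j, i in parent.items():
        if i is not None:
            children[i].append(j)
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    tour.append(0)                      # return to the starting vertex
    cost = sum(dist(u, v) for u, v in zip(tour, tour[1:]))
    return tour, cost

points = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 1)]
print(approx_metric_tsp(points))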
3.4 SUMMARY
Backtracking and branch and bound techniques apply some kind of intelligence to
the exhaustive search. They take exponential time in the worst case for difficult
combinatorial problems, but they provide efficient solutions if well designed.
Unlike exhaustive search, these techniques construct the solution step by
step (one element at a time) and evaluate the partial solution. If a partial
solution cannot lead to a complete solution, the corresponding part of the tree is not expanded further.
The state space tree is the principal mechanism employed by both the backtracking and branch
and bound techniques; it is a tree (often a binary tree) whose nodes represent partial solutions.
Expansion of the tree is stopped as soon as it is ensured that no solution can be achieved
by considering choices that correspond to a node's descendants.
Approximation algorithms are often applied to find near optimal solution to NP-hard
optimization problems. Approximation ratio is the main metric to measure the
accuracy of the solution to combinatorial optimization problems.
Ans. There are three distinct features of the branch and bound technique: (i) unlike
backtracking, which traverses the tree in DFS order, branch and bound does not restrict
the traversal to a particular order; (ii) the branch and bound technique computes a bound
(value) at a node to decide whether the node is promising or not promising; (iii) it is
generally used for optimization problems.
Q3. How is backtracking different from the branch and bound technique?
Ans. They differ in terms of the types of problems they can solve and how nodes in the
state space tree are generated:
(i) Backtracking generally does not apply to optimization problems whereas
branch and bound technique can be used to solve optimization problems
because it computes a bound on the possible values of the objective
function.
(ii) Depth first search is used to generate a tree in backtracking whereas there
is no such restriction applied to branch and bound to generate a state space
tree.