Errors
ERRORS 3
Since one aims to produce results within some chosen limit of error, it is
useful to consider error propagation. Roughly speaking, based on
experience, the propagated error depends on the mathematical
algorithm chosen, whereas the generated error is more sensitive to
the actual ordering of the computational steps. It is possible to be more
precise, as described below.
Absolute error
Relative error
This error is the ratio of the absolute error to the absolute value of
the exact number, i.e.,
(Note that the upper bound follows from the triangle inequality; thus
Error propagation
The magnitude of the propagated error is therefore not more than the
sum of the initial absolute errors; of course, it may be zero.
Error generation
so that the accumulated error does not exceed the sum of the
propagated and generated errors. Examples may be found in Step 4.
Example
1. 3.45+4.87-5.16
2. 3.55 x 2.73
There are two methods which the student may consider: The first is to
invoke the concepts of absolute and relative error as defined above.
Thus, the result for 1. is 3.16 +/- 0.015, since the maximum absolute
error is 0.005 + 0.005 + 0.005 = 0.015. We conclude that the answer is
3 (to 1S), for the number certainly lies between 3.145 and 3.175. In 2.,
the product 9.6915 is subject to the maximum relative error:
whence the maximum (absolute) error ≈ (2.73 + 3.55) × 0.005 ≈ 0.03, so
that the answer is 9.7.
and above by
and above by
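The two worked error bounds can be sketched in code. The fragment below is illustrative only (the helper names are not from the text); it applies the rules above: absolute errors add under addition and subtraction, while relative errors add (to first order) under multiplication.

```python
# Illustrative sketch: bounding propagated error, assuming each 3S value
# carries a maximum absolute error of 0.005 (half a unit in the last place).

def sum_error_bound(errors):
    """Maximum absolute error of a sum or difference: add the absolute errors."""
    return sum(abs(e) for e in errors)

def product_error_bound(x, y, ex, ey):
    """First-order maximum absolute error of a product x*y:
    |y|*ex + |x|*ey, obtained by adding the relative errors."""
    return abs(y) * ex + abs(x) * ey

# 1. 3.45 + 4.87 - 5.16 = 3.16 with maximum error 0.015
print(3.45 + 4.87 - 5.16, sum_error_bound([0.005, 0.005, 0.005]))

# 2. 3.55 x 2.73 = 9.6915 with maximum error about (2.73 + 3.55) x 0.005
print(3.55 * 2.73, product_error_bound(3.55, 2.73, 0.005, 0.005))
```

The product bound 0.0314 rounds to the 0.03 quoted above.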
Checkpoint
EXERCISES
Evaluate the following operations as accurately as possible, assuming
all values are correct to the number of digits given:
1. 8.24 + 5.33.
2. 124.53 - 124.52.
3. 4.27 x 3.13.
4. 9.48 x 0.513 - 6.72.
5. 0.25 x 2.84/0.64.
6. 1.73 - 2.16 + 0.08 + 1.00 - 2.23 - 0.97 + 3.02.
STEP 4
ERRORS 4
Interval arithmetic
Multiplication
The exponents are added and the mantissae are multiplied; the final
result is obtained by rounding (after shifting the mantissa right and
increasing the exponent by 1, if necessary). Thus:
Division
The exponents are subtracted and the mantissae are divided; the
final result is obtained by rounding (after shifting the mantissa
left and reducing the exponent by 1, if necessary). Thus:
(6.18×10^1 + 1.84×10^-1)/((4.27×10^1)×(3.68×10^1))
= (6.20×10^1)/(1.57×10^3) = 3.94904...×10^-2 ≈ 3.95×10^-2
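These rounding steps can be imitated with a small helper. The routine below is a sketch under the assumption that the text's arithmetic keeps a three-digit decimal mantissa and rounds after every operation; `fl3` is an invented name, and exponent over/underflow is ignored.

```python
# Sketch of three-digit decimal normalized floating point with rounding.
import math

def fl3(x):
    """Round x to a three-digit decimal mantissa d.dd x 10^e."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x)))
    return round(x / 10 ** e, 2) * 10 ** e

num = fl3(6.18e1 + 1.84e-1)   # 61.984 -> 6.20 x 10^1
den = fl3(4.27e1 * 3.68e1)    # 1571.36 -> 1.57 x 10^3
print(num, den, fl3(num / den))   # approximately 62.0, 1570.0, 0.0395
```

Note that Python's `round` uses round-half-to-even, which agrees with ordinary rounding except at exact ties.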
Generated error
Note that all the above examples (except the subtraction and the
first addition) involve generated errors which are relatively large
due to the small number of digits in the mantissae. Thus the
generated error in
2.77×10^2 + 7.55×10^2 = 10.32×10^2 ≈ 1.03×10^3
Consequences
1/a is 3.33×10^-1
and
then
whereas
5. Checkpoint
1. 6.19×10^2 + 5.82×10^2,
2. 6.19×10^2 + 3.61×10^1,
3. 6.19×10^2 - 5.82×10^2,
4. 6.19×10^2 - 3.61×10^1,
5. (3.60×10^3)×(1.01×10^-1),
6. (-7.50×10^-1)×(-4.44×10^1),
7. (6.45×10^2)/(5.16×10^-1),
8. (-2.86×10^-2)/(3.29×10^3).
9. Estimate the accumulated errors in the results of
Exercise 1, assuming that all values are correct to 3S.
10. Evaluate the following expressions, using four-digit
decimal normalized floating point arithmetic with
rounding, then recalculate them, carrying all decimal
places, and estimate the propagated error.
1. Given a = 6.842×10^-1, b = 5.685×10^1, c = 5.641×10^1,
find a(b - c) and ab - ac.
2. Given a = 9.812×10^1, b = 4.631×10^-1, c = 8.340×10^-1,
find (a + b) + c and a + (b + c).
3. Use four-digit decimal normalized floating point
arithmetic with rounding to calculate f(x) =
tan x - sin x for x = 0.1.
Since
tan x - sin x = tan x(1 - cos x) = tan x · 2 sin^2(x/2)
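The identity matters because the direct form loses leading digits by cancellation. The sketch below is illustrative (`fl4` is an invented four-digit rounding helper in the spirit of the exercise); it compares the two forms at x = 0.1:

```python
# Illustrative: catastrophic cancellation in f(x) = tan x - sin x for small x.
import math

def fl4(x):
    """Round x to a four-digit decimal mantissa."""
    if x == 0:
        return 0.0
    e = math.floor(math.log10(abs(x)))
    return round(x / 10 ** e, 3) * 10 ** e

x = 0.1
t, s = fl4(math.tan(x)), fl4(math.sin(x))
direct = fl4(t - s)                                 # nearly equal numbers cancel
rewritten = fl4(t * fl4(2 * math.sin(x / 2) ** 2))  # no cancellation
exact = math.tan(x) - math.sin(x)
print(direct, rewritten, exact)
```

The direct form gives 4.7×10^-4, while the rewritten form gives 5.011×10^-4, much closer to the exact value 5.0126...×10^-4.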
STEP 5
ERRORS
Approximation to functions
1. Taylor series
where
with
.
Note that this expansion has only odd-powered terms, although
the polynomial approximation is of degree (2k - 1); it has only k
terms. Moreover, the absence of even-powered terms means that
the same polynomial approximation is obtained with n = 2k, and
hence R(2k-1) = R(2k); the remainder term R(2k-1) given above is actually
the expression for R(2k). Since then
where
Polynomial approximation
The Taylor series provides a simple method of polynomial
approximation (of chosen degree n)
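The worked expansion itself did not survive reproduction here, but the description (odd powers only, degree 2k - 1 with k terms) fits the Maclaurin series for sin x, so a sketch under that assumption:

```python
# Sketch, assuming the series in question is sin x = x - x^3/3! + x^5/5! - ...
import math

def sin_taylor(x, k):
    """Degree (2k - 1) Taylor polynomial for sin x: k nonzero terms."""
    total, term = 0.0, x
    for i in range(1, k + 1):
        total += term
        term *= -x * x / ((2 * i) * (2 * i + 1))  # next odd-powered term
    return total

x = 0.5
for k in (1, 2, 3):
    print(k, sin_taylor(x, k), abs(sin_taylor(x, k) - math.sin(x)))
```

Each extra term multiplies the previous one by -x^2/((2i)(2i + 1)), so no factorials need be formed explicitly.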
3. Recursive procedures
Checkpoint
EXERCISES
STEP 6
NONLINEAR EQUATIONS 1
and all students will be familiar with the formula for its roots:
The formula for the roots of a general cubic is somewhat more
complicated and that for a general quartic usually takes several
pages to describe! We are spared further effort by a theorem which
states that there is no such formula for general polynomials of
degree higher than four. Accordingly, except in special cases (for
example, when factorization is easy), we prefer in practice to use a
numerical method to solve polynomial equations of degree higher than
two.
1. A transcendental equation
where is the area of the sector OAB, the triangle OAD. Hence
.
When we have solved the transcendental equation
we obtain h from
FIGURE 2.
Cylindrical tank (cross-section).
2. Locating roots
a)
We now know that the root lies between 1.49 and 1.50, and we
can use a numerical method to obtain a more accurate answer, as
is discussed in later Steps.
b)
We conclude that the roots lie between 0.15 and 0.2, 1.6 and 1.8,
and 3.1 and 3.2, respectively. Note that the values in the table
were calculated to an accuracy of at least 5SD. For example,
working to 5S accuracy, we have f(0.15) = 0.97045 - 0.79088 = 0.17957,
which is then rounded to 0.1796. Thus the entry in the table for f(0.15)
is 0.1796 and not 0.1795, as one might expect from calculating
0.9704 - 0.7909.
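Tabulation of this kind is easy to automate: evaluate f at equal steps and keep the intervals on which the sign changes. A sketch (the helper name is invented), using Exercise a) below as the test function:

```python
# Sketch: locate roots by tabulating f and recording sign changes.
import math

def sign_changes(f, a, b, step):
    """Return the subintervals [x, x + step] of [a, b] on which f changes sign."""
    intervals = []
    x = a
    while x + step <= b + 1e-12:
        if f(x) == 0 or f(x) * f(x + step) < 0:
            intervals.append((round(x, 10), round(x + step, 10)))
        x += step
    return intervals

f = lambda x: x + 2 * math.cos(x)   # Exercise a) below
print(sign_changes(f, -3.0, 0.0, 0.5))   # [(-1.5, -1.0)]
```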
Checkpoint
EXERCISES
a) x + 2 cos x = 0.
b) x + e^x = 0.
c) x(x - 1) - e^x = 0.
d) x(x - 1) - sin x = 0.
STEP 7
NONLINEAR EQUATIONS 2
3. Example
(Note that the values in the table are displayed to only 4D.)
Hence the root accurate to three decimal places is 0.258.
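As a sketch of the procedure (the equation is an assumption: f(x) = 3x e^x - 1, whose root 0.2576... matches the value quoted above and reappears in later Steps):

```python
# Sketch of the bisection method on f(x) = 3x e^x - 1 over [0, 1].
import math

def bisect(f, a, b, n):
    """Perform n bisection steps; f(a) and f(b) must differ in sign."""
    assert f(a) * f(b) < 0
    for _ in range(n):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:   # root lies in [a, m]
            b = m
        else:                  # root lies in [m, b]
            a = m
    return (a + b) / 2

f = lambda x: 3 * x * math.exp(x) - 1
print(round(bisect(f, 0.0, 1.0, 20), 3))   # 0.258
```

After n steps the root is pinned to an interval of length (b - a)/2^n, so the maximum error is halved at every step.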
Checkpoint
4. When may the bisection method be used to find a root of the
equation f(x) = 0?
5. What are the three possible choices after a bisection value is
calculated?
6. What is the maximum error after n iterations of the bisection
method?
EXERCISES
x+cosx = 0.
x - 0.2sinx - 0.5=0.
STEP 8
NONLINEAR EQUATIONS 3
1. Procedure
The curve y = f(x) is not generally a straight line. However, one may
join the points (a,f(a)) and (b,f(b)) by the straight line
so that
where we have used the fact that f( ) = 0. Thus we see that en+1 is
proportional to en·en-1, which may be expressed in mathematical
notation as
We seek k such that
3. Example
Then
The student may verify that doing one more iteration of the method of
false position yields an estimate x2 = 0.257628, for which the function
value is less than 5×10^-6. Since x1 and x2 agree to 4D, we conclude that
the root is 0.2576, correct to 4D.
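A sketch of the method (the stopping rule mirrors the 5×10^-6 function-value test used above; the function is again f(x) = 3x e^x - 1, an assumption consistent with the root 0.2576):

```python
# Sketch of the method of false position (regula falsi).
import math

def false_position(f, a, b, tol, max_iter=100):
    """Iterate x = b - f(b)(b - a)/(f(b) - f(a)), keeping a sign change."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0
    for _ in range(max_iter):
        x = b - fb * (b - a) / (fb - fa)   # secant through (a, fa), (b, fb)
        fx = f(x)
        if abs(fx) < tol:
            return x
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
    return x

f = lambda x: 3 * x * math.exp(x) - 1
print(round(false_position(f, 0.0, 1.0, 5e-6), 4))   # 0.2576
```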
Checkpoint
EXERCISES
1. Use the method of false position to find the smallest root of the
equation f (x) = 2 sin x + x - 2 = 0, stopping when
3sin x = x + 1/x.
STEP 9
NONLINEAR EQUATIONS 4
1. Procedure
2. Example
3xe^x = 1
to an accuracy of 4D.
x = e^-x/3 = φ(x).
Thus, we see that after eight iterations the root is 0.2576 to 4D. A
graphical interpretation of the first three iterations is shown in Fig. 8.
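A sketch of the iteration (the starting value x0 = 0 is an assumption; the text does not show it here):

```python
# Sketch of simple (fixed-point) iteration for 3x e^x = 1, rewritten x = e^(-x)/3.
import math

def fixed_point(phi, x0, n):
    """Apply x := phi(x) n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = phi(x)
    return x

phi = lambda x: math.exp(-x) / 3
print(round(fixed_point(phi, 0.0, 8), 4))   # 0.2576 after eight iterations
```

Convergence here follows from the derivative bound |e^(-x)/3| < 1 near the root.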
3. Convergence
x = 3/x = φ(x)
xn+1 = 3/xn.
where ξk is a point between the root and the approximation xk. We have
whence
Note that φ(x) = 3/x has derivative φ'(x) = -3/x^2, so |φ'(x)| > 1 for |x| < √3.
Checkpoint
1. Assuming x0 = 1, show by simple iteration that one root of the
equation 2x - 1 - 2 sin x = 0 is 1.4973.
2. Use simple iteration to find (to 4D) the root of the equation x + cos
x = 0.
STEP 10
NONLINEAR EQUATIONS 5
1. Procedure
Let x0 denote the known approximate value of the root of f(x) = 0 and h
the difference between the true value and the approximate value, i.e.,
and, consequently,
should be a better estimate of the root than x0. Even better
approximations may be obtained by repetition (iteration) of the
process, which then becomes
2. Example
We will use the Newton-Raphson method to find the positive root of
the equation sin x = x^2, correct to 3D.
and
yielding
and
so that
3. Convergence
If we write
may be rewritten
i.e., convergence is not as assured as, say, for the bisection method.
4. Rate of convergence
Since , we find
This result states that the error at the (n + 1)-th iteration is proportional
to the square of the error at the n-th iteration; hence, if , an
answer correct to one decimal place at one iteration should be
accurate to two places at the next iteration, four at the next, eight at
the next, etc. This quadratic (second-order) convergence outstrips
the rate of convergence of the methods of bisection and false
position!
f(x) = x^2 - a = 0.
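Applied to f(x) = x^2 - a, the Newton-Raphson step x - f(x)/f'(x) gives the classical square-root iteration, a convenient illustration of quadratic convergence:

```python
# Sketch: Newton-Raphson for f(x) = x^2 - a, i.e. x_{n+1} = (x_n + a/x_n)/2.
def newton_sqrt(a, x0, n):
    x = x0
    for _ in range(n):
        x = 0.5 * (x + a / x)   # x - f(x)/f'(x) with f(x) = x^2 - a, f'(x) = 2x
    return x

print(newton_sqrt(2.0, 1.0, 5))   # approaches 1.41421356..., doubling correct digits each step
```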
Checkpoint
EXERCISES
x cos x = 0.
STEP 11
Hence there arises a need for rapid and accurate methods for solving
systems of linear equations. The student will already be familiar
with solving by elimination systems of equations with two or three
variables. This Step presents a formal description of the Gauss
elimination method for n-variable systems and discusses certain
errors which might arise in their solutions. Partial pivoting, a
technique which enhances the accuracy of this method, is discussed
in the next Step.
Obviously, the dots indicate similar terms in the variables and the
remaining (n - 3) equations.
In this notation, the variables are denoted by x1, x2, ..., xn; they are
sometimes referred to as xi, i = 1, 2, ..., n. The coefficients of the
variables may be detached and written in a coefficient matrix:
The notation aij will be used to denote the coefficient of xj in the i-th
equation. Note that aij occurs in the i-th row and j-th column of the
matrix.
The system, now in upper triangular form, has the coefficient matrix:
Solution by back-substitution
First stage: Eliminate the coefficients a21 and a31, using row R1:
The matrix is now in the form which permits back-substitution. The
full system of equations at this stage, equivalent to the original system,
is
First stage: Eliminate the coefficients a21, a31,. . . , an1 by calculating the
multipliers
and then
and then
whence
We continue to eliminate unknowns, going on to columns 3, 4, ..., so that
by the beginning of the k-th stage we have the augmented matrix
and then
Notes
1. The diagonal elements akk, used at the k-th stage of the successive
elimination, are referred to as pivot elements.
2. In order to proceed from one stage to the next, the pivot elements
must be non-zero, since they are used as divisors in the multipliers
and in the final solution. If at any stage a pivot element vanishes,
rearrange the remaining rows of the matrix in order to obtain a
non-zero pivot; if this is not possible, then the system of linear
equations does not have a unique solution.
The manipulations leading to the solution are set out in tabular form
below. For the purpose of illustration, the calculations have been
executed in 3D floating point arithmetic. For example, at the first stage,
the multiplier 0.794 arises as follows:
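The elimination and back-substitution stages can be sketched compactly (here in full machine precision rather than the 3D arithmetic of the tabular example; pivots are assumed nonzero):

```python
# Sketch of Gauss elimination without pivoting, then back-substitution.
def gauss_solve(A, b):
    n = len(A)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    for k in range(n - 1):              # k-th elimination stage
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]       # multiplier m_ik
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n                       # back-substitution
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# Exercise a) below; the solution is (2, 1, 3)
print(gauss_solve([[1.0, 1.0, -1.0], [2.0, -1.0, 1.0], [3.0, 2.0, -4.0]],
                  [0.0, 6.0, -4.0]))
```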
Checkpoint
EXERCISES
a)
x1 + x2 - x3 = 0,
2x1 - x2 + x3 = 6,
3x1 + 2x2 - 4x3 = -4.
b)
5.6x + 3.8y + 1.2z = 1.4,
3.1x + 7.1y - 4.7z = 5.1,
1.4x - 3.4y + 8.3z = 2.4.
c)
2x + 6y + 4z = 5,
6x + 19y + 12z = 6,
2x + 8y + 14z = 7.
d)
1.3x + 4.6y + 3.1z = -1,
5.6x + 5.8y + 7.9z = 2,
4.2x + 3.2y + 4.5z = -3.
STEP 12
For any system of linear equations, the question concerning the errors
in a solution obtained by a numerical method is not readily answered. A
general discussion of the problems it raises is beyond the scope of this
book. However, some sources of errors will be indicated below.
In many practical cases, the coefficients of the variables, and also the
constants on the right-hand sides of the equations are obtained from
observations of experiments or from other numerical calculations. They
will have errors; and therefore, once the solution of a system has been
found, it too will contain errors. In order to show how this kind of error
is carried through calculations, we shall solve a simple example in two
variables, assuming that the constants have errors at most as large as
+/- 0.01. Consider the system:
whence (3/2)y lies between 2.985 and 3.015, i.e., y lies between 1.990
and 2.010. From the first equation, we now obtain
If the coefficients and constants of the system were exact, its exact
solution would be x = 1, y = 2. However, since the constants are not
known exactly, it does not make sense to talk of an exact solution; all
one can say is that 0.99 ≤ x ≤ 1.01 and 1.99 ≤ y ≤ 2.01.
In this example, the error in the solution is of the same order as that in
the constants. Yet, in general, the errors in the solutions are greater
than those in the constants.
3. Partial pivoting
In Gauss elimination, the buildup of round-off errors may be reduced by
rearranging the equations so that the use of large multipliers in the
elimination operations is avoided. The corresponding procedure is
known as partial pivoting (or pivotal condensation). The general
rule is: at each elimination stage, rearrange the rows of the
augmented matrix so that the new pivot element is larger in absolute
value than (or equal to) any element beneath it in its column.
In the following tabular solution, the pivot elements are in bold print.
(Note that the magnitude of all the multipliers is less than unity.)
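The rule amounts to adding one row-swap step to plain Gauss elimination; a sketch (full precision here, not the tabulated 3D arithmetic):

```python
# Sketch of Gauss elimination with partial pivoting.
def gauss_pp(A, b):
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # row of largest pivot
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]        # |m| <= 1 after the swap
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n                        # back-substitution
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(gauss_pp([[1.0, 1.0, -1.0], [2.0, -1.0, 1.0], [3.0, 2.0, -4.0]],
               [0.0, 6.0, -4.0]))   # approximately [2, 1, 3]
```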
4. Ill-conditioning
Certain systems of linear equations are such that their solutions are
very sensitive to small changes (and therefore to errors) in their
coefficients and constants. We give an example below in which 1%
changes in two coefficients change the solution by a factor of 10 or
more. Such systems are said to be ill-conditioned. If a system is
ill-conditioned, a solution obtained by a numerical method may differ
greatly from the exact solution, even though great care is taken to
keep round-off and other errors very small.
Checkpoint
EXERCISES
1. Find the range of solutions for the following system, assuming that
the maximum errors in the constants are as shown:
a)
b)
c)
a) Plot the lines of the first system on graph paper; then describe ill-
conditioning in geometrical terms when only two unknowns are
involved.
b) Insert the solution of the first system into the left-hand side of the
second system. Does x = 1, y = 2 look like a good solution to the second
system? Comment!
c) Insert the solution of the second system into the left-hand side of
the first system. Comment!
d) The system
STEP 13
The methods used in the previous Steps for solving systems of linear
equations are termed direct methods. If a direct method is used, and
round-off and other errors do not arise, an exact solution is reached
after a finite number of arithmetic operations. In general, of course,
round-off errors do arise; and when large systems are being solved by
direct methods, the growing errors can become so large as to render
the results obtained quite unacceptable.
1. Iterative methods
This text will only present one iterative method for linear equations,
due to Gauss and improved by Seidel. We shall use this method to
solve the system
The first step is to solve the first equation for x1, the second for x2, and
the third for x3 when the system becomes:
We begin with the starting vector x^(0) = (x1^(0), x2^(0), x3^(0)), all components
of which are 0, and then apply these relations repeatedly in the order
(1)', (2)' and (3)'. Note that, when we insert values for x1, x2 and x3 into
the right-hand sides, we always use the most recent estimates found
for each unknown.
3. Convergence
The student should check that the exact solution for this system is
(1,1,1). It is seen that the Gauss-Seidel solutions are rapidly
approaching these values; in other words, the method is converging.
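The system the text solves was not reproduced here, so the sketch below uses a hypothetical diagonally dominant system with the same exact solution (1, 1, 1):

```python
# Sketch of the Gauss-Seidel iteration on a diagonally dominant system
# (hypothetical example): 4x1 - x2 - x3 = 2, -x1 + 4x2 - x3 = 2, -x1 - x2 + 4x3 = 2.
def gauss_seidel(A, b, x0, n):
    x = x0[:]
    for _ in range(n):
        for i in range(len(A)):
            s = sum(A[i][j] * x[j] for j in range(len(A)) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # always use the newest estimates
    return x

A = [[4.0, -1.0, -1.0], [-1.0, 4.0, -1.0], [-1.0, -1.0, 4.0]]
b = [2.0, 2.0, 2.0]
print(gauss_seidel(A, b, [0.0, 0.0, 0.0], 10))   # rapidly approaching (1, 1, 1)
```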
Checkpoint
EXERCISES
1. For the example treated above, compute the value of S3, the
quantity used in the suggested stopping rule after the third
iteration.
2. Use the Gauss-Seidel method to solve the following systems to
5D accuracy (remembering to rearrange the equations if
appropriate). Compute the value of Sk (to 6D) after each
iteration.
a)
x - y + z = -7,
20x + 3y - 2z = 51,
2x + 8y + 4z = 25.
b)
10x - y = 1,
-x + 10y - z = 1,
-y + 10z - w = 1,
-z + 10w = 1.
STEP 14
Matrix inversion*
There are many numerical methods for finding the inverse of a matrix.
We shall describe one which uses Gauss elimination and back-
substitution procedures. It is simple to apply and is computationally
efficient. We shall illustrate the method by application to a 2 × 2 matrix
and a 3 × 3 matrix; it should then be clear to the reader how the
method may be extended for use with n × n matrices.
such that
Thus:
a. Form the augmented matrix
In this simple example, one can work with fractions, so that no
round-off errors occurred and the resulting inverse matrix is exact. In
general, during hand calculations, the final result should be checked by
computing AA^-1, which should be approximately equal to the identity
matrix I. As a 3 × 3 example, consider
which is approximately equal to I. The noticeable inaccuracy is due to
carrying out the calculation of the elements of A^-1 to 3S only.
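The procedure can be sketched as full Gauss-Jordan reduction of the augmented matrix [A | I] (a variant of the elimination-plus-back-substitution route described above; partial pivoting included, and the 2 × 2 example matrix is hypothetical):

```python
# Sketch: invert A by row-reducing [A | I] until the left half becomes I;
# the right half is then A^-1.
def invert(A):
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))   # partial pivoting
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]       # scale pivot row so the pivot is 1
        for i in range(n):
            if i != k:
                m = M[i][k]
                M[i] = [vi - m * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]

Ainv = invert([[4.0, 7.0], [2.0, 6.0]])   # exact inverse [[0.6, -0.7], [-0.2, 0.4]]
print(Ainv)
```

Checking AA^-1 ≈ I, as recommended above, catches both blunders and serious round-off.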
Checkpoint
1. In the method for finding the inverse of A, what is the final form
of A after the elementary row operations have been carried out?
2. Is the solution of the system Mx = d given by x = dM^-1 or x = M^-1 d
(or neither)?
3. Give a condition for a matrix not to have an inverse.
EXERCISES
1. 2. 3.
a.
b.
c.
STEP 15
Use of LU decomposition*
Note that in such a matrix all elements above the leading diagonal are
zero. Examples of upper triangular matrices are:
where all elements below the leading diagonal are zero. The product of
L1 and U1 is
1. Procedure
Stage l:
Write Ax = LUx = b.
Stage 2:
Note that the value of yi depends on the values y1, y2, ..., yi-1, which
have already been calculated.
Stage 3:
Example
Stage l:
Stage 2:
Solve
Back-substitution yields:
which you may check, using the original equations. We turn now to the
problem of finding an LU decomposition of a given square matrix
A.
Realizing an LU decomposition
Also, we saw that in the first stage we calculated the multipliers m21 =
a21/a11 = 1/1 = 1 and m31 = a31/a11 = 2/1 = 2, while, in the second stage, we
calculated the multiplier m32 = a32/a22 = -3/1 = -3. Thus
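The decomposed matrix itself was lost in reproduction; the sketch below therefore uses a hypothetical matrix chosen to give the same multipliers m21 = 1, m31 = 2, m32 = -3 quoted above:

```python
# Sketch: Doolittle LU decomposition (unit lower triangular L) built by
# storing the Gauss elimination multipliers m_ik in L.
def lu_decompose(A):
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m                 # record the multiplier
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

A = [[1.0, 2.0, 3.0], [1.0, 3.0, 5.0], [2.0, 1.0, 4.0]]   # hypothetical example
L, U = lu_decompose(A)
print(L)   # multipliers 1, 2 and -3 appear below the unit diagonal
print(U)
```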
Checkpoint
EXERCISES
1. Find an LU decomposition of the matrix
where
b.
STEP 16
1. Norms
One of the most common tests for ill-conditioning of a linear system
involves the norm of the coefficient matrix. In order to define this
quantity, we must consider first the concept of the norm of a vector
or matrix, which in some way assesses the size of their elements.
Let x and y be vectors. Then a vector norm ||·|| is a real number with
the properties:
There are many possible ways to choose a vector norm with these
properties. One vector norm which is probably familiar to the student is
the Euclidean or 2-norm. Thus, if x is an n × 1 vector, then the 2-norm
is denoted and defined by
2.
3.
4.
As for vector norms, there are many ways of choosing matrix norms
with the four properties above, but we will consider only the infinity
norm. If A is an n × n matrix, then the infinity norm is defined by
. As an example, let
Then
so that
2. Ill-conditioning tests
where we have used the matrix norm property 4 given in the preceding
section. Large values of the condition number usually indicate ill-
conditioning. As a justification for this last statement, we state and
prove the theorem:
we see that
It is seen from the theorem that even if the difference between b and
is small, the change in the solution, as measured by the `relative error'
may be large when the condition number is large. It follows
that a large condition number is an indication of possible ill-
conditioning of a system. A similar theorem for the case when there
are small changes to the coefficients of A may be found in more
advanced texts such as Atkinson (1993). Such a theorem also shows
that a large condition number is an indicator of ill-conditioning.
There arises the question: How large does the condition number have to
be for ill-conditioning to become a problem? Roughly speaking, if the
condition number is 10^m and the machine in use to solve a linear system
has kD accuracy, then the solution of the linear system will be accurate
to k - m decimal digits.
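A sketch for the 2 × 2 case (the matrix is a hypothetical nearly singular example, and `inf_norm`/`inv2` are invented helper names):

```python
# Sketch: infinity norm (maximum absolute row sum) and condition number
# ||A|| * ||A^-1|| for a 2 x 2 matrix.
def inf_norm(A):
    return max(sum(abs(v) for v in row) for row in A)

def inv2(A):
    """Inverse of a 2 x 2 matrix via the cofactor formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 1.0], [1.0, 1.0001]]        # nearly singular, hence ill-conditioned
cond = inf_norm(A) * inf_norm(inv2(A))
print(cond)   # about 4 x 10^4: expect to lose roughly 4 significant digits
```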
and
We recall that the definition of the condition number requires A-1, the
computation of which is expensive. Moreover, even if the inverse were
calculated, this approximation might not be very accurate, if the
system is ill-conditioned. It is therefore common in software packages
to estimate the condition number by obtaining an estimate of ,
without explicitly finding A-1.
Checkpoint
EXERCISES
1. For the 5 x 1 vector with elements x1 = 4, x2 = -6, x3 = -5, x4 = 1, and x5
= -1, calculate .
STEP 17
Writing as
we conclude from the theorem in STEP 11 that this equation can have a
non-zero solution only if the determinant |A - λI| is zero. If we
expand this determinant, then we get an n-th degree polynomial in λ,
known as the characteristic polynomial of A. Thus, one way to find
the eigen-values of A is to obtain its characteristic polynomial and then
to find the n zeros of this polynomial (some of them may be complex!).
For example, let
Then
2. Example
We will now use the power method to find the largest eigen-value
of the matrix
3. Variants
In the preceding example, the reader will have noted that the
components of w^(j) were growing in size as j increased. Overflow
problems would arise if this growth were to continue; hence, in
practice, it is usual to use instead the scaled power method.
This is identical to the power method except that the vectors w^(j)
are scaled at each iteration. Thus, suppose w^(0) is given and set
y^(0) = w^(0). Then we carry out for j = 1, 2, ... the steps:
d. calculate .
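Those steps can be sketched as follows (a hypothetical 2 × 2 symmetric matrix with eigen-values 3 and 1 serves as the example; a positive dominant eigen-value is assumed):

```python
# Sketch of the scaled power method: multiply by A, then divide by the
# largest-magnitude component; that component tends to the dominant eigen-value.
def scaled_power(A, w, n):
    for _ in range(n):
        w = [sum(a * y for a, y in zip(row, w)) for row in A]   # w := A y
        scale = max(w, key=abs)                                 # dominant component
        w = [v / scale for v in w]                              # y := w / scale
    return scale, w

A = [[2.0, 1.0], [1.0, 2.0]]          # eigen-values 3 and 1
lam, vec = scaled_power(A, [1.0, 0.0], 30)
print(lam, vec)   # approaches 3 and the eigen-vector (1, 1)
```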
We shall now discuss the use of the power method to find the
eigen-value of the smallest magnitude. If A has an inverse,
then may be written as
4. Other aspects
Checkpoint
EXERCISES
STEP 18
FINITE DIFFERENCES 1
Tables
1. Tables of values
2. Finite differences
Checkpoint
EXERCISES
1. Construct the difference table for the function f(x) = x^3 for
x = 0(1)6.
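Exercise 1 can be checked mechanically; a sketch (the helper name is invented):

```python
# Sketch: forward difference table of f(x) = x^3 for x = 0(1)6.
# For a cubic with unit step, third differences are the constant 3! = 6.
def difference_table(values):
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

values = [x ** 3 for x in range(7)]   # 0, 1, 8, 27, 64, 125, 216
for row in difference_table(values)[:4]:
    print(row)   # last printed row: [6, 6, 6, 6]
```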
STEP 19
FINITE DIFFERENCES 2
There are several different notations for the same set of finite
differences described in the preceding Step. We introduce each of
these three notations in terms of the so-called shift operator, which we
will define first.
Consequently,
and
then
,
which is the first-order forward difference at xj. Similarly, we
find that
then
then
5. Differences display
EXERCISES
1. ;
2. ;
3. .
8. For the difference table in Section 3 of STEP 18 of f(x) = e^x
for x = 0.1(0.05)0.5, determine to six significant digits the
quantities (taking x0 = 0.1):
1. ;
2. ;
3. ;
4. ;
5. ;
9. Prove the statements:
1. ;
2. ;
3. ;
4. .
STEP 20
FINITE DIFFERENCES 3
Polynomials
yields
.
Omitting the subscript of x, we find
In passing, the student may recall that in the Differential Calculus the increment is
2. Example
Construct for f(x) = x^3 with x = 5.0(0.1)5.5 the difference table:
Whenever the higher differences of a table become small (allowing for round-off noise), the function represented may be
approximated well by a polynomial. For example, reconsider the difference table of 6D for f(x) = e^x with x =
0.1(0.05)0.5:
Since the estimate for round-off error at (cf. the table in STEP 12), we say that third differences are constant
within round-off error, and deduce that a cubic approximation is appropriate for e^x over the range 0.1 < x < 0.5. An
example in which polynomial approximation is inappropriate occurs when f(x) = 10^x for x = 0(1)4, as is shown by the next
table:
Although the function f(x) = 10^x is `smooth', the large tabular interval (h = 1) produces large higher order finite differences.
It should also be understood that there exist functions that cannot usefully be tabulated at all, at least in certain
neighbourhoods; for example, f(x) = sin(1/x) near the origin x = 0. Nevertheless, these are fairly exceptional cases.
Finally, we remark that the approximation of a function by a polynomial is fundamental to the widespread use of finite
difference methods.
Checkpoint
1. What may be said about the higher order (exact) differences of a polynomial?
2. What is the effect of round-off error on the higher order differences of a polynomial?
EXERCISES
1. Construct a difference table for the polynomial f(x) = x^4 for x = 0(0.1)1 when
a.
b. the values of f are exact;
d. Compare the fourth difference round-off errors with the estimate +/-6.
e.
2. Find the degree of the polynomial which fits the data in the table: