1.2 Lesson 2

```
# Lesson 2: Floating point computations
# ===========================
#
# As we saw in Lesson 1,  Maple will write decimals or floating-point numbers
# with a certain number of significant digits, specified by the variable "Digits".
# Here are some floating-point numbers, with the default Digits = 10.
> evalf(exp(20)), evalf(exp(-7)), evalf(exp(-20));

.4851651954*10^9, .0009118819656, .2061153622*10^(-8)
# In principle a floating point number should be written as
# (possibly -) .digits * 10^exponent, but Maple doesn't use the exponent if it is
# small.  The exponent can be anywhere from  -9989 to 10009 for a floating
# point number to be written this way.
> bignumber:=evalf(.10*10^10009);

bignumber := .1000000000*10^10009
> smallnumber:= evalf(.10*10^(-9989));

smallnumber := .1000000000*10^(-9989)
# That should be a big enough range for most purposes, but actually there is
# another notation that is used for exponents outside that range, and these can be
# used for some REALLY big or small numbers.
> bignumber*10, bignumber^1000, smallnumber^10000;

.1000000000*10^10010, .1000000000*10^10008001, .1000000000*10^(-99899999)
# Many implementations of floating point actually store and calculate numbers
# in base 2 (converting to base 10 for output), but Maple actually uses base 10.
# So this calculation has no roundoff error, because 1/5 can be represented
# exactly in base 10:
> evalf(1/5)*5;

1.000000000
# This one, on the other hand, does have roundoff error, because 1/3 can't be
# represented exactly.
> evalf(1/3)*3;

.9999999999
# You can think of a floating point number as representing a whole interval of
# numbers, all of which have the same floating point representation.  Thus
# everything from 1.9999999995 to 2.0000000005 is represented as 2.000000000.
# When you add a small floating point number to a larger number, some
# significant digits of the small number are lost.
> .1234567891*10^(-5)+1;

1.000001235
# In an extreme case, adding two numbers that differ by at least "Digits" orders
# of magnitude, the smaller number is lost altogether.
> 1.+ 10^(-10);

1.000000000
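# The smaller number is not gone for good, though: at a higher setting of
# "Digits" it would be kept.
> Digits:= 12: 1. + 10^(-10); Digits:= 10:

1.00000000010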
# This can have serious consequences if you are trying to add up many numbers
# that are each very small (e.g. in numerical integration).  Suppose we start with
# 0 and add 10000 numbers, each equal to 4.444444444 * 10^(-4).  The correct
# answer would be 4.444444444.  I'll use a "do loop": the command
# total:= total + 4.444444444 * 10^(-4) will be executed 10000 times, once for
# each integer value of j from 1 to 10000.  Any sequence of commands could go
# between the "do" and the "od" (which is "do" backwards).  Note the use of ":"
# instead of ";", because we don't want to see 10000 outputs.
> total:= 0:
> for j from 1 to 10000 do
>    total:= total + 4.444444444 * 10^(-4) od:
> total;

4.444440910
# The way to fix this would be to increase Digits: 14 should be enough.
> Digits:= 14:
> total:= 0:
> for j from 1 to 10000 do
>    total:= total + 4.444444444 * 10^(-4) od:
> total;

4.4444444440000
> Digits:= 10:
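# Another way to fix it: since Maple computes with exact rationals, we can do
# the sum in exact arithmetic and convert to floating point only at the end.
> evalf(10000 * (4444444444/10^13));

4.444444444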
# Another case where roundoff error can be severe is when we subtract two
# numbers that are almost equal.  This is called "catastrophic cancellation".
> 1.000000001 - 1.000000000;

.1*10^(-8)
# We subtracted two numbers with 10 significant digits and the result has only
# one significant digit.
# One good way to think about this is in terms of absolute and relative errors.  In
# approximating a "true value" T by an approximate value A, the absolute error
# is |T-A|.  The relative error is |T-A|/|T|.  In representing a number in floating
# point the relative error should be at most 5 * 10^(-Digits).
# When two numbers are multiplied or divided, the relative error of the result is
# at most (approximately) the sum of the two relative errors.  This usually
# doesn't cause any problems.
# When two numbers are added or subtracted, the absolute error of the result is
# at most the sum of the two absolute errors.  If the result is close to 0, this
# means the relative error can be much larger than the relative errors of the
# original numbers.
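# As a quick check of that bound: at Digits = 10 the number 1/3 is stored as
# .3333333333.  Computing its absolute error at a higher setting of "Digits":
> Digits:= 15: evalf(1/3) - .3333333333; Digits:= 10:

.33333*10^(-10)
# so the relative error is about three times that, 10^(-10), well within the
# 5*10^(-10) bound.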
# Here is an example of catastrophic cancellation affecting the solution of a
# quadratic equation.  Here's the solution formula from high school.
> r:= solve(a*x^2+b*x+c=0, x);

r := 1/2*(-b + (b^2 - 4*a*c)^(1/2))/a, 1/2*(-b - (b^2 - 4*a*c)^(1/2))/a
# Let's try it when b is large compared to a and c.
> s:=solve(x^2 + 9751*x + 1 = 0, x);

s := -9751/2 + 1/2*95081997^(1/2), -9751/2 - 1/2*95081997^(1/2)
# Let's use "evalf" to get floating point answers.
# There's no accuracy problem with the second one.  We get 10 significant digits.
> evalf(s[2]);

-9750.999898
# The first, however, involves subtracting two numbers that are very close
# together. The result has about the same absolute error as for the second
# solution, but  only 3 significant digits.
> evalf(s[1]);

-.000102
# Using a higher setting of "Digits" gives us more accuracy.
> evalf(s[1], 20); evalf(s[2], 20);

-.0001025535853263

-9750.9998974464146737
# But we can actually get better relative error at the same setting of "Digits" by
# using a different formula that won't involve this catastrophic cancellation.
# Recall that the product of the roots is c/a, the constant term divided by
# the leading coefficient.
> expand(r[1]*r[2]);

c/a
# So the first root should be
> r[1] = c/(a*r[2]);

1/2*(-b + (b^2 - 4*a*c)^(1/2))/a = 2*c/(-b - (b^2 - 4*a*c)^(1/2))
# which won't produce a catastrophic cancellation (when b is positive).  In our
# case:
> evalf(1/s[2]);

-.0001025535853
# In addition to "solve" which finds exact solutions (when possible), Maple has
# "fsolve" which finds floating-point approximations to solutions of equations,
# even those that "solve" can't handle.  The syntax is similar to that of "solve".
# Let's see how it handles our quadratic:
> fsolve(x^2 + 9751*x + 1 = 0);

-9750.999897, -.0001025535853
# It won't work, of course, on equations that have symbolic parameters.
> fsolve(a*x^2 + b*x + c=0, x);
Error, (in fsolve) should use exactly all the indeterminates

# How about our fifth-degree polynomial from last time?
> fsolve(x^5 - x^3 + 2*x + 1);

-.5603499931
# For a polynomial, "fsolve" finds all the real roots.  This polynomial has only
# one.  There should be four other complex roots.  To tell "fsolve" to find both
# real and complex roots, you give it a third input, the word "complex".
> s:=fsolve(x^5 - x^3 + 2*x + 1, x, complex);

s := -.7961643382 - .6551644381 I, -.7961643382 + .6551644381 I, -.5603499931,
1.076339335 - .7212070670 I, 1.076339335 + .7212070670 I
# Should we trust "fsolve" here?  One way to check a solution is to substitute the
# answer into the original problem.  The "subs" command does this.
> subs(x = s[1], x^5 - x^3 + 2*x + 1);

.6*10^(-9) + .4*10^(-9) I
# Not bad.  You can't expect more accuracy than that at Digits=10.  To get more
# accuracy, of course, you can use a higher "Digits" setting. Now let's try a
# non-polynomial equation that stumped "solve".
> fsolve(sin(x)=1-x, x);

.5109734294
# This one has only one real solution (since x + sin(x) is an increasing function).
# But even if there were more, "fsolve" would only give us one solution for a
# non-polynomial equation.
> fsolve((x^2-1)/(x+2), x);

-1.000000000
# For the polynomial x^2-1 itself, "fsolve" would give us both solutions.
> fsolve(x^2-1, x);

-1.000000000, 1.
# Of course the solutions of (x^2-1)/(x+2)=0 are the two solutions of x^2-1=0,
# but since the expression was not a polynomial "fsolve" only gave us one
# solution.
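# One way around this is to give "fsolve" just the numerator, which is a
# polynomial (the "numer" command extracts it):
> fsolve(numer((x^2-1)/(x+2)), x);

-1.000000000, 1.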
# Let's try something more complicated.
> f:= x - 10*sin(x);

f := x - 10 sin(x)
# Before trying "fsolve", let's get a picture of what happens here by plotting a
# graph.
# The "plot" command, in its simplest form, takes two inputs: an expression f(v)
# depending on a variable v, and a range for the v axis in the form v = a .. b.
# This plots the curve y = f(v) for v from a to b.  The scale for the y axis is
# chosen automatically.  If you prefer, you can set it yourself by including
# another input y = c .. d.
> plot(f, x=-15 .. 15);
# The plot comes up in a separate window, which can be copied (in the Edit
# menu of the plot window) and pasted into the worksheet (in the Edit menu of
# the worksheet).
#

** Maple V Graphics **

# There appear to be 7 real solutions of f=0, at about x=-8.5, -7, -3, 0, 3, 7 and
# 8.5.  In general you'd have to worry about solutions outside the interval we
# plotted, but here it's not hard to show there are no solutions outside the
# interval [-10, 10], because -1 <= sin(x) <= 1 means x-10 <= f <= x+10.
# By the way, there's a very useful feature of these plot windows: if you click the
# left mouse button with the mouse pointer in the window, Maple shows you the
# X and Y coordinates of the tip of the arrow.  This lets you locate any point on
# the plot with better precision than you would have guessing "by eye", especially
# for points not on the axes.
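# If we want a closer look at the crossings, we can set the vertical range
# ourselves, using the optional second input to "plot" described above:
> plot(f, x=-15 .. 15, y=-12 .. 12);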
> fsolve(f);

2.852341894
# This is the solution near 3.  To get other solutions, you can use an option in
# "fsolve" to find roots in a given interval.
> fsolve(f, x=6 .. 8);

7.068174358
> fsolve(f, x=8 .. 9);

8.423203932
# The other roots can be obtained by symmetry (f is an odd function).
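# For instance, asking for the root near -8.5 directly should give the
# negative of the one we found near 8.5:
> fsolve(f, x=-9 .. -8);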
#
#

```