DADDY BOB'S COMPUTER Q & A
FLOATING POINT ENUMERATION
Just about everything a
computer does, it does with just two digits,
0 and 1. That is why it is called binary
computing. So, if the computer can only use
two digits, how does it do complicated
mathematical computations? It uses something
called “Floating Point”. Earlier computers
had a separate coprocessor chip to handle
this, but since the 80486DX, the
floating-point unit has been built into the
main processor.
Explaining how the floating point system
works is a little complicated. I have tried
to make it as simple as I could, but after
reading over it, I’m not too sure I
succeeded. I’ll start off with some
definitions:
binary adj. Of or relating to a system of
numeration having 2 as its base.
mantissa n. Mathematics. The decimal part of
a logarithm. In the logarithm 2.95424, the
mantissa is 0.95424.
logarithm n. Mathematics. The power to which
a base, usually 10, but in this case 2, must
be raised to produce a given number. The
kinds most often used are the common
logarithm and the natural logarithm.
exponent n. A number or symbol, as 3 in
(x + y)^3, placed to the right of and above
another number, symbol, or expression,
denoting the power to which that number,
symbol, or expression is to be raised. In
this sense, also called power.
Floating point numbers are
numbers that are carried out to a certain
number of digits (such as 2.17986). They are
stored in three parts: the sign (plus or
minus), the significand or mantissa, which
is the digits that are meaningful, and the
exponent or order of magnitude of the
significand, which determines the place to
which the decimal point floats. Inside the
computer, floating point numbers are binary
(expressed in powers of 2).
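As a rough illustration of the three parts,
the sketch below pulls the sign, exponent,
and mantissa bits out of a Python float. It
assumes the float is stored as an IEEE 754
double (1 sign bit, 11 exponent bits, 52
fraction bits), a layout this article does
not itself describe, and it only handles
ordinary (normal) numbers. The helper name
float_parts is made up for this example.

```python
import struct

def float_parts(x):
    """Split a float into (sign, exponent, fraction bits).

    Assumes IEEE 754 double precision: 1 sign bit, 11 exponent
    bits (stored with a bias of 1023), and 52 fraction bits.
    """
    # Reinterpret the 8 bytes of the double as one 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                          # 0 = plus, 1 = minus
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the bias
    fraction = bits & ((1 << 52) - 1)          # bits after the binary point
    return sign, exponent, fraction

sign, exponent, fraction = float_parts(-6.0)
# -6.0 is -1.5 * 2**2, so the mantissa is 1 + fraction / 2**52 = 1.5
significand = 1 + fraction / 2**52
print(sign, exponent, significand)
```

Here the exponent says how far the binary
point has floated: 1.5 with the point moved
two places (times 2**2) gives 6.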
In computers, FLOPS are
floating-point operations per second.
Floating-point is, "a method of encoding
real numbers within the limits of finite
precision available on computers." Using
floating-point encoding, extremely large or
extremely small numbers can be handled
relatively easily. A floating-point number
is expressed as a basic number or mantissa,
an exponent, and a number base or radix
(which is often assumed). The number base is
ten in written scientific notation, but
inside a computer it is usually 2. Fast
floating-point operations rely on hardware
with floating-point registers. The
computation of floating-point numbers is
often required in scientific or real-time
processing applications, and FLOPS is a
common measure for any computer that runs
these applications.
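The relationship between mantissa, exponent,
and base can be written out in a few lines.
This is only a sketch: fp_value is a
hypothetical helper, and it uses exact
fractions so that the result is not itself
disturbed by the computer's own rounding.

```python
from fractions import Fraction

def fp_value(sign, mantissa, exponent, base=10):
    """Return sign * mantissa * base**exponent as an exact fraction."""
    return sign * mantissa * Fraction(base) ** exponent

print(fp_value(+1, 3, -1))           # 3/10
print(fp_value(-1, 5, 2))            # -500
print(fp_value(+1, 3, -1, base=2))   # 3/2
```

The same mantissa and exponent give a
different value when the base changes, which
is why the base has to be agreed on even
when it is not written down.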
In larger computers and
parallel processing, computer operations can
be measured in megaflops, gigaflops, teraflops,
petaflops, exaflops, zettaflops and
yottaflops.
To see how floating-point numbers are laid
out, consider a simple decimal system with
one-digit mantissas (1 through 9) and
exponents of -1, 0, and 1. With one-digit
mantissas there are only nine valid
mantissas. Because there are three possible
choices for an exponent, however, there are
27 different positive floating-point
numbers. For example, suppose that we choose
3 as a mantissa. We can then make the
numbers .3 (using an exponent of -1), 3
(using an exponent of 0), and 30 (using an
exponent of 1).
The floating-point numbers
between .1 and .9 are separated by intervals
of .1; the floating-point numbers between 1
and 9 are separated by intervals of 1; and
the floating-point numbers between 10 and 90
are separated by intervals of 10.
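The 27 numbers and their spacing can be
listed directly. The sketch below assumes
the simple decimal system described above
(mantissas 1 to 9, exponents -1 to 1) and
uses exact fractions so that the gaps come
out cleanly.

```python
from fractions import Fraction

# Toy decimal system from the text: mantissas 1..9, exponents -1..1.
numbers = sorted(
    m * Fraction(10) ** e
    for m in range(1, 10)
    for e in (-1, 0, 1)
)

print(len(numbers))                      # 27
print([float(x) for x in numbers[:9]])   # the nine numbers from .1 to .9

# Distinct gaps between each number and the next one up.
gaps = sorted({float(b - a) for a, b in zip(numbers, numbers[1:])})
print(gaps)                              # [0.1, 1.0, 10.0]
```

Only three gap sizes appear, one for each
exponent, which matches the intervals of .1,
1, and 10 described above.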
Increasing the range of the
exponents increases the range of the
positive floating-point numbers. If we let
the exponents run from -2 to 2, then in
addition to the 27 numbers above, we also
have nine numbers between .01 and .09 and
nine numbers between 100 and 900. But notice
that although we have enlarged the range of
our floating-point numbers, we have not
changed their spacing. Between .1 and 90,
the numbers are exactly as they were before.
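A quick check of this claim, as a sketch:
toy_floats is a hypothetical helper that
builds the one-digit-mantissa system for any
exponent range, and the wider range is
assumed to be -2 to 2, which is what the
figures .01 and 900 imply.

```python
from fractions import Fraction

def toy_floats(e_min, e_max):
    """One-digit-mantissa decimal floats with exponents e_min..e_max."""
    return sorted(
        m * Fraction(10) ** e
        for m in range(1, 10)
        for e in range(e_min, e_max + 1)
    )

narrow = toy_floats(-1, 1)   # the 27 numbers from .1 to 90
wide = toy_floats(-2, 2)     # adds .01 to .09 and 100 to 900

print(len(narrow), len(wide))            # 27 45
print(float(wide[0]), float(wide[-1]))   # 0.01 900.0

# Inside the old range nothing moved: same numbers, same spacing.
unchanged = [x for x in wide if narrow[0] <= x <= narrow[-1]] == narrow
print(unchanged)                         # True
```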
We can increase the maximum
mantissa size to 2 digits. If we do, the
range of the numbers will change only
slightly. Before they ranged from .01 to
900; now they range from .01 to 990. With
one-digit mantissas, the possible mantissas
ranged from 1 to 9 in increments of 1. With
two-digit mantissas, the possible mantissas
range from 1.0 to 9.9 in increments of .1.
The gaps between the positive
floating-point numbers are now ten times
smaller. For example, with one-digit
mantissas the numbers between 10 and 90
occurred in steps of 10; now they occur in
steps of 1.
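The effect of the extra mantissa digit can
be checked with a small sketch. The helpers
toy_floats and step_between are hypothetical
names for this example, and the exponent
range is assumed to be -2 to 2.

```python
from fractions import Fraction

def toy_floats(digits, e_min=-2, e_max=2):
    """Decimal floats whose mantissa has `digits` significant digits."""
    lo, hi = 10 ** (digits - 1), 10 ** digits   # e.g. 10..99 for 2 digits
    scale = Fraction(10) ** (digits - 1)        # turns 99 into 9.9
    return sorted(
        Fraction(k) / scale * Fraction(10) ** e
        for k in range(lo, hi)
        for e in range(e_min, e_max + 1)
    )

def step_between(numbers, low, high):
    """Set of gaps between neighbouring numbers in [low, high]."""
    inside = [x for x in numbers if low <= x <= high]
    return {b - a for a, b in zip(inside, inside[1:])}

one, two = toy_floats(1), toy_floats(2)
print(float(one[-1]), float(two[-1]))   # 900.0 990.0
print(step_between(one, 10, 90))        # {Fraction(10, 1)}
print(step_between(two, 10, 90))        # {Fraction(1, 1)}
```

The largest number barely moves, but the
step between 10 and 90 drops from 10 to 1.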
Between 0 and .01, there is a
noticeable gap where there are no
floating-point numbers whatsoever. This is
called the hole at zero. No matter how
small an exponent we allow, there will
always be a range of numbers close to zero
for which no good approximation exists in
our floating-point number system. Similarly,
no matter how large an exponent we allow,
there will always be a range of large
numbers for which no good approximation exists.
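One way to see the hole at zero is to compare
the distance from zero to the smallest
positive number with the distance from that
number to its next neighbor. The sketch below
assumes two-digit mantissas, where the effect
is easy to see; smallest_and_next is a
hypothetical helper.

```python
from fractions import Fraction

def smallest_and_next(digits, e_min):
    """Two smallest positive numbers in a toy decimal float system."""
    smallest = Fraction(10) ** e_min                # mantissa 1.0...0
    step = Fraction(10) ** (e_min - (digits - 1))   # one unit in the last digit
    return smallest, smallest + step

for e_min in (-2, -5, -10):
    smallest, nxt = smallest_and_next(digits=2, e_min=e_min)
    hole = smallest - 0        # distance from zero to the first number
    gap = nxt - smallest       # distance to its nearest neighbor above
    print(e_min, hole / gap)   # the ratio is 10 every time
```

Lowering the minimum exponent moves the hole
closer to zero, but relative to the spacing
of the numbers around it, the hole stays
just as wide.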
Overflow, Underflow, and Round off Error
Now let's turn our attention
to what happens when we do arithmetic
calculations with floating-point numbers.
Let's return to a simple example with
one-digit mantissas and exponents ranging
from -1 to 1.
As an example, let's take the
sum of the numbers 5 + 7, which should equal
12. However, in floating point with only a
one-digit mantissa, it equals only 10.
Because 12 is not a valid floating-point
number in the current example, it is rounded
to the nearest number that is. In this case,
the closest floating-point number to the sum
is 10. The error in the result that is
caused by rounding (here, 2) is called
round off error.
Now if we increase the
mantissa size from 1 to 2, the sum will be
12 as it should be, and no rounding is
needed. This is because, in the new
floating-point system with two digits of
mantissa, 12 is a valid floating-point
number.
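Python's standard decimal module can stand in
for this toy system, which makes the rounding
easy to try out. This is a sketch under the
assumption that the module's prec setting
plays the role of the mantissa size and that
Emin and Emax play the role of the exponent
limits; the names one_digit and two_digit are
made up for this example.

```python
from decimal import Context, Decimal

# prec is the mantissa size in digits; Emin and Emax bound the exponent.
one_digit = Context(prec=1, Emin=-1, Emax=1)
two_digit = Context(prec=2, Emin=-1, Emax=1)

rounded = one_digit.add(Decimal(5), Decimal(7))
exact = two_digit.add(Decimal(5), Decimal(7))

print(rounded)           # 1E+1, that is, 10
print(exact)             # 12
print(exact - rounded)   # 2, the round off error
```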
Now we will calculate the sum
of 50 and 70. The true sum, 120, is too
large to fit. When the result of a
calculation is too large to represent in a
floating-point number system, we say that
overflow has occurred. This happens when a
number's exponent is larger than allowed.
For example, the exponent of 120 is 2 (in
other words, 120 = 1.2e2), and in the
current floating-point example no exponents
larger than 1 are allowed.
If we increase the mantissa
size from 1 to 2 the result is still too
large to represent as a floating-point
number. The mantissa size has no effect on
overflow errors, only on round off errors.
So, we will increase the
maximum exponent from 1 to 2. This time the
correct sum will be calculated. This, of
course, is because the exponent (2) of the
sum is now within the maximum allowed.
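The overflow cases can be reproduced with
Python's standard decimal module standing in
for the toy system. This is a sketch: it
assumes prec corresponds to the mantissa size
and Emax to the largest allowed exponent, and
that the final case keeps the two-digit
mantissa so that 120 can be stored exactly.
The helper try_add is hypothetical.

```python
from decimal import Context, Decimal, Overflow

def try_add(a, b, prec, e_max):
    """Add a + b in a toy decimal system; report overflow if it happens."""
    ctx = Context(prec=prec, Emin=-1, Emax=e_max, traps=[Overflow])
    try:
        return ctx.add(Decimal(a), Decimal(b))
    except Overflow:
        return "overflow"

print(try_add(50, 70, prec=1, e_max=1))   # overflow
print(try_add(50, 70, prec=2, e_max=1))   # overflow: more digits do not help
print(try_add(50, 70, prec=2, e_max=2))   # 1.2E+2, that is, 120
```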
Just as overflow occurs when
an exponent is too big, underflow occurs
when an exponent is too small. To see this,
we will calculate the quotient of 1 and 50.
This time the quotient (0.02) is too small,
which is an underflow error. The problem is
that the exponent of 0.02 is -2, which is
less than the permitted -1. If you decrease
the minimum exponent to -2, the underflow
error will be eliminated.
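The same kind of experiment works for
underflow. As before, this is only a sketch
that uses Python's standard decimal module as
a stand-in for the toy system, with a
one-digit mantissa and Emin as the smallest
allowed exponent; try_divide is a
hypothetical helper.

```python
from decimal import Context, Decimal, Underflow

def try_divide(a, b, e_min):
    """Divide a / b with a one-digit mantissa; report underflow."""
    ctx = Context(prec=1, Emin=e_min, Emax=1, traps=[Underflow])
    try:
        return ctx.divide(Decimal(a), Decimal(b))
    except Underflow:
        return "underflow"

print(try_divide(1, 50, e_min=-1))   # underflow
print(try_divide(1, 50, e_min=-2))   # 0.02
```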