Daddy Bob

DADDY BOB'S COMPUTER Q & A

 

FLOATING POINT ENUMERATION

To download this article in
 MS Word format, click HERE

Just about everything a computer does, it does with just two digits: 0 and 1. That is why it is called binary computing. So, if the computer can only use two digits, how does it do complicated mathematical computations involving fractions and very large or very small numbers? It uses something called “floating point”. Earlier computers had a separate chip (a math coprocessor) that handled this, but since the 80486, both functions have been combined into one chip. Explaining how the floating point system works is a little complicated. I have tried to make it as simple as I could, but after reading over it, I’m not too sure I succeeded.

I’ll start off with some basic definitions:

bi-na-ry adj. Of or relating to a system of numeration having 2 as its base.

man-tis-sa n. Mathematics. The decimal part of a logarithm. In the logarithm 2.95424, the mantissa is 0.95424.

log-a-rithm n. Mathematics. The power to which a base, usually 10, but in this case 2, must be raised to produce a given number. The kinds most often used are the common logarithm and the natural logarithm.

ex-po-nent n. A number or symbol, as 3 in (x + y)³, placed to the right of and above another number, symbol, or expression, denoting the power to which that number, symbol, or expression is to be raised. In this sense, also called power.

Floating point numbers are the computer's way of writing numbers that have a fractional part (such as 2.17986) using a fixed number of digits. They are stored in three parts: the sign (plus or minus); the significand or mantissa, which holds the digits that are meaningful; and the exponent, or order of magnitude of the significand, which determines the place to which the decimal point floats. Inside the computer, floating point numbers are binary: the significand is scaled by a power of 2 rather than a power of 10.
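To make the three parts concrete, here is a small Python sketch that pulls them out of an ordinary number. It assumes that Python's float is a 64-bit IEEE 754 double (1 sign bit, 11 exponent bits, 52 fraction bits), which is the usual case but is not guaranteed on every machine. The function name decompose is made up for this illustration.

```python
import math
import struct

def decompose(x: float):
    """Return the (sign, biased exponent, fraction) bit fields of a 64-bit float."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit: 0 = plus, 1 = minus
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)    # 52 bits after the implied leading 1
    return sign, exponent, fraction

sign, exponent, fraction = decompose(-6.5)
print(sign)                      # 1 (negative)
print(exponent - 1023)           # 2, because 6.5 = 1.625 * 2**2
print(1 + fraction / 2**52)      # 1.625

# math.frexp gives a similar split, with the mantissa scaled into [0.5, 1).
print(math.frexp(6.5))           # (0.8125, 3), since 0.8125 * 2**3 = 6.5
```

Note that the exponent here is a power of 2, not a power of 10.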

In computers, FLOPS are floating-point operations per second. Floating-point is, "a method of encoding real numbers within the limits of finite precision available on computers." Using floating-point encoding, extremely large and extremely small numbers can be handled relatively easily. A floating-point number is expressed as a basic number or mantissa, an exponent, and a number base or radix (which is often assumed). In computer hardware the number base is usually 2, although base 10 is also used. Fast floating-point operations need a processor with floating-point registers. The computation of floating-point numbers is often required in scientific or real-time processing applications, and FLOPS is a common measure for any computer that runs these applications.

In larger computers and parallel processing, computer operations can be measured in megaflops, gigaflops, teraflops, petaflops, exaflops, zettaflops and yottaflops.

 Mantissas and Exponents

To see how this works, imagine a small decimal floating-point system in which the mantissa is a single digit from 1 to 9 and the exponent can be -1, 0, or 1. With one-digit mantissas there are only nine valid mantissas. Because there are three possible choices for an exponent, however, there are 27 different positive floating-point numbers. For example, suppose that we choose 3 as a mantissa. We can then make the numbers .3 (using an exponent of -1), 3 (using an exponent of 0), and 30 (using an exponent of 1).

The floating-point numbers between .1 and .9 are separated by intervals of .1; the floating-point numbers between 1 and 9 are separated by intervals of 1; and the floating-point numbers between 10 and 90 are separated by intervals of 10.
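The 27 numbers can be listed with a few lines of Python. This is only a sketch of the toy decimal system described here, not of how real hardware stores numbers. The function name toy_floats and its parameters are invented for the example, and Fraction is used so that values like .3 stay exact.

```python
from fractions import Fraction

def toy_floats(digits=1, min_exp=-1, max_exp=1):
    """Sorted positive values mantissa * 10**exponent of the toy system."""
    scale = 10 ** (digits - 1)              # 1 for one-digit mantissas
    return sorted(
        Fraction(m, scale) * Fraction(10) ** e
        for e in range(min_exp, max_exp + 1)
        for m in range(scale, 10 * scale)   # mantissas 1 through 9
    )

values = toy_floats()
print(len(values))                   # 27
print([float(v) for v in values])    # .1 to .9, then 1 to 9, then 10 to 90
```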

Increasing the range of the exponents increases the range of the positive floating-point numbers. Suppose the exponent may now run from -2 to 2 instead of -1 to 1. In addition to the 27 numbers above, we also can have nine numbers between .01 and .09 and nine numbers between 100 and 900, for 45 in all. But notice that although we have enlarged the range of our floating-point numbers, we have not changed their spacing. Between .1 and 90, the numbers are exactly as they were before.

We can also increase the mantissa size from one digit to two. If we do, the range of the numbers will change only slightly. Before they ranged from .01 to 900; now they range from .01 to 990. With one-digit mantissas, the possible mantissas ranged from 1 to 9 in increments of 1. With two-digit mantissas, the possible mantissas range from 1.0 to 9.9 in increments of .1.

The gaps between the positive floating-point numbers are now ten times smaller. For example, with one-digit mantissas the numbers between 10 and 90 occurred in steps of 10; now they occur in steps of 1.
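A short Python sketch can compare the two mantissa sizes side by side, with exponents from -2 to 2 in both cases. The helper names toy_floats and step_between are made up for this illustration. For each mantissa size it prints the number of values, the smallest, the largest, and the step between neighbours in the range 10 to 100.

```python
from fractions import Fraction

def toy_floats(digits, min_exp=-2, max_exp=2):
    """Sorted positive toy values with a `digits`-digit decimal mantissa."""
    scale = 10 ** (digits - 1)              # 1 for one digit, 10 for two digits
    return sorted(
        Fraction(m, scale) * Fraction(10) ** e
        for e in range(min_exp, max_exp + 1)
        for m in range(scale, 10 * scale)   # mantissas 1..9 or 1.0..9.9
    )

def step_between(values, low, high):
    """Gap between the first two neighbouring values in [low, high)."""
    inside = [v for v in values if low <= v < high]
    return inside[1] - inside[0]

for digits in (1, 2):
    values = toy_floats(digits)
    print(digits, len(values), values[0], values[-1], step_between(values, 10, 100))
# 1 45 1/100 900 10
# 2 450 1/100 990 1
```

The extra mantissa digit gives ten times as many numbers but barely moves the largest one, which matches the description above.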

Between 0 and .01, there is a noticeable gap where there are no floating-point numbers whatsoever. This is called the hole at zero.  No matter how small an exponent we allow, there will always be a range of numbers close to zero for which no good approximation exists in our floating-point number system. Similarly, no matter how large an exponent we allow, there will always be a range of large numbers for which no good approximation exists.
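The binary numbers in a real computer have the same two limits. The Python sketch below looks at them, assuming that float is an IEEE 754 double with the usual "subnormal" values enabled. Subnormals are extra, less precise numbers that partly fill the hole at zero, but they do not remove it. The literal 5e-324 is the smallest positive double under that assumption.

```python
import sys

smallest_normal = sys.float_info.min    # about 2.2e-308
largest = sys.float_info.max            # about 1.8e308
smallest_subnormal = 5e-324             # assumes IEEE 754 doubles

print(smallest_normal, largest)
print(smallest_normal / 2 > 0)          # True: the result is a subnormal value
print(smallest_subnormal / 2)           # 0.0: nothing is left between this and zero
print(largest * 2)                      # inf: past the largest number
```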

 Overflow, Underflow, and Round off Error

Now let's turn our attention to what happens when we do arithmetic calculations with floating-point numbers. Let's return to a simple example with one-digit mantissas and exponents ranging from -1 to 1.

As an example, let's take the sum 5 + 7, which should equal 12. However, in floating point with a one-digit mantissa, it equals only 10. Because 12 is not a valid floating-point number in the current example, it is rounded to the nearest number that is a floating-point number. In this case, the closest floating-point number to the sum is 10. The error in this result that is caused by rounding is called round off error.

Now if we increase the mantissa size from one digit to two, the sum is 12 as it should be, and no rounding is performed. This is because, in the new floating-point system with two digits of mantissa, 12 is a valid floating-point number.
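Rounding to the nearest valid number can be sketched in Python by listing every number in the toy system and picking the closest one. This is a brute-force illustration only, not how hardware rounds, and the names toy_floats and toy_round are invented for the example. Exponents run from -1 to 1 as in the text.

```python
from fractions import Fraction

def toy_floats(digits, min_exp=-1, max_exp=1):
    """Sorted positive toy values with a `digits`-digit decimal mantissa."""
    scale = 10 ** (digits - 1)
    return sorted(
        Fraction(m, scale) * Fraction(10) ** e
        for e in range(min_exp, max_exp + 1)
        for m in range(scale, 10 * scale)
    )

def toy_round(x, digits):
    """Nearest representable toy value to x (a tie goes to the smaller value)."""
    return min(toy_floats(digits), key=lambda v: abs(v - x))

one_digit = toy_round(5 + 7, digits=1)
two_digit = toy_round(5 + 7, digits=2)
print(one_digit, "round off error:", abs(12 - one_digit))   # 10 round off error: 2
print(two_digit, "round off error:", abs(12 - two_digit))   # 12 round off error: 0
```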

Now we will calculate the sum of 50 and 70. The true sum, 120, is too large to fit. When the result of a calculation is too large to represent in a floating-point number system, we say that overflow has occurred. This happens when a number's exponent is larger than allowed. For example, the exponent of 120 is 2 (in other words, 120 = 1.2e2), and in the current floating-point example no exponents larger than 1 are allowed.

If we increase the mantissa size from one digit to two, the result is still too large to represent as a floating-point number. The mantissa size has no effect on overflow errors, only on round off errors.

So, we will increase the maximum exponent from 1 to 2. This time the correct sum, 120, will be calculated (keeping the two-digit mantissa, since 120 needs two significant digits). This, of course, is because the exponent (2) of the sum is now within the maximum allowed.

Just as overflow occurs when an exponent is too big, underflow occurs when an exponent is too small. To see this, we will calculate the quotient of 1 and 50. This time the quotient (0.02) is too small, which is an underflow error. The problem is that the exponent of 0.02 is -2, which is less than the permitted -1. If we decrease the minimum exponent to -2, the underflow error will be eliminated.
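Both errors come from checking the exponent against its allowed range, which the following Python sketch does for the toy decimal system. Everything here is an assumption made for illustration: the function toy_encode, its parameters, and the UnderflowError class are invented names (Python has a built-in OverflowError but no built-in underflow exception). Mantissa ties are rounded to the even digit because that is what Python's round does.

```python
from fractions import Fraction

class UnderflowError(ArithmeticError):
    """Hypothetical exception for results too small for the toy system."""

def toy_encode(x, digits=1, min_exp=-1, max_exp=1):
    """Round positive x to `digits` digits and return (mantissa, exponent)."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    e = 0
    while x >= Fraction(10) ** (e + 1):     # find e with 1 <= x / 10**e < 10
        e += 1
    while x < Fraction(10) ** e:
        e -= 1
    scale = 10 ** (digits - 1)
    m = round(x / Fraction(10) ** e * scale)    # mantissa digits as an integer
    if m == 10 * scale:                     # rounding carried, e.g. 9.6 -> 10
        m, e = scale, e + 1
    if e > max_exp:
        raise OverflowError(f"exponent {e} is larger than {max_exp}")
    if e < min_exp:
        raise UnderflowError(f"exponent {e} is smaller than {min_exp}")
    return Fraction(m, scale), e

for args in [dict(x=50 + 70), dict(x=50 + 70, digits=2),
             dict(x=50 + 70, digits=2, max_exp=2),
             dict(x=Fraction(1, 50)), dict(x=Fraction(1, 50), min_exp=-2)]:
    try:
        print(args, "->", toy_encode(**args))
    except ArithmeticError as err:
        print(args, "->", type(err).__name__, err)
```

The first two calls overflow whatever the mantissa size, the third succeeds once the maximum exponent is 2, and the last two show the same pattern for underflow.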

