Floating point inaccuracy examples

Question

How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate? Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation? How is this taught in Computer Science classes?

User · Accepted Answer

There are basically two major pitfalls people stumble into with floating-point numbers.

The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten” since there is no way to fit it into the larger scale.
```
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
```
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.

(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
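
The same "eaten" effect can be reproduced in Python, whose `float` is a 64-bit double on common platforms. This is only a sketch: the variable names are made up for illustration, and the loop count is an arbitrary choice.

```python
# Adding a tiny value to a much larger one: the small operand is absorbed.
a = 1.0
b = 1e-25

absorbed = (a + b == a)          # b is far below the precision available near 1.0
print(a + b, absorbed)

# The same effect with the large value written out instead of the tiny one.
big = 100000000000000000000.0    # 1e20
still_big = big + 1.0
print(still_big == big)

# Order matters: adding many small values one at a time changes nothing,
# while adding their total in one step does register.
one_at_a_time = big
for _ in range(100000):
    one_at_a_time += 1.0
all_at_once = big + 100000.0
print(one_at_a_time == big, all_at_once > big)
```

In swimming-pool terms: a hundred thousand separate teaspoons each vanish, but the same water poured in as one bucket is noticed.
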
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
```
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
```
But you can “amplify” the representation error by repeatedly adding the numbers together:
```
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
```
I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can represent ¹/₃ only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.

Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
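
To make the stored value visible, here is a small Python sketch (assuming the usual 64-bit IEEE doubles). `Fraction(0.1)` converts the stored double exactly, so it shows which binary fraction stands in for one tenth; the loop then repeats the accumulation from the PowerShell example.

```python
from fractions import Fraction

# The double closest to 0.1, written as an exact fraction.
stored = Fraction(0.1)
print(stored)                       # the denominator is a power of two
print(stored == Fraction(1, 10))    # False: the stored value is not one tenth

# Repeated addition accumulates the representation error.
total = 0.0
for _ in range(100):
    total += 0.1
print(total, total == 10.0)
```
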
Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical work, and those who do usually already know about it. Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there can be infinitely many more real numbers r₁, r₂, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that r_min is the minimum possible value of r that results in f and r_max the maximum possible value of r for which this holds; then you get an interval [r_min, r_max] where any number in that interval can be your actual number r.

Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.

That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
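
A rough Python sketch of the interval idea, assuming Python 3.9+ for `math.nextafter`. The helper names `rounding_interval` and `add_intervals` are made up for this illustration. It is deliberately simplified: it tracks only the uncertainty of the operands using exact fractions and ignores the extra rounding each floating-point operation would add, which a real interval arithmetic library would also account for.

```python
import math
from fractions import Fraction

def rounding_interval(f):
    """Return (r_min, r_max): the midpoints between f and its neighbouring doubles."""
    below = math.nextafter(f, -math.inf)
    above = math.nextafter(f, math.inf)
    r_min = (Fraction(below) + Fraction(f)) / 2
    r_max = (Fraction(f) + Fraction(above)) / 2
    return r_min, r_max

f = 0.1
r_min, r_max = rounding_interval(f)
tenth = Fraction(1, 10)
print(r_min < tenth < r_max)          # the real number 1/10 lies inside the interval
print(float(r_max - r_min))           # width of the interval

def add_intervals(x, y):
    """Interval addition: add the lower bounds and the upper bounds separately."""
    return (x[0] + y[0], x[1] + y[1])

total = (Fraction(0), Fraction(0))
for _ in range(100):
    total = add_intervals(total, (r_min, r_max))
print(float(total[1] - total[0]))     # 100 times wider than a single operand
```
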

User · Answer

Here is my simple understanding.

Problem: The value 0.45 cannot be accurately represented by a float and is rounded down to 0.449999988. Why is that?

Answer: An int value of 45 is represented by the binary value 101101. To make the value 0.45, it would be accurate if you could take 45 x 10^-2 (= 45 / 10^2). But that's impossible because you must use base 2 instead of 10. So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9: 6 for the value 45 (101101) + 3 bits for the value 7 (111). Then the value is 45 x 2^-7 = 0.3515625.

Now you have a serious inaccuracy problem: 0.3515625 is nowhere near 0.45. How do we improve this? Well, we could change the values 45 and 7 to something else. How about 460 x 2^-10 = 0.44921875? You are now using 9 bits for 460 and 4 bits for 10. That is a bit closer, but still not that close. However, if your initial desired value was 0.44921875, then you would get an exact match with no approximation.

So the formula for your value would be X = A x 2^B, where A and B are integer values, positive or negative. Obviously, the larger the numbers can be, the higher your accuracy becomes; however, as you know, the number of bits available to represent the values A and B is limited. For float you have a total of 32 bits. Double has 64 and Decimal has 128.
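
The A x 2^B idea can be tried out in a few lines of Python. This is only a sketch: `best_mantissa` is a hypothetical helper that picks the integer A for a given exponent, and the list of exponents is arbitrary. Searching for the best A gives 461 for B = -10, slightly closer than the 460 used in the text.

```python
def best_mantissa(x, b):
    """Integer A that makes A * 2**b closest to x (b is usually negative)."""
    return round(x / 2.0 ** b)

target = 0.45
print(45 * 2.0 ** -7)     # 0.3515625
print(460 * 2.0 ** -10)   # 0.44921875

# A more negative B leaves more bits of A after the binary point.
errors = []
for b in (-7, -10, -20, -30):
    a = best_mantissa(target, b)
    approx = a * 2.0 ** b
    errors.append(abs(approx - target))
    print(b, a, approx, errors[-1])

# A double stores 0.45 in this same form: an integer over a power of two.
numerator, denominator = target.as_integer_ratio()
print(numerator, denominator, denominator & (denominator - 1) == 0)
```
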

User · Answer

In Python, entering 1.0 / 10 at the >>> prompt prints 0.10000000000000001.

Explain how some fractions cannot be represented precisely in binary, just like some fractions (like 1/3) cannot be represented precisely in base 10.
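
Recent Python versions print the shortest string that round-trips to the same double, so the prompt may show plain `0.1`. Asking for 17 decimal places brings the extra digit back. A short sketch, assuming 64-bit doubles:

```python
x = 1.0 / 10
print(repr(x))               # shortest round-trip form
print(format(x, ".17f"))     # 0.10000000000000001

# The hidden error shows up as soon as values are combined and compared.
print(0.1 + 0.2 == 0.3)      # False
print(repr(0.1 + 0.2))       # 0.30000000000000004
```
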

User · Answer

A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
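
This can be reproduced in Python by pushing the value through a 32-bit float with the `struct` module. `to_float32` is a made-up helper name, and the sketch assumes an IEEE 754 platform with the default rounding mode. The explanation appears to be two roundings in a row: the decimal literal first becomes the double 9999999.5 exactly, and that tie is then rounded to the even neighbour.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to single precision and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = 9999999.4999999999   # the literal itself already becomes the double 9999999.5
b = 9999999.499999999    # one digit shorter: stays just below 9999999.5

print(a == 9999999.5, to_float32(a))
print(b == 9999999.5, to_float32(b))
```
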

User · Answer

How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers (those without a fractional part), modern digital computers count powers of two: 1, 2, 4, 8, ... Place value, binary digits, blah, blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those "problem" numbers, but never get it exactly because it only has a limited number of bits. Some numbers can't be represented with an infinite number of bits.

Snooze...

OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
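
The measuring-cup picture translates almost directly into code. In this sketch, `measure` is a hypothetical helper that fills cups greedily from largest to smallest, using each size at most once, with exact fractions so that the leftover is not itself blurred by rounding.

```python
from fractions import Fraction

def measure(amount, cups):
    """Greedy measurement: use each cup size at most once, largest first."""
    measured = Fraction(0)
    for cup in sorted(cups, reverse=True):
        if measured + cup <= amount:
            measured += cup
    return measured

third = Fraction(1, 3)
two_cups = [Fraction(1, 2), Fraction(1, 4)]
got = measure(third, two_cups)
print(got, third - got)            # 1/4 measured, 1/12 left over

# Finer cups (more bits) shrink the error but never remove it.
eight_cups = [Fraction(1, 2 ** k) for k in range(1, 9)]
got8 = measure(third, eight_cups)
print(got8, third - got8)          # 85/256 measured, 1/768 left over
```
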

User · Answer

Show them that the base-10 system suffers from exactly the same problem.

Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.

So if you write "0.3333", you will have a reasonably exact representation for many use cases. But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".

Other fractions, such as 1/2, can easily be represented by a finite decimal representation in base-10: "0.5".

Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly. While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".
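
The digit strings above can be generated by ordinary long division in any base. A small sketch: `fraction_digits` is a made-up helper, limited to bases up to 10 because it prints each digit with `str`. It returns the first digits after the point together with a flag saying whether the expansion ended within that many digits.

```python
def fraction_digits(numerator, denominator, base, count):
    """First `count` digits after the point of numerator/denominator in `base`."""
    digits = []
    remainder = numerator % denominator
    for _ in range(count):
        remainder *= base
        digits.append(str(remainder // denominator))
        remainder %= denominator
    return "".join(digits), remainder == 0

print(fraction_digits(1, 3, 10, 4))    # ('3333', False)
print(fraction_digits(1, 2, 10, 4))    # ('5000', True)
print(fraction_digits(1, 10, 10, 4))   # ('1000', True)
print(fraction_digits(1, 10, 2, 9))    # ('000110011', False)
```
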

User · Answer

Another example, in C: printf("%.20f\n", 3.6); incredibly gives: 3.60000000000000008882
