What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision?
This question is related to: types, floating-point, double, ieee-754
You need to look at the size of the mantissa. An IEEE 754 64-bit floating-point number (which has a 52-bit mantissa, plus 1 implied bit) can exactly represent every integer with an absolute value less than or equal to 2^53.
It is true that, for a 64-bit IEEE 754 double, all integers up to 9007199254740992 == 2^53 can be exactly represented.
However, it is also worth mentioning that all representable numbers beyond 4503599627370496 == 2^52 are integers. Beyond 2^52 it becomes meaningless to test whether or not they are integers, because they are all implicitly rounded to a nearby representable value.
In the range 2^51 to 2^52, the only non-integer values are the midpoints ending with ".5", meaning any integer test after a calculation must be expected to yield at least 50% false answers.
Below 2^51 we also have ".25" and ".75", so comparing a number with its rounded counterpart in order to determine whether it may be an integer starts making some sense.
TL;DR: If you want to test whether a calculated result may be an integer, avoid numbers larger than 2251799813685248 == 2^51.
DECIMAL_DIG from <float.h> should give at least a reasonable approximation of that. Since that deals with decimal digits, and it's really stored in binary, you can probably store something a little larger without losing precision, but exactly how much is hard to say. I suppose you should be able to figure it out from FLT_RADIX and DBL_MANT_DIG, but I'm not sure I'd completely trust the result.
Wikipedia has this to say in the same context with a link to IEEE 754:
On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.
2^53 is just over 9 * 10^15.
9007199254740992 (that's 9,007,199,254,740,992) with no guarantees :)
Program
#include <stdio.h>

int main(void) {
    /* Start a little below 2^53; counting up from 0 would take far too long. */
    double dbl = 9007199254000000.0;
    /* Stop at the first value where adding 1 no longer changes the number. */
    while (dbl + 1 != dbl) dbl++;
    printf("%.0f\n", dbl - 1);
    printf("%.0f\n", dbl);
    printf("%.0f\n", dbl + 1);
    return 0;
}
Result
9007199254740991
9007199254740992
9007199254740992
The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.
This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:
The sign bit 0 and the exponent 0x7FE (2046, which represents 1023 after the bias is subtracted) rather than 0x7FF (2047, which indicates a NaN or infinity).
The mantissa 0xFFFFFFFFFFFFF, which is 52 bits all 1.
In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.
The exact decimal value is:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
This is approximately 1.8 × 10^308.
1.7976931348623157 × 10^308
http://en.wikipedia.org/wiki/Double_precision_floating-point_format
Source: Stackoverflow.com