float vs double precision

Question

The code

    float x = 3.141592653589793238;
    double z = 3.141592653589793238;
    printf("x=%f\n", x);
    printf("z=%f\n", z);
    printf("x=%20.18f\n", x);
    printf("z=%20.18f\n", z);

will give you the output

    x=3.141593
    z=3.141593
    x=3.141592741012573242
    z=3.141592653589793116

where on the third line of output 741012573242 is garbage and on the fourth line 116 is garbage. Do doubles always have 16 significant figures while floats always have 7 significant figures? Why don't doubles have 14 significant figures?

User · Answer

Do doubles always have 16 significant figures while floats always have 7 significant figures?

No. Doubles always have 53 significant bits and floats always have 24 significant bits (except for denormals, infinities, and NaN values, but those are subjects for a different question). These are binary formats, and you can only speak clearly about the precision of their representations in terms of binary digits (bits).

This is analogous to the question of how many digits can be stored in a binary integer: an unsigned 32 bit integer can store integers with up to 32 bits, which doesn't precisely map to any number of decimal digits: all integers of up to 9 decimal digits can be stored, but a lot of 10-digit numbers can be stored as well.

Why don't doubles have 14 significant figures?

The encoding of a double uses 64 bits (1 bit for the sign, 11 bits for the exponent, 52 explicit significant bits and one implicit bit), which is double the number of bits used to represent a float (32 bits).

User · Answer

It's usually based on significant figures of both the exponent and significand in base 2, not base 10. From what I can tell in the C99 standard, however, there is no specified precision for floats and doubles (other than the fact that 1 and 1 + 1E-5 must be distinguishable as floats, and 1 and 1 + 1E-9 as doubles). However, the number of significant figures is left to the implementer (as well as which base they use internally, so in other words, an implementation could decide to make it based on 18 digits of precision in base 3). [1]

If you need to know these values, the constants FLT_RADIX and FLT_MANT_DIG (and DBL_MANT_DIG / LDBL_MANT_DIG) are defined in float.h.

The reason it's called a double is because the number of bytes used to store it is double the number of a float, but this includes both the exponent and significand. The IEEE 754 standard (used by most compilers) allocates relatively more bits for the significand than for the sign and exponent together (23 to 9 for float vs. 52 to 12 for double), which is why the precision is more than doubled.

[1] Section 5.2.4.2.2 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)

User · Answer

Floating-point numbers in C use IEEE 754 encoding on nearly all current platforms. This type of encoding uses a sign, a significand, and an exponent. Because of this encoding, many numbers will have small changes to allow them to be stored. Also, the number of significant decimal digits can change slightly since it is a binary representation, not a decimal one.

Single precision (float) gives you 23 bits of significand, 8 bits of exponent, and 1 sign bit.

Double precision (double) gives you 52 bits of significand, 11 bits of exponent, and 1 sign bit.

User · Answer

A float has 23 stored bits of precision, and a double has 52; counting the implicit leading bit, that is 24 and 53.

User · Answer

float: 23 bits of significand, 8 bits of exponent, and 1 sign bit.

double: 52 bits of significand, 11 bits of exponent, and 1 sign bit.

User · Answer

It's not exactly double precision because of how IEEE 754 works, and because binary doesn't really translate well to decimal. Take a look at the standard if you're interested.
