Ranges of floating point datatype in C

Question

I am reading a C book that talks about the ranges of floating point types. The author gave this table:

    Type     Smallest Positive Value   Largest Value       Precision
    float    1.17549 x 10^-38          3.40282 x 10^38     6 digits
    double   2.22507 x 10^-308         1.79769 x 10^308    15 digits

I don't know where the numbers in the columns Smallest Positive Value and Largest Value come from.

User · Answer

A 32 bit floating point number has 23 + 1 bits of mantissa and an 8 bit exponent (only -126 to 127 is used, though), so the largest number you can represent is:

    (1 + 1/2 + ... + 1/(2^23)) * (2^127)
      = (2^23 + 2^22 + ... + 1) * (2^(127 - 23))
      = (2^24 - 1) * (2^104)
      ~= 3.4e38

User · Answer

Infinity, NaN and subnormals

These are important caveats that no other answer has mentioned so far.

First read this introduction to IEEE 754 and subnormal numbers: What is a subnormal floating point number?

Then, for single precision floats (32-bit):

IEEE 754 says that if the exponent is all ones (0xFF == 255), then it represents either NaN or Infinity. This is why the largest non-infinite number has exponent 0xFE == 254 and not 0xFF. Then with the bias, it becomes: 254 - 127 == 127.

FLT_MIN is the smallest normal number. But there are smaller subnormal ones! Those use the all-zero exponent field (the slot that would otherwise mean -127) and are scaled by 2^-126.

All asserts of the following program pass on Ubuntu 18.04 amd64:

    #include <assert.h>
    #include <float.h>
    #include <inttypes.h>
    #include <math.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    /* Assemble a float from its sign, exponent and fraction fields. */
    float float_from_bytes(
        uint32_t sign,
        uint32_t exponent,
        uint32_t fraction
    ) {
        uint32_t bytes;
        float f;
        bytes = 0;
        bytes |= sign;
        bytes <<= 8;
        bytes |= exponent;
        bytes <<= 23;
        bytes |= fraction;
        /* memcpy instead of a pointer cast avoids strict aliasing problems. */
        memcpy(&f, &bytes, sizeof f);
        return f;
    }

    int main(void) {
        /* All 1 exponent and non-0 fraction means NaN.
         * There are of course many possible representations,
         * and some have special semantics such as signalling vs not.
         */
        assert(isnan(float_from_bytes(0, 0xFF, 1)));
        assert(isnan(NAN));
        printf("nan                  %e\n", NAN);

        /* All 1 exponent and 0 fraction means infinity. */
        assert(INFINITY == float_from_bytes(0, 0xFF, 0));
        assert(isinf(INFINITY));
        printf("infinity             %e\n", INFINITY);

        /* ANSI C defines FLT_MAX as the largest non-infinite number. */
        assert(FLT_MAX == 0x1.FFFFFEp127f);
        /* Not 0xFF because that is infinite. */
        assert(FLT_MAX == float_from_bytes(0, 0xFE, 0x7FFFFF));
        assert(!isinf(FLT_MAX));
        assert(FLT_MAX < INFINITY);
        printf("largest non infinite %e\n", FLT_MAX);

        /* ANSI C defines FLT_MIN as the smallest non-subnormal number. */
        assert(FLT_MIN == 0x1.0p-126f);
        assert(FLT_MIN == float_from_bytes(0, 1, 0));
        assert(isnormal(FLT_MIN));
        printf("smallest normal      %e\n", FLT_MIN);

        /* The smallest non-zero subnormal number. */
        float smallest_subnormal = float_from_bytes(0, 0, 1);
        assert(smallest_subnormal == 0x0.000002p-126f);
        assert(0.0f < smallest_subnormal);
        assert(!isnormal(smallest_subnormal));
        printf("smallest subnormal   %e\n", smallest_subnormal);

        return EXIT_SUCCESS;
    }

GitHub upstream.

Compile and run with:

    gcc -ggdb3 -O0 -std=c11 -Wall -Wextra -Wpedantic -Werror -o subnormal.out subnormal.c
    ./subnormal.out

Output:

    nan                  nan
    infinity             inf
    largest non infinite 3.402823e+38
    smallest normal      1.175494e-38
    smallest subnormal   1.401298e-45

User · Answer

As dasblinkenlight already answered, the numbers come from the way that floating point numbers are represented in IEEE-754, and Andreas has a nice breakdown of the maths.

However, be careful: the precision of floating point numbers isn't exactly 6 or 15 significant decimal digits as the table suggests, since the precision of IEEE-754 numbers depends on the number of significant binary digits.

float has 24 significant binary digits, which, depending on the number represented, translates to 6-8 decimal digits of precision.

double has 53 significant binary digits, which is approximately 15 decimal digits.

Another answer of mine has further explanation if you're interested.

User · Answer

It's a consequence of the size of the exponent part of the type, as in IEEE 754 for example. You can examine the limits with FLT_MAX, FLT_MIN, DBL_MAX and DBL_MIN in float.h.

User · Answer

The values for the float data type come from having 32 bits in total to represent the number, allocated like this:

    1 bit:   sign bit
    8 bits:  exponent p
    23 bits: mantissa

The exponent is stored as p + BIAS, where BIAS is 127. The mantissa has 23 stored bits plus a 24th hidden bit that is assumed to be 1. This hidden bit is the most significant bit (MSB) of the mantissa, and the exponent must be chosen so that it is 1.

This means that the smallest normal number you can represent is 0 00000001 00000000000000000000000, which is 1 x 2^-126 = 1.17549435E-38.

The largest value is 0 11111110 11111111111111111111111: the mantissa is 2 * (1 - 2^-24) and the exponent is 127, which gives (1 - 2^-24) * 2^128 = 3.40282347E38.

The same principles apply to double precision, except the bits are:

    1 bit:   sign bit
    11 bits: exponent bits
    52 bits: mantissa bits
    BIAS = 1023

So technically the limits come from the IEEE-754 standard for representing floating point numbers, and the above is how those limits come about.

User · Answer

These numbers come from the IEEE-754 standard, which defines the standard representation of floating point numbers. The Wikipedia article at the link explains how to arrive at these ranges knowing the number of bits used for the sign, mantissa, and the exponent.
