What is the difference between float and double?

Question

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?

User · Answer

Given a quadratic equation: x² - 4.0000000 x + 3.9999999 = 0, the exact roots to 10 significant digits are r1 = 2.000316228 and r2 = 1.999683772.

Using float and double, we can write a test program:

    #include <stdio.h>
    #include <math.h>

    void dbl_solve(double a, double b, double c)
    {
        double d = b*b - 4.0*a*c;
        double sd = sqrt(d);
        double r1 = (-b + sd) / (2.0*a);
        double r2 = (-b - sd) / (2.0*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    void flt_solve(float a, float b, float c)
    {
        float d = b*b - 4.0f*a*c;
        float sd = sqrtf(d);
        float r1 = (-b + sd) / (2.0f*a);
        float r2 = (-b - sd) / (2.0f*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    int main(void)
    {
        float fa = 1.0f;
        float fb = -4.0000000f;
        float fc = 3.9999999f;
        double da = 1.0;
        double db = -4.0000000;
        double dc = 3.9999999;
        flt_solve(fa, fb, fc);
        dbl_solve(da, db, dc);
        return 0;
    }

Running the program gives me:

    2.00000 2.00000
    2.00032 1.99968

Note that the numbers aren't large, but still you get cancellation effects using float.

(In fact, the above is not the best way of solving quadratic equations using either single- or double-precision floating-point numbers, but the answer remains unchanged even if one uses a more stable method.)

User · Answer

The size of the numbers involved in the floating-point calculations is not the most relevant thing. It's the calculation that is being performed that is relevant.

In essence, if you're performing a calculation and the result is an irrational number or recurring decimal, then there will be rounding errors when that number is squashed into the finite-size data structure you're using. Since double is twice the size of float, the rounding error will be a lot smaller.

The tests may specifically use numbers which would cause this kind of error and therefore tested that you'd used the appropriate type in your code.

User · Answer

Here is what the standard C99 (ISO-IEC 9899 6.2.5 §10) or C++2003 (ISO-IEC 14882-2003 3.9.1 §8) standards say:

"There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double."

The C++ standard adds:

"The value representation of floating-point types is implementation-defined."

I would suggest having a look at the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic, which covers the IEEE floating-point standard in depth. You'll learn about the representation details and you'll realize there is a tradeoff between magnitude and precision. The absolute precision of the floating point representation increases as the magnitude decreases, hence floating point numbers between -1 and 1 are those with the most precision.

User · Answer

Unlike an int (whole number), a float has a decimal point, and so does a double. The difference between the two is that a double is roughly twice as detailed as a float, meaning that it can hold about double the number of significant digits.

User · Answer

Huge difference.

As the name implies, a double has 2x the precision of float[1]. In general a double has 15 decimal digits of precision, while float has 7.

Here's how the number of digits is calculated:

    double has 52 mantissa bits + 1 hidden bit: log(2^53) / log(10) = 15.95 digits
    float has 23 mantissa bits + 1 hidden bit: log(2^24) / log(10) = 7.22 digits

This precision loss could lead to greater truncation errors being accumulated when repeated calculations are done, e.g.

    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.7g\n", b); // prints 9.000023

while

    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.15g\n", b); // prints 8.99999999999996

Also, the maximum value of float is about 3e38, but double is about 1.7e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.

During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.

Of course, sometimes even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.

Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use fsum. Otherwise, try to implement the Kahan summation algorithm.

[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).

User · Answer

Floats have less precision than doubles. Although you already know this, read What WE Should Know About Floating-Point Arithmetic for a better understanding.

User · Answer

The built-in comparison operations differ: when you compare two floating-point numbers, the difference in data type (i.e. float or double) may result in different outcomes.

User · Answer

I just ran into an error that took me forever to figure out and potentially can give you a good example of float precision.

    #include <iostream>
    #include <iomanip>

    int main(){
      for(float t=0;t<1;t+=0.01){
         std::cout << std::fixed << std::setprecision(6) << t << std::endl;
      }
    }

The output is

    0.000000 0.010000 0.020000 0.030000 0.040000 0.050000 0.060000 0.070000 0.080000 0.090000
    0.100000 0.110000 0.120000 0.130000 0.140000 0.150000 0.160000 0.170000 0.180000 0.190000
    0.200000 0.210000 0.220000 0.230000 0.240000 0.250000 0.260000 0.270000 0.280000 0.290000
    0.300000 0.310000 0.320000 0.330000 0.340000 0.350000 0.360000 0.370000 0.380000 0.390000
    0.400000 0.410000 0.420000 0.430000 0.440000 0.450000 0.460000 0.470000 0.480000 0.490000
    0.500000 0.510000 0.520000 0.530000 0.540000 0.550000 0.560000 0.570000 0.580000 0.590000
    0.600000 0.610000 0.620000 0.630000 0.640000 0.650000 0.660000 0.670000 0.680000 0.690000
    0.700000 0.710000 0.720000 0.730000 0.740000 0.750000 0.760000 0.770000 0.780000 0.790000
    0.800000 0.810000 0.820000 0.830000 0.839999 0.849999 0.859999 0.869999 0.879999 0.889999
    0.899999 0.909999 0.919999 0.929999 0.939999 0.949999 0.959999 0.969999 0.979999 0.989999
    0.999999

As you can see, after 0.83 the precision runs down significantly.

However, if I set up t as double, such an issue won't happen.

It took me five hours to realize this minor error, which ruined my program.

User · Answer

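There are three floating point types: float, double, and long double.

A simple Venn diagram explains the sets of values of the types: the values of float are a subset of the values of double, which are in turn a subset of the values of long double.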

User · Answer

Type float, 32 bits long, has a precision of 7 digits. While it may store values with a very large or very small range (+/- 3.4 * 10^38 or * 10^-38), it has only 7 significant digits.

Type double, 64 bits long, has a bigger range (* 10^+/-308) and 15 digits of precision.

Type long double is nominally 80 bits on x86, though a given compiler/OS pairing may store it as 12-16 bytes for alignment purposes. The long double has an exponent that is just ridiculously huge and should have 19 digits of precision. Microsoft, in their infinite wisdom, limits long double to 8 bytes, the same as plain double.

Generally speaking, just use type double when you need a floating point value/variable. Literal floating point values used in expressions will be treated as doubles by default, and most of the math functions that return floating point values return doubles. You'll save yourself many headaches and typecasts if you just use double.

User · Answer

When using floating point numbers you cannot trust that your local tests will be exactly the same as the tests that are done on the server side. The environment and the compiler are probably different on your local system and where the final tests are run. I have seen this problem many times before in some TopCoder competitions, especially if you try to compare two floating point numbers.

User · Answer

A double is 64 bits and single precision (float) is 32 bits. The double has a bigger mantissa (the significant bits of the number). Any inaccuracies will be smaller in the double.

User · Answer

If one works with embedded processing, the underlying hardware (e.g. an FPGA or some specific processor / microcontroller model) may have float implemented optimally in hardware, whereas double will use software routines. So if the precision of a float is enough to handle the needs, the program will execute several times faster with float than with double. As noted in other answers, beware of accumulation errors.
