What is the most effective way for float and double comparison

Question

What would be the most efficient way to compare two double or two float values?

Simply doing this is not correct:

    bool CompareDoubles1 (double A, double B)
    {
       return A == B;
    }

But something like:

    bool CompareDoubles2 (double A, double B)
    {
       double diff = A - B;
       return (diff < EPSILON) && (-diff < EPSILON);
    }

seems to waste processing.

Does anyone know a smarter float comparer?
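To see why the plain == in CompareDoubles1 usually surprises people, here is a minimal sketch, assuming IEEE 754 doubles (the helper names are hypothetical). Neither 0.1, 0.2 nor 0.3 is exactly representable in binary, so the rounded sum 0.1 + 0.2 ends up one bit away from the rounded literal 0.3.

```cpp
#include <cstdio>

// Hypothetical helper: does 0.1 + 0.2 compare exactly equal to 0.3?
bool sumsToPointThree() {
    double sum = 0.1 + 0.2;   // both operands were already rounded to binary
    return sum == 0.3;        // exact, bit-for-bit comparison
}

// Prints both values with 17 significant digits, enough to tell any
// two distinct doubles apart.
void showSum() {
    std::printf("%.17g\n", 0.1 + 0.2);  // 0.30000000000000004
    std::printf("%.17g\n", 0.3);
}
```

sumsToPointThree() returns false on IEEE 754 hardware, which is the whole motivation for the answers below.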

User · Answer

There are actually cases in numerical software where you want to check whether two floating-point numbers are exactly equal. I posted this on a similar question:

https://stackoverflow.com/a/10973098/1447411

So you cannot say that "CompareDoubles1" is wrong in general.

User · Answer

As others have pointed out, using a fixed-exponent epsilon (such as 0.0000001) will be useless for values away from the epsilon value. For example, if your two values are 10000.000977 and 10000, then there are NO 32-bit floating-point values between these two numbers: 10000 and 10000.000977 are as close as you can possibly get without being bit-for-bit identical. Here, an epsilon of less than 0.0009 is meaningless; you might as well use the straight equality operator.

Likewise, as the two values approach epsilon in size, the relative error grows to 100%.

Thus, trying to mix a fixed-point number such as 0.00001 with floating-point values (where the exponent is arbitrary) is a pointless exercise. This will only ever work if you can be assured that the operand values lie within a narrow domain (that is, close to some specific exponent), and if you properly select an epsilon value for that specific test. If you pull a number out of the air ("Hey! 0.00001 is small, so that must be good!"), you're doomed to numerical errors. I've spent plenty of time debugging bad numerical code where some poor schmuck tosses in random epsilon values to make yet another test case work.

If you do numerical programming of any kind and believe you need to reach for fixed-point epsilons, READ BRUCE'S ARTICLE ON COMPARING FLOATING-POINT NUMBERS:

Comparing Floating Point Numbers
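The spacing claim above is easy to check. A minimal sketch (floatSpacingAbove is a hypothetical name) that asks std::nextafter for the next representable float: at 10000.0f the gap is 2^-10 = 0.0009765625, which prints as 10000.000977 with six decimals, while at 1.0f it is exactly FLT_EPSILON (about 1.19e-7).

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the gap between x and the next larger float,
// i.e. one ULP (unit in the last place) at x.
float floatSpacingAbove(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

The subtraction is exact here because the two operands are adjacent floats, so the result is the true spacing and any epsilon smaller than it can never be "used".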

User · Answer

This is another solution with a lambda:

    #include <cmath>
    #include <limits>

    auto Compare = [](float a, float b, float epsilon = std::numeric_limits<float>::epsilon())
    {
        return (std::fabs(a - b) <= epsilon);
    };

User · Answer

Found another interesting implementation on https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon

    #include <cmath>
    #include <limits>
    #include <iomanip>
    #include <iostream>
    #include <type_traits>
    #include <algorithm>

    template<class T>
    typename std::enable_if<!std::numeric_limits<T>::is_integer, bool>::type
        almost_equal(T x, T y, int ulp)
    {
        // the machine epsilon has to be scaled to the magnitude of the values used
        // and multiplied by the desired precision in ULPs (units in the last place)
        return std::fabs(x-y) <= std::numeric_limits<T>::epsilon() * std::fabs(x+y) * ulp
            // unless the result is subnormal
            || std::fabs(x-y) < std::numeric_limits<T>::min();
    }

    int main()
    {
        double d1 = 0.2;
        double d2 = 1 / std::sqrt(5) / std::sqrt(5);
        std::cout << std::fixed << std::setprecision(20)
            << "d1=" << d1 << "\nd2=" << d2 << '\n';

        if(d1 == d2)
            std::cout << "d1 == d2\n";
        else
            std::cout << "d1 != d2\n";

        if(almost_equal(d1, d2, 2))
            std::cout << "d1 almost equals d2\n";
        else
            std::cout << "d1 does not almost equal d2\n";
    }

User · Answer

    return fabs(a - b) < EPSILON;

This is fine if:

- the order of magnitude of your inputs doesn't change much
- very small numbers of opposite signs can be treated as equal

But otherwise it'll lead you into trouble. Double precision numbers have a resolution of about 16 decimal places. If the two numbers you are comparing are larger in magnitude than EPSILON*1.0E16, then you might as well be saying:

    return a==b;

I'll examine a different approach that assumes you need to worry about the first issue and that the second is fine for your application. A solution would be something like:

    #include <algorithm>
    #include <cmath>

    #define VERYSMALL  (1.0E-150)
    #define EPSILON    (1.0E-8)

    bool AreSame(double a, double b)
    {
        double absDiff = std::fabs(a - b);
        if (absDiff < VERYSMALL)
        {
            return true;
        }

        // Relative difference: scale by the larger of the two magnitudes.
        double maxAbs = std::max(std::fabs(a), std::fabs(b));
        return (absDiff / maxAbs) < EPSILON;
    }

This is expensive computationally, but it is sometimes what is called for. This is what we have to do at my company because we deal with an engineering library and inputs can vary by a few dozen orders of magnitude.

Anyway, the point is this (and it applies to practically every programming problem): evaluate what your needs are, then come up with a solution to address your needs; don't assume the easy answer will address your needs. If after your evaluation you find that fabs(a-b) < EPSILON will suffice, perfect: use it! But be aware of its shortcomings and of other possible solutions too.

User · Answer

You cannot compare two doubles with a fixed EPSILON. Depending on the value of the doubles, EPSILON varies.

A better double comparison would be:

    #include <cmath>
    #include <limits>

    bool same(double a, double b)
    {
        return std::nextafter(a, std::numeric_limits<double>::lowest()) <= b
            && std::nextafter(a, std::numeric_limits<double>::max()) >= b;
    }
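The same() test accepts b only if it lies within one representable step of a. The idea extends to a small budget of several steps. A minimal sketch, assuming a small maxUlps; the name withinUlps and its loop are a hypothetical generalization, not part of the answer above.

```cpp
#include <cmath>
#include <limits>

// Hypothetical generalization of same(): true if b lies within maxUlps
// representable doubles of a. It walks nextafter() maxUlps times in each
// direction, so it is only sensible for small maxUlps.
bool withinUlps(double a, double b, int maxUlps) {
    if (std::isnan(a) || std::isnan(b)) return false;  // NaN is never "close"
    double lo = a, hi = a;
    for (int i = 0; i < maxUlps; ++i) {
        lo = std::nextafter(lo, -std::numeric_limits<double>::infinity());
        hi = std::nextafter(hi, std::numeric_limits<double>::infinity());
    }
    return lo <= b && b <= hi;
}
```

For example, 0.1 + 0.2 and 0.3 are exactly one ULP apart, so withinUlps(0.1 + 0.2, 0.3, 1) is true while a budget of 0 rejects them. For large budgets, the bit-pattern distance used in other answers here is cheaper than looping.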

User · Answer

I'd be very wary of any of these answers that involves floating-point subtraction (e.g., fabs(a-b) < epsilon). First, the floating-point numbers become more sparse at greater magnitudes, and at high enough magnitudes where the spacing is greater than epsilon, you might as well just be doing a == b. Second, subtracting two very close floating-point numbers (as these will tend to be, given that you're looking for near equality) is exactly how you get catastrophic cancellation.

While not portable, I think grom's answer does the best job of avoiding these issues.
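A small illustration of the cancellation point, assuming IEEE 754 doubles (the helper name is hypothetical). Adding 1e-15 to 1.0 rounds the sum to the nearest multiple of 2^-52, and subtracting 1.0 afterwards recovers about 1.11e-15, roughly 11% away from the value that went in. Note that the subtraction itself is exact here; what it does is push the earlier rounding error into the leading digits of the result.

```cpp
#include <cmath>

// Hypothetical helper: relative error of the small term recovered after
// computing (1.0 + small) - 1.0, for 0 < small < 1.
double cancellationRelativeError(double small) {
    double sum = 1.0 + small;       // rounded to the nearest double
    double recovered = sum - 1.0;   // exact, but inherits the rounding above
    return std::fabs(recovered - small) / small;
}
```

With small = 0.5 everything is exactly representable and the error is 0; with small = 1e-15 it is above 10%.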

User · Answer

The portable way to get epsilon in C++ is

    #include <limits>
    std::numeric_limits<double>::epsilon()

Then the comparison function becomes

    #include <cmath>
    #include <limits>

    bool AreSame(double a, double b)
    {
        return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
    }

User · Answer

You have to do this processing for floating-point comparison, since floats can't be perfectly compared like integer types. Here are functions for the various comparison operators.

Floating Point Equal to (==)

I also prefer the subtraction technique rather than relying on fabs() or abs(), but I'd have to speed-profile it on various architectures, from a 64-bit PC to an ATMega328 microcontroller (Arduino), to really see if it makes much of a performance difference.

So, let's forget about all this absolute value stuff and just do some subtraction and comparison!

Modified from Microsoft's example here:

    /// @brief      See if two floating point numbers are approximately equal.
    /// @param[in]  a        number 1
    /// @param[in]  b        number 2
    /// @param[in]  epsilon  A small value such that if the difference between the two numbers is
    ///                      smaller than this they can safely be considered to be equal.
    /// @return     true if the two numbers are approximately equal, and false otherwise
    bool is_float_eq(float a, float b, float epsilon) {
        return ((a - b) < epsilon) && ((b - a) < epsilon);
    }
    bool is_double_eq(double a, double b, double epsilon) {
        return ((a - b) < epsilon) && ((b - a) < epsilon);
    }

Example usage:

    constexpr float EPSILON = 0.0001; // 1e-4
    is_float_eq(1.0001, 0.99998, EPSILON);

I'm not entirely sure, but it seems to me some of the criticisms of the epsilon-based approach, as described in the comments below this highly-upvoted answer, can be resolved by using a variable epsilon, scaled according to the floating-point values being compared, like this:

    #include <algorithm>
    #include <cmath>

    float a = 1.0001;
    float b = 0.99998;
    float epsilon = std::max(std::fabs(a), std::fabs(b)) * 1e-4;

    is_float_eq(a, b, epsilon);

This way, the epsilon value scales with the floating-point values and is therefore never so small a value that it becomes insignificant.

For completeness, let's add the rest:

Greater than (>), and less than (<):

    /// @brief      See if floating point number a is > b
    /// @param[in]  a        number 1
    /// @param[in]  b        number 2
    /// @param[in]  epsilon  a small value such that if a is > b by this amount, a is considered
    ///             to be definitively > b
    /// @return     true if a is definitively > b, and false otherwise
    bool is_float_gt(float a, float b, float epsilon) {
        return a > b + epsilon;
    }
    bool is_double_gt(double a, double b, double epsilon) {
        return a > b + epsilon;
    }

    /// @brief      See if floating point number a is < b
    /// @param[in]  a        number 1
    /// @param[in]  b        number 2
    /// @param[in]  epsilon  a small value such that if a is < b by this amount, a is considered
    ///             to be definitively < b
    /// @return     true if a is definitively < b, and false otherwise
    bool is_float_lt(float a, float b, float epsilon) {
        return a < b - epsilon;
    }
    bool is_double_lt(double a, double b, double epsilon) {
        return a < b - epsilon;
    }

Greater than or equal to (>=), and less than or equal to (<=):

    /// @brief      Returns true if a is definitively >= b, and false otherwise
    bool is_float_ge(float a, float b, float epsilon) {
        return a > b - epsilon;
    }
    bool is_double_ge(double a, double b, double epsilon) {
        return a > b - epsilon;
    }

    /// @brief      Returns true if a is definitively <= b, and false otherwise
    bool is_float_le(float a, float b, float epsilon) {
        return a < b + epsilon;
    }
    bool is_double_le(double a, double b, double epsilon) {
        return a < b + epsilon;
    }

See also:

The macro forms of some of the functions above in my repo here: utilities.h.

UPDATE 29 NOV 2020: it's a work-in-progress, and I'm going to make it a separate answer when ready, but I've produced a better, scaled-epsilon version of all of the functions in C in this file here: utilities.c. Take a look.

Additional reading (which I have now done):

Floating-point tolerances revisited, by Christer Ericson

User · Answer

This way, you are comparing a system-dependent "string representation" of two values (floats in your case), similarly to when you print them both and check with your eyes whether they look the same:

    #include <iostream>
    #include <string>

    bool floatApproximatelyEquals(const float a, const float b)
    {
        return std::to_string(a) == std::to_string(b);
    }

Pros: the number factor (or power) is effectively taken into account, so it doesn't matter whether numbers are like 1.2 or 1.2e345678, or 0.00000123 or 1.2e-345678 (the problem you normally face with absolute epsilons).

Cons: you don't control the precision to which you "round" numbers. For example, on my system it's 6 digits after the first significant (non-zero) one in the decimal representation of the number, which is good enough for most of my cases.
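One caveat worth checking before relying on this: for float and double arguments, std::to_string is specified to format as std::sprintf with "%f" would, that is, six digits after the decimal point rather than six significant digits. Tiny values therefore all collapse to "0.000000", while large values keep every integer digit. A small check (printedEqual is a hypothetical name for the same test):

```cpp
#include <string>

// Hypothetical name for the string-based test above, repeated so this
// snippet stands alone. std::to_string(float) formats like "%f":
// a fixed six digits after the decimal point, whatever the magnitude.
bool printedEqual(float a, float b) {
    return std::to_string(a) == std::to_string(b);
}
```

So 1e-7f and 2e-7f compare "equal" (both print as 0.000000), even though one is twice the other.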

User · Answer

Comparing floating-point numbers depends on the context. Since even changing the order of operations can produce different results, it is important to know how "equal" you want the numbers to be.

Comparing floating point numbers by Bruce Dawson is a good place to start when looking at floating-point comparison.

The following definitions are from The art of computer programming by Knuth:

    #include <cmath>

    bool approximatelyEqual(float a, float b, float epsilon)
    {
        return std::fabs(a - b) <= ( (std::fabs(a) < std::fabs(b) ? std::fabs(b) : std::fabs(a)) * epsilon);
    }

    bool essentiallyEqual(float a, float b, float epsilon)
    {
        return std::fabs(a - b) <= ( (std::fabs(a) > std::fabs(b) ? std::fabs(b) : std::fabs(a)) * epsilon);
    }

    bool definitelyGreaterThan(float a, float b, float epsilon)
    {
        return (a - b) > ( (std::fabs(a) < std::fabs(b) ? std::fabs(b) : std::fabs(a)) * epsilon);
    }

    bool definitelyLessThan(float a, float b, float epsilon)
    {
        return (b - a) > ( (std::fabs(a) < std::fabs(b) ? std::fabs(b) : std::fabs(a)) * epsilon);
    }

Of course, choosing epsilon depends on the context, and determines how equal you want the numbers to be.

Another method of comparing floating-point numbers is to look at the ULP (units in last place) of the numbers. While not dealing specifically with comparisons, the paper What every computer scientist should know about floating point numbers is a good resource for understanding how floating point works and what the pitfalls are, including what ULP is.
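The ULP approach mentioned above can be sketched as follows, assuming IEEE 754 binary64 doubles. The function names are hypothetical. The bit mapping mirrors the sign-and-magnitude to biased conversion that Google Test's FloatingPoint class uses, so that adjacent doubles map to adjacent integers and +0.0 and -0.0 coincide.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a double's bit pattern onto a monotonic unsigned scale: adjacent
// representable doubles map to adjacent integers, and +0.0/-0.0 map to
// the same value (2^63).
std::uint64_t toOrderedBits(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);             // well-defined type pun
    const std::uint64_t sign = std::uint64_t{1} << 63;
    return (bits & sign) ? ~bits + 1 : sign | bits;  // negative: negate
}

// Number of representable doubles between a and b. NaN inputs return the
// maximum value so they never compare as "close".
std::uint64_t ulpDistance(double a, double b) {
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<std::uint64_t>::max();
    const std::uint64_t ua = toOrderedBits(a), ub = toOrderedBits(b);
    return ua > ub ? ua - ub : ub - ua;
}

bool almostEqualUlps(double a, double b, std::uint64_t maxUlps) {
    return ulpDistance(a, b) <= maxUlps;
}
```

Because the tolerance is counted in representable steps, the same maxUlps works at any magnitude. Near zero it is very strict, though: 1e-300 and 0.0 are an enormous number of ULPs apart.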

User · Answer

Be extremely careful using any of the other suggestions. It all depends on context.

I have spent a long time tracing bugs in a system that presumed a==b if |a-b| < epsilon. The underlying problems were:

1. The implicit presumption in an algorithm that if a==b and b==c then a==c.
2. Using the same epsilon for lines measured in inches and lines measured in mils (.001 inch). That is, a==b but 1000a!=1000b. (This is why AlmostEqual2sComplement asks for the epsilon or max ULPS.)
3. The use of the same epsilon for both the cosine of angles and the length of lines!
4. Using such a compare function to sort items in a collection. (In this case using the builtin C++ operator== for doubles produced correct results.)

Like I said: it all depends on context and the expected size of a and b.

BTW, std::numeric_limits<double>::epsilon() is the "machine epsilon". It is the difference between 1.0 and the next value representable by a double. I guess that it could be used in the compare function, but only if the expected values are less than 1. (This is in response to @cdv's answer...)

Also, if you basically have int arithmetic in doubles (here we use doubles to hold int values in certain cases), your arithmetic will be correct. For example, 4.0/2.0 will be the same as 1.0+1.0. This is as long as you do not do things that result in fractions (4.0/3.0) or go outside of the size of an int.
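Problems 1 and 2 in the list above are easy to reproduce with any fixed-tolerance test. A minimal sketch (nearlyEqual and breaksTransitivity are hypothetical names): with a tolerance of 0.1, 0.0 is "equal" to 0.06 and 0.06 is "equal" to 0.12, yet 0.0 is not "equal" to 0.12. Likewise, with a tolerance of 0.001, 1.0 and 1.0005 pass but the same pair scaled by 1000 fails.

```cpp
#include <cmath>

// Fixed-tolerance equality, the kind of test the bugs above came from.
bool nearlyEqual(double a, double b, double epsilon) {
    return std::fabs(a - b) < epsilon;
}

// True when the tolerance test says a~b and b~c but not a~c, i.e. when it
// fails to be transitive for these three values.
bool breaksTransitivity(double a, double b, double c, double epsilon) {
    return nearlyEqual(a, b, epsilon) && nearlyEqual(b, c, epsilon)
        && !nearlyEqual(a, c, epsilon);
}
```

Non-transitivity is also why such a test cannot be used as the equivalence underlying a sort: std::sort requires a strict weak ordering, and "not definitely less than either way" built on a tolerance does not provide one.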

User · Answer

General-purpose comparison of floating-point numbers is generally meaningless. How to compare really depends on the problem at hand. In many problems, numbers are sufficiently discretized to allow comparing them within a given tolerance. Unfortunately, there are just as many problems where such a trick doesn't really work. For one example, consider working with a Heaviside (step) function of a number in question (digital stock options come to mind) when your observations are very close to the barrier. Performing a tolerance-based comparison wouldn't do much good, as it would effectively shift the issue from the original barrier to two new ones. Again, there is no general-purpose solution for such problems, and the particular solution might require going as far as changing the numerical method in order to achieve stability.

User · Answer

Here's proof that using std::numeric_limits::epsilon() is not the answer: it fails for values greater than one.

Proof of my comment above:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <limits>

    double ItoD (std::int64_t x) {
        // Return double from 64-bit hexadecimal representation.
        double d;
        std::memcpy(&d, &x, sizeof d);
        return d;
    }

    void test (std::int64_t ai, std::int64_t bi) {
        double a = ItoD(ai), b = ItoD(bi);
        bool close = std::fabs(a-b) < std::numeric_limits<double>::epsilon();
        std::printf ("%.16f and %.16f %s close.\n", a, b, close ? "are " : "are not");
    }

    int main()
    {
        test (0x3fe0000000000000LL,
              0x3fe0000000000001LL);

        test (0x3ff0000000000000LL,
              0x3ff0000000000001LL);
    }

Running yields this output:

    0.5000000000000000 and 0.5000000000000001 are  close.
    1.0000000000000000 and 1.0000000000000002 are not close.

Note that in the second case (one and just larger than one), the two input values are as close as they can possibly be, and still compare as not close. Thus, for values greater than 1.0, you might as well just use an equality test. Fixed epsilons will not save you when comparing floating-point values.

User · Answer

In terms of the scale of quantities:

If epsilon is a small fraction of the magnitude of the quantity (i.e. a relative value) in some certain physical sense, and the A and B types are comparable in the same sense, then I think that the following is quite correct:

    #include <limits>
    #include <iomanip>
    #include <iostream>

    #include <cmath>
    #include <cstdlib>
    #include <cassert>
    #include <type_traits>

    template< typename A, typename B >
    inline
    bool close_enough(A const & a, B const & b,
                      typename std::common_type< A, B >::type const & epsilon)
    {
        using std::isless;
        assert(isless(0, epsilon)); // epsilon is a part of the whole quantity
        assert(isless(epsilon, 1));
        using std::abs;
        auto const delta = abs(a - b);
        auto const x = abs(a);
        auto const y = abs(b);
        // comparable generally and |a - b| < eps * (|a| + |b|) / 2
        return isless(epsilon * y, x) && isless(epsilon * x, y) && isless((delta + delta) / (x + y), epsilon);
    }

    int main()
    {
        std::cout << std::boolalpha << close_enough(0.9, 1.0, 0.1) << std::endl;
        std::cout << std::boolalpha << close_enough(1.0, 1.1, 0.1) << std::endl;
        std::cout << std::boolalpha << close_enough(1.1,    1.2,    0.01) << std::endl;
        std::cout << std::boolalpha << close_enough(1.0001, 1.0002, 0.01) << std::endl;
        std::cout << std::boolalpha << close_enough(1.0, 0.01, 0.1) << std::endl;
        return EXIT_SUCCESS;
    }

User · Answer

Unfortunately, even your "wasteful" code is incorrect. EPSILON is the smallest value that could be added to 1.0 and change its value. The value 1.0 is very important: larger numbers do not change when added to EPSILON. Now, you can scale this value to the numbers you are comparing to tell whether they are different or not. The correct expression for comparing two doubles is:

    if (fabs(a-b) <= DBL_EPSILON * fmax(fabs(a), fabs(b)))
    {
        // ...
    }

This is at a minimum. In general, though, you would want to account for noise in your calculations and ignore a few of the least significant bits, so a more realistic comparison would look like:

    if (fabs(a-b) <= 16 * DBL_EPSILON * fmax(fabs(a), fabs(b)))
    {
        // ...
    }

If comparison performance is very important to you and you know the range of your values, then you should use fixed-point numbers instead.
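The two expressions above can be wrapped in one helper. A minimal sketch; the name relativelyEqual and the slack parameter are hypothetical: slack = 1 gives the minimal test and slack = 16 the "more realistic" one.

```cpp
#include <cfloat>
#include <cmath>

// Relative comparison from the answer above, wrapped in a function.
// 'slack' is the number of DBL_EPSILONs of relative error to tolerate.
bool relativelyEqual(double a, double b, double slack = 16.0) {
    return std::fabs(a - b) <= slack * DBL_EPSILON * std::fmax(std::fabs(a), std::fabs(b));
}
```

One design consequence: the tolerance is proportional to the larger magnitude, so a purely relative test never treats 0.0 as equal to a tiny non-zero value such as 1e-300. If values near zero matter, it needs an absolute floor as well.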

User · Answer

    // testing whether two doubles are almost equal. We consider two doubles
    // equal if the difference is within the range [0, epsilon).
    //
    // epsilon: a positive number (supposed to be small)
    //
    // if either x or y is 0, then we are comparing the absolute difference to
    // epsilon.
    // if both x and y are non-zero, then we are comparing the relative difference
    // to epsilon.
    bool almost_equal(double x, double y, double epsilon)
    {
        double diff = x - y;
        if (x != 0 && y != 0){
            diff = diff/y;
        }

        if (diff < epsilon && -1.0*diff < epsilon){
            return true;
        }
        return false;
    }

I used this function for my small project and it works, but note the following:

Double precision error can create a surprise for you. Let's say epsilon = 1.0e-6; then 1.0 and 1.000001 should NOT be considered equal according to the above code, but on my machine the function considers them to be equal. This is because 1.000001 cannot be precisely translated to a binary format; it is probably 1.0000009xxx. I tested it with 1.0 and 1.0000011, and this time I got the expected result.
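One more property worth checking before relying on almost_equal: because the relative difference is computed against y only, the result can depend on argument order. With epsilon = 0.6, (1.0, 2.0) gives a relative difference of 0.5 and passes, while (2.0, 1.0) gives 1.0 and fails. A minimal sketch; the symmetric variant and its name are hypothetical, and it divides by the larger magnitude instead.

```cpp
#include <algorithm>
#include <cmath>

// The answer's function, repeated so this snippet compiles on its own.
bool almost_equal(double x, double y, double epsilon) {
    double diff = x - y;
    if (x != 0 && y != 0) {
        diff = diff / y;   // relative to y only
    }
    return diff < epsilon && -1.0 * diff < epsilon;
}

// Hypothetical symmetric variant: divide by the larger magnitude, so
// swapping the arguments cannot change the result.
bool almost_equal_symmetric(double x, double y, double epsilon) {
    double diff = std::fabs(x - y);
    if (x != 0 && y != 0) {
        diff /= std::max(std::fabs(x), std::fabs(y));
    }
    return diff < epsilon;
}
```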

User · Answer

How about this?

    template<typename T>
    bool FloatingPointEqual( T a, T b ) { return !(a < b) && !(b < a); }

I've seen various approaches, but never seen this, so I'm curious to hear of any comments too!
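For ordinary (non-NaN) values, !(a < b) && !(b < a) is exactly a == b, so it adds no tolerance: 0.1 + 0.2 still differs from 0.3. The only place it behaves differently is NaN. Every ordered comparison involving NaN is false, so this function reports NaN as "equal" to everything, including 1.0. A small sketch to check both points, assuming IEEE 754 arithmetic:

```cpp
#include <limits>

// The answer's comparison, repeated so the snippet stands alone.
template <typename T>
bool FloatingPointEqual(T a, T b) { return !(a < b) && !(b < a); }
```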

User · Answer

I realize this is an old thread, but this article is one of the most straightforward ones I have found on comparing floating-point numbers. If you want to explore more, it has more detailed references as well, and the main site covers a complete range of issues dealing with floating-point numbers: The Floating-Point Guide: Comparison.

We can find a somewhat more practical article in Floating-point tolerances revisited. It notes that there is an absolute tolerance test, which boils down to this in C++:

    #include <algorithm>
    #include <cmath>
    #include <limits>

    bool absoluteToleranceCompare(double x, double y)
    {
        return std::fabs(x - y) <= std::numeric_limits<double>::epsilon();
    }

and a relative tolerance test:

    bool relativeToleranceCompare(double x, double y)
    {
        double maxXY = std::max( std::fabs(x) , std::fabs(y) );
        return std::fabs(x - y) <= std::numeric_limits<double>::epsilon()*maxXY;
    }

The article notes that the absolute test fails when x and y are large and the relative test fails when they are small. Assuming the absolute and relative tolerances are the same, a combined test would look like this:

    bool combinedToleranceCompare(double x, double y)
    {
        double maxXYOne = std::max( { 1.0, std::fabs(x) , std::fabs(y) } );
        return std::fabs(x - y) <= std::numeric_limits<double>::epsilon()*maxXYOne;
    }
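The failure modes the article describes can be checked directly. A small sketch that repeats the three functions so it compiles on its own; the test values are illustrative choices, not from the article.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

bool absoluteToleranceCompare(double x, double y) {
    return std::fabs(x - y) <= std::numeric_limits<double>::epsilon();
}

bool relativeToleranceCompare(double x, double y) {
    double maxXY = std::max(std::fabs(x), std::fabs(y));
    return std::fabs(x - y) <= std::numeric_limits<double>::epsilon() * maxXY;
}

// The max with 1.0 switches from an absolute test (|x|, |y| <= 1)
// to a relative one (larger magnitudes).
bool combinedToleranceCompare(double x, double y) {
    double maxXYOne = std::max({1.0, std::fabs(x), std::fabs(y)});
    return std::fabs(x - y) <= std::numeric_limits<double>::epsilon() * maxXYOne;
}
```

Two adjacent doubles near 1e10 are about 1.9e-6 apart, so the absolute test rejects them while the relative and combined tests accept them. Conversely, 0.0 and 1e-300 pass the absolute and combined tests but fail the relative one.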

User · Answer

The comparison with an epsilon value is what most people do (even in game programming).

You should change your implementation a little, though:

    bool AreSame(double a, double b)
    {
        return fabs(a - b) < EPSILON;
    }

Edit: Christer has added a stack of great info on this topic in a recent blog post. Enjoy.

User · Answer

My way may not be correct, but it is useful: convert both floats to strings and then do a string compare.

    #include <tchar.h>
    #include <stdio.h>

    bool IsFloatEqual(float a, float b, int decimal)
    {
        TCHAR form[50] = _T("");
        _stprintf(form, _T("%%.%df"), decimal);

        TCHAR a1[30] = _T(""), a2[30] = _T("");
        _stprintf(a1, form, a);
        _stprintf(a2, form, b);

        if( _tcscmp(a1, a2) == 0 )
            return true;

        return false;
    }

Operator overloading can also be done.
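If you want the same idea without the Windows-only TCHAR/_stprintf API, here is a portable sketch using std::snprintf with a "*" precision. fixedDigits and equalToDecimals are hypothetical names. The first call measures the needed length, so large values are never silently truncated.

```cpp
#include <cstdio>
#include <string>

// Formats x with 'decimals' digits after the decimal point (like "%.3f").
std::string fixedDigits(double x, int decimals) {
    int n = std::snprintf(nullptr, 0, "%.*f", decimals, x);  // length needed
    std::string s(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&s[0], s.size(), "%.*f", decimals, x);
    s.resize(static_cast<std::size_t>(n));                   // drop trailing NUL
    return s;
}

bool equalToDecimals(double a, double b, int decimals) {
    return fixedDigits(a, decimals) == fixedDigits(b, decimals);
}
```

Any rounding-based comparison has a boundary problem. For example, 0.0149 and 0.0151 differ by only 0.0002 but format as "0.01" and "0.02" with two decimals, so they compare unequal.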

User · Answer

Qt implements two functions; maybe you can learn from them:

    static inline bool qFuzzyCompare(double p1, double p2)
    {
        return (qAbs(p1 - p2) <= 0.000000000001 * qMin(qAbs(p1), qAbs(p2)));
    }

    static inline bool qFuzzyCompare(float p1, float p2)
    {
        return (qAbs(p1 - p2) <= 0.00001f * qMin(qAbs(p1), qAbs(p2)));
    }

And you may need the following functions, since:

"Note that comparing values where either p1 or p2 is 0.0 will not work, nor does comparing values where one of the values is NaN or infinity. If one of the values is always 0.0, use qFuzzyIsNull instead. If one of the values is likely to be 0.0, one solution is to add 1.0 to both values."

    static inline bool qFuzzyIsNull(double d)
    {
        return qAbs(d) <= 0.000000000001;
    }

    static inline bool qFuzzyIsNull(float f)
    {
        return qAbs(f) <= 0.00001f;
    }
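To see the zero caveat from the Qt documentation in action without pulling in Qt, here is a sketch that re-implements the double overload quoted above with std::fabs and std::min. fuzzyCompare is a hypothetical name, not Qt's API. When either argument is 0.0 the tolerance is 0.0 * something = 0, so only an exact match passes; adding 1.0 to both values moves the comparison away from zero.

```cpp
#include <algorithm>
#include <cmath>

// Qt-style fuzzy compare, re-implemented with the standard library so the
// snippet does not need Qt. The tolerance scales with the smaller magnitude.
bool fuzzyCompare(double p1, double p2) {
    return std::fabs(p1 - p2) <= 0.000000000001 * std::min(std::fabs(p1), std::fabs(p2));
}
```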

User · Answer

I found that the Google C   Testing Framework contains a nice cross-platform template-based implementation of AlmostEqual2sComplement which works on both doubles and floats  Given that it is released under the BSD license  using it in your own code should be no problem  as long as you retain the license  I extracted the below code from http   code google com p googletest source browse trunk include gtest internal gtest-internal h https   github com google googletest blob master googletest include gtest internal gtest-internal h and added the license on top   Be sure to  define GTEST OS WINDOWS to some value  or to change the code where it s used to something that fits your codebase - it s BSD licensed after all    Usage example   double left       something double right      something const FloatingPoint lt double gt  lhs left   rhs right    if  lhs AlmostEquals rhs         they re equal      Here s the code      Copyright 2005  Google Inc     All rights reserved        Redistribution and use in source and binary forms  with or without    modification  are permitted provided that the following conditions are    met              Redistributions of source code must retain the above copyright    notice  this list of conditions and the following disclaimer           Redistributions in binary form must reproduce the above    copyright notice  this list of conditions and the following disclaimer    in the documentation and or other materials provided with the    distribution           Neither the name of Google Inc  nor the names of its    contributors may be used to endorse or promote products derived from    this software without specific prior written permission        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS     AS IS  AND ANY EXPRESS OR IMPLIED WARRANTIES  INCLUDING  BUT NOT    LIMITED TO  THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR    A PARTICULAR PURPOSE ARE DISCLAIMED  IN NO EVENT SHALL THE COPYRIGHT    OWNER OR 
CONTRIBUTORS BE LIABLE FOR ANY DIRECT  INDIRECT  INCIDENTAL     SPECIAL  EXEMPLARY  OR CONSEQUENTIAL DAMAGES  INCLUDING  BUT NOT    LIMITED TO  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES  LOSS OF USE     DATA  OR PROFITS  OR BUSINESS INTERRUPTION  HOWEVER CAUSED AND ON ANY    THEORY OF LIABILITY  WHETHER IN CONTRACT  STRICT LIABILITY  OR TORT     INCLUDING NEGLIGENCE OR OTHERWISE  ARISING IN ANY WAY OUT OF THE USE    OF THIS SOFTWARE  EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE        Authors  wan google com  Zhanyong Wan   eefacm gmail com  Sean Mcafee        The Google C   Testing Framework  Google Test       This template class serves as a compile-time function from size to    type   It maps a size in bytes to a primitive type with that    size  e g          TypeWithSize lt 4 gt   UInt       is typedef-ed to be unsigned int  unsigned integer made up of 4    bytes         Such functionality should belong to STL  but I cannot find it    there        Google Test uses this class in the implementation of floating-point    comparison        For now it only handles UInt  unsigned int  as that s all Google Test    needs   Other types can be easily added in the future if need    arises  template  lt size t size gt  class TypeWithSize    public       This prevents the user from using TypeWithSize lt N gt  with incorrect      values of N    typedef void UInt         The specialization for size 4  template  lt  gt  class TypeWithSize lt 4 gt     public       unsigned int has size 4 in both gcc and MSVC            As base basictypes h doesn t compile on Windows  we cannot use      uint32  uint64  and etc here    typedef int Int    typedef unsigned int UInt         The specialization for size 8  template  lt  gt  class TypeWithSize lt 8 gt     public   if GTEST OS WINDOWS   typedef   int64 Int    typedef unsigned   int64 UInt   else   typedef long long Int      NOLINT   typedef unsigned long long UInt      NOLINT  endif     GTEST OS WINDOWS         This template class 
represents an IEEE floating-point number     either single-precision or double-precision  depending on the    template parameters         The purpose of this class is to do more sophisticated number    comparison    Due to round-off error  etc  it s very unlikely that    two floating-points will be equal exactly   Hence a naive    comparison by the    operation often doesn t work         Format of IEEE floating-point          The most-significant bit being the leftmost  an IEEE      floating-point looks like           sign bit exponent bits fraction bits         Here  sign bit is a single bit that designates the sign of the      number          For float  there are 8 exponent bits and 23 fraction bits          For double  there are 11 exponent bits and 52 fraction bits          More details can be found at      http   en wikipedia org wiki IEEE floating-point standard        Template parameter          RawType  the raw floating-point type  either float or double  template  lt typename RawType gt  class FloatingPoint    public       Defines the unsigned integer type that has the same size as the      floating point number    typedef typename TypeWithSize lt sizeof RawType  gt   UInt Bits        Constants          of bits in a number    static const size t kBitCount   8 sizeof RawType           of fraction bits in a number    static const size t kFractionBitCount       std  numeric limits lt RawType gt   digits - 1          of exponent bits in a number    static const size t kExponentBitCount   kBitCount - 1 - kFractionBitCount        The mask for the sign bit    static const Bits kSignBitMask   static cast lt Bits gt  1   lt  lt   kBitCount - 1         The mask for the fraction bits    static const Bits kFractionBitMask        static cast lt Bits gt  0   gt  gt   kExponentBitCount   1         The mask for the exponent bits    static const Bits kExponentBitMask     kSignBitMask   kFractionBitMask         How many ULP s  Units in the Last Place  we want to tolerate 
      // when comparing two numbers.  The larger the value, the more error we
      // allow.  A 0 value means that two numbers must be exactly the same
      // to be considered equal.
      //
      // The maximum error of a single floating-point operation is 0.5
      // units in the last place.  On Intel CPU's, all floating-point
      // calculations are done with 80-bit precision, while double has 64
      // bits.  Therefore, 4 should be enough for ordinary use.
      //
      // See the following article for more details on ULP:
      // http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm.
      static const size_t kMaxUlps = 4;

      // Constructs a FloatingPoint from a raw floating-point number.
      //
      // On an Intel CPU, passing a non-normalized NAN (Not a Number)
      // around may change its bits, although the new value is guaranteed
      // to be also a NAN.  Therefore, don't expect this constructor to
      // preserve the bits in x when x is a NAN.
      explicit FloatingPoint(const RawType& x) { u_.value_ = x; }

      // Static methods

      // Reinterprets a bit pattern as a floating-point number.
      //
      // This function is needed to test the AlmostEquals() method.
      static RawType ReinterpretBits(const Bits bits) {
        FloatingPoint fp(0);
        fp.u_.bits_ = bits;
        return fp.u_.value_;
      }

      // Returns the floating-point number that represent positive infinity.
      static RawType Infinity() {
        return ReinterpretBits(kExponentBitMask);
      }

      // Non-static methods

      // Returns the bits that represents this number.
      const Bits &bits() const { return u_.bits_; }

      // Returns the exponent bits of this number.
      Bits exponent_bits() const { return kExponentBitMask & u_.bits_; }

      // Returns the fraction bits of this number.
      Bits fraction_bits() const { return kFractionBitMask & u_.bits_; }

      // Returns the sign bit of this number.
      Bits sign_bit() const { return kSignBitMask & u_.bits_; }

      // Returns true iff this is NAN (not a number).
      bool is_nan() const {
        // It's a NAN if the exponent bits are all ones and the fraction
        // bits are not entirely zeros.
        return (exponent_bits() == kExponentBitMask) && (fraction_bits() != 0);
      }

      // Returns true iff this number is at most kMaxUlps ULP's away from
      // rhs.  In particular, this function:
      //
      //   - returns false if either number is (or both are) NAN.
      //   - treats really large numbers as almost equal to infinity.
      //   - thinks +0.0 and -0.0 are 0 DLP's apart.
      bool AlmostEquals(const FloatingPoint& rhs) const {
        // The IEEE standard says that any comparison operation involving
        // a NAN must return false.
        if (is_nan() || rhs.is_nan()) return false;

        return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
            <= kMaxUlps;
      }

    private:
      // The data type used to store the actual floating-point number.
      union FloatingPointUnion {
        RawType value_;  // The raw floating-point number.
        Bits bits_;      // The bits that represent the number.
      };

      // Converts an integer from the sign-and-magnitude representation to
      // the biased representation.  More precisely, let N be 2 to the
      // power of (kBitCount - 1), an integer x is represented by the
      // unsigned number x + N.
      //
      // For instance,
      //
      //   -N + 1 (the most negative number representable using
      //          sign-and-magnitude) is represented by 1;
      //   0      is represented by N; and
      //   N - 1  (the biggest number representable using
      //          sign-and-magnitude) is represented by 2N - 1.
      //
      // Read http://en.wikipedia.org/wiki/Signed_number_representations
      // for more details on signed number representations.
      static Bits SignAndMagnitudeToBiased(const Bits &sam) {
        if (kSignBitMask & sam) {
          // sam represents a negative number.
          return ~sam + 1;
        } else {
          // sam represents a positive number.
          return kSignBitMask | sam;
        }
      }

      // Given two numbers in the sign-and-magnitude representation,
      // returns the distance between them as an unsigned number.
      static Bits DistanceBetweenSignAndMagnitudeNumbers(const Bits &sam1,
                                                         const Bits &sam2) {
        const Bits biased1 = SignAndMagnitudeToBiased(sam1);
        const Bits biased2 = SignAndMagnitudeToBiased(sam2);
        return (biased1 >= biased2) ? (biased1 - biased2) : (biased2 - biased1);
      }

      FloatingPointUnion u_;
    };

EDIT: This post is 4 years old. It's probably still valid, and the code is nice, but some people found improvements. Best go get the latest version of AlmostEquals right from the Google Test source code, and not the one I pasted up here.
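If you only need `double`, the same idea fits in a few free functions. Below is a minimal sketch, not Google Test's code. It assumes 64-bit IEEE 754 doubles, and the names `ToBiased`, `UlpDistance` and `AlmostEqualUlps` are hypothetical. Like `AlmostEquals()`, it rejects NaN, treats +0.0 and -0.0 as 0 ULPs apart, and considers the largest finite double to be 1 ULP from infinity.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumes double is a 64-bit IEEE 754 binary64 value.
static_assert(std::numeric_limits<double>::is_iec559, "expects IEEE 754 double");
static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");

// Hypothetical helper: map the sign-and-magnitude bit pattern onto an unsigned
// scale where neighbouring doubles get neighbouring integers
// (the same trick as SignAndMagnitudeToBiased).
inline std::uint64_t ToBiased(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);       // well-defined way to read the bits
    const std::uint64_t kSignBit = std::uint64_t{1} << 63;
    if (bits & kSignBit) {
        return ~bits + 1;                      // negative: 2^63 - magnitude
    }
    return kSignBit | bits;                    // non-negative: 2^63 + magnitude
}

// Hypothetical helper: how many representable doubles separate a and b.
inline std::uint64_t UlpDistance(double a, double b) {
    const std::uint64_t ba = ToBiased(a);
    const std::uint64_t bb = ToBiased(b);
    return ba >= bb ? ba - bb : bb - ba;
}

// NaN never compares equal, mirroring AlmostEquals().
inline bool AlmostEqualUlps(double a, double b, std::uint64_t maxUlps = 4) {
    if (std::isnan(a) || std::isnan(b)) return false;
    return UlpDistance(a, b) <= maxUlps;
}
```

With the default `maxUlps = 4`, `AlmostEqualUlps(0.1 + 0.2, 0.3)` is true, because the two results are adjacent doubles.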

User · Answer

My class, based on previously posted answers. It is very similar to Google's code, but I use a bias that pushes all NaN values above 0xFF000000. That allows a faster check for NaN.

This code is meant to demonstrate the concept, not to be a general solution. Google's code already shows how to compute all the platform-specific values, and I didn't want to duplicate all that. I've done limited testing on this code.

    #include <cstring>

    typedef unsigned int   U32;

    //  Float           Memory          Bias (unsigned)
    //  -----           ------          ---------------
    //   NaN            0xFFFFFFFF      0xFF800001
    //   NaN            0xFF800001      0xFFFFFFFF
    //  -Infinity       0xFF800000      0x00000000 ---
    //  -3.40282e+038   0xFF7FFFFF      0x00000001    |
    //  -1.40130e-045   0x80000001      0x7F7FFFFF    |
    //  -0.0            0x80000000      0x7F800000    |--- Valid <= 0xFF000000.
    //   0.0            0x00000000      0x7F800000    |    NaN > 0xFF000000
    //   1.40130e-045   0x00000001      0x7F800001    |
    //   3.40282e+038   0x7F7FFFFF      0xFEFFFFFF    |
    //   Infinity       0x7F800000      0xFF000000 ---
    //   NaN            0x7F800001      0xFF000001
    //   NaN            0x7FFFFFFF      0xFF7FFFFF
    //
    //   Either value of NaN returns false.
    //   -Infinity and +Infinity are not "close".
    //   -0 and +0 are equal.
    //
    class CompareFloat {
    public:
        static bool IsClose(float A, float B, U32 unitsDelta = 4)
        {
            U32 a = GetBiased(A);
            U32 b = GetBiased(B);

            if ((a > 0xFF000000) || (b > 0xFF000000))
            {
                return false;   // at least one value is NaN
            }
            // Unsigned distance; abs() cannot be applied to an unsigned difference.
            U32 distance = (a > b) ? (a - b) : (b - a);
            return distance < unitsDelta;
        }
    protected:
        static U32 GetBiased(float f)
        {
            U32 r;
            std::memcpy(&r, &f, sizeof r);   // copy the bits instead of casting the pointer

            if (r & 0x80000000)
            {
                return ~r - 0x007FFFFF;
            }
            return r + 0x7F800000;
        }
    };
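You can check the bias table in code. The sketch below assumes 32-bit IEEE 754 floats and copies the bits with `std::memcpy` rather than casting a pointer. `BiasedFloat` and `IsNanByBias` are hypothetical free-function versions of `GetBiased` and of the NaN test in `IsClose`.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumes float is a 32-bit IEEE 754 binary32 value.
static_assert(std::numeric_limits<float>::is_iec559, "expects IEEE 754 float");
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Same mapping as GetBiased: -Infinity -> 0x00000000, +/-0 -> 0x7F800000,
// +Infinity -> 0xFF000000, and every NaN lands above 0xFF000000.
inline std::uint32_t BiasedFloat(float f) {
    std::uint32_t r;
    std::memcpy(&r, &f, sizeof r);
    if (r & 0x80000000u) {
        return ~r - 0x007FFFFFu;   // negative values fill [0x00000000, 0x7F800000]
    }
    return r + 0x7F800000u;        // non-negative values fill [0x7F800000, 0xFF000000]
}

// One unsigned comparison rejects every NaN, which is the point of the bias.
inline bool IsNanByBias(float f) {
    return BiasedFloat(f) > 0xFF000000u;
}
```

Because neighbouring floats map to neighbouring integers, the difference of two biased values is their distance in ULPs. That is what `IsClose` compares against `unitsDelta`.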

User · Answer

For a more in-depth approach, read Comparing floating point numbers. Here is the code snippet from that link:

    // Usable AlmostEqual function
    bool AlmostEqual2sComplement(float A, float B, int maxUlps)
    {
        // Make sure maxUlps is non-negative and small enough that the
        // default NAN won't compare as equal to anything.
        assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);
        int aInt = *(int*)&A;
        // Make aInt lexicographically ordered as a twos-complement int
        if (aInt < 0)
            aInt = 0x80000000 - aInt;
        // Make bInt lexicographically ordered as a twos-complement int
        int bInt = *(int*)&B;
        if (bInt < 0)
            bInt = 0x80000000 - bInt;
        int intDiff = abs(aInt - bInt);
        if (intDiff <= maxUlps)
            return true;
        return false;
    }
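The snippet has two portability problems in modern C++. It reads the float's bits through an `int*` cast, which breaks strict aliasing. Also, `aInt - bInt` can overflow `int` when the inputs have opposite signs. Below is a sketch of the same algorithm that copies the bits with `std::memcpy` and subtracts in 64-bit arithmetic. It assumes 32-bit IEEE 754 floats, and the names `LexicographicInt` and `AlmostEqual2sComplementSafe` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "expects IEEE 754 float");
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Map a float onto a signed integer line: +x -> +magnitude bits, -x -> -magnitude bits.
// +0.0 and -0.0 both land on 0, and adjacent floats are 1 apart.
inline std::int64_t LexicographicInt(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);          // no pointer type-punning
    const std::int64_t magnitude = bits & 0x7FFFFFFFu;
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

bool AlmostEqual2sComplementSafe(float A, float B, int maxUlps) {
    // Same guard as the original: keeps the default NaN from matching any non-NaN value.
    assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);
    // A 64-bit difference cannot overflow, unlike abs(aInt - bInt) on int.
    std::int64_t diff = LexicographicInt(A) - LexicographicInt(B);
    if (diff < 0) diff = -diff;
    return diff <= maxUlps;
}
```

Like the original, two NaNs with identical bits still compare as equal. Add an explicit NaN check if that matters to you.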

User · Answer

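I use this code:

    #include <algorithm>
    #include <cmath>
    #include <limits>

    bool AlmostEqual(double v1, double v2)
    {
        // <= rather than < so that identical values (including 0.0 and 0.0) pass
        return std::fabs(v1 - v2) <= std::fabs(std::min(v1, v2)) * std::numeric_limits<double>::epsilon();
    }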
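This check uses a single epsilon as the relative factor, so it only accepts differences of about one rounding step. When `std::min(v1, v2)` is 0.0, the allowed difference becomes zero. Below is a sketch of a tunable variant. `AlmostEqualTol` and its parameters `relTol` (scaled by the larger magnitude) and `absTol` (a fixed floor for values near zero) are hypothetical, and choosing their values is still up to your application.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: relTol scales with the larger magnitude, absTol is a
// fixed floor for comparisons against (or near) zero.
inline bool AlmostEqualTol(double v1, double v2,
                           double relTol = 4 * std::numeric_limits<double>::epsilon(),
                           double absTol = 0.0) {
    if (v1 == v2) return true;                            // exact match, including equal infinities
    if (std::isinf(v1) || std::isinf(v2)) return false;   // an infinity is never "close" to anything else
    const double diff = std::fabs(v1 - v2);               // NaN if either input is NaN
    if (diff <= absTol) return true;                      // absolute floor near zero
    const double scale = std::max(std::fabs(v1), std::fabs(v2));
    return diff <= scale * relTol;                        // relative test; false for NaN
}
```

Scaling by the larger magnitude makes the test symmetric in `v1` and `v2`. The default of four epsilons is only a placeholder.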

User · Answer

I ended up spending quite some time going through the material in this great thread. I doubt everyone wants to spend so much time, so I will highlight a summary of what I learned and the solution I implemented.

Quick Summary

Is 1e-8 approximately the same as 1e-16? If you are looking at noisy sensor data, then probably yes, but if you are doing molecular simulation, then maybe not! Bottom line: you always need to think of the tolerance value in the context of the specific function call, and not just make it a generic, app-wide, hard-coded constant.

For general library functions, it's still nice to have a parameter with a default tolerance. A typical choice is numeric_limits::epsilon(), which for float is the same as FLT_EPSILON in float.h. This is, however, problematic, because the epsilon for comparing values like 1.0 is not the same as the epsilon for values like 1E9. FLT_EPSILON is defined for 1.0.

The obvious implementation to check whether a number is within tolerance is fabs(a-b) <= epsilon. However, this doesn't work, because the default epsilon is defined for 1.0. We need to scale epsilon up or down in terms of a and b.

There are two solutions to this problem. Either you set epsilon proportional to max(a, b), or you get the next representable numbers around a and then see whether b falls into that range. The former is called the "relative" method and the latter is called the ULP method.

Both methods fail anyway when comparing with 0. In this case, the application must supply the correct tolerance.

Utility Functions Implementation (C++11)

    #include <cmath>
    #include <limits>

    //implements relative method - do not use for comparing with zero
    //use this most of the time, tolerance needs to be meaningful in your context
    template<typename TReal>
    static bool isApproximatelyEqual(TReal a, TReal b, TReal tolerance = std::numeric_limits<TReal>::epsilon())
    {
        TReal diff = std::fabs(a - b);
        if (diff <= tolerance)
            return true;

        if (diff < std::fmax(std::fabs(a), std::fabs(b)) * tolerance)
            return true;

        return false;
    }

    //supply tolerance that is meaningful in your context
    //for example, default tolerance may not work if you are comparing double with float
    template<typename TReal>
    static bool isApproximatelyZero(TReal a, TReal tolerance = std::numeric_limits<TReal>::epsilon())
    {
        if (std::fabs(a) <= tolerance)
            return true;
        return false;
    }

    //use this when you want to be on the safe side
    //for example, don't start rover unless signal is above 1
    template<typename TReal>
    static bool isDefinitelyLessThan(TReal a, TReal b, TReal tolerance = std::numeric_limits<TReal>::epsilon())
    {
        TReal diff = b - a;   // positive when a lies below b
        if (diff > tolerance)
            return true;

        if (diff > std::fmax(std::fabs(a), std::fabs(b)) * tolerance)
            return true;

        return false;
    }

    template<typename TReal>
    static bool isDefinitelyGreaterThan(TReal a, TReal b, TReal tolerance = std::numeric_limits<TReal>::epsilon())
    {
        TReal diff = a - b;   // positive when a lies above b
        if (diff > tolerance)
            return true;

        if (diff > std::fmax(std::fabs(a), std::fabs(b)) * tolerance)
            return true;

        return false;
    }

    //implements ULP method
    //use this when you are only concerned about floating point precision issue
    //for example, if you want to see if a is 1.0 by checking if it's within
    //10 closest representable floating point numbers around 1.0.
    template<typename TReal>
    static bool isWithinPrecisionInterval(TReal a, TReal b, unsigned int interval_size = 1)
    {
        TReal min_a = a - (a - std::nextafter(a, std::numeric_limits<TReal>::lowest())) * interval_size;
        TReal max_a = a + (std::nextafter(a, std::numeric_limits<TReal>::max()) - a) * interval_size;

        return min_a <= b && max_a >= b;
    }
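You can check the point that epsilon is "defined for 1.0" directly. `std::numeric_limits<float>::epsilon()` is the gap between 1.0f and the next float, and that gap grows with magnitude. Below is a small sketch, assuming IEEE 754 single precision. `UlpAt` and `NaiveEqual` are hypothetical helpers, the latter being the plain `fabs(a-b) <= epsilon` test.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float toward +infinity.
inline float UlpAt(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// The naive absolute test: epsilon is not scaled to the inputs.
inline bool NaiveEqual(float a, float b) {
    return std::fabs(a - b) <= std::numeric_limits<float>::epsilon();
}
```

`UlpAt(1.0f)` equals `FLT_EPSILON`, but `UlpAt(1e9f)` is 64.0f. So `NaiveEqual` rejects two neighbouring floats near 1E9, while it accepts 1e-10f and 2e-10f even though one is twice the other.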

User · Answer

The code you wrote is bugged:

    return (diff < EPSILON) && (-diff > EPSILON);

The correct code would be:

    return (diff < EPSILON) && (diff > -EPSILON);

(And yes, this is different.)

I wonder whether fabs would make you lose short-circuit (lazy) evaluation in some cases. I would say it depends on the compiler. You might want to try both. If they are equivalent on average, take the implementation with fabs.

If you have some information on which of the two floats is more likely to be bigger than the other, you can play with the order of the comparisons to take better advantage of the short-circuit evaluation.

Finally, you might get better results by inlining this function. Not likely to improve much, though...

Edit: OJ, thanks for correcting your code. I erased my comment accordingly.
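The corrected two-sided test and the `fabs` version accept exactly the same values of `diff`: everything strictly between -EPSILON and +EPSILON. Both also reject NaN, because every comparison with NaN is false. Below is a sketch of the two side by side. It assumes `EPSILON` is a positive constant you choose, and the value used here is only a placeholder.

```cpp
#include <cmath>

constexpr double EPSILON = 1e-9;  // placeholder: pick a tolerance that suits your data

// Two comparisons; && skips the second one if the first already fails.
inline bool WithinTwoSided(double a, double b) {
    const double diff = a - b;
    return (diff < EPSILON) && (diff > -EPSILON);
}

// One fabs call (it clears the sign bit) and one comparison.
inline bool WithinFabs(double a, double b) {
    return std::fabs(a - b) < EPSILON;
}
```

Because the two are equivalent, which one is faster depends on the compiler and target. Measuring both, as suggested above, is the only reliable way to choose.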

User · Answer

It depends on how precise you want the comparison to be. If you want to compare for exactly the same number, then just go with ==. You almost never want to do this unless you actually want exactly the same number. On any decent platform you can also do the following:

    diff = a - b; return fabs(diff) < EPSILON;

as fabs tends to be pretty fast. By pretty fast I mean it is basically a bitwise AND, so it had better be fast.

Integer tricks for comparing doubles and floats are nice, but they tend to make it more difficult for the various CPU pipelines to handle effectively. And they are definitely not faster on certain in-order architectures these days, because they use the stack as temporary storage for values that are used frequently. (Load-hit-store, for those who care.)
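To see why `fabs` is cheap: on IEEE 754 doubles it only clears the sign bit, which is a single AND with a mask. The sketch below shows that equivalence, assuming 64-bit IEEE 754 doubles. `FabsByMask` and `CloseAbs` are hypothetical names. In real code you would call `std::fabs` and let the compiler emit the mask.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Clear bit 63 (the sign) and leave the exponent and fraction untouched.
inline double FabsByMask(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFFFFFFFFFull;
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// The comparison from the answer, written with the standard function.
inline bool CloseAbs(double a, double b, double epsilon) {
    return std::fabs(a - b) < epsilon;
}
```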

User · Answer

I wrote this for Java, but maybe you will find it useful. It uses longs instead of doubles, but takes care of NaNs, subnormals, etc.

    public static boolean equal(double a, double b) {
        final long fm = 0xFFFFFFFFFFFFFL;       // fraction mask
        final long sm = 0x8000000000000000L;    // sign mask
        final long cm = 0x8000000000000L;       // most significant fraction bit mask (unused)
        long c = Double.doubleToLongBits(a), d = Double.doubleToLongBits(b);
        int ea = (int) (c >> 52 & 2047), eb = (int) (d >> 52 & 2047);
        if (ea == 2047 && (c & fm) != 0 || eb == 2047 && (d & fm) != 0) return false;   // NaN
        if (c == d) return true;                            // identical - fast check
        if (ea == 0 && eb == 0) return true;                // +/-0 or subnormals
        if ((c & sm) != (d & sm)) return false;             // different signs
        if (Math.abs(ea - eb) > 1) return false;            // b > 2*a or a > 2*b
        d <<= 12; c <<= 12;                                 // keep only the fraction bits
        if (ea < eb) c = c >> 1 | sm;                       // align to the larger exponent, restoring the implicit 1
        else if (ea > eb) d = d >> 1 | sm;
        c -= d;
        return c < 65536 && c > -65536;     // don't use abs(), because
        // there is a possibility that c = 0x8000000000000000, which cannot be converted to positive
    }

    public static boolean zero(double a) { return (Double.doubleToLongBits(a) >> 52 & 2047) < 3; }

Keep in mind that after a number of floating-point operations, a number can be very different from what we expect. There is no code to fix that.
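The Java code relies on the 11-bit biased exponent field (`bits >> 52 & 2047`). For readers following along in C++, here is a sketch of that extraction and of the `zero()` helper, assuming 64-bit IEEE 754 doubles. `BiasedExponent` and `IsNearZero` are hypothetical names. Note that `zero()` is a magnitude threshold rather than an exact test: an exponent field below 3 means |a| < 2^-1020, which covers 0, the subnormals and the two smallest normal binades.

```cpp
#include <cstdint>
#include <cstring>

// Exponent field of an IEEE 754 double (0 = zero/subnormal, 2047 = infinity/NaN).
inline int BiasedExponent(double a) {
    std::uint64_t bits;
    std::memcpy(&bits, &a, sizeof bits);
    return static_cast<int>((bits >> 52) & 2047);
}

// Mirrors the Java zero(): true for |a| < 2^-1020, including +0.0 and -0.0.
inline bool IsNearZero(double a) {
    return BiasedExponent(a) < 3;
}
```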

User · Answer

Why not perform a bitwise XOR? Two floating-point numbers are equal if their corresponding bits are equal. I think the decision to place the exponent bits before the mantissa was made to speed up the comparison of two floats.

I think many answers here are missing the point of epsilon comparison. The epsilon value depends only on the precision to which the floating-point numbers are compared. For example, after doing some arithmetic with floats you get two numbers: 2.5642943554342 and 2.5642943554345. They are not equal, but for the solution only 3 decimal digits matter, so they are equal: 2.564 and 2.564. In this case you choose epsilon equal to 0.001. Epsilon comparison is also possible with bitwise XOR. Correct me if I am wrong.
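Bit-for-bit comparison (XOR equal to zero) and `==` agree for most values, but not all. +0.0 and -0.0 compare equal with different bits, and a NaN has the same bits as itself yet never compares equal. Below is a sketch, assuming 64-bit IEEE 754 doubles. `SameBits` is a hypothetical helper.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "expects IEEE 754 double");

// True when the two doubles have exactly the same bit pattern (a ^ b == 0).
inline bool SameBits(double a, double b) {
    std::uint64_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return (ua ^ ub) == 0;
}
```

`SameBits(0.0, -0.0)` is false even though `0.0 == -0.0` is true. For a NaN, `SameBits` is true while `==` is false.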

User · Answer

In a more generic way:

    template <typename T>
    bool compareNumber(const T& a, const T& b) {
        return std::abs(a - b) < std::numeric_limits<T>::epsilon();
    }

Note: As pointed out by @SirGuy, this approach is flawed. I am leaving this answer here as an example not to follow.
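A fixed `epsilon()` is the gap between adjacent values in [1, 2) only. Below that range the test is far too loose (1e-20 and 2e-20 compare equal). At 1.0 and above it degenerates to `==`, because no two distinct values are closer than epsilon. One way to repair it is to keep the generic template but scale `epsilon()` to the gap at the inputs' magnitude with `std::ilogb` and `std::ldexp`, then allow `n` such gaps. This is a sketch, not a drop-in replacement. The name `compareNumberUlps` and the default `n = 4` are assumptions, and comparisons against 0 still need an absolute tolerance.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <type_traits>

// Hypothetical fix: scale epsilon (the gap in [1, 2)) to the gap at the
// magnitude of the smaller input, then accept up to n such gaps.
template <typename T>
bool compareNumberUlps(const T& a, const T& b, int n = 4) {
    static_assert(std::is_floating_point<T>::value, "floating-point types only");
    if (a == b) return true;                              // exact match, including equal infinities
    if (std::isnan(a) || std::isnan(b)) return false;     // NaN is never equal
    if (std::isinf(a) || std::isinf(b)) return false;     // infinity vs anything else
    const T m = std::min(std::fabs(a), std::fabs(b));
    // Zero and subnormals share the smallest gap, so clamp the exponent there.
    const int e = (m < std::numeric_limits<T>::min())
                      ? std::numeric_limits<T>::min_exponent - 1
                      : std::ilogb(m);
    return std::fabs(a - b) <= n * std::ldexp(std::numeric_limits<T>::epsilon(), e);
}
```

Using the smaller magnitude picks the finer of the two gaps when the inputs straddle a power of two. Using the larger one would be slightly more permissive.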
