How to get the sign, mantissa and exponent of a floating point number

Question

I have a program which runs on two processors, one of which does not have floating point support. So, I need to perform floating point calculations using fixed point on that processor. For that purpose, I will be using a floating point emulation library.

I first need to extract the signs, mantissas and exponents of floating point numbers on the processor which does support floating point. So, my question is: how can I get the sign, mantissa and exponent of a single precision floating point number?

Following the format from this figure:

This is what I've done so far, but apart from the sign, neither the mantissa nor the exponent is correct. I think I'm missing something.

void getSME( int& s, int& m, int& e, float number )
{
    unsigned int* ptr = (unsigned int*)&number;

    s = *ptr >> 31;
    e = *ptr & 0x7f800000;
    e >>= 23;
    m = *ptr & 0x007fffff;
}

User · Accepted Answer

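I think it is better to use a union to do the cast; it is clearer. Note that the order of bit-fields within a word is implementation-defined: the layout below is what GCC typically produces on little-endian targets such as x86, and the fields have to be declared in the reverse order on big-endian ones.

#include <stdio.h>

/* Overlay a float with its IEEE 754 binary32 fields (little-endian bit-field order). */
typedef union {
  float f;
  struct {
    unsigned int mantissa : 23;  /* stored fraction bits, implicit leading 1 not included */
    unsigned int exponent : 8;   /* biased by 127 */
    unsigned int sign : 1;
  } parts;
} float_cast;

int main(void) {
  float_cast d1 = { .f = 0.15625f };
  printf("sign = %x\n", (unsigned)d1.parts.sign);
  printf("exponent = %x\n", (unsigned)d1.parts.exponent);
  printf("mantissa = %x\n", (unsigned)d1.parts.mantissa);
  return 0;
}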

Example based on http://en.wikipedia.org/wiki/Single_precision

User · Answer

Find out the format of the floating point numbers used on the CPU that directly supports floating point and break it down into those parts. The most common format is IEEE-754.

Alternatively, you could obtain those parts using a few special functions (double frexp(double value, int *exp); and double ldexp(double x, int exp);) as shown in this answer.

Another option is to use %a with printf().

User · Answer

Don't make functions that do multiple things.
Don't mask then shift; shift then mask.
Don't mutate values unnecessarily because it's slow, cache-destroying and error-prone.
Don't use magic numbers.

/* NaNs, infinities, denormals unhandled */
/* assumes sizeof(float) == 4 and uses ieee754 binary32 format */
/* assumes two's-complement machine */
/* C99 */
#include <stdint.h>

/* Reads the bits of an lvalue float. The pointer cast sidesteps the
   strict-aliasing rules; memcpy into a uint32_t is the portable alternative. */
#define AS_U32(f) (*(const uint32_t*)&(f))
#define FLOAT_EXPONENT_WIDTH 8
#define FLOAT_MANTISSA_WIDTH 23
#define FLOAT_BIAS ((1<<(FLOAT_EXPONENT_WIDTH-1))-1) /* 2^(e-1)-1 */
#define MASK(width) ((1u<<(width))-1) /* 2^w - 1 */
#define FLOAT_IMPLICIT_MANTISSA_BIT (1u<<FLOAT_MANTISSA_WIDTH)
/* Test the top bit: comparing against -0.0 would also report +0.0 as negative. */
#define SIGN(f) ((int)(AS_U32(f) >> (FLOAT_EXPONENT_WIDTH + FLOAT_MANTISSA_WIDTH)))

/* correct exponent with bias removed */
int float_exponent(float f) {
  return (int)((AS_U32(f) >> FLOAT_MANTISSA_WIDTH) & MASK(FLOAT_EXPONENT_WIDTH)) - FLOAT_BIAS;
}

/* of non-zero, normal floats only */
int float_mantissa(float f) {
  return (int)((AS_U32(f) & MASK(FLOAT_MANTISSA_WIDTH)) | FLOAT_IMPLICIT_MANTISSA_BIT);
}

The book Hacker's Delight is your friend.

User · Answer

My advice is to stick to rule 0 and not redo what standard libraries already do, if this is enough. Look at math.h (cmath in standard C++) and the functions frexp, frexpf, frexpl, which break a floating point value (double, float, or long double) into its significand and exponent parts. To extract the sign from the significand you can use signbit, also in math.h / cmath, or copysign (C99, and C++ since C++11).

Some alternatives, with slightly different semantics, are modf and ilogb/scalbn, available in C++11; http://en.cppreference.com/w/cpp/numeric/math/logb compares them, but I didn't find in the documentation how all these functions behave with +/-inf and NaNs.

Finally, if you really want to use bitmasks (e.g., you desperately need to know the exact bits, and your program may have different NaNs with different representations, and you don't trust the above functions), at least make everything platform-independent by using the macros in float.h/cfloat.

User · Answer

See this IEEE_754_types.h header for the union types to extract float, double and long double (endianness handled). Here is an extract:

/*
** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
**  Single Precision (float)  --  Standard IEEE 754 Floating-point Specification
*/

# define IEEE_754_FLOAT_MANTISSA_BITS (23)
# define IEEE_754_FLOAT_EXPONENT_BITS (8)
# define IEEE_754_FLOAT_SIGN_BITS     (1)

.
.
.

# if (IS_BIG_ENDIAN == 1)
    typedef union {
        float value;
        struct {
            int8_t   sign     : IEEE_754_FLOAT_SIGN_BITS;
            int8_t   exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
        };
    } IEEE_754_float;
# else
    typedef union {
        float value;
        struct {
            uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
            int8_t   exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            int8_t   sign     : IEEE_754_FLOAT_SIGN_BITS;
        };
    } IEEE_754_float;
# endif

And see dtoa_base.c for a demonstration of how to convert a double value to string form.

Furthermore, check out section 1.2.1.1.4.2 - Floating-Point Type Memory Layout of the C/CPP Reference Book. It explains in simple terms, with illustrations, the memory layout of all the floating-point types and how to decode them, following the actual IEEE 754 Floating-Point specification. It also has links to really good resources that go even deeper.

User · Answer

Cast a pointer to the floating point variable as something like an unsigned int. Then you can shift and mask the bits to get each component.

float foo;
unsigned int ival, mantissa, exponent, sign;

foo = -21.4f;
ival = *((unsigned int *)&foo);
mantissa = ( ival & 0x7FFFFF );
ival = ival >> 23;
exponent = ( ival & 0xFF );
ival = ival >> 8;
sign = ( ival & 0x01 );

Obviously you probably wouldn't use unsigned ints for the exponent and sign bits, but this should at least give you the idea.

User · Answer

On Linux, the package glibc-headers provides the header #include <ieee754.h> with floating point type definitions, e.g.:

union ieee754_double
  {
    double d;

    /* This is the IEEE 754 double-precision format.  */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
        unsigned int negative:1;
        unsigned int exponent:11;
        /* Together these comprise the mantissa.  */
        unsigned int mantissa0:20;
        unsigned int mantissa1:32;
#endif                          /* Big endian.  */
#if __BYTE_ORDER == __LITTLE_ENDIAN
# if __FLOAT_WORD_ORDER == __BIG_ENDIAN
        unsigned int mantissa0:20;
        unsigned int exponent:11;
        unsigned int negative:1;
        unsigned int mantissa1:32;
# else
        /* Together these comprise the mantissa.  */
        unsigned int mantissa1:32;
        unsigned int mantissa0:20;
        unsigned int exponent:11;
        unsigned int negative:1;
# endif
#endif                          /* Little endian.  */
      } ieee;

    /* This format makes it easier to see if a NaN is a signalling NaN.  */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
        unsigned int negative:1;
        unsigned int exponent:11;
        unsigned int quiet_nan:1;
        /* Together these comprise the mantissa.  */
        unsigned int mantissa0:19;
        unsigned int mantissa1:32;
#else
# if __FLOAT_WORD_ORDER == __BIG_ENDIAN
        unsigned int mantissa0:19;
        unsigned int quiet_nan:1;
        unsigned int exponent:11;
        unsigned int negative:1;
        unsigned int mantissa1:32;
# else
        /* Together these comprise the mantissa.  */
        unsigned int mantissa1:32;
        unsigned int mantissa0:19;
        unsigned int quiet_nan:1;
        unsigned int exponent:11;
        unsigned int negative:1;
# endif
#endif
      } ieee_nan;
  };

#define IEEE754_DOUBLE_BIAS 0x3ff /* Added to exponent.  */

User · Answer

You're &ing the wrong bits. I think you want:

s = *ptr >> 31;
e = *ptr & 0x7f800000;
e >>= 23;
m = *ptr & 0x007fffff;

Remember, when you &, you are zeroing out bits that you don't set. So in this case, you want to zero out the sign bit when you get the exponent, and you want to zero out the sign bit and the exponent when you get the mantissa.

Note that the masks come directly from your picture. So, the exponent mask will look like:

0 11111111 00000000000000000000000

and the mantissa mask will look like:

0 00000000 11111111111111111111111
