Timer function to provide time in nanoseconds using C++

Question

I wish to calculate the time it took for an API to return a value. The time taken for such an action is in the range of nanoseconds. As the API is a C++ class/function, I am using timer.h to calculate the same:

    #include <ctime>
    #include <iostream>

    using namespace std;

    int main(int argc, char** argv) {
        clock_t start;
        double diff;
        start = clock();
        diff = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        cout << "printf: " << diff << '\n';
        return 0;
    }

The above code gives the time in seconds. How do I get the same in nanoseconds and with more precision?

User · Answer

With that level of accuracy, it would be better to reason in CPU ticks rather than in a system call like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction, having nanosecond accuracy is pretty much impossible.

Still, something like this is a start.

Here's the actual code to retrieve the number of 80x86 CPU clock ticks passed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can probably be ported very easily to anything else, as long as it supports inline assembly.

    inline __int64 GetCpuClocks()
    {
        // Counter
        struct { __int32 low, high; } counter;

        // Use RDTSC instruction to get clocks count
        __asm push EAX
        __asm push EDX
        __asm __emit 0fh __asm __emit 031h // RDTSC
        __asm mov counter.low, EAX
        __asm mov counter.high, EDX
        __asm pop EDX
        __asm pop EAX

        // Return result
        return *(__int64 *)(&counter);
    }

This function also has the advantage of being extremely fast: it usually takes no more than 50 CPU cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call QueryPerformanceFrequency().
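
As a rough illustration of the "divide by your chip's clock speed" step, here is a minimal sketch of the conversion. The name ticks_to_ns and the 2.8 GHz figure in the note below are my own assumptions for the example, not part of the code above; plug in whatever frequency you actually measured.

```cpp
#include <cassert>
#include <cstdint>

// Convert the difference between two tick-counter readings into nanoseconds.
// ticks_per_second is the counter frequency in Hz; it is an assumed input
// that the caller has to measure or look up for the machine in question.
inline double ticks_to_ns(std::uint64_t tick_delta, double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return static_cast<double>(tick_delta) * 1e9 / ticks_per_second;
}
```

With an assumed 2.8 GHz counter, ticks_to_ns(2800, 2.8e9) comes out at about 1000 ns. The result is only as good as the frequency you pass in, which is the caveat about the "rated" speed above.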

User · Answer

In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions, and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
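
A minimal sketch of that loop, assuming a C++11 compiler. The helper name average_ns_per_call is hypothetical, and std::chrono::steady_clock is my choice of timer here; any of the clocks discussed in the other answers could be substituted.

```cpp
#include <cassert>
#include <chrono>

// Call f() `iterations` times and return the average wall-clock time per
// call in nanoseconds. The figure includes the overhead of the loop itself.
template <class F>
double average_ns_per_call(F f, long iterations)
{
    assert(iterations > 0);
    typedef std::chrono::steady_clock clock_type;

    const clock_type::time_point t0 = clock_type::now();
    for (long i = 0; i < iterations; ++i)
        f();
    const clock_type::time_point t1 = clock_type::now();

    // Implicit conversion to a floating-point duration in nanoseconds.
    const std::chrono::duration<double, std::nano> total = t1 - t0;
    return total.count() / static_cast<double>(iterations);
}
```

You would call it as average_ns_per_call(some_callable, 10000000). Make sure the work inside the callable has an observable effect, otherwise an optimizing compiler may remove it and you end up timing an empty loop.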

User · Answer

This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter).

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.

First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:

    #include <chrono>

    namespace x
    {

    struct clock
    {
        typedef unsigned long long                 rep;
        typedef std::ratio<1, 2800000000>          period; // My machine is 2.8 GHz
        typedef std::chrono::duration<rep, period> duration;
        typedef std::chrono::time_point<clock>     time_point;
        static const bool is_steady =              true;

        static time_point now() noexcept
        {
            unsigned lo, hi;
            asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
            return time_point(duration(static_cast<rep>(hi) << 32 | lo));
        }
    };

    }  // x

All this clock does is count CPU cycles and store the count in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).

To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile-time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurately you are able to supply the clock speed, the more accurate the conversion to nanoseconds (milliseconds, whatever) will be.

Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:

    #include <iostream>

    template <class clock>
    void
    test_empty_loop()
    {
        // Define real time units
        typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
        // or:
        // typedef std::chrono::nanoseconds nanoseconds;
        // Define double-based unit of clock tick
        typedef std::chrono::duration<double, typename clock::period> Cycle;
        using std::chrono::duration_cast;
        const int N = 100000000;
        // Do it
        auto t0 = clock::now();
        for (int j = 0; j < N; ++j)
            asm volatile("");
        auto t1 = clock::now();
        // Get the clock ticks per iteration
        auto ticks_per_iter = Cycle(t1-t0)/N;
        std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
        // Convert to real time units
        std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
                  << "ps per iteration\n";
    }

The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:

    typename clock::time_point t0 = clock::now();

(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).

To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per-iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.

To use this code is simple:

    int main()
    {
        std::cout << "\nUsing rdtsc:\n";
        test_empty_loop<x::clock>();

        std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
        test_empty_loop<std::chrono::high_resolution_clock>();

        std::cout << "\nUsing std::chrono::system_clock:\n";
        test_empty_loop<std::chrono::system_clock>();
    }

Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:

    Using rdtsc:
    1.72632 clock ticks per iteration
    616ps per iteration

    Using std::chrono::high_resolution_clock:
    0.620105 clock ticks per iteration
    620ps per iteration

    Using std::chrono::system_clock:
    0.00062457 clock ticks per iteration
    624ps per iteration

This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).

Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example: the clock speed of my machine (needed to define x::clock) and the number of iterations to test over. If changing the number of iterations makes your results vary greatly, then you should probably make it higher, or empty your computer of competing processes while testing.
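
To isolate the "no magic conversion constants" point, here is a small sketch that converts a tick count to nanoseconds purely through <chrono> ratio arithmetic. The typedef and function names are hypothetical, and the 2.8 GHz period is the same assumed figure used for x::clock above.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Tick period for an assumed 2.8 GHz counter (same assumption as x::clock).
typedef std::ratio<1, 2800000000> tick_period;
typedef std::chrono::duration<double, tick_period> fractional_ticks;
typedef std::chrono::duration<double, std::nano>   fractional_ns;

// Convert a (possibly fractional) number of ticks to nanoseconds.
// The scaling factor is derived by <chrono> from the two periods;
// no conversion constant is written by hand.
inline double ticks_to_nanoseconds(double ticks)
{
    assert(ticks >= 0.0);
    return fractional_ns(fractional_ticks(ticks)).count();
}
```

Feeding in the 1.72632 ticks per iteration reported above gives roughly 0.6165 ns, which agrees with the 616 ps the integral picoseconds cast printed.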

User · Answer

What do you think about that?

    #include <time.h>

    int iceu_system_GetTimeNow(long long int *res)
    {
        static struct timespec buffer;
        //
    #ifdef __CYGWIN__
        if (clock_gettime(CLOCK_REALTIME, &buffer))
            return 1;
    #else
        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
            return 1;
    #endif
        *res = (long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
        return 0;
    }

User · Answer

I am using the following to get the desired results:

    #include <time.h>
    #include <iostream>
    using namespace std;

    int main (int argc, char** argv)
    {
        // reset the clock
        // (the return value is not checked; on Linux this clock is not
        // settable, so the reading below is CPU time since process start)
        timespec tS;
        tS.tv_sec = 0;
        tS.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);

        // ... <code to check for the time to be put here> ...

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
        cout << "Time taken is: " << tS.tv_sec << " " << tS.tv_nsec << endl;

        return 0;
    }
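
A variation that does not depend on resetting the clock is to take one reading before and one after, then subtract. This is only a sketch for POSIX systems; the helper names timespec_diff_ns and cpu_time_ns are made up for the example.

```cpp
#include <cassert>
#include <time.h>

// Nanoseconds from `start` to `end`. Both readings are expected to be
// normalized, i.e. 0 <= tv_nsec < 1e9, as clock_gettime returns them.
inline long long timespec_diff_ns(const timespec& start, const timespec& end)
{
    assert(start.tv_nsec >= 0 && start.tv_nsec < 1000000000L);
    assert(end.tv_nsec >= 0 && end.tv_nsec < 1000000000L);
    return (static_cast<long long>(end.tv_sec) - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

// CPU time this process consumed while f() ran, in nanoseconds.
// Returns -1 if the clock could not be read.
template <class F>
long long cpu_time_ns(F f)
{
    timespec t0, t1;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0) != 0) return -1;
    f();
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1) != 0) return -1;
    return timespec_diff_ns(t0, t1);
}
```

Keep in mind that CLOCK_PROCESS_CPUTIME_ID counts CPU time, not wall-clock time, so time spent sleeping or blocked does not show up. Swap in CLOCK_MONOTONIC if elapsed real time is what you want.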

User · Answer

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

    #include <time.h>

    int main()
    {
        timespec ts;
        // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
        clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
    }

For Windows you want to use QueryPerformanceCounter (QPC).

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:

"QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad)."

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances, as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

"...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another..."

However, this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't: http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

Other fine reading related to timers:

https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
http://lwn.net/Articles/209101/
http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
QueryPerformanceCounter Status?

See the comments for more details.

User · Answer

If this is for Linux, I've been using the function gettimeofday(), which fills in a struct with the seconds and microseconds since the Epoch. You can then use timersub to subtract two such readings to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time in terms of seconds and nanoseconds into the structure you pass into it.
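
For the gettimeofday() route, the subtraction step might look like the sketch below. The function name elapsed_us is hypothetical, and timersub is a BSD/glibc convenience macro from <sys/time.h> rather than something every platform guarantees, so treat this as Linux/BSD-only.

```cpp
#include <cassert>
#include <sys/time.h>

// Microseconds from `start` to `end`, where both values were obtained
// from gettimeofday(). timersub computes end - start and normalizes it.
inline long long elapsed_us(timeval start, timeval end)
{
    assert(start.tv_usec >= 0 && start.tv_usec < 1000000);
    assert(end.tv_usec >= 0 && end.tv_usec < 1000000);
    timeval diff;
    timersub(&end, &start, &diff);
    return static_cast<long long>(diff.tv_sec) * 1000000LL + diff.tv_usec;
}
```

Take one gettimeofday(&t0, 0) reading before the code under test and one after, then pass both to elapsed_us. The resolution is microseconds at best, which is why clock_gettime() is the better fit for the nanosecond requirement.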

User · Answer

You can use the following function with gcc running under x86 processors:

    unsigned long long rdtsc()
    {
        unsigned int low, high;
        // RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX
        __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
        return ((unsigned long long)high << 32) | low;
    }

with Digital Mars C++:

    unsigned long long rdtsc()
    {
        _asm
        {
            rdtsc
        }
    }

which reads the high performance timer on the chip. I use this when doing profiling.

User · Answer

For C++11, here is a simple wrapper:

    #include <iostream>
    #include <chrono>

    class Timer
    {
    public:
        Timer() : beg_(clock_::now()) {}
        void reset() { beg_ = clock_::now(); }
        double elapsed() const {
            return std::chrono::duration_cast<second_>
                (clock_::now() - beg_).count(); }

    private:
        typedef std::chrono::high_resolution_clock clock_;
        typedef std::chrono::duration<double, std::ratio<1> > second_;
        std::chrono::time_point<clock_> beg_;
    };

Or for C++03 on *nix:

    #include <time.h>

    class Timer
    {
    public:
        Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

        double elapsed() {
            clock_gettime(CLOCK_REALTIME, &end_);
            return end_.tv_sec - beg_.tv_sec +
                (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
        }

        void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

    private:
        timespec beg_, end_;
    };

Example of usage:

    int main()
    {
        Timer tmr;
        double t = tmr.elapsed();
        std::cout << t << std::endl;

        tmr.reset();
        t = tmr.elapsed();
        std::cout << t << std::endl;
        return 0;
    }

From https://gist.github.com/gongzhitaao/7062087

User · Answer

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for the operating system. POSIX gettimeofday supports up to microseconds, which was sufficient when computers didn't have frequencies above 1 GHz; for finer resolution, POSIX clock_gettime (described in other answers here) reports nanoseconds.

If you are using Boost, you can check boost::posix_time.

User · Answer

Here is a nice Boost timer that works well:

    // Stopwatch.hpp

    #ifndef STOPWATCH_HPP
    #define STOPWATCH_HPP

    // Boost
    #include <boost/chrono.hpp>
    // Std
    #include <cstdint>

    class Stopwatch
    {
    public:
        Stopwatch();
        virtual         ~Stopwatch();
        void            Restart();
        std::uint64_t   Get_elapsed_ns();
        std::uint64_t   Get_elapsed_us();
        std::uint64_t   Get_elapsed_ms();
        std::uint64_t   Get_elapsed_s();
    private:
        boost::chrono::high_resolution_clock::time_point _start_time;
    };

    #endif // STOPWATCH_HPP


    // Stopwatch.cpp

    #include "Stopwatch.hpp"

    Stopwatch::Stopwatch():
        _start_time(boost::chrono::high_resolution_clock::now()) {}

    Stopwatch::~Stopwatch() {}

    void Stopwatch::Restart()
    {
        _start_time = boost::chrono::high_resolution_clock::now();
    }

    std::uint64_t Stopwatch::Get_elapsed_ns()
    {
        boost::chrono::nanoseconds nano_s =
            boost::chrono::duration_cast<boost::chrono::nanoseconds>(
                boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(nano_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_us()
    {
        boost::chrono::microseconds micro_s =
            boost::chrono::duration_cast<boost::chrono::microseconds>(
                boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(micro_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_ms()
    {
        boost::chrono::milliseconds milli_s =
            boost::chrono::duration_cast<boost::chrono::milliseconds>(
                boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(milli_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_s()
    {
        boost::chrono::seconds sec =
            boost::chrono::duration_cast<boost::chrono::seconds>(
                boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(sec.count());
    }

User · Answer

Using Brock Adams's method, with a simple class:

    #include <windows.h>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int get_cpu_ticks()
    {
        LARGE_INTEGER ticks;
        QueryPerformanceFrequency(&ticks);
        return ticks.LowPart;
    }

    __int64 get_cpu_clocks()
    {
        struct { __int32 low, high; } counter;

        __asm cpuid
        __asm push EAX   // added so the two pops below are balanced
        __asm push EDX
        __asm rdtsc
        __asm mov counter.low, EAX
        __asm mov counter.high, EDX
        __asm pop EDX
        __asm pop EAX

        return *(__int64 *)(&counter);
    }

    class cbench
    {
    public:
        cbench(const char *desc_in)
            : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
        ~cbench()
        {
            printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
            if(desc) free(desc);
        }
    private:
        char *desc;
        __int64 start;
    };

Usage Example:

    int main()
    {
        {
            cbench c("test");
            // ... code ...
        }
        return 0;
    }

Result:

    test took: 0.0002 ms

Has some function call overhead, but should still be more than fast enough :)
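
If you like the scope-based idea but need it outside MSVC/x86, a portable sketch using C++11 <chrono> could look like this. ScopedTimer and its optional out_ms parameter are names I made up for illustration, and it measures wall-clock time with std::chrono::steady_clock rather than reading the TSC, so it is not the same measurement as cbench above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Reports how long the enclosing scope took when the object is destroyed.
class ScopedTimer
{
public:
    // out_ms is optional; if given, the elapsed milliseconds are stored there.
    explicit ScopedTimer(const std::string& label, double* out_ms = nullptr)
        : label_(label), out_ms_(out_ms),
          start_(std::chrono::steady_clock::now())
    {
        assert(!label_.empty());
    }

    // Non-copyable: a copy would report the same scope twice.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    ~ScopedTimer()
    {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        if (out_ms_) *out_ms_ = elapsed.count();
        std::printf("%s took: %.4f ms\n", label_.c_str(), elapsed.count());
    }

private:
    std::string label_;
    double* out_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage mirrors the example above: open a block, declare ScopedTimer t("test"); as the first statement, and the timing line is printed when the block ends.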

User · Answer

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

    #include <time.h>   // declares clock_gettime() and timespec

    int main()
    {
        timespec ts;
        // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
        clock_gettime(CLOCK_REALTIME, &ts);     // Works on Linux
    }

For Windows you want to use QueryPerformanceCounter. And here is more on QPC.

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:

"QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad)."

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances, as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

"...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another..."

However, this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't: http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

Other fine reading related to timers:

https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
http://lwn.net/Articles/209101/
http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
QueryPerformanceCounter Status?

See the comments for more details.
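To make the clock_gettime() route concrete, here is a minimal sketch of turning two timespec readings into a nanosecond difference. The helper names monotonic_ns and timespec_diff_ns are made up for illustration, and it assumes a POSIX system where CLOCK_MONOTONIC is available.

```cpp
#include <cstdint>
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC, timespec (POSIX)

// Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
inline std::int64_t monotonic_ns()
{
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: (end - start) in nanoseconds for two timespec readings.
inline std::int64_t timespec_diff_ns(const timespec& start, const timespec& end)
{
    return static_cast<std::int64_t>(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Typical use would be to call monotonic_ns() before and after the code under test and subtract the two values. The resolution you actually get depends on the OS and hardware, so treat the low digits with suspicion.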

User · Answer

If this is for Linux, I've been using the function gettimeofday(), which returns a struct that gives the seconds and microseconds since the Epoch. You can then use timersub to subtract the two to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time in terms of seconds and nanoseconds into the structure you pass into it.
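A rough sketch of the gettimeofday() plus timersub approach described above. The helper name elapsed_us is made up for illustration, and it assumes a glibc or BSD system where the timersub macro is provided by <sys/time.h>.

```cpp
#include <cstdint>
#include <sys/time.h>   // gettimeofday, timersub, struct timeval

// Hypothetical helper: elapsed microseconds between two timeval readings.
inline std::int64_t elapsed_us(const timeval& start, const timeval& end)
{
    timeval diff;
    timersub(&end, &start, &diff);   // diff = end - start, borrow handled by the macro
    return static_cast<std::int64_t>(diff.tv_sec) * 1000000LL + diff.tv_usec;
}
```

You would fill start and end with gettimeofday(&start, nullptr) and gettimeofday(&end, nullptr) around the code being timed. This only gets you microseconds, which is why clock_gettime() is the better fit for the question.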

User · Answer

In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions, and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
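The loop-and-divide idea can be sketched like this. It uses std::chrono::steady_clock as the timer, which is my choice rather than anything this answer prescribes, and average_ns_per_call is a made-up name.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: average nanoseconds per call of `fn`, measured over
// `iterations` calls so the timer is read only twice in total.
// Assumes iterations > 0.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::uint64_t iterations)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const auto total =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return static_cast<double>(total.count()) / static_cast<double>(iterations);
}
```

One caveat: if the call has no visible side effect, an optimizing compiler may remove it from the loop entirely, so make sure the result is used somewhere (for example, written to a volatile variable). The loop overhead itself is also included in the average.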

User · Answer

Here is a nice Boost timer that works well:

    // Stopwatch.hpp

    #ifndef STOPWATCH_HPP
    #define STOPWATCH_HPP

    // Boost
    #include <boost/chrono.hpp>
    // Std
    #include <cstdint>

    class Stopwatch
    {
    public:
        Stopwatch();
        virtual         ~Stopwatch();
        void            Restart();
        std::uint64_t   Get_elapsed_ns();
        std::uint64_t   Get_elapsed_us();
        std::uint64_t   Get_elapsed_ms();
        std::uint64_t   Get_elapsed_s();
    private:
        boost::chrono::high_resolution_clock::time_point _start_time;
    };

    #endif // STOPWATCH_HPP

    // Stopwatch.cpp

    #include "Stopwatch.hpp"

    Stopwatch::Stopwatch():
        _start_time(boost::chrono::high_resolution_clock::now()) {}

    Stopwatch::~Stopwatch() {}

    void Stopwatch::Restart()
    {
        _start_time = boost::chrono::high_resolution_clock::now();
    }

    std::uint64_t Stopwatch::Get_elapsed_ns()
    {
        boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(nano_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_us()
    {
        boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(micro_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_ms()
    {
        boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(milli_s.count());
    }

    std::uint64_t Stopwatch::Get_elapsed_s()
    {
        boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
        return static_cast<std::uint64_t>(sec.count());
    }

User · Answer

You can use the following function with gcc running under x86 processors:

    unsigned long long rdtsc()
    {
        #define rdtsc(low, high) \
            __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

        unsigned int low, high;
        rdtsc(low, high);
        // Combine the two 32-bit halves into one 64-bit tick count.
        return ((unsigned long long)high << 32) | low;
    }

with Digital Mars C++:

    unsigned long long rdtsc()
    {
        _asm
        {
            rdtsc
        }
    }

which reads the high performance timer on the chip. I use this when doing profiling.

User · Answer

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for the operating system. POSIX supports up to microseconds with gettimeofday(), which dates from a time when computers didn't have frequencies above 1 GHz; for anything more precise, POSIX also provides clock_gettime(), which reports nanoseconds.

If you are using Boost, you can check boost::posix_time.

User · Answer

plf::nanotimer is a lightweight option for this; it works on Windows, Linux, Mac and BSD etc. It has ~microsecond accuracy depending on the OS:

    #include "plf_nanotimer.h"
    #include <iostream>

    int main(int argc, char** argv)
    {
        plf::nanotimer timer;

        timer.start();

        // Do something here

        double results = timer.get_elapsed_ns();
        std::cout << "Timing: " << results << " nanoseconds." << std::endl;
        return 0;
    }

User · Answer

What do you think about that?

    int iceu_system_GetTimeNow(long long int *res)
    {
        static struct timespec buffer;
        //
    #ifdef __CYGWIN__
        if (clock_gettime(CLOCK_REALTIME, &buffer))
            return 1;
    #else
        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
            return 1;
    #endif
        *res = (long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
        return 0;
    }

User · Answer

Minimalistic copy&paste-struct + lazy usage

If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste it anywhere in your C++ file right after the #includes. This is the only instance in which I sacrifice Allman-style formatting.

You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.

    #include <chrono>
    #include <iostream>
    #include <vector>
    struct MeasureTime
    {
        using precision = std::chrono::microseconds;
        std::vector<std::chrono::steady_clock::time_point> times;
        std::chrono::steady_clock::time_point oneLast;
        void p() {   // print the difference between the last two marks
            std::cout << "Mark "
                    << times.size()/2
                    << ": "
                    << std::chrono::duration_cast<precision>(times.back() - oneLast).count()
                    << std::endl;
        }
        void m() {   // record a new mark
            oneLast = times.back();
            times.push_back(std::chrono::steady_clock::now());
        }
        void t() {   // mark, print, then mark again so printing is not timed
            m();
            p();
            m();
        }
        MeasureTime() {
            times.push_back(std::chrono::steady_clock::now());
        }
    };

Usage

    MeasureTime m; // first time is already in memory
    doFnc1();
    m.t(); // Mark 1: next time, and print difference with previous mark
    doFnc2();
    m.t(); // Mark 2: next time, and print difference with previous mark
    doStuff = doMoreStuff();
    andDoItAgain = doStuff.aoeuaoeu();
    m.t(); // prints 'Mark 3: 123123' etc...

Standard output result

    Mark 1: 123
    Mark 2: 32
    Mark 3: 433234

If you want summary after execution

If you want the report afterwards, because for example your code in between also writes to standard output, then add the following function to the struct (just before MeasureTime()):

    void s() { // summary
        int i = 0;
        std::chrono::steady_clock::time_point tprev;
        for(auto tcur : times)
        {
            if(i > 0)
            {
                std::cout << "Mark " << i << ": "
                        // later mark minus earlier mark, so the value is positive
                        << std::chrono::duration_cast<precision>(tcur - tprev).count()
                        << std::endl;
            }
            tprev = tcur;
            ++i;
        }
    }

So then you can just use:

    MeasureTime m;
    doFnc1();
    m.m();
    doFnc2();
    m.m();
    doStuff = doMoreStuff();
    andDoItAgain = doStuff.aoeuaoeu();
    m.m();
    m.s();

Which will list all the marks just like before, but only after the other code is executed. Note that you shouldn't use both m.s() and m.t().

User · Answer

To do this correctly you can use one of two ways: either go with RDTSC or with clock_gettime(). The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors).

    #include <stdint.h>

    inline uint64_t rdtsc()
    {
        uint32_t lo, hi;
        __asm__ __volatile__ (
          "xorl %%eax, %%eax\n"
          "cpuid\n"               // serialize before reading the counter
          "rdtsc\n"
          : "=a" (lo), "=d" (hi)
          :
          : "%ebx", "%ecx" );
        return (uint64_t)hi << 32 | lo;
    }

and for clock_gettime() (I chose microsecond resolution arbitrarily):

    #include <stdint.h>
    #include <time.h>
    #include <sys/timeb.h>
    // needs -lrt (real-time lib)
    // 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
    uint64_t ClockGetTime()
    {
        timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
    }

the timing and values produced:

    Absolute values:
    rdtsc           = 4571567254267600
    clock_gettime   = 1278605535506855

    Processing time: (10000000 runs)
    rdtsc           = 2292547353
    clock_gettime   = 1031119636

User · Answer

I am using the following to get the desired results:

    #include <time.h>
    #include <iostream>
    using namespace std;

    int main (int argc, char** argv)
    {
        // reset the clock
        timespec tS;
        tS.tv_sec = 0;
        tS.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);
        ...
        ... <code to check for the time to be put here>
        ...
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
        cout << "Time taken is: " << tS.tv_sec << " " << tS.tv_nsec << endl;

        return 0;
    }

User · Answer

You can use Embedded Profiler (free for Windows and Linux), which has an interface to a multiplatform timer (in a processor cycle count) and can give you the number of cycles per second:

    EProfilerTimer timer;
    timer.Start();

    ... // Your code here

    const uint64_t number_of_elapsed_cycles = timer.Stop();
    const uint64_t nano_seconds_elapsed =
        number_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;

Recalculating a cycle count into time is possibly a dangerous operation with modern processors, where the CPU frequency can change dynamically. Therefore, to be sure that the converted times are correct, it is necessary to fix the processor frequency before profiling.
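The cycles-to-time arithmetic does not depend on any particular profiler, so here is a small standalone sketch of it. The name cycles_to_ns is made up, it is not part of Embedded Profiler, and it assumes the counter frequency stays fixed for the whole measurement.

```cpp
#include <cstdint>

// Hypothetical helper: convert a cycle count to whole nanoseconds, given a
// fixed counter frequency in cycles per second (must be non-zero).
// Integer arithmetic avoids floating-point rounding; the remainder term
// does not overflow for frequencies below roughly 18 GHz.
inline std::uint64_t cycles_to_ns(std::uint64_t cycles,
                                  std::uint64_t cycles_per_second)
{
    const std::uint64_t whole_seconds = cycles / cycles_per_second;
    const std::uint64_t remainder     = cycles % cycles_per_second;
    return whole_seconds * 1000000000ULL
         + (remainder * 1000000000ULL) / cycles_per_second;
}
```

For example, 2400 cycles on a 2.4 GHz counter comes out as 1000 ns. The result is truncated, not rounded, and it is only as good as the frequency figure you pass in.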

User · Answer

I'm using Borland; here is the code. ti_hund sometimes gives me a negative number, because each field is subtracted on its own without borrowing from the next larger one, but the timing is fairly good.

    #include <conio.h>   // getch
    #include <dos.h>     // gettime, struct time (Borland-specific)
    #include <stdio.h>   // printf

    int main()
    {
        struct time t;
        int Hour, Min, Sec, Hun;
        gettime(&t);
        Hour = t.ti_hour;
        Min = t.ti_min;
        Sec = t.ti_sec;
        Hun = t.ti_hund;
        printf("Start time is: %2d:%02d:%02d.%02d\n",
               t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
        ....
        your code to time
        ...

        // read the time here; remove hours and minutes if the time is in seconds

        gettime(&t);
        printf("\nTime Hour:%d Min:%d Sec:%d  Hundredths:%d\n", t.ti_hour-Hour,
               t.ti_min-Min, t.ti_sec-Sec, t.ti_hund-Hun);
        printf("\n\nAll done. Press a key\n\n");
        getch();
        return 0;
    } // end main
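The negative hundredths come from subtracting each field separately. One way around that, sketched here in portable C++ and not in Borland-specific code, is to convert both readings to a single count of hundredths first. TimeOfDay is a made-up stand-in for the four fields of Borland's struct time, and the midnight handling assumes the measured interval is shorter than 24 hours.

```cpp
// Hypothetical stand-in for the hour/minute/second/hundredths fields.
struct TimeOfDay { int hour, min, sec, hund; };

// Total hundredths of a second since midnight.
inline long to_hundredths(const TimeOfDay& t)
{
    return ((t.hour * 60L + t.min) * 60L + t.sec) * 100L + t.hund;
}

// Elapsed hundredths between two readings; never negative for intervals
// shorter than 24 hours, even when the clock wraps past midnight.
inline long elapsed_hundredths(const TimeOfDay& start, const TimeOfDay& end)
{
    long diff = to_hundredths(end) - to_hundredths(start);
    if (diff < 0) {
        diff += 24L * 60L * 60L * 100L;   // crossed midnight
    }
    return diff;
}
```

With this, a start of 10:59:59.90 and an end of 11:00:00.10 gives 20 hundredths, where the field-by-field subtraction would print 1 hour, -59 minutes, -59 seconds and -80 hundredths.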

User · Answer

Using Brock Adams's method, with a simple class:

    int get_cpu_ticks()
    {
        LARGE_INTEGER ticks;
        QueryPerformanceFrequency(&ticks);
        return ticks.LowPart;
    }

    __int64 get_cpu_clocks()
    {
        struct { __int32 low, high; } counter;

        __asm push EAX      // saved so the pop EAX below is balanced
        __asm push EDX
        __asm cpuid
        __asm rdtsc
        __asm mov counter.low, EAX
        __asm mov counter.high, EDX
        __asm pop EDX
        __asm pop EAX

        return *(__int64 *)(&counter);
    }

    class cbench
    {
    public:
        cbench(const char *desc_in)
             : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
        ~cbench()
        {
            printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
            if(desc) free(desc);
        }
    private:
        char *desc;
        __int64 start;
    };

Usage Example:

    int main()
    {
        {
            cbench c("test");
            ... code ...
        }
        return 0;
    }

Result:

    test took: 0.0002 ms

It has some function call overhead, but should still be more than fast enough.
