[c++] Measuring execution time of a function in C++

I want to find out how much time a certain function takes in my C++ program to execute on Linux. Afterwards, I want to make a speed comparison . I saw several time function but ended up with this from boost. Chrono:

process_user_cpu_clock, captures user-CPU time spent by the current process

Now, I am not clear if I use the above function, will I get the only time which CPU spent on that function?

Secondly, I could not find any example of using the above function. Can any one please help me how to use the above function?

P.S: Right now , I am using std::chrono::system_clock::now() to get time in seconds but this gives me different results due to different CPU load every time.

This question is related to c++ optimization profiling

The answer is


In Scott Meyers book I found an example of universal generic lambda expression that can be used to measure function execution time. (C++14)

auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = std::chrono::high_resolution_clock::now();
        // function invocation using perfect forwarding
        std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        // get time after function invocation
        const auto& stop = std::chrono::high_resolution_clock::now();
        return stop - start;
     };

The problem is that you are measure only one execution so the results can be very differ. To get a reliable result you should measure a large number of execution. According to Andrei Alexandrescu lecture at code::dive 2015 conference - Writing Fast Code I:

Measured time: tm = t + tq + tn + to

where:

tm - measured (observed) time

t - the actual time of interest

tq - time added by quantization noise

tn - time added by various sources of noise

to - overhead time (measuring, looping, calling functions)

According to what he said later in the lecture, you should take a minimum of this large number of execution as your result. I encourage you to look at the lecture in which he explains why.

Also there is a very good library from google - https://github.com/google/benchmark. This library is very simple to use and powerful. You can checkout some lectures of Chandler Carruth on youtube where he is using this library in practice. For example CppCon 2017: Chandler Carruth “Going Nowhere Faster”;

Example usage:

#include <iostream>
#include <chrono>
#include <vector>
auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = high_resolution_clock::now();
        // function invocation using perfect forwarding
        for(auto i = 0; i < 100000/*largeNumber*/; ++i) {
            std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        }
        // get time after function invocation
        const auto& stop = high_resolution_clock::now();
        return (stop - start)/100000/*largeNumber*/;
     };

void f(std::vector<int>& vec) {
    vec.push_back(1);
}

void f2(std::vector<int>& vec) {
    vec.emplace_back(1);
}
int main()
{
    std::vector<int> vec;
    std::vector<int> vec2;
    std::cout << timeFuncInvocation(f, vec).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec2).count() << std::endl;
    std::vector<int> vec3;
    vec3.reserve(100000);
    std::vector<int> vec4;
    vec4.reserve(100000);
    std::cout << timeFuncInvocation(f, vec3).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec4).count() << std::endl;
    return 0;
}

EDIT: Ofcourse you always need to remember that your compiler can optimize something out or not. Tools like perf can be useful in such cases.


Since none of the provided answers are very accurate or give reproducable results I decided to add a link to my code that has sub-nanosecond precision and scientific statistics.

Note that this will only work to measure code that takes a (very) short time to run (aka, a few clock cycles to a few thousand): if they run so long that they are likely to be interrupted by some -heh- interrupt, then it is clearly not possible to give a reproducable and accurate result; the consequence of which is that the measurement never finishes: namely, it continues to measure until it is statistically 99.9% sure it has the right answer which never happens on a machine that has other processes running when the code takes too long.

https://github.com/CarloWood/cwds/blob/master/benchmark.h#L40


simple program to find a function execution time taken.

#include <iostream>
#include <ctime> // time_t
#include <cstdio>

void function()
{
     for(long int i=0;i<1000000000;i++)
     {
        // do nothing
     }
}

int main()
{

time_t begin,end; // time_t is a datatype to store time values.

time (&begin); // note time before execution
function();
time (&end); // note time after execution

double difference = difftime (end,begin);
printf ("time taken for function() %.2lf seconds.\n", difference );

return 0;
}

Easy way for older C++, or C:

#include <time.h> // includes clock_t and CLOCKS_PER_SEC

int main() {

    clock_t start, end;

    start = clock();
    // ...code to measure...
    end = clock();

    double duration_sec = double(end-start)/CLOCKS_PER_SEC;
    return 0;
}

Timing precision in seconds is 1.0/CLOCKS_PER_SEC


Here is an excellent header only class template to measure the elapsed time of a function or any code block:

#ifndef EXECUTION_TIMER_H
#define EXECUTION_TIMER_H

template<class Resolution = std::chrono::milliseconds>
class ExecutionTimer {
public:
    using Clock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
                                     std::chrono::high_resolution_clock,
                                     std::chrono::steady_clock>;
private:
    const Clock::time_point mStart = Clock::now();

public:
    ExecutionTimer() = default;
    ~ExecutionTimer() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Destructor Elapsed: "
                  << std::chrono::duration_cast<Resolution>( end - mStart ).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }    

    inline void stop() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Stop Elapsed: "
                  << std::chrono::duration_cast<Resolution>(end - mStart).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }

}; // ExecutionTimer

#endif // EXECUTION_TIMER_H

Here are some uses of it:

int main() {
    { // empty scope to display ExecutionTimer's destructor's message
         // displayed in milliseconds
         ExecutionTimer<std::chrono::milliseconds> timer;

         // function or code block here

         timer.stop();

    } 

    { // same as above
        ExecutionTimer<std::chrono::microseconds> timer;

        // code block here...

        timer.stop();
    }

    {  // same as above
       ExecutionTimer<std::chrono::nanoseconds> timer;

       // code block here...

       timer.stop();

    }

    {  // same as above
       ExecutionTimer<std::chrono::seconds> timer;

       // code block here...

       timer.stop();

    }              

    return 0;
}

Since the class is a template we can specify real easily in how we want our time to be measured & displayed. This is a very handy utility class template for doing bench marking and is very easy to use.


#include <iostream>
#include <chrono>

void function()
{
    // code here;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    function();
    auto t2 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

    std::cout << duration<<"/n";
    return 0;
}

This Worked for me.


Note:

The high_resolution_clock is not implemented consistently across different standard library implementations, and its use should be avoided. It is often just an alias for std::chrono::steady_clock or std::chrono::system_clock, but which one it is depends on the library or configuration. When it is a system_clock, it is not monotonic (e.g., the time can go backwards).

For example, for gcc's libstdc++ it is system_clock, for MSVC it is steady_clock, and for clang's libc++ it depends on configuration.

Generally one should just use std::chrono::steady_clock or std::chrono::system_clock directly instead of std::chrono::high_resolution_clock: use steady_clock for duration measurements, and system_clock for wall-clock time.


Here's a function that will measure the execution time of any function passed as argument:

#include <chrono>
#include <utility>

typedef std::chrono::high_resolution_clock::time_point TimeVar;

#define duration(a) std::chrono::duration_cast<std::chrono::nanoseconds>(a).count()
#define timeNow() std::chrono::high_resolution_clock::now()

template<typename F, typename... Args>
double funcTime(F func, Args&&... args){
    TimeVar t1=timeNow();
    func(std::forward<Args>(args)...);
    return duration(timeNow()-t1);
}

Example usage:

#include <iostream>
#include <algorithm>

typedef std::string String;

//first test function doing something
int countCharInString(String s, char delim){
    int count=0;
    String::size_type pos = s.find_first_of(delim);
    while ((pos = s.find_first_of(delim, pos)) != String::npos){
        count++;pos++;
    }
    return count;
}

//second test function doing the same thing in different way
int countWithAlgorithm(String s, char delim){
    return std::count(s.begin(),s.end(),delim);
}


int main(){
    std::cout<<"norm: "<<funcTime(countCharInString,"precision=10",'=')<<"\n";
    std::cout<<"algo: "<<funcTime(countWithAlgorithm,"precision=10",'=');
    return 0;
}

Output:

norm: 15555
algo: 2976

If you want to safe time and lines of code you can make measuring the function execution time a one line macro:

a) Implement a time measuring class as already suggested above ( here is my implementation for android):

class MeasureExecutionTime{
private:
    const std::chrono::steady_clock::time_point begin;
    const std::string caller;
public:
    MeasureExecutionTime(const std::string& caller):caller(caller),begin(std::chrono::steady_clock::now()){}
    ~MeasureExecutionTime(){
        const auto duration=std::chrono::steady_clock::now()-begin;
        LOGD("ExecutionTime")<<"For "<<caller<<" is "<<std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()<<"ms";
    }
};

b) Add a convenient macro that uses the current function name as TAG (using a macro here is important, else __FUNCTION__ will evaluate to MeasureExecutionTime instead of the function you wanto to measure

#ifndef MEASURE_FUNCTION_EXECUTION_TIME
#define MEASURE_FUNCTION_EXECUTION_TIME const MeasureExecutionTime measureExecutionTime(__FUNCTION__);
#endif

c) Write your macro at the begin of the function you want to measure. Example:

 void DecodeMJPEGtoANativeWindowBuffer(uvc_frame_t* frame_mjpeg,const ANativeWindow_Buffer& nativeWindowBuffer){
        MEASURE_FUNCTION_EXECUTION_TIME
        // Do some time-critical stuff 
}

Which will result int the following output:

ExecutionTime: For DecodeMJPEGtoANativeWindowBuffer is 54ms

Note that this (as all other suggested solutions) will measure the time between when your function was called and when it returned, not neccesarily the time your CPU was executing the function. However, if you don't give the scheduler any change to suspend your running code by calling sleep() or similar there is no difference between.


You can have a simple class which can be used for this kind of measurements.

class duration_printer {
public:
    duration_printer() : __start(std::chrono::high_resolution_clock::now()) {}
    ~duration_printer() {
        using namespace std::chrono;
        high_resolution_clock::time_point end = high_resolution_clock::now();
        duration<double> dur = duration_cast<duration<double>>(end - __start);
        std::cout << dur.count() << " seconds" << std::endl;
    }
private:
    std::chrono::high_resolution_clock::time_point __start;
};

The only thing is needed to do is to create an object in your function at the beginning of that function

void veryLongExecutingFunction() {
    duration_calculator dc;
    for(int i = 0; i < 100000; ++i) std::cout << "Hello world" << std::endl;
}

int main() {
    veryLongExecutingFunction();
    return 0;
}

and that's it. The class can be modified to fit your requirements.


  • It is a very easy to use method in C++11.
  • We can use std::chrono::high_resolution_clock from header
  • We can write a method to print the method execution time in a much readable form.

For example, to find the all the prime numbers between 1 and 100 million, it takes approximately 1 minute and 40 seconds. So the execution time get printed as:

Execution Time: 1 Minutes, 40 Seconds, 715 MicroSeconds, 715000 NanoSeconds

The code is here:

#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

typedef high_resolution_clock Clock;
typedef Clock::time_point ClockTime;

void findPrime(long n, string file);
void printExecutionTime(ClockTime start_time, ClockTime end_time);

int main()
{
    long n = long(1E+8);  // N = 100 million

    ClockTime start_time = Clock::now();

    // Write all the prime numbers from 1 to N to the file "prime.txt"
    findPrime(n, "C:\\prime.txt"); 

    ClockTime end_time = Clock::now();

    printExecutionTime(start_time, end_time);
}

void printExecutionTime(ClockTime start_time, ClockTime end_time)
{
    auto execution_time_ns = duration_cast<nanoseconds>(end_time - start_time).count();
    auto execution_time_ms = duration_cast<microseconds>(end_time - start_time).count();
    auto execution_time_sec = duration_cast<seconds>(end_time - start_time).count();
    auto execution_time_min = duration_cast<minutes>(end_time - start_time).count();
    auto execution_time_hour = duration_cast<hours>(end_time - start_time).count();

    cout << "\nExecution Time: ";
    if(execution_time_hour > 0)
    cout << "" << execution_time_hour << " Hours, ";
    if(execution_time_min > 0)
    cout << "" << execution_time_min % 60 << " Minutes, ";
    if(execution_time_sec > 0)
    cout << "" << execution_time_sec % 60 << " Seconds, ";
    if(execution_time_ms > 0)
    cout << "" << execution_time_ms % long(1E+3) << " MicroSeconds, ";
    if(execution_time_ns > 0)
    cout << "" << execution_time_ns % long(1E+6) << " NanoSeconds, ";
}

I recommend using steady_clock which is guarunteed to be monotonic, unlike high_resolution_clock.

#include <iostream>
#include <chrono>

using namespace std;

unsigned int stopwatch()
{
    static auto start_time = chrono::steady_clock::now();

    auto end_time = chrono::steady_clock::now();
    auto delta    = chrono::duration_cast<chrono::microseconds>(end_time - start_time);

    start_time = end_time;

    return delta.count();
}

int main() {
  stopwatch(); //Start stopwatch
  std::cout << "Hello World!\n";
  cout << stopwatch() << endl; //Time to execute last line
  for (int i=0; i<1000000; i++)
      string s = "ASDFAD";
  cout << stopwatch() << endl; //Time to execute for loop
}

Output:

Hello World!
62
163514

Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to optimization

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Measuring execution time of a function in C++ GROUP BY having MAX date How to efficiently remove duplicates from an array without using Set Storing JSON in database vs. having a new column for each key Read file As String How to write a large buffer into a binary file in C++, fast? Is optimisation level -O3 dangerous in g++? Why is processing a sorted array faster than processing an unsorted array? MySQL my.cnf performance tuning recommendations

Examples related to profiling

Measuring execution time of a function in C++ Calculate summary statistics of columns in dataframe How to measure time taken between lines of code in python? How do I measure the execution time of JavaScript code with callbacks? How to find the default JMX port number? Measuring function execution time in R Where is the Query Analyzer in SQL Server Management Studio 2008 R2? W3WP.EXE using 100% CPU - where to start? How to set the maximum memory usage for JVM? What is perm space?