How do I write a correct micro-benchmark in Java?

How do you write (and run) a correct micro-benchmark in Java?

I'm looking for some code samples and comments illustrating various things to think about.

Example: Should the benchmark measure time/iteration or iterations/time, and why?

Related: Is stopwatch benchmarking acceptable?

Tags: java, jvm, benchmarking, jvm-hotspot, microbenchmark


If you are trying to compare two algorithms, run at least two benchmark passes for each, alternating the order, like this:

for (int i = 0; i < n; i++)
  alg1();
for (int i = 0; i < n; i++)
  alg2();
for (int i = 0; i < n; i++)
  alg2();
for (int i = 0; i < n; i++)
  alg1();

I have found noticeable differences (sometimes 5-10%) in the runtime of the same algorithm across different passes.

Also, make sure that n is very large, so that the runtime of each loop is at least 10 seconds or so. The more iterations, the more significant figures in your benchmark time and the more reliable the data is. A timed version of this pattern is sketched below.
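
As a minimal sketch of this pattern, with alg1 and alg2 as hypothetical stand-ins for the implementations being compared, timing each pass separately makes the run-to-run drift visible:

public class AlternatingBenchmark {

    // Hypothetical workloads standing in for the two algorithms under test.
    static void alg1() { /* ... */ }
    static void alg2() { /* ... */ }

    // Times one pass of n calls and returns the elapsed nanoseconds.
    static long timePass(Runnable alg, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            alg.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        System.out.println("alg1 pass 1: " + timePass(AlternatingBenchmark::alg1, n) + " ns");
        System.out.println("alg2 pass 1: " + timePass(AlternatingBenchmark::alg2, n) + " ns");
        System.out.println("alg2 pass 2: " + timePass(AlternatingBenchmark::alg2, n) + " ns");
        System.out.println("alg1 pass 2: " + timePass(AlternatingBenchmark::alg1, n) + " ns");
    }
}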


To add to the other excellent advice, I'd also be mindful of the following:

For some CPUs (e.g. the Intel Core i5 range with TurboBoost), the temperature (and the number of cores currently in use, as well as their utilisation percentage) affects the clock speed. Since CPUs are dynamically clocked, this can affect your results. For example, if you have a single-threaded application, the maximum clock speed (with TurboBoost) is higher than for an application using all cores, which can interfere with comparisons of single-threaded and multi-threaded performance on some systems. Bear in mind that temperature and voltages also affect how long the Turbo frequency is maintained.

Perhaps a more fundamentally important aspect, and one you have direct control over: make sure you're measuring the right thing! For example, if you're using System.nanoTime() to benchmark a particular bit of code, place the timing calls so that you avoid measuring anything you aren't interested in. For example, don't do this:

long startTime = System.nanoTime();
//code here...
System.out.println("Code took "+(System.nanoTime()-startTime)+"nano seconds");

The problem is that you aren't capturing the end time at the instant the code finishes. Instead, try the following:

final long startTime = System.nanoTime();
//code here...
final long endTime = System.nanoTime();
System.out.println("Code took " + (endTime - startTime) + " nanoseconds");

There are many possible pitfalls when writing micro-benchmarks in Java.

First: you have to account for all sorts of events that take a more or less random amount of time: garbage collection, caching effects (of the OS for files and of the CPU for memory), I/O, etc.

Second: you cannot trust the accuracy of measured times for very short intervals.

Third: the JVM optimizes your code while it executes, so successive runs in the same JVM instance tend to become faster and faster.

My recommendations: make your benchmark run for several seconds; that is far more reliable than a runtime of milliseconds. Warm up the JVM (that is, run the benchmark at least once without measuring, so that the JVM can apply its optimizations), then run your benchmark multiple times (say, 5 times) and take the median value. Run every micro-benchmark in a fresh JVM instance (start a new java process for each benchmark); otherwise optimization effects of the JVM can influence later-running tests. Don't execute code paths during measurement that weren't executed in the warm-up phase, as this could trigger class loading and recompilation. A harness along these lines is sketched below.
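
As a minimal sketch of those recommendations, assuming a hypothetical task() standing in for the code under test (a real setup would additionally start a fresh JVM per benchmark):

import java.util.Arrays;

public class SimpleHarness {

    // Hypothetical workload; replace with the code under test.
    static long task() {
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;

        // Warm-up: run without measuring so the JIT can optimize the code.
        for (int i = 0; i < 3; i++) {
            sink += task();
        }

        // Measured runs: repeat several times and take the median.
        int runs = 5;
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += task();
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        // Printing sink keeps the computation from being optimized away.
        System.out.println("median: " + times[runs / 2] + " ns (sink=" + sink + ")");
    }
}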


Should the benchmark measure time/iteration or iterations/time, and why?

It depends on what you are trying to test.

If you are interested in latency, use time/iteration; if you are interested in throughput, use iterations/time.
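
Both views can be derived from the same measurement. A minimal sketch, with doWork() as a hypothetical workload:

public class MetricsExample {

    static double sink;

    // Hypothetical workload; feeds its result into sink so the JIT
    // cannot eliminate it as dead code.
    static void doWork() {
        sink += Math.sqrt(sink + 1.0);
    }

    public static void main(String[] args) {
        int iterations = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            doWork();
        }
        long elapsed = System.nanoTime() - start;

        double nsPerOp = (double) elapsed / iterations;   // latency view
        double opsPerSec = iterations * 1e9 / elapsed;    // throughput view
        System.out.println(nsPerOp + " ns/op, " + opsPerSec + " ops/s (sink=" + sink + ")");
    }
}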


I know this question has been marked as answered, but I wanted to mention two libraries that help us write micro-benchmarks:

Caliper from Google

Getting started tutorials

  1. http://codingjunkie.net/micro-benchmarking-with-caliper/
  2. http://vertexlabs.co.uk/blog/caliper

JMH from OpenJDK

Getting started tutorials

  1. Avoiding Benchmarking Pitfalls on the JVM
  2. Using JMH for Java Microbenchmarking
  3. Introduction to JMH
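
For a flavour of JMH, a minimal benchmark looks like the following (the class and method names are illustrative; JMH generates the harness, handles warm-up, and consumes returned values so they can't be optimized away):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {

    double x = 42.0;

    // Returning the result lets JMH consume it, preventing dead-code elimination.
    @Benchmark
    public double measureSqrt() {
        return Math.sqrt(x);
    }
}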

Important things for Java benchmarks are:

  • Warm up the JIT first by running the code several times before timing it
  • Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
  • While calling System.gc() between iterations won't guarantee a collection, it's a good idea to run it between tests, so that each test will hopefully get a "clean" memory space to work with. (Yes, gc() is more of a hint than a guarantee, but in my experience it's very likely that it really will garbage collect.)
  • I like to display iterations and time, and a score of time/iteration which can be scaled so that the "best" algorithm gets a score of 1.0 and the others are scored relative to it. That way you can run all the algorithms for a longish time, varying both the number of iterations and the time, and still get comparable results (a sketch of this scoring follows the list).
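
A rough sketch of that relative scoring, assuming the per-iteration times have already been measured (the names and numbers below are purely illustrative):

public class RelativeScores {
    public static void main(String[] args) {
        String[] names = {"alg1", "alg2", "alg3"};
        // Hypothetical measured costs in ns per iteration (time/iterations).
        double[] nsPerOp = {12.5, 19.0, 14.2};

        double best = Double.MAX_VALUE;
        for (double v : nsPerOp) {
            best = Math.min(best, v);
        }
        // The best algorithm scores 1.0; the others are scored relative to it.
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %.2f%n", names[i], nsPerOp[i] / best);
        }
    }
}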

I'm in the process of blogging about the design of a benchmarking framework in .NET. I have a couple of earlier posts which may give you some ideas - not everything will be appropriate, of course, but some of it may be.


http://opt.sourceforge.net/ - Java Micro Benchmark: control tasks required to determine the comparative performance characteristics of a computer system on different platforms. It can be used to guide optimization decisions and to compare different Java implementations.


Make sure you somehow use the results that are computed in the benchmarked code; otherwise the JIT compiler can optimize the computation away entirely (dead-code elimination).
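
A minimal sketch of one way to do this, with compute() as a hypothetical method under test: feed every result into a value that is eventually printed, so the computation stays observable:

public class KeepResultsAlive {

    // Hypothetical code under test.
    static double compute(int i) {
        return Math.sqrt(i);
    }

    public static void main(String[] args) {
        double sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) {
            sink += compute(i);   // every result feeds into sink...
        }
        long elapsed = System.nanoTime() - start;
        // ...and sink is printed, so the loop cannot be removed as dead code.
        System.out.println(elapsed + " ns (sink=" + sink + ")");
    }
}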


It should also be noted that analyzing the results of the micro-benchmark matters when comparing different implementations; a significance test should therefore be made.

This is because implementation A might be faster than implementation B during most runs of the benchmark, but A might also have a higher spread, so the measured performance benefit of A may not be of any significance when compared with B.

So it is important not only to write and run a micro-benchmark correctly, but also to analyze the results correctly.
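
As a rough sketch of such an analysis (the sample times are made up, and a real analysis would use a proper statistics library; this just compares the difference of means against the combined spread, as in Welch's t-test):

public class SignificanceSketch {

    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    static double variance(double[] xs, double m) {
        double s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s / (xs.length - 1);   // sample variance
    }

    public static void main(String[] args) {
        // Hypothetical run times (ms) for two implementations.
        double[] a = {102, 99, 105, 101, 98};
        double[] b = {110, 90, 130, 95, 120};

        double ma = mean(a), mb = mean(b);
        double se = Math.sqrt(variance(a, ma) / a.length + variance(b, mb) / b.length);
        double t = (ma - mb) / se;
        System.out.printf("mean A=%.1f, mean B=%.1f, t=%.2f%n", ma, mb, t);
        // As a rule of thumb, |t| well above 2 suggests the difference is
        // unlikely to be noise; a proper test would also compute the degrees
        // of freedom and a p-value.
    }
}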


jmh is a recent addition to OpenJDK and was written by performance engineers from Oracle. It is certainly worth a look.

JMH is a Java harness for building, running, and analysing nano/micro/macro benchmarks written in Java and other languages targeting the JVM.

There are very interesting pieces of information buried in the comments of the sample tests.
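
Benchmarks can also be launched programmatically via JMH's Runner API. A minimal sketch, assuming an annotated benchmark class named MyBenchmark on the classpath (the fork and iteration counts are illustrative):

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkMain {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .forks(2)                    // fresh JVM instances, as recommended above
                .warmupIterations(5)
                .measurementIterations(5)
                .build();
        new Runner(opt).run();
    }
}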
