Fillumina's Blog

June 8, 2012

How to write a Performance Test in Java

Filed under: java, testing — fillumina @ 2:54 pm

Scenario

Java is an interpreted language and from the advent of JIT virtual machines the byte code is not anymore translated into machine code instruction by instruction but complex algorithms are in place to cache the compiled code and to optimize it on the go. This means that the code execution speed isn’t constant over time but depends on the optimizations the code executor (JVM) implements. This is a big advantage over statically compiled languages because it allows to improve the speed at runtime taking full advantage of live statistics and without any programmer intervention. But this also means that it’s difficult to make a decent estimation of its actual performances as it may vary over time in an unpredictably way.

The first time a piece of byte-code is executed in Java every instruction is translated into native code and executed in place. This means that the first execution is quite slow. But after some time the virtual machine will detect which parts of your code are executed more often and will cache the compiled machine code for future use. It will also try to smartly optimize it according to execution statistics. This process may progress in many steps and this is why the performances may differ quite considerably over the execution time. Remember that this process takes place from scratch at every new execution of the program.

Other things that may affect the accuracy of the measurement are a multitasking OS and the CPU operational frequency throttling. Both these circumstances may actually slow down the execution: one because of concurrent programs, the other because of the throttling of the CPU clock due to overheating.

Performance Test

I would like to suggest a general approach to performance tests that will statistically hide the underlined differences between various JVMs and configurations. The method is based on two observations:

  1. If you repeat a test in a loop often enough all the short disturbing events will eventually statistically fade out and the JVM will have time to completely optimize the code;
  2. If you check the test code against another reference one and then look at the speed ratio instead of the absolute execution speed you will hide many more and far bigger differences. There is also the added benefit that the performance measure is now comparable with other systems as well.

Each code must be executed a specific number of iterations before giving a consistent and stable performance measure. The JVM must optimize it first. There is no way to pre-calculate this number (for some JVM the number of iterations before optimization kicks in is determined but may vary with versions or command line options). The only sensible thing to do is to check the measured values taking samples until they are stable enough.

To improve this technique you can implement some other useful tricks:

  • Use a reference code as close as possible to what you want to optimize (ideally it should be a previous version of the same code). This way both codes will be treated the same by the JVM giving much more accurate results.
  • Because most JIT JVM removes dead branches (code that doesn’t provide any effect), always check the results of your code to be sure your test is actually executed (you can use an assertion).
  • To improve accuracy use the same result verification in the reference code so its time will be removed in the ratio. Do the same with every code you are not interested in. Incidentally the overhead due to the code needed to manage the test is removed as well.
  • Interleave your code iterations with the reference code iterations so to better average the odd of a short disturbance.

And, of course, before even considering doing optimization on your code be sure that your performance tests are accurate and well written and produces consistent and stable results: bad tests always lead to bad conclusions!

A last remark goes to warn you that the research for maximum performances is often a serious illness ;-). It’s like the quest of the Holy Grail. Remember that you should optimize only those very little piece of code that are critical and try to not overdo. The JVM may change its way to optimize the code in the future (it has already happened a lot) and there could be differences between different systems and vendors. Do not sacrifice the clarity and understandability of your code. I have experienced that in an optimization process there are always a couple of changes that can increase your speed by up to 20% or 30% or even more and then a thousand little things that fight to reach 1 or 2%. In my opinion just leave those thousand things alone. Once you have gained your target speed there is generally very little you can do to improve it any better. Take a look at the algorithms chosen because they are the must probable culprit.

Data

The data presented in the following have been collected using different iterations count (10,000, 100,000 and 1,000,000) with 10 samples each so to be able to appreciate the variation of each iteration count. To be comparable the figures are expressed in percentage of the ratio of the execution time of two similar codes. The first few iterations are used by the JVM to optimize the code and they clearly show that the first few thousand executions should not be trusted for a performance assessment. Only after 3 order of magnitude the results are starting to be tolerably stable to be usable (it should be better to go a little bit further). Please note that the data have been collected with a single run of a test (all the tests are executed in sequence).

     10000,      77.96,     100.00    <--- lot of instability here
     10000,      12.77,     100.00         
     10000,      13.43,     100.00         
     10000,      88.37,     100.00
     10000,      76.83,     100.00    <--- the JVM is optimizing here
     10000,     100.00,      60.45    <--- more stable and very different!
     10000,     100.00,      60.76
     10000,     100.00,      60.31
     10000,     100.00,      60.82
     10000,     100.00,      61.58
    100000,     100.00,      70.18
    100000,     100.00,      71.01
    100000,     100.00,      71.29
    100000,     100.00,      61.74
    100000,     100.00,      70.17
    100000,     100.00,      68.82
    100000,     100.00,      57.16
    100000,     100.00,      58.97
    100000,     100.00,      49.20
    100000,     100.00,      55.47
   1000000,     100.00,      61.44    <--- values are stabilizing
   1000000,     100.00,      63.22
   1000000,     100.00,      59.71
   1000000,     100.00,      58.99
   1000000,     100.00,      53.26
   1000000,     100.00,      52.35
   1000000,     100.00,      52.68
   1000000,     100.00,      52.73
   1000000,     100.00,      52.80
   1000000,     100.00,      51.05
  10000000,     100.00,      52.86    <--- here the variance is very low
  10000000,     100.00,      53.40 
  10000000,     100.00,      54.15
  10000000,     100.00,      53.38
  10000000,     100.00,      54.64
  10000000,     100.00,      54.19
  10000000,     100.00,      54.47
  10000000,     100.00,      53.85
  10000000,     100.00,      54.45
  10000000,     100.00,      53.98

Graph with the spread of the performance samples retreived with different iterations count

Graph with the spread of the performance samples retreived with different iterations count.

One interesting fact emerges from the data: an unsophisticated code tends to perform better within the first few thousand iterations. If your code is not meant to be executed very often (it’s not in some critical part or in a tight loop) it is probably better to leave it as simple as possible. In this context simple means to avoid complicated caching mechanism (that requires time to setup), lazy initializations and any prolixity (remember that every instruction has to be translated and executed so the less they are the better). On the other side, if a non critical code is slower because highly optimized your program would not probably be statistically impacted anyway.

Code

I propose here a little class that helps me doing performance tests:

public class PerformanceTimer implements Serializable {
    private static final long serialVersionUID = 1L;
    public static final int FRACTIONS = 100;

    private Map<String, Long> timeMap = new LinkedHashMap<>();
    private Map<String, Runnable> executors = new LinkedHashMap<>();
    private long executions;

    public PerformanceTimer ignore(final String msg, final Runnable executor) {
        return this;
    }

    public PerformanceTimer add(final String msg, final Runnable executor) {
        executors.put(msg, executor);
        return this;
    }

    public PerformanceTimer execute(final long times) {
        assertTimesNotNegative(times);
        executions += times;
        final long fraction = times / FRACTIONS;

        // interleaves the tests
        for (int f=0; f<FRACTIONS; f++) {
            for (Map.Entry<String, Runnable> entry: executors.entrySet()) {
                final String msg = entry.getKey();
                final Runnable runnable = entry.getValue();
                final long time = System.nanoTime();
                for (int t=0; t<fraction; t++) {
                    runnable.run();
                }
                add(msg, System.nanoTime() - time);
            }
        }
        return this;
    }

    public void printSerie(final int baseTimes,
            final int maximumMagnitude,
            final int samplePerMagnitude) {
        int index = 0;
        for (int i=0; i< maximumMagnitude; i++) {
            for (int j=0; j<samplePerMagnitude; j++) {
                execute(Math.round(baseTimes * Math.pow(10, i)));
                final String idxStr = String.format("%5d, ", index);
                System.out.println(idxStr + toCommaSeparatedString());
                reset();
                index++;
            }
        }
    }

    private void reset() {
        timeMap.clear();
        executions = 0;
    }

    private void add(final String msg, final long elapsed) {
        Long time = timeMap.get(msg);
        if (time == null) {
            timeMap.put(msg, elapsed);
        } else {
            timeMap.put(msg, time + elapsed);
        }
    }

    private void assertTimesNotNegative(final long times) {
        if (times < 0) {
            throw new IllegalArgumentException(
                    "Cannot manage negative numbers: " + times);
        }
    }

    public String toCommaSeparatedString() {
        StringBuilder buf = new StringBuilder();
        buf.append(String.format("%15d", executions));
        final long slowest = getSlowest();
        for (Map.Entry<String, Long> entry: timeMap.entrySet()) {
            final double percentageValue = entry.getValue() * 1.0D / slowest * 100;
            buf.append(String.format(", %10.2f", percentageValue));
        }
        return buf.toString();
    }

    public String toString(final String message) {
        return message + ":\n" + toString();
    }

    @Override
    public String toString() {
        final StringBuilder buf = new StringBuilder();
        final long slowest = getSlowest();
        final int longer = getLongerMessage();
        for (Map.Entry<String, Long> entry: timeMap.entrySet()) {
            final double elapsed = toMillis(entry.getValue());
            final String speed = String.format("%10.2f ms", elapsed);
            final double percentageValue = entry.getValue() * 1.0D / slowest * 100;
            final String percentage = String.format("%10.2f %%", percentageValue);
            buf.append(equilizeLength(entry.getKey(), longer))
                    .append(" :")
                    .append(speed)
                    .append("\t")
                    .append(percentage)
                    .append("\n");
        }
        return buf.toString();
    }

    private double toMillis(final long nano) {
        return nano/ 1_000_000.0D;
    }

    private long getSlowest() {
        long slowest = 0;
        for (Long l: timeMap.values()) {
            if (l > slowest) {
                slowest = l;
            }
        }
        return slowest;
    }

    private int getLongerMessage() {
        int longer = 0;
        for (String msg: timeMap.keySet()) {
            final int length = msg.length();
            if (length > longer) {
                longer = length;
            }
        }
        return longer;
    }

    private String equilizeLength(final String str, final int len) {
        final int length = str.length();
        if (length < len) {
            final char[] carray = new char[len - length];
            Arrays.fill(carray, ' ');
            return str + new String(carray);
        }
        return str;
    }
}

This class has evolved into a full blown open sourced API named PerformanceTools that can be downloaded freely from GitHub.

Reference

2 Comments »

  1. I completely agree on the idea that a viable way to measure performances is to have a reference code and to measure the related performance ratio. Indeed, I tried using a couple of libraries for performance test: p-unit ( http://p-unit.sourceforge.net/ ) and JPerf ( http://sourceforge.net/projects/jperf/ ) but it seems to me that both suffer the same problem, they measure the elapsed time using the system clock, making the same test give different result on different machines.

    Comment by Daniele Demichelis (@escher75) — June 20, 2012 @ 8:08 pm

  2. […] you’re interested I would point you to the very detailed Francesco’s post about the […]

    Pingback by Performance Automated Test as Comparison Test | Daniele Demichelis — June 20, 2012 @ 8:35 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: