Fillumina's Blog

February 2, 2013

Sorry guys

Filed under: Uncategorized — fillumina @ 6:33 pm

It’s too long since the last time I’ve logged in! Sorry guys, now I am back.

June 16, 2012

Override Throwable.getMessage() to speed up Exceptions

Filed under: java, performance — fillumina @ 11:30 am

Exceptions are a very useful and practical mechanism to manage exceptional events in your application and not only a way to report meaningful errors back to developers. The problem about this dual nature is that they should provide a meaningful message and be fast to generate. Unfortunately building an error message can be time consuming but I really would not leave that out (on the contrary they should really be as helpful as possible). The solution is to override the Throwable#getMessage() method:

                throw new BeanToolsException(ex) {
                    private static final long serialVersionUID = 1L;

                    @Override
                    public String getMessage() {
                        return "Method " +
                        getterMethod.getName() + "() is defined on " +
                        declaringClass.getCanonicalName() + " but is called on " +
                        (bean == null ? "null" : bean.getClass().getCanonicalName());
                    }
                };

By this way the error message is built only if it is required leaving the Exception creation fast to be used in your code.

June 9, 2012

Performance comparison between inheritance and composition in Java

Filed under: java, performance, testing — fillumina @ 3:12 pm

After spending some time writing and researching about the best way to estimate the performance of an algorithm in Java I have started to use my newly acquired knowledge to research over some long running doubts about Java programming: what is faster inheritance or composition?.

To research that I have written a simple test that uses the PerformanceTimer class described in How to write a Performance Test in Java.

public class InheritanceAgainstCompositionPerformanceApp {

    private static abstract class AbstractInheritableClass {

        public abstract int multiply(int a, int b);

        public int doOperation(int a, int b) {
            return multiply(a, b);
        }
    }

    private static class ExtendingMultiplier extends AbstractInheritableClass {

        @Override
        public int multiply(int a, int b) {
            return a * b;
        }
    }

    private static class StandAloneMultiplier {

        public int multiply(int a, int b) {
            return a * b;
        }
    }

    private static class ComposedClass {
        private final static StandAloneMultiplier multiplier = new StandAloneMultiplier();

        public int doOperation(int a, int b) {
            return multiplier.multiply(a, b);
        }
    }


    public static void main(final String[] args) {
        new InheritanceAgainstCompositionPerformanceApp().test();
    }

    public void test() {
        final PerformanceTimer pt = new PerformanceTimer();

        pt.add("inheritance", new Runnable() {
            private ExtendingMultiplier em = new ExtendingMultiplier();

            @Override
            public void run() {
                assertEquals(12, em.doOperation(4, 3));
            }
        });

        pt.add("composition", new Runnable() {
            private ComposedClass cc = new ComposedClass();

            @Override
            public void run() {
                assertEquals(12, cc.doOperation(4, 3));
            }
        });

        pt.printSerie(10_000, 6, 10);

    }
}

And here is the result data where:

  1. The first column is the iteration number;
  2. number of iterations in that particular loop;
  3. the speed ratio of the two algorithms expressed as a percentage over the slower one.

One percentage is always 100% and the other represents how much faster the algorithm is in respect to the slower. i.e. if A runs in 80ms and B runs in 100ms than B will be 100% and A 80%.

    0,           10000,     100.00,      46.28
    1,           10000,     100.00,      99.71
    2,           10000,      97.75,     100.00
    3,           10000,      97.61,     100.00
    4,           10000,     100.00,      99.84
    5,           10000,      98.25,     100.00
    6,           10000,     100.00,      98.74
    7,           10000,      86.82,     100.00
    8,           10000,     100.00,      95.96
    9,           10000,     100.00,      97.49
   10,          100000,      96.69,     100.00
   11,          100000,      94.12,     100.00
   12,          100000,     100.00,      84.98
   13,          100000,     100.00,      97.43
   14,          100000,     100.00,      89.68
   15,          100000,     100.00,      89.00
   16,          100000,     100.00,      92.28
   17,          100000,     100.00,      91.74
   18,          100000,     100.00,      97.10
   19,          100000,     100.00,      91.81
   20,         1000000,     100.00,      67.01
   21,         1000000,     100.00,      66.25
   22,         1000000,     100.00,      67.41
   23,         1000000,     100.00,      66.28
   24,         1000000,     100.00,      66.17
   25,         1000000,     100.00,      68.83
   26,         1000000,     100.00,      64.92
   27,         1000000,     100.00,      65.66
   28,         1000000,     100.00,      68.19
   29,         1000000,     100.00,      68.21
   30,        10000000,     100.00,      68.19
   31,        10000000,     100.00,      67.48
   32,        10000000,     100.00,      66.39
   33,        10000000,     100.00,      67.40
   34,        10000000,     100.00,      65.54
   35,        10000000,     100.00,      68.32
   36,        10000000,     100.00,      66.84
   37,        10000000,     100.00,      65.37
   38,        10000000,     100.00,      67.56
   39,        10000000,     100.00,      66.30
   40,       100000000,     100.00,      66.62
   41,       100000000,     100.00,      66.79
   42,       100000000,     100.00,      66.46
   43,       100000000,     100.00,      66.45
   44,       100000000,     100.00,      67.76
   45,       100000000,     100.00,      68.10
   46,       100000000,     100.00,      67.13
   47,       100000000,     100.00,      67.89
   48,       100000000,     100.00,      66.35
   49,       100000000,     100.00,      66.64
   50,      1000000000,     100.00,      66.93
   51,      1000000000,     100.00,      66.54
   52,      1000000000,     100.00,      67.55
   53,      1000000000,     100.00,      67.20
   54,      1000000000,     100.00,      66.51
   55,      1000000000,     100.00,      66.92
   56,      1000000000,     100.00,      66.92
   57,      1000000000,     100.00,      66.56
   58,      1000000000,     100.00,      66.99
   59,      1000000000,     100.00,      67.29

It seems that after about 300 K iterations the performances start to stabilize ending with composition being 67% faster than inheritance on the long run. That makes sense because an inheriting class not only must have a reference to the base class but should also maintain a reference table to manage virtual methods (those that can be overwritten and may be redirected). Please note that even if composition is faster and it’s actually recommended for most uses there are cases where inheritance makes much more sense and if you are not pressed by extreme performances you should consider using it. For a more detailed analysis of the two methods see Comparison between composition and template pattern.

June 8, 2012

How to write a Performance Test in Java

Filed under: java, testing — fillumina @ 2:54 pm

Scenario

Java is an interpreted language, that means that the code execution speed isn’t constant over time but depends on the optimization the code executor (JVM) implements. This is a big advantage over statically compiled languages because it helps improving the code speed at runtime with the full advantage of live statistics and without any programmer intervention. But that also means that it’s difficult to make a decent estimation of its actual performances. As with any optimization process there is much use of statistics in there and the problem is delicate.

The first time a piece of code is executed in Java every instruction is translated into native code and executed in place. This means that the first execution is quite slow. But after some iterations the virtual machine will detect which parts of your code are executed more often and are critical to your program and will cache the produced machine code for future use. It will also try to smartly optimize it according to execution statistics. This process may progress in many steps during the execution and this is why the performances may differ quite considerably over the execution time. This process takes place at every run of the program.

Other things that may affect the accuracy of the measurement are a multitasking OS and the CPU operational frequency throttling. Both these circumstances may actually slow down the execution: one because of a concurrent program, the other because of the throttling down of the CPU clock due to overheating.

Performance Test

I would like to suggest a general approach to performance tests that will statistically hide the underline differences between various JVMs and configurations. The method is based on two observations:

  1. If you repeat a test in a loop often enough all the short disturbing events will eventually statistically fade out;
  2. If you check your code against another reference code which you will not change during the optimization process and then check the speed ratio instead of the absolute speed you will hide the longer disturbing events (such as the CPU throttling).

To choose an appropriate number of iterations just increment the value until the ratio stabilizes.

To improve this technique you can implement some other useful tricks:

  • Use a reference code as close as possible to what you want to optimize (ideally it should be a previous version of the same code). This way both codes will be treated the same by the JVM giving much more accurate results.
  • Because the JVM removes dead branches (code that doesn’t provide any effect), always check the results of your code to be sure your test is actually executed (you can use an assertion).
  • To improve the accuracy use the same result verification in the reference code so its time will be removed in the ratio. Do the same with every code you are not interested in.
  • Interleave your code iterations with the reference code iterations so to average the odd of a disturbance.

And, of course, before even considering doing optimization on your code be sure that your performance tests are accurate and well written and produces consistent and stable results.

A last remark goes to warn you that the research for maximum performances is often a serious illness ;-) . It’s like the research for the Holy Grail. Remember that you should optimize only those very little piece of code that are critical. And try to not overdo. The JVM may change its way to optimize the code in the future (it has already happened a lot) and there could be differences between different systems and vendors. And do not sacrifice the clarity and understandability of your code. I have experienced that in an optimization process there are always a couple of changes that can increase your speed by up to 20 or 30% or even more and then a thousand little things that fight to reach 1 or 2%. In my opinion just leave those thousand things alone. Once you have gained your target speed there is generally very little you can do to improve it any better.

Data

The data presented in the following have been collected using different iterations count (10,000, 100,000 and 1,000,000) with 10 samples each so to be able to appreciate the variation of each iteration count. To be comparable the figures are expressed in percentage of the ratio of the execution time of two similar codes (BeanUtils.getProperty() and BeanTools.get()). The first few iterations are used by the JVM to optimize the code and they clearly show that the first few thousand executions should not be trusted for a performance assessment. Only after 3 order of magnitude the results are starting to be tolerably stable to be usable (should be better to go a little bit further). Please note that the data have been collected with a single run of a test (all the tests are executed in succession, that’s why the strange behavior of the first tests).

     10000,      77.96,     100.00    <--- lot of instability here
     10000,      12.77,     100.00         
     10000,      13.43,     100.00         
     10000,      88.37,     100.00
     10000,      76.83,     100.00    <--- the JVM is optimizing here
     10000,     100.00,      60.45    <--- more stable and very different!
     10000,     100.00,      60.76
     10000,     100.00,      60.31
     10000,     100.00,      60.82
     10000,     100.00,      61.58
    100000,     100.00,      70.18
    100000,     100.00,      71.01
    100000,     100.00,      71.29
    100000,     100.00,      61.74
    100000,     100.00,      70.17
    100000,     100.00,      68.82
    100000,     100.00,      57.16
    100000,     100.00,      58.97
    100000,     100.00,      49.20
    100000,     100.00,      55.47
   1000000,     100.00,      61.44    <--- values are stabilizing
   1000000,     100.00,      63.22
   1000000,     100.00,      59.71
   1000000,     100.00,      58.99
   1000000,     100.00,      53.26
   1000000,     100.00,      52.35
   1000000,     100.00,      52.68
   1000000,     100.00,      52.73
   1000000,     100.00,      52.80
   1000000,     100.00,      51.05
  10000000,     100.00,      52.86    <--- here the variance is very low
  10000000,     100.00,      53.40 
  10000000,     100.00,      54.15
  10000000,     100.00,      53.38
  10000000,     100.00,      54.64
  10000000,     100.00,      54.19
  10000000,     100.00,      54.47
  10000000,     100.00,      53.85
  10000000,     100.00,      54.45
  10000000,     100.00,      53.98

Performance retreived with different iterations count

One interesting fact emerges from the data: an unsophisticated code tends to perform better within the first few thousand iterations. If your code is not meant to be executed very often (it’s not in some critical part or in a tight loop) it is probably better to leave it as simple as possible. In this context simple means to avoid complicated caching mechanism (that requires time to setup), lazy initializations and any complexity (remember that every instruction has to be translated and executed so the less they are the better). On the other side, if a non critical code is slower because highly optimized for many iterations your program would not probably be impacted anyway.

Code

I propose here a little class that helps me doing performance tests:

public class PerformanceTimer implements Serializable {
    private static final long serialVersionUID = 1L;
    public static final int FRACTIONS = 100;

    private Map<String, Long> timeMap = new LinkedHashMap<>();
    private Map<String, Runnable> executors = new LinkedHashMap<>();
    private long executions;

    public PerformanceTimer ignore(final String msg, final Runnable executor) {
        return this;
    }

    public PerformanceTimer add(final String msg, final Runnable executor) {
        executors.put(msg, executor);
        return this;
    }

    public PerformanceTimer execute(final long times) {
        assertTimesNotNegative(times);
        executions += times;
        final long fraction = times / FRACTIONS;

        // interleaves the tests
        for (int f=0; f<FRACTIONS; f++) {
            for (Map.Entry<String, Runnable> entry: executors.entrySet()) {
                final String msg = entry.getKey();
                final Runnable runnable = entry.getValue();
                final long time = System.nanoTime();
                for (int t=0; t<fraction; t++) {
                    runnable.run();
                }
                add(msg, System.nanoTime() - time);
            }
        }
        return this;
    }

    public void printSerie(final int baseTimes,
            final int maximumMagnitude,
            final int samplePerMagnitude) {
        int index = 0;
        for (int i=0; i< maximumMagnitude; i++) {
            for (int j=0; j<samplePerMagnitude; j++) {
                execute(Math.round(baseTimes * Math.pow(10, i)));
                final String idxStr = String.format("%5d, ", index);
                System.out.println(idxStr + toCommaSeparatedString());
                reset();
                index++;
            }
        }
    }

    private void reset() {
        timeMap.clear();
        executions = 0;
    }

    private void add(final String msg, final long elapsed) {
        Long time = timeMap.get(msg);
        if (time == null) {
            timeMap.put(msg, elapsed);
        } else {
            timeMap.put(msg, time + elapsed);
        }
    }

    private void assertTimesNotNegative(final long times) {
        if (times < 0) {
            throw new IllegalArgumentException(
                    "Cannot manage negative numbers: " + times);
        }
    }

    public String toCommaSeparatedString() {
        StringBuilder buf = new StringBuilder();
        buf.append(String.format("%15d", executions));
        final long slowest = getSlowest();
        for (Map.Entry<String, Long> entry: timeMap.entrySet()) {
            final double percentageValue = entry.getValue() * 1.0D / slowest * 100;
            buf.append(String.format(", %10.2f", percentageValue));
        }
        return buf.toString();
    }

    public String toString(final String message) {
        return message + ":\n" + toString();
    }

    @Override
    public String toString() {
        final StringBuilder buf = new StringBuilder();
        final long slowest = getSlowest();
        final int longer = getLongerMessage();
        for (Map.Entry<String, Long> entry: timeMap.entrySet()) {
            final double elapsed = toMillis(entry.getValue());
            final String speed = String.format("%10.2f ms", elapsed);
            final double percentageValue = entry.getValue() * 1.0D / slowest * 100;
            final String percentage = String.format("%10.2f %%", percentageValue);
            buf.append(equilizeLength(entry.getKey(), longer))
                    .append(" :")
                    .append(speed)
                    .append("\t")
                    .append(percentage)
                    .append("\n");
        }
        return buf.toString();
    }

    private double toMillis(final long nano) {
        return nano/ 1_000_000.0D;
    }

    private long getSlowest() {
        long slowest = 0;
        for (Long l: timeMap.values()) {
            if (l > slowest) {
                slowest = l;
            }
        }
        return slowest;
    }

    private int getLongerMessage() {
        int longer = 0;
        for (String msg: timeMap.keySet()) {
            final int length = msg.length();
            if (length > longer) {
                longer = length;
            }
        }
        return longer;
    }

    private String equilizeLength(final String str, final int len) {
        final int length = str.length();
        if (length < len) {
            final char[] carray = new char[len - length];
            Arrays.fill(carray, ' ');
            return str + new String(carray);
        }
        return str;
    }
}

Reference

May 14, 2012

Taking advantage of exceptions

Filed under: java, theory — fillumina @ 5:17 pm

Apart from being used to pass the parcel to upper layers or to stop the execution of some code in case of unrecoverable conditions, Exceptions are quite avoided by most programmers today. It seems that what was their main focus point, to evidence the code to execute on special conditions from the default one, is now becoming disturbing and cluttering. Instead of catching an Exception is now common to employ a conditional check to switch between different code paths. Those two code paths are not clearly distinguishable anymore.
Exceptions exists exactly for this reason: they just say to do an action but then, if a condition verifies, to do some other stuff instead. Their presence is basically a semantic suggestion and I love when semantic emerges clearly from a bunch of flat calculations.
And there is even more because Exceptions can be passed up to the method callers in search of someone actually able to solve the problem!
Let’s look at some common use cases:

Index out of boundaries

This is a code that manages the add operation on an ArrayList (not the actual java.utils.ArrayList) written according to the usual way of putting a conditional check first.

    public CharBuilder append(final char c) {
        if (index < array.length) {
            array[index] = c;
        } else {
            final int length = array.length;
            final char[] tmp = new char[length << 1];
            System.arraycopy(array, 0, tmp, 0, length);
            array = tmp;
            array[index] = c;
        }
        index++;
        return this;
    }

Taking advantage of Exceptions the check can be avoided.

    public CharBuilder add(final String str) {
        try {
            array[index++] = str;
        } catch (ArrayIndexOutOfBoundsException e) {            
            final int length = array.length;
            final String[] tmp = new String[length << 1];
            System.arraycopy(array, 0, tmp, 0, length);
            array = tmp;
            array[index - 1] = str;
        }
        return this;
    }

The exception gives away quite clearly the intent of the code that follows and you are not required to write a check that is already made by Java every time it accesses an array. So you gain a little more safety, because you cannot miss an erroneous condition by mistake, and the code is clearer. Their execution speed is the same.

Equality check

Another example concerns a common idiom for equality check:

        @Override
        public boolean equals(final Object obj) {
            if (obj == null) {
                return false;
            }
            if (getClass() != obj.getClass()) {
                return false;
            }
            final CharArray other = (CharArray) obj;
            if (!Arrays.equals(this.array, other.array)) {
                return false;
            }
            return true;
        }

If the given object is of a different Class or it’s null an exception is raised so there is no need to check for that:

        @Override
        public boolean equals(final Object obj) {
            try {
                final CharArray other = (CharArray) obj;
                if (!Arrays.equals(this.array, other.array)) {
                    return false;
                }
                return true;
            } catch (ClassCastException | NullPointerException ex) {
                assert !(obj instanceof CharArray);
                return false;
            }
        }

In case of multiple exceptions or just to be sure to catch the right condition using the assertion may help (assertions are checked by default during tests and ignored during normal program execution to save speed). Again we gain in safety and clarity of intent while the performances remain the same.

Null check

To access a nested property in Java is always a bit frustrating because of the many checks you have to do.

        if (obj != null && 
                obj.getProperty() != null && 
                obj.getProperty().getSecondProperty() != null) {
            c = obj.getProperty().getSecondProperty().toString();
        } else {
            c = null;
        }

Using exceptions the code improves considerably.

        try {
            c = obj.getProperty().getSecondProperty().toString();
        } catch (NullPointerException e) {
            c = null;
        }

Conclusions

Exceptions are often misused or neglected in Java but they are a powerful mechanism to manage code paths reserved to exceptional events.

  • They offer more safety over conditional checks because they rely on checks made by other code (or the JVM);
  • The default action is clearly stated against exceptional ones;
  • The management of the exceptional event can escalate to upper levels if needed;
  • There is no loss in speed whatsoever.

EDIT: a mistake in a test made me think that using Exceptions was statistically faster than to check a condition. However the corrected test showed that the two mechanisms have the same execution speed so what to choose remains basically a matter of style. This article was heavily based on the result of that mistake and I proposed various techniques that should have improved the performances of some common code idioms. I have rewritten it completely based on this new evidence. I am sorry for having been inaccurate.

April 13, 2012

The N Queens Puzzle

Filed under: java, theory — fillumina @ 3:43 pm

This is my proposed solution to the N Queens Puzzle proposed by DZone.

I think that there are 2 possible strategies to solve the problem:

  1. to generate every possible combinations of queens on the board and check them for validity
  2. to generate the solutions step by step so being able to reject a solution early if it is not valid

The second algorithm is obviously the best because it makes it possible to reject a solution early on the process sparing to check a vast amount of combinations. It employs the building of a decision tree that proceeds adding a new queen on the board only if the preceding queens are in a valid configuration.

A decision tree algorithm can be parallelized pretty well (each new branch can be assigned to a different processor) so I decided to try the new Java 7 Fork-Join framework (see Fork-Join article by Ricardo Zuastri).

I had to rewrite the List so to avoid having to remove anything from it or being forced to duplicate it. My solution is a reversed linked list where each successive element points back to its parent. It is both fast and memory efficient for our purpose (if you don’t mind reading the list backwards).

The performances are really impacted by the outputting of data (using System.out.println()), so, commenting out the lines that actually performs the printing it seems that the parallel algorithm employs 60% of the time spent by the sinlge threaded algorithm so gaining a significant advantage (my test machine is equipped with 4 processors).


public class Solution {
    private AtomicInteger solutionNumber = new AtomicInteger(0);
    private Queue<String> output = new ConcurrentLinkedQueue<>();

    public static Solution findSolutions(final ForkJoinPool pool) {
        final Solution solution = new Solution();
        pool.invoke(new Step(solution));
        return solution;
    }

    public void addSolution(final Sequence sequence) {
        solutionNumber.incrementAndGet();
        // in a multithreaded environment System.out.println() blocks
        // so it must be avoided
        output.add(sequence.toString());
    }

    public int getSolutionNumber() {
        return solutionNumber.get();
    }

    public String getOutput() {
        final StringBuilder buf = new StringBuilder();
        for (String str: output) {
            buf.append(str).append('\n');
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Solution solution = findSolutions(new ForkJoinPool());
        System.out.println(solution.getOutput());
        System.out.println("found " + solution.getSolutionNumber() + " solutions.");
    }
}

 


 public class Step extends RecursiveAction {
    private static final long serialVersionUID = 1L;
    private final Sequence sequence;
    private final Solution solution;

    public Step(final Solution solution) {
        this(solution, new Sequence());
    }

    private Step(final Solution solution, final Sequence sequence) {
        this.solution = solution;
        this.sequence = sequence;
    }

    @Override
    protected void compute() {
        final int column = sequence.size();
        if (column == 8) {
            solution.addSolution(sequence);
        } else {
            final List<Step> steps = new ArrayList<>(8);
            for (int row = 0; row < 8; row++) {
                final Queen queen = new Queen(row, column);
                if (sequence.isFriendlyWith(queen)) {
                    steps.add(new Step(solution, sequence.add(queen)));
                }
            }
            invokeAll(steps);
        }
    }

}

 


 /**
 * It's a reversed linked list with the characteristic that a new element
 * is always added to an existing list so avoiding having to copy the list.
 *
 * @author fra
 */
public class Sequence implements Iterable<Queen> {
    private final int size;
    private final Entry head;

    private static class Entry {
        final private Queen queen;
        final private Entry next;

        public Entry(final Queen queen, final Entry next) {
            this.queen = queen;
            this.next = next;
        }
    }

    public Sequence() {
        this(null, 0);
    }

    private Sequence(final Entry entry, final int size) {
        this.head = entry;
        this.size = size;
    }

    /** Returns a new list with the new element on top */
    public Sequence add(final Queen queen) {
        final Entry entry = new Entry(queen, head);
        return new Sequence(entry, size + 1);
    }

    public boolean isFriendlyWith(final Queen queen) {
        for (final Queen q: this) {
            if (q.attack(queen)) {
                return false;
            }
        }
        return true;
    }

    public int size() {
        return size;
    }

    @Override
    public String toString() {
        final StringBuilder buf = new StringBuilder();
        for (final Queen q: this) {
            buf.append(q).append(' ');
        }
        return buf.toString();
    }

    /** The list is iterated in reverse order */
    @Override
    public Iterator<Queen> iterator() {
        return new Iterator<Queen>() {
            private Entry current = new Entry(null, head);

            @Override
            public boolean hasNext() {
                return current != null && current.next != null;
            }

            @Override
            public Queen next() {
                current = current.next;
                return current.queen;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("Not supported yet.");
            }
        };
    }

}

 


 public class Queen {
    private final int row, column, diagonalLeft, diagonalRight;

    public Queen(final int row, final int column) {
        this.row = row;
        this.column = column;
        this.diagonalLeft = row + column;
        this.diagonalRight = row - column;
    }

    public boolean attack(final Queen queen) {
        return this.row == queen.row ||
                this.column == queen.column ||
                this.diagonalLeft == queen.diagonalLeft ||
                this.diagonalRight == queen.diagonalRight;
    }

    public int getColumn() {
        return column;
    }

    public int getRow() {
        return row;
    }

    @Override
    public String toString() {
        return "" + ((char)(row + '1')) + ((char)(column + 'a'));
    }
}

April 12, 2012

LessCSS Converter

Filed under: CSS, java, LessCSS — fillumina @ 10:26 pm

LessCSS is a very interesting project but I didn’t find any converter that could help me to do some tests. Yes there is a nice script here but it’s not very practical to use if you just want to do some practice. So I just did one. It is made in java and there is a little installer. The installer is actually very basic and just copies the needed jar files into a directory of your choice. If you have the latest Java SE installed (if not just go here ) you should be able to run the program by clicking over the LessCssConverter.jar file. The code is open sourced and can be watched, downloaded and contributed on GitHub: library and GUI. Please drop me a comment if you find it useful, if you have suggestions for improvements or if you have problems running it.

March 30, 2012

Using UUID to help entities’ equals() and hashCode()

Filed under: Uncategorized — fillumina @ 3:01 pm

Using ID in equals() and hashCode() is a bad idea: if the id is created by the database it is not available until the entity is actually saved on the db and this would make the equals() and hashCode()’s implementation fail. Of course the id may be imposed by the user but this can be done at the expense of performances because databases’ indexes are optimized to use autogenerated ids (to avoid having to check for uniqueness). Hibernate documentation itself recommend to avoid this idiom.

On the other hand, equals() and hashCode() are essential for the correct working of hibernate and it is required that hashCode() must stay stable. That means that hashCode() must not be calculated over mutable fields but using only a natural immutable key. Most entities does not have this key available (in fact there are very few objects that have such a candidate) and we cannot use the id so there is another chance for our beloved UUID to stand up!

Usage of UUID as PK is discouraged because it is composed of highly random data which works very badly with balanced trees that are the ways indexes are built in a database. In practice the UUID’s randomness hurst key locality dispersing the IDs too much over the index. This doesn’t count for small tables where the index is all in memory but has a heavy impact when the tables grow. So using UUID as PK means a severe loss in performances (some calculated even 200 times slower) and, because it’s longer than the 8 bytes used by a long id, it means also that all FK would also be longer increasing considerably the database’s memory and disk footprint.

But the fact that we cannot use UUID as PK but we still need it as an equals() and hashCode() candidate lend to the fact that we can simple use UUID as a stable field in our entity. This comes at a 16 bytes cost, which is not that much for a simple field and helps really a lot in the entity’s management! As a bonus we have an easier path for database’s extraction and merge (it’s not that straightforward as it would be having UUID as PK, but still much easier than not having it at all).

TL;DR use a long as ID and calculate equals() and hashCode() using an UUID field.

I know this sounds like a “catch it all” to someĀ  and maybe an overkill to many, I don’t really intend to use it unless a natural stable key doesn’t present itself (which is rare I admit). So if you have comments I am glad to hear them! Anyway I’m implementing it in my current project so I would probably be back in a short time with some useful heads-up.

References

For not using ID in equals() and hashCode():

For not using UUID as primay key:

About equals() and hashCode(): use not autogenerated IDs:

March 26, 2012

Correct use of equals() and hashCode() with hibernate

Filed under: Uncategorized — fillumina @ 4:32 pm

Hibernate (and ehcache) doesn’t work very well with the standard equals() and hashCode() implementations and there are some facts you should be aware of.

The problem is that when hibernate returns a managed entity most of the times it is not the real entity but a proxy. It may be a java proxy or a managed bytecode (see. CGLib or JavaAssist) but the fact is that the returned object is not really the one you expect (it even doesn’t have the same class). For most things this doesn’t really matter (that’s the beauty of it), but sometimes problems arise. For example when you need to compare a managed entity with one you have just created. If you use this standard equals() idiom:

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (<getClass() != obj.getClass()) {             // 1
            return false;
        }
        final NamedMedia other = (NamedMedia) obj;
        if (!Objects.equals(this.name, other.name)) {   // 2
            return false;
        }
    }

you may find yourself in trouble:

  1. The comparison getClass() != obj.getClass() would never let go a proxy (which is of a different class) so your entities will be different even if your implementation would require they are not!
  2. The comparison Objects.equals(this.name, other.name) could not work because sometimes the managed object cannot access private fields (they usually proxy accessors) and the field is to be lazily loaded (but it is not since the proxy, that should load it, is not aware that you want to access it)

The solution is:

  • Compare classes compatibility instead of equality: getClass().isAssignableFrom(obj.getClass())
  • Use accessors for comparisons (and hashCode()): Objects.equals(this.getName(), other.getName())

This is the final code:

    @Override
    public boolean equals(final Object obj) {
        if (obj == null) {
            return false;
        }
        if (!getClass().isAssignableFrom(obj.getClass()) ||
                obj.getClass().isAssignableFrom(getClass())) {
            return false;
        }
        final NamedMedia other = (NamedMedia) obj;
        if (!Objects.equals(this.getName(), other.getName())) {
            return false;
        }
    }
Note

Hibernate uses hashCode() internally to maintain its hash based collections and caches, and it is fundamental for hashCode() to be stable. That means that each field used for the hashCode() calculation must be immutable! This is quite a requirement and that’s why many users fell into the temptation of using the ID.

March 21, 2012

Check an anagram in O(n)

Filed under: java, theory — fillumina @ 2:50 pm

I just report here my solution for the anagram algorithm proposed in the Java Anagram Buster.

public class Prime {

    public static int[] calculateFirst(final int number) {
        final int[] primes = new int[number];
        int counter = 1, index = 0;
        boolean isPrime;
        while (index < number) {
            counter++;
            isPrime = true;
            for (int i=0; i<index; i++) {
                if (counter % primes[i] == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes[index] = counter;
                index++;
            }
        }
        return primes;
    }
}
public class AlphabetIndexer {
    public final static int ALPHABET_LENGTH = 'Z' - 'A' + 1;

    public static int getIndex(final char c) {
        int value = c - 'A';
        if (value < 0 || value > ALPHABET_LENGTH) {
            value = c - 'a';
            if (value < 0 || value > ALPHABET_LENGTH) {
                return -1;
            }
        }
        return value;
    }
}
public class Anagram {
    private final static int[] PRIMES = Prime.calculateFirst('Z' - 'A' + 1);

    public static int calculateValue(final String phrase) {
        int value = 1;
        for (char c: phrase.toCharArray()) {
            int index = AlphabetIndexer.getIndex(c);
            if (index != -1) {
                value += PRIMES[index];
            }
        }
        return value;
    }

    public static boolean isAnagram(final String phrase1, final String phrase2) {
        return calculateValue(phrase1) == calculateValue(phrase2);
    }
}

The code is badly divided into classes to the only use of better testing and division of concerns. The speed complexity is O(n) because it runs over each given phrase once to calculate the respective integer value and then compares them for equality. So it makes roughly 2 * N + K operations which is O(n). Using prime numbers can speed up the algorithm slightly by simplifying the final check for equality.

Theory

A prime numbers’ property is that they cannot be factorized, so the factorization of a non prime number is fixed. i.e. given J,K,L prime numbers the numbers N = aJ + bK + cL and M = dJ + eK + fL are equals only and only if a = d, b = e and c = f.
So encoding the occurrence of the characters in a phrase using a sequence of prime numbers results in two equal numbers if and only if the same characters are present in both phrases.

Older Posts »

Theme: Shocking Blue Green. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.