As many people pointed out the hashCode()
method was to blame. It was only generating around 20,000 codes for 26 million distinct objects. That is an average of 1,300 objects per hash bucket = very very bad. However if I turn the two arrays into a number in base 52 I am guaranteed to get a unique hash code for every object:
public int hashCode() {
// assume that both a and b are sorted
return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2) + powerOf52(b[1], 3) + powerOf52(b[2], 4);
}
public static int powerOf52(byte b, int power) {
int result = b;
for (int i = 0; i < power; i++) {
result *= 52;
}
return result;
}
The arrays are sorted to ensure this methods fulfills the hashCode()
contract that equal objects have the same hash code. Using the old method the average number of puts per second over blocks of 100,000 puts, 100,000 to 2,000,000 was:
168350.17
109409.195
81344.91
64319.023
53780.79
45931.258
39680.29
34972.676
31354.514
28343.062
25562.371
23850.695
22299.22
20998.006
19797.799
18702.951
17702.434
16832.182
16084.52
15353.083
Using the new method gives:
337837.84
337268.12
337078.66
336983.97
313873.2
317460.3
317748.5
320000.0
309704.06
310752.03
312944.5
265780.75
275540.5
264350.44
273522.97
270910.94
279008.7
276285.5
283455.16
289603.25
Much much better. The old method tailed off very quickly while the new one keeps up a good throughput.