Jon's disassemblies show, that the difference between the two versions is that the fast version uses a pair of registers (esi,edi
) to store one of the local variables where the slow version doesn't.
The JIT compiler makes different assumptions regarding register use for code that contains a try-catch block vs. code which doesn't. This causes it to make different register allocation choices. In this case, this favors the code with the try-catch block. Different code may lead to the opposite effect, so I would not count this as a general-purpose speed-up technique.
In the end, it's very hard to tell which code will end up running the fastest. Something like register allocation and the factors that influence it are such low-level implementation details that I don't see how any specific technique could reliably produce faster code.
For example, consider the following two methods. They were adapted from a real-life example:
interface IIndexed { int this[int index] { get; set; } }
struct StructArray : IIndexed {
public int[] Array;
public int this[int index] {
get { return Array[index]; }
set { Array[index] = value; }
}
}
static int Generic<T>(int length, T a, T b) where T : IIndexed {
int sum = 0;
for (int i = 0; i < length; i++)
sum += a[i] * b[i];
return sum;
}
static int Specialized(int length, StructArray a, StructArray b) {
int sum = 0;
for (int i = 0; i < length; i++)
sum += a[i] * b[i];
return sum;
}
One is a generic version of the other. Replacing the generic type with StructArray
would make the methods identical. Because StructArray
is a value type, it gets its own compiled version of the generic method. Yet the actual running time is significantly longer than the specialized method's, but only for x86. For x64, the timings are pretty much identical. In other cases, I've observed differences for x64 as well.