If you run the GWT compiler with the -localWorkers flag, the compiler will compile multiple permutations in parallel. This lets you use all the cores of a multi-core machine, for example -localWorkers 2 will tell the compiler to do compile two permutations in parallel. You won't get order of magnitudes differences (not everything in the compiler is parallelizable) but it is still a noticable speedup if you are compiling multiple permutations.
If you're willing to use the trunk version of GWT, you'll be able to use hosted mode for any browser (out of process hosted mode), which alleviates most of the current issues with hosted mode. That seems to be where the GWT is going - always develop with hosted mode, since compiles aren't likely to get magnitudes faster.