I actually thought of you recently – when we finally had to crack SunMD5 hashes in the recent contest, and then when JimF on our team (after the contest, unfortunately) produced a SIMD implementation of SunMD5 for JtR (not included in a released version yet, but will be).
So, yes, Jim broke SunMD5, in a sense. We’re currently getting a 4.4x speedup over single-password code (that is, over straightforward algorithm similar to your original) when hashing multiple candidate passwords at once on a CPU with 128-bit SIMD vectors (the specific 4.4x figure is when using one core in my FX-8120, with XOP intrinsics; other x86-64 CPUs give similar speedup figures – also near 4x). IIRC, such use of SIMD is something you meant to defeat with the data-dependent branching – well, this is not so simple. 🙂
While we could come up with a password hashing method where data-dependent branching would be such that it’d be more difficult for SIMD implementations (Jim wanted to propose a fix for SunMD5) and also for GPUs’ eager execution, my opinion is that it’s not necessarily a good thing to do because it limits the defender’s use of hardware (thus giving attackers with specialized hardware a greater advantage), because it introduces side-channel leaks (yes, SunMD5 is unsafe in this respect), and because it’s arguably too complex.
I just thought you could want to know where this concept went so far.
So, yeah, not perfect, but for a lashed-up algorithm to only be sped-up by a factor of 4.4x over 10 years is fairly impressive; Morris
crypt() was sped-up by a factor of several thousand times over the period from 1991-1995.
They could probably crush on it some more but still, I’m happy.
Side channel? Bah. Handwave