[pycrypto] Crypto.Random crashes due to unaligned access

Greg Price gnprice at gmail.com
Sun Oct 27 16:15:31 PDT 2013

On Sun, Oct 27, 2013 at 3:07 PM, Sebastian Ramacher
<sebastian+lists at ramacher.at> wrote:
> I debugged this for a while and the problem is not _mm_loadu_si128.
> That's fine. It generates the correct movdqu instruction for that. The
> problem is the rk[0] = ... part. On amd64, ek and dk from block_state
> get aligned at 16 byte boundaries and everything works out properly.
> However, on i386 this does not appear to be true.

Now that my messages are making it to the list:

Yes.  The guarantee from malloc(), which PyObject_New() relies on, is
that the returned pointer is suitably aligned for any built-in type.
On i386 this means 4-byte alignment; on amd64 8-byte alignment.  The
actual behavior of glibc malloc() is a little nicer: it's aligned to
the greater of 2 * sizeof(size_t) and __alignof__ (long double), so 8
bytes on i386 and 16 bytes on amd64. [1]

This is why on amd64 all is well and on i386 it often fails.
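
As a rough illustration of the numbers above, here is a minimal C sketch.
The helper names is_aligned() and glibc_malloc_alignment() are made up for
this example and are not part of pycrypto or glibc. The second helper just
evaluates the rule quoted above; it assumes a GCC-compatible compiler
(for __alignof__) and says nothing about other malloc() implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns nonzero if p is a multiple of `align`.
 * `align` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: evaluates the glibc rule quoted above,
 * max(2 * sizeof(size_t), __alignof__(long double)).
 * With 4-byte size_t and 4-byte-aligned long double (i386) this is 8;
 * with 8-byte size_t and 16-byte-aligned long double (amd64) it is 16. */
static size_t glibc_malloc_alignment(void)
{
    size_t a = 2 * sizeof(size_t);
    size_t b = __alignof__(long double);
    return a > b ? a : b;
}
```

Calling is_aligned(ptr, 16) on a pointer returned by malloc() on i386 may
well return 0, which is the case that breaks here.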

The problem is that 'rk' has a type that implies 16-byte alignment.
The __m128i type is defined in emmintrin.h:

typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));

So our choices are (a) place 'rk' at a 16-byte-aligned address to
match its type, or (b) change its type so that it requires no
alignment beyond that of built-in types.  I don't see a way to do (b)
without making the code significantly messier, and probably slowing
it down measurably too because of the unaligned loads and stores.  To
do (a), we can move st->ek and st->dk to a separate buffer which we
allocate ourselves with something like posix_memalign().  My
recommendation is (a).
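
A minimal sketch of what (a) could look like, assuming a POSIX system.
The struct aligned_keys and the functions aligned_keys_init() and
aligned_keys_free() are hypothetical names for illustration only; they are
not the actual pycrypto block_state layout, and the buffer size is left to
the caller.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RK_ALIGN 16  /* alignment implied by the __m128i type */

/* Hypothetical stand-in for the part of the state holding round keys. */
typedef struct {
    void *ek;  /* encryption round keys, 16-byte aligned */
    void *dk;  /* decryption round keys, 16-byte aligned */
} aligned_keys;

/* Allocate both buffers with 16-byte alignment.
 * Returns 0 on success, -1 on failure (nothing is left allocated). */
static int aligned_keys_init(aligned_keys *k, size_t nbytes)
{
    void *ek = NULL;
    void *dk = NULL;

    k->ek = NULL;
    k->dk = NULL;
    if (posix_memalign(&ek, RK_ALIGN, nbytes) != 0)
        return -1;
    if (posix_memalign(&dk, RK_ALIGN, nbytes) != 0) {
        free(ek);
        return -1;
    }
    k->ek = ek;
    k->dk = dk;
    return 0;
}

/* Memory from posix_memalign() is released with plain free(). */
static void aligned_keys_free(aligned_keys *k)
{
    free(k->ek);
    free(k->dk);
    k->ek = NULL;
    k->dk = NULL;
}
```

The pointers can then be cast to __m128i * without violating the
alignment the type promises, on i386 as well as amd64.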

Thanks for looking at this!


[1] Details at https://git.kernel.org/cgit/docs/man-pages/man-pages.git/commit/?id=25630b276
