[pycrypto] Crypto.Random crashes due to unaligned access

Sun Oct 27 16:00:12 PDT 2013

And the other message that didn't make it before.

Greg

---------- Forwarded message ----------
From: Greg Price <gnprice at gmail.com>
Date: Thu, Oct 24, 2013 at 12:53 PM
Subject: Re: Crypto.Random crashes due to unaligned access
To: Dwayne Litzenberger <dlitz at dlitz.net>
Cc: pycrypto at lists.dlitz.net

GCC 4.6.3, and same with 4.8.1 (see my later message).  "setup.py
test" crashes too.

See my follow-up last night for more of a diagnosis -- the problem is
that 'rk', aka st->ek, has a type that implies 16-byte alignment, and
we don't get that from PyObject_New.  That allocation comes (I think)
straight from malloc(), which with glibc on a 32-bit x86 system gives
only 8-byte alignment.

One fix would be to move st->ek and st->dk to a separate buffer we
allocate with something like posix_memalign().  This wouldn't require
any copying, as we fill those buffers ourselves in the first place.
It'd just add a layer of indirection in the AESNI.c block_state
struct.

Alternatively we could try to eliminate the 16-byte alignment, but I
don't immediately see a way of doing that without making the code much
messier, and it'd probably also slow the code down.

Greg

On Thu, Oct 24, 2013 at 9:59 AM, Dwayne Litzenberger <dlitz at dlitz.net> wrote:
> Hi Greg!
>
> What version/build of GCC is this?  Does "setup.py test" crash for you as well?
>
> I'd rather figure out how to fix the problem than to start making copies of the key.
>
> Greg Price <gnprice at gmail.com> wrote:
>>I get the following crash in a PyCrypto built from the current master,
>>af058ee (aka v2.6.1-136-gaf058ee):
>>
>>>>> import Crypto.Random
>>>>> Crypto.Random.new().read(1)
>>Segmentation fault (core dumped)
>>
>>This is on i686.  I compiled with GCC 4.6.3 (or "Ubuntu/Linaro
>>4.6.3-1ubuntu5".)
>>
>>GDB shows the crash is here:
>>
>>Program received signal SIGSEGV, Segmentation fault.
>>aes_key_setup_enc (keylen=32, cipherKey=
>>0x841b1bc
>>"L\fB2\244\225\235\206^\242\305\305b\201\200\335ņ{d\240\343\262;m\361\243\276u~\337&",
>>rk=
>>    0x84900a8) at src/AESNI.c:122
>>122            rk[0] = _mm_loadu_si128((const __m128i*) cipherKey);
>>
>>at which the instruction is
>>
>>(gdb) x/i $pc
>>=> 0xb78f2600 <ALGnew+2160>: movdqa %xmm0,0x40(%esi)
>>
>>This is an aligned store.  The documentation of MOVDQA says it should
>>be 16-byte aligned.  The value of rk (aka %esi + 0x40) is only 8-byte
>>aligned:
>>
>>(gdb) p rk
>>$5 = (__m128i *) 0x84900a8
>>(gdb) p/x $esi
>>$9 = 0x8490068
>>
>>It's not clear to me why GCC generated an aligned instruction here --
>>in fact, the definition of _mm_loadu_si128 in my emmintrin.h appears
>>to be
>>
>>extern __inline __m128i __attribute__((__gnu_inline__,
>>__always_inline__, __artificial__))
>>_mm_loadu_si128 (__m128i const *__P)
>>{
>>  return (__m128i) __builtin_ia32_loaddqu ((char const *)__P);
>>}
>>
>>and the name of that builtin sure sounds more like MOVDQU than MOVDQA.
>> Perhaps GCC somehow decides that it can prove the pointer is aligned
>>here.
>>
>>I don't know why GCC makes this mistake, or (since it's never the
>>compiler's fault) which code is lying to it about something being
>>aligned. Anyone know how to investigate this kind of question?
>>
>>A workaround would be to make sure that the cipherKey argument to
>>aes_key_setup_enc() in src/AESNI.c is always 16-byte aligned.  At
>>present, that argument comes straight from the first Python-level
>>argument to _AESNI.new(); see the PyArg_ParseTupleAndKeywords() call
>>in src/block_template.c.  I guess to implement this workaround we'd
>>copy the key to a new, aligned buffer if it's not aligned.
>>
>>I can send a patch for that workaround if it seems like the best
>>approach.  Happy to hear alternatives, and of course it'd be most
>>satisfying if we can understand why the compiler is emitting this
>>output in the first place.
>>
>>Greg
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.