[pycrypto] Crypto.Random crashes due to unaligned access
Greg Price
gnprice at gmail.com
Sun Oct 27 15:58:52 PDT 2013
Excellent, Mailman is taking my messages now. Here's more diagnosis.
Greg
---------- Forwarded message ----------
From: Greg Price <gnprice at gmail.com>
Date: Thu, Oct 24, 2013 at 1:17 AM
Subject: Re: Crypto.Random crashes due to unaligned access
To: pycrypto at lists.dlitz.net, Dwayne Litzenberger <dlitz at dlitz.net>
Same behavior with GCC 4.8.1, from the latest Ubuntu release. Details
below, but first:
On further inspection, the problem isn't from _mm_loadu_si128. That
function is generating a perfectly good MOVDQU unaligned load, which
is the previous instruction:
$ objdump -dr .../_AESNI.so
25fc: f3 0f 6f 02 movdqu (%edx),%xmm0
2600: 66 0f 7f 46 40 movdqa %xmm0,0x40(%esi)
The problem instruction, at 2600 here, is from the "rk[0] =" part of
the line. We're storing to an lvalue of type __m128i, which is
apparently 16 bytes long:
// emmintrin.h
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
So GCC naturally believes it can do a 16-byte-aligned store.
Fortunately, 'rk' is an address we control. It's st->ek, where st
came from newALGobject in src/block_template.c. The alignment there
comes from PyObject_New, but we could add an indirection to storage we
allocate directly to ensure alignment. (Sadly I don't think the
Python C API has a cousin of PyObject_New that takes an alignment.)
Alternatively, we could try to avoid promising the compiler it can do
aligned stores. I think memcpy() would do it for these assignments,
but we use rk[i] for various i as arguments to the KEYEXP* macros, and
the underlying functions for those macros take __m128i arguments, so
in principle the compiler could generate aligned loads and stores from
those calls too. I'm not immediately thinking of a clean way to carry
out this approach.
Thoughts on the aligned-allocation solution, or others?
Greg
PS:
$ gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu8) 4.8.1
Same crash. The code generated is slightly different, but
substantively the same:
(gdb) x/i $pc
=> 0xb7acb6c5 <ALGnew+2197>: movdqa %xmm2,0x40(%edi)
(gdb) p/x $edi
$1 = 0x8417418
(gdb) p rk
$2 = (__m128i *) 0x8417458
GCC 4.8.2 just released a week ago; haven't tried it with that. I
don't think the compiler is at fault, though, especially after my new
realization above.
On Wed, Oct 23, 2013 at 11:42 PM, Greg Price <gnprice at gmail.com> wrote:
> I get the following crash in a PyCrypto built from the current master,
> af058ee (aka v2.6.1-136-gaf058ee):
>
>>>> import Crypto.Random
>>>> Crypto.Random.new().read(1)
> Segmentation fault (core dumped)
>
> This is on i686. I compiled with GCC 4.6.3 (or "Ubuntu/Linaro 4.6.3-1ubuntu5".)
>
> GDB shows the crash is here:
>
> Program received signal SIGSEGV, Segmentation fault.
> aes_key_setup_enc (keylen=32, cipherKey=
> 0x841b1bc "L\fB2\244\225\235\206^\242\305\305b\201\200\335ņ{d\240\343\262;m\361\243\276u~\337&",
> rk=
> 0x84900a8) at src/AESNI.c:122
> 122 rk[0] = _mm_loadu_si128((const __m128i*) cipherKey);
>
> at which the instruction is
>
> (gdb) x/i $pc
> => 0xb78f2600 <ALGnew+2160>: movdqa %xmm0,0x40(%esi)
>
> This is an aligned store. The documentation of MOVDQA says it should
> be 16-byte aligned. The value of rk (aka %esi + 0x40) is only 8-byte
> aligned:
>
> (gdb) p rk
> $5 = (__m128i *) 0x84900a8
> (gdb) p/x $esi
> $9 = 0x8490068
>
> It's not clear to me why GCC generated an aligned instruction here --
> in fact, the definition of _mm_loadu_si128 in my emmintrin.h appears
> to be
>
> extern __inline __m128i __attribute__((__gnu_inline__,
> __always_inline__, __artificial__))
> _mm_loadu_si128 (__m128i const *__P)
> {
> return (__m128i) __builtin_ia32_loaddqu ((char const *)__P);
> }
>
> and the name of that builtin sure sounds more like MOVDQU than MOVDQA.
> Perhaps GCC somehow decides that it can prove the pointer is aligned
> here.
>
> I don't know why GCC makes this mistake, or (since it's never the
> compiler's fault) which code is lying to it about something being
> aligned. Anyone know how to investigate this kind of question?
>
> A workaround would be to make sure that the cipherKey argument to
> aes_key_setup_enc() in src/AESNI.c is always 16-byte aligned. At
> present, that argument comes straight from the first Python-level
> argument to _AESNI.new(); see the PyArg_ParseTupleAndKeywords() call
> in src/block_template.c. I guess to implement this workaround we'd
> copy the key to a new, aligned buffer if it's not aligned.
>
> I can send a patch for that workaround if it seems like the best
> approach. Happy to hear alternatives, and of course it'd be most
> satisfying if we can understand why the compiler is emitting this
> output in the first place.
>
> Greg
On Thu, Oct 24, 2013 at 9:59 AM, Dwayne Litzenberger <dlitz at dlitz.net> wrote:
> Hi Greg!
>
> What version/build of GCC is this? Does "setup.py test" crash for you as well?
>
> I'd rather figure out how to fix the problem than to start making copies of the key.
>
> Greg Price <gnprice at gmail.com> wrote:
>>I get the following crash in a PyCrypto built from the current master,
>>af058ee (aka v2.6.1-136-gaf058ee):
>>
>>>>> import Crypto.Random
>>>>> Crypto.Random.new().read(1)
>>Segmentation fault (core dumped)
>>
>>This is on i686. I compiled with GCC 4.6.3 (or "Ubuntu/Linaro
>>4.6.3-1ubuntu5".)
>>
>>GDB shows the crash is here:
>>
>>Program received signal SIGSEGV, Segmentation fault.
>>aes_key_setup_enc (keylen=32, cipherKey=
>>0x841b1bc
>>"L\fB2\244\225\235\206^\242\305\305b\201\200\335ņ{d\240\343\262;m\361\243\276u~\337&",
>>rk=
>> 0x84900a8) at src/AESNI.c:122
>>122 rk[0] = _mm_loadu_si128((const __m128i*) cipherKey);
>>
>>at which the instruction is
>>
>>(gdb) x/i $pc
>>=> 0xb78f2600 <ALGnew+2160>: movdqa %xmm0,0x40(%esi)
>>
>>This is an aligned store. The documentation of MOVDQA says it should
>>be 16-byte aligned. The value of rk (aka %esi + 0x40) is only 8-byte
>>aligned:
>>
>>(gdb) p rk
>>$5 = (__m128i *) 0x84900a8
>>(gdb) p/x $esi
>>$9 = 0x8490068
>>
>>It's not clear to me why GCC generated an aligned instruction here --
>>in fact, the definition of _mm_loadu_si128 in my emmintrin.h appears
>>to be
>>
>>extern __inline __m128i __attribute__((__gnu_inline__,
>>__always_inline__, __artificial__))
>>_mm_loadu_si128 (__m128i const *__P)
>>{
>> return (__m128i) __builtin_ia32_loaddqu ((char const *)__P);
>>}
>>
>>and the name of that builtin sure sounds more like MOVDQU than MOVDQA.
>> Perhaps GCC somehow decides that it can prove the pointer is aligned
>>here.
>>
>>I don't know why GCC makes this mistake, or (since it's never the
>>compiler's fault) which code is lying to it about something being
>>aligned. Anyone know how to investigate this kind of question?
>>
>>A workaround would be to make sure that the cipherKey argument to
>>aes_key_setup_enc() in src/AESNI.c is always 16-byte aligned. At
>>present, that argument comes straight from the first Python-level
>>argument to _AESNI.new(); see the PyArg_ParseTupleAndKeywords() call
>>in src/block_template.c. I guess to implement this workaround we'd
>>copy the key to a new, aligned buffer if it's not aligned.
>>
>>I can send a patch for that workaround if it seems like the best
>>approach. Happy to hear alternatives, and of course it'd be most
>>satisfying if we can understand why the compiler is emitting this
>>output in the first place.
>>
>>Greg
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
More information about the pycrypto
mailing list