[pycrypto] Initial review of Thorsten's Py3k changes

Thorsten Behrens sbehrens at gmx.li
Sun Apr 17 08:15:57 CST 2011


I am going back into the code to take a peek at your suggestion.


On 1/29/2011 8:47 PM, Dwayne C. Litzenberger wrote:
> Have a look in the various common.py files.  All of the hex test vectors are
> being fed through either a2b_hex or b2a_hex.  I think it should be possible
> to make versions of b2a_hex and a2b_hex that also do bytes->str and
> str->bytes conversions, respectively.
>
> The following code works in both Python 2.1 and Python 3.2b2:
>
>       from binascii import b2a_hex as _b2a_hex, a2b_hex as _a2b_hex
>       from codecs import ascii_decode as _ascii_decode
>       def bin2hex(bts):
>           """Like b2a_hex, but returns a str instead of bytes in Python 3.x"""
>           return _ascii_decode(_b2a_hex(bts))[0]
>       def hex2bin(s):
>           """Like a2b_hex, but expects a str instead of bytes in Python 3.x"""
>           return _a2b_hex(s.encode('ascii'))

This would actually make things worse. That it works at all should be 
considered a bug: there is a TODO I have not followed up on yet, which 
is to add type-checking to all functions so that an error is raised if 
a parameter is not "an object interpretable as a buffer of bytes". That 
is, if encode() is called with a unicode (str) object, that should raise 
an error.
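
As a rough illustration of the kind of check meant here, the following is a 
minimal sketch for Python 3.x only. The helper name require_bytes_like and 
its error messages are hypothetical and are not part of pycrypto; the 
sketch relies only on the fact that str does not support the buffer 
protocol in Python 3.x, so memoryview() rejects it.

```python
def require_bytes_like(value, name="data"):
    """Hypothetical helper: return value unchanged if it is bytes-like.

    Raises TypeError for str (unicode) and for anything else that does
    not expose the buffer protocol.
    """
    # Reject text explicitly so the caller gets a clear message.
    if isinstance(value, str):
        raise TypeError(
            "%s must be an object interpretable as a buffer of bytes, "
            "not str; encode it first" % name
        )
    # memoryview() accepts bytes, bytearray, memoryview, array.array, etc.
    try:
        memoryview(value)
    except TypeError:
        raise TypeError(
            "%s must be an object interpretable as a buffer of bytes, "
            "not %s" % (name, type(value).__name__)
        ) from None
    return value
```

With a check like this, require_bytes_like(b"lorem") passes the value 
through, while require_bytes_like("lorem") raises TypeError.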

The reason I believe that pycrypto should check type is that the Python 
3.x stdlib behaves that way:

 >>> from hashlib import sha1
 >>> h = sha1()
 >>> h.update("lorem")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
 >>> h.update(b"lorem")
 >>> print (h.hexdigest())
b58e92fff5246645f772bfe7a60272f356c0151a

For consistency, I have both Crypto.Hash and Crypto.Cipher behaving this 
way. The changes are in the doc, but in a nutshell:

Crypto.Hash

Python 3.x: digest() returns a bytes object
Python 3.x: hexdigest() returns a bytes object
Python 3.x: The passed argument to update() must be an object
interpretable as a buffer of bytes

Crypto.Cipher

new()
Python 3.x: ```mode``` is a string object; ```key``` and ```IV``` must be
objects interpretable as a buffer of bytes.

cipher object
Python 3.x: ```IV``` is a bytes object.

decrypt()
Python 3.x: ```string``` must be an object interpretable as a buffer of 
bytes.
decrypt() will return a bytes object.

encrypt()
Python 3.x: ```string``` must be an object interpretable as a buffer of 
bytes.
encrypt() will return a bytes object.


If these new conventions will cause an issue, let's discuss that now, 
before I add the type-checking.


All that having been said, I still think it should be possible to have 
the vectors be Unicode literals and to convert them to a bytes object 
when reading them in. It just will need to be done in a different part 
of the code.
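
A minimal sketch of what that could look like, assuming the vectors are 
stored as tuples of hex strings and converted once at load time, before 
anything reaches the hash or cipher functions. The function name 
load_vectors and the tuple layout are assumptions for illustration, not 
the layout of the existing test code.

```python
from binascii import a2b_hex


def load_vectors(rows):
    """Hypothetical loader: convert rows of hex str literals to bytes.

    Each row is a tuple of ASCII hex strings (for example key,
    plaintext, ciphertext). The result is a list of tuples of bytes.
    """
    converted = []
    for row in rows:
        # Encode each str field to ASCII bytes, then unhexlify it.
        converted.append(tuple(a2b_hex(field.encode("ascii")) for field in row))
    return converted
```

The str-to-bytes step then lives in the test loader only, so the library 
functions can still reject str arguments.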

Thorsten


