[pycrypto] Initial review of Thorsten's Py3k changes
Thorsten Behrens
sbehrens at gmx.li
Sun Apr 17 08:15:57 CST 2011
I am going back into the code to take a peek at your suggestion.
On 1/29/2011 8:47 PM, Dwayne C. Litzenberger wrote:
> Have a look in the various common.py files. All of the hex test vectors are
> being fed through either a2b_hex or b2a_hex. I think it should be possible
> to make versions of b2a_hex and a2b_hex that also do bytes->str and
> str->bytes conversions, respectively.
>
> The following code works in both Python 2.1 and Python 3.2b2:
>
> from binascii import b2a_hex as _b2a_hex, a2b_hex as _a2b_hex
> from codecs import ascii_decode as _ascii_decode
> def bin2hex(bts):
>     """Like b2a_hex, but returns a str instead of bytes in Python 3.x"""
>     return _ascii_decode(_b2a_hex(bts))[0]
> def hex2bin(s):
>     """Like a2b_hex, but expects a str instead of bytes in Python 3.x"""
>     return _a2b_hex(s.encode('ascii'))
This would actually make things worse. That it works at all should be
considered a bug: there is a TODO I have not followed up on yet, which
is to add type-checking to all functions so that an error is raised if
a parameter is not "an object interpretable as a buffer of bytes". That
is, if encode() is called with a unicode (str) object, that should
raise an error.
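A minimal sketch of what such a check could look like in Python 3. The
helper name require_bytes_like is hypothetical (it is not part of
pycrypto), and using memoryview() to test for "interpretable as a buffer
of bytes" is an assumption made for illustration:

```python
def require_bytes_like(value, name="data"):
    """Return value unchanged if it is bytes-like, else raise TypeError.

    Hypothetical helper, not part of pycrypto.
    """
    # str (Unicode) must be rejected explicitly so the caller encodes first.
    if isinstance(value, str):
        raise TypeError("%s must be encoded to bytes first, got str" % name)
    try:
        # memoryview() only accepts objects that support the buffer protocol.
        memoryview(value)
    except TypeError:
        raise TypeError(
            "%s must be an object interpretable as a buffer of bytes, got %s"
            % (name, type(value).__name__)
        ) from None
    return value
```

With this, bytes and bytearray arguments pass through unchanged, while a
str argument fails early with a TypeError, much like hashlib does.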
The reason I believe that pycrypto should check type is that the Python
3.x stdlib behaves that way:
>>> from hashlib import sha1
>>> h = sha1()
>>> h.update("lorem")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
>>> h.update(b"lorem")
>>> print (h.hexdigest())
b58e92fff5246645f772bfe7a60272f356c0151a
For consistency, I have both Crypto.Hash and Crypto.Cipher behaving this
way. The changes are in the doc, but in a nutshell:
Crypto.Hash
  Python 3.x: digest() returns a bytes object
  Python 3.x: hexdigest() returns a bytes object
  Python 3.x: The passed argument to update() must be an object
  interpretable as a buffer of bytes
Crypto.Cipher
  new()
    Python 3.x: ```mode``` is a string object; ```key``` and ```IV``` must be
    objects interpretable as a buffer of bytes.
  cipher object
    Python 3.x: ```IV``` is a bytes object.
  decrypt()
    Python 3.x: ```string``` must be an object interpretable as a buffer of
    bytes.
    decrypt() will return a bytes object.
  encrypt()
    Python 3.x: ```string``` must be an object interpretable as a buffer of
    bytes.
    encrypt() will return a bytes object.
If these new conventions will cause an issue, let's discuss that now,
before I add the type-checking.
All that having been said, I still think it should be possible to have
the vectors be Unicode literals and to convert them to a bytes object
when reading them in. It just will need to be done in a different part
of the code.
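As a rough sketch of that idea in Python 3: the vectors stay as str hex
literals and get converted once, at load time, before they reach any
hash or cipher call. The function name vector_to_bytes is hypothetical,
and stripping whitespace from the hex text is an assumption, not
something the existing test code is known to need:

```python
from binascii import a2b_hex


def vector_to_bytes(hex_text):
    """Convert a str hex test vector to bytes (hypothetical helper)."""
    if not isinstance(hex_text, str):
        raise TypeError("test vector must be a str of hex digits")
    # Assumption: vectors may contain spaces or newlines for readability.
    compact = "".join(hex_text.split())
    # Encode explicitly so the str -> bytes step happens here and only here.
    return a2b_hex(compact.encode("ascii"))


# Convert every vector up front; the code under test only ever sees bytes.
raw_vectors = ["00ff10", "de ad be ef"]
vectors = [vector_to_bytes(v) for v in raw_vectors]
```

The library functions themselves would then never see a str, so the
type-checking described above could stay strict.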
Thorsten