[pycrypto] AES, python 2.7 vs 3

Paul_Koning at Dell.com Paul_Koning at Dell.com
Fri Feb 7 09:01:04 PST 2014


That’s what pycrypto needs to do, yes.  From what Dwayne says, it sounds like that isn’t finished yet.

The easiest way to look at this is as a data type matching exercise.  Cryptographic operations are functions that operate on sequences of bytes.  Unicode strings are NOT sequences of bytes — they are an entirely different data type.

It is valid to speak of specific encodings of Unicode strings as sequences of bytes, but the key point is that you have to do the encoding — which means, first of all, choosing WHICH encoding — in order to have that sequence of bytes.
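
As a rough illustration of why the choice matters (the sample word and the three encodings are just my picks, nothing specific to pycrypto), the same text gives a different byte sequence under each encoding:

```python
# Illustration only: one text value, three different byte sequences.
text = "caf\u00e9"  # "café", written with an escape so the source stays ASCII

utf8 = text.encode("utf-8")        # 5 bytes: the accented letter takes two
latin1 = text.encode("latin-1")    # 4 bytes: one byte per character
utf16 = text.encode("utf-16-le")   # 8 bytes: two bytes per character

print(utf8)    # b'caf\xc3\xa9'
print(latin1)  # b'caf\xe9'
print(utf16)   # b'c\x00a\x00f\x00\xe9\x00'

# Decoding with the same encoding restores the original text.
assert utf8.decode("utf-8") == text
```

Any byte-oriented operation (encryption, hashing, a MAC) sees three different inputs here, so the result depends on which encoding was applied.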

Since you have to make those choices, it’s not safe for an API such as a crypto library to accept strings and silently apply some encoding as a side effect.  Better to require bytes in the interface, and let you handle the encode/decode steps explicitly, in the way you want them to be done.
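
Here is a minimal sketch of what such a bytes-only interface looks like from the caller’s side.  To keep it self-contained it does not use pycrypto at all: toy_encrypt and toy_decrypt are hypothetical names, and the XOR "cipher" is only a stand-in for a real cipher such as AES (it is NOT secure).  The key and message are made up as well.

```python
def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for a cipher: XOR with a repeating key. NOT secure."""
    for name, value in (("key", key), ("data", data)):
        if not isinstance(value, bytes):
            # Refuse text outright instead of guessing an encoding.
            raise TypeError(f"{name} must be bytes, not {type(value).__name__}")
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# XOR with the same key is its own inverse, so decryption is the same function.
toy_decrypt = toy_encrypt

key = b"0123456789abcdef"      # made-up key, for illustration only
message = "na\u00efve text"    # a str: abstract text, no encoding implied

# The caller chooses the encoding explicitly before encrypting...
ciphertext = toy_encrypt(key, message.encode("utf-8"))

# ...and decrypting gives back bytes, which print as b'...'
plaintext = toy_decrypt(key, ciphertext)

# Decode with the same encoding to get text again.
recovered = plaintext.decode("utf-8")
assert recovered == message

# Passing a str is rejected rather than encoded behind the caller's back.
try:
    toy_encrypt(key, message)
except TypeError as exc:
    error = str(exc)
    print(error)
```

This is also where the b'...' that Dave saw comes from: the decrypted value is a bytes object, and it shows up in that form if it is printed or converted with str() without being decoded first.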

	paul

On Feb 7, 2014, at 11:47 AM, Dave Pawson <dave.pawson at gmail.com> wrote:

> Thanks Paul. I was going along those lines, but without much understanding?
> Is it correct that pycrypto relies on byte strings and *cannot* deal
> with Unicode strings?
> 
> regards
> 
> On 7 February 2014 16:26,  <Paul_Koning at dell.com> wrote:
>> 
>> On Feb 7, 2014, at 1:29 AM, Dave Pawson <dave.pawson at gmail.com> wrote:
>> 
>>> On 6 February 2014 16:49, Dwayne Litzenberger <dlitz at dlitz.net> wrote:
>>>> On Mon, Feb 03, 2014 at 03:51:06PM +0100, Mirko Dziadzka wrote:
>>>>> 
>>>>> Hi
>>>>> 
>>>>> Without testing or verifying:
>>>>> 
>>>>> You should probably use bytes in Python 3, not characters.
>>>> 
>>>> 
>>>> Yeah, this is a known bug.   Under Python 3, PyCrypto should raise a
>>>> TypeError in most places where unicode strings are used instead of bytes,
>>>> but right now it just does weird things.  Sorry about that.
>>> 
>>> 
>>> Recently playing about with reading / writing between Linux / Windows,
>>> one symptom I came across was that the decrypted string contained b' <string>'
>>> is that a part of this bug?
>> 
>> b’something’ is Python 3 syntax for a “bytes” literal.
>> 
>> It’s a good idea to spend some time reading the Python 3 documentation on text vs. bytes — “str” vs. “bytes” type.  They really got this right, and it completely cures the ugly mess you get in Python 2 when dealing with strings vs. their encoding.
>> 
>> But it is new, and it’s different from what was done before — for good reasons.
>> 
>> One short summary is: “str” contains text, it’s abstract, it represents Unicode characters but does not talk about encoding.  “bytes” contains sequences of bytes (octets — 8 bit integers), it’s concrete, it represents encoding of data in the outside world.  You convert “str” to/from “bytes” when doing I/O.  Anytime you are doing operations that need sequences of bytes, the type has to be “bytes”.  Encryption is one such operation.
>> 
>>        paul
>> _______________________________________________
>> pycrypto mailing list
>> pycrypto at lists.dlitz.net
>> http://lists.dlitz.net/cgi-bin/mailman/listinfo/pycrypto
> 
> 
> 
> -- 
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
> http://www.dpawson.co.uk

