[pycrypto] the sad state of pycrypto
Dwayne C. Litzenberger
dlitz at dlitz.net
Sun Nov 9 18:01:56 CST 2008
On Sun, Nov 09, 2008 at 11:49:49AM -0800, Paul Hoffman wrote:
>On Sun, Nov 9, 2008 at 8:31 AM, Dwayne C. Litzenberger <dlitz at dlitz.net> wrote:
>> Really? Many developers still use MD5 in new applications.
>
>MD5 is still perfectly usable in applications that do not rely on the
>collision resistance and only need 128 bits of preimage resistance.
>For example, HMAC-MD5 has been proven to be secure even if the
>collision resistance is near zero.
MD5 was _never_ collision-resistant; we just thought it was. It's possible
that MD5 is not safe for any purpose, and that we just currently think it
is. Maybe it's safe, and maybe not, but it's not a conservative choice for
new applications.
Also, I'm not sure what security proof you're referring to, but see
"Forgery and Partial Key-Recovery Attacks on HMAC and NMAC Using Hash
Collisions": http://eprint.iacr.org/2006/319
> A hashed signature algorithm can use MD5 with no problems.
I'm sure you don't mean that. Any time someone signs a message
provided by a third party (such as when certifying a computer program or
when adding a digital timestamp to a document), the hash function they
use needs to be collision-resistant.
>> Following your line of reasoning,
>> there was nothing wrong with RandomPool; it was simply being misused---by
>> practically everyone. I disagree, and RandomPool is now deprecated.
>
>That is not my line of reasoning. RandomPool was unsafe at any speed.
>MD5 is safe for many purposes.
No, RandomPool was safe if you used it correctly, which meant you had to
feed it entropy from somewhere, and you had to monitor the entropy
estimate. Few people actually did that, but if they did, RandomPool worked
fine.
>> Overly optimistic developers (or their micro-managing bosses) routinely
>> make design choices favouring speed or portability over security, and
>> it's the _users_ who suffer the consequences.
>
>If someone knows enough about MD5 to know that it is faster than
>SHA-1, or that it is more portable than SHA-1, they know enough about
>its properties to use it.
I still think you're being overly optimistic. Smart developers still make
fatal mistakes with crypto, and I have empirical evidence to back that up:
 1. Zooko said:
      "I happen to know a somewhat famous developer who once looked
      through the Crypto++ API and chose DES-XEX without (I think)
      realizing that it was DES-X and not Triple-DES."
 2. RandomPool was misused---twice---in Paramiko. See
    http://lists.dlitz.net/pipermail/pycrypto/2008q3/000000.html
 3. A Google Code Search for RandomPool turned up a bunch of uses, none
    of which were correct.
Developers of crypto libraries are in a position to reduce the number of
mistakes their downstream users accidentally make. I intend to make full
use of this ability. (But see below.)
>If you really want the library to be in nanny mode, simply rename the
>function from "MD5" to something like "idontwantyoutouseMD5". This is
>a serious suggestion. Self-documenting function names are surprisingly
>useful.
Aside from the maintainability benefits, I don't want to drop algorithms
that people need for legacy reasons, even if they would be well-advised not
to use them in new applications. That's why I like the policy idea instead
of dropping or renaming modules. That way, developers can make less
conservative choices if they need to, but they'll be less likely to do so
accidentally, and reviewers will have an easier time checking for these
mistakes.
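
The policy mechanism is not spelled out here, so the following is only a
hypothetical sketch of what such a check could look like. It uses Python's
standard hashlib module rather than PyCrypto's own API, and the names
new_hash, allow_legacy, PolicyError, CONSERVATIVE and LEGACY are all
assumptions made up for illustration, as is the choice of which algorithms
go in which set.

```python
import hashlib

# Hypothetical policy table (an assumption for illustration, not PyCrypto's API).
CONSERVATIVE = {"sha256", "sha512"}
LEGACY = {"md5"}


class PolicyError(ValueError):
    """Raised when a requested algorithm is not permitted by the policy."""


def new_hash(name, data=b"", allow_legacy=False):
    """Return a hashlib object, refusing legacy algorithms unless opted in."""
    key = name.lower()
    if key in CONSERVATIVE or (key in LEGACY and allow_legacy):
        return hashlib.new(key, data)
    if key in LEGACY:
        # The algorithm stays available, but only as an explicit choice.
        raise PolicyError("%s is legacy-only; pass allow_legacy=True to use it" % key)
    raise PolicyError("unknown algorithm: %s" % key)
```

With something like this, new_hash("md5") fails, while
new_hash("md5", allow_legacy=True) still works for legacy code, and a
reviewer only has to search for allow_legacy to find the less conservative
choices.
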
On the other hand, I don't mind dropping algorithms that nobody actually
uses. It's not just about "nanny mode": Code no longer present is code I
don't have to spend my limited time maintaining. That's why I asked about
MD2. Do you know of anyone who uses PyCrypto who needs MD2 support?
>> If it's licensed to everyone on an automatic, royalty-free basis, then
>> it's not _encumbered_ by a patent, just _covered_ by a patent.
>
>Some pedants would not slice and dice it that way.
It's not "slicing and dicing"; it's the only way to deal with the insanity
of various patent systems around the world and still actually develop
anything.
If I take the claims of every patent at face value (which I have to, since
the courts say I'm not qualified to do anything else, because I'm not a
patent attorney) then I must assume that every program I could possibly
write is covered by many patents. However, most of these patents don't
cause any actual problems, for whatever reason (which could be that my
reading of the patents is too broad, or that the patents are invalid, or
that the patent holder doesn't want to enforce them, or that the patents
have been explicitly licenced to everyone on a royalty-free basis). That's
how we manage to write software without getting sued into oblivion. Well,
most of the time.
So, like everybody else, I don't read patents until they have expired.
This means I can be wrong about what's patented and what's not, but patent
law gives me no other choice.
My policy is that if I think an algorithm is patent-encumbered, then it's
not getting included into PyCrypto; if it's already included, then it gets
dropped. Patent holders who create encumbrances will get every bit of
exclusivity they ask for, and they deserve whatever lack of market
penetration comes with it.
>> My understanding was that SHA-224 and SHA-384 had additional patent
>> encumbrances that did not apply to SHA-256 and SHA-512. That
>> understanding probably came from Wikipedia, and it may be incorrect.
>
>I see nothing in the current version of the Wikipedia page that says
>that, and I have never heard of any such encumbrances. If there were,
>the NSA would be amazingly remiss in filing an IPR statement with the
>IETF for the family as a whole but not those members.
Yeah, it sounds like I might have been mistaken about the patent situation
regarding SHA-224 and SHA-384.
>> In any case, SHA-224 and SHA-384 are just weakened versions of SHA-256 and
>> SHA-512, so I'm not inclined to add them without good reason.
>
>I am not a proponent of either function, but I can channel those who
>are. SHA-224 is designed to have "matched impedance" with TripleDES,
>which has 112 bits of strength. Similarly, SHA-384 is matched to
>AES-192. I find the impedance idea goofy, but some folks like it.
I agree that it's goofy. I really don't see why a person couldn't just
truncate an ordinary SHA-256/512 hash if they want "matched impedance",
rather than also mucking about with the initial values. If we want to
avoid allowing someone to truncate an SHA-256 hash to make a valid 224-bit
hash, then we can define separate hash functions like so:
H_256(m) := SHA-256("SHA-256" || m)
H_224(m) := SHA-256("SHA-224" || m)[:224]
This would have the same effect, and wouldn't involve messing with the
internals of the hash function. SHA-224/384 look like hacks to support
some bizarre U.S. government system that PyCrypto will never be approved
for anyway. :-)
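
A minimal sketch of the two definitions above, assuming Python's standard
hashlib module rather than PyCrypto's own API. The function names h_256 and
h_224 are made up for illustration, and "[:224]" is read as "the first 224
bits", i.e. the first 28 bytes of the digest.

```python
import hashlib


def h_256(m: bytes) -> bytes:
    """H_256(m) := SHA-256("SHA-256" || m)"""
    return hashlib.sha256(b"SHA-256" + m).digest()


def h_224(m: bytes) -> bytes:
    """H_224(m) := SHA-256("SHA-224" || m)[:224], i.e. the first 28 bytes."""
    return hashlib.sha256(b"SHA-224" + m).digest()[:28]


msg = b"example message"

# Truncating an H_256 hash does not yield a valid H_224 hash, because the
# two functions hash different prefixes.
truncation_differs = h_256(msg)[:28] != h_224(msg)

# The standard SHA-224 gets the same separation from SHA-256 by changing
# the initial values instead, so a truncated SHA-256 digest is not SHA-224.
standard_differs = hashlib.sha256(msg).digest()[:28] != hashlib.sha224(msg).digest()
```

Both comparisons come out true for this message; the only difference is
that the prefix version needs nothing beyond an unmodified SHA-256.
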
I might reconsider adding SHA-224/384 at some point in the future if
there's some realistic interoperability need for it (e.g. important
free/open-source software that depends on PyCrypto suddenly needs to
support it for some reason), but for now I think it would just make
PyCrypto more complex than it needs to be.
--
Dwayne C. Litzenberger <dlitz at dlitz.net>
Key-signing key - 19E1 1FE8 B3CF F273 ED17 4A24 928C EC13 39C2 5CF7
Annual key (2008) - 4B2A FD82 FC7D 9E38 38D9 179F 1C11 B877 E780 4B45