How much longer can a critique of a manual page run than the actual page itself? I hypothesize: quite a lot longer. (Edit: someone has submitted a patch to address some of these concerns. Jump to the bottom for expanded thoughts on why I don’t submit these myself.)
"PDKBF" stands for AUGH I screwed it up already. Let’s try again. "PBKDF" stands for Password-Based Key Derivation Function, which is basically the only real-world usecase of deliberately slowing down your own computation. Here’s a crash course in the theory as it pertains to its use in webapps: we collectively made a huge mistake when we chose fast hasing algorithms such as MD5 and SHA1 as a basis for password security. Faster to calculate is faster to crack, and in particular they lend themselves well to GPU computing. A password hashing algorithm should be as slow as possible without interrupting the functionality of your login process. PB-whatever is a hacky but functional fix for this which is essentially just a wrapper that repeats hashing in a loop for a number of iterations under your control. Super.
I am totally 100% for including this in the standard library of PHP due to its, well, standardness. (That being said, I have been asked to point out that there is another new function which is the recommended way to hash passwords in PHP.) It was proposed last year, accepted by a vote of 9-0, and implemented in PHP 5.5. Unfortunately, many prebuilt PHPs in repositories are still on PHP 5.3 (which dulls the joy of hearing that some truly vile misfeatures are finally complelely removed in 5.4) so if you don’t have full control of your environment this may not be available to you for a while yet. (This was actually the first time I ever had to compile PHP completely from scratch. It turns out to not be a horrible process; they have, at least, got this “deploying” thing nailed.)
So why are we here? Well, a faithful follower slipped me a tip to check out the documentation. It turned out I agreed: I don’t like it. It also turns out I am acquainted with the person who both proposed the RFC and implemented the actual code. Awk-ward. Can I be my usual cruel, demanding, and unforgiving self in the face of the actual hearts and souls of PHP developers? Dangit, I intend to try.
Actual footage of the author of PHP Manual Masterpieces.
Let’s be clear: I have read the backing C code of this feature and I see nothing wrong with the actual functionality. My issues are strictly with the documentation and the API, both of which are very PHP-ish in the sorts of ways that drive me to hateblog about a programming language on a Friday night. It turns out there are people who are totally okay with these design decisions, and I can’t help that their subjective tastes are wrong, but that’s just how it is.
Issue The First: Non-copypaste-safe cryptography
We all know that any and all example code will be used in production somewhere. If it doesn’t error-check, production won’t. If it uses unreasonable defaults, so will production. One can argue that it’s okay to have “some assembly required” example code if the documentation itself – that is, on the same page – clearly explains what assembly is required where. That’s not happening here.
In this case: the documentation shows
$salt as a constant string, with no mention that this is bad, when the only safe thing to do in the common use case is absolutely not have one constant string. Setting a salt to a constant pretty much destroys the entire point of a salt; many people are under the impression that since a constant one will still defeat rainbow tables that it has done its job, but that’s living in the nineties. The real threat is massively parallel cracking. Having the same salt across all hashes does not do very much to stop that.
This is what the original RFC’s sample documentation used:
$salt = mcrypt_create_iv(16, MCRYPT_DEV_URANDOM);
And that’s GOOD! But it turns out that mcrypt absolutely cannot be relied on at all to be even probably present, so file that away under Issue The First Subpoint The First: the illusion of PHP having a reasonable supply of built-in cryptography functionality.
That’s an awful lot of words to say that what I want to see is:
$salt = PHP'S_BUILT_IN_SALT_GENERATOR(); // use a unique salt per hash. See [here] for details!
Whenever I say things like this, someone always pops up to say that it’s the consuming developer’s job to already know this stuff. Good thing PHP is a language explicitly targeted at seasoned, well-trained experts who have studied cryptography in university.
Issue The Second: Catastrophically Fail and Carry On
PHP has a deep-seated obsession with never, ever terminating execution with an error, except for stupid reasons. If it’s anything short of the underlying computer physically exploding, PHP’s policy is to return a nonsensical answer and continue with execution. Compounding this problem is that it’s totally normal to disable displaying errors entirely. (Technically, PHP only calls it an error if it is fatal: otherwise it’s a warning, a notice, et cetera.) The result is quite foreseeable: any and all non-fatal errors will go unnoticed somewhere.
Sometimes this isn’t a big deal, but this is cryptography. This isn’t like
mt_rand() which is documented as not cryptographic but is often abused for it: this is explicitly intended for cryptographic use. The stakes are high by default. My issue here is that there are multiple ways to cause
hash_pbkdf2() to non-fatally return
false when a cryptographically usable string is expected.
False is of course a perfectly serviceable thing in PHP to use as a string with no explicit casting. Do you see the problem yet?
It is a little bit too easy for code to begin spitting out the exact same output for all inputs and have this go unnoticed. Maybe someone typos a hash name, or some function upstream spits out -1 as an error code and this gets used as the length, or in the future a new hash algorithm is added and someone runs it on an older version of PHP that doesn’t have that algo yet. The end result is that two completely different passwords would end up with the same meaningless “hash” (even in the face of the legendary TRIPLE EQUALITY operator) and cause a catastrophic failure of security.
It is my strongly-held opinion that all errors in cryptographic code should be fatal. If the intended results cannot be obtained then end the world rather than risk an empty string being treated as a meaningful result to be used in security decisions.
To keep this focused on the manual: it does not explicitly say that you can get
false as a result. It’s left to be the sort of inference made by the highly experienced programmers we already sarcastically established are PHP’s exclusive audience. If the designers do not want to change errors in this function to fatal, the documentation should be made much more explicit about failure modes.
Issue The Third: Metric or Imperial Bytes?
What’s the one thing your high school physics teacher always told you? Take a look at this documentation excerpt and see if you can recall.
The length of the derived key to output. If 0, the length of the supplied algorithm is used.
Need a hint? ALWAYS WRITE DOWN YOUR UNITS!
Well, you might think, this isn’t that big a deal: run the function once and see how long a string it outputs and compare it to the length you passed. It’s gonna be either bits or bytes, right?
BZZT Wrong. The length is measured in characters of the final PHP string (which you might assume is the same as bytes but hold on). Well that’s not so bad, right? At least it’s consistent?
BZZT Double wrong! The length parameter has an undocumented interdependency with the raw output boolean parameter! If it’s false (the default), the hash is measured in hexadecimal digits aka nibbles converted to full characters whereas if it’s true it’s measured in bytes converted to characters. What this means is: the number of actual crytographically significant bits in the result may be HALF or DOUBLE what you were expecting. The prior in particular may be catastrophic.
In Conclusion: (╯°□°）╯︵ ┻━uoıʇɐʇuǝɯnɔop━┻
I feel like this function’s documentation and API are set up to cause issues in the usual PHP sorts of ways. Since it is cryptographic functionality designed to be used in security, I don’t feel bad making exacting demands for it being completely predictable and explicit.
Why do I post this stuff on a tumblr instead of trying to get involved with PHP’s documentation project? Because that would take all the joy out of being angry as a hobby.
Okay, actually, let’s expand on that: editing this one page to be more explicit on the correct use of salts doesn’t solve PHP’s documentation-of-dangerous-stuff problem. Editing this one page to remind that PHP functions like to return a meaningless answer alongside a soft error doesn’t solve PHP’s maddening tendency to sprinkle these everywhere. So on and so forth. (The length parameter one is at least an issue pretty specific to this page, I think? I hope.)
I’m not “involved with” PHP and I don’t want to be. It seems all the PHP contributors i know are ex-contributors because they got fed up with the community and bailed. Does complaining about their documentation and not personally editing it make me the world’s biggest prissy brat? Maybe. But I’ve got quite enough flamewars to juggle already on the feminism, secularism, and LGBT (add letters as necessary in context) fronts, and hobbies that very deeply personally matter to me using up my evenings and weekends. If people who choose to allocate their hobby points to working on PHP agree with me that they have a systematic design and documentation problem that needs a systematic answer, that’s super great. Otherwise, this is a monument of warning to passerby, and nothing more.