PHP Manual Masterpieces

Mar 3

Nothing Is Deprecated, Everything Is Permitted

Blogging something real quick while I maybe look to see what kind of fun factor I can get out of the Mt. Gox leak:

It is absolutely 100% true that PHP, the platform, has finally deprecated a bunch of Dumb Stuff, and in some cases has even gotten past the deprecation stage to remove the Dumbest Stuff completely. This is good! This is great.

But that doesn’t mean PHP, the community, is suddenly absolved of the problems those misfeatures brought in the first place. For one thing, most PHP deployments are not continuously upgrading to the newest version of PHP or even the newest stable. The last time I wanted to test something, I had to pull PHP and compile it from scratch because the features in question weren’t in Debian yet. However, some people were already saying that the assorted changes in PHP 5.4 meant I couldn’t pick on those things anymore, before 5.4 was available through standard package distribution!

More critical, though, is that the amount of PHP code in production which is old will always exceed that which is new. The normal lifecycle of a piece of code is to write it once, make notable adjustments once or twice, and put it on minimum maintenance forever until the website goes away. The majority of commercial PHP code that I have seen with my own eyes clearly dates to the PHP 4 era or was written by someone who stopped learning in the PHP 4 era, which was the perfect storm of popularity and screwed-up-ness. This code did not magically get better when it was acknowledged that certain language features were bad news. It stayed more or less exactly the same. It’s still running.

Old code does not magically improve. New code is often written with reference to old code. Heck, new code is often literally just copy-pasted from a forum comment written in 2003. PHP’s misfeatures will persist many years past the official attempt to fix them. And that’s terrible :(

Jan 8

I thought @eevee was putting me on. And now you think I’m putting you on. But I’m not.

But it’s not like anyone would ever notice - no-one is blocking javascript on php.net after the malicious javascript incident! You know, the one we’ve been waiting for the full postmortem on since October? In all seriousness, if they cannot determine the root cause of the breach, they should say so for the record: it happens. It means whatever went wrong could probably go wrong again, but, it happens. From my point of view, their incident response kind of fell apart for a while due to confusion; I hope they have regrouped and learned from the experience so that it goes smoother next time. I’m not being snarky. Incident response is hard.

I’d laugh and cry myself to death, though, if this was them getting hacked again but apparently it’s just someone with commit access being cutesy.

Jan 1

Regarding "On the interest of being fair", h t t p : / / a x o n f l u x . c o m / 5 - q u o t e s - b y - t h e - c r e a t o r - o f - p h p - r a s m u s - l e r d o r f -- why wouldn't tumblr let me include links?!

Anonymous

Scorched earth anti-spam policy. I guess we know where Twitter learned it from. For readers’ convenience: link that Tumblr’s markdown parser better figure out godsdam

I personally am of the opinion that not everybody needs to like programming and take an artist’s pride in their results - but I do kinda maybe want the people who make the tools intended for reuse by thousands to have that quality.

I really like how Tumblr creates answers to asks in WYSIWYG then when I edit them they suddenly become markdown except with HTML tags everywhere. Isn’t this site written in PHP…? ;)

In the interest of being fair

I felt compelled to add this note that I understand Rasmus Lerdorf was about 20 years younger than he is now when he started PHP. I understand that most of PHP’s problems are rooted in gaining too much traction too quickly and nobody wanted to introduce breaking changes. I lament that PHP won, not that it ever existed. It’s pretty typical of a personal project from the 90s.

But dang. He was actually older when he released PHP than I am now. … yep back to feeling judgmental and smug

I’m crying. Literally crying. Actual tears in my eyes. Salty. They sting. PHP is physically hurting me.

All of these freaking ridiculous function names we’ve been stuck with for twenty years are because of a POORLY CHOSEN HASH FUNCTION ON A DATASET OF ONE HUNDRED SHORT STRINGS.

Screencap via @DefuseSec because the actual site is down, presumably from everyone gawking in sheer disbelief.

Language Field Trip: IDL

All aboard the school bus, we’re going on a field trip. Did you know there are things that might actually be worse than PHP? It’s true! It’s true and it causes me to doubt the goodness of the cosmos. If you are a tumblr URL purist, I apologize for the deviation from the strict theme of the PHP manual, but I promise there are truly some other masterpieces of program design to be unearthed.

A few years ago, as I was finishing up my degree, I tried very hard to get a job as a programmer for some radio astronomers because radio astronomy freakin’ rules. Unfortunately I graduated right into the very heart of the bad economy, so that didn’t pan out and now I’m a professional hacker or something (I’m not really sure). Preparing for the interviews, however, brought me into sustained contact with a commercial programming language environment called Interactive Data Language aka IDL. It’s for scientific programming, it has some neat things like built-in cartography data, and it’s terrible.

IDL dates to the late 1970s and it shows in every facet of its being. There is of course a reason anyone ever used it in the first place: it is a language oriented to efficient transforms of entire arrays, which is exactly what scientists working on datasets want. In modern times, languages like Python have filled this role about six bajillion times better, but the dark legacy of, well, legacy code lives on. There have been improvements in recent years – apparently it now has automatic GC(!) and a lot of new graph types – but most of the things I’ll point out here can’t be changed without breaking legacy code, and perusing the current code samples on the site does not make the language seem particularly fundamentally improved.

To avoid the tedium of retypesetting several tables and code listings, this manual masterpiece is structured around screenshots taken from a book called Practical IDL Programming by Liam E. Gumley. It’s a bit dated but, as mentioned, they had a legacy problem then and they have a legacy problem now. (The current website of IDL does a good job of not making it at all obvious where the official documentation is. It’s here.) The screenshots constitute a very small portion of the overall book, mostly from chapter 2, used for critique purposes bla bla bla. (If you are in tumblr dashboard view, click/tap on any image thumbnail to expand all of them.)

Let’s begin with giving you a taste of what we’re dealing with: this is a while loop from a larger program.

I want to point out one thing in particular. on_ioerror sets a goto for any future IO errors within the current function scope (so why is the statement inside a while loop?). That should set the tone for how this language works. (For the record, I am a fan of a well-placed goto in low-level code; after all, sometimes I program in asm for fun.)

I don’t even have any idea what order I should present these in. It’s just a steady trickle of arbitrary WTF.

Tiny Integers

Quick! What’s the default integer size in a Big Data scientific programming environment? 64-bit, or do we cheap out and use 32-bit to align with the native width of more modest machines? Or do we define it to be the width of the currently executing machine?

Don’t be ridiculous! Integers are sixteen bit. 32 bits are for long types! (And note how it freely admits that Typecast Hell is a threat you must stand ever vigilant against, as though a programming language actively creating problems for you is simply how they are.)

There may be some vague idea that this is to align with 16-bits-per-pixel image storage formats which I presume were more dense on the ground in the 80s. Or maybe most scientists really did have 16-bit machines (can’t afford a VAX?) and the performance penalty of using larger integers was a huge problem. I don’t know, I wasn’t born yet. In any case, this is a wonderful inheritance passed down from generation to generation: having to remember in 2013 that all your low integer literals are being declared as signed 16-bit. Check this out:

That’s right! You have to remember to explicitly cast your literals if you want them to be comparable to a number north of about thirty two thousand!
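
To put a number on "about thirty two thousand": a signed 16-bit integer tops out at 32767. The sketch below is PHP, not IDL, and `wrap_int16()` is a helper I made up for illustration. It assumes overflow wraps around two's-complement style, which is my assumption about what happens rather than anything the book excerpt guarantees.

```php
<?php
// Hypothetical helper: squeeze an integer into a signed 16-bit slot,
// assuming two's-complement wraparound (an assumption, not documented IDL behaviour).
function wrap_int16(int $n): int
{
    $n &= 0xFFFF;                              // keep only the low 16 bits
    return $n >= 0x8000 ? $n - 0x10000 : $n;   // reinterpret the top bit as the sign
}

var_dump(wrap_int16(32767));   // int(32767), the largest value that fits
var_dump(wrap_int16(32768));   // int(-32768), one past the limit
var_dump(wrap_int16(40000));   // int(-25536)
```

Anything you meant to be larger than 32767 needs the wider type spelled out, which is exactly the chore being complained about.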

Has your head hit the desk yet? Get a pillow. Trust me.

Odd and Even Booleans

Hey, you know what would save like, one whole opcode in the runtime’s boolean routine? If we only checked the lowest bit of an integer! BRILLIANT here is your Christmas bonus, Engineer Shortsighted! Oh no your Christmas bonus is an even number of dollars so if(ChristmasBonus) doesn’t evaluate to true.

So… 2.0 is true and 2 is false? – faithful follower @sakjur
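
Here is that rule as I understand it, emulated in PHP so you can poke at it without an IDL license. `idl_truthy()` is a made-up name, and the float branch (non-zero means true) is inferred from the question above, so treat it as a sketch of the described behaviour and not a reference implementation.

```php
<?php
// Emulation of the truth rule as described in this post, not IDL itself:
// integers are "true" when odd, floats are "true" when non-zero.
function idl_truthy($value): bool
{
    if (is_int($value)) {
        return ($value & 1) === 1;   // only the lowest bit is consulted
    }
    return $value != 0;              // floats: any non-zero value counts
}

var_dump(idl_truthy(2));     // bool(false)
var_dump(idl_truthy(2.0));   // bool(true)
var_dump(idl_truthy(3));     // bool(true)
```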

This has consequences that break perfectly sensible design patterns:

And standard library routines explicitly defy the linguistic definition of boolean:

And this is as good a place as any to mention that not setting a flag is not necessarily the same thing as setting the flag to false??? Apparently an example is /noclip in graph drawing. I dunno.

Procedures and Functions, Parameters and Keywords

IDL maintains a first-class distinction between procedures (doesn’t return a value) and functions (does return a value) which I think most people see as kind of pointless these days; even C doesn’t care very much. This in and of itself is just a quirk, but the syntax for calling them is completely different and in the case of procedures is just weird:

IDL> procedurename, argument, argument

It’s just like… a comma-delimited list, floating in space? The name of the procedure is not differentiated from its arguments except by virtue that it’s first. It’s gross and I don’t see any reason it should be structured differently from function calls, which take a more typical name(arg, arg) style.

IDL also has a first-class distinction between mandatory arguments, called parameters, and optional arguments, called keywords (in contrast to what “keyword” means in most other languages). “Mandatory” is apparently a bit too strong of a term because “a well-written procedure or function will check that any mandatory input arguments are defined before doing anything else.” An apparently intentional misfeature is you can pass non-existent variables as arguments and expect them to suddenly have meaningful contents in the caller’s scope as a side effect of the function.

Of course, the language contains both pass-by-value and pass-by-reference, and which applies when is of course entirely consistent and intuitive!

I mean, it’s obvious to everyone here that an array which is a subset of another array is a fundamentally different type of data than an array that isn’t, right? Of course such rules would be totally different. (I’m contractually obligated to tell you that my best friend wants you to know this is also how Python does it. Well, I never claimed to want to marry Python, now did I! Edit: Except in Numpy, apparently, where it works the way I think is Right and True, which is probably why I thought Python was Right and True, as I’ve used Numpy for something before.)

Since procedures and functions are completely not the same thing, of course the error messages for not being able to find one are completely different:

Read that second one carefully and let the horror sink in: it cannot distinguish between an invalid function name and an uninitialized array. Unless of course you happened to use a single keyword argument to your invalid function name, in which case you get a third unique error message:

Arrays

This sounds reasonable in isolation:

This sounds reasonable in isolation:

But these are both true in the same language. You see, bad indexes in an array are less bad than lone wolf bad indexes. The companionship tames them.

We already hinted at this one: shipping syntax ambiguity, waking up the next morning, and shipping both ambiguity and non-ambiguity going forward.

Pointers

Yes, it’s a high level language. Yes, there are pointers. I suppose they’re really handles or something.

Accessing undefined variables through pointers: a critical and useful feature and definitely not a cause of interesting bugs.

Quirky?

Yucky.

Assorted Brain Damage

Followed by a code sample that explodes for x = 0 due to lack of short circuiting, of course.

The creat school of function naming thought - ie Ken Thompson’s Regret.

Excerpt from a much larger table – the implication being that there is no hexadecimal notation.

Wait, there are objects?! (And strings are limited to 32 kilobytes?!) THERE ARE OBJECTS?!?! AND YOU’RE JUST NOT GONNA MENTION THAT AGAIN IN OVER FIVE HUNDRED PAGES?!?!

"Don’t bother correctly specifying the expected input. That will just increase the rate at which malformed data is rejected instead of stuffed into places it doesn’t fit!"

And we run entire labs on this.

Nov 8

I Can’t Spell PBKDF

How much longer can a critique of a manual page run than the actual page itself? I hypothesize: quite a lot longer. (Edit: someone has submitted a patch to address some of these concerns. Jump to the bottom for expanded thoughts on why I don’t submit these myself.)

"PDKBF" stands for AUGH I screwed it up already. Let’s try again. "PBKDF" stands for Password-Based Key Derivation Function, which is basically the only real-world use case of deliberately slowing down your own computation. Here’s a crash course in the theory as it pertains to its use in webapps: we collectively made a huge mistake when we chose fast hashing algorithms such as MD5 and SHA1 as a basis for password security. Faster to calculate is faster to crack, and in particular they lend themselves well to GPU computing. A password hashing algorithm should be as slow as possible without interrupting the functionality of your login process. PB-whatever is a hacky but functional fix for this which is essentially just a wrapper that repeats hashing in a loop for a number of iterations under your control. Super.

I am totally 100% for including this in the standard library of PHP due to its, well, standardness. (That being said, I have been asked to point out that there is another new function which is the recommended way to hash passwords in PHP.) It was proposed last year, accepted by a vote of 9-0, and implemented in PHP 5.5. Unfortunately, many prebuilt PHPs in repositories are still on PHP 5.3 (which dulls the joy of hearing that some truly vile misfeatures are finally completely removed in 5.4) so if you don’t have full control of your environment this may not be available to you for a while yet. (This was actually the first time I ever had to compile PHP completely from scratch. It turns out to not be a horrible process; they have, at least, got this “deploying” thing nailed.)
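
For the curious, the "hashing in a loop" description can be made concrete. This is a sketch of the first output block of PBKDF2 as I read RFC 2898: the real thing uses HMAC keyed with the password and XORs every round together instead of just re-hashing. `pbkdf2_first_block()` is a name I invented, it assumes a PHP new enough to have `hash_pbkdf2()` and type declarations, and it exists only to be compared against the built-in. Nobody should ship it.

```php
<?php
// Illustrative only: first output block of PBKDF2, following RFC 2898 as I read it.
// pbkdf2_first_block() is a made-up name; use the built-in in real code.
function pbkdf2_first_block(string $algo, string $password, string $salt, int $iterations): string
{
    // Round 1: HMAC keyed with the password over salt || big-endian block index 1.
    $u = hash_hmac($algo, $salt . pack('N', 1), $password, true);
    $t = $u;

    // Remaining rounds: feed each HMAC output back in and XOR the results together.
    for ($i = 1; $i < $iterations; $i++) {
        $u = hash_hmac($algo, $u, $password, true);
        $t ^= $u;   // byte-wise XOR of two equal-length strings
    }
    return $t;
}

// With length 0 and raw output, the built-in returns one digest-sized block.
$mine    = pbkdf2_first_block('sha256', 'password', 'salt', 1000);
$builtin = hash_pbkdf2('sha256', 'password', 'salt', 1000, 0, true);
var_dump(hash_equals($builtin, $mine));
```

The iteration count is the only knob: every extra round costs the defender one more HMAC per login and the attacker one more HMAC per guess.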

So why are we here? Well, a faithful follower slipped me a tip to check out the documentation. It turned out I agreed: I don’t like it. It also turns out I am acquainted with the person who both proposed the RFC and implemented the actual code. Awk-ward. Can I be my usual cruel, demanding, and unforgiving self in the face of the actual hearts and souls of PHP developers? Dangit, I intend to try.

Actual footage of the author of PHP Manual Masterpieces.

Let’s be clear: I have read the backing C code of this feature and I see nothing wrong with the actual functionality. My issues are strictly with the documentation and the API, both of which are very PHP-ish in the sorts of ways that drive me to hateblog about a programming language on a Friday night. It turns out there are people who are totally okay with these design decisions, and I can’t help that their subjective tastes are wrong, but that’s just how it is.

Issue The First: Non-copypaste-safe cryptography

We all know that any and all example code will be used in production somewhere. If it doesn’t error-check, production won’t. If it uses unreasonable defaults, so will production. One can argue that it’s okay to have “some assembly required” example code if the documentation itself – that is, on the same page – clearly explains what assembly is required where. That’s not happening here.

In this case: the documentation shows $salt as a constant string, with no mention that this is bad, when the only safe thing to do in the common use case is absolutely not have one constant string. Setting a salt to a constant pretty much destroys the entire point of a salt; many people are under the impression that since a constant one will still defeat rainbow tables that it has done its job, but that’s living in the nineties. The real threat is massively parallel cracking. Having the same salt across all hashes does not do very much to stop that.

This is what the original RFC’s sample documentation used:

$salt = mcrypt_create_iv(16, MCRYPT_DEV_URANDOM);

And that’s GOOD! But it turns out that mcrypt absolutely cannot be relied on at all to be even probably present, so file that away under Issue The First Subpoint The First: the illusion of PHP having a reasonable supply of built-in cryptography functionality.

That’s an awful lot of words to say that what I want to see is:

$salt = PHP'S_BUILT_IN_SALT_GENERATOR(); 
// use a unique salt per hash. See [here] for details!

Whenever I say things like this, someone always pops up to say that it’s the consuming developer’s job to already know this stuff. Good thing PHP is a language explicitly targeted at seasoned, well-trained experts who have studied cryptography in university.
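
For what it's worth, later PHP versions did grow something close to that wished-for generator: `random_bytes()` arrived in PHP 7.0, well after this was written. A minimal sketch of per-password salting, assuming PHP 7+; `make_password_record()` and `check_password()` are hypothetical names, and the iteration count is a placeholder, not a recommendation.

```php
<?php
// Sketch: one fresh random salt per stored password (needs PHP 7+ for random_bytes()).
function make_password_record(string $password, int $iterations = 100000): array
{
    $salt = random_bytes(16);   // unique per record, stored right next to the hash
    $key  = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);
    return ['salt' => $salt, 'iterations' => $iterations, 'key' => $key];
}

function check_password(string $password, array $record): bool
{
    $key = hash_pbkdf2('sha256', $password, $record['salt'], $record['iterations'], 32, true);
    return hash_equals($record['key'], $key);   // constant-time comparison
}
```

Two users with the same password end up with different salts and therefore different keys, which is the property a constant salt throws away.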

Issue The Second: Catastrophically Fail and Carry On

PHP has a deep-seated obsession with never, ever terminating execution with an error, except for stupid reasons. If it’s anything short of the underlying computer physically exploding, PHP’s policy is to return a nonsensical answer and continue with execution. Compounding this problem is that it’s totally normal to disable displaying errors entirely. (Technically, PHP only calls it an error if it is fatal: otherwise it’s a warning, a notice, et cetera.) The result is quite foreseeable: any and all non-fatal errors will go unnoticed somewhere.

Sometimes this isn’t a big deal, but this is cryptography. This isn’t like mt_rand() which is documented as not cryptographic but is often abused for it: this is explicitly intended for cryptographic use. The stakes are high by default. My issue here is that there are multiple ways to cause hash_pbkdf2() to non-fatally return false when a cryptographically usable string is expected. False is of course a perfectly serviceable thing in PHP to use as a string with no explicit casting. Do you see the problem yet?

It is a little bit too easy for code to begin spitting out the exact same output for all inputs and have this go unnoticed. Maybe someone typos a hash name, or some function upstream spits out -1 as an error code and this gets used as the length, or in the future a new hash algorithm is added and someone runs it on an older version of PHP that doesn’t have that algo yet. The end result is that two completely different passwords would end up with the same meaningless “hash” (even in the face of the legendary TRIPLE EQUALITY operator) and cause a catastrophic failure of security.

It is my strongly-held opinion that all errors in cryptographic code should be fatal. If the intended results cannot be obtained then end the world rather than risk an empty string being treated as a meaningful result to be used in security decisions.

To keep this focused on the manual: it does not explicitly say that you can get false as a result. It’s left to be the sort of inference made by the highly experienced programmers we already sarcastically established are PHP’s exclusive audience. If the designers do not want to change errors in this function to fatal, the documentation should be made much more explicit about failure modes.
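
If you are stuck with the soft-failure behaviour, one userland way to get "end the world" semantics is a thin wrapper that refuses to hand back anything except a key of the requested size. `strict_pbkdf2()` is a hypothetical name, and the sketch assumes PHP 7.2+ for `hash_hmac_algos()`. As far as I know, PHP 8.0 and later throw a `ValueError` for bad arguments here instead of returning false, which is the behaviour being argued for.

```php
<?php
// Hypothetical wrapper: derive a key or throw, never return false.
// Assumes PHP 7.2+ (hash_hmac_algos()).
function strict_pbkdf2(string $algo, string $password, string $salt, int $iterations, int $bytes): string
{
    if (!in_array($algo, hash_hmac_algos(), true)) {
        throw new InvalidArgumentException("Unknown or unsuitable hash algorithm: {$algo}");
    }
    if ($iterations < 1 || $bytes < 1) {
        throw new InvalidArgumentException('Iterations and key length must be positive');
    }

    $key = hash_pbkdf2($algo, $password, $salt, $iterations, $bytes, true);

    // Belt and braces: reject anything that is not a raw key of the requested size.
    if (!is_string($key) || strlen($key) !== $bytes) {
        throw new RuntimeException('Key derivation failed');
    }
    return $key;
}
```

A typo like `sha265` now stops the request cold instead of quietly turning every password into the same non-hash.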

Issue The Third: Metric or Imperial Bytes?

What’s the one thing your high school physics teacher always told you? Take a look at this documentation excerpt and see if you can recall.

length    

The length of the derived key to output. If 0, the length of the supplied algorithm is used.

Need a hint? ALWAYS WRITE DOWN YOUR UNITS!

Well, you might think, this isn’t that big a deal: run the function once and see how long a string it outputs and compare it to the length you passed. It’s gonna be either bits or bytes, right?

BZZT Wrong. The length is measured in characters of the final PHP string (which you might assume is the same as bytes but hold on). Well that’s not so bad, right? At least it’s consistent?

BZZT Double wrong! The length parameter has an undocumented interdependency with the raw output boolean parameter! If it’s false (the default), the hash is measured in hexadecimal digits aka nibbles converted to full characters whereas if it’s true it’s measured in bytes converted to characters. What this means is: the number of actual cryptographically significant bits in the result may be HALF or DOUBLE what you were expecting. The former in particular may be catastrophic.
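
You can watch the halving happen. A small sketch using the same length argument with and without raw output; the password, salt, and iteration count are throwaway placeholder values.

```php
<?php
$password = 'correct horse';
$salt     = 'placeholder-salt';   // placeholder only; use a random salt in real code

$hex = hash_pbkdf2('sha256', $password, $salt, 1000, 32, false);   // 32 hex digits
$raw = hash_pbkdf2('sha256', $password, $salt, 1000, 32, true);    // 32 raw bytes

echo strlen($hex), " characters of hex = ", strlen($hex) / 2, " bytes of key\n";
echo strlen($raw), " characters of raw = ", strlen($raw), " bytes of key\n";

// Same length argument, half the key material: the hex result is just
// the first 16 bytes of the raw result, hex-encoded.
var_dump($hex === bin2hex(substr($raw, 0, 16)));   // bool(true)
```

Both strings are 32 characters long, but one carries 128 bits of key and the other 256.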

In Conclusion: (╯°□°)╯︵ ┻━uoıʇɐʇuǝɯnɔop━┻

I feel like this function’s documentation and API are set up to cause issues in the usual PHP sorts of ways. Since it is cryptographic functionality designed to be used in security, I don’t feel bad making exacting demands for it being completely predictable and explicit.

Why do I post this stuff on a tumblr instead of trying to get involved with PHP’s documentation project? Because that would take all the joy out of being angry as a hobby.

Okay, actually, let’s expand on that: editing this one page to be more explicit on the correct use of salts doesn’t solve PHP’s documentation-of-dangerous-stuff problem. Editing this one page to remind that PHP functions like to return a meaningless answer alongside a soft error doesn’t solve PHP’s maddening tendency to sprinkle these everywhere. So on and so forth. (The length parameter one is at least an issue pretty specific to this page, I think? I hope.)

I’m not “involved with” PHP and I don’t want to be. It seems all the PHP contributors I know are ex-contributors because they got fed up with the community and bailed. Does complaining about their documentation and not personally editing it make me the world’s biggest prissy brat? Maybe. But I’ve got quite enough flamewars to juggle already on the feminism, secularism, and LGBT (add letters as necessary in context) fronts, and hobbies that very deeply personally matter to me using up my evenings and weekends. If people who choose to allocate their hobby points to working on PHP agree with me that they have a systematic design and documentation problem that needs a systematic answer, that’s super great. Otherwise, this is a monument of warning to passersby, and nothing more.

Nov 3

So PHP. Such Documented

So some infosec acquaintances of mine have dropped a random seed cracker for PHP’s mt_rand() - that is, given a good sample of the random output, determine what the original seed input was, and thereby recreate the entire random stream. Depending on the application, this could be used to crack encryption, cheat at games, et cetera. The fact that it is possible to do in a reasonable timeframe is kind of worrying in and of itself but that is beside the point. The point is PHP is a mess.

What is mt_rand()? The “mt” stands for "Mersenne Twister" because we should be building the name of the underlying algorithm directly into the namespace and forcing consumers of the API (you) to sit around saying “which version of rand() should I be using?” Fortunately, the documentation is here to help us decide!

mt_rand — Generate a better random value

Better than…?

Many random number generators of older libcs have dubious or unknown characteristics and are slow. By default, PHP uses the libc random number generator with the rand() function. The mt_rand() function is a drop-in replacement for this. It uses a random number generator with known characteristics using the Mersenne Twister, which will produce random numbers four times faster than what the average libc rand() provides.

This is such the most PHP thing. Most languages, I think, would do the exact opposite thing when faced with this problem: they would change the implementation of rand() to a known good one and shuffle off the old version to old_rand() if you absolutely needed to keep the old, platform dependent one (ie the seed stream was not portable between machines anyway). But no! Why do that when you can leave the bad one in place at the obvious name and implement the good one with an awkward name?

But of course, the documentation of rand() will clearly point out that it’s effectively deprecated and one should always use mt_rand(), right?

Nah. Sounds like a lot of work. Stuffing it in “See Also” should do the trick.

A quick look on github suggests that whether people use rand() or mt_rand() is about 50/50. And mt_rand() isn’t “cryptographically secure” anyway - for that you need OpenSSL! Github shows about ten thousand results for that versus about a million results for rand()/mt_rand().
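
The reason a seed cracker is worth building at all: the whole stream is a pure function of the seed. The sketch below shows that, plus the CSPRNG-backed helpers that PHP grew later (`random_bytes()` and `random_int()`, PHP 7.0+, so not an option when this was written). The seed value and variable names are arbitrary.

```php
<?php
// Same seed in, same "random" stream out: the property a seed cracker exploits.
mt_srand(1234);
$first = [mt_rand(), mt_rand(), mt_rand()];

mt_srand(1234);
$second = [mt_rand(), mt_rand(), mt_rand()];

var_dump($first === $second);   // bool(true)

// For anything security-sensitive on PHP 7.0+, use the CSPRNG-backed functions.
$token = bin2hex(random_bytes(16));   // 32 hex characters
$roll  = random_int(1, 6);            // unbiased integer in [1, 6]
```

Recover the seed from a few outputs and you own every token the application will ever generate from that stream.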

I literally almost fell out of my chair laughing when I saw the awkwardness of this design and the asymmetry of the documentation. This sort of namespace clutter is just the most PHP thing.

PHP 2.0: A Review In Retrospect

This is not about PHP as we now know it in the waning months of 2013. This is about the waning months of the year 1997. I was nine years old. My life was not yet overshadowed by haphazard scripting languages. Somewhere in the wilderness, during a savage thunderstorm in the dead of night, a Danish Canadian pushed the 2.0 revision of his personal home page generator’s tarball to a web server.

It was never intended to go beyond my own private use.

Thus begins the manual. Thus begins the nightmare.

I had been told that PHP 2 had very little to do with PHP as I know it, which begins with PHP 4. However, while the underlying interpreter may have been rewritten multiple times, it’s clearly the same language with the same ideology, just with some curious syntactic choices that were later remedied. For example, switch statements – I kid you not – used a semicolon instead of a colon after the case condition. I have never seen or heard of anything like this and I assume it was to facilitate a lazier parser at the expense of asking the programmer to remember something weird.

This review is written in a more or less top-to-bottom reading of the manual rather than sorted by severity or lulz factor.

The first thing you will notice if you run a page through PHP/FI is that it adds a footer with information about the number of times your page has been accessed.

The first thing you will notice is that it’s PHP/FI (“Form Interpreter”), except when it’s just PHP or it’s just FI and they’re all the same thing and whether we call it any of these three things depends on the compile time options.

The difference between PHP and FI is only a conceptual one. Both are built from the same source distribution. When I build the package without any access logging or access restriction support, I call my binary FI. When I build with these options, I call it PHP.

But, anyway, it doesn’t actually mention how to turn off publicly displaying the page hit counter. I hope that’s everything you ever wanted in a web page. (You disable it by putting phpShowInfo off in any of srm.conf, one of two different sections of access.conf, or an .htaccess. What order are those prioritized in? Who knows.)

This is the first code sample:

<FORM ACTION="/cgi-bin/php.cgi/~userid/display.html" METHOD=POST>
<INPUT TYPE="text" name="name">
<INPUT TYPE="text" name="age">
<INPUT TYPE="submit">
</FORM>

Your display.html file could then contain something like:

<?echo "Hi $name, you are $age years old!<p>">

I’m alert(“XSS”) years old. Wait, did Javascript even exist in 1997? I don’t know, I was nine. My mom didn’t let me use computers because the pedophiles would eat me. (Wikipedia says it was already in Internet Explorer by then.) In any case, there are two key things to take away from this:

  • There is zero mention here or anywhere else (aside from a casual mention of magic quotes) about input/output filtering in even the broadest sense. The standard library has HtmlSpecialChars() but it’s not pointed out that you need to use it. Something that is completely mandatory for having a safely functional website is simply ignored (and it’s not because the PHP runtime does it automatically – it absolutely didn’t and doesn’t). Mind you, this version of PHP does have SQL bindings. Oh bother. No wonder the internet got to being such a mess.

  • That is in fact the entire display.html. This implies that register_globals is simply how the runtime works. Oh. Oh no. I feel faint. If you’re not familiar with PHP: register_globals auto-populates the script’s global namespace with variables based on the user-submitted form. The ways for an end-user to abuse this are myriad and in fact it’s so bad that PHP not only deprecated it but actually removed it entirely which is a pretty huge thing for the Language of Backwards Compatibility.
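
For contrast, a sketch of the same greeting page in present-day PHP, with the two missing pieces filled in: the fields are read explicitly from the submitted form (no auto-registered globals) and escaped on the way out. `greet()` is a hypothetical helper, and the `??` operator assumes PHP 7+.

```php
<?php
// Sketch: read the fields explicitly and escape on output.
// greet() is a hypothetical helper, not something from any manual.
function greet(array $form): string
{
    $name = htmlspecialchars((string) ($form['name'] ?? ''), ENT_QUOTES, 'UTF-8');
    $age  = (int) ($form['age'] ?? 0);   // force an integer before it reaches the page
    return "Hi {$name}, you are {$age} years old!<p>";
}

echo greet($_POST);
```

Feed it a script tag as the name and the browser gets harmless `&lt;script&gt;` text instead of something to execute.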

It’s easy to see how the 2000s were characterized by websites being so easily hackable. The foundations laid in the mid to late 90s were shaky as jello and weren’t systematically rectified in a timely fashion. In my ever so humble opinion PHP still is only a step and a half past this.

A second rather large caveat with Apache 1.0.x is that it does not align double types correctly on most architectures. You may find yourself getting strange bus errors from your httpd when using mod_php.

This isn’t PHP’s fault by any means but the WTF factor just shines bright.

PHP 2 comes with a weird magic trapdoor to configure access controls by appending ?config to the URL you want to configure. The password defaults to your unix username and falls back on “php” if it can’t figure it out. It mentions it’s “a good idea” to change the password but doesn’t actually mention how. (I’m guessing it might be on that magic config page itself.)

Note that the built-in PHP/FI based access control is likely to be discontinued in future versions. You should seriously consider using the security mechanism that comes with your web server instead.

I’m glad they actually deprecated it so quickly because the idea just feels kinda icky.

If a PHP variable is defined by the POST method data, or if the variable is defined by the HTTP daemon in the Unix environment, then GET method data cannot overwrite it. This is to prevent somebody from adding ?REMOTE_HOST=some.bogus.host to their URL’s and thus tricking the PHP logging mechanism into recording this alternate data. POST method data is however allowed to overwrite these variables.

Because as we all know POST cannot be performed by any means except holy magic.

By adding: ?EMAIL_ADDR= to any links on a page where the user’s email address is known, you may propagate it to the next page. The PHP logging system will automatically look for this variable and record its value as the user’s e-mail address in the logs.

Weirdly specific use cases hard coded into the runtime: check. It’s PHP all right.

Of note is that PHP already contained GD image processing bindings at this early stage. This was pretty much the entire reason I got into programming and my dynamically generated forum signature images were swank as heck. Of further note is that the Image* functions are simply dumped into the main (only) namespace – of course – and this was never corrected in later revisions of PHP – of course. Bonus points for the manual linking directly to a specific tarball of GD 1.3 rather than to the website proper.

PHP/FI can also be compiled to automatically escape any forward single quote ( ' ) and double quote ( " ) characters found in GET or POST data. If the MAGIC_QUOTES variable is defined in the php.h file then these quotes will be automatically escaped making it easier to pass form data directly to Postgres queries.

Another misfeature that did in fact face the guillotine in PHP 5.4 but at least this one was trying to help.
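
To make the “trying to help” part concrete, here’s a small Python sketch. The magic_quotes function is my own stand-in that does only what the quoted paragraph says (backslash in front of each quote character), and I’m using the standard library’s sqlite3 as a stand-in database instead of Postgres. The second half shows the approach that replaced this whole idea: hand the value to the driver as a parameter and never splice it into the SQL string at all.

```python
import sqlite3


def magic_quotes(value):
    """Backslash-escape single and double quotes, per the quoted manual text."""
    return value.replace("'", "\\'").replace('"', '\\"')


print(magic_quotes("O'Brien"))   # O\'Brien

# The modern alternative: a parameterised query (sqlite3 used as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT)")
conn.execute("INSERT INTO table1 (name) VALUES (?)", ("O'Brien",))
row = conn.execute(
    "SELECT name FROM table1 WHERE name = ?", ("O'Brien",)
).fetchone()
print(row[0])                    # O'Brien
```

Note that the escaped version changes the data itself, so anything that is not going into a query now has stray backslashes in it.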

I’m disappointed that all the database sample code uses select * from table1 as the example so I can’t cite them for straight-up injection in the samples. However it is worth noting that some of these code samples seem to have suffered encoding corruption at some point. In particular the closing brace and semicolon have simply fallen off some lines.

eregi("(ozilla.[23]|MSIE.3)",$HTTP_USER_AGENT); Returns true if client browser is Netscape 2, 3 or MSIE 3.

Why no “M”??? I’m going to lose sleep over this. (It’s a case insensitive regex, so it’s not a clever hack around mozilla vs Mozilla.)
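
I checked whether the missing M changes anything, using Python’s re module as a stand-in for eregi. It is a different regex engine, but this pattern only uses a group, alternation, a dot and a bracket expression, which as far as I know behave the same in both. The user agent strings are ones I typed up in the style of the era, not captured data.

```python
import re

# The pattern from the manual, and the version with the "M" put back.
MANUAL_PATTERN = re.compile(r"(ozilla.[23]|MSIE.3)", re.IGNORECASE)
WITH_M_PATTERN = re.compile(r"(Mozilla.[23]|MSIE.3)", re.IGNORECASE)


def matches(pattern, user_agent):
    """Unanchored, case-insensitive search, like eregi."""
    return pattern.search(user_agent) is not None


# Illustrative user agent strings (assumed, not real logs).
user_agents = [
    "Mozilla/2.0 (Win95; I)",
    "Mozilla/3.0 (X11; I; Linux)",
    "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)",
    "Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)",
]

for ua in user_agents:
    print(matches(MANUAL_PATTERN, ua), matches(WITH_M_PATTERN, ua), ua)
```

Both patterns agree on all four. Since the search is unanchored, dropping the M only makes the pattern slightly looser (it would also accept something like “Fozilla 2”), so it saves one character and buys nothing.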

The following escape sequences are supported in most places where a quoted string argument is used:

\a --> bell
\b --> backspace
\n --> linefeed
\r --> carriage return
\t --> tab
\nnn --> octal char
\xXX --> hex char

Most places. But we won’t tell you the exceptions – that would be too easy!

A couple of functions in the PHP/FI script language epxect octal arguments to denote Unix-style permission parameters. In this octal notation 3 bits are used to represent the values 0-7. Each bit of the three represents a specific permission. Octal is traditionally noted in some contexts by a leading 0, such as 0755. You do not need to use this leading 0 in PHP since the functions that expect octal parameters are will simplyassume that the parameter is octal.

(Typos original.)

Wait hang on

simply assume that the parameter is octal

(sound of implications of deliberately reading a base-ten number in some other lower base for no good reason without mention of what happens if you, say, use the digit 9)

contains sound

my gods it’s full of hideous contortions of spacetime

PHP did in fact later remove this “convenient feature” and begin respecting the base of what you actually passed.
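
If you want to see the contortion spelled out, here’s a Python sketch of what “simply assume that the parameter is octal” means for a number you typed in base ten. The assume_octal function is mine; the manual doesn’t say what PHP/FI did with a digit 9, so I’m not guessing, and Python simply refuses.

```python
def assume_octal(decimal_looking):
    """Reinterpret the decimal digits of a number as octal digits."""
    return int(str(decimal_looking), 8)


print(assume_octal(755))            # 493
print(assume_octal(755) == 0o755)   # True

try:
    assume_octal(759)
except ValueError as err:
    # 9 is not an octal digit, so there is no sensible answer.
    print("rejected:", err)
```

So 755 silently means four hundred ninety-three, and 759 means nothing at all.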

Moving on, the syntax!

Each PHP instruction starts with <? and ends with a >.

I’m curious why this later changed, since this isn’t necessarily bad per se although I prefer <? ?> for obvious visual balance.

(Edit: people on reddit have suggested this probably caused an ambiguous parse in some cases with respect to the greater-than operator. Now that they mention it, it’s glaringly obvious…)

Three types of variables are supported. Long integer, Double precision floating point and character strings.

We have since gained booleans and NULL. Why yes, of course NULL is a type, why wouldn’t it be?

$a = $b + $c; can do a couple of different things. If $b is a number, the numerical value of $c is added to $b and the sum is stored in $a. In this case the type of $c is irrelevant. The operation is guided by the type of the first variable. If $b is a string, then the string value of $c is appended to $b and the resultant string is placed in $a.

Different people have different feelings on whether or not string concatenation should be a distinct operator. I think it should. (And this is how PHP is now.) But “the operation is guided by the type of the first variable (on the right-hand side)” is a trumpeted warning of surprises when you rearrange things for whatever reason.
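
Here’s the surprise in miniature, as a Python model of the rule in the quote. The phpfi_add and numeric_value names are made up, and the string-to-number rule (parse it, fall back to 0) is purely my assumption since the quoted text doesn’t define it.

```python
def numeric_value(x):
    """Hypothetical conversion: numbers pass through, strings are parsed, junk is 0."""
    if isinstance(x, (int, float)):
        return x
    for parse in (int, float):
        try:
            return parse(x)
        except ValueError:
            continue
    return 0


def phpfi_add(b, c):
    """Model of $a = $b + $c where the first operand's type picks the operation."""
    if isinstance(b, str):
        return b + str(c)            # string first: concatenate
    return b + numeric_value(c)      # number first: numeric addition


print(phpfi_add(1, "2"))    # 3
print(phpfi_add("1", 2))    # 12
```

Same two values, opposite order, and one answer is the number 3 while the other is the string “12”.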

$a = "hello"; $$a = "world";

It turns out we’ve always had variable variables, great. (Not great.)
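
In case the double dollar sign is new to you, here’s a Python sketch of what those two statements do, with a plain dict standing in for the variable table. The assign and assign_indirect helpers are hypothetical names for illustration.

```python
def assign(variables, name, value):
    """$name = value"""
    variables[name] = value


def assign_indirect(variables, name, value):
    """$$name = value: the current value of $name is used as the variable name."""
    variables[variables[name]] = value


variables = {}
assign(variables, "a", "hello")            # $a = "hello";
assign_indirect(variables, "a", "world")   # $$a = "world";

print(variables)   # {'a': 'hello', 'hello': 'world'}
```

The second statement creates a variable called hello, whose name exists nowhere in the source code.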

<?
      if($a==5 &&  $b!=0 );
          $c = 100 + $a / $b;
    endif;
>

Check the semicolon weirdness: one directly after the if condition, another after endif. Eww. It also simultaneously supports C-style blocks with curly braces. Because let’s complicate the parser with two styles. Fortunately PHP grew out of this OH WAIT.

This is the list of “valid operators”:

<? $a = 2 + 1 > Addition
<? $a = 2 - 1 > Subtraction
<? $a = 2 * 1 > Multiplication
<? $a = 2 / 1 > Division
<? $a = 2 % 1 > Modulus
<? $a = 2 ^ 1 > Bit-wise Exclusive OR

The C-like incremental operators += and -= are supported. ie. <? $a += $b>

But not *= or /= ?

The C-like bit-wise operators &=, |= and ^= are supported. ie. <? $a &= 4>

Okay, so apart from += and -= we can only self-assign with bitwise operators. Weird.

This is equivalent to: <? $a = $a & 4>

BUT ACCORDING TO YOUR OWN FREAKING TABLE THAT’S NOT AN OPERATOR!!!!!?!

Okay. Sit down for this next one. It’s a doozy.

A previous section talked about GET and POST method data and variables. If you think about it, you may be able to envision a security issue.

A security issue? Never.

bla bla bla GET is bad

PHP provides a SecureVar() function which is used to mark variables names as being secure variables. These secure variables can only be set directly in a PHP script, or they can come from a POST method form. They cannot be set using the GET method variable definition mechanism. From our above scenario, if we placed the line: <?SecureVar("data")> near the beginning of our second page, then the GET method trick would not work. The “data” variable would appear to be empty unless it came directly from the POST method form on the first page.

Please note that POST-method forms are not intrinsically secure. People can emulate the posting of any data to a form by simply telnetting to the HTTP port on your system. You need to take appropriate security measures to stop people from doing this if in fact security is a concern.

Do I even need to say anything? Okay, fine.

PHP 2.0: Make useless feature. Name it “Secure”. Note that it’s not secure. Offer no actual advice on how to do it correctly. Brilliant!
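
For the record, here’s how little this buys you, as a Python sketch. The visible_value function is my model of the SecureVar rule as the manual describes it, not PHP/FI code, and the URL is a placeholder. The second half builds a forged POST request with the standard library and doesn’t even need to send it to make the point.

```python
from urllib.parse import urlencode
from urllib.request import Request


def visible_value(name, get, post, secure_names):
    """Model of SecureVar: secure names ignore GET data, POST is always trusted."""
    if name in post:
        return post[name]
    if name in get and name not in secure_names:
        return get[name]
    return ""


secure = {"data"}

# The GET trick is blocked once the variable is marked secure.
print(repr(visible_value("data", {"data": "evil"}, {}, secure)))    # ''

# But anyone can construct a POST body by hand (built here, never sent).
forged = Request(
    "http://example.invalid/page2.html",          # placeholder URL
    data=urlencode({"data": "evil"}).encode("ascii"),
)
print(forged.get_method(), forged.data)                             # POST b'data=evil'
print(repr(visible_value("data", {}, {"data": "evil"}, secure)))    # 'evil'
```

The only thing SecureVar protects against is an attacker who refuses to use anything but the address bar.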

People defend PHP by saying it’s a lot better than it used to be.

Well, they’re not wrong.

PS I wrote a novel and it has nothing to do with hateblogging about PHP.

Unexpected follow surge; guess I’ll have to come up with more content. Some people have linked me some good leads for absurdity. But did I get linked somewhere or what (if you end a post on tumblr in a question mark people can reply, did you know that)?