PHP Manual Masterpieces

PHP 2.0: A Review In Retrospect

This is not about PHP as we now know it in the waning months of 2013. This is about the waning months of the year 1997. I was nine years old. My life was not yet overshadowed by haphazard scripting languages. Somewhere in the wilderness, during a savage thunderstorm in the dead of night, a Danish Canadian pushed the 2.0 revision of his personal home page generator’s tarball to a web server.

It was never intended to go beyond my own private use.

Thus begins the manual. Thus begins the nightmare.

I had been told that PHP 2 had very little to do with PHP as I know it, which begins with PHP 4. However, while the underlying interpreter may have been rewritten multiple times, it’s clearly the same language with the same ideology, just with some curious syntactic choices that were later remedied. For example, switch statements – I kid you not – used a semicolon instead of a colon after the case condition. I have never seen or heard of anything like this and I assume it was to facilitate a lazier parser at the expense of asking the programmer to remember something weird.

This review is written in a more or less top-to-bottom reading of the manual rather than sorted by severity or lulz factor.

The first thing you will notice if you run a page through PHP/FI is that it adds a footer with information about the number of times your page has been accessed.

The first thing you will notice is that it’s PHP/FI (“Form Interpreter”), except when it’s just PHP or it’s just FI and they’re all the same thing and whether we call it any of these three things depends on the compile time options.

The difference between PHP and FI is only a conceptual one. Both are built from the same source distribution. When I build the package without any access logging or access restriction support, I call my binary FI. When I build with these options, I call it PHP.

But, anyway, it doesn’t actually mention how to turn off publicly displaying the page hit counter. I hope that’s everything you ever wanted in a web page. (You disable it by putting phpShowInfo off in any of srm.conf, one of two different sections of access.conf, or an .htaccess. What order are those prioritized in? Who knows.)

This is the first code sample:

<FORM ACTION="/cgi-bin/php.cgi/~userid/display.html" METHOD=POST>
<INPUT TYPE="text" name="name">
<INPUT TYPE="text" name="age">
<INPUT TYPE="submit">
</FORM>

Your display.html file could then contain something like:

<?echo "Hi $name, you are $age years old!<p>">

I’m alert("XSS") years old. Wait, did JavaScript even exist in 1997? I don’t know, I was nine. My mom didn’t let me use computers because the pedophiles would eat me. (Wikipedia says it was already in Internet Explorer by then.) In any case, there are two key things to take away from this:

  • There is zero mention here or anywhere else (aside from a casual mention of magic quotes) about input/output filtering in even the broadest sense. The standard library has HtmlSpecialChars() but it’s not pointed out that you need to use it. Something that is completely mandatory for having a safely functional website is simply ignored (and it’s not because the PHP runtime does it automatically – it absolutely didn’t and doesn’t). Mind you, this version of PHP does have SQL bindings. Oh bother. No wonder the internet got to being such a mess.

  • That is in fact the entire display.html. This implies that register_globals is simply how the runtime works. Oh. Oh no. I feel faint. If you’re not familiar with PHP: register_globals auto-populates the script’s global namespace with variables based on the user-submitted form. The ways for an end-user to abuse this are myriad, and in fact it’s so bad that PHP not only deprecated it (in 5.3) but actually removed it entirely (in 5.4), which is a pretty huge thing for the Language of Backwards Compatibility.
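
For the record, the escaping the manual never mentions is close to a one-liner. Here is a minimal sketch of that display.html in modern PHP (not PHP/FI syntax), assuming the form fields arrive in an array shaped like $_POST. render_greeting() is a name made up for illustration; only HtmlSpecialChars() comes from the manual.

```php
<?php
// Hypothetical modern rewrite of the manual's display.html.
// render_greeting() is an illustrative name, not anything from the manual.
function render_greeting(array $post): string
{
    // Read the fields explicitly instead of relying on register_globals.
    $name = (string) ($post['name'] ?? '');
    $age  = (string) ($post['age'] ?? '');

    // Escape for an HTML context before echoing anything back.
    $safeName = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeAge  = htmlspecialchars($age, ENT_QUOTES, 'UTF-8');

    return "Hi $safeName, you are $safeAge years old!<p>";
}

echo render_greeting([
    'name' => 'Bob',
    'age'  => '<script>alert("XSS")</script>',
]), "\n";
```

The script tag comes back as inert &lt;script&gt; text instead of something the visitor's browser will run.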

It’s easy to see how the 2000s were characterized by websites being so easily hackable. The foundations laid in the mid to late 90s were shaky as jello and weren’t systematically rectified in a timely fashion. In my ever so humble opinion PHP still is only a step and a half past this.

A second rather large caveat with Apache 1.0.x is that it does not align double types correctly on most architectures. You may find yourself getting strange bus errors from your httpd when using mod_php.

This isn’t PHP’s fault by any means but the WTF factor just shines bright.

PHP 2 comes with a weird magic trapdoor to configure access controls by appending ?config to the URL you want to configure. The password defaults to your unix username and falls back on “php” if it can’t figure it out. It mentions it’s “a good idea” to change the password but doesn’t actually mention how. (I’m guessing it might be on that magic config page itself.)

Note that the built-in PHP/FI based access control is likely to be discontinued in future versions. You should seriously consider using the security mechanism that comes with your web server instead.

I’m glad they actually deprecated it so quickly because the idea just feels kinda icky.

If a PHP variable is defined by the POST method data, or if the variable is defined by the HTTP daemon in the Unix environment, then GET method data cannot overwrite it. This is to prevent somebody from adding ?REMOTE_HOST=some.bogus.host to their URL’s and thus tricking the PHP logging mechanism into recording this alternate data. POST method data is however allowed to overwrite these variables.

Because as we all know POST cannot be performed by any means except holy magic.
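
To underline how little magic is involved: a POST is just text written to a socket. A minimal sketch in modern PHP that only builds the request as a string and sends nothing. build_raw_post() and the example.com host are placeholders for illustration, and the path is borrowed from the manual's form sample.

```php
<?php
// build_raw_post() is an illustrative helper; it builds the bytes of a
// form-encoded POST request and does not open any connection.
function build_raw_post(string $host, string $path, array $fields): string
{
    $body = http_build_query($fields);

    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($body) . "\r\n"
         . "\r\n"
         . $body;
}

// Placeholder host; the field name is the one from the manual's own example.
echo build_raw_post(
    'example.com',
    '/cgi-bin/php.cgi/~userid/display.html',
    ['REMOTE_HOST' => 'some.bogus.host']
), "\n";
```

Anything that can type those lines into a TCP connection can "submit the form", which is why blocking only GET protects nothing.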

By adding: ?EMAIL_ADDR= to any links on a page where the user’s email address is known, you may propagate it to the next page. The PHP logging system will automatically look for this variable and record its value as the user’s e-mail address in the logs.

Weirdly specific use cases hard coded into the runtime: check. It’s PHP all right.

Of note is that PHP already contained GD image processing bindings at this early stage. This was pretty much the entire reason I got into programming and my dynamically generated forum signature images were swank as heck. Of further note is that the Image* functions are simply dumped into the main (only) namespace – of course – and this was never corrected in later revisions of PHP – of course. Bonus points for the manual linking directly to a specific tarball of GD 1.3 rather than to the website proper.

PHP/FI can also be compiled to automatically escape any forward single quote ( ’ ) and double quote ( ” ) characters found in GET or POST data. If the MAGIC_QUOTES variable is defined in the php.h file then these quotes will be automatically escaped making it easier to pass form data directly to Postgres queries.

Another misfeature that did in fact face the guillotine in PHP 5.4 but at least this one was trying to help.

I’m disappointed that all the database sample code uses select * from table1 as the example so I can’t cite them for straight-up injection in the samples. However it is worth noting that some of these code samples seem to have suffered encoding corruption at some point. In particular the closing brace and semicolon have simply fallen off some lines.

eregi("(ozilla.[23]|MSIE.3)",$HTTP_USER_AGENT); Returns true if client browser is Netscape 2, 3 or MSIE 3.

Why no “M”??? I’m going to lose sleep over this. (It’s a case insensitive regex, so it’s not a clever hack around mozilla vs Mozilla.)
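
For anyone who wants to check that the missing “M” makes no difference on ordinary user agents, here is a small sketch. eregi() was removed in PHP 7, so preg_match() with the /i flag stands in for it; is_old_browser() and the user agent strings are illustrative assumptions, not taken from the manual.

```php
<?php
// is_old_browser() is a hypothetical wrapper: preg_match() with /i stands in
// for the long-gone eregi(). The pattern must not contain a "/" character.
function is_old_browser(string $pattern, string $userAgent): bool
{
    return preg_match('/' . $pattern . '/i', $userAgent) === 1;
}

$manual  = '(ozilla.[23]|MSIE.3)';   // pattern as printed in the manual
$obvious = '(Mozilla.[23]|MSIE.3)';  // same pattern with the "M" restored

$agents = [
    'Mozilla/3.0 (X11; I; Linux)',
    'mozilla/2.0',
    'Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)',
    'Lynx/2.8',
];

foreach ($agents as $ua) {
    // Each line reports whether the two patterns agree for this user agent.
    var_dump(is_old_browser($manual, $ua) === is_old_browser($obvious, $ua));
}
```

Both patterns agree on every string in that list, so the mystery of the dropped letter remains unsolved.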

The following escape sequences are supported in most places where a quoted string argument is used:

\a --> bell
\b --> backspace
\n --> linefeed
\r --> carriage return
\t --> tab
\nnn --> octal char
\xXX --> hex char

Most places. But we won’t tell you the exceptions – that would be too easy!

A couple of functions in the PHP/FI script language epxect octal arguments to denote Unix-style permission parameters. In this octal notation 3 bits are used to represent the values 0-7. Each bit of the three represents a specific permission. Octal is traditionally noted in some contexts by a leading 0, such as 0755. You do not need to use this leading 0 in PHP since the functions that expect octal parameters are will simplyassume that the parameter is octal.

(Typos original.)

Wait hang on

simply assume that the parameter is octal

(sound of implications of deliberately reading a base-ten number in some other lower base for no good reason without mention of what happens if you, say, use the digit 9)

contains sound

my gods it’s full of hideous contortions of spacetime

PHP did in fact later remove this “convenient feature” and begin respecting the base of what you actually passed.
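
To make the difference concrete, here is a minimal sketch in modern PHP. fi_style_mode() is a hypothetical model of “simply assume the parameter is octal”, not the actual PHP/FI implementation; it deliberately sticks to digits 0-7 and makes no claim about what PHP/FI did with a 9.

```php
<?php
// fi_style_mode() is a hypothetical model of the quoted behaviour: it reads
// the decimal digits of its argument as if they had been written in octal.
function fi_style_mode(int $digits): int
{
    return (int) octdec((string) $digits);
}

// The PHP/FI-style reading of 755 matches the octal literal 0755.
var_dump(fi_style_mode(755) === 0755);

// In modern PHP the number is taken at face value, so a forgotten leading 0
// means decimal 755, which is a very different bit pattern in octal.
var_dump(decoct(755));
```

Modern chmod() takes a plain integer, so the leading 0 is now your job: chmod($file, 755) and chmod($file, 0755) ask for different permissions.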

Moving on, the syntax!

Each PHP instruction starts with <? and ends with a >.

I’m curious why this later changed, since this isn’t necessarily bad per se although I prefer <? ?> for obvious visual balance.

(Edit: people on Reddit have suggested this probably caused an ambiguous parse in some cases with respect to the greater-than operator. Now that they mention it, it’s glaringly obvious…)

Three types of variables are supported. Long integer, Double precision floating point and character strings.

We have since gained booleans and NULL. Why yes, of course NULL is a type, why wouldn’t it be?

$a = $b + $c; can do a couple of different things. If $b is a number, the numerical value of $c is added to $b and the sum is stored in $a. In this case the type of $c is irrelevant. The operation is guided by the type of the first variable. If $b is a string, then the string value of $c is appended to $b and the resultant string is placed in $a.

Different people have different feelings on whether or not string concatenation should be a distinct operator. I think it should. (And this is how PHP is now.) But “the operation is guided by the type of the first variable (on the right-hand side)” is a trumpeted warning of surprises when you rearrange things for whatever reason.
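
Here is a rough model of that rule in modern PHP, to show how swapping the operands changes the result. fi_add() is a hypothetical function, and how PHP/FI actually derived “the numerical value” of a string is an assumption: in this sketch numeric strings convert and everything else counts as 0.

```php
<?php
// fi_add() is a hypothetical model of the quoted rule, not PHP/FI's code.
// The first operand's type picks the operation.
function fi_add($b, $c)
{
    if (is_string($b)) {
        return $b . $c;                    // string first: concatenate
    }

    // Assumption: non-numeric values contribute 0 to the sum.
    $number = is_numeric($c) ? $c + 0 : 0;

    return $b + $number;                   // number first: add
}

var_dump(fi_add(1, "2"));   // number first
var_dump(fi_add("1", 2));   // string first
```

Same two values, opposite order, and one call hands back an integer while the other hands back a string.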

$a = "hello"; $$a = "world";

It turns out we’ve always had variable variables, great. (Not great.)
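
And we still have them. A tiny sketch in modern PHP syntax, using the same two assignments as the manual:

```php
<?php
$a = "hello";
$$a = "world";               // creates a variable literally named $hello

var_dump($hello);            // the variable exists under its computed name
var_dump(${$a} === $hello);  // both spellings reach the same variable
```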

<?
      if($a==5 &&  $b!=0 );
          $c = 100 + $a / $b;
    endif;
>

Check the semicolon weirdness. Eww. It also simultaneously supports C-style with curly braces. Because let’s complicate the parser with two styles. Fortunately PHP grew out of this OH WAIT.

This is the list of “valid operators”:

<? $a = 2 + 1 > Addition
<? $a = 2 - 1 > Subtraction
<? $a = 2 * 1 > Multiplication
<? $a = 2 / 1 > Division
<? $a = 2 % 1 > Modulus
<? $a = 2 ^ 1 > Bit-wise Exclusive OR

The C-like incremental operators += and -= are supported. ie. <? $a += $b>

But not *= or /= ?

The C-like bit-wise operators &=, |= and ^= are supported. ie. <? $a &= 4>

Okay, so besides += and -= we can only self-assign with bitwise operators. Weird.

This is equivalent to: <? $a = $a & 4>

BUT ACCORDING TO YOUR OWN FREAKING TABLE THAT’S NOT AN OPERATOR!!!!!?!

Okay. Sit down for this next one. It’s a doozy.

A previous section talked about GET and POST method data and variables. If you think about it, you may be able to envision a security issue.

A security issue? Never.

bla bla bla GET is bad

PHP provides a SecureVar() function which is used to mark variables names as being secure variables. These secure variables can only be set directly in a PHP script, or they can come from a POST method form. They cannot be set using the GET method variable definition mechanism. From our above scenario, if we placed the line: <?SecureVar("data")> near the beginning of our second page, then the GET method trick would not work. The “data” variable would appear to be empty unless it came directly from the POST method form on the first page.

Please note that POST-method forms are not intrinsically secure. People can emulate the posting of any data to a form by simply telnetting to the HTTP port on your system. You need to take appropriate security measures to stop people from doing this if in fact security is a concern.

Do I even need to say anything? Okay, fine.

PHP 2.0: Make useless feature. Name it “Secure”. Note that it’s not secure. Offer no actual advice on how to do it correctly. Brilliant!

People defend PHP by saying it’s a lot better than it used to be.

Well, they’re not wrong.

PS I wrote a novel and it has nothing to do with hateblogging about PHP.