PHP Manual Masterpieces

Jun 3

Unsettling

isset — Determine if a variable is set and is not NULL

We’re not even through the first sentence of this manual page and I am already wondering why we’re conflating two completely different definitions of a variable being “set” in one function. This function simultaneously tests whether the variable name exists and whether the value of the variable is non-null. In my personal opinion, those are two very different questions. I don’t need a special snowflake function to test if the contents of a variable are null, but I do need one to inquire of the runtime if a variable name has been declared. (Insofar as PHP declares anything, which it doesn’t, really.)
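To make the two questions concrete, here is a minimal sketch that keeps them apart for array keys, where array_key_exists() answers the existence question on its own. The helper name describe_key() and the sample array are made up for illustration.

```php
<?php
// Hypothetical helper: reports existence and null-ness separately,
// instead of folding both into one boolean the way isset() does.
function describe_key(array $vars, string $name): string
{
    if (!array_key_exists($name, $vars)) {
        return 'missing';
    }
    return $vars[$name] === null ? 'null' : 'set';
}

$vars = ['a' => 2, 'b' => null];

echo describe_key($vars, 'a'), "\n"; // set
echo describe_key($vars, 'b'), "\n"; // null
echo describe_key($vars, 'c'), "\n"; // missing

// isset() cannot tell the last two cases apart:
var_dump(isset($vars['b'])); // bool(false)
var_dump(isset($vars['c'])); // bool(false)
```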

Special snowflake functions are kind of the name of the game with PHP, though. We’ll get back to that…

… because isset() isn’t even a function. It’s a godsdammed token. A token. Parsed by the parser. This is like sizeof() in C, except it does run at runtime rather than returning a constant at the parsing stage, unless the PHP parser has solved the halting problem. The documentation calls it a “language construct,” implying, well, I don’t know what that implies, because everything is a language construct. This particular quirk has already been covered in the masterpiece of PHP hate blogging, but it bears some pondering. Note that isset() is documented under the list of builtin PHP functions, and “function” is in the URL. It returns a bool, like a function would. The primary difference is that you get a parse error if you pass an expression rather than a comma-delimited list of one or more variables.

I think I have reconstructed the thought process behind this, and it’s a good point: “you know, PHP needs a way to inspect its environment at runtime to determine whether a particular variable has been instantiated. This is obviously perfectly suited for a built-in function. But wait! What if they call isset($a + "b"), which is not a bare variable?”

php > $a = 2; if(isset($a+"b")) { echo "true\n"; } else { echo "false\n"; }

Parse error: parse error, expecting `','' or `')'' in php shell code on line 1

My solution, assuming we can’t interpolate the result of the expression and arrive at “ab”, would be to return false. It’s not a variable. Therefore it is not set. Perhaps one would prefer it to return NULL, indicating the question has no answer. I’m big on generalizing and minimizing special snowflake cases. This is not, in my opinion, such a profound problem of engineering as to justify a massive exception to the concept of functions.

Of course, isset() is not the only pseudo-function implemented like this; empty(), unset(), and probably others exist. However:

php > const a = 2; if(defined('a')) { echo "true\n"; } else { echo "false\n"; }

true

php > const a = 2; if(defined('a'+'b')) { echo "true\n"; } else { echo "false\n"; }

false

Look at that! defined() uses the exact implementation I have suggested, but isset() does not. You can throw anything into the argument of defined() - some numbers, NULL, whatever - and it will return false without throwing a hissy fit. I note that defined() takes a string argument, which is, I assume, the key implementation difference (please don’t make me go look, I don’t want to go back to the dark place). (Also, allow me to complain how these functions are vaguely named, which requires that you look them up in documentation.) Seriously, what the hell is this, what are you doing with your life PHP, special casing all over the place and doing so inconsistently, didn’t your mother ever tell you - deep breath, sorry about that. I get really worked up about programs sometimes.

Anyway - remember what I was saying about special snowflake functions?

The filter class apparently has its own, slightly different implementation, which will only test the arrays used to store user inputs. Why did they need to clutter up the runtime with a nearly identical implementation instead of isset()? Perhaps they were annoyed by stupid, pointless parse errors and reimplementing under the standard library’s nose was easier than getting it changed. I don’t know. Perhaps there was a really good reason they needed a separate isset(), having to do with the undocumented fact that it tests the original input array sans any runtime changes. I don’t know, because PHP never, ever documents the reason for squat.

php > if(isset(NULL)) { echo "true\n"; } else { echo "false\n"; }

Parse error: parse error, expecting 'T_PAAMAYIM_NEKUDOTAYIM' in php shell code on line 1

wat

Mar 9

The more you know ♒⭐

Did you know that PHP has two entirely different syntax styles for if/else clauses?


if ($a > $b) {
echo "a is bigger than b";
}


if($a > $b):
echo $a." is greater than ".$b;
endif;

These are both lovely and sensible ways to delineate control structures. Having both in the same programming language, however, is madness.

Did you know that mixing these two styles in close proximity can result in baffling errors, even if they’re nested scopes that seem conceptually separate?

Did you know that the brace style accepts both “elseif” and “else if”, but colon style only accepts “elseif”?

I wouldn’t lie to you. It really is that nonsensical.
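For the record, a small sketch of the same three-way comparison written in both styles. The function names are invented for this example; the point is only the elseif spelling in each.

```php
<?php
// Brace style: "else if" (two words) and "elseif" are both accepted.
function compare_braces(int $a, int $b): string
{
    if ($a > $b) {
        return 'bigger';
    } else if ($a === $b) {
        return 'equal';
    } else {
        return 'smaller';
    }
}

// Colon style: it has to be the single word "elseif";
// writing "else if" here is a parse error.
function compare_colons(int $a, int $b): string
{
    if ($a > $b):
        return 'bigger';
    elseif ($a === $b):
        return 'equal';
    else:
        return 'smaller';
    endif;
}

echo compare_braces(3, 2), ' ', compare_colons(3, 2), "\n"; // bigger bigger
echo compare_braces(2, 2), ' ', compare_colons(2, 2), "\n"; // equal equal
```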

Did you know this can also be done with loop structures? Did you know the documentation offers no rationale for why these two different syntaxes were implemented, complicating the parser and increasing opportunity for weird bugs because…?

The more you know, the more you’re left to wonder.

Mar 5

I am so comically angry right now

Hey uhh so strcmp() huh pretty simple I mean what could be simpler

Returns < 0 if str1 is less than str2; > 0 if str1 is greater than str2, and 0 if they are equal.

Pretty straightforward right how could this possibly be messed up it’s freaking strcmp either the strings are equal or they ain’t right

http://danuxx.blogspot.co.uk/2013/03/unauthorized-access-bypassing-php-strcmp.html

GODS FREAKING DARN IT ALL TO HECK.

This is why I hate you, PHP! This is why I consider you an unmitigated failure, a pox on the good name of programming! 

Dear language designer, pick one:

  • A language with a very casual typing system;
  • A language with standard API functions that cannot return an error indicating incompatible types were passed at runtime.

NOT BOTH. YOU CAN’T HAVE BOTH.

When I shared this on Twitter, legions of people showed up to say it was the programmer’s fault for using strcmp(). Riddle me this: is strcmp() part of the standard PHP API? Yes. Is it deprecated? No. Are there any caution notes in the official documentation? No. I don’t care how hard you try to blame the programmer, this is fundamentally bad language design. It’s a trap.
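For anyone who skipped the link: the trap, as far as it can be reconstructed from the write-up, is that on PHP versions of that era handing strcmp() an array instead of a string made it return NULL with a warning, and NULL == 0 is true under loose comparison. (PHP 8 throws a TypeError instead.) What follows is a minimal defensive sketch, not the linked author's fix; the function name and sample values are hypothetical.

```php
<?php
// Hypothetical login check. $supplied stands in for something like
// $_POST['password'], which can arrive as an array if the client
// sends password[]=x instead of password=x.
function password_matches($supplied, string $expected): bool
{
    // Refuse anything that is not a string before comparing.
    if (!is_string($supplied)) {
        return false;
    }
    // hash_equals() (PHP 5.6+) returns a real bool, so there is no
    // "NULL == 0" surprise to fall into.
    return hash_equals($expected, $supplied);
}

var_dump(password_matches('hunter2', 'hunter2')); // bool(true)
var_dump(password_matches('wrong', 'hunter2'));   // bool(false)
var_dump(password_matches(array(), 'hunter2'));   // bool(false)
```

A real application would store a hash and call password_verify(); the plain comparison here only mirrors the shape of the vulnerable code.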

You can easily cut yourself on the standard API of many languages. The difference is that languages like C are labeled “sharp knife” and PHP is labeled “child-safe scissors.” Cutting yourself should not be this easy.

Dear sweet merciful angels I want to stab this language, twist the knife, yank out its still-beating heart and smash it against a rock.

 (╯°□°)╯︵ ┻━┻  0xabad1dea OUT.

when will you start doing some more articles ?

Anonymous

Oh, sorry! I haven’t been forced to look at any PHP documentation for a while, can you tell? :)

For once, I’m on Internet Explorer’s side

My “pick a page at random and just read” strategy continues to pay off.

You just know that the documentation for the upload features of PHP is going to have some really terrible implementations of handling file uploads in the comments. And it most certainly does.

Simply not validating uploads, however, is so pedestrian. We want to go the extra mile, and the solution lies in a place you’d never expect for a PHP wonder.

From a comment of 2007 vintage:

As far as I understand IE has his own MIME types based on the values stored in a registry. To locate this “feature” I spent a lot of time and was granted with a perfect headache. :) In my case I tried to upload a CSV file on a server and abort a script in case if the corresponding file isn’t of a desired type. And it work fine with Opera and stuck with IE. So the workaround is that you should add this values in your windows registry (I have windows xp box and it works fine for me)

  [HKEY_CLASSES_ROOT\.csv]
  "Content Type"="application/vnd.ms-excel"
  @="Excel.CSV"
  [HKEY_CLASSES_ROOT\.csv\Excel.CSV]
  [HKEY_CLASSES_ROOT\.csv\Excel.CSV\ShellNew]

I actually wasn’t sure at first if the OP meant to modify the registry on the client or server side, because if I’ve noticed a trend in PHP manual comments, it’s that a surprisingly large portion of them are clearly running PHP on Windows servers. (Is this representative of PHP programmers in general, or only the kind that post comments on manual pages? More data needed.) After carefully reading it about a dozen times, however, I am pretty sure they mean client side. Let that sink in.

“Want to use my website? Modify your Windows registry!”

How… what… I mean… I know IE is a non-compliant pain to work with (especially back in 2007), but come on! Did it really never occur to this person that maybe they were not implementing things in the ideal way? Client-specified mimetypes, just like server-specified mimetypes, are completely arbitrary. The fact that IE doesn’t come with a preset for CSV files should have been the dead giveaway that made this programmer pause and reconsider their solution.

I bet they had some stupid case statement for the slightly different mimetypes returned by four different kinds of browsers, too, and that it broke every few months, and it still never occurred to this person that their solution was not on solid ground. I simply can’t have faith that it went down any other way.
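One way onto more solid ground is to ignore the browser-supplied type entirely and look at the file itself. This is a rough sketch under stated assumptions: the helper name is invented, the expected column count is something the application would have to know, and "every row parses with the same number of fields" is only a cheap plausibility check, not proof of a well-formed CSV.

```php
<?php
// Hypothetical validator: never consults the client-supplied MIME type.
// It only checks that each row parses as CSV with the expected width.
function looks_like_csv(string $path, int $expectedColumns): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    $rows = 0;
    // Length 0 means no line-length limit; separator, enclosure and
    // escape character are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) !== $expectedColumns) {
            fclose($handle);
            return false;
        }
        $rows++;
    }
    fclose($handle);
    return $rows > 0;
}

// Demo with a throwaway file standing in for an upload.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "name,score\nalice,10\nbob,7\n");
var_dump(looks_like_csv($tmp, 2)); // bool(true)
var_dump(looks_like_csv($tmp, 3)); // bool(false)
unlink($tmp);
```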

Oct 9

Drop the ball

When your language manual straight-up has a page called Type Juggling, you’re gonna have a bad time.

Juggling is hard enough without adding knives and flaming brands to the mix, don’t you think?

From an oooold comment of the 2005 vintage:

In my much of my coding I have found it necessary to type-cast between objects of different class types.

function ClassTypeCast(&$obj,$class_type){
    if(class_exists($class_type,true)){
        $obj = unserialize(preg_replace"/^O:[0-9]+:\"[^\"]+\":/i", 
      "O:".strlen($class_type).":\"".$class_type."\":", serialize($obj)));
    }
}

That’s a bit dense, so let it sink in. This person is serializing an arbitrary class, using regex to replace the serialized class name and its length, and unserializing to coerce PHP into some ghastly faux “typecast” of two different classes. I don’t even want to know what can go wrong here, because it can’t be pretty. I want to slam the lid shut on this idea and lose the key down a hell pit.

Just kidding, let’s go Pandora on this box of badness.

class MyClass {
    private $foo = null;

    function setString($str) {
        $this->foo = $str; }
    
    function awful() {
        return "This is awful!"; }
}

class MyClass2 {
    private $foo2 = null;

    function setString2($str) {
        $this->foo2 = $str; }
    
    function awful() {
        return "This is awesome!"; }
}

  $my1 = new MyClass();

  $my1->setString("kill me now");

  ClassTypeCast($my1, "MyClass2");

  echo $my1->awful() . "<br>";

  var_dump($my1);

  This is awesome!
  object(MyClass2)#2 (2) { ["foo2:private"]=> NULL 
  ["foo:private"]=> string(11) "kill me now" }

So as you can see, a quick test suggests that the resulting mangled class will have the data members of both classes. How in tarnation does that even happen, PHP? When there is a naming conflict, it seems the new type takes precedence, but you get all these lingering hangers-on if the two types didn’t have the exact same method and variable names. I bet there are all sorts of difficult-to-foresee side effects resulting from tampering with this nonsense.

The only saving grace of this post is that they actually managed to lose a critical punctuation mark somewhere in there (an exercise for the reader), which causes it to fail to compile as written, which should stop the first tier of naive PHP programmers from cut-and-paste catastrophe.

Put down the serialize() and back away slowly.
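If turning an object of one class into another really is necessary, a far less alarming route is an explicit conversion method that copies exactly the fields you mean to copy. This is a minimal sketch with two made-up classes (LegacyNote and ModernNote); nothing here comes from the original comment.

```php
<?php
// Hypothetical source class.
final class LegacyNote
{
    private $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function getText(): string
    {
        return $this->text;
    }
}

// Hypothetical target class with an explicit, named conversion.
final class ModernNote
{
    private $body;

    private function __construct(string $body)
    {
        $this->body = $body;
    }

    // Only the data named here crosses over; no stray private
    // members from the old class come along for the ride.
    public static function fromLegacy(LegacyNote $old): self
    {
        return new self($old->getText());
    }

    public function getBody(): string
    {
        return $this->body;
    }
}

$note = ModernNote::fromLegacy(new LegacyNote('kill me now'));
echo $note->getBody(), "\n"; // kill me now
```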

Oct 8

Layers of mistaken, like an ogre.

From the comments section on the documentation of microtime:

Something I noticed: If you directly echo the time difference between start and the end you’ll get a negative value most of the time.

  <?php
  //This will produce in negative value:
  $start = microtime();
  for($i=100;$i>0;$i--){
  echo $i;
  }
  $end = microtime();

  echo 'Took:'.$end-$start.' seconds';
  ?>

(I’d like to note that tumblr does not mess up the indentation, it comes to us exactly so.)

Running this script exactly as given, I received this output:

-0.24061 seconds

Oh wow, they’re right! That’s really wei - - - - Hang on. echo 'Took:'.$end-$start.' seconds'; Look at this line as long as you need to. Something looks different, right?

Where did the “Took:” go in the output? Something’s wrong. This coder has made a crucial mistake regarding the functionality of microtime(), but before we even worry about that…

  $foo = 17;
  $bar = 15;
  echo "Concatenation".$foo."rocks!";
  echo "Concatenation".$foo;
  echo "Concatenation".$foo-$bar."rocks!"; // LOOK HERE
  echo "Concatenation".($foo-$bar)."rocks!";

  Concatenation17rocks!
  Concatenation17
  -15rocks! // AND HERE
  Concatenation2rocks!

You know what? It took me a really long time to work out how PHP arrived at this bizarre result. Now I would have supposed that the concatenation operator would have either higher or lower precedence than the subtraction operator, depending on the mood of the language designer, but to my surprise, in PHP they have the same precedence, meaning we fall back on the left-to-right rule. This means we have:

(("Concatenation" . $foo) - $bar) . "rocks!";

which is equivalent to:

("Concatenation17" - 15) . "rocks!";

When you recall that a string cast to integer yields 0 if it doesn’t begin with an ASCII-encoded number, suddenly this makes sense. 0 - 15 is in fact -15, giving us “-15rocks!”. That’s one problem down.

The other problem, as I already hinted, is API misuse. microtime() returns a string that contains representations of two values with a space in between, like this: 0.33485800 1349741452. Coercing that to a number will only run up until the space, meaning that subtracting two consecutive results is not meaningful if we’ve crossed a seconds boundary in the meantime. We can fix this simply by changing the calls to microtime(true) which, per the documentation, returns a normal float value of those two numbers added together. With that in mind…

  // I think you will find this altogether more what you wanted.
  $start = microtime(true);
  /* time waster goes here */
  $end = microtime(true);
  
  echo 'Took:'.($end-$start).' seconds<br>';
  // I don't like e-notation, personally
  // (and ironically this circumvents the precedence problem)
  printf ("Took %f seconds<br>",$end-$start);


  Took:2.4080276489258E-5 seconds
  Took 0.000024 seconds

That looks an awful lot like a reasonable result. I declare victory over PHP yet again.
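For the curious, here is a small sketch of what goes wrong with the string form and how it could be stitched back together by hand. The helper name is hypothetical, and the sample value is the one quoted earlier; in practice microtime(true) makes all of this unnecessary.

```php
<?php
// Hypothetical helper: turns the "usec sec" string form of microtime()
// into a single float, which is what microtime(true) already returns.
function microtime_string_to_float(string $stamp): float
{
    list($usec, $sec) = explode(' ', $stamp);
    return (float) $sec + (float) $usec;
}

$sample = '0.33485800 1349741452';

// A plain numeric cast stops at the space and keeps only the fraction:
var_dump((float) $sample); // float(0.334858)

// Splitting first keeps the whole timestamp:
printf("%.6f\n", microtime_string_to_float($sample)); // 1349741452.334858
```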

Oct 7

Revolutionary input validation

int frenchtojd ( int $month , int $day , int $year )

Converts a date from the French Republican Calendar to a Julian Day Count. These routines only convert dates in years 1 through 14 (Gregorian dates 22 September 1792 through 22 September 1806). This more than covers the period when the calendar was in use.

I am so glad we have this in our standard library. This is such a critical use case of calendars on web sites.

I searched Google Code and Github and I honestly could not find a single proper user of this function. All the hits were the string just appearing in geshi’s syntax highlighting token tables. It’s made it into some calendar conversion widgets because it just happens to be there, I guess.

Notice, however, that the documentation does not say what happens if you pass in invalid data. Does it return a negative number? Raise an error condition of some kind? Return Napoleon’s birthday? Who freaking knows! I’d try to find out empirically, but my online PHP shell has actually disabled this function for vague “security reasons.”

So again, let’s risk heart attack and check out the source code, which has not been modified in eleven years.

Zero is returned when the input date is detected as invalid or out of the supported range. The return values will be > 0 for all valid, supported dates, but there are some invalid dates that will return a positive value. To verify that a date is valid, convert it to SDN and then back and compare with the original.

… I’m not even going to snark about that.
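For what it’s worth, the round trip that the source comment recommends might look something like this. It is a sketch under stated assumptions: the helper name is made up, it needs PHP’s calendar extension, and it relies on jdtofrench() returning its documented "month/day/year" string.

```php
<?php
// Hypothetical helper implementing the "convert there and back, then
// compare" check suggested by the source comment.
function is_valid_french_date(int $month, int $day, int $year): bool
{
    $jd = frenchtojd($month, $day, $year);
    if ($jd === 0) {
        return false; // rejected outright as invalid or out of range
    }
    // A date that silently rolled over into another one will not
    // survive the trip back unchanged.
    return jdtofrench($jd) === "$month/$day/$year";
}

if (function_exists('frenchtojd')) {
    var_dump(is_valid_french_date(1, 1, 1));   // first day of year 1
    var_dump(is_valid_french_date(13, 10, 2)); // day 10 of the short 13th month
    var_dump(is_valid_french_date(1, 1, 15));  // outside years 1 through 14
}
```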

Anticipating hate mail from exactly one history professor in France who’s been using this for eleven years and doesn’t want to have to reimplement this.

Oct 7

The documentation clearly says raptors.

From the comment section of the documentation on exceptions:

To continue the execution code after throw new Exception, goto operator can be used, like this:

<?php
try {
    echo 'one';
    throw new Exception('-error-'); a:
    echo 'two';
} catch (Exception $e) {
    echo $e->getMessage();
    goto a;
}
//output: one-error-two
?>

AAAAAAAAA - hang on, when did PHP get goto? 5.3? All right then - AAAAAAH WHAT ARE YOU DOING.

Now before you send me hatemail, I’m an asm programmer and I actually like goto, in the proper context. It’s probably a good thing PHP didn’t have goto back in the 4.2 days when I was learning, or I would have thought ‘twas the neatest thing and used it everywhere. PHP’s goto apparently has some restrictions on what sort of scopes you can jump between, which is a good thing.

Apparently, jumping from a catch block into the middle of a try block is not one of these restrictions.

I’m on vacation right now, and I’ve been using this online thingie to test the PHP snippets I’ve been posting about. I just noticed it’s actually still on PHP 5.2, so I haven’t gotten to witness this executing - but I will take it on faith that it works.

Try-catch is one of the most structured concepts there is in programming, and goto is the sworn enemy of that. If you want to make sure that the next thing after an exceptable line in a try block always executes, don’t put it in the same try block! Just put it bare after the try/catch or, if it’s also exceptable, in another try block. (I feel like “exceptable” may not be the scientific term.) Obviously the example in the comment is a proof-of-concept, but you are going to get yourself into trouble so fast like this, unknown commenter. Do not trifle with the gods of program control flow, for they are subtle and quick to segfault. (Normally I wouldn’t worry about segfaulting in an interpreted language, but, well, PHP. It happens.)
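A minimal sketch of that restructuring, producing the same output as the commenter’s example without any goto. The wrapper function run_steps() is invented purely to make the snippet self-contained.

```php
<?php
// Hypothetical stand-in for the commenter's example: only the line
// that can throw lives inside the try block.
function run_steps(): string
{
    $out = 'one';
    try {
        throw new Exception('-error-');
    } catch (Exception $e) {
        $out .= $e->getMessage();
    }
    // Sits after the try/catch, so it runs whether or not
    // the exception was thrown.
    $out .= 'two';
    return $out;
}

echo run_steps(), "\n"; // one-error-two
```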

Let me put on my security auditor hat. When you use goto to defeat control flow, you are making it a lot freaking harder to verify the correctness of your program. Terrible, wicked bugs will hide in the nooks and crannies of your supposed cunning to devour you. More importantly, your auditor will go looking for her murderin’ axe that she keeps in that closet you’re not allowed to open.

And if you’re still not convinced, if you check the documentation on goto, it clearly says beneath the examples that if you actually use this feature, you will be eaten by raptors. (Who approved this for mainline?!)

Oct 6

Two’s complewhat

I continued reading the comments on the bizarre dechex conversion function which works on an “unsigned int” type that doesn’t actually exist in PHP. It says that negative signed numbers shall be treated as unsigned.

Someone took this to mean it creates incorrect results for negative numbers and tried to roll their own, which could produce the negative hex representations of negative integers. Ignore the fact that it doesn’t prepend 0x to the output; neither does dechex().

function dec_to_hex($dec) 
{ 
    $sign = ""; // suppress errors 
    if( $dec < 0){ $sign = "-"; $dec = abs($dec); } 

/* ... an array-index based algorithm goes here */
    
    return $sign . $h; 
} 

If you pass 256, you get the output 100. If you pass -256, you get the output… -100. With the literal unary operator.

This opens an interesting question about what maketh a negative number. In a purely abstract mathematical sense, slapping a negative sign on a positive number is “correct” in any base. But this is computer programming, and hexadecimal is special, because we use it to map directly to literal bit values whereas decimal is an abstraction.

Assuming we add the 0x prefix (after the unary dash) to make real hexadecimal number tokens out of these results, we can do math like this: 257 + -0x100 returns 1 as expected. This is because PHP is taking the unary operator and doing a two’s complement negation of 0x100 at runtime. -0x100 is not the actual hex representation of -256, it is an expression that evaluates to it. It’s like returning the string “2 + 3” and saying you’ve returned “5”. Only… sort of.

The actual negation of 0x100 is 0xFFFFFF00 on 32-bit or 0xFFFFFFFFFFFFFF00 on 64-bit (note the lack of negative signs). If you are unsure where all those F’s came from, check out two’s complement representation. Honestly, this is a very complex subject and I don’t blame beginner programmers, or ones who have only ever dealt with scripting languages, for falling down on this.

Here’s the thing, though. If you call dechex(-256), you will get ffffff00 on a 32-bit build (ffffffffffffff00 on 64-bit), which is exactly the two’s complement bit pattern shown above and the correct result, even though the documentation explicitly warns that it coerces input to be unsigned. When you understand why, you have achieved binary representation nirvana.
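To see those bit patterns without caring which build you are on, here is a minimal sketch that masks the value down to a chosen width before handing it to dechex(). The helper name and the default width of 32 bits are assumptions for illustration.

```php
<?php
// Hypothetical helper: the two's complement bit pattern of $n as hex,
// truncated to $bits bits (32 by default) and zero-padded.
function twos_complement_hex(int $n, int $bits = 32): string
{
    // If the requested width covers the whole native integer,
    // there is nothing to mask off.
    $mask = $bits >= PHP_INT_SIZE * 8 ? -1 : (1 << $bits) - 1;
    return str_pad(dechex($n & $mask), intdiv($bits, 4), '0', STR_PAD_LEFT);
}

echo twos_complement_hex(256), "\n";    // 00000100
echo twos_complement_hex(-256), "\n";   // ffffff00
echo twos_complement_hex(-1, 16), "\n"; // ffff
```

Note that no minus sign ever appears in the output: the sign lives in the high bits, which is the whole point.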

I did not understand hexadecimal, binary, signed and unsigned, and casting - all concepts which PHP exposes but does not handle very gracefully - until PHP stopped being my only programming language. And that’s your PSA for the day.