PHP’s humble equality operator can return some confusing results. Sharon Lee Levy digs into the Zend engine’s source code to find a solution.

Equality in PHPland


One of the questions posed during the Q&A period of my talk “PHP: Quirks, Gotchas & Wizardry” at the Web and PHP Conference in September 2013 concerned how to apply the equality operator appropriately, in light of its potential for misuse. The question raises another: why does PHP even bother to retain that operator?

PHP has two operators to test for equality, “==” and “===”, respectively equality and identity, similar to JavaScript. In his book JavaScript: The Good Parts, Douglas Crockford advises users to avoid double-equals, which he deems evil. He claims that “==” and “!=” present issues because “…they attempt to coerce the values. The rules by which they do that are complicated and unmemorable.” His words might also describe the corresponding PHP operators, as a review of the equality comparison tables shows. A noteworthy difference between equality in PHP and in JavaScript involves comparing null with false, which yields true in PHP but false in JavaScript.

Years ago, PHP developers favored equality far more than identity. Since the double-equals symbol occurs in math as well as the C programming language, often part of the skill set of PHP’s early adopters, the bias naturally appeared. Nowadays, after misadventures in PHPland, you may hear developers advocate vigorously for replacing the loosely-typed equality with the stricter identity, suggesting that otherwise there may be hell to pay.

The following snippet, devoid of hashing for simplicity’s sake, vividly illustrates the potential peril of coding with ‘==’:
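The listing itself is missing here. A minimal sketch of the kind of check described might look like the following; the values and the name $stored_pass are hypothetical, chosen only for illustration. One assumption about versions matters: the loose match succeeds on PHP 5 and 7, whereas PHP 8 changed how numbers are compared with non-numeric strings, so there the same test fails.

```php
<?php
// Hypothetical values: the stored password is the integer 1234,
// while the user submits a string that merely starts with 1234.
$stored_pass = 1234;
$user_pass   = "1234 is not my password";

// Loose equality: on PHP 5 and 7 the string is read as the number 1234,
// so this is true; PHP 8 compares the operands as strings, so it is false.
$loose_match = ($user_pass == $stored_pass);

// Identity: the types differ (string versus integer), so this is false.
$strict_match = ($user_pass === $stored_pass);

var_dump($loose_match, $strict_match);
```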


$user_pass validates because strings beginning with a number acquire that value in a numeric context. So, you may do crazy addition, as follows:
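The addition listing is also missing. As a rough sketch with hypothetical operands, two strings that begin with digits can be summed as though they were numbers:

```php
<?php
// Hypothetical operands: each string starts with a digit.
$left  = "5 apples";
$right = "3 oranges";

// The leading digits supply the numeric value of each operand.
// PHP 7 and later also raise a notice or warning for such strings;
// the @ operator silences it for this demonstration.
$sum = @($left + $right);

var_dump($sum); // int(8)
```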


If you replace equality in the previous password example with identity, the operands fail to match in type and consequently differ in value, too, denying the user access.

The website phpsadness.com displays a stunning, related comparison of two numeric strings, each exceeding the boundaries for integers. The code provoked a heated online debate. Formerly, PHP converted such strings to floats, despite precision issues. Core contributor Gustavo Lopes resolved the matter by modifying the Zend engine’s zendi_smart_strcmp to adhere to the following rule enunciated in a subsequent related discussion:

If both strings look like integers (no decimal separator nor exponent) but they were both converted to doubles because of being too large in absolute value, if they both compare equal in a double comparison, and if they're both larger than 2^53-1 in absolute value, then compare them as a string.
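To make the rule concrete, here is a small sketch with two hypothetical strings that are too large for a 64-bit integer and differ only in the last digit. It assumes a PHP version that already includes the fix described above.

```php
<?php
// Two integer-like strings beyond the 64-bit integer range,
// differing only in their final digit.
$a = "9223372036854775808";
$b = "9223372036854775809";

// As floats the two values are indistinguishable ...
$equal_as_floats = ((float) $a == (float) $b);

// ... so, under the rule quoted above, == falls back to a string comparison.
$equal_loosely = ($a == $b);

var_dump($equal_as_floats); // bool(true)
var_dump($equal_loosely);   // bool(false)
```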

This description should have reassured users about the reliability of equality but the fallout from this episode may have soured some on double-equals.

Applicability of Equality

One should keep an open mind; sometimes equality offers the best choice. Suppose you were to compare objects, as follows:

$objA = new StdClass();
$objA->clr = "blue";

$objB = new StdClass();
$objB->clr = "blue";

$objC = $objA;

var_dump( $objA ==   $objB);  // true
var_dump( $objA === $objB);   // false

$objC->clr = "green";

var_dump( $objA === $objC);  // true
var_dump( $objA ==   $objB); // false

Since $objA and $objB are instances of the same class rather than referring to the same object, the only sensible basis for comparison involves equality. By means of “==”, PHP saves you from having to manually check whether the instances share the same properties and values. Changing the color of $objC affects $objA, of course, destroying the equivalency of $objA with $objB since these instances now differ in their respective color value.

The other scenario for “==” involves numeric strings, which web applications encounter in data sent via HTTP, i.e. form data which, whether originally character-based or numeric, arrives as text. Databases, too, generally return data as strings regardless of type. Using “==” simplifies the task of comparing numbers with strings, as follows:
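The original listing is missing; the sketch below only assumes its general shape. The names $requested, $songs and $shown, and the song titles, are hypothetical.

```php
<?php
// Hypothetical stand-in for a value such as $_GET['track']: always a string.
$requested = "1";

// Hypothetical catalogue keyed by integer track number.
$songs = array(1 => "First Song", 2 => "Second Song");

$shown = array();
foreach ($songs as $number => $title) {
    // "1" == 1 is true: the numeric string is compared as a number.
    if ($requested == $number) {
        $shown[] = $title;
    }
}

print implode("\n", $shown) . "\n"; // First Song
```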


If the code were to use “===”, the first song title would fail to display since the expression would evaluate as false. The greatest utility of double-equals lies in letting you conveniently compare different data types with one another.

Consider the following code, written in Python 2.51:

name = "Wanda"
print "W"+"anda"           # Wanda
print name is "W"+"anda"   # true
print name == "W"+"anda"   # true

I’ve moved it over to PHP for kicks with the least amount of tampering:


PHP requires the dot operator for concatenation instead of the ‘+’ symbol, which avoids the ambiguity of an expression that could mean either addition or string joining. But let’s live with the code as is and see what transpires. The two strings sum to zero, since each evaluates as zero in the numeric context afforded by the plus operator. The third statement results in false since “Wanda” is not identical to zero. False fails to make an appearance for performance reasons, even though its absence may surely rankle some users. In the last line, the double-equals coerces $name to acquire the value of zero owing to the numeric context, so the comparison evaluates as true, which the print construct renders as "1" in string context.
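For contrast, here is a sketch of the same test written with the dot operator. The variable $joined is introduced purely for illustration. With concatenation no numeric context arises, and both identity and equality hold:

```php
<?php
$name = "Wanda";

// The dot operator joins strings, so no numeric conversion takes place.
$joined = "W" . "anda";

print $joined . "\n";        // Wanda
var_dump($name === $joined); // bool(true): same type, same value
var_dump($name ==  $joined); // bool(true)
```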

If we alter the context and compare “Wanda” with Boolean data, then $name will temporarily acquire the value of true, as follows:
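A minimal sketch of that comparison, assuming the same $name as before (the result variables are hypothetical names):

```php
<?php
$name = "Wanda";

// With a Boolean operand, the string is converted to Boolean first;
// any string other than "" and "0" converts to true.
$matches_true  = ($name == true);
$matches_false = ($name == false);

var_dump($matches_true);  // bool(true)
var_dump($matches_false); // bool(false)
```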


The Transitivity Issue

The author of “PHP: a fractal of bad design” criticizes PHP’s equality for lacking transitivity and so does www.phpwtf.org. Transitivity is a property of equality according to mathematical logic, such that if a equals b, and b equals c, then a must equal c. So, if equality is not transitive in PHP, must you scrupulously avoid its use? Mercifully, PHP compensates by allowing you to alter the context to obtain the desired result. Consider the following:
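The listing did not survive, so the following is only a sketch with hypothetical operands chosen to fit the discussion of this example. Note the version assumption on the first comparison: it holds on PHP 5 and 7 but not on PHP 8.

```php
<?php
// Hypothetical operands: a string without digits, zero, and false.
$a = "abc";
$b = 0;
$c = false;

$a_eq_b = ($a == $b); // true on PHP 5 and 7; false on PHP 8 (string comparison)
$b_eq_c = ($b == $c); // true: 0 converts to Boolean false
$a_eq_c = ($a == $c); // false: a non-empty string other than "0" is true

var_dump($a_eq_b, $b_eq_c, $a_eq_c);
```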


Behind the scenes temporary conversions occur with respect to the above snippet. There are all sorts of terms for describing this action: type juggling, type promotion, typecasting, and autoconversion. Take your pick! The “==” operator converts “oranges to apples,” so PHP may compare “apples with apples.” The first line compares a string with an integer, which bestows numeric context and causes the operands to compare true. But a Boolean context distinguishes the next line which promotes the left operand to Boolean false. The last line may seem odd until you recall that in PHP in Boolean context only the empty string and “0” evaluate as false; every other string is true.

If you’re displeased with the preceding example’s final result, you can simulate transitivity by manually manipulating the context, as follows:
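Since the listing is missing, here is a hedged sketch of such a cast, using a hypothetical string operand:

```php
<?php
$a = "abc"; // hypothetical operand: a string that holds no digits

// The cast binds more tightly than ==, so this reads ((int) $a) == false.
// (int) "abc" is 0, and 0 converts to Boolean false.
$result = ((int) $a == false);

var_dump($result); // bool(true)
```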


The int cast promotes the value of the left operand to a zero which in Boolean context changes its value to false, so the expression evaluates as true.

The impact of equality lacking transitivity is debatable. While it affects sorting, since the sort() function is based on the quicksort algorithm (a comparison sort), PHP compensates for this deficit by offering developers a choice of sort flags.
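As an illustration of those flags, the sketch below sorts the same hypothetical list of numeric strings twice, once with SORT_NUMERIC and once with SORT_STRING, so the comparison rule is chosen explicitly rather than left to type juggling:

```php
<?php
$numeric = array("10", "9", "1");
$textual = $numeric;

sort($numeric, SORT_NUMERIC); // compare items as numbers
sort($textual, SORT_STRING);  // compare items as strings

var_dump($numeric); // "1", "9", "10"
var_dump($textual); // "1", "10", "9"
```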

Equality Strangeness

Equality sometimes creates a weird equivalence as follows:
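The listing is missing; a sketch of the kind of comparison meant, assuming a pair of strings in which one carries leading white space, could be:

```php
<?php
// A hypothetical pair: the same digit, with leading white space on one side.
$padded = " 2";
$plain  = "2";

var_dump($padded ==  $plain); // bool(true): both are treated as numeric strings
var_dump($padded === $plain); // bool(false): the strings differ in length
```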


Why is the result true instead of false? Someone made an arbitrary decision to ignore the white space in a numeric string, so the expression becomes a binary comparison of “2” with itself. Apparently, a concern for backwards compatibility (BC) demands maintaining this oddity (see markmail.org/message/ymzzvyhyf32fldiu).

Internally, PHP determines whether the string qualifies as numeric in the C source code of the function is_numeric_string_ex, an improved version of the former is_numeric_string. The function reads the value, skipping any leading white space before examining the next character, as the following excerpt reveals:

/* Skip any whitespace
 * This is much faster than the isspace() function */
while (*str == ' ' || *str == '\t' || *str == '\n' || *str == '\r' || *str == '\v' || *str == '\f') {
    str++;
    length--;
}

Renowned PHP expert Dr. Paul Biggar suggests, in a code review, a macro as a neater solution than the preceding code. I decided to try implementing his suggestion myself, as follows:

#define ZEND_SKIP_WHITESPACE(p, length) \
    while (*p == ' '  || \
           *p == '\t' || \
           *p == '\n' || \
           *p == '\r' || \
           *p == '\v' || \
           *p == '\f') { \
        p++; length--; }

Using this macro, the programmer could substitute the “Skip any whitespace” block of code with the following:

ZEND_SKIP_WHITESPACE(str,length);

Of course, macros can have side-effects so one should apply them with great care and thought per the recommendations of gnu.org. Perhaps, for that very reason PHP’s source code intentionally lacks such a macro.

One may avoid a function-like macro entirely by defining the white space characters as follows:

#define WHITESPACE ' ','\t','\n','\r','\v','\f'

Then, one could modify is_numeric_string_ex by declaring the following variables:

char space_chars[] = { WHITESPACE };
int i, length = 0;

The last step consists of revising the “skipping white space” code as follows:

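length = strlen(str);
/* Advance past each leading character that appears in space_chars */
while (length > 0) {
    for (i = 0; i < (int) sizeof(space_chars); i++) {
        if (*str == space_chars[i]) {
            break;
        }
    }
    if (i == (int) sizeof(space_chars)) {
        break; /* current character is not white space */
    }
    str++;
    length--;
}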

One may argue that either of the preceding solutions involves more lines of code than the present version of is_numeric_string_ex. However, code legibility is an important aesthetic value, especially for maintaining the code efficiently. This one small example of modest import will hopefully pique your interest in taking a look at what lies beneath the proverbial hood of PHP.

Testing for Falsity and Zero

Returning to equality, you may find it convenient with respect to functions that may return 0 as a legitimate result. For example, take the case where the zeroth element of an array holds the value you seek. Or, there may be a situation which requires testing on the basis of falsity, as follows:

$arr = array( … => "late", false => "hood", 4 => "site");

$del = function ( $array, $key ) {
    unset( $array[ $key ] );
    return array_values( $array );
};
$seek = "hood";
if ( ( $dex = array_search( $seek, $arr ) ) == false ) {
    $arr = $del( $arr, $dex );
}
var_dump($arr); // [ 0 => "late", 1 => "site" ]

This snippet searches for an element whose value is “hood” and whose key is false. The key is converted to the integer zero according to the rules governing array keys. $dex holds that value after array_search() locates the element and returns its key. Using === instead of == would have caused the conditional to evaluate as false, so once again the flexibility of the loose double-equals operator prevails over the strict triple-equals. The array is reindexed because array_values() returns the input array’s values with consecutive numeric keys.
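The behavior described above can be checked with a self-contained sketch. The key 2 for “late” is a hypothetical choice made here so that the array is complete; $loose and $strict are likewise names added for illustration.

```php
<?php
// The key 2 for "late" is a hypothetical choice for this sketch.
$arr = array(2 => "late", false => "hood", 4 => "site");

// Remove one element and renumber what remains.
$del = function ($array, $key) {
    unset($array[$key]);
    return array_values($array);
};

$dex = array_search("hood", $arr); // int(0): the key false was stored as 0

$loose  = ($dex ==  false); // true
$strict = ($dex === false); // false

if ($loose) {
    $arr = $del($arr, $dex);
}

var_dump($arr); // [0 => "late", 1 => "site"]
```

One caveat worth keeping in mind: array_search() also returns false when the value is absent, and that result satisfies == false as well, so this test alone cannot tell “found at key 0” from “not found”.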

The Action behind the Scenes

All this discussion about equality and its coerciveness may seem like magic. I strongly recommend reading the C-source code if you seek to gain a deeper understanding of how equality works in PHP. Consider the following example:
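The example is missing from this copy. A sketch that fits the walkthrough, assuming a string as the first operand and a Boolean as the second (both values hypothetical), might be:

```php
<?php
// Hypothetical operands: a string on the left, a Boolean on the right.
$a = "0";
$b = false;

var_dump($a ==  $b); // bool(true): "0" converts to Boolean false
var_dump($a === $b); // bool(false): string versus Boolean
```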


To understand what happens internally with the above snippet, start by reading “The Anatomy of Equals”. (You may also review this snippet's opcodes online.) The article helps you gain an understanding of opcodes, handlers and fast_equal_function() (see lxr.php.net for the source code). By default this function looks for operands that are either long integers or doubles, but we have neither in this particular example. What now? The function invokes compare_function(). Is that the end of the search? Not quite: none of the specified switch cases matches our data, so the default case applies. Inspect it and you may begin to see light at the end of the tunnel. Look at the section that begins with “if (!converted) {” and you’ll see a conditional that matches what we have:

else if (Z_TYPE_P(op2) == IS_BOOL) {
    zendi_convert_to_boolean(op1, op1_copy, result);
    ZVAL_LONG(result, ZEND_NORMALIZE_BOOL(Z_LVAL_P(op1) - Z_LVAL_P(op2)));
    return SUCCESS;
}

Since the second operand is a Boolean value, the macro zendi_convert_to_boolean converts the string. The next line performs the comparison, allowing for return values of 0, 1, or -1, which respectively indicate that $a is equivalent to, greater than, or less than $b. One quick comment about zendi_convert_to_boolean: its code explains why “0” fails to evaluate as true despite containing a value. Inspecting a portion of the macro below reveals that this behavior derives from a design decision rather than being mandated by logic:

 case IS_STRING: 
     if (Z_STRLEN_P(op) == 0                              
         || (Z_STRLEN_P(op)==1 && Z_STRVAL_P(op)[0]=='0')) { 
         Z_LVAL(holder) = 0;                              
     } else {                                             
       Z_LVAL(holder) = 1;                                
     }                                                    
     break;                                               
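The same rule can be restated in PHP as a rough sketch. The helper string_to_bool() is hypothetical, written here only to mirror the IS_STRING case above and to compare its answer with PHP’s own (bool) cast:

```php
<?php
// Hypothetical helper restating the IS_STRING case in PHP:
// a string is false only when it is empty or exactly "0".
function string_to_bool($s)
{
    return !(strlen($s) === 0 || (strlen($s) === 1 && $s[0] === '0'));
}

foreach (array("", "0", "00", "0.0", " ", "false") as $s) {
    // The helper agrees with PHP's own (bool) cast for every sample.
    var_dump(string_to_bool($s) === (bool) $s); // bool(true)
}
```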

Beyond the Code

Equality in PHP is really about equivalence more than anything else. This notion is further reflected in today’s PHP Community. Predominantly male contributors flock to the PHP Internals List. Or, some may prefer to congregate on the PHP-FIG forum. Separate groups, equivalent to the extent that they are both devoted to improving PHP, but they are not identical in purpose.

As for the women, where do they go? By and large, they interact via the PHPWomen IRC channel. Unlike the other two major groups, PHPWomen has traditionally served as a support group mainly for female users, although male developers have joined, too.

Imagine someday a convention with all three groups in attendance! Who knows how the equality factor in the community might fare as the result of such a catalytic experience? The PHP Community as a whole might gain tremendous benefit from such a synergy!
