PHP 5.5 Release Manager Julien Pauli explains how PHP’s source code handles the language’s most essential – and, on the surface, simplest – feature.

How PHP manages variables
You all know $variables in PHP. Here we’ll show how PHP manages those variables as you write lines of code. We’ll see that our language is clever about memory management of variables, and it tries to do its best to have a low memory footprint. However, a full understanding of the model is needed if you don’t want PHP to do weird things such as copying them or not freeing them.


What are PHP variables?

PHP variables are represented internally by a container called zval, a C structure. It’s defined in zend.h in the PHP source code.

struct _zval_struct {
    /* Variable information */
    zvalue_value value;                /* value */
    zend_uint refcount__gc;
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;
};

The value field represents the effective value (an integer, a string, an object…). Then come two fields related to memory management: is_ref and refcount.

refcount is an integer indicating how many symbols point to this zval. A symbol is typically a PHP variable (like $a), but it can also be an internal C variable, as zvals are also used internally by the engine.

is_ref is an integer representing a boolean value. It is set to zero by default, meaning that the zval is not part of a reference (&$a in PHP syntax). When it is set to 1, the zval is a reference, and as we will see, whether or not a zval is a reference deeply changes how PHP behaves when it works with that zval.

Composite types

In PHP, lots of things are zvals. For example, how many zvals can you count in this tiny script?

$a = 'foo';

The answer is obvious: one.

If we want to represent this with a picture, we’d do it like below.

Figure 1: A representation of the zvals in a simple variable declaration.

What about this other little code?

$a = array('foo'=>'bar', 42);

There are 3 zvals: the string bar, the integer 42 and the array which encapsulates them. This is illustrated below.

Figure 2: A representation of the zvals in a slightly more complex line.

Notice that the keys in the array, 'foo' and 0 here, are not zvals: only the values are. Remember, in PHP, all types are zvals. Let’s have a look at objects:

class Foo {
	public $a = 42;
	protected $bar = 'default';
}
$obj = new Foo;

Here, we created 3 zvals as well: one for the object, in the $obj symbol, and one for each of its attributes ($a, and $bar).

You’ll notice that, from this point of view, objects behave like arrays. Objects, like arrays, are compound types, so they use one zval for themselves and one for each value they contain. It’s that simple!
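As a quick check of the array example above, the following sketch lists the keys PHP assigns and counts the values the array holds. The variable names $keys and $valueCount are illustrative only.

```php
<?php
// Same array as in the example above.
$a = array('foo' => 'bar', 42);

// Keys are plain hash-table keys, not zvals: 'foo' and the auto-assigned 0.
$keys = array_keys($a);

// Two values live in the array; add the array itself to get three zvals.
$valueCount = count($a);

var_dump($keys, $valueCount);
```

The element written without an explicit key receives the first free integer key, which is 0 here.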

How PHP manages variables

Now, let’s have a look at how PHP manages those zvals when you, as a programmer, use them in your day-to-day work.

First, let’s not talk about references. Look at how PHP is smart about not duplicating zval memory when you just copy a PHP variable’s content to another:

$a = 'foo';
$b = $a;

Figure 3: Copying a PHP variable's content to another.

The zval container is represented in yellow in the pictures. This is what consumes memory, so you (or PHP) should avoid duplicating it and should free it as soon as possible. In grey, we represent symbols: PHP variables. The grey zones don’t really consume memory. Actually they do, but so little that you can basically forget about it.

So, we can tell that the two scripts below consume the same amount of variable memory:

// Script 1
$a = 'foo';
$b = $a;

// Script 2
$a = 'foo';

As you might have noticed, the refcount was incremented when $b started pointing to the zval. The refcount is the trick behind memory management of PHP variables: it shows how many symbols currently point to a zval container.
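One way to observe this sharing without looking at the engine is to measure memory around the assignment. The sketch below is only an approximation: it assumes that memory_get_usage() reflects the allocation of a 1 MB string, and the name $shareCost is made up for the example.

```php
<?php
// A 1 MB string, large enough for any copy to be visible.
$a = str_repeat('x', 1000000);

$before = memory_get_usage();
$b = $a;                      // plain assignment: both symbols share one value
$after = memory_get_usage();

// Expect a difference far below the 1 MB a real copy would cost.
$shareCost = $after - $before;
echo "Cost of \$b = \$a: {$shareCost} bytes\n";
```

If the assignment had duplicated the string, the reported cost would be at least one million bytes.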

Copy on write

Copy On Write (abbreviated as 'COW') is a trick designed to save memory, and it is used widely in software engineering. It means that PHP copies the memory (or allocates a new memory region) only when you write to a symbol whose zval is also pointed to by other symbols. For example:

$a = "foo";
$b = $a;
$a = 17;

Figure 4: The COW trick in action.

Here’s another simple example:

$a = "foo";
$b = $a;
$c = $b;
$b = "bar";
unset($a);

Figure 5: How the unset() function affects refcounts.

Notice how PHP plays with the refcount value: it increments and decrements it as symbols start or stop pointing to a zval. Just enough memory is allocated.

You can also see that the unset() construct in PHP does not necessarily free memory: unset() just decrements the refcount by one. If, and only if, the refcount reaches zero, PHP knows that no more symbols point to the zval, and it frees the zval immediately and automatically.

Also, as long as we don’t play with references, is_ref remains at zero, and the behavior we just described applies: it’s logical and easy to understand.
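The copy-on-write step itself can be made visible in the same spirit. This is a rough sketch, not a precise measurement: it assumes memory_get_usage() reports the duplication of a 1 MB string, and $writeCost is an illustrative name.

```php
<?php
$a = str_repeat('x', 1000000);
$b = $a;                      // shared, nothing copied yet

$before = memory_get_usage();
$b[0] = 'y';                  // first write to $b: PHP separates it from $a
$after = memory_get_usage();

$writeCost = $after - $before;
echo "Cost of the first write: {$writeCost} bytes\n";
echo "First char of \$a is still: {$a[0]}\n";
```

The memory is paid for at the moment of the write, not at the moment of the assignment, and $a keeps its original content.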

If you want to go deeper and experiment by yourself, you can fetch both is_ref and refcount values from PHP land using the well-known Xdebug extension, which provides a function for that: xdebug_debug_zval().
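A small sketch of such an experiment follows. It adds one assumption of its own: when Xdebug is not loaded, it falls back to PHP’s built-in debug_zval_dump(). The exact text printed differs between PHP and Xdebug versions, so no expected output is shown.

```php
<?php
$a = 'foo';
$b = $a;   // $a and $b share one value at this point

ob_start();
if (function_exists('xdebug_debug_zval')) {
    // Xdebug expects the variable NAME as a string, not the variable itself.
    xdebug_debug_zval('a');
} else {
    // Built-in fallback; passing $a to a function may itself affect the count.
    debug_zval_dump($a);
}
$report = ob_get_clean();

echo $report;
```

Passing the name rather than the value is what lets Xdebug inspect the zval without adding a symbol of its own.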

Let’s look at a last example involving arrays:

$a = array("foo"=>"bar", 1 => 42);
$b = $a["foo"];
$c = $b;
$b = 18;
unset($a['foo']);
$a[1] = $b;

Figure 6: Using unset() on array values.

As you can see, the same behavior applies, and so does the same important rule: a zval is freed only when its refcount drops to zero, never before.
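Running the array example and dumping the final state shows the rule from the outside: the 'bar' string outlives unset($a['foo']) because $c still points to it.

```php
<?php
$a = array("foo" => "bar", 1 => 42);
$b = $a["foo"];
$c = $b;
$b = 18;
unset($a['foo']);   // removes the array's pointer to "bar", not the value itself
$a[1] = $b;

var_dump($a, $b, $c);
```

At the end, $a holds only the key 1 with the value 18, $b is 18, and $c is still 'bar'.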

Functions (methods)

First of all, functions and methods are the same. That being said, let’s see how they work when we talk about symbols and zvals.

What you have to remember is that when a function is called, a scope is created with it. Aside from global variables, which cut across scopes, variables created inside a function are not reachable from the outside.

Also, keep in mind the parts that overlap scopes: function arguments and the return value have their refcount incremented. Let’s have a look at an easy example.

As you can see in the picture, the function’s stack holds the zvals and thus increments their refcount. Remember that PHP allows you to reach the function parameters using func_get_arg() and family. Remember also that it’s not just PHP variables that can point to a zval: the internal engine may use those zvals too, which increments their refcount as well.

When PHP leaves the function, it destroys its stack and parameters, just as would happen if you used unset(). The refcount decrements, and the same rule as before is applied: if it reaches zero, PHP frees the corresponding zval memory.

Once more, all of this is automatic and the programmer doesn’t need to bother about allocating and freeing memory.
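To illustrate that passing an argument by value shares the zval rather than copying it, here is a hedged sketch. The helper usage_inside() is hypothetical, and the measurement assumes memory_get_usage() would reveal a 1 MB copy if one were made.

```php
<?php
// Hypothetical helper: reads its argument without modifying it.
function usage_inside($payload)
{
    // $payload points to the caller's value; nothing has been copied.
    return memory_get_usage();
}

$big = str_repeat('x', 1000000);

$before = memory_get_usage();
$inside = usage_inside($big);
$callCost = $inside - $before;

echo "Extra memory while inside the call: {$callCost} bytes\n";
```

As long as the function only reads its parameter, the call costs far less than the size of the string.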

References

So far we have talked about very classical PHP usage, and this is how you should use PHP. Let’s talk about references now.

First of all, recall what a reference is: you create one when you use the & sign in PHP.

A quick example:

$a = 'string';
$b = &$a;
$b = 1;

You know that at the end of the above script, $a and $b both contain the same value: the integer 1. By using a reference, you tied both symbols $a and $b to the same zval. That is the usual explanation, but wait: a plain assignment (like $b = $a) does the same, right? We just demonstrated that in the last section…

The answer is yes. In fact, writing $a = $b; or $a = &$b; both make the two symbols point to a single zval, as far as this single assignment line is concerned. What references really do is deeply change PHP’s behavior when Copy On Write comes into play (here, on the line $b = 1;).

Figure 8: How references affect zvals.

First, you see that the & addition increases is_ref to 1, but the refcount’s meaning is not changed: this is the number of symbols pointing to the zval.

However, the is_ref flag will change PHP behavior when you change a symbol (COW): PHP does not separate the value any more, and instead directly changes it, so you have the illusion that $a and $b are tied together. Simple and efficient, but take care: there are some tricks you should know about!
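The difference is easiest to see by running the same three lines once with a plain assignment and once with a reference. The variable names $afterValueAssign and $afterRefAssign are only there to capture the results.

```php
<?php
// Value assignment: the write to $b separates it from $a.
$a = 'string';
$b = $a;
$b = 1;
$afterValueAssign = $a;      // still 'string'

// Reference assignment: the write to $d goes through to $c.
$c = 'string';
$d = &$c;
$d = 1;
$afterRefAssign = $c;        // now the integer 1

var_dump($afterValueAssign, $afterRefAssign);
```

Only the write on the third line behaves differently; the assignment itself looks the same from the script’s point of view.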

Let’s view another example showing that all the 'duplicate or not duplicate memory' behavior is changed with references:

$a = "string";
$b = &$a;
$c = $b;

Figure 9: Unusual zval behaviour with references.

Watch the third line carefully: $c = $b;

Here, PHP is forced to allocate a new memory block and to duplicate the string 'string' into it, because of the preceding reference. If the reference had not been there, there would be one and only one zval container, with a refcount of 3, as we are used to seeing.

The reference forces PHP to duplicate the zval because, when you change $c, you don’t expect $b to change, do you? After all, $c is not a reference to $b. Notice how, when using references, PHP copies memory in places you don’t necessarily expect.
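The visible side of this rule can be checked with a few extra writes. The sketch below verifies only the observable behaviour; it does not measure when the engine actually allocates memory.

```php
<?php
$a = "string";
$b = &$a;      // $a and $b form a reference pair
$c = $b;       // $c receives the value, not the reference

$c = "changed";          // only $c changes
$b = "changed via b";    // $a follows, because it is tied to $b

var_dump($a, $b, $c);
```

$a and $b always move together, while $c stays independent of both.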

Here’s another example:

The behavior is as you would expect when using PHP and references, but watch how PHP has to duplicate the 'bar' string because of the reference. It does not allocate memory on the assignment $var = 'bar', because $var points to a reference. However, it does when the function returns, because this function returns by value, and we tell it to return something that is bound by reference (return $var;). It then duplicates the zval so that when you change what’s returned by the function, you don’t change what’s pointed to by $var and $a: this is just expected PHP behavior, isn’t it?
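The listing this paragraph refers to is not reproduced here. A minimal sketch consistent with the description could look like the following; the function name foo() and the variable $result are assumptions, and only $var, $a and the 'bar' string come from the text.

```php
<?php
function foo(&$var)
{
    $var = 'bar';     // writes through the reference into the caller's variable
    return $var;      // returned by value: the caller gets an independent value
}

$a = 'foo';
$result = foo($a);
$result .= '!';       // changing the returned value must not touch $a

var_dump($a, $result);
```

After the call, $a holds 'bar', and modifying $result leaves it untouched.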

Remember one thing: references don’t (necessarily) save memory. Please, don’t think that any more. Haven’t we dispelled that myth?

With reference assignments in PHP, memory duplication is often delayed compared to value assignment. However, sooner or later new zvals will be created, except when you truly follow a reference from start to end and never break it, either explicitly or implicitly by using PHP in a way that forces it to duplicate.

Look carefully at this last trick about references:

function foo(&$var)
{
    if (strlen($var) > 3) {
        return $var;
    } else {
        $var .= '_uppercased';
        return strtoupper($var);
    }
}

$value = 'barbaz';
echo foo($value);

Yes, this is very useless and silly code, but it’s worth commenting on. Can you, by yourself, guess when PHP will need to allocate a new zval and duplicate its content?

You saw that $var, in the foo() function, is passed by reference. Inside foo(), $var has is_ref equal to 1. What happens when you call strlen() and strtoupper()? Since those functions’ parameters are passed by value (see the manual entries) and $var is bound to a reference, PHP duplicates the memory at every call to such functions.

PHP has no way to know whether strlen() will try to modify its argument. If it did, $var would be modified as well, even though strlen() does not take its argument by reference. We don’t want that, so PHP is forced to duplicate the $var zval into a new one with the exact same value and is_ref set to 0. It has to do so before each call to strlen() and strtoupper(), and then free the new zval right after the call! This is a waste which could impact performance: you got tricked by references.

If you hadn’t passed $var by reference to your function foo(), no duplication at all would have happened (sure, you have this string concatenation to take care of, but, dear reader, it’s your job to find something better).
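For comparison, a by-value variant might look like this. The name foo_by_value() is hypothetical; the body is the original with only the & removed and the else branch flattened.

```php
<?php
// Hypothetical by-value variant of foo(); only the & was removed.
function foo_by_value($var)
{
    if (strlen($var) > 3) {
        return $var;
    }
    $var .= '_uppercased';    // modifies the local copy only
    return strtoupper($var);
}

$value = 'barbaz';
$long  = foo_by_value($value);
$short = foo_by_value('ab');
echo $long, "\n", $short, "\n";
```

One behavioural difference is worth keeping in mind: the by-reference original also changes the caller’s variable in the short-string branch, whereas this variant never does.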

Don’t panic, duplicating the 'barbaz' string is a matter of nanoseconds nowadays. But that’s not the case at all when arrays are involved, especially heavy, complex arrays with lots of slots inside. In such a case, the values inside the array are not duplicated but their refcount is incremented. However, the array itself gets copied, which can be resource-consuming if it is very big (duplicating a one million slot array takes about 0.3sec under 2013 desktop hardware running PHP 5.5).
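The cost of duplicating an array can be made visible by comparing a plain assignment with the first write that follows it. This is a sketch under the assumption that memory_get_usage() tracks the array’s allocation; the array size and the names $assignCost and $writeCost are arbitrary.

```php
<?php
$big = range(1, 100000);

$before = memory_get_usage();
$copy = $big;                 // shared for now
$afterAssign = memory_get_usage();
$copy[] = 0;                  // first write: the array structure is duplicated
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;
echo "assignment: {$assignCost} bytes, first write: {$writeCost} bytes\n";
```

The assignment is nearly free, while the first write pays for a second copy of the whole array structure.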

Memory leaks and garbage collection

First of all, let’s quickly define what garbage collection is. Garbage collection is a “mechanism which tracks objects’ usage and frees their memory once they are no longer reachable in the current scope or used by the programmer”. (This is a generic definition of my own.) Applied to PHP, we have already talked about this very concept in the previous sections: it is the refcount mechanism applied to zvals, which frees the zval container when its refcount reaches zero.

That’s right: PHP has provided a garbage collection mechanism for its variables since the very beginning. What you might have heard about is the 'Zend Garbage Collector' or 'Zend GC', which appeared with PHP 5.3. What’s that about? It’s a bit confusing, as many people thought that before PHP 5.3, PHP simply did not free memory automatically, which is wrong! In fact, the Zend Garbage Collector specifically takes care of circular references for you.

This is well explained under a dedicated php.net page with cool pictures like the ones in this article.

Circular references appear within compound types, like objects or arrays. When two compound values contain a reference to each other, or when one contains a reference to itself, this is called a circular reference. This is bad because it’s hard to work out when to free them and which one to begin with.

Before PHP 5.3, PHP had no way to track circular references. When you released your variables (using unset(), for example), the refcounts never reached zero because the entities still owned each other, so PHP leaked them. Here’s an example:

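/* minimal class definitions so the example runs as-is */
class ObjA { public $b; }
class ObjB { public $a; }

/* create the two entities, objects here */
$a = new ObjA;
$b = new ObjB;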

/* create the circular reference between them */
$a->b = $b;
$b->a = $a;

/* Decrement refcount of both entities */
unset($a, $b);

/* Here, there is no way to fetch the two zvals from PHP land,
but as they own each other, they are still present in memory and
PHP will never free them, before PHP 5.3*/

Figure 11: Circular references.

Starting with PHP 5.3, a complex system was designed to track those particular cases, which tend to happen more and more frequently as users manage more and more objects in their code. With objects, it is easy to end up with two of them in a circular reference without noticing.

PHP now automatically tracks circular references and frees them from time to time. If you know you have freeable circular references and want to force PHP to free them now, you can call the gc_collect_cycles() function, which has been designed for that.

Let’s try it with our above example:

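/* minimal class definitions so the example runs as-is */
class ObjA { public $b; }
class ObjB { public $a; }

/* create the two entities */
$a = new ObjA;
$b = new ObjB;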

/* create the circular reference between them */
$a->b = $b;
$b->a = $a;

/* Decrement refcount of both entities */
unset($a, $b);

/* Force PHP to clean the circular reference */
$cleaned = gc_collect_cycles();

/* Display how many zvals have been freed by
the circular reference garbage collector */

echo $cleaned; /* 2 */

Circular references eat memory. This is not severe in a web environment, where requests are short-lived and PHP automatically cleans up all resources at the end of the request, including variables caught in circular references.

The problem shows up when running long-lived CLI scripts, such as PHPUnit test suites. With Zend GC, your memory curve should stay relatively flat.
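The long-running scenario can be simulated with a loop that keeps producing unreachable cycles. This is a sketch only: the Node class and the make_cycle() helper are hypothetical, the loop count is arbitrary, and the exact numbers printed depend on the PHP version.

```php
<?php
// Hypothetical class used only for this sketch.
class Node
{
    public $peer;
}

// Builds two objects that point at each other, then lets them go out of scope.
function make_cycle()
{
    $a = new Node;
    $b = new Node;
    $a->peer = $b;
    $b->peer = $a;
}

gc_enable();                    // make sure the cycle collector is on
gc_collect_cycles();            // start from a clean state
$before = memory_get_usage();

for ($i = 0; $i < 1000; $i++) {
    make_cycle();               // each call leaves an unreachable cycle behind
}
$leaked = memory_get_usage() - $before;

$freed = gc_collect_cycles();   // ask the collector to reclaim the cycles now
$afterGc = memory_get_usage() - $before;

echo "held by cycles: {$leaked} bytes, freed: {$freed}, left: {$afterGc} bytes\n";
```

In a real long-running script the collector kicks in on its own from time to time; calling gc_collect_cycles() by hand simply makes the effect visible at a chosen point.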

Conclusion

We tried to show in this article what PHP does underneath the surface when you use its simplest and most essential feature: variables.

PHP represents variables internally using containers called zvals, and it keeps track of how many symbols (PHP variables) point to each zval, so that it can free the zval memory when the number of symbols reaches zero.

We also showed how efficient PHP is when it comes to copying variables in a PHP script. In the vast majority of cases, PHP won’t copy the associated zval. It only duplicates the zval when it is forced to, and that’s not easy to spot at first sight.

Using PHP variable references heavily can change the zval copy behavior in a way which is not necessarily easy to track. Usually, you should not use references, as about 97% of the behaviors you want to implement with PHP can be done with no references at all.

We have not shown all of the tricky features PHP relies on, but you should now have a good starting point of understanding how PHP behaves when we talk about variables and references.

Don’t forget that PHP is written in C, and open source. Sure, its source code is not very easy to deal with, but with a good understanding of C and a little effort, you can start understanding lots of things when reading the source.

If you don’t want to take this path, then don’t assume anything from PHP, especially not that using references everywhere will save memory: in most cases, this is wrong.

Julien is a web and system architect. He’s been using PHP for a decade together with frameworks such as Symfony. He is now a PHP contributor and PHP 5.5 release manager, and tries to make PHP better and more efficient.



            
    
        