Yawd website

jul13

Use of classes and static variables with the Drupal 6 Batch API

Drupal provides a nice API for anyone who needs to process a massive amount of data: the Batch API. It lets you split a heavy script into multiple operations and shows a progress bar so you can monitor the run. However, this API is not adequately documented, and I had a hard time resolving some of the issues I came across. I thought I would share my findings in case someone else needs them.

Share data between operations

If you want to share data between successive calls of an operation, you must use the API's sandbox array. Say you want to count the iterations over an array of items. Your operation should be defined as follows:

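function batch_example_process($my_array, &$context) {
   ...
   //this will only run on the first call
   if(!isset($context['sandbox']['counter'])) {
       $context['sandbox']['counter'] = 0;
   }

   $count = 0;
   while(isset($my_array[$context['sandbox']['counter']]) && $count < 10 ) {
      //process the array here....
      ....
      $count++;
      $context['sandbox']['counter']++;
   }

   //report progress; the operation is called again while this is below 1
   //(assumes $my_array is a zero-based list without gaps)
   $total = count($my_array);
   $context['finished'] = $total ? $context['sandbox']['counter'] / $total : 1;
}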

Now the API will call the example operation again and again, 10 items at a time, until all items are processed. Keep in mind that Drupal only repeats an operation while it reports $context['finished'] as less than 1; an operation that never sets this value is treated as complete after its first call. The sandbox counter remembers its position because the API feeds the sandbox array to the next call. Note that the context variable is not persistent as a whole. For example, if you try to use $context['counter'], Drupal will not remember its value on successive operation calls. Additionally, even 'sandbox' does not persist across different operations, only across successive calls of a single operation (to pass data between operations you can use $context['results'], but this is outside this article's scope).
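To see why the sandbox counter survives while $context['counter'] does not, here is a small stand-alone sketch that imitates this behaviour without Drupal. The simulate_batch() driver is hypothetical and only an approximation of what the Batch API does: it builds a fresh context for every call, carries over nothing but the sandbox, and stops once the operation leaves $context['finished'] at 1.

```php
<?php
// Stand-alone simulation; no Drupal required.

// Example operation: handles at most 10 items per call.
function batch_example_process($my_array, &$context) {
  if (!isset($context['sandbox']['counter'])) {
    $context['sandbox']['counter'] = 0;
  }
  // Not stored under 'sandbox', so the driver below forgets it.
  $context['counter'] = isset($context['counter']) ? $context['counter'] + 1 : 1;

  $total = count($my_array);
  $count = 0;
  while ($context['sandbox']['counter'] < $total && $count < 10) {
    // Real work on $my_array[$context['sandbox']['counter']] would go here.
    $count++;
    $context['sandbox']['counter']++;
  }
  $context['finished'] = $total ? $context['sandbox']['counter'] / $total : 1;
}

// Hypothetical driver that mimics how the operation is re-invoked.
function simulate_batch($operation, $items) {
  $sandbox = array();
  $calls = 0;
  do {
    // A fresh context for every call; only the sandbox carries over.
    $context = array('sandbox' => $sandbox, 'finished' => 1);
    $operation($items, $context);
    $sandbox = $context['sandbox'];
    $calls++;
  } while ($context['finished'] < 1);
  return array('calls' => $calls, 'context' => $context);
}

$run = simulate_batch('batch_example_process', range(1, 25));
print $run['calls'] . "\n";                          // 3
print $run['context']['sandbox']['counter'] . "\n";  // 25
print $run['context']['counter'] . "\n";             // 1
```

With 25 items the operation runs three times. The sandbox counter ends at 25, while the plain $context['counter'] is back at 1 on every call because it starts from scratch each time.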

Static variables

So far so good. Unfortunately though, when using the Batch API, static variables do not keep their values between operation calls. In my case I had coded a custom class with a static variable holding an array of records, so that I didn't need to query the db for frequently used items. Each time an item was found I pushed it to this array. But on the next operation call the array was empty, and the system had to query the db again and again. To overcome this, you can simply store your data in the context's sandbox array, as shown above. Be careful with class instances, though: they need the extra step described in the next section.
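A minimal sketch of the idea, runnable without Drupal: the lookup cache lives under $context['sandbox'] instead of a static variable. The names example_load_record() and example_get_record() are made up for illustration, and the "query" is just a stand-in that counts how often it is hit.

```php
<?php
// Hypothetical stand-in for a database query; counts how often it runs.
function example_load_record($id) {
  $GLOBALS['example_query_count']++;
  return array('id' => $id, 'title' => 'Record ' . $id);
}

// Returns a record, consulting the cache kept in the sandbox first.
function example_get_record($id, &$context) {
  if (!isset($context['sandbox']['records'][$id])) {
    $context['sandbox']['records'][$id] = example_load_record($id);
  }
  return $context['sandbox']['records'][$id];
}

$GLOBALS['example_query_count'] = 0;
$context = array('sandbox' => array());

example_get_record(7, $context);  // cache miss: runs the query
example_get_record(7, $context);  // cache hit: served from the sandbox
example_get_record(8, $context);  // cache miss: runs the query

print $GLOBALS['example_query_count'] . "\n";  // 2
```

The cached records here are plain arrays, which can go into the sandbox as they are.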

Custom class instances - __PHP_Incomplete_Class Object error

It seems that storing custom object instances in the sandbox causes PHP to throw an error. After some research I found out that Drupal 6 attempts to load the sandbox contents before the PHP file containing your operations is parsed. Your classes are therefore not loaded yet, and PHP does not know how to rebuild your object instances. To give you a better idea, the same __PHP_Incomplete_Class Object error typically occurs when you assign an instance of a custom class to the $_SESSION variable and then call session_start() before loading the class definition. To avoid this error and use your objects properly, you must serialize them before assigning them to the sandbox and unserialize them before using them. This will do the trick:
function batch_example_process($arg0, &$context) {
    if(isset($context['sandbox']['yawd'])) {
        $yawd = unserialize($context['sandbox']['yawd']);
    } else { //Initialize yawd
        $yawd = new Yawd(); //Yawd is my custom class
    }

    //do your stuff here
    $yawd->modify();
    ...

    $context['sandbox']['yawd'] = serialize($yawd);
}

The above snippet is really simple but trust me, it could be a life saver! :)
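If you want to reproduce the problem outside Drupal, the following stand-alone sketch shows both cases. The class names ExampleMissing and ExampleCounter are made up for this illustration. The first part unserializes an object whose class is unknown at that point, which is what produces __PHP_Incomplete_Class. The second part does the round trip with the class definition available.

```php
<?php
// A serialized object whose class ("ExampleMissing", a made-up name) is
// not defined anywhere in this script, like data that is read back before
// the file with the class definition has been loaded.
$stored = 'O:14:"ExampleMissing":0:{}';
$broken = unserialize($stored);
print get_class($broken) . "\n";  // __PHP_Incomplete_Class

// With the class definition available, the round trip works.
class ExampleCounter {
  public $hits = 0;
  public function modify() {
    $this->hits++;
  }
}

$counter = new ExampleCounter();
$counter->modify();
$string = serialize($counter);     // a plain string, safe to store
$restored = unserialize($string);  // the class is known here
$restored->modify();
print get_class($restored) . ' ' . $restored->hits . "\n";  // ExampleCounter 2
```

Because the sandbox then holds only a string, nothing has to be turned back into an object until your own code calls unserialize(), by which time your class file is loaded.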

print_r and echo statements

A last-minute update: make sure you remove any echo and print_r statements from your operations' code. The Drupal 6 Batch API does not like them. While I was working on a yawd project, the Batch API suddenly seemed to freeze and loop forever without actually processing anything. The reason was that I had forgotten some print statements used for debugging.
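If you still need debug output while a batch runs, one option is to write it to a log file instead of printing it. The helper below is a hypothetical sketch using PHP's error_log(); the name example_debug() and the temporary file are assumptions made for this example. Inside Drupal, watchdog() can serve the same purpose.

```php
<?php
// Hypothetical helper: append a debug line to a file instead of printing it.
function example_debug($message, $file) {
  // Message type 3 tells error_log() to append to the given file.
  error_log($message . "\n", 3, $file);
}

$log = tempnam(sys_get_temp_dir(), 'batch');

ob_start();  // capture anything that would have been sent to the browser
example_debug('processed item 42', $log);
$printed = ob_get_clean();

$logged = file_get_contents($log);
unlink($log);

print strlen($printed) . "\n";  // 0, nothing was printed
print $logged;                  // processed item 42
```

The output buffer is only there to show that the helper prints nothing; an operation would simply call example_debug().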
