WILT: PHP handling long file uploads / long execution activities

A project I'm working on had a large file upload with a long processing time, which caused things to fail. This is how I addressed it.

Scenario

A PHP application running in a Docker Swarm environment. The PHP script must process an uploaded XLSX workbook with multiple worksheets. The processing involves:

  1. Receive the file upload
  2. For each worksheet:
    1. Check to see if it has the columns we're interested in
    2. Process each row, converting some columns to a new format
    3. Insert valid entries into a MySQL database table
  3. Record that the process finished
  4. Inform the end user of success.

I'm going to walk through how I got the problem solved, so skip to the end if you just want what I did. I'm not going to get into the spreadsheet processing itself, as phpexcel/phpspreadsheet has that covered.
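That said, for orientation, here's a minimal sketch of the per-worksheet loop using PhpSpreadsheet (the column name and the importRow() helper are hypothetical placeholders):

use PhpOffice\PhpSpreadsheet\IOFactory;

$spreadsheet = IOFactory::load($uploadedPath);
foreach ($spreadsheet->getWorksheetIterator() as $sheet) {
    $rows = $sheet->toArray();         // every row as a plain array
    $header = array_shift($rows);
    if (!in_array('TargetColumn', $header)) {
        continue;                      // skip worksheets without the columns we need
    }
    foreach ($rows as $row) {
        importRow($row);               // convert columns and insert into MySQL
    }
}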

Problem 1: PHP terminates after 30 seconds

This is a simple problem I think most PHP developers hit and solve early in their career. By default, PHP scripts stop running after 30 seconds to prevent a poorly coded script from chewing up all the resources on a server and crashing things. The fix is a wee bit of code:

set_time_limit(0);
ini_set('max_execution_time', 0);

With the limits set to 0, PHP won't stop itself. Problem solved! No, because now I'm getting Nginx 504 gateway timeout errors.

Problem 2: Docker (Nginx) terminates after 30 seconds

This is doubly annoying: once Nginx decides the upstream it's connecting to isn't responding, it stops listening to it. I found that once it hit the gateway timeout, I had to restart the dockerised service before Nginx would start listening to it again. This is a problem I can't address directly, as I have no control over the Docker Swarm configuration that talks to my applications. Erf.
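For reference, if you do control the proxy configuration, the relevant knobs are Nginx's upstream read timeouts (the values below are illustrative; the actual defaults depend on the deployment):

proxy_read_timeout 300s;     # how long Nginx waits on a proxied HTTP upstream
fastcgi_read_timeout 300s;   # the equivalent when talking to PHP-FPM over FastCGI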

Solution: Fork

Forking is where one process becomes two, each keeping the same memory state and codebase up to that point. This way, when I receive the file, I can split off one process to run through the file while the other just informs the user that processing is happening. PHP has an extension that adds forking functionality, called pcntl. Here's the code:

$pid = \pcntl_fork();
if ($pid == -1) {
    // Fork failed; handle the error (log it, tell the user, bail out)
    exit(1);
}
if ($pid) {
    // Main process ($pid is the child's process ID)
    // Send HTML to the user telling them processing is in the background
} else {
    // Child process ($pid is 0 here)
    // Process the uploaded file
}

Gotcha 1: MySQL Disconnects

In my first tests, I was getting the user's HTML result saying the processing was happening, but watching in the background, the processing failed after a few seconds. The error was "MySQL server has gone away". Googling, I found what should have been obvious: the MySQL connection is terminated when a PHP script ends. The parent script informing the user ends before the child finishes processing, so it terminates the shared MySQL connection and the child falls over.

To fix that, close the MySQL connection deliberately before forking, then reconnect in each process:

Yii::$app->db->close();   // close the shared connection before forking
$pid = \pcntl_fork();
if ($pid == -1) {
    // Fork failed; handle the error
    exit(1);
}
if ($pid) {
    // Main process
    // Send HTML to the user telling them processing is in the background
    Yii::$app->db->open();
} else {
    // Child process
    // Process the uploaded file
    Yii::$app->db->open();
}

Both processes need their own MySQL connection (the main process saves some audit information, so it still needs one).

Gotcha 2: File clean-up

Once the file is uploaded, I was getting access issues after the fork, because PHP cleans up the request's uploaded temp file when the request ends. To fix that, I copied the file out of the upload folder into another temp area that both processes could access. Once processing is completed, the copy can be deleted.
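A minimal sketch of that staging step in plain PHP, assuming the upload arrives as $_FILES['workbook'] (the field name and paths are illustrative):

$workPath = sys_get_temp_dir() . '/import_' . uniqid() . '.xlsx';
if (!move_uploaded_file($_FILES['workbook']['tmp_name'], $workPath)) {
    throw new \RuntimeException('Could not stage the uploaded file');
}
// ... fork here; the child reads $workPath ...
// When the child finishes:
unlink($workPath);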

Gotcha 3: Informing users of progress

Once the main process ends, the user's browser has a shiny HTML page saying processing is happening in the background. The user has no visibility into that processing, though, and just has to trust it. To get around this, the child process creates a status file and populates it with progress updates. In the returned HTML page, a quick JavaScript component uses fetch to pull the latest status file every 2 seconds and update a status element in the page.

After the processing completes in the back end, the child process waits 20 seconds, so that the browser definitely gets the last status update, before deleting the status file. The JavaScript loops until it sees a Failed or Completed message.
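On the PHP side, the child's status writing looks roughly like this (the writeStatus() helper and the statuses path are hypothetical; the path just has to be served at /statuses/ so the JavaScript can poll it):

// Hypothetical helper the child calls at each step; $filename matches
// the status file name the JavaScript polls at /statuses/{$filename}.
function writeStatus(string $filename, string $message): void
{
    file_put_contents("/var/www/html/statuses/{$filename}", $message . "\n", FILE_APPEND);
}

writeStatus($filename, 'Processing worksheet 2 of 5');
// ... and when everything is done:
writeStatus($filename, 'Completed');
sleep(20);                                     // let the browser fetch the final status
unlink("/var/www/html/statuses/{$filename}");  // then clean up

The polling JavaScript then looks like this: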

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}
async function waitfor() {
    let keepwaiting = true;
    const targetid = document.getElementById('live-update-import-log');
    while (keepwaiting) {
        let response = await fetch(`/statuses/{$filename}`, {cache: 'no-store'});

        if (response.ok) { // if HTTP status is 200-299
            // get the response body as text
            let txt = await response.text();
            targetid.innerText = txt;
            if (txt.indexOf('Completed') != -1 || txt.indexOf('Failed') != -1) {
                keepwaiting = false;
            }
            await sleep(2000);
        } else {
            targetid.innerText += "\nFAILURE";
            keepwaiting = false;
        }
    }
}
waitfor();
  • sleep is a helper function to wait.
  • waitfor is async so it doesn't block the main thread.
  • The {$filename} is populated automatically by the PHP with the name of the status file.
  • Caching is disabled so each call gets a fresh status.
  • When the fetch returns, it updates the status element.
  • If the latest status contains Completed or Failed, polling stops as we're done.

Result