  1. #1

5 scripts accessing one table simultaneously without selecting the same rows

    I will have a table of rows which need processing (5000+ rows).

    The PHP fork function will be used to run multiple scripts to process the rows.

    The PHP script:
    1. Select one unprocessed row to process.
    2. Update the row as processed.
    3. Process it (about 1 second per row).
    4. Go back to step 1.

    How do I make sure that between steps 1 and 2 another script will not select the same row?

    Is it possible to use transactions/InnoDB to lock the row and prevent any other script from reading the same row?

    Cheers.
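    [The select-then-mark loop above can be sketched with an InnoDB locking read. This is a hedged sketch, not tested code: the table name `jobs`, the `processed` column, and the `process_row()` function are illustrative names, and it uses the same legacy mysql_* functions as the rest of this thread. It requires an InnoDB table and a live MySQL connection per script.]

    PHP Code:
    // Sketch only: `jobs`, `processed`, and process_row() are illustrative.
    // SELECT ... FOR UPDATE locks the chosen row until COMMIT, so a second
    // script that reaches the same row blocks, then skips it once the
    // first transaction commits and the row no longer matches processed = 0.
    while (true) {
        mysql_query("START TRANSACTION");
        $result = mysql_query("SELECT id FROM jobs WHERE processed = 0 LIMIT 1 FOR UPDATE");
        $row = mysql_fetch_assoc($result);
        if (!$row) {                       // nothing left to process
            mysql_query("ROLLBACK");
            break;
        }
        mysql_query("UPDATE jobs SET processed = 1 WHERE id = " . (int)$row['id']);
        mysql_query("COMMIT");             // steps 1-2 done atomically; lock released

        process_row($row['id']);           // step 3: the ~1 second of work, outside the lock
    }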

  2. #2
    Firstly, out of curiosity, is there much of a performance gain to be had by running many scripts at once? Is it worth the problem caused by the possibility of multiple scripts accessing the same row?

    But to answer your question: InnoDB locks rows that are being accessed, so you shouldn't need to lock them manually or check whether they are already locked.

    I might not fully understand what you're trying to do, but what I would do is first select the ID/primary key of every row and store them in a global array. Each script can then pick a row from the array, delete that entry from the array so no other script picks it, and then do the processing.

    Example:
    PHP Code:
    $records = array();
    $result = mysql_query("SELECT id FROM table");

    while ($row = mysql_fetch_array($result)) {
        $records[] = $row['id'];
    }

    //Run the multiple scripts now 
    And then in each script...
    PHP Code:
    $my_id = $records[count($records) - 1];
    unset($records[count($records) - 1]); //This row's ID is no longer in the array for other scripts to access
    $query = 'SELECT ... FROM table WHERE id = ' . $my_id;
    //Do your processing. 
    ...Is that vaguely what you're after or am I on the wrong track?

  3. #3
    Hi, yes, it matters performance-wise: each row may take up to 3 seconds, which means a single script could take several hours to complete.

    Your example is exactly what I need, but I need to absolutely guarantee that between these two lines:
    Code:
    $my_id = $records[count($records) - 1];
    unset($records[count($records) - 1]);
    Two instances of the script will not grab the same ID at the same time and process one row twice.

    I am not sure if this is possible or not. It seems possible but unlikely.

  4. #4
    Those two statements should only take a few milliseconds to execute, so the chances of two scripts grabbing the same row are very low. Could you add an 'updated' field (with a default value of 0) to the table? When a script updates a row it sets 'updated' to 1 (true), and the queries that pick rows can use ...WHERE updated = 0

    ...a little crude, but it would absolutely guarantee that a record is not processed twice.
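    [One way to make the 'updated' flag airtight is to let the UPDATE itself do the claiming, since a single UPDATE statement is atomic. This is a hedged sketch, not from the thread: it relies on MySQL's documented `LAST_INSERT_ID(expr)` idiom, the table/column names are illustrative, and each forked script must use its own database connection for `LAST_INSERT_ID()` to be correct.]

    PHP Code:
    // Sketch: atomic claim via the LAST_INSERT_ID(expr) trick. The UPDATE
    // both flips one row's flag and remembers that row's id for this
    // connection, so two scripts can never claim the same row.
    while (true) {
        mysql_query("UPDATE table SET id = LAST_INSERT_ID(id), updated = 1
                     WHERE updated = 0 LIMIT 1");
        if (mysql_affected_rows() == 0) {
            break;                               // nothing left to claim
        }
        $result = mysql_query("SELECT LAST_INSERT_ID()");
        $my_id  = mysql_result($result, 0);
        // ...process row $my_id here (the claim above already marked it)...
    }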

  5. #5
    Another way to ensure the scripts do not process the same rows is to break the work into ranges. At a prior engagement we had a client that ran batch processing based on the last two digits of their person ID: job 1 processed partial IDs 0 - 24, job 2 processed 25 - 49, and so on. This allowed them to cut a 10-hour process down to 2.5 hours.
    Dave
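    [The range approach above needs no locking at all, because the ranges never overlap. A hedged sketch of how the ranges might be assigned; the worker numbering, script name, and table/column names are illustrative, not from the thread.]

    PHP Code:
    // Each worker gets a disjoint slice of the last two digits of the ID,
    // so no two workers can ever touch the same row. Run as e.g.:
    //   php process.php 0   (handles last-two-digits 0-24)
    //   php process.php 1   (handles 25-49), and so on.
    $workers = 4;
    $slice   = 100 / $workers;             // 25 ids per worker
    $worker  = (int)$argv[1];              // 0 .. 3
    $low     = $worker * $slice;           // 0, 25, 50, 75
    $high    = $low + $slice - 1;          // 24, 49, 74, 99
    $result  = mysql_query("SELECT id FROM table WHERE MOD(id, 100) BETWEEN $low AND $high");
    while ($row = mysql_fetch_array($result)) {
        // ...process $row['id']; no coordination with other workers needed...
    }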
