How can I prevent concurrent script execution?


mikebnova
08-19-02, 14:27
The data source for my system consists of small text files being created more or less randomly in over 100 remote locations. MQSeries (Retail Interchange) detects and delivers these files to our head office server (4-CPU RS/6000), and as an added bonus kicks off the AIX 4.3.1 script which loads the contents into a database. The problem is that only one instance of the script may execute at any given time. I used this code in an attempt to manage things:

#beginning of script
LOCKFILE=/x/y/z/jobname.lock
...
while [ -f $LOCKFILE ]
do
sleep 15
done
touch $LOCKFILE
...
rm -f $LOCKFILE
#end of script

The first instance finds no lock file, creates one with the touch command, and carries on. If a second (or third, or fourth, etc.) instance is started, it finds that a lock file exists, which makes the script wait until the first instance finishes. Whichever instance 'wakes up' first after the lock is deleted gets to re-create the lock file, and the rest keep on waiting. This works very well most of the time, but every once in a while I still get two script instances running in parallel! If two instances are started at EXACTLY the same microsecond they both get to the WHILE loop at the same time, see no lock file, and then they both touch the lock file at the same time and carry on (disastrously...).

If 'touch' returned a special code that said 'File had to be created', that would do it, but of course touch doesn't have that. I've been thinking about some combination of noclobber and echo "" >$LOCKFILE but I can't find any information about AIX command granularity. What happens if two processes try to create the same file at the exact same time? Does one always fail or is there still a chance that both instances will blindly forge ahead anyway? What's the best way to do this sort of thing?
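On the noclobber idea, here is a minimal sketch. It assumes a shell whose noclobber option (set -C) makes the > redirection create the file exclusively, so that only one of several simultaneous attempts can succeed. Recent versions of bash and dash do this; whether the shell shipped with AIX 4.3.1 does is not established here, so treat that as an assumption to test. The function name try_create_lock is hypothetical.

```shell
# try_create_lock FILE
# Returns 0 if this process created FILE, non-zero if FILE already existed.
# ASSUMPTION: the shell implements noclobber with an exclusive create.
try_create_lock() {
    # Run in a subshell so noclobber does not leak into the caller,
    # and so a failed redirection cannot terminate the calling script.
    ( set -C; : > "$1" ) 2>/dev/null
}

# Possible use, replacing the "while [ -f ... ]; touch" pair:
#   until try_create_lock "$LOCKFILE"; do sleep 15; done
#   ...
#   rm -f "$LOCKFILE"
```

The point of the design is that the test and the creation are a single operation, so there is no gap between "see no lock file" and "create lock file" for a second instance to slip into.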

sathyaram_s
08-20-02, 05:19
Instead of checking for and creating a lock file, why not check for and create a link to a lock file?

That is, if the link exists, sleep; otherwise create the link.

In this case the second job (if running in parallel) trying to create the link will get a non-zero exit status from ln, which you can capture and then sleep.

Once you finish your main script, remove the lock link instead of the lock file.
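The link approach could be sketched roughly as follows. This is only an illustration: the directory /tmp, the file names, and the function names try_lock, acquire_lock and release_lock are all hypothetical, and it uses a hard link (plain ln), which fails if the link name already exists.

```shell
# Hypothetical locations; substitute the real lock directory.
LOCKDIR=${LOCKDIR:-/tmp}
LOCKTARGET=$LOCKDIR/jobname.target   # ordinary file the link points at
LOCKLINK=$LOCKDIR/jobname.lock       # the link itself acts as the lock

# One attempt: succeeds only if LOCKLINK did not exist yet.
try_lock() {
    touch "$LOCKTARGET"                       # make sure the target exists
    ln "$LOCKTARGET" "$LOCKLINK" 2>/dev/null  # fails if the link is already there
}

# Keep trying until the lock is ours.
acquire_lock() {
    until try_lock
    do
        sleep "${LOCK_RETRY:-15}"
    done
}

release_lock() {
    rm -f "$LOCKLINK"
}

# Possible use in the main script:
#   acquire_lock
#   ... load the file into the database ...
#   release_lock
```

Note that there is no separate "does the link exist?" check: the attempt to create the link is the check. If the script dies before release_lock runs, the link stays behind and has to be removed by hand.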

Maybe there are better methods ...

Cheers

Sathyaram


mikebnova
08-20-02, 13:37
Perhaps a better way might be suggested, but in the meantime your idea certainly works! I wrote a couple of test scripts and scheduled them to run against each other once a minute for a couple of hours. Most of the time there was enough delay between the processes' start times (due to OS/cron itself/other tasks interfering) that they ran 'normally', but during this time I also logged two occasions where the old code would have failed and the new code caught it and stopped the second task from proceeding. I don't know why I didn't think of the link command myself; my brain must still think it's on vacation even though I'm back at work. Thanks!
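A quicker way to provoke the collision than waiting on cron might look like the sketch below. It is not the test scripts described above, just an assumed equivalent: race_test is a hypothetical function that starts several contenders in the background at once, lets each try once to create the link, and reports how many succeeded.

```shell
# race_test DIR N
# Starts N background contenders in scratch directory DIR and prints
# how many of them managed to create the lock link.
race_test() {
    dir=$1
    n=$2
    touch "$dir/target"
    i=0
    while [ "$i" -lt "$n" ]
    do
        # Each contender tries once and records a line if it won.
        ( ln "$dir/target" "$dir/lock" 2>/dev/null && echo won >> "$dir/winners" ) &
        i=$((i + 1))
    done
    wait    # let every contender finish
    if [ -f "$dir/winners" ]
    then
        wc -l < "$dir/winners" | tr -d ' '
    else
        echo 0
    fi
}
```

Called as race_test /some/scratch/dir 8, it should report a single winner however closely the contenders start, since ln refuses to create a link name that already exists.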

sathyaram_s
08-21-02, 07:58
Honestly, the solution, or let me call it a workaround, I suggested is a dirty way of doing this ...

I'm curious to see what the Scripting gurus in this forum have to say ...



kbadeau
08-27-02, 12:20
Since you are talking about microseconds, why not simply do another test -f just before doing the actual touch?

Overall, though, I'd probably opt for a daemon-type script that stays running all the time and "listens" for files, as opposed to kicking off the same process over and over upon arrival.
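A daemon-type script along those lines could be sketched as below. Everything named here is an assumption for illustration: the incoming and done directories, the polling interval, and load_file, which is only a placeholder for the real database load.

```shell
# Placeholder for the real database load; replace with the actual loader.
load_file() {
    echo "loading $1"
}

# process_pending INCOMING DONE
# Loads every regular file found in INCOMING, one at a time, and moves
# each successfully loaded file to DONE.
process_pending() {
    incoming=$1
    done_dir=$2
    for f in "$incoming"/*
    do
        [ -f "$f" ] || continue    # glob matched nothing, or not a regular file
        load_file "$f" && mv "$f" "$done_dir"/
    done
}

# run_daemon INCOMING DONE
# Polls forever; since this is the only loader, files are never processed in parallel.
run_daemon() {
    while :
    do
        process_pending "$1" "$2"
        sleep "${POLL_INTERVAL:-15}"
    done
}
```

With a single long-running loader there is nothing to lock against, provided only one copy of the daemon is ever started and files are only picked up once they have been completely delivered.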