Sometimes you need to ensure that only one instance of a script runs at a time. Imagine a cronjob doing something very important, which will fail or corrupt data if it accidentally runs twice. In these cases, a form of MUTEX (mutual exclusion) is needed.
The basic procedure is simple: at startup, the script checks whether a specific condition (the lock) is present. If it is, the script is locked out and doesn't start.
This article describes locking with common UNIX® tools. There are various special-purpose locking tools out there, of course, but they're not standardized, or rather: you can't be sure they're present wherever you want to run your scripts. Of course, a tool designed for exactly this purpose does the job much better than the general code shown here.
As mentioned above, a special tool for locking is the 100% solution: you don't have race conditions, you don't need to work around specific limits, and so on.
The best way to set a global lock condition is the UNIX® filesystem. Variables aren't enough, as each process has its own private variable space, but the filesystem is global to all processes (yes, I know about chroots... special case). You can "set" several things in a filesystem that can be used as a locking indicator: files, file timestamps, and directories.
To create a file or set a file timestamp, the touch command is usually used. That implies the following problem: a locking mechanism would check for the existence of the lockfile and, if it doesn't exist, create one (lock) and continue. These are two steps! That means it's not one atomic operation. There's a small window between checking and creating in which another instance of the same script could perform the locking (because when it checked, the lockfile wasn't there)! In that case you would have two instances of the script running, both thinking they successfully locked and can operate without collisions. Setting the timestamp is similar: one step to check the timestamp, a second step to set it.
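The race described above can be made visible with a small sketch. The names racy_lock and LOCKFILE are hypothetical, and the sleep is an artificial delay that only widens the window between the check and the create so that the effect becomes reproducible:

```shell
#!/bin/bash
# Racy check-then-create locking (do NOT use this pattern).
# LOCKFILE and racy_lock are hypothetical names for this sketch.
DEMODIR="$(mktemp -d)"
LOCKFILE="${DEMODIR}/demo.lock"

racy_lock() {
    # step 1: check
    if [ ! -e "${LOCKFILE}" ]; then
        sleep 1                 # artificial delay: widens the race window
        # step 2: create
        touch "${LOCKFILE}"
        echo "instance $1: locked"
    else
        echo "instance $1: lock failed"
    fi
}

# start two "instances" at nearly the same time and collect their output
RESULT="$( { racy_lock A & racy_lock B & wait; } | sort )"
echo "${RESULT}"

rm -rf "${DEMODIR}"
```

Both instances pass the check before either of them has created the file, so both report a successful lock.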
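A simple way to get an atomic check-and-create is to create a lock directory with the mkdir command. It will create the directory only if it does not already exist and report success; otherwise, for example if the directory already exists, it fails with a non-zero exit status.
With mkdir, it seems, we have our two steps in one simple operation. A (very!) simple locking code might now look like this: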
    if mkdir /var/lock/mylock; then
        echo "Locking succeeded" >&2
    else
        echo "Lock failed - exit" >&2
        exit 1
    fi

In case mkdir reports an error, the script will exit at this point - the MUTEX did its job!
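To see the mutual exclusion at work without touching /var/lock, here is a minimal sketch that attempts the lock twice in a temporary directory. The names try_lock, BASE and LOCKDIR are assumptions made for this illustration only:

```shell
#!/bin/bash
# Sketch: mkdir as an atomic test-and-set. LOCKDIR and try_lock are
# hypothetical names; a temporary directory stands in for /var/lock.
BASE="$(mktemp -d)"
LOCKDIR="${BASE}/mylock"

try_lock() {
    # mkdir either creates the directory or fails; check and create
    # happen in one step
    mkdir "${LOCKDIR}" 2>/dev/null
}

if try_lock; then FIRST="locked"; else FIRST="failed"; fi
if try_lock; then SECOND="locked"; else SECOND="failed"; fi
echo "first attempt: ${FIRST}, second attempt: ${SECOND}"

# release the lock, then locking works again
rmdir "${LOCKDIR}"
if try_lock; then THIRD="locked"; else THIRD="failed"; fi
echo "after release: ${THIRD}"

rm -rf "${BASE}"
```

Note that the very simple code above never removes the lock directory. A real script would probably want to remove it when it exits, for example with a trap on EXIT.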
If the directory is removed after a lock was set successfully, while the script is still running, the lock is lost. Doing chmod -w on the parent directory containing the lock directory is possible, but it is also not atomic. Maybe a while loop running in the background could continuously check for the existence of the lock and send a signal such as USR1 if the directory no longer exists. The signal would need to be trapped. I am sure there is a better solution than this suggestion -- sn18 2009/12/19 08:24
Note: On my way through the Internet I found some people wondering whether the mkdir way will work "on all filesystems". Well, let's say it should. The syscall underneath mkdir is guaranteed to be atomic in all cases, at least on Unices. Shared filesystems on NFS or real cluster filesystems can be a problem; there it depends on the mount options and the implementation. However, I successfully use this simple method on top of an Oracle OCFS2 filesystem in a 4-node cluster environment. So let's just say "it's expected to work under normal conditions".
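Another atomic method is noclobbing redirection: with the noclobber shell option set (set -C), a redirection with > fails if the target file already exists.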
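A minimal sketch of locking with noclobber redirection might look like the following. It assumes Bash; try_lock, BASE and LOCKFILE are hypothetical names, and a temporary directory is used so the demo doesn't leave anything behind. With set -C, the shell is expected to refuse to overwrite an existing regular file, so creating the lockfile and testing for it happen in one redirection:

```shell
#!/bin/bash
# Sketch: locking via noclobber redirection. LOCKFILE and try_lock are
# hypothetical names; a temporary directory is used for the demo.
BASE="$(mktemp -d)"
LOCKFILE="${BASE}/mylock"

try_lock() {
    # Run in a subshell so that noclobber does not leak into the caller.
    # With set -C, ">" fails if the file already exists.
    ( set -C; echo "$$" > "${LOCKFILE}" ) 2>/dev/null
}

if try_lock; then FIRST="locked"; else FIRST="failed"; fi
if try_lock; then SECOND="locked"; else SECOND="failed"; fi

# the lockfile contains the PID of the script that holds the lock
OWNER="$(cat "${LOCKFILE}")"
echo "first: ${FIRST}, second: ${SECOND}, owner PID: ${OWNER}"

rm -rf "${BASE}"
```

Compared to the mkdir way, the lock and the PID of its owner are written in a single step, so there is no moment where the lock exists but the PID is not stored yet.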
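This code was taken from a script that controls PISG to create statistical pages from my IRC logfiles. That doesn't matter for you; I only mention it to tell you that this code works and is in use. Compared to the very simple example above, it does some additional things: it stores the PID of the locking process in the lock directory, removes the lock on exit, handles signals, and detects and removes a stale lock.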
I don't show various details here, like determining the signal by which the script was killed. I just show the most relevant code:
    #!/bin/bash

    # lock dirs/files
    LOCKDIR="/tmp/statsgen-lock"
    PIDFILE="${LOCKDIR}/PID"

    # exit codes and text for them - additional features nobody needs :-)
    ENO_SUCCESS=0; ETXT[0]="ENO_SUCCESS"
    ENO_GENERAL=1; ETXT[1]="ENO_GENERAL"
    ENO_LOCKFAIL=2; ETXT[2]="ENO_LOCKFAIL"
    ENO_RECVSIG=3; ETXT[3]="ENO_RECVSIG"

    ###
    ### start locking attempt
    ###

    trap 'ECODE=$?; echo "[statsgen] Exit: ${ETXT[ECODE]}($ECODE)" >&2' 0
    echo -n "[statsgen] Locking: " >&2

    if mkdir "${LOCKDIR}" &>/dev/null; then

        # lock succeeded, install signal handlers before storing the PID just in case
        # storing the PID fails
        trap 'ECODE=$?;
              echo "[statsgen] Removing lock. Exit: ${ETXT[ECODE]}($ECODE)" >&2
              rm -rf "${LOCKDIR}"' 0
        echo "$$" >"${PIDFILE}"
        # the following handler will exit the script on receiving these signals
        # the trap on "0" (EXIT) from above will be triggered by this trap's "exit" command!
        trap 'echo "[statsgen] Killed by a signal." >&2
              exit ${ENO_RECVSIG}' 1 2 3 15
        echo "success, installed signal handlers"

    else

        # lock failed, now check if the other PID is alive
        OTHERPID="$(cat "${PIDFILE}")"

        # if cat wasn't able to read the file anymore, another instance probably is
        # about to remove the lock -- exit, we're *still* locked
        #  Thanks to Grzegorz Wierzowiecki for pointing this race condition out on
        #  http://wiki.grzegorz.wierzowiecki.pl/code:mutex-in-bash
        if [ $? != 0 ]; then
            echo "lock failed, PID ${OTHERPID} is active" >&2
            exit ${ENO_LOCKFAIL}
        fi

        if ! kill -0 $OTHERPID &>/dev/null; then
            # lock is stale, remove it and restart
            echo "removing stale lock of nonexistant PID ${OTHERPID}" >&2
            rm -rf "${LOCKDIR}"
            echo "[statsgen] restarting myself" >&2
            exec "$0" "$@"
        else
            # lock is valid and OTHERPID is active - exit, we're locked!
            echo "lock failed, PID ${OTHERPID} is active" >&2
            exit ${ENO_LOCKFAIL}
        fi

    fi
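The stale-lock detection in the script relies on kill -0, which sends no signal at all and only reports whether the given PID could be signalled. A small sketch of that check in isolation, where is_alive is a hypothetical helper name and a background sleep stands in for the other script instance:

```shell
#!/bin/bash
# is_alive is a hypothetical helper name for this sketch.
is_alive() {
    # kill -0 delivers no signal; it only checks that the process exists
    # and that we are allowed to signal it
    kill -0 "$1" 2>/dev/null
}

# a background process stands in for "the other instance"
sleep 30 &
CHILD=$!
if is_alive "${CHILD}"; then BEFORE="alive"; else BEFORE="gone"; fi

kill "${CHILD}"
wait "${CHILD}" 2>/dev/null    # reap the child so the PID really disappears
if is_alive "${CHILD}"; then AFTER="alive"; else AFTER="gone"; fi

echo "before: ${BEFORE}, after: ${AFTER}"
```

One caveat of this check: kill -0 also fails when the process exists but belongs to another user and may not be signalled, so a lock held by a different user could be mistaken for a stale one. PIDs can also be reused by unrelated processes, in which case a stale lock looks valid.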
Discussion
Restarting with exec "$0" is probably not a good idea, though: it only works if the script is called from the directory it is contained in. Maybe this
is a little better
If no chdir() was made, then an exec "$0" should work fine IMHO. Can you show me an example?
Ok, you're right. It is indeed not a directory problem. But
can still go wrong, depending on how it's called
while this
works flawlessly.
Yeah, that actually *is* a problem. But there's no generic solution for that IMHO, since the script might not have execute permission at all, or the shell "sh" might not be Bash, etc. Maybe this just needs to be seen as "implementation specific".