Controlling Python cron jobs with PID on Linux
Recently, while working with scheduled Python scripts in cron jobs, an interesting requirement turned up: ensuring that the previous scheduled execution has finished before the next cron job is allowed to run.
This post presents one feasible solution for that requirement.
Assume you have a simple cron job scheduled as follows:
*/5 * * * * /usr/bin/python /home/kelson/scripts/python_task.py
This schedules the script to be executed every 5 minutes.
Let’s now assume that the script python_task.py performs a series of tasks. Let’s also assume that the execution time of these tasks varies based on external factors which are out of the control of the script.
One execution may take 10 seconds. Another may take 30 seconds, or a minute, or 2 minutes, or 10, and so on.
As you may have recognized, if the script is scheduled to run every 5 minutes and, for any external reason, one execution takes longer than 5 minutes, cron will start another instance while the first is still running. For some period we will then have two instances of the script running in parallel. This is exactly the scenario we want to avoid.
Using a PIDFILE
We can use a PIDFILE to control whether the process is already running.
#!/usr/bin/python
import os
import time

# Get process PID
PID = str(os.getpid())
PIDFILE = "RUNNING.pid"


def can_it_run():
    # Check whether the lock PIDFILE exists
    return not os.path.isfile(PIDFILE)


def run():
    # Write lock file containing process PID
    with open(PIDFILE, 'w') as pidfile:
        pidfile.write(PID)
    print("Executing under PID " + PID)
    try:
        # Simulating miscellaneous tasks
        time.sleep(60)
    finally:
        # Remove the lock file once execution finishes
        os.unlink(PIDFILE)


if __name__ == '__main__':
    if can_it_run():
        run()
    else:
        # Retrieve PID of the previous execution
        with open(PIDFILE) as pidfile:
            old_pid = pidfile.read().strip()
        print("Script already running under PID %s, skipping execution." % old_pid)
The previous snippet demonstrates the use of a PIDFILE to control whether the script is allowed to execute, depending on whether the previous execution has completed.
On each execution, the script checks whether the PIDFILE exists.
If it does not exist, a lock file containing the PID of the process is created, and the execution proceeds.
If it does exist, the current execution is skipped, and the PID of the previous execution is printed.
Note the finally block on the try statement, which removes the lock file once execution ends, even if the tasks raise an exception.
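One caveat of the check above is that testing for the file and creating it are two separate steps, so two instances starting at almost the same moment could both pass the check. A minimal sketch of one way to narrow that gap, assuming a POSIX system, is to create the file with os.O_EXCL, which fails if the file already exists. The helper name acquire_lock is hypothetical and not part of the original script:

```python
import errno
import os

PIDFILE = "RUNNING.pid"  # same lock file name as in the script above


def acquire_lock(pidfile=PIDFILE):
    """Create pidfile atomically; return True if this process now owns it."""
    try:
        # O_EXCL makes the call fail if the file already exists, so the
        # existence check and the creation happen in a single step.
        fd = os.open(pidfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as err:
        if err.errno == errno.EEXIST:
            return False  # another instance already holds the lock
        raise
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True
```

With a helper like this, the script would call acquire_lock() in place of can_it_run() followed by the write inside run(), and skip the execution when it returns False.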
Let’s test it.
Execute the script in your terminal and notice an output similar to the following:
$ python python_task.py
Executing under PID 320
While the script is running, listing files under the current working directory will reveal the PIDFILE as shown:
$ ls -l
total 8
-rwxrwxr-x. 1 kelson kelson 523 Sep 19 11:00 python_task.py
-rw-rw-r--. 1 kelson kelson 4 Sep 19 13:19 RUNNING.pid
Let’s display the contents of the file, which holds the PID of the script process:
$ cat RUNNING.pid
320
Great! Now, if you open a second terminal and execute another instance of the script within those 60 seconds, observe the following:
$ python python_task.py
Script already running under PID 320, skipping execution.
After 60 seconds, the script execution will end, removing the PIDFILE and allowing further script executions.
Terminating old processes
As an alternative approach, instead of skipping the execution of the script, let’s kill the old process and start a new one.
For that, let’s update the __main__ block to extract the process PID from the PIDFILE and kill that process before starting a new one. Note that this requires importing the signal module alongside os and time:
import signal  # add next to the other imports at the top of the script

if __name__ == '__main__':
    if can_it_run():
        run()
    else:
        # Retrieve PID of the previous execution
        with open(PIDFILE) as pidfile:
            old_pid = pidfile.read().strip()
        print("Script already running under PID %s, which is now being terminated." % old_pid)
        # Force a new execution by killing the old process
        os.kill(int(old_pid), signal.SIGTERM)
        run()
Note that we retrieve old_pid from the PIDFILE before terminating the old process, and then a new execution is triggered.
Let’s now test it once again. Open a terminal to execute the script to get an output similar to:
$ python python_task.py
Executing under PID 2022
Let’s now force the execution of a second instance and observe the results:
$ python python_task.py
Script already running under PID 2022, which is now being terminated.
Executing under PID 2024
Great! We have now ended the old Python process before spawning a new one. Note that this approach is drastic, and whether to use it depends on whether your script can tolerate being terminated in the middle of its tasks.
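With Python’s default SIGTERM handling, the old process is stopped immediately and its finally block never runs. If the tasks need a chance to clean up, one option is to turn SIGTERM into a normal exit. The following is only a sketch under that assumption; handle_sigterm and run_with_cleanup are hypothetical names, and the task argument stands in for whatever work the script does:

```python
import os
import signal
import sys

PIDFILE = "RUNNING.pid"  # same lock file name as in the script above


def handle_sigterm(signum, frame):
    # sys.exit raises SystemExit, which unwinds the stack so that
    # any pending `finally` blocks are executed before the process ends.
    sys.exit(1)


def run_with_cleanup(task, pidfile=PIDFILE):
    # Install the handler before doing any work (main thread only).
    signal.signal(signal.SIGTERM, handle_sigterm)
    with open(pidfile, "w") as handle:
        handle.write(str(os.getpid()))
    try:
        task()
    finally:
        # Reached on normal completion, on exceptions, and on SIGTERM.
        os.unlink(pidfile)
```

One consequence to keep in mind: the old process now deletes the PIDFILE while it shuts down, so a new instance that kills it should wait for it to exit before writing its own lock, otherwise the old cleanup could remove the new lock file.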
Final Considerations
If for any reason the PIDFILE exists but the process whose PID it contains is not running, it may be a sign that the script did not shut down gracefully.
This will block all further executions of the script until the PIDFILE is removed.
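A stale lock like this can be detected by asking the operating system whether the recorded PID still belongs to a live process. Below is a rough sketch for Linux and other POSIX systems; pid_is_running and lock_is_stale are hypothetical helpers, not part of the original script. Keep in mind that PIDs can be reused, so a live PID does not prove the process is our script:

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except OSError as err:
        if err.errno == errno.ESRCH:  # no such process
            return False
        if err.errno == errno.EPERM:  # exists, but owned by another user
            return True
        raise
    return True


def lock_is_stale(pidfile):
    """Return True if pidfile exists but does not point to a live process."""
    if not os.path.isfile(pidfile):
        return False  # no lock at all
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except ValueError:
        return True  # empty or corrupt content cannot identify a live run
    return not pid_is_running(pid)
```

A script could call lock_is_stale(PIDFILE) before deciding to skip, and remove the file when it returns True.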
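To make the removal of the PIDFILE more robust, the Python module atexit may be used. Be aware that atexit handlers run on normal interpreter exit but not when the process is killed by a signal that Python does not handle, or when os._exit() is called. This is not covered in depth in this post, but is possibly a topic for a future one =).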
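As a rough idea of what an atexit-based cleanup could look like, here is a minimal sketch. The function names write_pidfile and remove_pidfile are assumptions made for illustration only:

```python
import atexit
import errno
import os

PIDFILE = "RUNNING.pid"  # same lock file name as in the script above


def remove_pidfile(pidfile=PIDFILE):
    """Delete pidfile, ignoring the case where it is already gone."""
    try:
        os.unlink(pidfile)
    except OSError as err:
        if err.errno != errno.ENOENT:
            raise


def write_pidfile(pidfile=PIDFILE):
    """Write the current PID and schedule the file's removal at exit."""
    with open(pidfile, "w") as handle:
        handle.write(str(os.getpid()))
    # Called on normal interpreter exit, including sys.exit() and
    # unhandled exceptions, but not after SIGKILL or os._exit().
    atexit.register(remove_pidfile, pidfile)
```

Because remove_pidfile tolerates a missing file, it can coexist with the finally block in run() without raising an error when both try to clean up.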
The full code snippet is available on Gist.
Kelson
//iamkel.dev
Software engineer. Geek. Traveller. Wannabe athlete. Lifelong student. Works at IBM and hosts the @HardcodeCast.