
Controlling Python cron jobs with PID on Linux

Posted by Kelson on May 30th, 2021.

Recently, while working with scheduled Python scripts in cron jobs, an interesting requirement came up: ensuring that the previous scheduled execution had finished before allowing the next cron job to run.
This post presents one feasible solution to that requirement, which I believe is worth sharing.

Assume a simple cron job scheduled as follows:

*/5 * * * * /usr/bin/python /home/kelson/scripts/python_task.py

This schedules the script to be executed every 5 minutes.

Now assume that python_task.py performs a series of tasks whose execution time varies with external factors outside the script’s control.

One execution may take 10 seconds; another may take 30 seconds, a minute, 2 minutes, 10 minutes, and so on.

As you may have noticed, if the script is scheduled to run every 5 minutes and an execution takes longer than 5 minutes for any external reason, cron starts another instance anyway, so two instances of the script run in parallel until the first one finishes. This is exactly the scenario we want to avoid.

Using a PIDFILE

The proposed solution uses a PIDFILE, a lock file holding the process ID (PID) of the running instance, to determine whether the script is already running.

#!/usr/bin/env python3
import os
import time

# PID of the current process
PID = str(os.getpid())
# Relative path, resolved against the current working directory
PIDFILE = "RUNNING.pid"


def can_it_run():
    # The script may run only if no lock PIDFILE exists
    return not os.path.isfile(PIDFILE)


def run():
    # Write the lock file containing the process PID
    with open(PIDFILE, "w") as f:
        f.write(PID)
    print("Executing under PID " + PID)
    try:
        # Simulating miscellaneous tasks
        time.sleep(60)
    finally:
        # Remove the lock file when the tasks finish, even if they raise
        os.unlink(PIDFILE)


if __name__ == "__main__":
    if can_it_run():
        run()
    else:
        # Retrieve the PID of the previous execution
        with open(PIDFILE) as f:
            old_pid = f.read().strip()
        print("Script already running under PID %s, skipping execution." % old_pid)

The snippet above uses the PIDFILE to decide whether the script may run, depending on whether the previous execution has completed.

On each execution, the script checks whether the PIDFILE exists.
If it does not, the script creates the lock file containing its own PID and proceeds.
If it does, the current execution is skipped, and the PID of the previous execution is printed.
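One caveat: checking os.path.isfile() and then writing the file are two separate steps, so two instances started at the same instant could both see no PIDFILE and both proceed. The following is a minimal sketch of a stricter variant, which swaps the check for an atomic create using os.open with the O_CREAT and O_EXCL flags; with these flags the call fails with FileExistsError if the file already exists. The function name acquire_pidfile is hypothetical and not part of the original script.

```python
import os


def acquire_pidfile(path):
    """Atomically create the PIDFILE; return True if this process now owns it.

    O_CREAT | O_EXCL makes os.open raise FileExistsError when the file
    already exists, so the existence check and the creation are one step.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Another instance holds the lock
        return False
    # Record our PID in the freshly created lock file
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True
```

In the script above, can_it_run() and the write at the top of run() would both be replaced by a single acquire_pidfile(PIDFILE) call, keeping the existing try/finally that removes the file.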

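Note the finally clause of the try statement, which removes the lock file once the tasks finish, even if they raise an exception. It does not run, however, if the process is killed by a signal that Python does not handle, such as SIGKILL or, by default, SIGTERM.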

Let’s test it.

Run the script in a terminal and you should see output similar to the following:

$ python python_task.py
Executing under PID 320

While the script is running, listing files under the current working directory will reveal the PIDFILE as shown:

$ ls -l
total 8
-rwxrwxr-x. 1 kelson kelson 523 Sep 19 11:00 python_task.py
-rw-rw-r--. 1 kelson kelson   4 Sep 19 13:19 RUNNING.pid

Displaying the contents of the file shows the PID of the script process:

$ cat RUNNING.pid
320

Great! Now open a second terminal and run another instance of the script within those 60 seconds:

$ python python_task.py
Script already running under PID 320, skipping execution.

After 60 seconds, the first execution ends and removes the PIDFILE, allowing further executions.
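Because PIDFILE is the relative path "RUNNING.pid", its location depends on the working directory of whoever starts the script. cron typically starts jobs in the user’s home directory, so a manual run from a different directory would look for a different file and miss the lock. A minimal sketch, assuming you want the PIDFILE next to the script itself, builds an absolute path from __file__; any fixed absolute path works just as well, as long as every instance uses the same one.

```python
import os

# Directory containing this script, resolved to an absolute path
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))

# Same file name as before, but independent of the working directory
PIDFILE = os.path.join(SCRIPT_DIR, "RUNNING.pid")

if __name__ == "__main__":
    print("PIDFILE location: " + PIDFILE)
```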

Terminating old processes

As an alternative approach, instead of skipping the new execution, let’s terminate the old process and start a new one.

To do that, update the __main__ block to read the old PID from the PIDFILE and terminate that process before starting a new run:

import signal  # needed for signal.SIGTERM; place it with the other imports

if __name__ == "__main__":
    if can_it_run():
        run()
    else:
        # Retrieve the PID of the previous execution
        with open(PIDFILE) as f:
            old_pid = f.read().strip()
        print("Script already running under PID %s, which is now being terminated." % old_pid)
        # Force a new execution by terminating the old process
        os.kill(int(old_pid), signal.SIGTERM)
        run()

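Note that old_pid is read from the PIDFILE before the old process is terminated, and only then is a new execution started. Since the old process is killed by an unhandled SIGTERM, its finally clause does not run; the new execution simply overwrites the PIDFILE with its own PID.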

Let’s test it once again. Run the script in a terminal to get output similar to:

$ python python_task.py
Executing under PID 2022

Now start a second instance while the first one is still running and observe the result:

$ python python_task.py
Script already running under PID 2022, which is now being terminated.
Executing under PID 2024

Great! The old Python process is now terminated before a new one is spawned. Keep in mind that this approach is drastic: whether it is suitable depends on whether your script can safely be interrupted in the middle of its tasks.
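If you do use this approach, two details may be worth softening. os.kill only sends the signal and returns immediately, so the new run can briefly overlap the old one; and with the default SIGTERM action, the old process skips its cleanup. The following is a minimal sketch, assuming hypothetical helpers named install_sigterm_handler, is_running and terminate_and_wait: the handler turns SIGTERM into SystemExit so the finally clause runs, and the new instance polls with signal 0 until the old one is gone. The 128 + signal number exit code is an assumption borrowed from the shell convention. A process that has exited but has not yet been reaped by its parent (cron or the shell) still counts as existing until it is reaped.

```python
import os
import signal
import time


def install_sigterm_handler():
    """Convert SIGTERM into SystemExit so that try/finally cleanup runs."""
    def _handler(signum, frame):
        # Raising SystemExit unwinds the stack, executing finally clauses
        raise SystemExit(128 + signum)
    signal.signal(signal.SIGTERM, _handler)


def is_running(pid):
    """Return True if a process with this PID exists (signal 0 delivers nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user
        return True
    return True


def terminate_and_wait(pid, timeout=10.0, interval=0.1):
    """Send SIGTERM to pid, then poll until it exits or timeout seconds pass.

    Returns True if the process is gone, False if it is still running.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # nothing to terminate, e.g. a stale PIDFILE
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(interval)
    return not is_running(pid)
```

Wired into the __main__ block shown earlier in this section, install_sigterm_handler() would be called at the top of the block, and the os.kill(...) and run() pair would become: if terminate_and_wait(int(old_pid)): run(). A new run then starts only after the old one has exited and removed its PIDFILE.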

Final Considerations

If the PIDFILE exists but no process with the PID it contains is running, the previous execution most likely did not shut down gracefully, for example because it was killed with SIGKILL.
Such a stale PIDFILE blocks all further executions until it is removed.
The Python atexit module can register cleanup functions that run when the interpreter exits normally, although, like the finally clause, they are not called when the process is killed by a signal Python does not handle. It is not covered in this post, but may be the topic of a future one =).
The full code snippet is available on Gist.
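For the stale PIDFILE case described above, a common check is to send signal 0 to the stored PID with os.kill: no signal is actually delivered, but the call raises ProcessLookupError when no such process exists. The following is a minimal sketch, assuming a hypothetical helper named pidfile_is_stale. Note that PIDs are reused, so a live PID only shows that some process currently has that ID, not necessarily a previous instance of this script; this matters especially for the termination variant, which would otherwise signal whatever process now holds the PID.

```python
import os


def pidfile_is_stale(path):
    """Return True if the PIDFILE exists but its PID is not a running process."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no PIDFILE, nothing stale
    except ValueError:
        return True  # empty or corrupt PIDFILE
    if pid <= 0:
        return True  # 0 and negative values address process groups, not a PID
    try:
        # Signal 0 performs error checking only; nothing is delivered
        os.kill(pid, 0)
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False


if __name__ == "__main__":
    PIDFILE = "RUNNING.pid"
    if pidfile_is_stale(PIDFILE):
        print("Removing stale PIDFILE left by a previous execution.")
        os.unlink(PIDFILE)
```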

Kelson

//iamkel.dev

Software engineer. Geek. Traveller. Wannabe athlete. Lifelong student. Works at IBM and hosts the @HardcodeCast.

