Are you done yet? Waiting for process termination the right way

TL;DR

If you want to wait for system processes to terminate, with timeout and interruption support, use ps::ps_wait() in ps 1.8.0 or newer. This works on macOS, Linux and Windows.

How is this still a problem?

It is astonishing that this is still a problem today. On Unix systems there is no portable way to wait (i.e. poll) for the termination of a set of processes. You cannot create a pollable file descriptor for the termination of a process, at least not in a portable way. There is a kind-of portable hack to poll subprocess termination, but there is nothing for processes that are not subprocesses of the current process. Various systems have various non-portable solutions, and that's what the new ps::ps_wait() implementation uses.

The self-pipe hack for subprocesses

This is a special case for subprocesses. On Unix systems, when a process terminates, its parent process receives a SIGCHLD signal. (Well, usually it does, but read on.) We can use this signal to hack together a pollable file descriptor:

  1. Block SIGCHLD signals.
  2. Set up a SIGCHLD handler that closes the write end of a pipe, which we'll create later. (The handler won't be called until we unblock SIGCHLD signals.)
  3. Check that the subprocess is still running. If it is not, restore the signal handler, unblock the signal and quit here.
  4. Call pipe(2) to open a pipe. Make both ends of the pipe non-blocking.
  5. Unblock SIGCHLD signals.
  6. Start polling the read end of the pipe, possibly with a timeout.
  7. If poll(2) returns with POLLIN (Linux reports POLLHUP instead for a pipe whose write end was closed), that means that the write end of the pipe was closed by the SIGCHLD handler, i.e. the subprocess has quit.

None of this changes if you are waiting on multiple processes, except that if you want to know which subprocess has quit, you’ll need to iterate over all subprocesses you are waiting on, to see which ones are still alive.
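
A minimal, single-threaded C sketch of the steps above, for one child pid. It is not the processx implementation: wait_child(), child_alive() and on_sigchld() are hypothetical names, liveness is checked with waitid(2) and WNOWAIT so that the child is not reaped, and the sketch takes over the SIGCHLD disposition for the duration of the wait. Since any child's SIGCHLD closes the pipe, it re-checks its own pid after every wakeup.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Write end of the self-pipe, or -1. The SIGCHLD handler closes it. */
static volatile sig_atomic_t self_pipe_wr = -1;

static void on_sigchld(int sig) {
  int saved_errno = errno;
  (void) sig;
  if (self_pipe_wr >= 0) {
    close(self_pipe_wr);            /* close(2) is async-signal-safe */
    self_pipe_wr = -1;
  }
  errno = saved_errno;
}

static long long now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: is the child still running? WNOWAIT keeps the zombie. */
static int child_alive(pid_t pid) {
  siginfo_t info;
  memset(&info, 0, sizeof info);
  if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
    return 0;                       /* ECHILD: already reaped */
  return info.si_pid == 0;          /* si_pid stays 0 if it has not exited */
}

/* Returns 1 if the child exited, 0 on timeout, -1 on error. */
static int wait_child(pid_t pid, int timeout_ms) {
  sigset_t chld, old_mask;
  struct sigaction sa, old_sa;
  long long deadline = now_ms() + timeout_ms;
  int ret;

  /* Steps 1-2: block SIGCHLD, then install the handler. */
  sigemptyset(&chld);
  sigaddset(&chld, SIGCHLD);
  sigprocmask(SIG_BLOCK, &chld, &old_mask);
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGCHLD, &sa, &old_sa);

  for (;;) {
    int fds[2];
    struct pollfd pfd;
    long long left;

    /* Step 3: SIGCHLD is blocked, so an exit after this check leaves a
       pending signal that will close the pipe created below. */
    if (!child_alive(pid)) { ret = 1; break; }
    left = deadline - now_ms();
    if (left <= 0) { ret = 0; break; }

    /* Step 4: a fresh non-blocking pipe for this round. */
    if (pipe(fds) == -1) { ret = -1; break; }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    self_pipe_wr = fds[1];

    /* Steps 5-6: unblock SIGCHLD and poll the read end. */
    sigprocmask(SIG_UNBLOCK, &chld, NULL);
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    poll(&pfd, 1, (int) left);      /* EINTR is fine: we re-check anyway */
    sigprocmask(SIG_BLOCK, &chld, NULL);

    /* Step 7: clean up; the loop decides whether it was *our* child. */
    if (self_pipe_wr >= 0) close(self_pipe_wr);
    self_pipe_wr = -1;
    close(fds[0]);
  }

  sigaction(SIGCHLD, &old_sa, NULL);
  sigprocmask(SIG_SETMASK, &old_mask, NULL);
  return ret;
}
```

In a multi-threaded program, pthread_sigmask(3) should be used instead of sigprocmask(2).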

On all modern Unix systems with sigaction(2), a signal can be handled with additional information that contains the pid of the sending process, i.e. for SIGCHLD, the subprocess that just terminated. This is great in theory, but unfortunately in practice signals can be merged: standard signals are not queued, so if several signals of the same type arrive while one is still pending, the process only sees one of them. (I have seen this on macOS.) This is fine for SIGINT or SIGTERM, but not fine at all for SIGCHLD, because some subprocesses may terminate without the main process receiving a SIGCHLD signal for them, so we cannot rely on the information in the signal. We still need to find all subprocesses and check manually whether they are alive, and we might as well ignore the pid information in the signal. Not great at all.

Another issue with this implementation is that it is hard to include it in an event loop where we would be waiting on other file descriptors as well, because of the boilerplate of the SIGCHLD signal management.

Here is the gist of the current implementation of the self-pipe hack in the processx R package. The parts where we walk over the subprocesses are elsewhere, though.

To work around this mess, processx can start its subprocesses with an extra pipe (called the "poll connection" in processx) that is used solely to poll for the termination of the subprocess. E.g. the callr package uses the poll connection to poll for both output (i.e. stdout and stderr) and process termination, potentially together with sockets, etc.

kqueue on macOS and *BSD systems

kqueue(2) is a generic way of creating a pollable file descriptor that listens on a configurable set of kernel events. Luckily for us, one of the events is the termination (i.e. exit) of a process with a given pid. So all we need to do is add all processes to the kqueue and then poll it in a loop, until all of them quit or a timeout kicks in.

An example implementation is in the ps package.
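
A rough C sketch of this loop, assuming a list of pids and a millisecond timeout; kqueue_wait() is a hypothetical name, not the ps API. It registers one EVFILT_PROC filter with NOTE_EXIT per process and counts down as exit events arrive. Processes that no longer exist make kevent(2) fail with ESRCH and are skipped. The code is guarded so that it only compiles where kqueue(2) exists.

```c
/* kqueue(2) is only available on macOS and the BSDs. */
#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static long long now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Waits until all pids exit or timeout_ms passes. Returns the number of
   processes still running (0: all exited), or -1 on error. */
static int kqueue_wait(const pid_t *pids, int n, int timeout_ms) {
  long long deadline = now_ms() + timeout_ms;
  struct kevent ev;
  int running = 0;
  int kq = kqueue();
  if (kq == -1) return -1;

  /* One NOTE_EXIT filter per process. */
  for (int i = 0; i < n; i++) {
    EV_SET(&ev, pids[i], EVFILT_PROC, EV_ADD | EV_ONESHOT, NOTE_EXIT, 0, NULL);
    if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1) {
      if (errno == ESRCH) continue;   /* no such process: already gone */
      close(kq);
      return -1;
    }
    running++;
  }

  /* Collect exit events until none are left or the deadline passes. */
  while (running > 0) {
    long long left = deadline - now_ms();
    struct timespec ts;
    int nev;
    if (left <= 0) break;
    ts.tv_sec = (time_t) (left / 1000);
    ts.tv_nsec = (long) (left % 1000) * 1000000;
    nev = kevent(kq, NULL, 0, &ev, 1, &ts);
    if (nev == -1 && errno == EINTR) continue;
    if (nev == -1) { running = -1; break; }
    if (nev == 1 && (ev.fflags & NOTE_EXIT)) running--;
  }
  close(kq);
  return running;
}
#endif
```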

pidfd_open(2) on Linux

pidfd_open(2) is a relatively new system call in Linux, introduced in Linux 5.3. It creates a pollable file descriptor for a process. The file descriptor will be readable when the process exits. (You can’t actually read anything from it.) So you can create a file descriptor for each process, and poll(2) or epoll(7) all of them together.

This is another clean solution. Unfortunately it requires Linux 5.3, and some more conservative distros like RHEL 8.x still run Linux 4.x, so we need a fallback solution as well.

Here is the ps_wait() implementation that uses pidfd_open(2) in the ps package.

It also includes the fallback to the inotify(7) hack, which is available on much older Linux systems. I'll discuss it in the next section. Note that you need to #define SYS_pidfd_open yourself if it is undefined. This way the source code compiles on older systems, but still uses the new solution on newer systems, if available, instead of deciding on the implementation at compile time:

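```c
#include <sys/syscall.h>

/* Headers older than Linux 5.3 lack it; it is 434 on x86-64, arm64, etc. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
```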
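
With that definition in place, waiting on several processes could look roughly like the sketch below; pidfd_wait() is a hypothetical name, not the ps API. It opens one pidfd per process through syscall(2), since older C libraries have no pidfd_open() wrapper, and polls all of them together, dropping each pidfd once it becomes readable. An ENOSYS error is the cue to switch to the fallback.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

static long long now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Returns the number of processes still running (0: all exited), or -1
   on error; errno == ENOSYS means the kernel is too old for pidfd_open. */
static int pidfd_wait(const pid_t *pids, int n, int timeout_ms) {
  long long deadline = now_ms() + timeout_ms;
  struct pollfd *fds = calloc(n > 0 ? (size_t) n : 1, sizeof *fds);
  int running = 0, ret = -1, saved_errno;
  if (fds == NULL) return -1;

  for (int i = 0; i < n; i++) {
    int fd = (int) syscall(SYS_pidfd_open, pids[i], 0);
    if (fd == -1) {
      if (errno == ESRCH) continue;      /* no such process */
      goto done;
    }
    fds[running].fd = fd;
    fds[running].events = POLLIN;        /* readable once the process exits */
    running++;
  }

  while (running > 0) {
    long long left = deadline - now_ms();
    int nready;
    if (left <= 0) break;
    nready = poll(fds, (nfds_t) running, (int) left);
    if (nready == -1 && errno == EINTR) continue;
    if (nready == -1) goto done;
    /* Close the pidfds of exited processes (swap-remove, re-check slot i). */
    for (int i = 0; i < running; ) {
      if (fds[i].revents & (POLLIN | POLLHUP)) {
        close(fds[i].fd);
        fds[i] = fds[--running];
      } else {
        i++;
      }
    }
  }
  ret = running;

done:
  saved_errno = errno;
  for (int i = 0; i < running; i++) close(fds[i].fd);
  free(fds);
  errno = saved_errno;
  return ret;
}
```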

inotify(7) on Linux

I have seen this idea on Stack Overflow. inotify(7) is a Linux facility for monitoring file system events. It is available since Linux 2.6.13.

Ideally we would use inotify(7) to watch the /proc/<pid> directory, but inotify(7) does not work on the /proc file system. However, /proc/<pid>/exe is a symlink to the executable that the process is running, and that is a regular file that we can watch. We ask inotify to notify us when a file descriptor to that file is closed. This might indicate that the process has terminated. Not always, though. We can get false reports if the process calls execve(2). In this case the old executable is closed, but the process keeps running a different executable, so we need to continue watching the process via its new executable.

Notice that there is a race condition here. If the process calls execve(2) multiple times in quick succession, it might be running a newer executable than the one we are watching with inotify(7), and we’ll never notice its termination. This should be very rare, but if you are paranoid, you could periodically check if the process is running.

Another potential edge case is when the executable is deleted while the program is still running. Then the inotify(7) calls will fail, I think. (But I haven’t actually tried this.)
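
A rough C sketch of this approach for a set of pids, not the ps implementation; inotify_wait(), proc_running() and RECHECK_MS are hypothetical. It watches each live process's executable through /proc/<pid>/exe for IN_CLOSE_NOWRITE, treats every event only as a hint, and re-checks all processes after each wakeup; rebuilding the watches each round follows processes that called execve(2). It also wakes up every RECHECK_MS milliseconds regardless, the periodic check suggested above, because a close event can be late or missing, e.g. while a forked child still shares the opened executable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical knob: wake up at least this often and re-check anyway. */
#define RECHECK_MS 1000

static long long now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical liveness test: /proc/<pid>/exe stops resolving once the
   process has exited (zombies included). Processes we may not inspect
   also fail it, so this sketch treats them as gone. */
static int proc_running(pid_t pid) {
  char path[64], target[PATH_MAX];
  snprintf(path, sizeof path, "/proc/%d/exe", (int) pid);
  return readlink(path, target, sizeof target) >= 0;
}

/* Returns the number of processes still running (0: all exited), or -1. */
static int inotify_wait(const pid_t *pids, int n, int timeout_ms) {
  long long deadline = now_ms() + timeout_ms;
  for (;;) {
    struct pollfd pfd;
    long long left;
    int running = 0, r, err;
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd == -1) return -1;

    /* Watch the current executable of every live process. */
    for (int i = 0; i < n; i++) {
      char path[64];
      if (!proc_running(pids[i])) continue;
      snprintf(path, sizeof path, "/proc/%d/exe", (int) pids[i]);
      if (inotify_add_watch(fd, path, IN_CLOSE_NOWRITE) == -1 &&
          !proc_running(pids[i])) {
        continue;                /* it exited in the meantime */
      }
      running++;                 /* unwatched survivors rely on RECHECK_MS */
    }

    left = deadline - now_ms();
    if (running == 0 || left <= 0) {
      close(fd);
      return running;
    }

    /* Any close of a watched executable wakes us up, even one by an
       unrelated process, so the next round re-checks everything. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    r = poll(&pfd, 1, left < RECHECK_MS ? (int) left : RECHECK_MS);
    err = errno;
    close(fd);
    if (r == -1 && err != EINTR) return -1;
  }
}
```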

Here is the ps_wait() implementation that uses inotify(7) in the ps package on older Linux systems.

It is a good idea to have a manual switch that turns on this implementation so it can be tested on newer systems as well.

How about Windows?

Waiting for process termination is one thing that is simpler on Windows. We can use WaitForMultipleObjects() to wait on up to 64 (MAXIMUM_WAIT_OBJECTS) process handles. If you have more processes than that, wait on the first 64, and if they all terminate before the timeout, wait on the next 64 that are still running, and so on. This works because we are not interested in the order in which the processes terminate, and we don't need to act on each termination immediately, either.

At the end, either all processes have terminated, or you reached the timeout. In the latter case you still need to iterate over all process handles to check which ones are still running, because WaitForMultipleObjects() only reports success once all processes in the batch have terminated, and on a timeout it does not tell you which ones already have. (Well, at least the way we use it, with bWaitAll = TRUE.)
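
A C sketch of the batching idea using documented Win32 calls; wait_all() and count_running() are hypothetical names, not the ps API. For simplicity it waits on each fixed batch of 64 in turn instead of regrouping the still-running handles: handles of processes that have already exited stay signaled, so waiting on them again returns at once. After a timeout, count_running() polls each handle with a zero timeout. The handles need SYNCHRONIZE access.

```c
/* Windows only. */
#ifdef _WIN32
#include <windows.h>

/* Returns TRUE if all processes exited before the deadline,
   FALSE on timeout or error. */
static BOOL wait_all(const HANDLE *handles, DWORD n, DWORD timeout_ms) {
  ULONGLONG deadline = GetTickCount64() + timeout_ms;
  for (DWORD start = 0; start < n; start += MAXIMUM_WAIT_OBJECTS) {
    DWORD batch = n - start < MAXIMUM_WAIT_OBJECTS ? n - start
                                                   : MAXIMUM_WAIT_OBJECTS;
    ULONGLONG now = GetTickCount64();
    DWORD left = now >= deadline ? 0 : (DWORD) (deadline - now);
    /* bWaitAll = TRUE: success means every handle in the batch is signaled. */
    DWORD rc = WaitForMultipleObjects(batch, handles + start, TRUE, left);
    if (rc == WAIT_TIMEOUT || rc == WAIT_FAILED) return FALSE;
  }
  return TRUE;
}

/* After a timeout: count the processes that are still running. */
static DWORD count_running(const HANDLE *handles, DWORD n) {
  DWORD running = 0;
  for (DWORD i = 0; i < n; i++) {
    if (WaitForSingleObject(handles[i], 0) == WAIT_TIMEOUT) running++;
  }
  return running;
}
#endif
```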

Here is the Windows implementation of ps_wait() in the ps package.

Can this be an interruptible wait?

To make the C code of an R package interruptible, one needs to call R_CheckUserInterrupt() periodically. For ps_wait() we need to poll(2) (or kevent(2), epoll_wait(2), etc.) for a short period of time in a loop, checking for interrupts in each iteration.
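
A C sketch of such a loop, written against plain poll(2) so that it builds without R. The interrupt check is a hypothetical function pointer that an R package would set to R_CheckUserInterrupt; poll_interruptible() and its slice_ms parameter are illustrative names, not the ps API. It returns as soon as the descriptor is ready and otherwise sleeps at most slice_ms milliseconds between checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hook: an R package would pass R_CheckUserInterrupt here. */
typedef void (*interrupt_check_fn)(void);

static long long now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Waits up to timeout_ms for fd to become ready, sleeping at most slice_ms
   at a time and calling check() (unless NULL) between slices.
   Returns 1 if ready, 0 on timeout, -1 on error. */
static int poll_interruptible(int fd, int timeout_ms, int slice_ms,
                              interrupt_check_fn check) {
  long long deadline = now_ms() + timeout_ms;
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  for (;;) {
    long long left = deadline - now_ms();
    int r;
    if (left < 0) left = 0;
    r = poll(&pfd, 1, left < slice_ms ? (int) left : slice_ms);
    if (r > 0) return 1;
    if (r == -1 && errno != EINTR) return -1;
    if (now_ms() >= deadline) return 0;
    /* R_CheckUserInterrupt() does not return on an interrupt, so anything
       that needs freeing must already be registered for cleanup. */
    if (check != NULL) check();
  }
}
```

A shorter slice makes interrupts more responsive at the cost of more wakeups.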

Cleaning up resources on interrupts may be challenging in this case, because R_CheckUserInterrupt() does not return if an interrupt happened. The cleancall package can help with this, see the ps package for an example.

Is this a hack?

Some of it. The self-pipe hack is of course a hack. inotify(7) on the executable is a hack. The rest are good.