Tailing output
It is easy to take tail -f for granted. It has a
deceptively small responsibility. But a number of details become
apparent when implementing a subset of its functionality.
inotify
Inotify is inherently lossy. There is the possibility of queue
overruns and events being dropped. There is no opting out of the
coalescing of events, and even if there were, the events do not
carry enough information to reconstruct the states a file goes
through. Concretely, suppose a process is watching some file via
inotify to read new bytes as they are written. To this end, it
keeps an offset into the file. When inotify signals that the file
has been modified, the process calls fstat(2) and
checks the file size against its offset to determine if the file
has grown. The difference can be read, and the offset updated.
When truncate(2) comes into the picture, the file
can now be made shorter than it was. It is easy enough to detect
this and bring the offset to the new file size, but filesystem
matters are fraught with races. If a process truncates a file and
then quickly writes to it, it is possible for the second
IN_MODIFY event to be queued faster than the watching
process can read the first one. In this case, the OS will coalesce
the two events, and the point to which a file had been truncated
will be lost. Even if the first event is processed in time, the
write could still happen between reading from the inotify fd and
getting an updated struct stat. There is no
consistent way to trace the steps of the end of the file for the
offset to end up in the appropriate place; not with inotify alone,
at any rate.
The behavior of “out-only” events is not obvious at first.
Consider IN_IGNORED. It is one of the events listed
by inotify(7) as potentially being set in the mask
field returned by read(2). The Linux Programming
Interface mentions that removing a watch causes
IN_IGNORED to be generated, and has the event marked
in the “Out” column only. What this means is that
IN_IGNORED will be delivered regardless of the watch
mask, so an application must expect it. Kerrisk spells this out in
an LWN article:
“In addition to the various events for which an application may request notification, there are certain events for which inotify always generates automatic notifications. The most notable of these is IN_IGNORED, which is generated whenever inotify ceases to monitor an object.”
epoll, eventfd
In Go, blocking calls can be made conveniently asynchronous by executing them in a goroutine that sends their results via a channel. The downside is that a goroutine blocked in a syscall cannot be interrupted or selected out via channels unless the syscall itself becomes ready or returns an error. A read on a pipe can be unblocked by widowing the pipe, but there is no such guarantee for reads on an inotify fd.
epoll(7) helps with this when combined with
eventfd(2). In this arrangement, epoll monitors both
the inotify and the eventfd file descriptors. The process can make
the eventfd ready by writing to it, thus unblocking the epoll wait
and allowing the logic to move forward. It effectively acts as a
syscall-level channel, with epoll as a select.
eventfd is a very handy little interface, simple
enough to warrant only a side note in TLPI.
Process communication
It is tempting to read commands from stdin and use the shell to
redirect it to a FIFO. However, the semantics of, for example,
echo 'command arg1 arg2' > /tmp/the-fifo make for
a more contrived behavior than is worth dealing with. The shell
will open the FIFO, write to it, and close it. If a reader is
blocking on (redirected) stdin, it will see the bytes, then see
EOF. But it will not go back to blocking on the next read unless
there is at least one write descriptor open for the FIFO. Instead,
every subsequent read returns EOF immediately and the loop ends up
spinning on stdin, which is not what one wants. This
is precisely the reason why, when setting up a pipe between two
processes, the reader closes the write end of the inherited pipe:
if it did not, a read(2) would block forever instead of reporting
EOF, because its own write descriptor would still be open.
All this is easily avoided with a UNIX domain socket. The
lifecycle of the connection, and the separation of the
accept(2) and read(2) calls, make the
perfect barrier to implement a blocking command reading loop. The
setup is only slightly more contrived, with three syscalls
(socket, bind and listen),
but the simplicity when reading more than makes up for that.