summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/vfs/g3doc/inotify.md
diff options
context:
space:
mode:
Diffstat (limited to 'pkg/sentry/vfs/g3doc/inotify.md')
-rw-r--r--pkg/sentry/vfs/g3doc/inotify.md210
1 files changed, 0 insertions, 210 deletions
diff --git a/pkg/sentry/vfs/g3doc/inotify.md b/pkg/sentry/vfs/g3doc/inotify.md
deleted file mode 100644
index 833db213f..000000000
--- a/pkg/sentry/vfs/g3doc/inotify.md
+++ /dev/null
@@ -1,210 +0,0 @@
-# Inotify
-
-Inotify is a mechanism for monitoring filesystem events in Linux--see
-inotify(7). An inotify instance can be used to monitor files and directories for
-modifications, creation/deletion, etc. The inotify API consists of system calls
-that create inotify instances (inotify_init/inotify_init1) and add/remove
-watches on files to an instance (inotify_add_watch/inotify_rm_watch). Events are
-generated from various places in the sentry, including the syscall layer, the
-vfs layer, the process fd table, and within each filesystem implementation. This
-document outlines the implementation details of inotify in VFS2.
-
-## Inotify Objects
-
-Inotify data structures are implemented in the vfs package.
-
-### vfs.Inotify
-
-Inotify instances are represented by vfs.Inotify objects, which implement
-vfs.FileDescriptionImpl. As in Linux, inotify fds are backed by a
-pseudo-filesystem (anonfs). Each inotify instance receives events from a set of
-vfs.Watch objects, which can be modified with inotify_add_watch(2) and
-inotify_rm_watch(2). An application can retrieve events by reading the inotify
-fd.
-
-### vfs.Watches
-
-The set of all watches held on a single file (i.e., the watch target) is stored
-in vfs.Watches. Each watch will belong to a different inotify instance (an
-instance can only have one watch on any watch target). The watches are stored in
-a map indexed by their vfs.Inotify owner’s id. Hard links and file descriptions
-to a single file will all share the same vfs.Watches (with the exception of the
-gofer filesystem, described in a later section). Activity on the target causes
-its vfs.Watches to generate notifications on its watches’ inotify instances.
-
-### vfs.Watch
-
-A single watch, owned by one inotify instance and applied to one watch target.
-Both the vfs.Inotify owner and vfs.Watches on the target will hold a vfs.Watch,
-which leads to some complicated locking behavior (see Lock Ordering). Whenever a
-watch is notified of an event on its target, it will queue events to its inotify
-instance for delivery to the user.
-
-### vfs.Event
-
-vfs.Event is a simple struct encapsulating all the fields for an inotify event.
-It is generated by vfs.Watches and forwarded to the watches' owners. It is
-serialized to the user during read(2) syscalls on the associated fs.Inotify's
-fd.
-
-## Lock Ordering
-
-There are three locks related to the inotify implementation:
-
-Inotify.mu: the inotify instance lock. Inotify.evMu: the inotify event queue
-lock. Watches.mu: the watch set lock, used to protect the collection of watches
-on a target.
-
-The correct lock ordering for inotify code is:
-
-Inotify.mu -> Watches.mu -> Inotify.evMu.
-
-Note that we use a distinct lock to protect the inotify event queue. If we
-simply used Inotify.mu, we could simultaneously have locks being acquired in the
-order of Inotify.mu -> Watches.mu and Watches.mu -> Inotify.mu, which would
-cause deadlocks. For instance, adding a watch to an inotify instance would
-require locking Inotify.mu, and then adding the same watch to the target would
-cause Watches.mu to be held. At the same time, generating an event on the target
-would require Watches.mu to be held before iterating through each watch, and
-then notifying the owner of each watch would cause Inotify.mu to be held.
-
-See the vfs package comment to understand how inotify locks fit into the overall
-ordering of filesystem locks.
-
-## Watch Targets in Different Filesystem Implementations
-
-In Linux, watches reside on inodes at the virtual filesystem layer. As a result,
-all hard links and file descriptions on a single file will all share the same
-watch set. In VFS2, there is no common inode structure across filesystem types
-(some may not even have inodes), so we have to plumb inotify support through
-each specific filesystem implementation. Some of the technical considerations
-are outlined below.
-
-### Tmpfs
-
-For filesystems with inodes, like tmpfs, the design is quite similar to that of
-Linux, where watches reside on the inode.
-
-### Pseudo-filesystems
-
-Technically, because inotify is implemented at the vfs layer in Linux,
-pseudo-filesystems on top of kernfs support inotify passively. However, watches
-can only track explicit filesystem operations like read/write, open/close,
-mknod, etc., so watches on a target like /proc/self/fd will not generate events
-every time a new fd is added or removed. As of this writing, we leave inotify
-unimplemented in kernfs and anonfs; it does not seem particularly useful.
-
-### Gofer Filesystem (fsimpl/gofer)
-
-The gofer filesystem has several traits that make it difficult to support
-inotify:
-
-* **There are no inodes.** A file is represented as a dentry that holds an
- unopened p9 file (and possibly an open FID), through which the Sentry
- interacts with the gofer.
- * *Solution:* Because there is no inode structure stored in the sandbox,
- inotify watches must be held on the dentry. For the purposes of inotify,
- we assume that every dentry corresponds to a unique inode, which may
- cause unexpected behavior in the presence of hard links, where multiple
- dentries should share the same set of watches. Indeed, it is impossible
- for us to be absolutely sure whether dentries correspond to the same
- file or not, due to the following point:
-* **The Sentry cannot always be aware of hard links on the remote
- filesystem.** There is no way for us to confirm whether two files on the
- remote filesystem are actually links to the same inode. QIDs and inodes are
- not always 1:1. The assumption that dentries and inodes are 1:1 is
- inevitably broken if there are remote hard links that we cannot detect.
- * *Solution:* this is an issue with gofer fs in general, not only inotify,
- and we will have to live with it.
-* **Dentries can be cached, and then evicted.** Dentry lifetime does not
- correspond to file lifetime. Because gofer fs is not entirely in-memory, the
- absence of a dentry does not mean that the corresponding file does not
- exist, nor does a dentry reaching zero references mean that the
- corresponding file no longer exists. When a dentry reaches zero references,
- it will be cached, in case the file at that path is needed again in the
- future. However, the dentry may be evicted from the cache, which will cause
- a new dentry to be created next time the same file path is used. The
- existing watches will be lost.
- * *Solution:* When a dentry reaches zero references, do not cache it if it
- has any watches, so we can avoid eviction/destruction. Note that if the
- dentry was deleted or invalidated (d.vfsd.IsDead()), we should still
- destroy it along with its watches. Additionally, when a dentry’s last
- watch is removed, we cache it if it also has zero references. This way,
- the dentry can eventually be evicted from memory if it is no longer
- needed.
-* **Dentries can be invalidated.** Another issue with dentry lifetime is that
- the remote file at the file path represented may change from underneath the
- dentry. In this case, the next time that the dentry is used, it will be
- invalidated and a new dentry will replace it. In this case, it is not clear
- what should be done with the watches on the old dentry.
- * *Solution:* Silently destroy the watches when invalidation occurs. We
- have no way of knowing exactly what happened, when it happens. Inotify
- instances on NFS files in Linux probably behave in a similar fashion,
- since inotify is implemented at the vfs layer and is not aware of the
- complexities of remote file systems.
- * An alternative would be to issue some kind of event upon invalidation,
- e.g. a delete event, but this has several issues:
- * We cannot discern whether the remote file was invalidated because it was
- moved, deleted, etc. This information is crucial, because these cases
- should result in different events. Furthermore, the watches should only
- be destroyed if the file has been deleted.
- * Moreover, the mechanism for detecting whether the underlying file has
- changed is to check whether a new QID is given by the gofer. This may
- result in false positives, e.g. suppose that the server closed and
- re-opened the same file, which may result in a new QID.
- * Finally, the time of the event may be completely different from the time
- of the file modification, since a dentry is not immediately notified
- when the underlying file has changed. It would be quite unexpected to
- receive the notification when invalidation was triggered, i.e. the next
- time the file was accessed within the sandbox, because then the
- read/write/etc. operation on the file would not result in the expected
- event.
- * Another point in favor of the first solution: inotify in Linux can
- already be lossy on local filesystems (one of the sacrifices made so
- that filesystem performance isn’t killed), and it is lossy on NFS for
- similar reasons to gofer fs. Therefore, it is better for inotify to be
- silent than to emit incorrect notifications.
-* **There may be external users of the remote filesystem.** We can only track
- operations performed on the file within the sandbox. This is sufficient
- under InteropModeExclusive, but whenever there are external users, the set
- of actions we are aware of is incomplete.
- * *Solution:* We could either return an error or just issue a warning when
- inotify is used without InteropModeExclusive. Although faulty, VFS1
- allows it when the filesystem is shared, and Linux does the same for
- remote filesystems (as mentioned above, inotify sits at the vfs level).
-
-## Dentry Interface
-
-For events that must be generated above the vfs layer, we provide the following
-DentryImpl methods to allow interactions with targets on any FilesystemImpl:
-
-* **InotifyWithParent()** generates events on the dentry’s watches as well as
- its parent’s.
-* **Watches()** retrieves the watch set of the target represented by the
- dentry. This is used to access and modify watches on a target.
-* **OnZeroWatches()** performs cleanup tasks after the last watch is removed
- from a dentry. This is needed by gofer fs, which must allow a watched dentry
- to be cached once it has no more watches. Most implementations can just do
- nothing. Note that OnZeroWatches() must be called after all inotify locks
- are released to preserve lock ordering, since it may acquire
- FilesystemImpl-specific locks.
-
-## IN_EXCL_UNLINK
-
-There are several options that can be set for a watch, specified as part of the
-mask in inotify_add_watch(2). In particular, IN_EXCL_UNLINK requires some
-additional support in each filesystem.
-
-A watch with IN_EXCL_UNLINK will not generate events for its target if it
-corresponds to a path that was unlinked. For instance, if an fd is opened on
-“foo/bar” and “foo/bar” is subsequently unlinked, any reads/writes/etc. on the
-fd will be ignored by watches on “foo” or “foo/bar” with IN_EXCL_UNLINK. This
-requires each DentryImpl to keep track of whether it has been unlinked, in order
-to determine whether events should be sent to watches with IN_EXCL_UNLINK.
-
-## IN_ONESHOT
-
-One-shot watches expire after generating a single event. When an event occurs,
-all one-shot watches on the target that successfully generated an event are
-removed. Lock ordering can cause the management of one-shot watches to be quite
-expensive; see Watches.Notify() for more information.