diff options
Diffstat (limited to 'pkg/sentry/vfs/g3doc/inotify.md')
-rw-r--r-- | pkg/sentry/vfs/g3doc/inotify.md | 210 |
1 files changed, 0 insertions, 210 deletions
diff --git a/pkg/sentry/vfs/g3doc/inotify.md b/pkg/sentry/vfs/g3doc/inotify.md deleted file mode 100644 index 833db213f..000000000 --- a/pkg/sentry/vfs/g3doc/inotify.md +++ /dev/null @@ -1,210 +0,0 @@ -# Inotify - -Inotify is a mechanism for monitoring filesystem events in Linux--see -inotify(7). An inotify instance can be used to monitor files and directories for -modifications, creation/deletion, etc. The inotify API consists of system calls -that create inotify instances (inotify_init/inotify_init1) and add/remove -watches on files to an instance (inotify_add_watch/inotify_rm_watch). Events are -generated from various places in the sentry, including the syscall layer, the -vfs layer, the process fd table, and within each filesystem implementation. This -document outlines the implementation details of inotify in VFS2. - -## Inotify Objects - -Inotify data structures are implemented in the vfs package. - -### vfs.Inotify - -Inotify instances are represented by vfs.Inotify objects, which implement -vfs.FileDescriptionImpl. As in Linux, inotify fds are backed by a -pseudo-filesystem (anonfs). Each inotify instance receives events from a set of -vfs.Watch objects, which can be modified with inotify_add_watch(2) and -inotify_rm_watch(2). An application can retrieve events by reading the inotify -fd. - -### vfs.Watches - -The set of all watches held on a single file (i.e., the watch target) is stored -in vfs.Watches. Each watch will belong to a different inotify instance (an -instance can only have one watch on any watch target). The watches are stored in -a map indexed by their vfs.Inotify owner’s id. Hard links and file descriptions -to a single file will all share the same vfs.Watches (with the exception of the -gofer filesystem, described in a later section). Activity on the target causes -its vfs.Watches to generate notifications on its watches’ inotify instances. - -### vfs.Watch - -A single watch, owned by one inotify instance and applied to one watch target. -Both the vfs.Inotify owner and vfs.Watches on the target will hold a vfs.Watch, -which leads to some complicated locking behavior (see Lock Ordering). Whenever a -watch is notified of an event on its target, it will queue events to its inotify -instance for delivery to the user. - -### vfs.Event - -vfs.Event is a simple struct encapsulating all the fields for an inotify event. -It is generated by vfs.Watches and forwarded to the watches' owners. It is -serialized to the user during read(2) syscalls on the associated fs.Inotify's -fd. - -## Lock Ordering - -There are three locks related to the inotify implementation: - -Inotify.mu: the inotify instance lock. Inotify.evMu: the inotify event queue -lock. Watches.mu: the watch set lock, used to protect the collection of watches -on a target. - -The correct lock ordering for inotify code is: - -Inotify.mu -> Watches.mu -> Inotify.evMu. - -Note that we use a distinct lock to protect the inotify event queue. If we -simply used Inotify.mu, we could simultaneously have locks being acquired in the -order of Inotify.mu -> Watches.mu and Watches.mu -> Inotify.mu, which would -cause deadlocks. For instance, adding a watch to an inotify instance would -require locking Inotify.mu, and then adding the same watch to the target would -cause Watches.mu to be held. At the same time, generating an event on the target -would require Watches.mu to be held before iterating through each watch, and -then notifying the owner of each watch would cause Inotify.mu to be held. - -See the vfs package comment to understand how inotify locks fit into the overall -ordering of filesystem locks. - -## Watch Targets in Different Filesystem Implementations - -In Linux, watches reside on inodes at the virtual filesystem layer. As a result, -all hard links and file descriptions on a single file will all share the same -watch set. In VFS2, there is no common inode structure across filesystem types -(some may not even have inodes), so we have to plumb inotify support through -each specific filesystem implementation. Some of the technical considerations -are outlined below. - -### Tmpfs - -For filesystems with inodes, like tmpfs, the design is quite similar to that of -Linux, where watches reside on the inode. - -### Pseudo-filesystems - -Technically, because inotify is implemented at the vfs layer in Linux, -pseudo-filesystems on top of kernfs support inotify passively. However, watches -can only track explicit filesystem operations like read/write, open/close, -mknod, etc., so watches on a target like /proc/self/fd will not generate events -every time a new fd is added or removed. As of this writing, we leave inotify -unimplemented in kernfs and anonfs; it does not seem particularly useful. - -### Gofer Filesystem (fsimpl/gofer) - -The gofer filesystem has several traits that make it difficult to support -inotify: - -* **There are no inodes.** A file is represented as a dentry that holds an - unopened p9 file (and possibly an open FID), through which the Sentry - interacts with the gofer. - * *Solution:* Because there is no inode structure stored in the sandbox, - inotify watches must be held on the dentry. For the purposes of inotify, - we assume that every dentry corresponds to a unique inode, which may - cause unexpected behavior in the presence of hard links, where multiple - dentries should share the same set of watches. Indeed, it is impossible - for us to be absolutely sure whether dentries correspond to the same - file or not, due to the following point: -* **The Sentry cannot always be aware of hard links on the remote - filesystem.** There is no way for us to confirm whether two files on the - remote filesystem are actually links to the same inode. QIDs and inodes are - not always 1:1. The assumption that dentries and inodes are 1:1 is - inevitably broken if there are remote hard links that we cannot detect. - * *Solution:* this is an issue with gofer fs in general, not only inotify, - and we will have to live with it. -* **Dentries can be cached, and then evicted.** Dentry lifetime does not - correspond to file lifetime. Because gofer fs is not entirely in-memory, the - absence of a dentry does not mean that the corresponding file does not - exist, nor does a dentry reaching zero references mean that the - corresponding file no longer exists. When a dentry reaches zero references, - it will be cached, in case the file at that path is needed again in the - future. However, the dentry may be evicted from the cache, which will cause - a new dentry to be created next time the same file path is used. The - existing watches will be lost. - * *Solution:* When a dentry reaches zero references, do not cache it if it - has any watches, so we can avoid eviction/destruction. Note that if the - dentry was deleted or invalidated (d.vfsd.IsDead()), we should still - destroy it along with its watches. Additionally, when a dentry’s last - watch is removed, we cache it if it also has zero references. This way, - the dentry can eventually be evicted from memory if it is no longer - needed. -* **Dentries can be invalidated.** Another issue with dentry lifetime is that - the remote file at the file path represented may change from underneath the - dentry. In this case, the next time that the dentry is used, it will be - invalidated and a new dentry will replace it. In this case, it is not clear - what should be done with the watches on the old dentry. - * *Solution:* Silently destroy the watches when invalidation occurs. We - have no way of knowing exactly what happened, when it happens. Inotify - instances on NFS files in Linux probably behave in a similar fashion, - since inotify is implemented at the vfs layer and is not aware of the - complexities of remote file systems. - * An alternative would be to issue some kind of event upon invalidation, - e.g. a delete event, but this has several issues: - * We cannot discern whether the remote file was invalidated because it was - moved, deleted, etc. This information is crucial, because these cases - should result in different events. Furthermore, the watches should only - be destroyed if the file has been deleted. - * Moreover, the mechanism for detecting whether the underlying file has - changed is to check whether a new QID is given by the gofer. This may - result in false positives, e.g. suppose that the server closed and - re-opened the same file, which may result in a new QID. - * Finally, the time of the event may be completely different from the time - of the file modification, since a dentry is not immediately notified - when the underlying file has changed. It would be quite unexpected to - receive the notification when invalidation was triggered, i.e. the next - time the file was accessed within the sandbox, because then the - read/write/etc. operation on the file would not result in the expected - event. - * Another point in favor of the first solution: inotify in Linux can - already be lossy on local filesystems (one of the sacrifices made so - that filesystem performance isn’t killed), and it is lossy on NFS for - similar reasons to gofer fs. Therefore, it is better for inotify to be - silent than to emit incorrect notifications. -* **There may be external users of the remote filesystem.** We can only track - operations performed on the file within the sandbox. This is sufficient - under InteropModeExclusive, but whenever there are external users, the set - of actions we are aware of is incomplete. - * *Solution:* We could either return an error or just issue a warning when - inotify is used without InteropModeExclusive. Although faulty, VFS1 - allows it when the filesystem is shared, and Linux does the same for - remote filesystems (as mentioned above, inotify sits at the vfs level). - -## Dentry Interface - -For events that must be generated above the vfs layer, we provide the following -DentryImpl methods to allow interactions with targets on any FilesystemImpl: - -* **InotifyWithParent()** generates events on the dentry’s watches as well as - its parent’s. -* **Watches()** retrieves the watch set of the target represented by the - dentry. This is used to access and modify watches on a target. -* **OnZeroWatches()** performs cleanup tasks after the last watch is removed - from a dentry. This is needed by gofer fs, which must allow a watched dentry - to be cached once it has no more watches. Most implementations can just do - nothing. Note that OnZeroWatches() must be called after all inotify locks - are released to preserve lock ordering, since it may acquire - FilesystemImpl-specific locks. - -## IN_EXCL_UNLINK - -There are several options that can be set for a watch, specified as part of the -mask in inotify_add_watch(2). In particular, IN_EXCL_UNLINK requires some -additional support in each filesystem. - -A watch with IN_EXCL_UNLINK will not generate events for its target if it -corresponds to a path that was unlinked. For instance, if an fd is opened on -“foo/bar” and “foo/bar” is subsequently unlinked, any reads/writes/etc. on the -fd will be ignored by watches on “foo” or “foo/bar” with IN_EXCL_UNLINK. This -requires each DentryImpl to keep track of whether it has been unlinked, in order -to determine whether events should be sent to watches with IN_EXCL_UNLINK. - -## IN_ONESHOT - -One-shot watches expire after generating a single event. When an event occurs, -all one-shot watches on the target that successfully generated an event are -removed. Lock ordering can cause the management of one-shot watches to be quite -expensive; see Watches.Notify() for more information. |