Age | Commit message (Collapse) | Author |
|
|
|
Major differences from existing overlay filesystems:
- Linux allows lower layers in an overlay to require revalidation, but not the
upper layer. VFS1 allows the upper layer in an overlay to require
revalidation, but not the lower layer. VFS2 does not allow any layers to
require revalidation. (Now that vfs.MkdirOptions.ForSyntheticMountpoint
exists, no uses of overlay in VFS1 are believed to require upper layer
revalidation; in particular, the requirement that the upper layer support the
creation of "trusted." extended attributes for whiteouts effectively required
the upper filesystem to be tmpfs in most cases.)
- Like VFS1, but unlike Linux, VFS2 overlay does not attempt to make mutations
of the upper layer atomic using a working directory and features like
RENAME_WHITEOUT. (This may change in the future, since not having a working
directory makes error recovery for some operations, e.g. rmdir, particularly
painful.)
- Like Linux, but unlike VFS1, VFS2 represents whiteouts using character
devices with rdev == 0; the equivalent of the whiteout attribute on
directories is xattr trusted.overlay.opaque = "y"; and there is no equivalent
to the whiteout attribute on non-directories since non-directories are never
merged with lower layers.
- Device and inode numbers work as follows:
- In Linux, modulo the xino feature and a special case for when all layers
are the same filesystem:
- Directories use the overlay filesystem's device number and an
ephemeral inode number assigned by the overlay.
- Non-directories that have been copied up use the device and inode
number assigned by the upper filesystem.
- Non-directories that have not been copied up use a per-(overlay,
layer)-pair device number and the inode number assigned by the lower
filesystem.
- In VFS1, device and inode numbers always come from the lower layer unless
"whited out"; this has the adverse effect of requiring interaction with
the lower filesystem even for non-directory files that exist on the upper
layer.
- In VFS2, device and inode numbers are assigned as in Linux, except that
xino and the samefs special case are not supported.
- Like Linux, but unlike VFS1, VFS2 does not attempt to maintain memory mapping
coherence across copy-up. (This may have to change in the future, as users
may be dependent on this property.)
- Like Linux, but unlike VFS1, VFS2 uses the overlayfs mounter's credentials
when interacting with the overlay's layers, rather than the caller's.
- Like Linux, but unlike VFS1, VFS2 permits multiple lower layers in an
overlay.
- Like Linux, but unlike VFS1, VFS2's overlay filesystem is
application-mountable.
Updates #1199
PiperOrigin-RevId: 316019067
|
|
During inititalization inode struct was copied around, but
it isn't great pratice to copy it around since it contains
ref count and sync.Mutex.
Updates #1480
PiperOrigin-RevId: 315983788
|
|
|
|
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.
Updates #1480
PiperOrigin-RevId: 315604825
|
|
|
|
As in VFS1, the mode, uid, and gid options are supported.
Updates #1197
PiperOrigin-RevId: 315340510
|
|
|
|
This is mostly syscall plumbing, VFS2 already implements the internals of
mounts. In addition to the syscall defintions, the following mount-related
mechanisms are updated:
- Implement MS_NOATIME for VFS2, but only for tmpfs and goferfs. The other VFS2
filesystems don't implement node-level timestamps yet.
- Implement the 'mode', 'uid' and 'gid' mount options for VFS2's tmpfs.
- Plumb mount namespace ownership, which is necessary for checking appropriate
capabilities during mount(2).
Updates #1035
PiperOrigin-RevId: 315035352
|
|
|
|
gofer.filesystem.createAndOpenChildLocked() doesn't need to take a reference on
the new dentry since vfs.FileDescription.Init() will do so.
PiperOrigin-RevId: 314242127
|
|
|
|
Using tee instead of read to detect when a O_RDONLY|O_NONBLOCK pipe FD has a
writer circumvents the problem of what to do with the byte read from the pipe,
avoiding much of the complexity of the fdpipe package.
PiperOrigin-RevId: 314216146
|
|
|
|
Limited to tmpfs. Inotify support in other filesystem implementations to
follow.
Updates #1479
PiperOrigin-RevId: 313828648
|
|
|
|
PiperOrigin-RevId: 313817646
|
|
|
|
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we
need to plumb inotify down to each filesystem implementation in order to keep
track of links/inode structures properly.
IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks
that are not present in either vfs1 or vfs2. Those will be addressed in
subsequent changes.
Updates #1479.
PiperOrigin-RevId: 313781995
|
|
|
|
Inotify sends events when a watch target is reaches a link count of 0 (see
include/linux/fsnotify.h:fsnotify_inoderemove). Currently, we do not account
for both dir/ and dir/.. in unlink, causing
syscalls/linux/inotify.cc:WatchTargetDeletionGeneratesEvent to fail because
the expected inotify events are not generated.
Furthermore, we should DecRef() once the inode reaches zero links; otherwise,
we will leak a reference.
PiperOrigin-RevId: 313502091
|
|
|
|
PiperOrigin-RevId: 313332542
|
|
Updates #138
PiperOrigin-RevId: 313326354
|
|
PiperOrigin-RevId: 312559861
|
|
In VFS1, both fs/host and fs/gofer used the same utils for host file mappings.
Refactor parts of fsimpl/gofer to create similar utils to share with
fsimpl/host (memory accounting code moved to fsutil, page rounding arithmetic
moved to usermem).
Updates #1476.
PiperOrigin-RevId: 312345090
|
|
|
|
As new functionality is added to VFS2, corresponding files in VFS1
don't need to be changed.
PiperOrigin-RevId: 312153799
|
|
|
|
PiperOrigin-RevId: 311657502
|
|
|
|
Closes #2612.
PiperOrigin-RevId: 311548074
|
|
|
|
Closes #1197
PiperOrigin-RevId: 311438223
|
|
|
|
Linux 4.18 and later make reads and writes coherent between pre-copy-up and
post-copy-up FDs representing the same file on an overlay filesystem. However,
memory mappings remain incoherent:
- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file
residing on a lower layer is opened for read-only and then memory mapped with
MAP_SHARED, then subsequent changes to the file are not reflected in the
memory mapping."
- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any
management of coherence in the overlay.
- Experimentally on Linux 5.2:
```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2) {
errx(1, "syntax: %s [FILE]", argv[0]);
}
const int fd = open(argv[1], O_RDONLY);
if (fd < 0) {
err(1, "open(%s)", argv[1]);
}
const size_t page_size = sysconf(_SC_PAGE_SIZE);
void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (page == MAP_FAILED) {
err(1, "mmap");
}
for (;;) {
write(1, page, strnlen(page, page_size));
if (getc(stdin) == EOF) {
break;
}
}
return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```
Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only
necessary pre-4.18, replacing existing memory mappings (in both sentry and
application address spaces) with mappings of the new FD is required regardless
of kernel version, and this latter behavior is common to both VFS1 and VFS2.
Re-document accordingly, and change the runsc flag to enabled by default.
New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
PiperOrigin-RevId: 311361267
|
|
|
|
PiperOrigin-RevId: 311046755
|
|
|
|
PiperOrigin-RevId: 311014995
|
|
|
|
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]")
effective, and it prevents imported pipes/sockets/character devices from being
opened with O_NONBLOCK unconditionally (because the underlying host FD was set
to non-blocking in ImportFD()).
PiperOrigin-RevId: 310596062
|
|
|
|
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
|
|
|
|
|
|
They don't depend on anything in VFS2, so they should be their own packages.
PiperOrigin-RevId: 310416807
|
|
|
|
PiperOrigin-RevId: 310404113
|
|
|