summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/fsimpl
AgeCommit message (Collapse)Author
2020-06-12Merge release-20200522.0-145-g77c206e37 (automated)gVisor bot
2020-06-11Add //pkg/sentry/fsimpl/overlay.Jamie Liu
Major differences from existing overlay filesystems: - Linux allows lower layers in an overlay to require revalidation, but not the upper layer. VFS1 allows the upper layer in an overlay to require revalidation, but not the lower layer. VFS2 does not allow any layers to require revalidation. (Now that vfs.MkdirOptions.ForSyntheticMountpoint exists, no uses of overlay in VFS1 are believed to require upper layer revalidation; in particular, the requirement that the upper layer support the creation of "trusted." extended attributes for whiteouts effectively required the upper filesystem to be tmpfs in most cases.) - Like VFS1, but unlike Linux, VFS2 overlay does not attempt to make mutations of the upper layer atomic using a working directory and features like RENAME_WHITEOUT. (This may change in the future, since not having a working directory makes error recovery for some operations, e.g. rmdir, particularly painful.) - Like Linux, but unlike VFS1, VFS2 represents whiteouts using character devices with rdev == 0; the equivalent of the whiteout attribute on directories is xattr trusted.overlay.opaque = "y"; and there is no equivalent to the whiteout attribute on non-directories since non-directories are never merged with lower layers. - Device and inode numbers work as follows: - In Linux, modulo the xino feature and a special case for when all layers are the same filesystem: - Directories use the overlay filesystem's device number and an ephemeral inode number assigned by the overlay. - Non-directories that have been copied up use the device and inode number assigned by the upper filesystem. - Non-directories that have not been copied up use a per-(overlay, layer)-pair device number and the inode number assigned by the lower filesystem. - In VFS1, device and inode numbers always come from the lower layer unless "whited out"; this has the adverse effect of requiring interaction with the lower filesystem even for non-directory files that exist on the upper layer. - In VFS2, device and inode numbers are assigned as in Linux, except that xino and the samefs special case are not supported. - Like Linux, but unlike VFS1, VFS2 does not attempt to maintain memory mapping coherence across copy-up. (This may have to change in the future, as users may be dependent on this property.) - Like Linux, but unlike VFS1, VFS2 uses the overlayfs mounter's credentials when interacting with the overlay's layers, rather than the caller's. - Like Linux, but unlike VFS1, VFS2 permits multiple lower layers in an overlay. - Like Linux, but unlike VFS1, VFS2's overlay filesystem is application-mountable. Updates #1199 PiperOrigin-RevId: 316019067
2020-06-11Don't copy structs with sync.Mutex during initializationFabricio Voznika
During inititalization inode struct was copied around, but it isn't great pratice to copy it around since it contains ref count and sync.Mutex. Updates #1480 PiperOrigin-RevId: 315983788
2020-06-10Merge release-20200522.0-112-g67565078b (automated)gVisor bot
2020-06-09Implement flock(2) in VFS2Fabricio Voznika
LockFD is the generic implementation that can be embedded in FileDescriptionImpl implementations. Unique lock ID is maintained in vfs.FileDescription and is created on demand. Updates #1480 PiperOrigin-RevId: 315604825
2020-06-08Merge release-20200522.0-95-gdc029b4b (automated)gVisor bot
2020-06-08Implement VFS2 tmpfs mount options.Jamie Liu
As in VFS1, the mode, uid, and gid options are supported. Updates #1197 PiperOrigin-RevId: 315340510
2020-06-06Merge release-20200522.0-89-g21b6bc72 (automated)gVisor bot
2020-06-05Implement mount(2) and umount2(2) for VFS2.Rahat Mahmood
This is mostly syscall plumbing, VFS2 already implements the internals of mounts. In addition to the syscall defintions, the following mount-related mechanisms are updated: - Implement MS_NOATIME for VFS2, but only for tmpfs and goferfs. The other VFS2 filesystems don't implement node-level timestamps yet. - Implement the 'mode', 'uid' and 'gid' mount options for VFS2's tmpfs. - Plumb mount namespace ownership, which is necessary for checking appropriate capabilities during mount(2). Updates #1035 PiperOrigin-RevId: 315035352
2020-06-02Merge release-20200522.0-56-g49a9b78f (automated)gVisor bot
2020-06-01Fix VFS2 gofer open(O_CREAT) reference leak.Jamie Liu
gofer.filesystem.createAndOpenChildLocked() doesn't need to take a reference on the new dentry since vfs.FileDescription.Init() will do so. PiperOrigin-RevId: 314242127
2020-06-01Merge release-20200522.0-55-g3a987160 (automated)gVisor bot
2020-06-01Handle gofer blocking opens of host named pipes in VFS2.Jamie Liu
Using tee instead of read to detect when a O_RDONLY|O_NONBLOCK pipe FD has a writer circumvents the problem of what to do with the byte read from the pipe, avoiding much of the complexity of the fdpipe package. PiperOrigin-RevId: 314216146
2020-05-29Merge release-20200522.0-33-gccf69bdd (automated)gVisor bot
2020-05-29Implement IN_EXCL_UNLINK inotify option in vfs2.Dean Deng
Limited to tmpfs. Inotify support in other filesystem implementations to follow. Updates #1479 PiperOrigin-RevId: 313828648
2020-05-29Merge release-20200522.0-31-g9ada8c97 (automated)gVisor bot
2020-05-29Fix the smallest of typos.Dean Deng
PiperOrigin-RevId: 313817646
2020-05-29Merge release-20200522.0-27-gfe464f44 (automated)gVisor bot
2020-05-29Port inotify to vfs2, with support in tmpfs.Dean Deng
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we need to plumb inotify down to each filesystem implementation in order to keep track of links/inode structures properly. IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks that are not present in either vfs1 or vfs2. Those will be addressed in subsequent changes. Updates #1479. PiperOrigin-RevId: 313781995
2020-05-28Merge release-20200518.0-48-g32021bce (automated)gVisor bot
2020-05-27Correctly update link and ref counts in rmdir.Dean Deng
Inotify sends events when a watch target is reaches a link count of 0 (see include/linux/fsnotify.h:fsnotify_inoderemove). Currently, we do not account for both dir/ and dir/.. in unlink, causing syscalls/linux/inotify.cc:WatchTargetDeletionGeneratesEvent to fail because the expected inotify events are not generated. Furthermore, we should DecRef() once the inode reaches zero links; otherwise, we will leak a reference. PiperOrigin-RevId: 313502091
2020-05-27Merge release-20200518.0-45-g0bc022b7 (automated)gVisor bot
2020-05-26Support dfltuid and dfltgid mount options in the VFS2 gofer client.Jamie Liu
PiperOrigin-RevId: 313332542
2020-05-26Implement splice(2) and tee(2) for VFS2.Jamie Liu
Updates #138 PiperOrigin-RevId: 313326354
2020-05-20Move fsimpl/host file offset from inode to fileDescription.Dean Deng
PiperOrigin-RevId: 312559861
2020-05-19Implement mmap for host fs in vfs2.Dean Deng
In VFS1, both fs/host and fs/gofer used the same utils for host file mappings. Refactor parts of fsimpl/gofer to create similar utils to share with fsimpl/host (memory accounting code moved to fsutil, page rounding arithmetic moved to usermem). Updates #1476. PiperOrigin-RevId: 312345090
2020-05-18Merge release-20200511.0-255-g20e6efd (automated)gVisor bot
2020-05-18Remove IfChange/ThenChange lint from VFS2Fabricio Voznika
As new functionality is added to VFS2, corresponding files in VFS1 don't need to be changed. PiperOrigin-RevId: 312153799
2020-05-15Merge release-20200422.0-308-gfb7e5f1 (automated)gVisor bot
2020-05-14Make utimes_test pass on VFS2.Jamie Liu
PiperOrigin-RevId: 311657502
2020-05-14Merge release-20200422.0-303-g47dfba7 (automated)gVisor bot
2020-05-14Port memfd_create to vfs2 and finish implementation of file seals.Nicolas Lacasse
Closes #2612. PiperOrigin-RevId: 311548074
2020-05-14Merge release-20200422.0-299-gdb655f0 (automated)gVisor bot
2020-05-13Resolve remaining TODOs for tmpfs.Nicolas Lacasse
Closes #1197 PiperOrigin-RevId: 311438223
2020-05-13Merge release-20200422.0-297-gd846077 (automated)gVisor bot
2020-05-13Enable overlayfs_stale_read by default for runsc.Jamie Liu
Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent: - Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping." - fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay. - Experimentally on Linux 5.2: ``` $ cat mmap_cat_page.c #include <err.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <unistd.h> int main(int argc, char **argv) { if (argc < 2) { errx(1, "syntax: %s [FILE]", argv[0]); } const int fd = open(argv[1], O_RDONLY); if (fd < 0) { err(1, "open(%s)", argv[1]); } const size_t page_size = sysconf(_SC_PAGE_SIZE); void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0); if (page == MAP_FAILED) { err(1, "mmap"); } for (;;) { write(1, page, strnlen(page, page_size)); if (getc(stdin) == EOF) { break; } } return 0; } $ gcc -O2 -o mmap_cat_page mmap_cat_page.c $ mkdir lowerdir upperdir workdir overlaydir $ echo old > lowerdir/file $ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir $ ./mmap_cat_page overlaydir/file old ^Z [1]+ Stopped ./mmap_cat_page overlaydir/file $ echo new > overlaydir/file $ cat overlaydir/file new $ fg ./mmap_cat_page overlaydir/file old ``` Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default. New test: - Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b - After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab PiperOrigin-RevId: 311361267
2020-05-12Merge release-20200422.0-69-g94251ae (automated)gVisor bot
2020-05-11Internal change.Jamie Liu
PiperOrigin-RevId: 311046755
2020-05-11Merge release-20200422.0-68-g15de8cc (automated)gVisor bot
2020-05-11Add fsimpl/gofer.InternalFilesystemOptions.OpenSocketsByConnecting.Jamie Liu
PiperOrigin-RevId: 311014995
2020-05-08Merge release-20200422.0-59-g21b7139 (automated)gVisor bot
2020-05-08Pass flags to fsimpl/host.inode.open().Jamie Liu
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]") effective, and it prevents imported pipes/sockets/character devices from being opened with O_NONBLOCK unconditionally (because the underlying host FD was set to non-blocking in ImportFD()). PiperOrigin-RevId: 310596062
2020-05-07Merge release-20200422.0-52-g9115f26 (automated)gVisor bot
2020-05-07Allocate device numbers for VFS2 filesystems.Jamie Liu
Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006
2020-05-07Merge release-20200422.0-51-g1f4087e (automated)gVisor bot
2020-05-07Merge release-20200422.0-49-gd0b1d02 (automated)gVisor bot
2020-05-07Move pkg/sentry/vfs/{eventfd,timerfd} to new packages in pkg/sentry/fsimpl.Nicolas Lacasse
They don't depend on anything in VFS2, so they should be their own packages. PiperOrigin-RevId: 310416807
2020-05-07Merge release-20200422.0-47-g26c60d7 (automated)gVisor bot
2020-05-07Port signalfd to vfs2.Nicolas Lacasse
PiperOrigin-RevId: 310404113
2020-05-06Merge release-20200422.0-34-g591ff0e (automated)gVisor bot