gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-05-13	Enable overlayfs_stale_read by default for runsc.	Jamie Liu
	Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:
	- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."
	- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.
	- Experimentally on Linux 5.2:
	```
	$ cat mmap_cat_page.c
	#include <err.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(int argc, char **argv) {
	  if (argc < 2) {
	    errx(1, "syntax: %s [FILE]", argv[0]);
	  }
	  const int fd = open(argv[1], O_RDONLY);
	  if (fd < 0) {
	    err(1, "open(%s)", argv[1]);
	  }
	  const size_t page_size = sysconf(_SC_PAGE_SIZE);
	  void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
	  if (page == MAP_FAILED) {
	    err(1, "mmap");
	  }
	  for (;;) {
	    write(1, page, strnlen(page, page_size));
	    if (getc(stdin) == EOF) {
	      break;
	    }
	  }
	  return 0;
	}

	$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
	$ mkdir lowerdir upperdir workdir overlaydir
	$ echo old > lowerdir/file
	$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
	$ ./mmap_cat_page overlaydir/file
	old
	^Z
	[1]+ Stopped ./mmap_cat_page overlaydir/file
	$ echo new > overlaydir/file
	$ cat overlaydir/file
	new
	$ fg
	./mmap_cat_page overlaydir/file
	old
	```
	Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.
	New test:
	- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
	- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
	PiperOrigin-RevId: 311361267
2020-05-12	Don't allow rename across different gofer or tmpfs mounts.	Nicolas Lacasse
	Fixes #2651. PiperOrigin-RevId: 311193661
2020-05-07	Update privateunixsocket TODOs.	Dean Deng
	Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474
2020-05-05	Translate p9.NoUID/GID to OverflowUID/GID.	Jamie Liu
	p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in particular, using it for file ownership causes capabilities to be ineffective since file capabilities require that the file's KUID and KGID are mapped into the capability holder's user namespace [1], and auth.NoID is not mapped into any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the unlikely case that an application actually using the overflow KUID/KGID attempts an operation that is consequently permitted by client permission checks, the remote operation will still fail with EPERM. Since this changes the VFS2 gofer client to no longer ignore the invalid IDs entirely, this CL both permits and requires that we change synthetic mount point creation to use root credentials. [1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions. PiperOrigin-RevId: 309856455
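The translation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the gVisor code: `translateID`, `noID`, and `overflowID` are hypothetical names, and the value 65534 is assumed here because it is the conventional Linux overflow ID.

```go
package main

import "fmt"

// Illustrative constants. noID mirrors the all-ones sentinel described in the
// commit message (uint32(-1)). overflowID is assumed to be 65534, the
// conventional Linux overflow UID/GID.
const (
	noID       uint32 = ^uint32(0)
	overflowID uint32 = 65534
)

// translateID maps the "no ID" sentinel to the overflow ID and passes every
// other ID through unchanged. Hypothetical helper, not the gVisor function.
func translateID(id uint32) uint32 {
	if id == noID {
		return overflowID
	}
	return id
}

func main() {
	fmt.Println(translateID(noID)) // 65534
	fmt.Println(translateID(1000)) // 1000
}
```

The point of the mapping is that the result is always an ID that can be mapped into some user namespace, which the sentinel never can.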
2020-04-28	Support pipes and sockets in VFS2 gofer fs.	Dean Deng
	Named pipes and sockets can be represented in two ways in gofer fs:
	1. As a file on the remote filesystem. In this case, all file operations are passed through 9p.
	2. As a synthetic file that is internal to the sandbox. In this case, the dentry stores an endpoint or VFSPipe for sockets and pipes respectively, which replaces interactions with the remote fs through the gofer.
	In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p, and if it fails, fall back to the synthetic version.
	Updates #1200. PiperOrigin-RevId: 308828161
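The try-remote-then-fall-back flow described for MknodAt might look like the sketch below. All types here (`remoteFS`, `dentry`, the two fake remotes) are hypothetical stand-ins invented for illustration. The real gofer client speaks 9p and stores an endpoint or pipe in the dentry.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteFS is a hypothetical stand-in for the 9p connection to the gofer.
type remoteFS interface {
	Mknod(name string) error
}

// denyingRemote models a gofer that refuses to create special files.
type denyingRemote struct{}

func (denyingRemote) Mknod(string) error { return errors.New("mknod not permitted") }

// allowingRemote models a gofer that creates the node on the remote filesystem.
type allowingRemote struct{}

func (allowingRemote) Mknod(string) error { return nil }

// dentry records whether a node is backed by the remote filesystem or is
// synthetic (internal to the sandbox).
type dentry struct {
	name      string
	synthetic bool
}

// mknodAt tries the remote first and falls back to a synthetic node if the
// remote call fails.
func mknodAt(r remoteFS, name string) dentry {
	if err := r.Mknod(name); err == nil {
		return dentry{name: name}
	}
	return dentry{name: name, synthetic: true}
}

func main() {
	fmt.Println(mknodAt(allowingRemote{}, "fifo").synthetic) // false
	fmt.Println(mknodAt(denyingRemote{}, "fifo").synthetic)  // true
}
```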
2020-04-21	Sentry metrics updates.	Dave Bailey
	Sentry metrics with nanoseconds units are labeled as such, and non-cumulative sentry metrics are supported. PiperOrigin-RevId: 307621080
2020-04-13	Remove obsolete TODOs for b/38173783	Jon Budd
	The comments in the ticket indicate that this behavior is fine and that the ticket should be closed, so we shouldn't need pointers to the ticket. PiperOrigin-RevId: 306266071
2020-04-10	Use O_CLOEXEC when dup'ing FDs	Fabricio Voznika
	The sentry doesn't allow execve, but it's a good defense-in-depth measure. PiperOrigin-RevId: 305958737
2020-04-08	Remove InodeOperations FIXMEs that will be obsoleted by VFS2.	Dean Deng
	PiperOrigin-RevId: 305588941
2020-04-08	Handle utimes correctly for shared gofer filesystems.	Dean Deng
	Determine system time from within the sentry rather than relying on the remote filesystem to prevent inconsistencies. Resolve related TODOs; the time discrepancies in question don't exist anymore. PiperOrigin-RevId: 305557099
2020-02-10	Add context to comments.	Adin Scannell
	PiperOrigin-RevId: 294295852
2020-02-07	Support listxattr and removexattr syscalls.	Dean Deng
	Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-04	Add support for sentry internal pipe for gofer mounts	Fabricio Voznika
	Internal pipes are supported similarly to how internal UDS is done. It is also controlled by the same flag. Fixes #1102 PiperOrigin-RevId: 293150045
2020-01-27	Update package locations.	Adin Scannell
	Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27	Standardize on tools directory.	Adin Scannell
	PiperOrigin-RevId: 291745021
2020-01-16	Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.	Dean Deng
	There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
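As a rough illustration of why getxattr takes a size, the sketch below rejects a value that would not fit before handing it back. The function and the map-backed store are hypothetical, and the size-zero "query the length" convention of getxattr(2) is deliberately not modelled.

```go
package main

import (
	"fmt"
	"syscall"
)

// Errors used by the sketch, named so callers can compare against them.
var (
	errRange  = syscall.ERANGE
	errNoData = syscall.ENODATA
)

// getXattr is a hypothetical sketch. attrs stands in for a file's extended
// attributes (stored as strings, as in the commit message), and size is the
// length of the caller's buffer.
func getXattr(attrs map[string]string, name string, size uint64) (string, error) {
	value, ok := attrs[name]
	if !ok {
		return "", errNoData
	}
	// Fail early rather than pass around a value the caller cannot hold.
	if uint64(len(value)) > size {
		return "", errRange
	}
	return value, nil
}

func main() {
	attrs := map[string]string{"user.comment": "hello"}
	fmt.Println(getXattr(attrs, "user.comment", 16))
	fmt.Println(getXattr(attrs, "user.comment", 2))
}
```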
2020-01-09	New sync package.	Ian Gudger
	* Rename syncutil to sync.
	* Add aliases to sync types.
	* Replace existing usage of standard library sync package.
	This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations.
	Updates #1472 PiperOrigin-RevId: 289033387
2019-12-16	Fix UDS bind cause fd leak in gofer	Yong He
	After the finalizer optimization in commit 76039f895995c3fe0deef5958f843868685ecc38, clientFile needs to be closed before the finalizer releases it. The clientFile is not closed if it is created via gofer.(*inodeOperations).Bind, which leaks an fd held by the gofer process. Fixes #1396 Signed-off-by: Yong He <chenglang.hy@antfin.com> Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-12-09	Redirect TODOs to gvisor.dev	Fabricio Voznika
	PiperOrigin-RevId: 284606233
2019-11-25	Merge pull request #1176 from xiaobo55x:runsc_boot	gVisor bot
	PiperOrigin-RevId: 282382564
2019-11-20	Pass OpenTruncate to gofer in Open call when opening file with O_TRUNC.	Nicolas Lacasse
	Note that the Sentry still calls Truncate() on the file before calling Open. A new p9 version check was added to ensure that the p9 server can handle the OpenTruncate flag. If not, then the flag is stripped before sending. PiperOrigin-RevId: 281609112
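The version-gated flag stripping could be sketched as below. The flag values and the version threshold are invented for illustration and are not the real p9 constants.

```go
package main

import "fmt"

type openFlags uint32

// Hypothetical flag values and version threshold, chosen only for this sketch.
const (
	openReadWrite openFlags = 0x2
	openTruncate  openFlags = 0x200

	versionWithTruncate uint32 = 9
)

// flagsForServer strips openTruncate when the negotiated server version
// predates support for it, so older servers never see an unknown flag.
func flagsForServer(flags openFlags, serverVersion uint32) openFlags {
	if serverVersion < versionWithTruncate {
		return flags &^ openTruncate
	}
	return flags
}

func main() {
	f := openReadWrite | openTruncate
	fmt.Printf("%#x\n", uint32(flagsForServer(f, 8))) // 0x2
	fmt.Printf("%#x\n", uint32(flagsForServer(f, 9))) // 0x202
}
```

Because the sentry still truncates the file itself before opening, dropping the flag for an old server costs efficiency rather than correctness.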
2019-11-13	Enable runsc/boot support on arm64.	Haibo Xu
	This patch also includes a minor change to replace syscall.Dup2 with syscall.Dup3, which was missed in a previous commit (ref a25a976). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-10-16	Reorder BUILD license and load functions in gvisor.	Kevin Krakauer
	PiperOrigin-RevId: 275139066
2019-10-16	Fix problem with open FD when copy up is triggered in overlayfs	Fabricio Voznika
	Linux kernels before 4.19 don't implement a feature that updates open FDs after a file is opened for write (and is copied to the upper layer). Already-open FDs will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. A flag was added to force read-only files to be reopened when the same file is opened for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE, which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-16	Support O_SYNC and O_DSYNC flags.	Nicolas Lacasse
	When any of these flags are set, all writes will trigger a subsequent fsync call. This behavior already existed for "write-through" mounts. O_DIRECT is treated as an alias for O_SYNC. Better support coming soon. PiperOrigin-RevId: 275114392
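A minimal sketch of "every write triggers a subsequent fsync" is shown below, assuming a small `file` interface that `*os.File` also happens to satisfy. `syncWriter` and `memFile` are hypothetical names used only to make the behavior observable.

```go
package main

import "fmt"

// file is the minimal surface this sketch needs. *os.File satisfies it.
type file interface {
	Write(p []byte) (int, error)
	Sync() error
}

// memFile is an in-memory stand-in that counts Sync calls.
type memFile struct {
	data  []byte
	syncs int
}

func (m *memFile) Write(p []byte) (int, error) {
	m.data = append(m.data, p...)
	return len(p), nil
}

func (m *memFile) Sync() error {
	m.syncs++
	return nil
}

// syncWriter follows every successful write with Sync when syncOnWrite is
// set, which is how this sketch models a file opened with O_SYNC or O_DSYNC.
type syncWriter struct {
	f           file
	syncOnWrite bool
}

func (w *syncWriter) Write(p []byte) (int, error) {
	n, err := w.f.Write(p)
	if err != nil || !w.syncOnWrite {
		return n, err
	}
	return n, w.f.Sync()
}

func main() {
	m := &memFile{}
	w := &syncWriter{f: m, syncOnWrite: true}
	w.Write([]byte("a"))
	w.Write([]byte("b"))
	fmt.Println(m.syncs) // 2
}
```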
2019-09-30	Force timestamps to update when set via InodeOperations.SetTimestamps.	Nicolas Lacasse
	The gofer's CachingInodeOperations implementation contains an optimization for the common open-read-close pattern when we have a host FD. In this case, the host kernel will update the timestamp for us to a reasonably close time, so we don't need an extra RPC to the gofer. However, when the app explicitly sets the timestamps (via futimes or similar) then we actually DO need to update the timestamps, because the host kernel won't do it for us. To fix this, a new boolean `forceSetTimestamps` was added to CachingInodeOperations.SetMaskedAttributes. It is only set by gofer.InodeOperations.SetTimestamps. PiperOrigin-RevId: 272048146
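The interaction between the host-FD optimization and the force flag can be modelled with a toy counter, as in the sketch below. `cachingInode` and its methods are hypothetical and only count how many timestamp updates would be sent to the gofer.

```go
package main

import "fmt"

// cachingInode is a hypothetical model. rpcs counts the timestamp updates
// that would be sent to the gofer.
type cachingInode struct {
	haveHostFD bool
	rpcs       int
}

// setTimestamps skips the RPC when a host FD exists, since the host kernel
// keeps the timestamps reasonably fresh, unless the caller forces the update.
func (c *cachingInode) setTimestamps(force bool) {
	if c.haveHostFD && !force {
		return
	}
	c.rpcs++
}

// touch models an implicit timestamp update caused by ordinary I/O.
func (c *cachingInode) touch() { c.setTimestamps(false) }

// setExplicit models an application call such as futimes.
func (c *cachingInode) setExplicit() { c.setTimestamps(true) }

func main() {
	c := &cachingInode{haveHostFD: true}
	c.touch()
	fmt.Println(c.rpcs) // 0
	c.setExplicit()
	fmt.Println(c.rpcs) // 1
}
```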
2019-09-12	Remove go_test from go_stateify and go_marshal	Michael Pratt
	They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-08-29	Add limit_host_fd_translation Gofer mount option.	Jamie Liu
	PiperOrigin-RevId: 266177409
2019-06-28	Add finalizer on AtomicRefCount to check for leaks.	Ian Gudger
	PiperOrigin-RevId: 255711454
2019-06-27	Fix various spelling issues in the documentation	Michael Pratt
	Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779
2019-06-27	Cache directory entries in the overlay	Michael Pratt
	Currently, the overlay dirCache is only used for a single logical use of getdents, i.e., it is discarded when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592
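The idea of keeping sorted directory entries beyond a single getdents sequence could be sketched like this. `dirCache`, `readdir`, and `invalidate` are hypothetical names, and the real overlay also has to consult the mount source before caching, which is not modelled here.

```go
package main

import (
	"fmt"
	"sort"
)

// dirCache is a hypothetical sketch of a per-directory cache of sorted names.
type dirCache struct {
	valid   bool
	entries []string
	loads   int // how many times the expensive list-and-sort ran
}

// readdir returns the cached entries, performing the expensive list and sort
// only when the cache is not valid.
func (c *dirCache) readdir(list func() []string) []string {
	if !c.valid {
		names := append([]string(nil), list()...) // copy before sorting
		sort.Strings(names)
		c.entries = names
		c.valid = true
		c.loads++
	}
	return c.entries
}

// invalidate drops the cache, for example after a create or remove.
func (c *dirCache) invalidate() {
	c.valid = false
	c.entries = nil
}

func main() {
	list := func() []string { return []string{"b", "a", "c"} }
	c := &dirCache{}
	fmt.Println(c.readdir(list), c.loads) // [a b c] 1
	fmt.Println(c.readdir(list), c.loads) // [a b c] 1
	c.invalidate()
	fmt.Println(c.readdir(list), c.loads) // [a b c] 2
}
```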
2019-06-13	Plumb context through more layers of filesystem.	Ian Gudger
	All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-06	Use common definition of SockType.	Rahat Mahmood
	SockType isn't specific to unix domain sockets, and the current definition basically mirrors the linux ABI's definition. PiperOrigin-RevId: 251956740
2019-06-03	gvisor/sock/unix: pass creds when a message is sent between unconnected sockets	Andrei Vagin
	and don't report a sender address if it doesn't have one PiperOrigin-RevId: 251371284
2019-05-21	Add basic plumbing for splice and stub implementation.	Adin Scannell
	This does not actually implement an efficient splice or sendfile. Rather, it adds generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-20	Forward named pipe creation to the gofer	Michael Pratt
	The backing 9p server must allow named pipe creation, which the runsc fsgofer currently does not.
	There are small changes to the overlay here. GetFile may block when opening a named pipe, which can cause a deadlock:
	1. open(O_RDONLY) -> copyMu.Lock() -> GetFile()
	2. open(O_WRONLY) -> copyMu.Lock() -> Deadlock
	A named pipe usable for writing must already be on the upper filesystem, but we are still taking copyMu for write when checking for upper. That can be changed to a read lock to fix the common case. However, a named pipe on the lower filesystem would still deadlock in open(O_WRONLY) when it tries to actually perform copy up (which would simply return EINVAL). Move the copy up type check before taking copyMu for write to avoid this.
	p9 must be modified, as it was incorrectly removing the file mode when sending messages on the wire.
	PiperOrigin-RevId: 249154033 Change-Id: Id6637130e567b03758130eb6c7cdbc976384b7d6
2019-05-20	Fix incorrect tmpfs timestamp updates	Michael Pratt
	* Creation of files, directories (and other fs objects) in a directory should always update ctime.
	* Same for removal.
	* atime should not be updated on lookup, only readdir.
	I've also renamed some misleading functions that update mtime and ctime.
	PiperOrigin-RevId: 249115063 Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7
2019-05-17	Return EPERM for mknod	Michael Pratt
	This more directly matches what Linux does with unsupported nodes. PiperOrigin-RevId: 248780425 Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec
2019-05-17	Fix gofer rename ctime and cleanup stat_times test	Michael Pratt
	There is a lot of redundancy that we can simplify in the stat_times test. This will make it easier to add new tests. However, the simplification reveals that cached uattrs on goferfs don't properly update ctime on rename. PiperOrigin-RevId: 248773425 Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf
2019-05-15	gofer: don't call hostfile.Close if hostFile is nil	Andrei Vagin
	PiperOrigin-RevId: 248437159 Change-Id: Ife71f6ca032fca59ec97a82961000ed0af257101
2019-05-14	Remove false comment	Michael Pratt
	PiperOrigin-RevId: 248249285 Change-Id: I9b6d267baa666798b22def590ff20c9a118efd47
2019-05-09	Implement fallocate(2)	Fabricio Voznika
	Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-07	Remove defers from gofer.contextFile	Fabricio Voznika
	Most are single line methods in hot paths. PiperOrigin-RevId: 247050267 Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea
2019-05-03	gofer: don't leak file descriptors	Andrei Vagin
	Fixes #219 PiperOrigin-RevId: 246568639 Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a
2019-04-29	Change copyright notice to "The gVisor Authors"	Michael Pratt
	Based on the guidelines at https://opensource.google.com/docs/releasing/authors/.
	1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
	2. Manual fixup of "Google Inc" references.
	3. Add AUTHORS file. Authors may request to be added to this file.
	4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
	Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29	Allow and document bug ids in gVisor codebase.	Nicolas Lacasse
	PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-25	Don't enforce NAME_MAX in fs.Dirent.walk().	Jamie Liu
	Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX:
	$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
	fs/libfs.c
	38: buf->f_namelen = NAME_MAX;
	64: if (dentry->d_name.len > NAME_MAX)
	include/linux/relay.h
	74: char base_filename[NAME_MAX]; /* saved base filename */
	include/linux/fscrypt.h
	149: filenames up to NAME_MAX bytes, since base64 encoding expands the length.
	include/linux/exportfs.h
	176: * understanding that it is already pointing to a a %NAME_MAX+1 sized
	Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup does enforce NAME_MAX.
	PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
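Placing the length check in the filesystem implementation rather than in the generic walk might look like the sketch below. `ramfsDir` and `lookup` are hypothetical, and only the 255-byte NAME_MAX value is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"syscall"
)

// nameMax is NAME_MAX, the limit that ramfs (and so tmpfs) enforces.
const nameMax = 255

// Errors used by the sketch, named so callers can compare against them.
var (
	errNameTooLong = syscall.ENAMETOOLONG
	errNotExist    = syscall.ENOENT
)

// ramfsDir is a hypothetical in-memory directory.
type ramfsDir struct {
	children map[string]bool
}

// lookup enforces the name limit inside this filesystem. Other filesystems
// would apply their own limit, or none, in their own lookup.
func (d *ramfsDir) lookup(name string) error {
	if len(name) > nameMax {
		return errNameTooLong
	}
	if !d.children[name] {
		return errNotExist
	}
	return nil
}

func main() {
	d := &ramfsDir{children: map[string]bool{"a": true}}
	fmt.Println(d.lookup("a"))
	fmt.Println(d.lookup(strings.Repeat("x", nameMax+1)))
}
```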
2019-04-17	Use FD limit and file size limit from host	Fabricio Voznika
	FD limit and file size limit are read from the host, instead of using hard-coded defaults, given that they affect the sandbox process. Also limit the direct cache to use no more than half of the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
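Reading the FD limit from the host and capping a cache at half of it can be sketched as follows, assuming a Unix-like host. `capCacheSize` and the requested size of 1000 are hypothetical, while `syscall.Getrlimit` and `RLIMIT_NOFILE` are the standard Go and POSIX names.

```go
package main

import (
	"fmt"
	"syscall"
)

// capCacheSize limits a requested cache size to at most half of the FD limit,
// leaving the other half for everything else in the process.
func capCacheSize(requested, fdLimit uint64) uint64 {
	if half := fdLimit / 2; requested > half {
		return half
	}
	return requested
}

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	soft := uint64(lim.Cur)
	fmt.Println("soft FD limit:", soft)
	fmt.Println("cache size:", capCacheSize(1000, soft))
}
```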
2019-04-11	Use open fids when fstat()ing gofer files.	Jamie Liu
	PiperOrigin-RevId: 243018347 Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37