gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-08-13	[syserror] Remove pkg syserror.	Zach Koopmans
	Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12	[syserror] Convert remaining syserror definitions to linuxerr.	Zach Koopmans
	Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-07-12	[syserror] Update syserror to linuxerr for more errors.	Zach Koopmans
	Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01	[syserror] Update several syserror errors to linuxerr equivalents.	Zach Koopmans
	Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30	[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.	Zach Koopmans
	Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-09	Change TODO to NOTE.	Nicolas Lacasse
	It's in VFS1 code, so we probably will not do it. PiperOrigin-RevId: 378474174
2021-05-25	setgid directories for VFS1 tmpfs, overlayfs, and goferfs	Kevin Krakauer
	PiperOrigin-RevId: 375780659
2021-05-18	Delete /cloud/gvisor/sandbox/sentry/gofer/opened_write_execute_file metric	Nayana Bidari
	This metric is replaced by /cloud/gvisor/sandbox/sentry/suspicious_operations metric with field value opened_write_execute_file. PiperOrigin-RevId: 374509823
2021-05-14	Add new metric for suspicious operations.	Nayana Bidari
	The new metric contains fields and will replace the below existing metric: - opened_write_execute_file PiperOrigin-RevId: 373884604
2021-03-29	[syserror] Split usermem package	Zach Koopmans
	Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-22	Avoid calling sync on each write in writethrough mode.	Nicolas Lacasse
	PiperOrigin-RevId: 364370595
2021-03-03	Add checklocks analyzer.	Bhasker Hariharan
	This validates that struct fields if annotated with "// checklocks:mu" where "mu" is a mutex field in the same struct then access to the field is only done with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name> For more details please refer to README.md. PiperOrigin-RevId: 360729328
2021-03-03	[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-01-26	Implement error on pointers	Tamir Duberstein
	This improves type-assertion safety. PiperOrigin-RevId: 353931228
2020-12-23	vfs1: don't allow to open socket files	Andrei Vagin
	open() has to return ENXIO in this case. O_PATH isn't supported by vfs1. PiperOrigin-RevId: 348820478
2020-12-11	Remove existing nogo exceptions.	Adin Scannell
	PiperOrigin-RevId: 347047550
2020-11-19	Require sync.Mutex to lock and unlock from the same goroutine	Michael Pratt
	We would like to track locks ordering to detect ordering violations. Detecting violations is much simpler if mutexes must be unlocked by the same goroutine that locked them. Thus, as a first step to tracking lock ordering, add this lock/unlock requirement to gVisor's sync.Mutex. This is more strict than the Go standard library's sync.Mutex, but initial testing indicates only a single lock that is used across goroutines. The new sync.CrossGoroutineMutex relaxes the requirement (but will not provide lock order checking). Due to the additional overhead, enforcement is only enabled with the "checklocks" build tag. Build with this tag using: bazel build --define=gotags=checklocks ... From my spot-checking, this has no changed inlining properties when disabled. Updates #4804 PiperOrigin-RevId: 343370200
2020-11-18	Port filesystem metrics to VFS2.	Jamie Liu
	PiperOrigin-RevId: 343196927
2020-11-03	Make pipe min/max sizes match linux.	Nicolas Lacasse
	The default pipe size already matched linux, and is unchanged. Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We were plumbing usermem.PageSize everywhere, so this is no functional change. PiperOrigin-RevId: 340497006
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-06-09	Don't WriteOut to readonly mounts	Fabricio Voznika
	When the file closes, it attempts to write dirty cached attributes to the file. This should not be done when the mount is readonly. PiperOrigin-RevId: 315585058
2020-05-13	Enable overlayfs_stale_read by default for runsc.	Jamie Liu
	Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent: - Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping." - fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay. - Experimentally on Linux 5.2: ``` $ cat mmap_cat_page.c #include <err.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <unistd.h> int main(int argc, char *argv) { if (argc < 2) { errx(1, "syntax: %s [FILE]", argv[0]); } const int fd = open(argv[1], O_RDONLY); if (fd < 0) { err(1, "open(%s)", argv[1]); } const size_t page_size = sysconf(_SC_PAGE_SIZE); void page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0); if (page == MAP_FAILED) { err(1, "mmap"); } for (;;) { write(1, page, strnlen(page, page_size)); if (getc(stdin) == EOF) { break; } } return 0; } $ gcc -O2 -o mmap_cat_page mmap_cat_page.c $ mkdir lowerdir upperdir workdir overlaydir $ echo old > lowerdir/file $ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir $ ./mmap_cat_page overlaydir/file old ^Z [1]+ Stopped ./mmap_cat_page overlaydir/file $ echo new > overlaydir/file $ cat overlaydir/file new $ fg ./mmap_cat_page overlaydir/file old ``` Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default. New test: - Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b - After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab PiperOrigin-RevId: 311361267
2020-05-12	Don't allow rename across different gofer or tmpfs mounts.	Nicolas Lacasse
	Fixes #2651. PiperOrigin-RevId: 311193661
2020-05-07	Update privateunixsocket TODOs.	Dean Deng
	Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474
2020-05-05	Translate p9.NoUID/GID to OverflowUID/GID.	Jamie Liu
	p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in particular, using it for file ownership causes capabilities to be ineffective since file capabilities require that the file's KUID and KGID are mapped into the capability holder's user namespace [1], and auth.NoID is not mapped into any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the unlikely case that an application actually using the overflow KUID/KGID attempts an operation that is consequently permitted by client permission checks, the remote operation will still fail with EPERM. Since this changes the VFS2 gofer client to no longer ignore the invalid IDs entirely, this CL both permits and requires that we change synthetic mount point creation to use root credentials. [1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions. PiperOrigin-RevId: 309856455
2020-04-28	Support pipes and sockets in VFS2 gofer fs.	Dean Deng
	Named pipes and sockets can be represented in two ways in gofer fs: 1. As a file on the remote filesystem. In this case, all file operations are passed through 9p. 2. As a synthetic file that is internal to the sandbox. In this case, the dentry stores an endpoint or VFSPipe for sockets and pipes respectively, which replaces interactions with the remote fs through the gofer. In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p, and if it fails, fall back to the synthetic version. Updates #1200. PiperOrigin-RevId: 308828161
2020-04-21	Sentry metrics updates.	Dave Bailey
	Sentry metrics with nanoseconds units are labeled as such, and non-cumulative sentry metrics are supported. PiperOrigin-RevId: 307621080
2020-04-13	Remove obsolete TODOs for b/38173783	Jon Budd
	The comments in the ticket indicate that this behavior is fine and that the ticket should be closed, so we shouldn't need pointers to the ticket. PiperOrigin-RevId: 306266071
2020-04-10	Use O_CLOEXEC when dup'ing FDs	Fabricio Voznika
	The sentry doesn't allow execve, but it's a good defense in-depth measure. PiperOrigin-RevId: 305958737
2020-04-08	Remove InodeOperations FIXMEs that will be obsoleted by VFS2.	Dean Deng
	PiperOrigin-RevId: 305588941
2020-04-08	Handle utimes correctly for shared gofer filesystems.	Dean Deng
	Determine system time from within the sentry rather than relying on the remote filesystem to prevent inconsistencies. Resolve related TODOs; the time discrepancies in question don't exist anymore. PiperOrigin-RevId: 305557099
2020-02-10	Add context to comments.	Adin Scannell
	PiperOrigin-RevId: 294295852
2020-02-07	Support listxattr and removexattr syscalls.	Dean Deng
	Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-04	Add support for sentry internal pipe for gofer mounts	Fabricio Voznika
	Internal pipes are supported similarly to how internal UDS is done. It is also controlled by the same flag. Fixes #1102 PiperOrigin-RevId: 293150045
2020-01-27	Update package locations.	Adin Scannell
	Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27	Standardize on tools directory.	Adin Scannell
	PiperOrigin-RevId: 291745021
2020-01-16	Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.	Dean Deng
	There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
2020-01-09	New sync package.	Ian Gudger
	* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2019-12-16	Fix UDS bind cause fd leak in gofer	Yong He
	After the finalizer optimize in 76039f895995c3fe0deef5958f843868685ecc38 commit, clientFile needs to closed before finalizer release it. The clientFile is not closed if it is created via gofer.(*inodeOperations).Bind, this will cause fd leak which is hold by gofer process. Fixes #1396 Signed-off-by: Yong He <chenglang.hy@antfin.com> Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-12-09	Redirect TODOs to gvisor.dev	Fabricio Voznika
	PiperOrigin-RevId: 284606233
2019-11-25	Merge pull request #1176 from xiaobo55x:runsc_boot	gVisor bot
	PiperOrigin-RevId: 282382564
2019-11-20	Pass OpenTruncate to gofer in Open call when opening file with O_TRUNC.	Nicolas Lacasse
	Note that the Sentry still calls Truncate() on the file before calling Open. A new p9 version check was added to ensure that the p9 server can handle the the OpenTruncate flag. If not, then the flag is stripped before sending. PiperOrigin-RevId: 281609112
2019-11-13	Enable runsc/boot support on arm64.	Haibo Xu
	This patch also include a minor change to replace syscall.Dup2 with syscall.Dup3 which was missed in a previous commit(ref a25a976). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-10-16	Reorder BUILD license and load functions in gvisor.	Kevin Krakauer
	PiperOrigin-RevId: 275139066
2019-10-16	Fix problem with open FD when copy up is triggered in overlayfs	Fabricio Voznika
	Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopenned when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run on tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-16	Support O_SYNC and O_DSYNC flags.	Nicolas Lacasse
	When any of these flags are set, all writes will trigger a subsequent fsync call. This behavior already existed for "write-through" mounts. O_DIRECT is treated as an alias for O_SYNC. Better support coming soon. PiperOrigin-RevId: 275114392
2019-09-30	Force timestamps to update when set via InodeOperations.SetTimestamps.	Nicolas Lacasse
	The gofer's CachingInodeOperations implementation contains an optimization for the common open-read-close pattern when we have a host FD. In this case, the host kernel will update the timestamp for us to a reasonably close time, so we don't need an extra RPC to the gofer. However, when the app explicitly sets the timestamps (via futimes or similar) then we actually DO need to update the timestamps, because the host kernel won't do it for us. To fix this, a new boolean `forceSetTimestamps` was added to CachineInodeOperations.SetMaskedAttributes. It is only set by gofer.InodeOperations.SetTimestamps. PiperOrigin-RevId: 272048146
2019-09-12	Remove go_test from go_stateify and go_marshal	Michael Pratt
	They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264