gvisor - Container Runtime Sandbox

Age	Commit message	Author
2021-08-20	[op] Prevent file leak in MultiGetAttr's error path.	Ayush Ranjan
	The old implementation was mostly correct but error-prone, which made way for the issue in question here. In its error path, it would leak the intermediate file being walked. Each return/break needed explicit cleanup. This change implements a cleaner way of cleaning up intermediate directories. If the code were to evolve to be more complex, it would still work. PiperOrigin-RevId: 392102826
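	The cleanup approach described above can be sketched roughly as follows. This is a minimal illustration, not the gofer's code: the handle type, open, walkAll, and leaked are hypothetical names, and the name "missing" is an assumed way to simulate a failed walk. A single deferred function closes the current intermediate handle, so no return path needs its own cleanup.

```go
package main

import (
	"errors"
	"fmt"
)

// handle is a hypothetical stand-in for an open intermediate file.
type handle struct {
	name   string
	closed bool
}

func (h *handle) Close() { h.closed = true }

// opened records every handle handed out, so leaks can be counted.
var opened []*handle

// open pretends to walk to a child; the name "missing" simulates a failure.
func open(name string) (*handle, error) {
	if name == "missing" {
		return nil, errors.New("walk failed: " + name)
	}
	h := &handle{name: name}
	opened = append(opened, h)
	return h, nil
}

// walkAll visits names in order. One deferred function closes whichever
// intermediate handle is current when the function returns.
func walkAll(names []string) (int, error) {
	var cur *handle
	defer func() {
		if cur != nil {
			cur.Close()
		}
	}()

	visited := 0
	for _, name := range names {
		next, err := open(name)
		if err != nil {
			return visited, err // cur is closed by the deferred function
		}
		if cur != nil {
			cur.Close() // done with the previous intermediate
		}
		cur = next
		visited++
	}
	return visited, nil
}

// leaked counts handles that were opened but never closed.
func leaked() int {
	n := 0
	for _, h := range opened {
		if !h.closed {
			n++
		}
	}
	return n
}

func main() {
	visited, err := walkAll([]string{"a", "b", "missing", "c"})
	fmt.Println(visited, err, leaked())
}
```

	Adding another early return to walkAll would not require touching the cleanup logic, which is the property the commit message is after.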
2021-07-20	Add go:build directives as required by Go 1.17's gofmt.	Jamie Liu
	PiperOrigin-RevId: 385894869
2021-07-12	Fix GoLand analyzer errors under runsc/...	Fabricio Voznika
	PiperOrigin-RevId: 384344990
2021-05-13	Fix file descriptor leak in MultiGetAttr	Tiwei Bie
	We need to make sure that all children are closed before returning. However, the last child saved in parent was not closed after successfully iterating over all the files in "names". This patch fixes the issue. Fixes #5982 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2021-04-28	Automated rollback of changelist 369686285	Fabricio Voznika
	PiperOrigin-RevId: 371015541
2021-04-21	Automated rollback of changelist 369325957	Michael Pratt
	PiperOrigin-RevId: 369686285
2021-04-19	Add MultiGetAttr message to 9P	Fabricio Voznika
	While using remote-validation, the vast majority of time spent during FS operations goes to re-walking the path to check for modifications and then closing the file, given that in most cases it has not been modified externally. This change introduces a new 9P message called MultiGetAttr, which queries the attributes of several files in one shot. The returned attributes are then used to update cached dentries before they are walked. File attributes are updated for files that still exist. Dentries that have been deleted are removed from the cache. Negative cache entries are removed if a new file/directory was created externally. Similarly, synthetic dentries are replaced if a file/directory is created externally. The bulk update needs to be careful not to follow symlinks or cross mount points, because the gofer doesn't know how to resolve symlinks or where mount points are located. It also doesn't walk to the parent ("..") to avoid deadlocks.
	Here are the results:
	Workload        VFS1        VFS2       Change
	bazel action    115s        70s        28.8s
	Stat/100        11,043us    7,623us    974us
	Updates #1638 PiperOrigin-RevId: 369325957
2021-03-23	Allow FSETXATTR/FGETXATTR host calls for Verity	Chong Cai
	These host calls are needed for Verity fs to generate/verify hashes. PiperOrigin-RevId: 364598180
2021-03-08	Internal change.	Chong Cai
	PiperOrigin-RevId: 361689477
2021-03-06	[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in the unix package:
	- syscall.SysProcIDMap
	- syscall.Credential
	Updates #214 PiperOrigin-RevId: 361381490
2021-02-24	runsc/filters: permit clock_nanosleep for race	Andrei Vagin
	Syzkaller hosts contain many audit messages showing that runsc tries to call the clock_nanosleep syscall. PiperOrigin-RevId: 359331413
2021-02-11	Allow rt_sigaction in gofer seccomp	Fabricio Voznika
	rt_sigaction may be called by Go runtime when trying to panic: https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;drc=ed3e4afa12d655a0c5606bcf3dd4e1cdadcb1476;bpv=1;bpt=1;l=780?q=rt_sigaction&ss=go Updates #5038 PiperOrigin-RevId: 357013186
2021-02-04	Move getcpu() to core filter list	Michael Pratt
	Some versions of the Go runtime call getcpu(), so add it for compatibility. The hostcpu package already uses getcpu() on arm64. PiperOrigin-RevId: 355717757
2021-01-22	Remove dependency to abi/linux	Fabricio Voznika
	abi package is to be used by the Sentry to implement the Linux ABI. Code dealing with the host should use x/sys/unix. PiperOrigin-RevId: 353272679
2021-01-21	Fix ownership change logic	Fabricio Voznika
	Previously fsgofer was skipping chown call if the uid and gid were the same as the current user/group. However, when setgid is set, the group may not be the same as the caller. Instead, compare the actual uid/gid of the file after it has been created and change ownership only if needed. Updates #180 PiperOrigin-RevId: 353118733
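	The decision described above reduces to comparing the owner read back from the created file with the requested owner. A minimal sketch, assuming hypothetical names (owner, needsChown) and hard-coded IDs in place of a real stat call:

```go
package main

import "fmt"

// owner is a hypothetical pair of IDs, as would be read back from stat
// after the file has been created.
type owner struct {
	UID, GID uint32
}

// needsChown reports whether ownership must be changed. It compares the
// file's actual owner with the requested one, rather than comparing the
// requested owner with the calling process.
func needsChown(actual, want owner) bool {
	return actual != want
}

func main() {
	want := owner{UID: 1000, GID: 1000}

	// In a setgid directory the new file inherits the directory's group,
	// so the group differs even though the caller's gid matches want.
	created := owner{UID: 1000, GID: 2000}
	fmt.Println(needsChown(created, want)) // true

	// Without setgid the file already has the requested owner.
	fmt.Println(needsChown(want, want)) // false
}
```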
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2020-12-15	fsgofer optimizations	Fabricio Voznika
	- Skip chown call in case owner change is not needed
	- Skip filepath.Clean() calls when joining paths
	- Pass unix.Stat_t by value to reduce runtime.duffcopy calls. This change allows for better inlining in localFile.walk().
	                             Change         Baseline       Improvement
	BenchmarkWalkOne-6           2912 ns/op     3082 ns/op     5.5%
	BenchmarkCreate-6            15915 ns/op    19126 ns/op    16.8%
	BenchmarkCreateDiffOwner-6   18795 ns/op    19741 ns/op    4.8%
	PiperOrigin-RevId: 347667833
2020-09-25	Add openat() to list of permitted syscalls in gotsan runs.	Bhasker Hariharan
	PiperOrigin-RevId: 333853498
2020-09-22	Allow CLONE_SETTLS for Go 1.16	Michael Pratt
	https://go.googlesource.com/go/+/0941fc3 switches the Go runtime (on amd64) from using arch_prctl(ARCH_SET_FS) to CLONE_SETTLS to set the TLS. PiperOrigin-RevId: 333100550
2020-09-22	Force clone parent_tidptr and child_tidptr to zero	Michael Pratt
	Neither CLONE_PARENT_SETTID nor CLONE_CHILD_SETTID is used, so these arguments will always be NULL. PiperOrigin-RevId: 333085326
2020-09-18	Drop ARCH_GET_FS	Michael Pratt
	Go does not call arch_prctl(ARCH_GET_FS), nor am I sure it ever did. Drop the filter. PiperOrigin-RevId: 332470532
2020-09-17	Remove option to panic gofer	Fabricio Voznika
	Gofer panics are suppressed by p9 server and an error is returned to the caller, making it effectively the same as returning EROFS. PiperOrigin-RevId: 332282959
2020-09-15	Add support for OCI seccomp filters in the sandbox.	Ian Lewis
	OCI configuration includes support for specifying seccomp filters. In runc, these filter configurations are converted into seccomp BPF programs and loaded into the kernel via libseccomp. runsc needs to be a static binary so, for runsc, we cannot rely on a C library and need to implement the functionality in Go. The generator added here implements basic support for taking OCI seccomp configuration and converting it into a seccomp BPF program with the same behavior as a program generated by libseccomp.
	- New conditional operations were added to pkg/seccomp to support operations available in OCI.
	- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect that syscalls matching the conditionals result in the provided action, not simply SCMP_RET_ALLOW.
	- BuildProgram in pkg/seccomp no longer panics if provided an empty list of rules. It now builds a program with the architecture sanity check only.
	- ProgramBuilder now allows adding labels that are unused. However, backwards jumps are still not permitted.
	Fixes #510 PiperOrigin-RevId: 331938697
2020-09-01	Implement setattr+clunk in 9P	Fabricio Voznika
	This is to cover the common pattern: open->read/write->close, where SetAttr needs to be called to update atime/mtime before the file is closed.
	Benchmark results (BM_OpenReadClose/10240, CPU):
	setattr+clunk: 63783 ns
	VFS2:          68109 ns
	VFS1:          72507 ns
	Updates #1198 PiperOrigin-RevId: 329628461
2020-08-19	Move boot.Config to its own package	Fabricio Voznika
	Updates #3494 PiperOrigin-RevId: 327548511
2020-08-19	Remove path walk from localFile.Mknod	Fabricio Voznika
	Replace mknod call with mknodat equivalent to protect against symlink attacks. Also added Mknod tests. Remove goferfs reliance on gofer to check for file existence before creating a synthetic entry. Updates #2923 PiperOrigin-RevId: 327544516
2020-08-18	Return EROFS if mount is read-only	Fabricio Voznika
	PiperOrigin-RevId: 327300635
2020-07-30	Call lseek(0, SEEK_CUR) unconditionally in runsc fsgofer's Readdir(offset=0).	Jamie Liu
	9P2000.L is silent as to how readdir RPCs interact with directory mutation. The most performant option is for Treaddir with offset=0 to restart iteration, avoiding needing to walk+open+clunk a new directory fid between invocations of getdents64(2), and the VFS2 gofer client assumes this is the case. Make this actually true for the runsc fsgofer. Fixes #3344, #3345, #3355 PiperOrigin-RevId: 324090384
2020-07-25	test/syscall: run each test case in a separate network namespace	Andrei Vagin
	... when it is possible. The guitar gVisorKernel*Workflow-s run tests with the local execution_method. In this case, blaze runs test cases locally without sandboxes. This means that all tests run in the same network namespace. We have a few tests which use hard-coded network ports, and they can fail if one of these ports is used by someone else or by another test case. PiperOrigin-RevId: 323137254
2020-07-24	Reduce walk and open cost in fsgofer	Fabricio Voznika
	Implement WalkGetAttr() to reuse the stat that is already needed for Walk(). In addition, cache file QID, so it doesn't need to stat the file to compute it. open(2) time improved by 10%: Baseline: 6780 ns Change: 6083 ns Also fixed file type which was not being set in all places. PiperOrigin-RevId: 323102560
2020-07-23	Fix fsgofer Open() when control file is using O_PATH	Fabricio Voznika
	Open tries to reuse the control file to save syscalls and file descriptors when opening a file. However, when the control file was opened using O_PATH (e.g. no file permission to open readonly), Open() would not check for it. PiperOrigin-RevId: 322821729
2020-07-07	Fix mknod and inotify syscall test	Ayush Ranjan
	This change fixes a few things:
	- creating sockets using mknod(2) is supported via vfs2
	- fsgofer can create regular files via mknod(2)
	- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
	Updates #2923 PiperOrigin-RevId: 320074267
2020-06-16	Print spec as json when --debug is enabled	Fabricio Voznika
	The previous format skipped many important structs that are pointers, especially for cgroups. Change to print as json, removing parts of the spec that are not relevant. Also removed debug message from gofer that can be very noisy when directories are large. PiperOrigin-RevId: 316713267
2020-05-28	Move Cleanup to its own package	Fabricio Voznika
	PiperOrigin-RevId: 313663382
2020-04-07	Remove TODOs for local gofer extended attributes.	Dean Deng
	PiperOrigin-RevId: 305344989
2020-02-07	Support listxattr and removexattr syscalls.	Dean Deng
	Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-04	Allow mlock in fsgofer system call filters	Fabricio Voznika
	Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 293212935
2020-01-27	Standardize on tools directory.	Adin Scannell
	PiperOrigin-RevId: 291745021
2020-01-16	Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.	Dean Deng
	There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
2020-01-09	New sync package.	Ian Gudger
	* Rename syncutil to sync.
	* Add aliases to sync types.
	* Replace existing usage of standard library sync package.
	This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
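	The aliasing idea can be sketched with Go type aliases. In the commit the aliases live in their own package; here they are declared in main only to keep the example self-contained, and counter and count are hypothetical names. Code written against the aliased names would not need to change if the aliases were later pointed at different primitives.

```go
package main

import (
	"fmt"
	"sync"
)

// Type aliases: Mutex and WaitGroup are the standard library types under
// locally controlled names. Repointing an alias swaps the primitive for
// every user at once.
type (
	Mutex     = sync.Mutex
	WaitGroup = sync.WaitGroup
)

// counter refers only to the aliased names.
type counter struct {
	mu Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// count increments a shared counter from the given number of goroutines
// and returns the final value.
func count(workers int) int {
	var c counter
	var wg WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(count(100))
}
```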
2019-12-11	Finish incomplete comment.	Dean Deng
	PiperOrigin-RevId: 285012278
2019-11-23	gofer: reduce CPU usage on GC as of frequent readdir	Jianfeng Tan
	According to Go's mallocgc(), each allocation of an object larger than 32 KB can trigger a GC. When we do readdir, the sentry always passes 65535, which leads to a malloc of 65535 * sizeof(p9.Dirent) > 32 KB. Since we already use slice append, let's avoid setting the capacity for this slice.
	Command for test:
	Before this change:
	    (container)$ time tree linux-5.3.1 > /dev/null
	    real 0m54.272s
	    user 0m2.010s
	    sys 0m1.740s
	    (CPU usage of Gofer: ~30 cores)
	    (host)$ perf top -p <pid-of-gofer>
	    42.57% runsc [.] runtime.gcDrain
	    23.41% runsc [.] runtime.(lfstack).pop
	    9.74% runsc [.] runtime.greyobject
	    8.06% runsc [.] runtime.(lfstack).push
	    4.33% runsc [.] runtime.scanobject
	    1.69% runsc [.] runtime.findObject
	    1.12% runsc [.] runtime.findrunnable
	    0.69% runsc [.] runtime.runqgrab
	    ...
	    (host)$ mkdir test && cd test
	    (host)$ for i in `seq 1 65536`; do mkdir $i; done
	    (container)$ time ls test/ > /dev/null
	    real 2m10.934s
	    user 0m0.280s
	    sys 0m4.260s
	    (CPU usage of Gofer: ~1 core)
	After this change:
	    (container)$ time tree linux-5.3.1 > /dev/null
	    real 0m22.465s
	    user 0m1.270s
	    sys 0m1.310s
	    (CPU usage of Gofer: ~1 core)
	    $ perf top -p <pid-of-gofer>
	    20.57% runsc [.] runtime.gcDrain
	    7.15% runsc [.] runtime.(lfstack).pop
	    4.11% runsc [.] runtime.scanobject
	    3.78% runsc [.] runtime.greyobject
	    2.78% runsc [.] runtime.(lfstack).push
	    ...
	    (host)$ mkdir test && cd test
	    (host)$ for i in `seq 1 65536`; do mkdir $i; done
	    (container)$ time ls test/ > /dev/null
	    real 0m13.338s
	    user 0m0.190s
	    sys 0m3.980s
	    (CPU usage of Gofer: ~0.8 core)
	Fixes #898 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
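	The difference between the two allocation strategies can be seen in a small sketch. The dirent type and both functions are hypothetical stand-ins, not the gofer's Readdir. The first version reserves room for count entries up front, which is a large allocation even for a tiny directory; the second lets append grow the slice to what is actually needed.

```go
package main

import "fmt"

// dirent is a hypothetical stand-in for a directory entry record.
type dirent struct {
	Name   string
	Offset uint64
}

// readPrealloc reserves capacity for count entries before reading any,
// so a request for 65535 entries allocates a large backing array even
// when the directory holds only a few names.
func readPrealloc(names []string, count int) []dirent {
	entries := make([]dirent, 0, count)
	for i, name := range names {
		if i >= count {
			break
		}
		entries = append(entries, dirent{Name: name, Offset: uint64(i + 1)})
	}
	return entries
}

// readAppend starts from a nil slice and relies on append to grow it,
// so the backing array tracks the number of entries actually returned.
func readAppend(names []string, count int) []dirent {
	var entries []dirent
	for i, name := range names {
		if i >= count {
			break
		}
		entries = append(entries, dirent{Name: name, Offset: uint64(i + 1)})
	}
	return entries
}

func main() {
	names := []string{"a", "b", "c"}
	fmt.Println("preallocated capacity:", cap(readPrealloc(names, 65535)))
	fmt.Println("append-grown capacity:", cap(readAppend(names, 65535)))
}
```

	Both functions return the same entries; only the size of the backing array, and therefore the allocation pressure, differs.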
2019-11-06	Add p9.OpenTruncate.	Jamie Liu
	This is required to implement O_TRUNC correctly on filesystems backed by gofers. 9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags bits, e.g. O_RDONLY, O_RDWR, O_WRONLY." open(2): "The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags." The reference 9P2000.L implementation also appears to expect arbitrary flags, not just access modes, in Tlopen.flags: https://github.com/chaos/diod/blob/master/diod/ops.c#L703 PiperOrigin-RevId: 278972683
2019-10-30	Enable runsc/fsgofer support on arm64.	Haibo Xu
	The newfstatat() syscall is not supported on arm64, so we resort to using the fstatat() syscall. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-28	Cast the Stat_t.Nlink to uint64 on arm64.	Haibo Xu
	Since syscall.Stat_t.Nlink is defined as different types on amd64 and arm64 (uint64 and uint32 respectively), we need to cast it to a unified uint64 type in gVisor code. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-18	Cleanup host UDS support	Michael Pratt
	This change fixes several issues with the fsgofer host UDS support. Notably, it adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes unsafe use of unet.Socket, which could cause a panic if Socket.FD is called when err != nil, and calls to Socket.FD with nothing to prevent the garbage collector from destroying and closing the socket. A set of tests is added to exercise host UDS access. This required extracting most of the syscall test runner into a library that can be used by custom tests. Updates #235 Updates #1003 [1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can only reply to a client that binds first. We don't allow bind, so these are unlikely to be used. PiperOrigin-RevId: 275558502
2019-10-15	Make Attach no longer a special snowflake	Michael Pratt
	fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW file or connect a socket rather than creating a standard control file like localFile.Walk. This is unnecessary and error-prone, as the attach point still has to go through Open or Connect which will properly convert the control file to something usable. As such, switch the logic to be equivalent to a simple Walk. Updates #235 PiperOrigin-RevId: 274827872
2019-10-10	Allow rt_sigreturn in runsc gofer	Michael Pratt
	rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer dereference). Before this, nil-pointer dereferences caused a syscall violation instead of a panic. PiperOrigin-RevId: 274028767
2019-09-26	Disallow opening of sockets if --fsgofer-host-uds=false	Fabricio Voznika
	Updates #235 PiperOrigin-RevId: 271475319
2019-09-25	Merge pull request #765 from trailofbits:uds_support	gVisor bot
	PiperOrigin-RevId: 271235134