gvisor - Container Runtime Sandbox

Age	Commit message	Author
2021-07-22	runsc: Wait child processes without timeouts	Andrei Vagin
	* First, we don't need to poll child processes.
	* Second, the 5-second timeout is too small if a host is overloaded.
	* Third, this can hide bugs in the code when we wait for a process that isn't going to exit.
	PiperOrigin-RevId: 386337586
2021-07-20	Don't kill container when volume is unmounted	Fabricio Voznika
	The gofer session is killed when a gofer backed volume is unmounted. The gofer monitor catches the disconnect and kills the container. This changes the gofer monitor to only care about the rootfs connections, which cannot be unmounted. Fixes #6259 PiperOrigin-RevId: 385929039
2021-07-20	Add go:build directives as required by Go 1.17's gofmt.	Jamie Liu
	PiperOrigin-RevId: 385894869
2021-07-13	Replace whitelist with allowlist	Fabricio Voznika
	PiperOrigin-RevId: 384586164
2021-07-13	Use consistent naming for subcontainers	Fabricio Voznika
	It was confusing to find functions relating to root and non-root containers. Replace "non-root" with "subcontainer" and make naming consistent in Sandbox and controller. PiperOrigin-RevId: 384512518
2021-07-12	Fix stdios ownership	Fabricio Voznika
	Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios.
	1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over them.
	2. stdios owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc.
	Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12	Fix GoLand analyzer errors under runsc/...	Fabricio Voznika
	PiperOrigin-RevId: 384344990
2021-07-08	clarify safemount behavior	Kevin Krakauer
	PiperOrigin-RevId: 383750666
2021-07-08	Replace kernel.ExitStatus with linux.WaitStatus.	Jamie Liu
	PiperOrigin-RevId: 383705129
2021-07-02	runsc: validate mount targets	Kevin Krakauer
	PiperOrigin-RevId: 382845950
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerably more powerful, adding:
	* The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation.
	* The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked.
	* The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above.
	* Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function).
	* Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held.
	Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk: minor refactorings to either include necessary annotations or simplify the code structure (in general removing closures in favor of methods) so that the analyzer can easily track the lock state.
	This change additionally includes two changes to nogo itself:
	* Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged.
	* A minor refactoring to isolate the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk.
	PiperOrigin-RevId: 382613300
2021-07-01	[syserror] Update several syserror errors to linuxerr equivalents.	Zach Koopmans
	Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-29	Add SIOCGIFFLAGS ioctl support to hostinet.	Lucas Manning
	PiperOrigin-RevId: 382194711
2021-06-28	Exit early with error message on checkpoint/pause w/ hostinet.	Ian Lewis
	PiperOrigin-RevId: 381964660
2021-06-25	Merge pull request #6222 from avagin:stop	gVisor bot
	PiperOrigin-RevId: 381561785
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-22	runsc: don't kill sandbox, let it stop properly	Andrei Vagin
	The typical sequence of calls to start a container looks like this:
	    ct, err := container.New(conf, containerArgs)
	    defer ct.Destroy()
	    ct.Start(conf)
	    ws, err := ct.Wait()
	For the root container, ct.Destroy() kills the sandbox process. This doesn't look like the right way to stop it. For example, all ongoing rpc calls are aborted in this case. If everything is going alright, we can just wait and it will exit by itself.
	Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com
	Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-06-17	Move tcpip.Clock impl to Timekeeper	Tamir Duberstein
	...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed. PiperOrigin-RevId: 380039167
2021-06-10	Set RLimits during `runsc exec`	Fabricio Voznika
	PiperOrigin-RevId: 378726430
2021-06-10	Parse mmap protection and flags in strace	Fabricio Voznika
	PiperOrigin-RevId: 378712518
2021-06-10	[op] Move SignalInfo to abi/linux package.	Ayush Ranjan
	Fixes #214 PiperOrigin-RevId: 378680466
2021-06-10	remove the erroneous (5th) filter argument to sendmmsg.	gVisor bot
	PiperOrigin-RevId: 378677167
2021-06-09	Remove --overlayfs-stale-read flag	Fabricio Voznika
	It defaults to true and setting it to false can cause filesystem corruption. PiperOrigin-RevId: 378518663
2021-06-03	Add additional mmap seccomp rule	Fabricio Voznika
	HostFileMapper.RegenerateMappings calls mmap with MAP_SHARED|MAP_FIXED and these were not allowed. Closes #6116 PiperOrigin-RevId: 377428463
2021-06-03	Initialize metrics at init	Tamir Duberstein
	Avoids a race condition at kernel initialization. Updates #6057. PiperOrigin-RevId: 377357723
2021-05-31	Update comments on ambient caps to point to bug	Ian Lewis
	PiperOrigin-RevId: 376747671
2021-05-26	Use the stack RNG everywhere	Tamir Duberstein
	...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666
2021-05-25	Initialize Kernel.Timekeeper before network NS	Tamir Duberstein
	PiperOrigin-RevId: 375843579
2021-05-25	Use specific fmt verbs (avoid %v)	Tamir Duberstein
	Remove useless conversions. Avoid unhandled errors. PiperOrigin-RevId: 375834275
2021-05-20	Suppress log message when there is no error	Fabricio Voznika
	PiperOrigin-RevId: 374981100
2021-05-14	Resolve remaining O_PATH TODOs.	Dean Deng
	O_PATH is now implemented in vfs2. Fixes #2782. PiperOrigin-RevId: 373861410
2021-05-13	Merge pull request #5983 from btw616:fix/issue-5982	gVisor bot
	PiperOrigin-RevId: 373661350
2021-05-13	Fix problem with grouped cgroups	Fabricio Voznika
	cgroup controllers can be grouped together (e.g. cpu,cpuacct) and that was confusing Cgroup.Install() into thinking that a cgroup directory was created by the caller, when it had been created by another controller that is grouped together. PiperOrigin-RevId: 373661336
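To illustrate the grouping described above, here is a minimal sketch that parses cgroup v1 lines in the `/proc/[pid]/cgroup` format (`hierarchy-ID:controller-list:path`) and reports which controllers share one hierarchy. It is not the code in Cgroup.Install(); the helper names `parseCgroupLine`, `groupOf` and `sameHierarchy` and the sample paths are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCgroupLine splits one cgroup v1 line of /proc/[pid]/cgroup
// ("hierarchy-ID:controller-list:path") into its controllers and path.
// Hypothetical helper, not part of runsc.
func parseCgroupLine(line string) ([]string, string, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return nil, "", fmt.Errorf("malformed cgroup line %q", line)
	}
	return strings.Split(parts[1], ","), parts[2], nil
}

// groupOf maps every controller to the full comma-separated group it
// appears in, e.g. "cpu" -> "cpu,cpuacct".
func groupOf(lines []string) (map[string]string, error) {
	groups := make(map[string]string)
	for _, line := range lines {
		ctrls, _, err := parseCgroupLine(line)
		if err != nil {
			return nil, err
		}
		for _, c := range ctrls {
			groups[c] = strings.Join(ctrls, ",")
		}
	}
	return groups, nil
}

// sameHierarchy reports whether controllers a and b are grouped together,
// which means a directory created for one already exists for the other.
func sameHierarchy(groups map[string]string, a, b string) bool {
	ga, okA := groups[a]
	gb, okB := groups[b]
	return okA && okB && ga == gb
}

func main() {
	// Sample input; the paths are made up.
	lines := []string{"4:cpu,cpuacct:/example/abc", "7:memory:/example/abc"}
	groups, err := groupOf(lines)
	if err != nil {
		panic(err)
	}
	fmt.Println(sameHierarchy(groups, "cpu", "cpuacct")) // true
	fmt.Println(sameHierarchy(groups, "cpu", "memory"))  // false
}
```

A caller that tracks "directories I created" per controller could consult such a grouping before concluding that an existing directory came from outside.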
2021-05-13	Fix file descriptor leak in MultiGetAttr	Tiwei Bie
	We need to make sure that all children are closed before returning. But the last child saved in parent isn't closed after we successfully iterate over all the files in "names". This patch fixes this issue. Fixes #5982 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
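The leak pattern described above can be sketched as follows. This is not the gofer's MultiGetAttr code: the `file` type, `walk`, `statAll` and `leaked` are hypothetical stand-ins that only model "open a child, close the previous parent, and make sure the last child is closed too".

```go
package main

import "fmt"

// file is a hypothetical stand-in for an open file handle.
type file struct {
	name   string
	closed bool
}

func (f *file) Close() { f.closed = true }

// opened records every handle created by walk so leaks can be counted.
var opened []*file

// walk pretends to open the child called name under parent.
func walk(parent *file, name string) *file {
	child := &file{name: parent.name + "/" + name}
	opened = append(opened, child)
	return child
}

// statAll walks names one component at a time. Each new child replaces
// parent, and the previous parent is closed. The deferred function closes
// the last child on every return path, including the successful one.
func statAll(root *file, names []string) []string {
	var visited []string
	parent := root
	defer func() {
		// root belongs to the caller, so only close what we opened.
		if parent != root {
			parent.Close()
		}
	}()
	for _, name := range names {
		child := walk(parent, name)
		visited = append(visited, child.name)
		if parent != root {
			parent.Close()
		}
		parent = child
	}
	return visited
}

// leaked counts handles opened by walk that were never closed.
func leaked() int {
	n := 0
	for _, f := range opened {
		if !f.closed {
			n++
		}
	}
	return n
}

func main() {
	root := &file{}
	fmt.Println(statAll(root, []string{"a", "b", "c"}))
	fmt.Println("leaked handles:", leaked())
}
```

Without the deferred close, the handle for the final component would stay open after a fully successful walk, which is the case the commit message calls out.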
2021-05-10	Merge pull request #5764 from zhlhahaha:2126-2	gVisor bot
	PiperOrigin-RevId: 372993341
2021-05-07	Merge pull request #5758 from zhlhahaha:2125	gVisor bot
	PiperOrigin-RevId: 372608247
2021-05-07	Init all vCPU when initializing machine on ARM64	howard zhang
	This patch solves the problem that vCPU timers get messed up when adding vCPUs dynamically on ARM64. For detailed information please refer to: https://github.com/google/gvisor/issues/5739
	There is no influence on x86. Here are the main changes for ARM64:
	1. Create maxVCPUs vCPUs during machine initialization.
	2. We want to sync the gVisor vCPU number with the host CPU number, so use the smaller of runtime.NumCPU and KVM_CAP_MAX_VCPUS as maxVCPUs.
	3. Put unused vCPUs into the architecture-specific map initialvCPUs.
	4. When the machine needs to bind a new vCPU to a tid, rather than creating a new one, it picks a vCPU from the map initialvCPUs.
	5. Change the setSystemTime function. As the vCPU number increases, the time cost of the function setTSC (which uses a syscall to set cntvoff) grows linearly from around 300 ns to 100000 ns, which prevents the function setSystemTimeLegacy from getting the correct offset value.
	6. Initialize StdioFDs and goferFD before the platform to avoid StdioFDs conflicting with vCPU fds.
	Signed-off-by: howard zhang <howard.zhang@arm.com>
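Changes 1 to 4 above amount to a pre-allocated pool. The sketch below models that idea only; it performs no KVM calls, and the `machine` and `vCPU` types, `newMachine` and `bind` are simplified assumptions rather than the platform's actual definitions. The two limits are passed in as plain integers where real code would consult runtime.NumCPU and KVM_CAP_MAX_VCPUS.

```go
package main

import (
	"errors"
	"fmt"
)

// vCPU is a hypothetical placeholder for a virtual CPU.
type vCPU struct{ id int }

// machine keeps every vCPU it will ever use.
type machine struct {
	maxVCPUs     int
	initialvCPUs map[int]*vCPU    // created up front, not yet bound
	bound        map[uint64]*vCPU // tid -> vCPU
}

// newMachine creates min(hostCPUs, kvmMaxVCPUs) vCPUs immediately.
func newMachine(hostCPUs, kvmMaxVCPUs int) *machine {
	n := hostCPUs
	if kvmMaxVCPUs < n {
		n = kvmMaxVCPUs
	}
	m := &machine{
		maxVCPUs:     n,
		initialvCPUs: make(map[int]*vCPU, n),
		bound:        make(map[uint64]*vCPU, n),
	}
	for i := 0; i < n; i++ {
		m.initialvCPUs[i] = &vCPU{id: i}
	}
	return m
}

// bind returns the vCPU already bound to tid, or takes an unused one
// from initialvCPUs instead of creating a new vCPU.
func (m *machine) bind(tid uint64) (*vCPU, error) {
	if c, ok := m.bound[tid]; ok {
		return c, nil
	}
	for id, c := range m.initialvCPUs {
		delete(m.initialvCPUs, id)
		m.bound[tid] = c
		return c, nil
	}
	return nil, errors.New("no unused vCPU available")
}

func main() {
	m := newMachine(8, 4)
	fmt.Println("maxVCPUs:", m.maxVCPUs)
	if _, err := m.bind(100); err != nil {
		panic(err)
	}
	fmt.Println("unused after one bind:", len(m.initialvCPUs))
}
```

Because no vCPU is created after initialization, the per-vCPU setup cost described in change 5 is paid once instead of growing as threads arrive.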
2021-05-05	Fixes to runsc cgroups	Fabricio Voznika
	When loading cgroups for another process, `/proc/self` was used in a few places, causing the end state to be a mix of the process and self. This is now fixed to always use the proper `/proc/[pid]` path. Added net_prio and net_cls to the list of optional controllers. This is to allow runsc to execute when these cgroups are disabled as long as there are no net_prio and net_cls limits that need to be applied. Deflake TestMultiContainerEvent. Closes #5875 Closes #5887 PiperOrigin-RevId: 372242687
2021-05-04	Remove uses of the binary package from the rest of the sentry.	Rahat Mahmood
	PiperOrigin-RevId: 372020696
2021-05-04	Make Mount.Type optional for bind mounts	Fabricio Voznika
	According to the OCI spec Mount.Type is an optional field and it defaults to "bind" when any of "bind" or "rbind" is included in Mount.Options. Also fix the shim to remove bind/rbind from options when mount is converted from bind to tmpfs inside the Sentry. Fixes #2330 Fixes #3274 PiperOrigin-RevId: 371996891
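A minimal sketch of the defaulting rule described above. The `Mount` struct below mirrors only the relevant fields and is a hypothetical stand-in for the OCI runtime-spec type; `effectiveType` and `dropBindOptions` are illustrative names, not runsc or shim functions.

```go
package main

import "fmt"

// Mount is a hypothetical stand-in carrying only the fields used here.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// isBindOption reports whether opt marks the mount as a bind mount.
func isBindOption(opt string) bool {
	return opt == "bind" || opt == "rbind"
}

// effectiveType returns m.Type when it is set. Otherwise it falls back to
// "bind" if the options contain "bind" or "rbind".
func effectiveType(m Mount) string {
	if m.Type != "" {
		return m.Type
	}
	for _, opt := range m.Options {
		if isBindOption(opt) {
			return "bind"
		}
	}
	return ""
}

// dropBindOptions returns opts without bind/rbind, as would be needed
// when a bind mount is converted to tmpfs.
func dropBindOptions(opts []string) []string {
	out := make([]string, 0, len(opts))
	for _, opt := range opts {
		if !isBindOption(opt) {
			out = append(out, opt)
		}
	}
	return out
}

func main() {
	// Sample mount; the paths are made up.
	m := Mount{Destination: "/data", Source: "/host/data", Options: []string{"rbind", "ro"}}
	fmt.Println(effectiveType(m))           // bind
	fmt.Println(dropBindOptions(m.Options)) // [ro]
}
```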
2021-04-28	Automated rollback of changelist 369686285	Fabricio Voznika
	PiperOrigin-RevId: 371015541
2021-04-22	Add weirdness sentry metric.	Nayana Bidari
	Weirdness metric contains fields to track the number of clock fallbacks, partial results and vsyscalls. This metric will avoid the overhead of having three different metrics (fallbackMetric, partialResultMetric, vsyscallCount). PiperOrigin-RevId: 369970218
2021-04-21	Automated rollback of changelist 369325957	Michael Pratt
	PiperOrigin-RevId: 369686285
2021-04-20	Clean test tags.	Adin Scannell
	PiperOrigin-RevId: 369505182
2021-04-19	Move runsc reference leak checking to better locations.	Dean Deng
	In the previous spot, there was a roughly 50% chance that leak checking would actually run. Move it to the waitContainer() call on the root container, where it is guaranteed to run before the sandbox process is terminated. Add it to runsc/cli/main.go as well for good measure, in case the sandbox exit path does not involve waitContainer(). PiperOrigin-RevId: 369329796
2021-04-19	Add MultiGetAttr message to 9P	Fabricio Voznika
	While using remote-validation, the vast majority of time spent during FS operations is re-walking the path to check for modifications and then closing the file given that in most cases it has not been modified externally. This change introduces a new 9P message called MultiGetAttr which bulk-queries attributes of several files in one shot. The returned attributes are then used to update cached dentries before they are walked. File attributes are updated for files that still exist. Dentries that have been deleted are removed from the cache. And negative cache entries are removed if a new file/directory was created externally. Similarly, synthetic dentries are replaced if a file/directory is created externally.
	The bulk update needs to be careful not to follow symlinks or cross mount points, because the gofer doesn't know how to resolve symlinks and where mount points are located. It also doesn't walk to the parent ("..") to avoid deadlocks.
	Here are the results:
	Workload      VFS1      VFS2     Change
	bazel action  115s      70s      28.8s
	Stat/100      11,043us  7,623us  974us
	Updates #1638 PiperOrigin-RevId: 369325957
2021-04-16	Allow runsc to generate coverage reports.	Dean Deng
	Add a coverage-report flag that will cause the sandbox to generate a coverage report (with suffix .cov) in the debug log directory upon exiting. For the report to be generated, runsc must have been built with the following Bazel flags: `--collect_code_coverage --instrumentation_filter=...`.
	With coverage reports, we should be able to aggregate results across all tests to surface code coverage statistics for the project as a whole. The report is simply a text file with each line representing a covered block as `file:start_line.start_col,end_line.end_col`. Note that this is similar to the format of coverage reports generated with `go test -coverprofile`, although we omit the count and number of statements, which are not useful for us.
	Some simple ways of getting coverage reports:
	  bazel test <some_test> --collect_code_coverage \
	    --instrumentation_filter=//pkg/...
	  bazel build //runsc --collect_code_coverage \
	    --instrumentation_filter=//pkg/...
	  runsc -coverage-report=dir/ <other_flags> do ...
	PiperOrigin-RevId: 368952911
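For readers who want to consume such a report, here is a minimal sketch of a parser for one line in the `file:start_line.start_col,end_line.end_col` format stated above. The `block` type, the `parseBlock` function and the sample file name are assumptions for illustration; they are not part of runsc.

```go
package main

import (
	"fmt"
	"strings"
)

// block is a hypothetical representation of one covered block.
type block struct {
	File      string
	StartLine int
	StartCol  int
	EndLine   int
	EndCol    int
}

// parseBlock parses "file:start_line.start_col,end_line.end_col".
// The last ':' separates the file name from the position range.
func parseBlock(line string) (block, error) {
	i := strings.LastIndex(line, ":")
	if i < 0 {
		return block{}, fmt.Errorf("missing ':' in %q", line)
	}
	b := block{File: line[:i]}
	_, err := fmt.Sscanf(line[i+1:], "%d.%d,%d.%d",
		&b.StartLine, &b.StartCol, &b.EndLine, &b.EndCol)
	if err != nil {
		return block{}, fmt.Errorf("bad range in %q: %w", line, err)
	}
	return b, nil
}

func main() {
	// Sample line; the file name is made up.
	b, err := parseBlock("pkg/example/file.go:10.2,12.5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s covers %d.%d to %d.%d\n",
		b.File, b.StartLine, b.StartCol, b.EndLine, b.EndCol)
}
```

Aggregating results across tests, as the commit message suggests, could then be a matter of collecting parsed blocks into a set keyed by all five fields.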
2021-04-16	Internal change	Zach Koopmans
	PiperOrigin-RevId: 368919504
2021-04-08	Clarify platform errors.	Adin Scannell
	PiperOrigin-RevId: 367446222
2021-04-07	Add internal staging tags to //runsc and //shim binaries.	Adin Scannell
	PiperOrigin-RevId: 367328273