path: root/runsc
Age | Commit message | Author
2021-07-22 | runsc: Wait child processes without timeouts | Andrei Vagin
* First, we don't need to poll child processes.
* Second, the 5-second timeout is too short if a host is overloaded.
* Third, a timeout can hide bugs in the code when we wait for a process that isn't going to exit.
PiperOrigin-RevId: 386337586
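The idea behind this commit can be illustrated with a minimal sketch that uses only the Go standard library. This is not runsc's code: `runAndWait` is a hypothetical helper, and the sketch assumes a POSIX `sh` is available on the host for the demo in `main`.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndWait (hypothetical helper) starts a child process and blocks in
// Wait until it exits. There is no polling loop and no deadline: a child
// that never exits makes this call hang, which surfaces the bug instead
// of hiding it behind a timeout.
func runAndWait(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	err := cmd.Wait()
	if err == nil {
		return 0, nil
	}
	// A non-zero exit status is reported as *exec.ExitError.
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

func main() {
	code, err := runAndWait("sh", "-c", "exit 3")
	fmt.Println(code, err)
}
```

A blocking wait costs nothing while the child runs, so an overloaded host only delays the result rather than turning it into a spurious timeout error.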
2021-07-20 | Don't kill container when volume is unmounted | Fabricio Voznika
The gofer session is killed when a gofer backed volume is unmounted. The gofer monitor catches the disconnect and kills the container. This changes the gofer monitor to only care about the rootfs connections, which cannot be unmounted.
Fixes #6259
PiperOrigin-RevId: 385929039
2021-07-20 | Add go:build directives as required by Go 1.17's gofmt. | Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-13 | Replace whitelist with allowlist | Fabricio Voznika
PiperOrigin-RevId: 384586164
2021-07-13 | Use consistent naming for subcontainers | Fabricio Voznika
It was confusing to find functions relating to root and non-root containers. Replace "non-root" with "subcontainer" and make naming consistent in Sandbox and controller.
PiperOrigin-RevId: 384512518
2021-07-12 | Fix stdios ownership | Fabricio Voznika
Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios.
1. stdios in the host are changed so that their owner is the uid/gid of the process running the sandbox. This ensures that the sandbox has full control over them.
2. The stdios' owner inside the sandbox is changed to match the container's user, to give access inside the container and make it behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
2021-07-12 | Fix GoLand analyzer errors under runsc/... | Fabricio Voznika
PiperOrigin-RevId: 384344990
2021-07-08 | clarify safemount behavior | Kevin Krakauer
PiperOrigin-RevId: 383750666
2021-07-08 | Replace kernel.ExitStatus with linux.WaitStatus. | Jamie Liu
PiperOrigin-RevId: 383705129
2021-07-02 | runsc: validate mount targets | Kevin Krakauer
PiperOrigin-RevId: 382845950
2021-07-01 | Mix checklocks and atomic analyzers. | Adin Scannell
This change makes the checklocks analyzer considerably more powerful, adding:
* The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation.
* The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked.
* The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above.
* Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function).
* Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held.
Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk: minor refactorings to either include necessary annotations or simplify the code structure (in general removing closures in favor of methods) so that the analyzer can easily track the lock state.
This change additionally includes two changes to nogo itself:
* Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed once https://go-review.googlesource.com/c/tools/+/331789 is merged.
* A minor refactoring to isolate the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk.
PiperOrigin-RevId: 382613300
2021-07-01 | [syserror] Update several syserror errors to linuxerr equivalents. | Zach Koopmans
Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go.
PiperOrigin-RevId: 382574582
2021-06-29 | Add SIOCGIFFLAGS ioctl support to hostinet. | Lucas Manning
PiperOrigin-RevId: 382194711
2021-06-28 | Exit early with error message on checkpoint/pause w/ hostinet. | Ian Lewis
PiperOrigin-RevId: 381964660
2021-06-25 | Merge pull request #6222 from avagin:stop | gVisor bot
PiperOrigin-RevId: 381561785
2021-06-22 | [syserror] Add conversions to linuxerr with temporary Equals method. | Zach Koopmans
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr.
PiperOrigin-RevId: 380909667
2021-06-22 | runsc: don't kill sandbox, let it stop properly | Andrei Vagin
The typical sequence of calls to start a container looks like this:
    ct, err := container.New(conf, containerArgs)
    defer ct.Destroy()
    ct.Start(conf)
    ws, err := ct.Wait()
For the root container, ct.Destroy() kills the sandbox process. This doesn't look like the right way to stop it. For example, all ongoing RPC calls are aborted in this case. If everything is going all right, we can just wait and it will exit by itself.
Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-06-17 | Move tcpip.Clock impl to Timekeeper | Tamir Duberstein
...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed.
PiperOrigin-RevId: 380039167
2021-06-10 | Set RLimits during `runsc exec` | Fabricio Voznika
PiperOrigin-RevId: 378726430
2021-06-10 | Parse mmap protection and flags in strace | Fabricio Voznika
PiperOrigin-RevId: 378712518
2021-06-10 | [op] Move SignalInfo to abi/linux package. | Ayush Ranjan
Fixes #214
PiperOrigin-RevId: 378680466
2021-06-10 | remove the erroneous (5th) filter argument to sendmmsg. | gVisor bot
PiperOrigin-RevId: 378677167
2021-06-09 | Remove --overlayfs-stale-read flag | Fabricio Voznika
It defaults to true, and setting it to false can cause filesystem corruption.
PiperOrigin-RevId: 378518663
2021-06-03 | Add additional mmap seccomp rule | Fabricio Voznika
HostFileMapper.RegenerateMappings calls mmap with MAP_SHARED|MAP_FIXED and these were not allowed.
Closes #6116
PiperOrigin-RevId: 377428463
2021-06-03 | Initialize metrics at init | Tamir Duberstein
Avoids a race condition at kernel initialization.
Updates #6057.
PiperOrigin-RevId: 377357723
2021-05-31 | Update comments on ambient caps to point to bug | Ian Lewis
PiperOrigin-RevId: 376747671
2021-05-26 | Use the stack RNG everywhere | Tamir Duberstein
...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG.
PiperOrigin-RevId: 376070666
2021-05-25 | Initialize Kernel.Timekeeper before network NS | Tamir Duberstein
PiperOrigin-RevId: 375843579
2021-05-25 | Use specific fmt verbs (avoid %v) | Tamir Duberstein
Remove useless conversions. Avoid unhandled errors.
PiperOrigin-RevId: 375834275
2021-05-20 | Suppress log message when there is no error | Fabricio Voznika
PiperOrigin-RevId: 374981100
2021-05-14 | Resolve remaining O_PATH TODOs. | Dean Deng
O_PATH is now implemented in vfs2.
Fixes #2782.
PiperOrigin-RevId: 373861410
2021-05-13 | Merge pull request #5983 from btw616:fix/issue-5982 | gVisor bot
PiperOrigin-RevId: 373661350
2021-05-13 | Fix problem with grouped cgroups | Fabricio Voznika
cgroup controllers can be grouped together (e.g. cpu,cpuacct) and that was confusing Cgroup.Install() into thinking that a cgroup directory was created by the caller, when it had been created by another controller that is grouped together.
PiperOrigin-RevId: 373661336
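To make the grouping concrete, here is a minimal sketch that parses cgroup v1 lines in the `hierarchy-ID:controller-list:path` format found in `/proc/[pid]/cgroup`. It is not runsc's Cgroup.Install() logic; `cgroupEntry`, `parseCgroups` and `grouped` are hypothetical names used only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupEntry (hypothetical type) records where one controller lives.
type cgroupEntry struct {
	Hierarchy string // hierarchy ID; grouped controllers share it
	Path      string // cgroup path within that hierarchy
}

// parseCgroups reads lines of the form "hierarchy-ID:controller-list:path"
// and returns one entry per controller. A comma-separated controller list
// such as "cpu,cpuacct" yields several controllers with the same entry.
func parseCgroups(data string) (map[string]cgroupEntry, error) {
	entries := make(map[string]cgroupEntry)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line: %q", line)
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			if ctrl != "" {
				entries[ctrl] = cgroupEntry{Hierarchy: parts[0], Path: parts[2]}
			}
		}
	}
	return entries, sc.Err()
}

// grouped reports whether two controllers are mounted on the same hierarchy,
// in which case a directory created for one already exists for the other.
func grouped(entries map[string]cgroupEntry, a, b string) bool {
	ea, okA := entries[a]
	eb, okB := entries[b]
	return okA && okB && ea.Hierarchy == eb.Hierarchy
}

func main() {
	entries, err := parseCgroups("4:cpu,cpuacct:/foo\n3:memory:/foo\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(grouped(entries, "cpu", "cpuacct"), grouped(entries, "cpu", "memory"))
}
```

Comparing hierarchy IDs rather than paths matters here: two unrelated controllers can use the same path in different hierarchies without sharing a directory.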
2021-05-13 | Fix file descriptor leak in MultiGetAttr | Tiwei Bie
We need to make sure that all children are closed before returning. But the last child saved in parent isn't closed after we successfully iterate over all the files in "names". This patch fixes this issue.
Fixes #5982
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
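The shape of this bug can be sketched as follows. This is an illustration of the pattern only, not the gofer's MultiGetAttr code: `handle`, `openFunc` and `walkAll` are hypothetical names, and a real implementation would hold file descriptors rather than a boolean.

```go
package main

import "fmt"

// handle (hypothetical) stands in for an open file that must be closed.
type handle struct {
	name   string
	closed bool
}

func (h *handle) Close() { h.closed = true }

// openFunc opens name relative to parent (nil for the starting point).
type openFunc func(parent *handle, name string) (*handle, error)

// walkAll opens each name in turn, keeping only the most recent child in
// parent. The deferred Close covers the handle still held in parent on
// every return path, including the successful one after the loop ends,
// which is the case the leak came from.
func walkAll(names []string, open openFunc) error {
	var parent *handle
	defer func() {
		if parent != nil {
			parent.Close()
		}
	}()
	for _, name := range names {
		child, err := open(parent, name)
		if err != nil {
			return err
		}
		if parent != nil {
			parent.Close() // done with the previous level
		}
		parent = child
	}
	return nil
}

func main() {
	var opened []*handle
	open := func(_ *handle, name string) (*handle, error) {
		h := &handle{name: name}
		opened = append(opened, h)
		return h, nil
	}
	if err := walkAll([]string{"a", "b", "c"}, open); err != nil {
		fmt.Println(err)
		return
	}
	for _, h := range opened {
		fmt.Println(h.name, h.closed)
	}
}
```

Putting the final close in a single deferred function means the success path and the error path cannot drift apart.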
2021-05-10 | Merge pull request #5764 from zhlhahaha:2126-2 | gVisor bot
PiperOrigin-RevId: 372993341
2021-05-07 | Merge pull request #5758 from zhlhahaha:2125 | gVisor bot
PiperOrigin-RevId: 372608247
2021-05-07 | Init all vCPU when initializing machine on ARM64 | howard zhang
This patch solves the problem that vCPU timers get messed up when adding vCPUs dynamically on ARM64. For detailed information please refer to: https://github.com/google/gvisor/issues/5739
There is no influence on x86. Here are the main changes for ARM64:
1. Create maxVCPUs vCPUs during machine initialization.
2. We want to sync the gVisor vCPU count with the host CPU count, so use the smaller of runtime.NumCPU and KVM_CAP_MAX_VCPUS as maxVCPUs.
3. Put unused vCPUs into the architecture-specific map initialvCPUs.
4. When the machine needs to bind a new vCPU to a tid, rather than creating a new one, it picks a vCPU from the map initialvCPUs.
5. Change the setSystemTime function. As the vCPU count increases, the time cost of setTSC (which uses a syscall to set cntvoff) grows linearly from around 300 ns to 100000 ns, and this prevents setSystemTimeLegacy from getting the correct offset value.
6. Initialize StdioFDs and goferFD before the platform to avoid StdioFDs conflicting with vCPU fds.
Signed-off-by: howard zhang <howard.zhang@arm.com>
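Steps 1 to 4 above can be sketched roughly as a pre-allocated pool. This is a simplified model, not the KVM platform code: the `machine` and `vCPU` types, the `bind` method and the field names are assumptions for illustration, and the CPU counts are passed in as plain integers instead of being read from runtime.NumCPU and KVM_CAP_MAX_VCPUS.

```go
package main

import "fmt"

// vCPU and machine are hypothetical stand-ins for the platform types.
type vCPU struct {
	id  int
	tid uint64
}

type machine struct {
	initialVCPUs map[int]*vCPU    // created up front, not yet bound
	bound        map[uint64]*vCPU // tid -> vCPU
}

// maxVCPUs returns the smaller of the host CPU count and the KVM limit.
func maxVCPUs(hostCPUs, kvmMax int) int {
	if hostCPUs < kvmMax {
		return hostCPUs
	}
	return kvmMax
}

// newMachine creates all n vCPUs at initialization time.
func newMachine(n int) *machine {
	m := &machine{
		initialVCPUs: make(map[int]*vCPU, n),
		bound:        make(map[uint64]*vCPU),
	}
	for i := 0; i < n; i++ {
		m.initialVCPUs[i] = &vCPU{id: i}
	}
	return m
}

// bind hands a pre-created vCPU to tid instead of creating a new one.
// It returns false when the pool is exhausted.
func (m *machine) bind(tid uint64) (*vCPU, bool) {
	if c, ok := m.bound[tid]; ok {
		return c, true
	}
	for id, c := range m.initialVCPUs {
		delete(m.initialVCPUs, id)
		c.tid = tid
		m.bound[tid] = c
		return c, true
	}
	return nil, false
}

func main() {
	m := newMachine(maxVCPUs(8, 2))
	_, ok1 := m.bind(100)
	_, ok2 := m.bind(200)
	_, ok3 := m.bind(300)
	fmt.Println(ok1, ok2, ok3)
}
```

Because every vCPU exists before any thread binds to one, the per-vCPU time offsets can be set once at startup rather than while other vCPUs are already running.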
2021-05-05 | Fixes to runsc cgroups | Fabricio Voznika
When loading cgroups for another process, `/proc/self` was used in a few places, causing the end state to be a mix of the process and self. This is now fixed to always use the proper `/proc/[pid]` path.
Added net_prio and net_cls to the list of optional controllers. This is to allow runsc to execute when these cgroups are disabled, as long as there are no net_prio and net_cls limits that need to be applied.
Deflake TestMultiContainerEvent.
Closes #5875
Closes #5887
PiperOrigin-RevId: 372242687
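Both rules in this commit are small enough to sketch. The code below is an assumption-laden illustration rather than runsc's implementation: `cgroupFile` and `checkController` are hypothetical helpers, and the `optional` set lists only the two controllers named in the message.

```go
package main

import "fmt"

// cgroupFile builds the per-process path, so that the target process is
// inspected rather than the caller (/proc/self).
func cgroupFile(pid int) string {
	return fmt.Sprintf("/proc/%d/cgroup", pid)
}

// optional holds controllers that may be missing on the host. Only the two
// controllers named in the commit message are listed here.
var optional = map[string]bool{"net_prio": true, "net_cls": true}

// checkController decides whether a missing controller is fatal.
// available: the controller is enabled on the host.
// hasLimits: the container spec asks for limits on this controller.
func checkController(name string, available, hasLimits bool) error {
	switch {
	case available:
		return nil
	case optional[name] && !hasLimits:
		// Disabled on the host, but nothing needs to be applied.
		return nil
	default:
		return fmt.Errorf("cgroup controller %q is required but not available", name)
	}
}

func main() {
	fmt.Println(cgroupFile(42))
	fmt.Println(checkController("net_cls", false, false))
	fmt.Println(checkController("net_cls", false, true))
}
```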
2021-05-04 | Remove uses of the binary package from the rest of the sentry. | Rahat Mahmood
PiperOrigin-RevId: 372020696
2021-05-04 | Make Mount.Type optional for bind mounts | Fabricio Voznika
According to the OCI spec Mount.Type is an optional field and it defaults to "bind" when any of "bind" or "rbind" is included in Mount.Options. Also fix the shim to remove bind/rbind from options when mount is converted from bind to tmpfs inside the Sentry.
Fixes #2330
Fixes #3274
PiperOrigin-RevId: 371996891
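The defaulting rule described here could look roughly like this. The `Mount` struct is a hypothetical stand-in carrying only the fields needed for the example, and `effectiveType` is an illustrative helper rather than a function from runsc or the OCI libraries. The sketch assumes an explicitly set Type always wins.

```go
package main

import "fmt"

// Mount is a hypothetical stand-in for an OCI mount entry.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// effectiveType returns the mount type to use. An explicit Type is kept
// as-is; an empty Type becomes "bind" when the options contain "bind" or
// "rbind". Otherwise the empty string is returned unchanged.
func effectiveType(m Mount) string {
	if m.Type != "" {
		return m.Type
	}
	for _, opt := range m.Options {
		if opt == "bind" || opt == "rbind" {
			return "bind"
		}
	}
	return ""
}

func main() {
	m := Mount{Destination: "/data", Source: "/host/data", Options: []string{"ro", "rbind"}}
	fmt.Println(effectiveType(m))
}
```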
2021-04-28 | Automated rollback of changelist 369686285 | Fabricio Voznika
PiperOrigin-RevId: 371015541
2021-04-22 | Add weirdness sentry metric. | Nayana Bidari
The weirdness metric contains fields to track the number of clock fallbacks, partial results, and vsyscalls. This metric avoids the overhead of having three different metrics (fallbackMetric, partialResultMetric, vsyscallCount).
PiperOrigin-RevId: 369970218
2021-04-21 | Automated rollback of changelist 369325957 | Michael Pratt
PiperOrigin-RevId: 369686285
2021-04-20 | Clean test tags. | Adin Scannell
PiperOrigin-RevId: 369505182
2021-04-19 | Move runsc reference leak checking to better locations. | Dean Deng
In the previous spot, there was a roughly 50% chance that leak checking would actually run. Move it to the waitContainer() call on the root container, where it is guaranteed to run before the sandbox process is terminated. Add it to runsc/cli/main.go as well for good measure, in case the sandbox exit path does not involve waitContainer().
PiperOrigin-RevId: 369329796
2021-04-19 | Add MultiGetAttr message to 9P | Fabricio Voznika
While using remote-validation, the vast majority of time spent during FS operations is re-walking the path to check for modifications and then closing the file given that in most cases it has not been modified externally. This change introduces a new 9P message called MultiGetAttr which queries the attributes of several files in bulk, in one shot. The returned attributes are then used to update cached dentries before they are walked. File attributes are updated for files that still exist. Dentries that have been deleted are removed from the cache. And negative cache entries are removed if a new file/directory was created externally. Similarly, synthetic dentries are replaced if a file/directory is created externally.
The bulk update needs to be careful not to follow symlinks or cross mount points, because the gofer doesn't know how to resolve symlinks and where mount points are located. It also doesn't walk to the parent ("..") to avoid deadlocks.
Here are the results:
Workload      VFS1      VFS2     Change
bazel action  115s      70s      28.8s
Stat/100      11,043us  7,623us  974us
Updates #1638
PiperOrigin-RevId: 369325957
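The cache-update rules listed in this message can be modelled in a few lines. This is a deliberately flattened sketch, not the gofer client: it treats the cache as a single map of names instead of a path of dentries, it ignores synthetic dentries, and `attr`, `dentry` and `refresh` are hypothetical names.

```go
package main

import "fmt"

// attr and dentry are hypothetical, heavily reduced stand-ins.
type attr struct{ Size int64 }

type dentry struct {
	negative bool // cached "does not exist" entry
	attr     attr
}

// refresh applies one bulk result to the cache. stats holds an entry for
// every queried name that currently exists on the remote side.
func refresh(cache map[string]*dentry, stats map[string]attr) {
	for name, d := range cache {
		a, exists := stats[name]
		switch {
		case exists && d.negative:
			// Created externally: drop the negative entry so the next
			// lookup walks to the real file.
			delete(cache, name)
		case exists:
			// Still there: update the cached attributes.
			d.attr = a
		case !d.negative:
			// Deleted externally: remove the stale dentry.
			delete(cache, name)
		}
		// A negative entry for a file that still does not exist stays valid.
	}
}

func main() {
	cache := map[string]*dentry{
		"kept": {attr: attr{Size: 1}},
		"gone": {attr: attr{Size: 2}},
		"new":  {negative: true},
	}
	refresh(cache, map[string]attr{"kept": {Size: 10}, "new": {Size: 5}})
	fmt.Println(len(cache), cache["kept"].attr.Size)
}
```

One request refreshes every cached entry, which is where the saving over re-walking and closing each file individually comes from.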
2021-04-16 | Allow runsc to generate coverage reports. | Dean Deng
Add a coverage-report flag that will cause the sandbox to generate a coverage report (with suffix .cov) in the debug log directory upon exiting. For the report to be generated, runsc must have been built with the following Bazel flags: `--collect_code_coverage --instrumentation_filter=...`.
With coverage reports, we should be able to aggregate results across all tests to surface code coverage statistics for the project as a whole.
The report is simply a text file with each line representing a covered block as `file:start_line.start_col,end_line.end_col`. Note that this is similar to the format of coverage reports generated with `go test -coverprofile`, although we omit the count and number of statements, which are not useful for us.
Some simple ways of getting coverage reports:
    bazel test <some_test> --collect_code_coverage \
        --instrumentation_filter=//pkg/...
    bazel build //runsc --collect_code_coverage \
        --instrumentation_filter=//pkg/...
    runsc -coverage-report=dir/ <other_flags> do ...
PiperOrigin-RevId: 368952911
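A report line in the `file:start_line.start_col,end_line.end_col` format could be parsed along these lines. The `block` type and the `parseBlock` and `parsePos` helpers are hypothetical and not part of runsc; the example file name is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// block (hypothetical type) is one covered block from a report line.
type block struct {
	File                                 string
	StartLine, StartCol, EndLine, EndCol int
}

// parsePos parses a "line.col" pair.
func parsePos(s string) (line, col int, err error) {
	parts := strings.Split(s, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed position %q", s)
	}
	if line, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if col, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return line, col, nil
}

// parseBlock parses "file:start_line.start_col,end_line.end_col".
// The split is made at the last ':' so that the file part is kept whole.
func parseBlock(s string) (block, error) {
	i := strings.LastIndex(s, ":")
	if i < 0 {
		return block{}, fmt.Errorf("missing ':' in %q", s)
	}
	span := strings.Split(s[i+1:], ",")
	if len(span) != 2 {
		return block{}, fmt.Errorf("malformed span in %q", s)
	}
	b := block{File: s[:i]}
	var err error
	if b.StartLine, b.StartCol, err = parsePos(span[0]); err != nil {
		return block{}, err
	}
	if b.EndLine, b.EndCol, err = parsePos(span[1]); err != nil {
		return block{}, err
	}
	return b, nil
}

func main() {
	b, err := parseBlock("pkg/foo/bar.go:10.2,12.5")
	fmt.Println(b, err)
}
```

Parsed blocks from many `.cov` files can then be merged into a set keyed by the `block` value, since a block is either covered or not and no counts are recorded.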
2021-04-16 | Internal change | Zach Koopmans
PiperOrigin-RevId: 368919504
2021-04-08 | Clarify platform errors. | Adin Scannell
PiperOrigin-RevId: 367446222
2021-04-07 | Add internal staging tags to //runsc and //shim binaries. | Adin Scannell
PiperOrigin-RevId: 367328273