summaryrefslogtreecommitdiffhomepage
path: root/runsc/container
AgeCommit message (Collapse)Author
2019-10-30Fix container lockingFabricio Voznika
Sandbox root dir was not being saved with the Container state, so it would point to the wrong directory location when attempting to lock the sandbox. This led to race conditions saving and loading container state. Fixing it, led to multiple deadlocks. I've moved the saving and locking logic to a separate struct and moved the lock file inside the RootDir (instead of container root dir), which allows the lock to be taken inside Destroy, and removes the need to lock the sandbox. PiperOrigin-RevId: 277599612
2019-10-24Fix early deletion of rootDirFabricio Voznika
container.startContainers() cannot be called twice in a test (e.g. TestMultiContainerLoadSandbox) because the cleanup function deletes the rootDir, together with information from all other containers that may exist. PiperOrigin-RevId: 276591806
2019-10-20Add runsc OCI annotations to support CRI-O.Tom Lanyon
Obligatory https://xkcd.com/927 Fixes #626
2019-10-16Fix problem with open FD when copy up is triggered in overlayfsFabricio Voznika
Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopenned when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run on tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-08Ignore mount options that are not supported in shared mountsFabricio Voznika
Options that do not change mount behavior inside the Sentry are irrelevant and should not be used when looking for possible incompatibilities between master and slave mounts. PiperOrigin-RevId: 273593486
2019-10-01Prevent CAP_NET_RAW from appearing in execFabricio Voznika
'docker exec' was getting CAP_NET_RAW even when --net-raw=false because it was not filtered out from when copying container's capabilities. PiperOrigin-RevId: 272260451
2019-09-16Bring back to life features lost in recent refactorFabricio Voznika
- Sandbox logs are generated when running tests - Kokoro uploads the sandbox logs - Supports multiple parallel runs - Revive script to install locally built runsc with docker PiperOrigin-RevId: 269337274
2019-09-05Ignore the root container when calculating oom_score_adj for the sandbox.Ian Lewis
This is done because the root container for CRI is the infrastructure (pause) container and always gets a low oom_score_adj. We do this to ensure that only the oom_score_adj of user containers is used to calculated the sandbox oom_score_adj. Implemented in runsc rather than the containerd shim as it's a bit cleaner to implement here (in the shim it would require overwriting the oomScoreAdj and re-writing out the config.json again). This processing is Kubernetes(CRI) specific but we are currently only supporting CRI for multi-container support anyway. PiperOrigin-RevId: 267507706
2019-09-04Resolve flakes with TestMultiContainerDestroyFabricio Voznika
Some processes are reparented to the root container depending on the kill order and the root container would not reap in time. So some zombie processes were still present when the test checked. Fix it by running the second container inside a PID namespace. PiperOrigin-RevId: 267278591
2019-09-03Impose order on test scripts.Adin Scannell
The simple test script has gotten out of control. Shard this script into different pieces and attempt to impose order on overall test structure. This change helps lay some of the foundations for future improvements. * The runsc/test directories are moved into just test/. * The runsc/test/testutil package is split into logical pieces. * The scripts/ directory contains new top-level targets. * Each test is now responsible for building targets it requires. * The install functionality is moved into `runsc` itself for simplicity. * The existing kokoro run_tests.sh file now just calls all (can be split). After this change is merged, I will create multiple distinct workflows for Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today, which should dramatically reduce the time-to-run for the Kokoro tests, and provides a better foundation for further improvements to the infrastructure. PiperOrigin-RevId: 267081397
2019-08-27Mount volumes as super userFabricio Voznika
This used to be the case, but regressed after a recent change. Also made a few fixes around it and clean up the code a bit. Closes #720 PiperOrigin-RevId: 265717496
2019-08-07Set gofer's OOM score adjustmentFabricio Voznika
Updates #512 PiperOrigin-RevId: 262195448
2019-08-06Make loading container in a sandbox more robustFabricio Voznika
PiperOrigin-RevId: 262071646
2019-08-02Stops container if gofer is killedFabricio Voznika
Each gofer now has a goroutine that polls on the FDs used to communicate with the sandbox. The respective gofer is destroyed if any of the FDs is closed. Closes #601 PiperOrigin-RevId: 261383725
2019-08-01Set sandbox oom_score_adjIan Lewis
Set /proc/self/oom_score_adj based on oomScoreAdj specified in the OCI bundle. When new containers are added to the sandbox oom_score_adj for the sandbox and all other gofers are adjusted so that oom_score_adj is equal to the lowest oom_score_adj of all containers in the sandbox. Fixes #512 PiperOrigin-RevId: 261242725
2019-07-24Use different pidns among different containerschris.zn
The different containers in a sandbox used only one pid namespace before. This results in that a container can see the processes in another container in the same sandbox. This patch use different pid namespace for different containers. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23Give each container a distinct MountNamespace.Nicolas Lacasse
This keeps all container filesystem completely separate from eachother (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-08Don't try to execute a file that is not regular.Nicolas Lacasse
PiperOrigin-RevId: 257037608
2019-07-03Avoid importing platforms from many source filesAndrei Vagin
PiperOrigin-RevId: 256494243
2019-06-27Fix various spelling issues in the documentationMichael Pratt
Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779
2019-06-18Kill sandbox process when 'runsc do' exitsFabricio Voznika
PiperOrigin-RevId: 253882115
2019-06-18Add Container/Sandbox args struct for creationFabricio Voznika
There were 3 string arguments that could be easily misplaced and it makes it easier to add new arguments, especially for Container that has dozens of callers. PiperOrigin-RevId: 253872074
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-12Allow 'runsc do' to run without rootFabricio Voznika
'--rootless' flag lets a non-root user execute 'runsc do'. The drawback is that the sandbox and gofer processes will run as root inside a user namespace that is mapped to the caller's user, intead of nobody. And network is defaulted to '--network=host' inside the root network namespace. On the bright side, it's very convenient for testing: runsc --rootless do ls runsc --rootless do curl www.google.com PiperOrigin-RevId: 252840970
2019-06-11Add support to mount pod shared tmpfs mountsFabricio Voznika
Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037
2019-06-03Remove 'clearStatus' option from container.Wait*PID()Fabricio Voznika
clearStatus was added to allow detached execution to wait on the exec'd process and retrieve its exit status. However, it's not currently used. Both docker and gvisor-containerd-shim wait on the "shim" process and retrieve the exit status from there. We could change gvisor-containerd-shim to use waits, but it will end up also consuming a process for the wait, which is similar to having the shim process. Closes #234 PiperOrigin-RevId: 251349490
2019-05-03Fix runsc restore to be compatible with docker start --checkpoint ...Andrei Vagin
Change-Id: I02b30de13f1393df66edf8829fedbf32405d18f8 PiperOrigin-RevId: 246621192
2019-05-02runsc: move test_app in a separate directoryAndrei Vagin
Opensource tools (e. g. https://github.com/fatih/vim-go) can't hanlde more than one golang package in one directory. PiperOrigin-RevId: 246435962 Change-Id: I67487915e3838762424b2d168efc54ae34fb801f
2019-05-02Add [simple] network support to 'runsc do'Fabricio Voznika
Sandbox always runsc with IP 192.168.10.2 and the peer network adds 1 to the address (192.168.10.3). Sandbox IP can be changed using --ip flag. Here a few examples: sudo runsc do curl www.google.com sudo runsc do --ip=10.10.10.2 bash -c "echo 123 | netcat -l -p 8080" PiperOrigin-RevId: 246421277 Change-Id: I7b3dce4af46a57300350dab41cb27e04e4b6e9da
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-23Fix container_test flakes.Kevin Krakauer
Create, Start, and Destroy were racing to create and destroy the metadata directory of containers. This is a re-upload of https://gvisor-review.googlesource.com/c/gvisor/+/16260, but with the correct account. Change-Id: I16b7a9d0971f0df873e7f4145e6ac8f72730a4f1 PiperOrigin-RevId: 244892991
2019-04-09runsc: set UID and GID if gofer is executed in a new user namespaceAndrei Vagin
Otherwise, we will not have capabilities in the user namespace. And this patch adds the noexec option for mounts. https://github.com/google/gvisor/issues/145 PiperOrigin-RevId: 242706519 Change-Id: I1b78b77d6969bd18038c71616e8eb7111b71207c
2019-03-29Set container.CreatedAt in Create().Nicolas Lacasse
PiperOrigin-RevId: 241056805 Change-Id: I13ea8f5dbfb01ca02a3b0ab887b8c3bdf4d556a6
2019-03-18Add support for mount propagationFabricio Voznika
Properly handle propagation options for root and mounts. Now usage of mount options shared, rshared, and noexec cause error to start. shared/ rshared breaks sandbox=>host isolation. slave however can be supported because changes propagate from host to sandbox. Root FS setup moved inside the gofer. Apart from simplifying the code, it keeps all mounts inside the namespace. And they are torn down when the namespace is destroyed (DestroyFS is no longer needed). PiperOrigin-RevId: 239037661 Change-Id: I8b5ee4d50da33c042ea34fa68e56514ebe20e6e0
2019-02-25Fix cgroup when path is relativeFabricio Voznika
This can happen when 'docker run --cgroup-parent=' flag is set. PiperOrigin-RevId: 235645559 Change-Id: Ieea3ae66939abadab621053551bf7d62d412e7ee
2019-01-31gvisor/gofer: Use pivot_root instead of chrootAndrei Vagin
PiperOrigin-RevId: 231864273 Change-Id: I8545b72b615f5c2945df374b801b80be64ec3e13
2019-01-31Remove license commentsMichael Pratt
Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-28runsc: Only uninstall cgroup for sandbox stop.Lantao Liu
PiperOrigin-RevId: 231263114 Change-Id: I57467a34fe94e395fdd3685462c4fe9776d040a3
2019-01-25Make cacheRemoteRevalidating detect changes to file sizeFabricio Voznika
When file size changes outside the sandbox, page cache was not refreshing file size which is required for cacheRemoteRevalidating. In fact, cacheRemoteRevalidating should be skipping the cache completely since it's not really benefiting from it. The cache is cache is already bypassed for unstable attributes (see cachePolicy.cacheUAttrs). And althought the cache is called to map pages, they will always miss the cache and map directly from the host. Created a HostMappable struct that maps directly to the host and use it for files with cacheRemoteRevalidating. Closes #124 PiperOrigin-RevId: 230998440 Change-Id: Ic5f632eabe33b47241e05e98c95e9b2090ae08fc
2019-01-25Fix a nil pointer dereference bug in Container.Destroy()ShiruRen
In Container.Destroy(), we call c.stop() before calling executeHooksBestEffort(), therefore, when we call executeHooksBestEffort(c.Spec.Hooks.Poststop, c.State()) to execute the poststop hook, it results in a nil pointer dereference since it reads c.Sandbox.Pid in c.State() after the sandbox has been destroyed. To fix this bug, we can change container's status to "stopped" before executing the poststop hook. Signed-off-by: ShiruRen <renshiru2000@gmail.com> Change-Id: I4d835e430066fab7e599e188f945291adfc521ef PiperOrigin-RevId: 230975505
2019-01-25Execute statically linked binaryFabricio Voznika
Mounting lib and lib64 are not necessary anymore and simplifies the test. PiperOrigin-RevId: 230971195 Change-Id: Ib91a3ffcec4b322cd3687c337eedbde9641685ed
2019-01-22Don't bind-mount runsc into a sandbox mntnsAndrei Vagin
PiperOrigin-RevId: 230437407 Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b
2019-01-18Scrub runsc error messagesFabricio Voznika
Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-18Start a sandbox process in a new userns only if CAP_SETUID is setAndrei Vagin
In addition, it fixes a race condition in TestMultiContainerGoferStop. There are two scripts copy the same set of files into the same directory and sometime one of this command fails with EXIST. PiperOrigin-RevId: 230011247 Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
2019-01-16Prevent internal tmpfs mount to override files in /tmpFabricio Voznika
Runsc wants to mount /tmp using internal tmpfs implementation for performance. However, it risks hiding files that may exist under /tmp in case it's present in the container. Now, it only mounts over /tmp iff: - /tmp was not explicitly asked to be mounted - /tmp is empty If any of this is not true, then /tmp maps to the container's image /tmp. Note: checkpoint doesn't have sentry FS mounted to check if /tmp is empty. It simply looks for explicit mounts right now. PiperOrigin-RevId: 229607856 Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-15Create working directory if it doesn't yet existFabricio Voznika
PiperOrigin-RevId: 229438125 Change-Id: I58eb0d10178d1adfc709d7b859189d1acbcb2f22
2019-01-11runsc: Collect zombies of sandbox and gofer processesAndrei Vagin
And we need to wait a gofer process before cgroup.Uninstall, because it is running in the sandbox cgroups. PiperOrigin-RevId: 228904020 Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
2019-01-09Restore to original cgroup after sandbox and gofer processes are createdFabricio Voznika
The original code assumed that it was safe to join and not restore cgroup, but Container.Run will not exit after calling start, making cgroup cleanup fail because there were still processes inside the cgroup. PiperOrigin-RevId: 228529199 Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
2018-12-13container.Destroy should clean up container metadata even if other cleanups failNicolas Lacasse
If the sandbox process is dead (because of a panic or some other problem), container.Destroy will never remove the container metadata file, since it will always fail when calling container.stop(). This CL changes container.Destroy() to always perform the three necessary cleanup operations: * Stop the sandbox and gofer processes. * Remove the container fs on the host. * Delete the container metadata directory. Errors from these three operations will be concatenated and returned from Destroy(). PiperOrigin-RevId: 225448164 Change-Id: I99c6311b2e4fe5f6e2ca991424edf1ebeae9df32