path: root/runsc/container
Age | Commit message | Author
2020-05-13 | Enable overlayfs_stale_read by default for runsc. | Jamie Liu
Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:

- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."
- fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.
- Experimentally on Linux 5.2:

```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv) {
  if (argc < 2) {
    errx(1, "syntax: %s [FILE]", argv[0]);
  }
  const int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    err(1, "open(%s)", argv[1]);
  }
  /* Map the first page of the file as a shared, read-only mapping. */
  const size_t page_size = sysconf(_SC_PAGE_SIZE);
  void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
  if (page == MAP_FAILED) {
    err(1, "mmap");
  }
  /* Print the mapped page, then wait for a character on stdin and repeat. */
  for (;;) {
    write(1, page, strnlen(page, page_size));
    if (getc(stdin) == EOF) {
      break;
    }
  }
  return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```

Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.

New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab

PiperOrigin-RevId: 311361267
2020-05-04 | Enable TestRunNonRoot on VFS2 | Fabricio Voznika
Also added back the default test dimension, which was dropped in a previous refactor. PiperOrigin-RevId: 309797327
2020-05-04 | Add TTY support on VFS2 to runsc | Fabricio Voznika
Updates #1623, #1487 PiperOrigin-RevId: 309777922
2020-04-29 | Merge pull request #2487 from moricho:fix/bindmount | gVisor bot
PiperOrigin-RevId: 309082540
2020-04-28 | Merge pull request #2558 from prattmic:forward_signal | gVisor bot
PiperOrigin-RevId: 308829800
2020-04-27 | container: use sighandling package | Michael Pratt
Use the sighandling package for Container.ForwardSignals, for consistency with other signal forwarding. Fixes #2546
2020-04-27 | Update container.go | kevin.xu
Fix a typo in the comments: it should be `start`.
2020-04-26 | refactor and add test for bindmount | moricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-25 | Add container tests passing with VFS2 | Zach Koopmans
Several tests are passing after getting TestAppExitStatus (run /bin/true) changes. Make versions that run via VFS2 so that we know what is and isn't working. In addition, fix bug in VFSFile ReadFull. For the TestExePath test in container_test.go, the case "unmasked" will return 0 bytes read with no EOF err, causing the ReadFull call to spin. PiperOrigin-RevId: 308428126
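The ReadFull failure mode described above (a read that returns 0 bytes and no error, so the caller loops forever) can be illustrated with a minimal Go sketch. This is not the gVisor VFSFile code: `stuckReader`, `errNoProgress`, and `readFull` are hypothetical names used only to show one way a read loop can guard against a reader that makes no progress.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// stuckReader is a hypothetical reader that mimics the failure mode: once its
// data is exhausted it returns (0, nil) instead of io.EOF.
type stuckReader struct {
	data []byte
}

func (r *stuckReader) Read(p []byte) (int, error) {
	n := copy(p, r.data)
	r.data = r.data[n:]
	return n, nil // never reports io.EOF
}

// errNoProgress is returned when a read yields no bytes and no error.
var errNoProgress = errors.New("read returned 0 bytes and no error")

// readFull fills buf from r. It stops when a read returns zero bytes with a
// nil error, so a misbehaving reader cannot make it spin.
func readFull(r io.Reader, buf []byte) (int, error) {
	total := 0
	for total < len(buf) {
		n, err := r.Read(buf[total:])
		total += n
		if err != nil {
			return total, err
		}
		if n == 0 {
			return total, errNoProgress
		}
	}
	return total, nil
}

func main() {
	buf := make([]byte, 8)
	n, err := readFull(&stuckReader{data: []byte("abc")}, buf)
	fmt.Println(n, err)
}
```

Whether a zero-byte read should be treated as an error or as end-of-file is a design choice; the sketch returns a distinct error so the caller can decide.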
2020-04-23 | Simplify Docker test infrastructure. | Adin Scannell
This change adds a layer of abstraction around the internal Docker APIs, and eliminates all direct dependencies on Dockerfiles in the infrastructure. A subsequent change will automate the generation of local images (with efficient caching). Note that this change drops the use of bazel container rules, as that experiment does not seem to be viable. PiperOrigin-RevId: 308095430
2020-04-17 | Add test name to boot and gofer log files | Fabricio Voznika
This makes it easier to find the corresponding logs when a test fails. PiperOrigin-RevId: 307104283
2020-04-17 | Get /bin/true to run on VFS2 | Zach Koopmans
Included:
- loader_test.go RunTest and TestStartSignal VFS2
- container_test.go TestAppExitStatus on VFS2
- experimental flag added to runsc to turn on VFS2

Note: shared mounts are not yet supported. PiperOrigin-RevId: 307070753
2020-04-08 | Fix all printf formatting errors. | Adin Scannell
Updates #2243
2020-04-07 | Update TODO to #238 | Ian Lewis
Move TODO to #238 so that proper synchronization of operations is handled when we create the urpc client. Issue #238 Fixes #512 PiperOrigin-RevId: 305383924
2020-03-12 | Kill sandbox process when parent process terminates | Fabricio Voznika
When the sandbox runs in attached mode, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because the parent-death signal (set with PR_SET_PDEATHSIG) doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247
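For context, this is roughly how a Go parent asks the kernel to signal a child when the parent dies. It is a minimal, Linux-only sketch and not the runsc code: `newChildCommand` and `hasParentDeathSignal` are hypothetical helpers, and the sketch does not show how this commit restores the setting after the uid/gid change.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newChildCommand builds a command whose process is sent SIGKILL when the
// parent thread that started it dies. The kernel clears this setting when the
// child later changes credentials, which is the problem described above.
func newChildCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Pdeathsig: syscall.SIGKILL,
	}
	return cmd
}

// hasParentDeathSignal reports whether cmd is configured with a parent-death
// signal.
func hasParentDeathSignal(cmd *exec.Cmd) bool {
	return cmd.SysProcAttr != nil && cmd.SysProcAttr.Pdeathsig != 0
}

func main() {
	cmd := newChildCommand("sleep", "60")
	// The command is only constructed here, not started.
	fmt.Println(hasParentDeathSignal(cmd))
}
```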
2020-03-05 | tests: Don't print log messages on stdout | Andrei Vagin
A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 299174138
2020-03-04 | tests: Don't print log messages on stdout | Andrei Vagin
A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 298966577
2020-02-27 | Log oom_score_adj value on error | Fabricio Voznika
Updates #1873 PiperOrigin-RevId: 297695241
2020-02-25 | Add log during process wait in tests | Fabricio Voznika
TestMultiContainerKillAll timed out under --race. Without logging, we cannot tell if the process list is still increasing, but slowly, or is stuck. PiperOrigin-RevId: 297158834
2020-02-10 | Add flag package to limit visibility. | Adin Scannell
PiperOrigin-RevId: 294297004
2020-02-06 | Fix TestPauseResume in container test failed with connection refused. | Ting-Yu Wang
Sometimes we get this error under TSAN: """ error getting process data from container: connecting to control server at PID XXXX: connection refused """ The theory is that the top "sleep 20" was too short for TSAN, and the container had already exited, so we get connection refused. This commit changes the test so that the container signals that it is running by touching a file repeatedly for the duration of the test. PiperOrigin-RevId: 293710957
2020-02-05 | Add notes to relevant tests. | Adin Scannell
These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-02-04 | Increase container_test size. | Kevin Krakauer
container_test was flaking because a small percentage of runs timed out. Tested this fix with --runs_per_test=100. PiperOrigin-RevId: 293240102
2020-01-27 | Standardize on tools directory. | Adin Scannell
PiperOrigin-RevId: 291745021
2020-01-09 | New sync package. | Ian Gudger
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.

This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
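The aliasing idea above can be sketched in a few lines of Go. This is an illustration only, written as a single file rather than as a separate package: the alias set and the `counter` and `countTo` helpers are assumptions, not the contents of gVisor's sync package.

```go
package main

import (
	"fmt"
	stdsync "sync"
)

// Type aliases keep callers source-compatible with the standard library
// types. Swapping the implementation later only requires changing the
// right-hand side of these declarations.
type (
	Mutex     = stdsync.Mutex
	WaitGroup = stdsync.WaitGroup
)

// counter is a hypothetical caller that uses the aliased types.
type counter struct {
	mu Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo increments a counter from n goroutines and returns the final value.
func countTo(n int) int {
	var c counter
	var wg WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(countTo(100))
}
```

Because an alias is the same type as its target, values can be passed freely between code that imports the wrapper and code that still imports the standard library.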
2019-12-18 | Increase waitForProcessList timeout | Fabricio Voznika
It can take more than 10 seconds when running under --race. PiperOrigin-RevId: 286296060
2019-12-11 | runsc/debug: add an option to list all processes | Andrei Vagin
runsc debug --ps lists all processes with all threads. This option is added to the debug command but not to the ps command, because it is going to be used for debugging and we want to be able to add any useful information without thinking about backward compatibility. This will help to investigate syzkaller issues. PiperOrigin-RevId: 285013668
2019-12-06 | Implement TTY field in control.Processes(). | Nicolas Lacasse
Threadgroups already know their TTY (if they have one), which now contains the TTY Index, and is returned in the Processes() call. PiperOrigin-RevId: 284263850
2019-12-06 | Make annotations OCI compliant | Fabricio Voznika
Changed annotation to follow the standard defined here: https://github.com/opencontainers/image-spec/blob/master/annotations.md PiperOrigin-RevId: 284254847
2019-10-30 | Fix container locking | Fabricio Voznika
Sandbox root dir was not being saved with the Container state, so it would point to the wrong directory location when attempting to lock the sandbox. This led to race conditions saving and loading container state. Fixing it led to multiple deadlocks. I've moved the saving and locking logic to a separate struct and moved the lock file inside the RootDir (instead of container root dir), which allows the lock to be taken inside Destroy, and removes the need to lock the sandbox. PiperOrigin-RevId: 277599612
2019-10-24 | Fix early deletion of rootDir | Fabricio Voznika
container.startContainers() cannot be called twice in a test (e.g. TestMultiContainerLoadSandbox) because the cleanup function deletes the rootDir, together with information from all other containers that may exist. PiperOrigin-RevId: 276591806
2019-10-20 | Add runsc OCI annotations to support CRI-O. | Tom Lanyon
Obligatory https://xkcd.com/927 Fixes #626
2019-10-16 | Fix problem with open FD when copy up is triggered in overlayfs | Fabricio Voznika
Linux kernels before 4.19 do not update already-open FDs when a file is opened for write (and is copied up to the upper layer). Already-open FDs continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. A flag was added to force read-only files to be reopened when the same file is opened for write. This is only needed when using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE, which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-08 | Ignore mount options that are not supported in shared mounts | Fabricio Voznika
Options that do not change mount behavior inside the Sentry are irrelevant and should not be used when looking for possible incompatibilities between master and slave mounts. PiperOrigin-RevId: 273593486
2019-10-01 | Prevent CAP_NET_RAW from appearing in exec | Fabricio Voznika
'docker exec' was getting CAP_NET_RAW even when --net-raw=false because it was not filtered out when copying the container's capabilities. PiperOrigin-RevId: 272260451
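The kind of filtering described above can be sketched as follows. This is a simplified illustration, not the runsc implementation: it treats capabilities as a plain list of names, and `dropCapability` is a hypothetical helper.

```go
package main

import "fmt"

// dropCapability returns a copy of caps without the named capability. The
// input slice is left unmodified so the container's own capability list is
// not changed by preparing the exec request.
func dropCapability(caps []string, drop string) []string {
	out := make([]string, 0, len(caps))
	for _, c := range caps {
		if c != drop {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	containerCaps := []string{"CAP_CHOWN", "CAP_NET_RAW", "CAP_KILL"}
	netRawEnabled := false // stands in for the --net-raw flag

	execCaps := containerCaps
	if !netRawEnabled {
		execCaps = dropCapability(containerCaps, "CAP_NET_RAW")
	}
	fmt.Println(execCaps)
}
```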
2019-09-16 | Bring back to life features lost in recent refactor | Fabricio Voznika
- Sandbox logs are generated when running tests
- Kokoro uploads the sandbox logs
- Supports multiple parallel runs
- Revive script to install locally built runsc with docker

PiperOrigin-RevId: 269337274
2019-09-05 | Ignore the root container when calculating oom_score_adj for the sandbox. | Ian Lewis
This is done because the root container for CRI is the infrastructure (pause) container and always gets a low oom_score_adj. We do this to ensure that only the oom_score_adj of user containers is used to calculate the sandbox oom_score_adj. Implemented in runsc rather than the containerd shim as it's a bit cleaner to implement here (in the shim it would require overwriting the oomScoreAdj and re-writing out the config.json again). This processing is Kubernetes (CRI) specific but we are currently only supporting CRI for multi-container support anyway. PiperOrigin-RevId: 267507706
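As a rough illustration of the calculation, the sketch below picks the lowest oom_score_adj among non-root containers, following the "lowest of all containers" rule stated in the 2019-08-01 entry further down this log. The `containerInfo` type and both helpers are hypothetical, not the runsc data structures.

```go
package main

import "fmt"

// containerInfo is a hypothetical, pared-down view of a container.
type containerInfo struct {
	ID          string
	IsRoot      bool
	OOMScoreAdj *int // nil when the OCI spec does not set oomScoreAdj
}

// sandboxOOMScoreAdj returns the lowest oom_score_adj among non-root
// containers. The second result is false when no such container sets a value.
func sandboxOOMScoreAdj(containers []containerInfo) (int, bool) {
	lowest, found := 0, false
	for _, c := range containers {
		if c.IsRoot || c.OOMScoreAdj == nil {
			continue // the root (pause) container is ignored
		}
		if !found || *c.OOMScoreAdj < lowest {
			lowest, found = *c.OOMScoreAdj, true
		}
	}
	return lowest, found
}

func intPtr(v int) *int { return &v }

func main() {
	containers := []containerInfo{
		{ID: "pause", IsRoot: true, OOMScoreAdj: intPtr(-998)},
		{ID: "app", OOMScoreAdj: intPtr(500)},
		{ID: "sidecar", OOMScoreAdj: intPtr(100)},
	}
	if adj, ok := sandboxOOMScoreAdj(containers); ok {
		fmt.Println(adj)
	}
}
```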
2019-09-04 | Resolve flakes with TestMultiContainerDestroy | Fabricio Voznika
Some processes are reparented to the root container depending on the kill order and the root container would not reap in time. So some zombie processes were still present when the test checked. Fix it by running the second container inside a PID namespace. PiperOrigin-RevId: 267278591
2019-09-03 | Impose order on test scripts. | Adin Scannell
The simple test script has gotten out of control. Shard this script into different pieces and attempt to impose order on overall test structure. This change helps lay some of the foundations for future improvements.
* The runsc/test directories are moved into just test/.
* The runsc/test/testutil package is split into logical pieces.
* The scripts/ directory contains new top-level targets.
* Each test is now responsible for building targets it requires.
* The install functionality is moved into `runsc` itself for simplicity.
* The existing kokoro run_tests.sh file now just calls all (can be split).

After this change is merged, I will create multiple distinct workflows for Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today, which should dramatically reduce the time-to-run for the Kokoro tests, and provides a better foundation for further improvements to the infrastructure. PiperOrigin-RevId: 267081397
2019-08-27 | Mount volumes as super user | Fabricio Voznika
This used to be the case, but regressed after a recent change. Also made a few fixes around it and cleaned up the code a bit. Closes #720 PiperOrigin-RevId: 265717496
2019-08-07 | Set gofer's OOM score adjustment | Fabricio Voznika
Updates #512 PiperOrigin-RevId: 262195448
2019-08-06 | Make loading container in a sandbox more robust | Fabricio Voznika
PiperOrigin-RevId: 262071646
2019-08-02 | Stops container if gofer is killed | Fabricio Voznika
Each gofer now has a goroutine that polls on the FDs used to communicate with the sandbox. The respective gofer is destroyed if any of the FDs is closed. Closes #601 PiperOrigin-RevId: 261383725
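The watcher pattern described above can be approximated in Go as shown below. This is a simplified sketch under stated assumptions: it uses a blocking read on a pipe to detect that the peer has gone away, whereas the commit describes polling the FDs, and `watchForClose` and `simulate` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// watchForClose starts a goroutine that reads from f until the other end is
// closed, then closes the returned channel. Any data read is discarded, so
// this is only suitable for an FD used purely as a liveness signal.
func watchForClose(f *os.File) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Copy returns once Read reports EOF or an error.
		_, _ = io.Copy(io.Discard, f)
	}()
	return done
}

// simulate creates a pipe, closes the write end to mimic the peer exiting,
// and waits for the watcher to notice.
func simulate() string {
	r, w, err := os.Pipe()
	if err != nil {
		return "pipe error"
	}
	defer r.Close()

	done := watchForClose(r)
	w.Close() // the peer goes away
	<-done
	return "peer closed"
}

func main() {
	// A real caller would tear down the gofer or container at this point.
	fmt.Println(simulate())
}
```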
2019-08-01 | Set sandbox oom_score_adj | Ian Lewis
Set /proc/self/oom_score_adj based on oomScoreAdj specified in the OCI bundle. When new containers are added to the sandbox, oom_score_adj for the sandbox and all other gofers is adjusted so that it is equal to the lowest oom_score_adj of all containers in the sandbox. Fixes #512 PiperOrigin-RevId: 261242725
2019-07-24 | Use different pidns among different containers | chris.zn
Previously, the different containers in a sandbox shared a single PID namespace. As a result, a container could see the processes of another container in the same sandbox. This patch uses a different PID namespace for each container. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23 | Give each container a distinct MountNamespace. | Nicolas Lacasse
This keeps all container filesystems completely separate from each other (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-08 | Don't try to execute a file that is not regular. | Nicolas Lacasse
PiperOrigin-RevId: 257037608
2019-07-03 | Avoid importing platforms from many source files | Andrei Vagin
PiperOrigin-RevId: 256494243
2019-06-27 | Fix various spelling issues in the documentation | Michael Pratt
Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779
2019-06-18 | Kill sandbox process when 'runsc do' exits | Fabricio Voznika
PiperOrigin-RevId: 253882115