gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2018-11-05	Fix race between start and destroy	Fabricio Voznika
	Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-01	Make error messages a bit more user friendly.	Ian Lewis
	Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-10-23	Track paths and provide a rename hook.	Adin Scannell
	This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-21	Updated cleanup code to be more explicit about ignoring errors.	Ian Lewis
	Errors are shown as being ignored by assigning to the blank identifier. PiperOrigin-RevId: 218103819 Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2
2018-10-19	Use correct company name in copyright header	Ian Gudger
	PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-18	Resolve mount paths while setting up root fs mount	Fabricio Voznika
	It's hard to resolve symlinks inside the sandbox because rootfs and mounts may be read-only, forcing us to create mount points inside lower layer of an overlay, before the volumes are mounted. Since the destination must already be resolved outside the sandbox when creating mounts, take this opportunity to rewrite the spec with paths resolved. "runsc boot" will use the "resolved" spec to load mounts. In addition, symlink traversals were disabled while mounting containers inside the sandbox. It haven't been able to write a good test for it. So I'm relying on manual tests for now. PiperOrigin-RevId: 217749904 Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-17	runsc: Support job control signals for the root container.	Nicolas Lacasse
	Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-16	Bump sandbox start and stop timeouts.	Nicolas Lacasse
	PiperOrigin-RevId: 217433699 Change-Id: Icef08285728c23ee7dd650706aaf18da51c25dff
2018-10-15	Never send boot process stdio to application stdio.	Nicolas Lacasse
	We treat handle the boot process stdio separately from the application stdio (which gets passed via flags), but we were still sending both to same place. As a result, some logs that are written directly to os.Stderr by the boot process were ending up in the application logs. This CL starts sendind boot process stdio to the null device (since we don't have any better options). The boot process is already configured to send all logs (and panics) to the log file, so we won't miss anything important. PiperOrigin-RevId: 217173020 Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
2018-10-11	Make the gofer process enter namespaces	Fabricio Voznika
	This is done to further isolate the gofer from the host. PiperOrigin-RevId: 216790991 Change-Id: Ia265b77e4e50f815d08f743a05669f9d75ad7a6f
2018-10-11	Make Wait() return the sandbox exit status if the sandbox has exited.	Nicolas Lacasse
	It's possible for Start() and Wait() calls to race, if the sandboxed application is short-lived. If the application finishes before (or during) the Wait RPC, then Wait will fail. In practice this looks like "connection refused" or "EOF" errors when waiting for an RPC response. This race is especially bad in tests, where we often run "true" inside a sandbox. This CL does a best-effort fix, by returning the sandbox exit status as the container exit status. In most cases, these are the same. This fixes the remaining flakes in runsc/container:container_test. PiperOrigin-RevId: 216777793 Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
2018-10-11	Make debug log file name configurable	Fabricio Voznika
	This is a breaking change if you're using --debug-log-dir. The fix is to replace it with --debug-log and add a '/' at the end: --debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/ PiperOrigin-RevId: 216761212 Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
2018-10-11	Add bare bones unsupported syscall logging	Fabricio Voznika
	This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-10	runsc: Pass controlling TTY by FD in the new process, not current process.	Nicolas Lacasse
	When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY in the new process, not the current process. The ioctl call is made after duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD. This fixes the "bad address" flakes in runsc/container:container_test, although some other flakes remain. PiperOrigin-RevId: 216594394 Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
2018-10-10	Add sandbox to cgroup	Fabricio Voznika
	Sandbox creation uses the limits and reservations configured in the OCI spec and set cgroup options accordinly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, sandbox and gofer processes will join the cgroup but it will not modify the cgroup with spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-03	Fix sandbox chroot	Fabricio Voznika
	Sandbox was setting chroot, but was not chaging the working dir. Added test to ensure this doesn't happen in the future. PiperOrigin-RevId: 215676270 Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
2018-10-03	runsc: Pass root container's stdio via FD.	Nicolas Lacasse
	We were previously using the sandbox process's stdio as the root container's stdio. This makes it difficult/impossible to distinguish output application output from sandbox output, such as panics, which are always written to stderr. Also close the console socket when we are done with it. PiperOrigin-RevId: 215585180 Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-01	runsc: Support job control signals in "exec -it".	Nicolas Lacasse
	Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01	Make multi-container the default mode for runsc	Fabricio Voznika
	And remove multicontainer option. PiperOrigin-RevId: 215236981 Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-09-28	Make runsc kill and delete more conformant to the "spec"	Fabricio Voznika
	PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28	Switch to root in userns when CAP_SYS_CHROOT is also missing	Fabricio Voznika
	Some tests check current capabilities and re-run the tests as root inside userns if required capabibilities are missing. It was checking for CAP_SYS_ADMIN only, CAP_SYS_CHROOT is also required now. PiperOrigin-RevId: 214949226 Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
2018-09-27	Implement 'runsc kill --all'	Fabricio Voznika
	In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27	Refactor 'runsc boot' to take container ID as argument	Fabricio Voznika
	This makes the flow slightly simpler (no need to call Loader.SetRootContainer). And this is required change to tag tasks with container ID inside the Sentry. PiperOrigin-RevId: 214795210 Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-20	Set Sandbox.Chroot so it gets cleaned up upon destruction	Fabricio Voznika
	I've made several attempts to create a test, but the lack of permission from the test user makes it nearly impossible to test anything useful. PiperOrigin-RevId: 213922174 Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-19	runsc: Fix stdin/stdout/stderr in multi-container mode.	Kevin Krakauer
	The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19	Add container.Destroy urpc method.	Nicolas Lacasse
	This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-18	Handle children processes better in tests	Fabricio Voznika
	Reap children more systematically in container tests. Previously, container_test was taking ~5 mins to run because constainer.Destroy() would timeout waiting for the sandbox process to exit. Now the test running in less than a minute. Also made the contract around Container and Sandbox destroy clearer. PiperOrigin-RevId: 213527471 Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18	Automated rollback of changelist 213307171	Kevin Krakauer
	PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-17	runsc: Enable waiting on exited processes.	Kevin Krakauer
	This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17	runsc: Fix stdin/out/err in multi-container mode.	Kevin Krakauer
	Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-13	runsc: Support container signal/wait.	Lantao Liu
	This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-12	runsc: Add exec flag that specifies where to save the sandbox-internal pid.	Kevin Krakauer
	This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-11	platform: Pass device fd into platform constructor.	Nicolas Lacasse
	We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-10	runsc: Chmod all mounted files to 777 inside chroot.	Nicolas Lacasse
	Inside the chroot, we run as user nobody, so all mounted files and directories must be accessible to all users. PiperOrigin-RevId: 212284805 Change-Id: I705e0dbbf15e01e04e0c7f378a99daffe6866807
2018-09-07	runsc: Support multi-container exec.	Nicolas Lacasse
	We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07	Remove '--file-access=direct' option	Fabricio Voznika
	It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07	runsc: Run sandbox process inside minimal chroot.	Nicolas Lacasse
	We construct a dir with the executable bind-mounted at /exe, and proc mounted at /proc. Runsc now executes the sandbox process inside this chroot, thus limiting access to the host filesystem. The mounts and chroot dir are removed when the sandbox is destroyed. Because this requires bind-mounts, we can only do the chroot if we have CAP_SYS_ADMIN. PiperOrigin-RevId: 211994001 Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-06	Enable network for multi-container	Fabricio Voznika
	PiperOrigin-RevId: 211834411 Change-Id: I52311a6c5407f984e5069359d9444027084e4d2a
2018-09-04	runsc: Run sandbox as user nobody.	Nicolas Lacasse
	When starting a sandbox without direct file or network access, we create an empty user namespace and run the sandbox in there. However, the root user in that namespace is still mapped to the root user in the parent namespace. This CL maps the "nobody" user from the parent namespace into the child namespace, and runs the sandbox process as user "nobody" inside the new namespace. PiperOrigin-RevId: 211572223 Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6
2018-09-04	runsc: Pass log and config files to sandbox process by FD.	Nicolas Lacasse
	This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. The specutils.ReadSpecFromFile method was fixed to always seek to the beginning of the file before reading. This allows Files from the same FD to be read multiple times, as we do in the boot command when the apply-caps flag is set. Tested with --network=host. PiperOrigin-RevId: 211570647 Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-08-31	Automated rollback of changelist 210995199	Fabricio Voznika
	PiperOrigin-RevId: 211116429 Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-30	runsc: Pass log and config files to sandbox process by FD.	Nicolas Lacasse
	This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. PiperOrigin-RevId: 210995199 Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-27	Put fsgofer inside chroot	Fabricio Voznika
	Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-24	runsc: Terminal support for "docker exec -ti".	Nicolas Lacasse
	This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24	Add option to panic gofer if writes are attempted over RO mounts	Fabricio Voznika
	This is used when '--overlay=true' to guarantee writes are not sent to gofer. PiperOrigin-RevId: 210116288 Change-Id: I7616008c4c0e8d3668e07a205207f46e2144bf30
2018-08-21	Initial change for multi-gofer support	Fabricio Voznika
	PiperOrigin-RevId: 209647293 Change-Id: I980fca1257ea3fcce796388a049c353b0303a8a5
2018-08-20	Standardize mounts in tests	Fabricio Voznika
	Tests get a readonly rootfs mapped to / (which was the case before) and writable TEST_TMPDIR. This makes it easier to setup containers to write to files and to share state between test and containers. PiperOrigin-RevId: 209453224 Change-Id: I4d988e45dc0909a0450a3bb882fe280cf9c24334
2018-08-15	runsc fsgofer: Support dynamic serving of filesystems.	Kevin Krakauer
	When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-14	runsc: Change cache policy for root fs and volume mounts.	Nicolas Lacasse
	Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-06	Tiny reordering to network code	Fabricio Voznika
	PiperOrigin-RevId: 207581723 Change-Id: I6e4eb1227b5ed302de5e6c891040b670955f1eea