gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2019-01-18	Scrub runsc error messages	Fabricio Voznika
	Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-16	Prevent internal tmpfs mount to override files in /tmp	Fabricio Voznika
	Runsc wants to mount /tmp using internal tmpfs implementation for performance. However, it risks hiding files that may exist under /tmp in case it's present in the container. Now, it only mounts over /tmp iff: - /tmp was not explicitly asked to be mounted - /tmp is empty If any of this is not true, then /tmp maps to the container's image /tmp. Note: checkpoint doesn't have sentry FS mounted to check if /tmp is empty. It simply looks for explicit mounts right now. PiperOrigin-RevId: 229607856 Change-Id: I10b6dae7ac157ef578efc4dfceb089f3b94cde06
2019-01-14	Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.	Nicolas Lacasse
	More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2018-12-04	Max link traversals should be for an entire path.	Brian Geffon
	The number of symbolic links that are allowed to be followed are for a full path and not just a chain of symbolic links. PiperOrigin-RevId: 224047321 Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-11-20	Don't fail when destroyContainerFS is called more than once	Fabricio Voznika
	This can happen when destroy is called multiple times or when destroy failed previously and is being called again. PiperOrigin-RevId: 221882034 Change-Id: I8d069af19cf66c4e2419bdf0d4b789c5def8d19e
2018-11-07	AsyncBarrier should be run after all defers in destroyContainerFS.	Nicolas Lacasse
	destroyContainerFS must wait for all async operations to finish before returning. In an attempt to do this, we call fs.AsyncBarrier() at the end of the function. However, there are many defer'd DecRefs which end up running AFTER the AsyncBarrier() call. This CL fixes this by calling fs.AsyncBarrier() in the first defer statement, thus ensuring that it runs at the end of the function, after all other defers. PiperOrigin-RevId: 220523545 Change-Id: I5e96ee9ea6d86eeab788ff964484c50ef7f64a2f
2018-10-19	Use correct company name in copyright header	Ian Gudger
	PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-18	Resolve mount paths while setting up root fs mount	Fabricio Voznika
	It's hard to resolve symlinks inside the sandbox because rootfs and mounts may be read-only, forcing us to create mount points inside lower layer of an overlay, before the volumes are mounted. Since the destination must already be resolved outside the sandbox when creating mounts, take this opportunity to rewrite the spec with paths resolved. "runsc boot" will use the "resolved" spec to load mounts. In addition, symlink traversals were disabled while mounting containers inside the sandbox. It haven't been able to write a good test for it. So I'm relying on manual tests for now. PiperOrigin-RevId: 217749904 Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
2018-10-01	Make multi-container the default mode for runsc	Fabricio Voznika
	And remove multicontainer option. PiperOrigin-RevId: 215236981 Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-09-28	Make runsc kill and delete more conformant to the "spec"	Fabricio Voznika
	PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-24	runsc: All non-root bind mounts should be shared.	Nicolas Lacasse
	This CL changes the semantics of the "--file-access" flag so that it only affects the root filesystem. The default remains "exclusive" which is the common use case, as neither Docker nor K8s supports sharing the root. Keeping the root fs as "exclusive" means that the fs-intensive work done during application startup will mostly be cacheable, and thus faster. Non-root bind mounts will always be shared. This CL also removes some redundant FSAccessType validations. We validate this flag in main(), so we can assume it is valid afterwards. PiperOrigin-RevId: 214359936 Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
2018-09-19	runsc: Fix stdin/stdout/stderr in multi-container mode.	Kevin Krakauer
	The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19	Add container.Destroy urpc method.	Nicolas Lacasse
	This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19	Fix sandbox and gofer capabilities	Fabricio Voznika
	Capabilities.Set() adds capabilities, but doesn't remove existing ones that might have been loaded. Fixed the code and added tests. PiperOrigin-RevId: 213726369 Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
2018-09-19	runsc: Don't create __runsc_containers__ unless we are in multi-container mode.	Nicolas Lacasse
	PiperOrigin-RevId: 213715511 Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12
2018-09-18	Automated rollback of changelist 213307171	Kevin Krakauer
	PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-17	runsc: Fix stdin/out/err in multi-container mode.	Kevin Krakauer
	Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-07	runsc: Support multi-container exec.	Nicolas Lacasse
	We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07	Remove '--file-access=direct' option	Fabricio Voznika
	It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-05	Enabled bind mounts in sub-containers	Fabricio Voznika
	With multi-gofers, bind mounts in sub-containers should just work. Removed restrictions and added test. There are also a few cleanups along the way, e.g. retry unmounting in case cleanup races with gofer teardown. PiperOrigin-RevId: 211699569 Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05	runsc: Promote getExecutablePathInternal to getExecutablePath.	Nicolas Lacasse
	Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-08-27	Put fsgofer inside chroot	Fabricio Voznika
	Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-24	runsc: Allow runsc to properly search the PATH for executable name.	Kevin Krakauer
	Previously, runsc improperly attempted to find an executable in the container's PATH. We now search the PATH via the container's fsgofer rather than the host FS, eliminating the confusing differences between paths on the host and within a container. PiperOrigin-RevId: 210159488 Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2
2018-08-15	runsc fsgofer: Support dynamic serving of filesystems.	Kevin Krakauer
	When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-14	runsc: Change cache policy for root fs and volume mounts.	Nicolas Lacasse
	Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-07-18	Moved restore code out of create and made to be called after create.	Justine Olshan
	Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-03	Skip overlay on root when its readonly	Fabricio Voznika
	PiperOrigin-RevId: 203161098 Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03	runsc: Mount "mandatory" mounts right after mounting the root.	Nicolas Lacasse
	The /proc and /sys mounts are "mandatory" in the sense that they should be mounted in the sandbox even when they are not included in the spec. Runsc treats /tmp similarly, because it is faster to use the internal tmpfs implementation instead of proxying to the host. However, the spec may contain submounts of these mandatory mounts (particularly for /tmp). In those cases, we must mount our mandatory mounts before the submount, otherwise the submount will be masked. Since the mandatory mounts are all top-level directories, we can mount them right after the root. PiperOrigin-RevId: 203145635 Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf
2018-06-29	Sets the restore environment for restoring a container.	Justine Olshan
	Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-25	Fix lint errors	Fabricio Voznika
	PiperOrigin-RevId: 201978212 Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067
2018-06-21	Added functionality to create a RestoreEnvironment.	Justine Olshan
	Before a container can be restored, the mounts must be configured. The root and submounts and their key information is compiled into a RestoreEnvironment. Future code will be added to set this created environment before restoring a container. Tests to ensure the correct environment were added. PiperOrigin-RevId: 201544637 Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087
2018-06-18	runsc: support "rw" mount option.	Lantao Liu
	PiperOrigin-RevId: 201018483 Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2
2018-06-15	runsc: support /dev bind mount which does not conflict with default /dev mount.	Lantao Liu
	PiperOrigin-RevId: 200768923 Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65
2018-06-13	Fix failure to mount volume that sandbox process has no access	Fabricio Voznika
	Boot loader tries to stat mount to determine whether it's a file or not. This may file if the sandbox process doesn't have access to the file. Instead, add overlay on top of file, which is better anyway since we don't want to propagate changes to the host. PiperOrigin-RevId: 200411261 Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-12	runsc: do not include sub target if it is not started with '/'.	Lantao Liu
	PiperOrigin-RevId: 200274828 Change-Id: I956703217df08d8650a881479b7ade8f9f119912
2018-06-12	runsc: enable terminals in the sandbox.	Kevin Krakauer
	runsc now mounts the devpts filesystem, so you get a real terminal using ssh+sshd. PiperOrigin-RevId: 200244830 Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
2018-06-04	Create destination mount dir if it doesn't exist	Fabricio Voznika
	PiperOrigin-RevId: 199175296 Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-05-24	Configure sandbox as superuser	Fabricio Voznika
	Container user might not have enough priviledge to walk directories and mount filesystems. Instead, create superuser to perform these steps of the configuration. PiperOrigin-RevId: 197953667 Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
2018-05-10	Skip atime and mtime update when file is backed by host FD	Fabricio Voznika
	When file is backed by host FD, atime and mtime for the host file and the cached attributes in the Sentry must be close together. In this case, the call to update atime and mtime can be skipped. This is important when host filesystem is using overlay because updating atime and mtime explicitly forces a copy up for every file that is touched. PiperOrigin-RevId: 196176413 Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482
2018-04-28	Check in gVisor.	Googler
	PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463