gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-09-16	runsc: add global profile collection flags	Michael Pratt
	Add global flags -profile-{block,cpu,heap,mutex} and -trace which enable collection of the specified profile for the entire duration of a container execution. This provides a way to definitively start profiling before that application starts, rather than attempting to race with an out-of-band `runsc debug`. Note that only the main boot process is profiled. This exposed a bug in Task.traceExecEvent: a crash when tracing and -race are enabled. traceExecEvent is called off of the task goroutine, but uses the Task as a context, which is a violation of the Task contract. Switching to the AsyncContext fixes the issue. Fixes #220
2021-07-08	Replace kernel.ExitStatus with linux.WaitStatus.	Jamie Liu
	PiperOrigin-RevId: 383705129
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-03-06	[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in unix package: - syscall.SysProcIDMap - syscall.Credential Updates #214 PiperOrigin-RevId: 361381490
2020-10-23	Add --traceback flag to customize GOTRACEBACK level	Lennart Espe

2020-09-01	Let flags be overriden from OCI annotations	Fabricio Voznika
	This allows runsc flags to be set per sandbox instance. For example, K8s pod annotations can be used to enable --debug for a single pod, making troubleshoot much easier. Similarly, features like --vfs2 can be enabled for experimentation without affecting other pods in the node. Closes #3494 PiperOrigin-RevId: 329542815
2020-08-19	Move boot.Config to its own package	Fabricio Voznika
	Updates #3494 PiperOrigin-RevId: 327548511
2020-07-14	Prepare boot.Loader to support multi-container TTY	Fabricio Voznika
	- Combine process creation code that is shared between root and subcontainer processes - Move root container information into a struct for clarity Updates #2714 PiperOrigin-RevId: 321204798
2020-06-01	Include runtime goroutines in panics	Michael Pratt
	SetTraceback("all") does not include all goroutines in panics (you didn't think it was that simple, did you?). It includes all _user_ goroutines; those started by the runtime (such as GC workers) are excluded. Switch to "system" to additionally include runtime goroutines, which are useful to track down bugs in the runtime itself. PiperOrigin-RevId: 314204473
2020-04-22	Specify a memory file in platform.New().	Andrei Vagin
	PiperOrigin-RevId: 307941984
2020-03-12	Kill sandbox process when parent process terminates	Fabricio Voznika
	When the sandbox runs in attached more, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because PR_GET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247
2020-02-10	Add flag package to limit visibility.	Adin Scannell
	PiperOrigin-RevId: 294297004
2019-07-03	Avoid importing platforms from many source files	Andrei Vagin
	PiperOrigin-RevId: 256494243
2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-12	Allow 'runsc do' to run without root	Fabricio Voznika
	'--rootless' flag lets a non-root user execute 'runsc do'. The drawback is that the sandbox and gofer processes will run as root inside a user namespace that is mapped to the caller's user, intead of nobody. And network is defaulted to '--network=host' inside the root network namespace. On the bright side, it's very convenient for testing: runsc --rootless do ls runsc --rootless do curl www.google.com PiperOrigin-RevId: 252840970
2019-05-15	Cleanup around urpc file payload handling	Fabricio Voznika
	urpc always closes all files once the RPC function returns. PiperOrigin-RevId: 248406857 Change-Id: I400a8562452ec75c8e4bddc2154948567d572950
2019-04-29	Change copyright notice to "The gVisor Authors"	Michael Pratt
	Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-03-18	Add support for mount propagation	Fabricio Voznika
	Properly handle propagation options for root and mounts. Now usage of mount options shared, rshared, and noexec cause error to start. shared/ rshared breaks sandbox=>host isolation. slave however can be supported because changes propagate from host to sandbox. Root FS setup moved inside the gofer. Apart from simplifying the code, it keeps all mounts inside the namespace. And they are torn down when the namespace is destroyed (DestroyFS is no longer needed). PiperOrigin-RevId: 239037661 Change-Id: I8b5ee4d50da33c042ea34fa68e56514ebe20e6e0
2019-01-22	Don't bind-mount runsc into a sandbox mntns	Andrei Vagin
	PiperOrigin-RevId: 230437407 Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b
2019-01-18	Scrub runsc error messages	Fabricio Voznika
	Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-18	runsc: create a new proc mount if the sandbox process is running in a new pidns	Andrei Vagin
	PiperOrigin-RevId: 229971902 Change-Id: Ief4fac731e839ef092175908de9375d725eaa3aa
2019-01-14	runsc: set up a minimal chroot from the sandbox process	Andrei Vagin
	In this case, new mounts are not created in the host mount namspaces, so tearDownChroot isn't needed, because chroot will be destroyed with a sandbox mount namespace. In additional, pivot_root can't be called instead of chroot. PiperOrigin-RevId: 229250871 Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6
2018-12-28	Simplify synchronization between runsc and sandbox process	Fabricio Voznika
	Make 'runsc create' join cgroup before creating sandbox process. This removes the need to synchronize platform creation and ensure that sandbox process is charged to the right cgroup from the start. PiperOrigin-RevId: 227166451 Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
2018-12-06	A sandbox process should wait until it has not been moved into cgroups	Andrei Vagin
	PiperOrigin-RevId: 224418900 Change-Id: I53cf4d7c1c70117875b6920f8fd3d58a3b1497e9
2018-11-28	Internal change.	Googler
	PiperOrigin-RevId: 223231273 Change-Id: I8fb97ea91f7507b4918f7ce6562890611513fc30
2018-10-19	Use correct company name in copyright header	Ian Gudger
	PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-11	Add bare bones unsupported syscall logging	Fabricio Voznika
	This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-10	Add sandbox to cgroup	Fabricio Voznika
	Sandbox creation uses the limits and reservations configured in the OCI spec and set cgroup options accordinly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, sandbox and gofer processes will join the cgroup but it will not modify the cgroup with spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-03	runsc: Pass root container's stdio via FD.	Nicolas Lacasse
	We were previously using the sandbox process's stdio as the root container's stdio. This makes it difficult/impossible to distinguish output application output from sandbox output, such as panics, which are always written to stderr. Also close the console socket when we are done with it. PiperOrigin-RevId: 215585180 Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-09-28	Make runsc kill and delete more conformant to the "spec"	Fabricio Voznika
	PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-27	Refactor 'runsc boot' to take container ID as argument	Fabricio Voznika
	This makes the flow slightly simpler (no need to call Loader.SetRootContainer). And this is required change to tag tasks with container ID inside the Sentry. PiperOrigin-RevId: 214795210 Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-11	platform: Pass device fd into platform constructor.	Nicolas Lacasse
	We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-07	Remove '--file-access=direct' option	Fabricio Voznika
	It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-04	runsc: Pass log and config files to sandbox process by FD.	Nicolas Lacasse
	This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. The specutils.ReadSpecFromFile method was fixed to always seek to the beginning of the file before reading. This allows Files from the same FD to be read multiple times, as we do in the boot command when the apply-caps flag is set. Tested with --network=host. PiperOrigin-RevId: 211570647 Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04	runsc: fix container rootfs path.	Lantao Liu
	PiperOrigin-RevId: 211515350 Change-Id: Ia495af57447c799909aa97bb873a50b87bee2625
2018-08-31	Automated rollback of changelist 210995199	Fabricio Voznika
	PiperOrigin-RevId: 211116429 Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-30	runsc: Pass log and config files to sandbox process by FD.	Nicolas Lacasse
	This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. PiperOrigin-RevId: 210995199 Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-07-18	Moved restore code out of create and made to be called after create.	Justine Olshan
	Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-12	runsc: Don't close the control server in a defer.	Nicolas Lacasse
	Closing the control server will block until all open requests have completed. If a control server method panics, we end up stuck because the defer'd Destroy function will never return. PiperOrigin-RevId: 204354676 Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-06-29	Sets the restore environment for restoring a container.	Justine Olshan
	Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-28	Error out if spec is invalid	Fabricio Voznika
	Closes #66 PiperOrigin-RevId: 202496258 Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-26	runsc: set gofer umask to 0.	Lantao Liu
	PiperOrigin-RevId: 202185642 Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399
2018-06-18	Modified boot.go to allow for restores.	Justine Olshan
	A file descriptor was added as a flag to boot so a state file can restore a container that was checkpointed. PiperOrigin-RevId: 201068699 Change-Id: I18e96069488ffa3add468861397f3877725544aa
2018-06-08	Drop capabilities not needed by Gofer	Fabricio Voznika
	PiperOrigin-RevId: 199808391 Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-05-17	Push signal-delivery and wait into the sandbox.	Nicolas Lacasse
	This is another step towards multi-container support. Previously, we delivered signals directly to the sandbox process (which then forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a container by waiting on the sandbox process itself. This approach will not work when there are multiple containers inside the sandbox, and we need to signal/wait on individual containers. This CL adds two new messages, ContainerSignal and ContainerWait. These messages include the id of the container to signal/wait. The controller inside the sandbox receives these messages and signals/waits on the appropriate process inside the sandbox. The container id is plumbed into the sandbox, but it currently is not used. We still end up signaling/waiting on PID 1 in all cases. Once we actually have multiple containers inside the sandbox, we will need to keep some sort of map of container id -> pid (or possibly pid namespace), and signal/kill the appropriate process for the container. PiperOrigin-RevId: 197028366 Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-04-28	Check in gVisor.	Googler
	PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463