gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-09-16	runsc: add global profile collection flags	Michael Pratt
	Add global flags -profile-{block,cpu,heap,mutex} and -trace which enable collection of the specified profile for the entire duration of a container execution. This provides a way to definitively start profiling before that application starts, rather than attempting to race with an out-of-band `runsc debug`. Note that only the main boot process is profiled. This exposed a bug in Task.traceExecEvent: a crash when tracing and -race are enabled. traceExecEvent is called off of the task goroutine, but uses the Task as a context, which is a violation of the Task contract. Switching to the AsyncContext fixes the issue. Fixes #220
2021-08-13	Add Event controls	Chong Cai
	Add Event controls and implement "stream" commands. PiperOrigin-RevId: 390691702
2021-08-12	Add Usage controls	Chong Cai
	Add Usage controls and implement "usage/usagefd" commands. PiperOrigin-RevId: 390507423
2021-08-04	Add Fs controls	Chong Cai
	Add Fs controls and implement "cat" command. PiperOrigin-RevId: 388812540
2021-08-03	Add Lifecycle controls	Chong Cai
	Also change runsc pause/resume cmd to access Lifecycle instead of containerManager. PiperOrigin-RevId: 388534928
2021-07-22	runsc: Wait child processes without timeouts	Andrei Vagin
	* First, we don't need to poll child processes. * Second, the 5 seconds timeout is too small if a host is overloaded. * Third, this can hide bugs in the code when we wait a process that isn't going to exit. PiperOrigin-RevId: 386337586
2021-07-13	Use consistent naming for subcontainers	Fabricio Voznika
	It was confusing to find functions relating to root and non-root containers. Replace "non-root" and "subcontainer" and make naming consistent in Sandbox and controller. PiperOrigin-RevId: 384512518
2021-07-12	Fix stdios ownership	Fabricio Voznika
	Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12	Fix GoLand analyzer errors under runsc/...	Fabricio Voznika
	PiperOrigin-RevId: 384344990
2021-06-22	runsc: don't kill sandbox, let it stop properly	Andrei Vagin
	The typical sequence of calls to start a container looks like this ct, err := container.New(conf, containerArgs) defer ct.Destroy() ct.Start(conf) ws, err := ct.Wait() For the root container, ct.Destroy() kills the sandbox process. This doesn't look like a right wait to stop it. For example, all ongoing rpc calls are aborted in this case. If everything is going alright, we can just wait and it will exit itself. Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-05-20	Suppress log message when there is no error	Fabricio Voznika
	PiperOrigin-RevId: 374981100
2021-05-05	Fixes to runsc cgroups	Fabricio Voznika
	When loading cgroups for another process, `/proc/self` was used in a few places, causing the end state to be a mix of the process and self. This is now fixes to always use the proper `/proc/[pid]` path. Added net_prio and net_cls to the list of optional controllers. This is to allow runsc to execute then these cgroups are disabled as long as there are no net_prio and net_cls limits that need to be applied. Deflake TestMultiContainerEvent. Closes #5875 Closes #5887 PiperOrigin-RevId: 372242687
2021-04-16	Allow runsc to generate coverage reports.	Dean Deng
	Add a coverage-report flag that will cause the sandbox to generate a coverage report (with suffix .cov) in the debug log directory upon exiting. For the report to be generated, runsc must have been built with the following Bazel flags: `--collect_code_coverage --instrumentation_filter=...`. With coverage reports, we should be able to aggregate results across all tests to surface code coverage statistics for the project as a whole. The report is simply a text file with each line representing a covered block as `file:start_line.start_col,end_line.end_col`. Note that this is similar to the format of coverage reports generated with `go test -coverprofile`, although we omit the count and number of statements, which are not useful for us. Some simple ways of getting coverage reports: bazel test <some_test> --collect_code_coverage \ --instrumentation_filter=//pkg/... bazel build //runsc --collect_code_coverage \ --instrumentation_filter=//pkg/... runsc -coverage-report=dir/ <other_flags> do ... PiperOrigin-RevId: 368952911
2021-04-08	Clarify platform errors.	Adin Scannell
	PiperOrigin-RevId: 367446222
2021-03-06	[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in unix package: - syscall.SysProcIDMap - syscall.Credential Updates #214 PiperOrigin-RevId: 361381490
2021-02-02	Stub out basic `runsc events --stat` CPU functionality	Kevin Krakauer
	Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod and divide it proportionally according to sentry-internal usage stats. This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage of its containers. Addresses #172. PiperOrigin-RevId: 355229833
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-05	Add benchmarks targets to BuildKite.	Adin Scannell
	This includes minor fix-ups: * Handle SIGTERM in runsc debug, to exit gracefully. * Fix cmd.debug.go opening all profiles as RDONLY. * Fix the test name in fio_test.go, and encode the block size in the test. PiperOrigin-RevId: 350205718
2020-12-29	Make profiling commands synchronous.	Adin Scannell
	This allows for a model of profiling when you can start collection, and it will terminate when the sandbox terminates. Without this synchronous call, it is effectively impossible to collect length blocking and mutex profiles. PiperOrigin-RevId: 349483418
2020-12-10	Disable host reassembly for fragments.	Bhasker Hariharan
	fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket to ensure that fragments are delivered inorder to the right dispatcher. But this prevents fragments from being delivered to gvisor at all and makes testing of gvisor's fragment reassembly code impossible. The potential impact from this is minimal since IP Fragmentation is not really that prevelant and in cases where we do get fragments we may deliver the fragment out of order to the TCP layer as multiple network dispatchers may process the fragments and deliver a reassembled fragment after the next packet has been delivered to the TCP endpoint. While not desirable I believe the impact from this is minimal due to low prevalence of fragmentation. Also removed PktType and Hatype fields when binding the socket as these are not used when binding. Its just confusing to have them specified. See: https://man7.org/linux/man-pages/man7/packet.7.html "Fields used for binding are sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex." Fixes #5055 PiperOrigin-RevId: 346919439
2020-11-19	Propagate IP address prefix from host to netstack	Fabricio Voznika
	Closes #4022 PiperOrigin-RevId: 343378647
2020-11-17	Add support for TTY in multi-container	Fabricio Voznika
	Fixes #2714 PiperOrigin-RevId: 342950412
2020-11-05	Fix failure setting OOM score adjustment	Fabricio Voznika
	When OOM score adjustment needs to be set, all the containers need to be loaded to find all containers that belong to the sandbox. However, each load signals the container to ensure it is still alive. OOM score adjustment is set during creation and deletion of every container, generating a flood of signals to all containers. The fix removes the signal check when it's not needed. There is also a race fetching OOM score adjustment value from the parent when the sandbox exits at the same time (the time it took to signal containers above made this window quite large). The fix is to store the original value in the sandbox state file and use it when the value needs to be restored. Also add more logging and made the existing ones more consistent to help with debugging. PiperOrigin-RevId: 340940799
2020-10-06	Merge pull request #4355 from majek:marek/swallow-SO_RCVBUFFORCE-error	gVisor bot
	PiperOrigin-RevId: 335714100
2020-10-05	Fix gofer monitor prematurely destroying container	Fabricio Voznika
	When all container tasks finish, they release the mount which in turn will close the 9P session to the gofer. The gofer exits when the connection closes, triggering the gofer monitor. The gofer monitor will _think_ that the gofer died prematurely and destroy the container. Then when the caller attempts to wait for the container, e.g. to get the exit code, wait fails saying the container doesn't exist. Gofer monitor now just SIGKILLs the container, and let the normal teardown process to happen, which will evetually destroy the container at the right time. Also, fixed an issue with exec racing with container's init process exiting. Closes #1487 PiperOrigin-RevId: 335537350
2020-09-25	Swallow SO_RCVBUFFORCE and SO_SNDBUFFORCE errors on SOCK_RAW sockets	Marek Majkowski
	In --network=sandbox mode, we create SOCK_RAW sockets on the given network device. The code tries to force-set 4MiB rcv and snd buffers on it, but in certain situations it might fail. There is no reason for refusing the sandbox startup in such case - we should bump the buffers to max availabe size and just move on.
2020-09-01	Refactor tty codebase to use master-replica terminology.	Ayush Ranjan
	Updates #2972 PiperOrigin-RevId: 329584905
2020-08-26	Make flag propagation automatic	Fabricio Voznika
	Use reflection and tags to provide automatic conversion from Config to flags. This makes adding new flags less error-prone, skips flags using default values (easier to read), and makes tests correctly use default flag values for test Configs. Updates #3494 PiperOrigin-RevId: 328662070
2020-08-19	Move boot.Config to its own package	Fabricio Voznika
	Updates #3494 PiperOrigin-RevId: 327548511
2020-08-05	Stop profiling when the sentry exits	Fabricio Voznika
	Also removes `--profile-goroutine` because it's equivalent to `debug --stacks`. PiperOrigin-RevId: 325061502
2020-07-14	Prepare boot.Loader to support multi-container TTY	Fabricio Voznika
	- Combine process creation code that is shared between root and subcontainer processes - Move root container information into a struct for clarity Updates #2714 PiperOrigin-RevId: 321204798
2020-07-13	Merge pull request #2672 from amscanne:shim-integrated	gVisor bot
	PiperOrigin-RevId: 321053634
2020-07-08	Drop empty line	Michael Pratt
	PiperOrigin-RevId: 320281516
2020-06-16	Add runsc options to set checksum offloading status	gVisor bot
	--tx-checksum-offload=<true\|false> enable TX checksum offload (default: false) --rx-checksum-offload=<true\|false> enable RX checksum offload (default: true) Fixes #2989 PiperOrigin-RevId: 316781309
2020-05-28	Move Cleanup to its own package	Fabricio Voznika
	PiperOrigin-RevId: 313663382
2020-04-30	FIFO QDisc implementation	Bhasker Hariharan
	Updates #231 PiperOrigin-RevId: 309323808
2020-04-22	Specify a memory file in platform.New().	Andrei Vagin
	PiperOrigin-RevId: 307941984
2020-04-09	Don't unconditionally set --panic-signal	Fabricio Voznika
	Closes #2393 PiperOrigin-RevId: 305793027
2020-04-07	Add friendlier messages for frequently encountered errors.	Ian Lewis
	Issue #2270 Issue #1765 PiperOrigin-RevId: 305385436
2020-04-07	Don't map the 0 uid into a sandbox user namespace	Andrei Vagin
	Starting with go1.13, we can specify ambient capabilities when we execute a new process with os/exe.Cmd. PiperOrigin-RevId: 305366706
2020-04-01	Automated rollback of changelist 303799678	Adin Scannell
	PiperOrigin-RevId: 304221302
2020-03-30	kvm: handle exit reasons even under EINTR.	Adin Scannell
	In the case of other signals (preemption), inject a normal bounce and defer the signal until the vCPU has been returned from guest mode. PiperOrigin-RevId: 303799678
2020-03-12	Kill sandbox process when parent process terminates	Fabricio Voznika
	When the sandbox runs in attached more, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because PR_GET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247
2020-03-11	runsc: Set asyncpreemptoff for the kvm platform	Andrei Vagin
	The asynchronous goroutine preemption is a new feature of Go 1.14. When we switched to go 1.14 (cl/297915917) in the bazel config, the kokoro syscall-kvm job started permanently failing. Lets temporary set asyncpreemptoff for the kvm platform to unblock tests. PiperOrigin-RevId: 300372387
2020-03-05	Merge pull request #1951 from moricho:moricho/add-profiler-option	gVisor bot
	PiperOrigin-RevId: 299233818
2020-02-28	Allow to specify a separate log for GO's runtime messages	Andrei Vagin
	GO's runtime calls the write system call twice to print "panic:" and "the reason of this panic", so here is a race window when other threads can print something to the log and we will see something like this: panic: log messages from another thread The reason of the panic. This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read. e.g., data races often have one side of the race, followed by a large "diagnosis" dump, finally followed by the other side of the race. PiperOrigin-RevId: 297887895
2020-02-26	add profile option	moricho

2020-02-20	Initial network namespace support.	gVisor bot
	TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-11	Disallow duplicate NIC names.	gVisor bot
	PiperOrigin-RevId: 294500858
2020-01-27	Standardize on tools directory.	Adin Scannell
	PiperOrigin-RevId: 291745021