summaryrefslogtreecommitdiffhomepage
path: root/runsc/sandbox
AgeCommit message (Collapse)Author
2021-09-16runsc: add global profile collection flagsMichael Pratt
Add global flags -profile-{block,cpu,heap,mutex} and -trace which enable collection of the specified profile for the entire duration of a container execution. This provides a way to definitively start profiling before that application starts, rather than attempting to race with an out-of-band `runsc debug`. Note that only the main boot process is profiled. This exposed a bug in Task.traceExecEvent: a crash when tracing and -race are enabled. traceExecEvent is called off of the task goroutine, but uses the Task as a context, which is a violation of the Task contract. Switching to the AsyncContext fixes the issue. Fixes #220
2021-08-13Add Event controlsChong Cai
Add Event controls and implement "stream" commands. PiperOrigin-RevId: 390691702
2021-08-12Add Usage controlsChong Cai
Add Usage controls and implement "usage/usagefd" commands. PiperOrigin-RevId: 390507423
2021-08-04Add Fs controlsChong Cai
Add Fs controls and implement "cat" command. PiperOrigin-RevId: 388812540
2021-08-03Add Lifecycle controlsChong Cai
Also change runsc pause/resume cmd to access Lifecycle instead of containerManager. PiperOrigin-RevId: 388534928
2021-07-22runsc: Wait child processes without timeoutsAndrei Vagin
* First, we don't need to poll child processes. * Second, the 5 seconds timeout is too small if a host is overloaded. * Third, this can hide bugs in the code when we wait a process that isn't going to exit. PiperOrigin-RevId: 386337586
2021-07-13Use consistent naming for subcontainersFabricio Voznika
It was confusing to find functions relating to root and non-root containers. Replace "non-root" and "subcontainer" and make naming consistent in Sandbox and controller. PiperOrigin-RevId: 384512518
2021-07-12Fix stdios ownershipFabricio Voznika
Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12Fix GoLand analyzer errors under runsc/...Fabricio Voznika
PiperOrigin-RevId: 384344990
2021-06-22runsc: don't kill sandbox, let it stop properlyAndrei Vagin
The typical sequence of calls to start a container looks like this ct, err := container.New(conf, containerArgs) defer ct.Destroy() ct.Start(conf) ws, err := ct.Wait() For the root container, ct.Destroy() kills the sandbox process. This doesn't look like a right wait to stop it. For example, all ongoing rpc calls are aborted in this case. If everything is going alright, we can just wait and it will exit itself. Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-05-20Suppress log message when there is no errorFabricio Voznika
PiperOrigin-RevId: 374981100
2021-05-05Fixes to runsc cgroupsFabricio Voznika
When loading cgroups for another process, `/proc/self` was used in a few places, causing the end state to be a mix of the process and self. This is now fixes to always use the proper `/proc/[pid]` path. Added net_prio and net_cls to the list of optional controllers. This is to allow runsc to execute then these cgroups are disabled as long as there are no net_prio and net_cls limits that need to be applied. Deflake TestMultiContainerEvent. Closes #5875 Closes #5887 PiperOrigin-RevId: 372242687
2021-04-16Allow runsc to generate coverage reports.Dean Deng
Add a coverage-report flag that will cause the sandbox to generate a coverage report (with suffix .cov) in the debug log directory upon exiting. For the report to be generated, runsc must have been built with the following Bazel flags: `--collect_code_coverage --instrumentation_filter=...`. With coverage reports, we should be able to aggregate results across all tests to surface code coverage statistics for the project as a whole. The report is simply a text file with each line representing a covered block as `file:start_line.start_col,end_line.end_col`. Note that this is similar to the format of coverage reports generated with `go test -coverprofile`, although we omit the count and number of statements, which are not useful for us. Some simple ways of getting coverage reports: bazel test <some_test> --collect_code_coverage \ --instrumentation_filter=//pkg/... bazel build //runsc --collect_code_coverage \ --instrumentation_filter=//pkg/... runsc -coverage-report=dir/ <other_flags> do ... PiperOrigin-RevId: 368952911
2021-04-08Clarify platform errors.Adin Scannell
PiperOrigin-RevId: 367446222
2021-03-06[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in unix package: - syscall.SysProcIDMap - syscall.Credential Updates #214 PiperOrigin-RevId: 361381490
2021-02-02Stub out basic `runsc events --stat` CPU functionalityKevin Krakauer
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod and divide it proportionally according to sentry-internal usage stats. This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage of its containers. Addresses #172. PiperOrigin-RevId: 355229833
2021-01-12Fix simple mistakes identified by goreportcard.Adin Scannell
These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-05Add benchmarks targets to BuildKite.Adin Scannell
This includes minor fix-ups: * Handle SIGTERM in runsc debug, to exit gracefully. * Fix cmd.debug.go opening all profiles as RDONLY. * Fix the test name in fio_test.go, and encode the block size in the test. PiperOrigin-RevId: 350205718
2020-12-29Make profiling commands synchronous.Adin Scannell
This allows for a model of profiling when you can start collection, and it will terminate when the sandbox terminates. Without this synchronous call, it is effectively impossible to collect length blocking and mutex profiles. PiperOrigin-RevId: 349483418
2020-12-10Disable host reassembly for fragments.Bhasker Hariharan
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket to ensure that fragments are delivered inorder to the right dispatcher. But this prevents fragments from being delivered to gvisor at all and makes testing of gvisor's fragment reassembly code impossible. The potential impact from this is minimal since IP Fragmentation is not really that prevelant and in cases where we do get fragments we may deliver the fragment out of order to the TCP layer as multiple network dispatchers may process the fragments and deliver a reassembled fragment after the next packet has been delivered to the TCP endpoint. While not desirable I believe the impact from this is minimal due to low prevalence of fragmentation. Also removed PktType and Hatype fields when binding the socket as these are not used when binding. Its just confusing to have them specified. See: https://man7.org/linux/man-pages/man7/packet.7.html "Fields used for binding are sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex." Fixes #5055 PiperOrigin-RevId: 346919439
2020-11-19Propagate IP address prefix from host to netstackFabricio Voznika
Closes #4022 PiperOrigin-RevId: 343378647
2020-11-17Add support for TTY in multi-containerFabricio Voznika
Fixes #2714 PiperOrigin-RevId: 342950412
2020-11-05Fix failure setting OOM score adjustmentFabricio Voznika
When OOM score adjustment needs to be set, all the containers need to be loaded to find all containers that belong to the sandbox. However, each load signals the container to ensure it is still alive. OOM score adjustment is set during creation and deletion of every container, generating a flood of signals to all containers. The fix removes the signal check when it's not needed. There is also a race fetching OOM score adjustment value from the parent when the sandbox exits at the same time (the time it took to signal containers above made this window quite large). The fix is to store the original value in the sandbox state file and use it when the value needs to be restored. Also add more logging and made the existing ones more consistent to help with debugging. PiperOrigin-RevId: 340940799
2020-10-06Merge pull request #4355 from majek:marek/swallow-SO_RCVBUFFORCE-errorgVisor bot
PiperOrigin-RevId: 335714100
2020-10-05Fix gofer monitor prematurely destroying containerFabricio Voznika
When all container tasks finish, they release the mount which in turn will close the 9P session to the gofer. The gofer exits when the connection closes, triggering the gofer monitor. The gofer monitor will _think_ that the gofer died prematurely and destroy the container. Then when the caller attempts to wait for the container, e.g. to get the exit code, wait fails saying the container doesn't exist. Gofer monitor now just SIGKILLs the container, and let the normal teardown process to happen, which will evetually destroy the container at the right time. Also, fixed an issue with exec racing with container's init process exiting. Closes #1487 PiperOrigin-RevId: 335537350
2020-09-25Swallow SO_RCVBUFFORCE and SO_SNDBUFFORCE errors on SOCK_RAW socketsMarek Majkowski
In --network=sandbox mode, we create SOCK_RAW sockets on the given network device. The code tries to force-set 4MiB rcv and snd buffers on it, but in certain situations it might fail. There is no reason for refusing the sandbox startup in such case - we should bump the buffers to max availabe size and just move on.
2020-09-01Refactor tty codebase to use master-replica terminology.Ayush Ranjan
Updates #2972 PiperOrigin-RevId: 329584905
2020-08-26Make flag propagation automaticFabricio Voznika
Use reflection and tags to provide automatic conversion from Config to flags. This makes adding new flags less error-prone, skips flags using default values (easier to read), and makes tests correctly use default flag values for test Configs. Updates #3494 PiperOrigin-RevId: 328662070
2020-08-19Move boot.Config to its own packageFabricio Voznika
Updates #3494 PiperOrigin-RevId: 327548511
2020-08-05Stop profiling when the sentry exitsFabricio Voznika
Also removes `--profile-goroutine` because it's equivalent to `debug --stacks`. PiperOrigin-RevId: 325061502
2020-07-14Prepare boot.Loader to support multi-container TTYFabricio Voznika
- Combine process creation code that is shared between root and subcontainer processes - Move root container information into a struct for clarity Updates #2714 PiperOrigin-RevId: 321204798
2020-07-13Merge pull request #2672 from amscanne:shim-integratedgVisor bot
PiperOrigin-RevId: 321053634
2020-07-08Drop empty lineMichael Pratt
PiperOrigin-RevId: 320281516
2020-06-16Add runsc options to set checksum offloading statusgVisor bot
--tx-checksum-offload=<true|false> enable TX checksum offload (default: false) --rx-checksum-offload=<true|false> enable RX checksum offload (default: true) Fixes #2989 PiperOrigin-RevId: 316781309
2020-05-28Move Cleanup to its own packageFabricio Voznika
PiperOrigin-RevId: 313663382
2020-04-30FIFO QDisc implementationBhasker Hariharan
Updates #231 PiperOrigin-RevId: 309323808
2020-04-22Specify a memory file in platform.New().Andrei Vagin
PiperOrigin-RevId: 307941984
2020-04-09Don't unconditionally set --panic-signalFabricio Voznika
Closes #2393 PiperOrigin-RevId: 305793027
2020-04-07Add friendlier messages for frequently encountered errors.Ian Lewis
Issue #2270 Issue #1765 PiperOrigin-RevId: 305385436
2020-04-07Don't map the 0 uid into a sandbox user namespaceAndrei Vagin
Starting with go1.13, we can specify ambient capabilities when we execute a new process with os/exe.Cmd. PiperOrigin-RevId: 305366706
2020-04-01Automated rollback of changelist 303799678Adin Scannell
PiperOrigin-RevId: 304221302
2020-03-30kvm: handle exit reasons even under EINTR.Adin Scannell
In the case of other signals (preemption), inject a normal bounce and defer the signal until the vCPU has been returned from guest mode. PiperOrigin-RevId: 303799678
2020-03-12Kill sandbox process when parent process terminatesFabricio Voznika
When the sandbox runs in attached more, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because PR_GET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247
2020-03-11runsc: Set asyncpreemptoff for the kvm platformAndrei Vagin
The asynchronous goroutine preemption is a new feature of Go 1.14. When we switched to go 1.14 (cl/297915917) in the bazel config, the kokoro syscall-kvm job started permanently failing. Lets temporary set asyncpreemptoff for the kvm platform to unblock tests. PiperOrigin-RevId: 300372387
2020-03-05Merge pull request #1951 from moricho:moricho/add-profiler-optiongVisor bot
PiperOrigin-RevId: 299233818
2020-02-28Allow to specify a separate log for GO's runtime messagesAndrei Vagin
GO's runtime calls the write system call twice to print "panic:" and "the reason of this panic", so here is a race window when other threads can print something to the log and we will see something like this: panic: log messages from another thread The reason of the panic. This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read. e.g., data races often have one side of the race, followed by a large "diagnosis" dump, finally followed by the other side of the race. PiperOrigin-RevId: 297887895
2020-02-26add profile optionmoricho
2020-02-20Initial network namespace support.gVisor bot
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-11Disallow duplicate NIC names.gVisor bot
PiperOrigin-RevId: 294500858
2020-01-27Standardize on tools directory.Adin Scannell
PiperOrigin-RevId: 291745021