summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot
AgeCommit message (Collapse)Author
2021-09-20[lisa] Plumb lisafs through runsc.Ayush Ranjan
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs. When the flag is enabled, the gofer process and the client communicate using lisafs protocol instead of 9P. Added a filesystem option in fsimpl/gofer which indicates if lisafs is being used. That will be used to gate lisafs on the gofer client. Note that this change does not make the gofer client use lisafs just yet. Updates #5465 PiperOrigin-RevId: 397917844
2021-09-16Merge pull request #6579 from prattmic:runsc_do_profilegVisor bot
PiperOrigin-RevId: 397114051
2021-09-16runsc: add global profile collection flagsMichael Pratt
Add global flags -profile-{block,cpu,heap,mutex} and -trace which enable collection of the specified profile for the entire duration of a container execution. This provides a way to definitively start profiling before that application starts, rather than attempting to race with an out-of-band `runsc debug`. Note that only the main boot process is profiled. This exposed a bug in Task.traceExecEvent: a crash when tracing and -race are enabled. traceExecEvent is called off of the task goroutine, but uses the Task as a context, which is a violation of the Task contract. Switching to the AsyncContext fixes the issue. Fixes #220
2021-09-15Pass address properties in a single structTony Gong
Replaced the current AddAddressWithOptions method with AddAddressWithProperties which passes all address properties in a single AddressProperties type. More properties that need to be configured in the future are expected, so adding a type makes adding them easier. PiperOrigin-RevId: 396930729
2021-09-09Remove link/packetsocketGhanan Gowripalan
This change removes NetworkDispatcher.DeliverOutboundPacket. Since all packet writes go through the NIC (the only NetworkDispatcher), we can deliver outgoing packets to interested packet endpoints before writing the packet to the link endpoint as the stack expects that all packets that get delivered to a link endpoint are transmitted on the wire. That is, link endpoints no longer need to let the stack know when it writes a packet as the stack already knows about the packet it writes through a link endpoint. PiperOrigin-RevId: 395761629
2021-09-09Add EthernetHeader only if underlying NIC has a mac address.Bhasker Hariharan
Fixes #6532 PiperOrigin-RevId: 395741741
2021-09-01Support sending with packet socketsGhanan Gowripalan
...through the loopback interface, only. This change only supports sending on packet sockets through the loopback interface as the loopback interface is the only interface used in packet socket syscall tests - the other link endpoints are not excercised with the existing test infrastructure. Support for sending on packet sockets through the other interfaces will be added as needed. BUG: https://fxbug.dev/81592 PiperOrigin-RevId: 394368899
2021-08-19Add loopback interface as an ethernet-based deviceGhanan Gowripalan
...to match Linux behaviour. We can see evidence of Linux representing loopback as an ethernet-based device below: ``` # EUI-48 based MAC addresses. $ ip link show lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # tcpdump showing ethernet frames when sniffing loopback and logging the # link-type as EN10MB (Ethernet). $ sudo tcpdump -i lo -e -c 2 -n tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0 03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0 2 packets captured 116 packets received by filter 0 packets dropped by kernel ``` Wireshark shows a similar result as the tcpdump example above. Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162 PiperOrigin-RevId: 391836719
2021-08-18Add control configsChong Cai
Also plumber the controls through runsc PiperOrigin-RevId: 391594318
2021-08-13Add Event controlsChong Cai
Add Event controls and implement "stream" commands. PiperOrigin-RevId: 390691702
2021-08-12Add Usage controlsChong Cai
Add Usage controls and implement "usage/usagefd" commands. PiperOrigin-RevId: 390507423
2021-08-04Add Fs controlsChong Cai
Add Fs controls and implement "cat" command. PiperOrigin-RevId: 388812540
2021-08-03Add Lifecycle controlsChong Cai
Also change runsc pause/resume cmd to access Lifecycle instead of containerManager. PiperOrigin-RevId: 388534928
2021-07-23Add support for SIOCGIFCONF ioctl in hostinet.Lucas Manning
PiperOrigin-RevId: 386511818
2021-07-20Don't kill container when volume is unmountedFabricio Voznika
The gofer session is killed when a gofer backed volume is unmounted. The gofer monitor catches the disconnect and kills the container. This changes the gofer monitor to only care about the rootfs connections, which cannot be unmounted. Fixes #6259 PiperOrigin-RevId: 385929039
2021-07-20Add go:build directives as required by Go 1.17's gofmt.Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-13Use consistent naming for subcontainersFabricio Voznika
It was confusing to find functions relating to root and non-root containers. Replace "non-root" and "subcontainer" and make naming consistent in Sandbox and controller. PiperOrigin-RevId: 384512518
2021-07-12Fix stdios ownershipFabricio Voznika
Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12Fix GoLand analyzer errors under runsc/...Fabricio Voznika
PiperOrigin-RevId: 384344990
2021-07-08Replace kernel.ExitStatus with linux.WaitStatus.Jamie Liu
PiperOrigin-RevId: 383705129
2021-07-01[syserror] Update several syserror errors to linuxerr equivalents.Zach Koopmans
Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-29Add SIOCGIFFLAGS ioctl support to hostinet.Lucas Manning
PiperOrigin-RevId: 382194711
2021-06-28Exit early with error message on checkpoint/pause w/ hostinet.Ian Lewis
PiperOrigin-RevId: 381964660
2021-06-25Merge pull request #6222 from avagin:stopgVisor bot
PiperOrigin-RevId: 381561785
2021-06-22[syserror] Add conversions to linuxerr with temporary Equals method.Zach Koopmans
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-22runsc: don't kill sandbox, let it stop properlyAndrei Vagin
The typical sequence of calls to start a container looks like this ct, err := container.New(conf, containerArgs) defer ct.Destroy() ct.Start(conf) ws, err := ct.Wait() For the root container, ct.Destroy() kills the sandbox process. This doesn't look like a right wait to stop it. For example, all ongoing rpc calls are aborted in this case. If everything is going alright, we can just wait and it will exit itself. Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-06-17Move tcpip.Clock impl to TimekeeperTamir Duberstein
...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed. PiperOrigin-RevId: 380039167
2021-06-10Set RLimits during `runsc exec`Fabricio Voznika
PiperOrigin-RevId: 378726430
2021-06-10[op] Move SignalInfo to abi/linux package.Ayush Ranjan
Fixes #214 PiperOrigin-RevId: 378680466
2021-06-10remove the erroneous (5th) filter argument to sendmmsg.gVisor bot
PiperOrigin-RevId: 378677167
2021-06-09Remove --overlayfs-stale-read flagFabricio Voznika
It defaults to true and setting it to false can cause filesytem corruption. PiperOrigin-RevId: 378518663
2021-06-03Add additional mmap seccomp ruleFabricio Voznika
HostFileMapper.RegenerateMappings calls mmap with MAP_SHARED|MAP_FIXED and these were not allowed. Closes #6116 PiperOrigin-RevId: 377428463
2021-06-03Initialize metrics at initTamir Duberstein
Avoids a race condition at kernel initialization. Updates #6057. PiperOrigin-RevId: 377357723
2021-05-26Use the stack RNG everywhereTamir Duberstein
...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666
2021-05-25Initialize Kernel.Timekeeper before network NSTamir Duberstein
PiperOrigin-RevId: 375843579
2021-05-25Use specific fmt verbs (avoid %v)Tamir Duberstein
Remove useless conversions. Avoid unhandled errors. PiperOrigin-RevId: 375834275
2021-05-14Resolve remaining O_PATH TODOs.Dean Deng
O_PATH is now implemented in vfs2. Fixes #2782. PiperOrigin-RevId: 373861410
2021-05-07Merge pull request #5758 from zhlhahaha:2125gVisor bot
PiperOrigin-RevId: 372608247
2021-05-07Init all vCPU when initializing machine on ARM64howard zhang
This patch is to solve problem that vCPU timer mess up when adding vCPU dynamically on ARM64, for detailed information please refer to: https://github.com/google/gvisor/issues/5739 There is no influence on x86 and here are main changes for ARM64: 1. create maxVCPUs number of vCPU in machine initialization 2. we want to sync gvisor vCPU number with host CPU number, so use smaller number between runtime.NumCPU and KVM_CAP_MAX_VCPUS to be maxVCPUS 3. put unused vCPUs into architecture-specific map initialvCPUs 4. When machine need to bind a new vCPU with tid, rather than creating new one, it would pick a vCPU from map initalvCPUs 5. change the setSystemTime function. When vCPU number increasing, the time cost for function setTSC(use syscall to set cntvoff) is liner growth from around 300 ns to 100000 ns, and this leads to the function setSystemTimeLegacy can not get correct offset value. 6. initializing StdioFDs and goferFD before a platform to avoid StdioFDs confects with vCPU fds Signed-off-by: howard zhang <howard.zhang@arm.com>
2021-05-04Make Mount.Type optional for bind mountsFabricio Voznika
According to the OCI spec Mount.Type is an optional field and it defaults to "bind" when any of "bind" or "rbind" is included in Mount.Options. Also fix the shim to remove bind/rbind from options when mount is converted from bind to tmpfs inside the Sentry. Fixes #2330 Fixes #3274 PiperOrigin-RevId: 371996891
2021-04-22Add weirdness sentry metric.Nayana Bidari
Weirdness metric contains fields to track the number of clock fallback, partial result and vsyscalls. This metric will avoid the overhead of having three different metrics (fallbackMetric, partialResultMetric, vsyscallCount). PiperOrigin-RevId: 369970218
2021-04-19Move runsc reference leak checking to better locations.Dean Deng
In the previous spot, there was a roughly 50% chance that leak checking would actually run. Move it to the waitContainer() call on the root container, where it is guaranteed to run before the sandbox process is terminated. Add it to runsc/cli/main.go as well for good measure, in case the sandbox exit path does not involve waitContainer(). PiperOrigin-RevId: 369329796
2021-04-16Allow runsc to generate coverage reports.Dean Deng
Add a coverage-report flag that will cause the sandbox to generate a coverage report (with suffix .cov) in the debug log directory upon exiting. For the report to be generated, runsc must have been built with the following Bazel flags: `--collect_code_coverage --instrumentation_filter=...`. With coverage reports, we should be able to aggregate results across all tests to surface code coverage statistics for the project as a whole. The report is simply a text file with each line representing a covered block as `file:start_line.start_col,end_line.end_col`. Note that this is similar to the format of coverage reports generated with `go test -coverprofile`, although we omit the count and number of statements, which are not useful for us. Some simple ways of getting coverage reports: bazel test <some_test> --collect_code_coverage \ --instrumentation_filter=//pkg/... bazel build //runsc --collect_code_coverage \ --instrumentation_filter=//pkg/... runsc -coverage-report=dir/ <other_flags> do ... PiperOrigin-RevId: 368952911
2021-04-05Allow user mount for verity fsChong Cai
Allow user mounting a verity fs on an existing mount by specifying mount flags root_hash and lower_path. PiperOrigin-RevId: 366843846
2021-04-02Implement cgroupfs.Rahat Mahmood
A skeleton implementation of cgroupfs. It supports trivial cpu and memory controllers with no support for hierarchies. PiperOrigin-RevId: 366561126
2021-04-02Implement the runsc verity-prepare command.Rahat Mahmood
Implement a new runsc command to set up a sandbox with verityfs and run the measure tool. This is loosely forked from the do command, and currently requires the caller to provide the measure tool binary. PiperOrigin-RevId: 366553769
2021-03-30Fix panic when overriding /dev files with VFS2Fabricio Voznika
VFS1 skips over mounts that overrides files in /dev because the list of files is hardcoded. This is not needed for VFS2 and a recent change lifted this restriction. However, parts of the code were still skipping /dev mounts even in VFS2, causing the loader to panic when it ran short of FDs to connect to the gofer. PiperOrigin-RevId: 365858436
2021-03-23Add --file-access-mounts flagFabricio Voznika
--file-access-mounts flag is similar to --file-access, but controls non-root mounts that were previously mounted in shared mode only. This gives more flexibility to control how mounts are shared within a container. PiperOrigin-RevId: 364669882
2021-03-18Skip /dev submount hack on VFS2.Jamie Liu
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.: ``` "mounts": [ ... { "destination": "/dev", "type": "tmpfs", "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs", "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ] }, ... { "destination": "/dev/shm", "type": "tmpfs", "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm", "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=67108864" ] }, ... ``` (This is mostly consistent with how Linux is usually configured, except that /dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer implements OCI-runtime-spec-undocumented behavior to create /dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently switches /dev to devtmpfs. In VFS1, this is necessary to get device files like /dev/null at all, since VFS1 doesn't support real device special files, only what is hardcoded in devfs. VFS2 does support device special files, but using devtmpfs is the easiest way to get pre-created files in /dev.) runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1, this appears to be to avoid introducing a submount overlay for /dev, and is mostly fine since the typical mode for the /dev/shm mount is ~consistent with the mode of the /dev/shm directory provided by devfs (modulo the sticky bit). In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs' /dev/shm mode is correct for the mount point but not the mount. So turn off this behavior for VFS2. After this change: ``` $ docker run --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwt 2 root root 40 Mar 18 00:16 . drwxr-xr-x 5 root root 360 Mar 18 00:16 .. $ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwx 1 root root 0 Mar 18 00:16 . dr-xr-xr-x 1 root root 0 Mar 18 00:16 .. $ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwt 2 root root 40 Mar 18 00:16 . drwxr-xr-x 5 root root 320 Mar 18 00:16 .. ``` Fixes #5687 PiperOrigin-RevId: 363699385
2021-03-06[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in unix package: - syscall.SysProcIDMap - syscall.Credential Updates #214 PiperOrigin-RevId: 361381490