gvisor - Container Runtime Sandbox

Age	Commit message	Author
2021-03-24	Add POLLRDNORM/POLLWRNORM support.	Bhasker Hariharan
	On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc., it felt cleaner to just clean up the call sites to notify for both events. This is what Linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
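	A minimal sketch of the idea, not the actual gVisor code: call sites notify with a combined mask, so a waiter interested in either POLLIN or POLLRDNORM is woken. The `EventMask` type, the `ReadableEvents`/`WritableEvents` names, and the `Ready` helper are assumptions made for illustration; the bit values are the generic Linux ones.

```go
package main

import "fmt"

// EventMask is a hypothetical bitmask type used only in this sketch.
type EventMask uint16

// Bit values as defined by the generic Linux poll.h.
const (
	POLLIN     EventMask = 0x0001
	POLLOUT    EventMask = 0x0004
	POLLRDNORM EventMask = 0x0040
	POLLWRNORM EventMask = 0x0100
)

// Combined masks that call sites would pass when notifying waiters.
// The names are assumptions for this sketch.
const (
	ReadableEvents = POLLIN | POLLRDNORM
	WritableEvents = POLLOUT | POLLWRNORM
)

// Ready returns the subset of notified events that a waiter asked for.
func Ready(notified, interested EventMask) EventMask {
	return notified & interested
}

func main() {
	// A waiter that only asked for POLLRDNORM is still woken by a
	// "readable" notification, because the call site notifies both bits.
	fmt.Printf("%#04x\n", Ready(ReadableEvents, POLLRDNORM))
	fmt.Printf("%#04x\n", Ready(WritableEvents, POLLOUT))
}
```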
2021-03-24	Fix data race in fdbased when accessing fanoutID.	Bhasker Hariharan
	PiperOrigin-RevId: 364859173
2021-03-24	Unexpose immutable fields in stack.Route	Nick Brown
	This change sets the inner `routeInfo` struct to be a named private member and replaces direct access with access through getters. Note that direct access to the fields of `routeInfo` is still possible through the `RouteInfo` struct. Fixes #4902 PiperOrigin-RevId: 364822872
2021-03-23	Merge pull request #5677 from avagin:kvm-mmio	gVisor bot
	PiperOrigin-RevId: 364728696
2021-03-23	Move the code that manages floating-point state to a separate package	Andrei Vagin
	This change is inspired by Adin's cl/355256448. PiperOrigin-RevId: 364695931
2021-03-23	Add --file-access-mounts flag	Fabricio Voznika
	--file-access-mounts flag is similar to --file-access, but controls non-root mounts that were previously mounted in shared mode only. This gives more flexibility to control how mounts are shared within a container. PiperOrigin-RevId: 364669882
2021-03-23	setgid directory support in goferfs	Kevin Krakauer
	Also adds support for clearing the setuid bit when appropriate (writing, truncating, changing size, changing UID, or changing GID). VFS2 only. PiperOrigin-RevId: 364661835
2021-03-23	Skip checklocks analysis for stateify generated code.	Rahat Mahmood
	Stateify methods are always called without holding the appropriate locks. The system is paused and we know there will be no mutations when we call Save/Load, so this is perfectly safe. However, checklocks can't know about this, and it will always complain. Mark stateify generated methods that touch struct fields as "checklocksignore" to avoid this. PiperOrigin-RevId: 364610241
2021-03-23	Allow FSETXATTR/FGETXATTR host calls for Verity	Chong Cai
	These host calls are needed for Verity fs to generate/verify hashes. PiperOrigin-RevId: 364598180
2021-03-23	Use constant (TestInitialSequenceNumber) instead of integer (789) in tests.	Nayana Bidari
	PiperOrigin-RevId: 364596526
2021-03-23	Split fio read/write and randread/randwrite operations	Zach Koopmans
	The fio benchmark was changed to a fixed-size read/write amount because the timed benchmark was overwhelming machine memory on tmpfs mounts. Now rand(read|write) operations are prohibitively long, leading to timeouts. Split the benchmarks as they were in python bm-tools: the read/write as fixed-size (1GB) and the rand(read|write) as timed operations (15s). PiperOrigin-RevId: 364584436
2021-03-23	Explicitly allow martian loopback packets	Ghanan Gowripalan
	...instead of opting out of them. Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. A previous change provided the facility to deny these martian loopback packets, but this change requires clients to opt in to accepting them, since martian loopback packets are not meant to be accepted, as per RFC 1122 section 3.2.1.3.g: (g) { 127, <any> } Internal host loopback address. Addresses of this form MUST NOT appear outside a host. PiperOrigin-RevId: 364581174
2021-03-22	Update apt repository to limit to supported architectures.	Adin Scannell
	Fixes #5703 PiperOrigin-RevId: 364492235
2021-03-22	[lisa] Support dynamic types for all types.	Ayush Ranjan
	We were only supporting dynamic struct types. With this change, users can make any type dynamic. The tool (correctly) blindly just generates the remaining methods needed to implement Marshallable using the 3 methods defined by the user on the dynamic type. This is helpful in situations like: `type StringArray []string`. Added a test for such a use case. PiperOrigin-RevId: 364463164
2021-03-22	Fix logs for packetimpact tests cleanup	Zeling Feng
	- Don't clean up containers in Network.Cleanup, otherwise containers will be killed and removed several times. - Don't set AutoRemove for containers. This will prevent the confusing 'removal already in progress' messages. Fixes #3795 PiperOrigin-RevId: 364404414
2021-03-22	Return tcpip.Error from (*Stack).GetMainNICAddress	Ghanan Gowripalan
	PiperOrigin-RevId: 364381970
2021-03-22	Emit comment about build tags in gomarshal generated files.	Rahat Mahmood
	This may be useful for tracking down where build tags come from and understanding tag import issues in generated files. PiperOrigin-RevId: 364374931
2021-03-22	Avoid calling sync on each write in writethrough mode.	Nicolas Lacasse
	PiperOrigin-RevId: 364370595
2021-03-22	Fix and merge tcp_{outside_the_window,tcp_unacc_seq_ack}_closing	Zeling Feng
	The tests were not using the correct windowSize so the testing segments were actually within the window for seqNumOffset=0 tests. The issue is already fixed by #5674. PiperOrigin-RevId: 364252630
2021-03-18	Translate syserror when validating partial IO errors	Fabricio Voznika
	syserror allows packages to register translators for errors. These translators should be called prior to checking if the error is valid, otherwise it may not account for possible errors that can be returned from different packages, e.g. safecopy.BusError => syserror.EFAULT. Second attempt, it passes tests now :-) PiperOrigin-RevId: 363714508
2021-03-18	Address post submit comments for fs benchmarks.	Zach Koopmans
	Also, drop fio total reads/writes to 1GB as 10GB is prohibitively slow. PiperOrigin-RevId: 363714060
2021-03-18	Skip /dev submount hack on VFS2.	Jamie Liu
	containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:

```
"mounts": [
  ...
  {
    "destination": "/dev",
    "type": "tmpfs",
    "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
    "options": [
      "nosuid",
      "strictatime",
      "mode=755",
      "size=65536k"
    ]
  },
  ...
  {
    "destination": "/dev/shm",
    "type": "tmpfs",
    "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
    "options": [
      "nosuid",
      "noexec",
      "nodev",
      "mode=1777",
      "size=67108864"
    ]
  },
  ...
```

	(This is mostly consistent with how Linux is usually configured, except that /dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer implements OCI-runtime-spec-undocumented behavior to create /dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently switches /dev to devtmpfs. In VFS1, this is necessary to get device files like /dev/null at all, since VFS1 doesn't support real device special files, only what is hardcoded in devfs. VFS2 does support device special files, but using devtmpfs is the easiest way to get pre-created files in /dev.)

	runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1, this appears to be to avoid introducing a submount overlay for /dev, and is mostly fine since the typical mode for the /dev/shm mount is ~consistent with the mode of the /dev/shm directory provided by devfs (modulo the sticky bit). In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs' /dev/shm mode is correct for the mount point but not the mount. So turn off this behavior for VFS2. After this change:

```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..
$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```

	Fixes #5687 PiperOrigin-RevId: 363699385
2021-03-17	Do not use martian loopback packets in tests	Ghanan Gowripalan
	Transport demuxer and UDP tests should not use a loopback address as the source address for packets injected into the stack as martian loopback packets will be dropped in a later change. PiperOrigin-RevId: 363479681
2021-03-17	Drop loopback traffic from outside of the stack	Ghanan Gowripalan
	Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. Test: integration_test.TestExternalLoopbackTraffic PiperOrigin-RevId: 363461242
2021-03-16	kvm: prefault a floating point state before restoring it	Andrei Vagin
	If physical pages of a memory region are not mapped yet, the kernel will trigger KVM_EXIT_MMIO and we will map physical pages in bluepillHandler(). The instruction that triggered the fault will not be re-executed; it will be emulated in the kernel, but the kernel can't emulate complex instructions like xsave and xrstor. We can touch the memory with simple instructions to work around this problem.
2021-03-16	Fix tcp_fin_retransmission_netstack_test	Zeling Feng
	Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT unconditionally. Fixing the state machine will give us back the retransmission of FIN. PiperOrigin-RevId: 363301883
2021-03-16	Fix a race with synRcvdCount and accept	Mithun Iyer
	There is a race in handling new incoming connections on a listening endpoint that causes the endpoint to reply to more incoming SYNs than what is permitted by the listen backlog. The race occurs when there is a successful passive connection handshake and the synRcvdCount counter is decremented, followed by the endpoint being delivered to the accept queue. In the window of time between synRcvdCount decrementing and the endpoint being enqueued for accept, new incoming SYNs can be handled without honoring the listen backlog value, as the backlog could be perceived as not full. Fixes #5637 PiperOrigin-RevId: 363279372
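	One way to close such a window, sketched under simplified assumptions: the counter decrement and the enqueue happen under the same lock that the SYN path takes for its backlog check, so a new SYN never sees the intermediate state. The `listener` type and its accounting are hypothetical and are not the netstack implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a hypothetical, simplified listening endpoint.
type listener struct {
	mu          sync.Mutex
	backlog     int
	synRcvd     int   // handshakes in progress
	acceptQueue []int // completed connections awaiting accept
}

// onSYN reports whether a new SYN may be answered without exceeding
// the backlog.
func (l *listener) onSYN() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.synRcvd+len(l.acceptQueue) >= l.backlog {
		return false
	}
	l.synRcvd++
	return true
}

// onHandshakeDone moves a connection from "handshake in progress" to the
// accept queue. Both updates happen in one critical section, so onSYN can
// never observe the counter decremented but the queue not yet updated.
func (l *listener) onHandshakeDone(id int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.synRcvd--
	l.acceptQueue = append(l.acceptQueue, id)
}

func main() {
	l := &listener{backlog: 1}
	fmt.Println(l.onSYN()) // first SYN fits in the backlog
	l.onHandshakeDone(1)
	fmt.Println(l.onSYN()) // backlog is still full: connection not accepted yet
}
```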
2021-03-16	setgid directory support in overlayfs	Kevin Krakauer
	PiperOrigin-RevId: 363276495
2021-03-16	Unexport methods on NDPOption	Ghanan Gowripalan
	They are not used outside of the header package. PiperOrigin-RevId: 363237708
2021-03-16	Detect looped-back NDP DAD messages	Ghanan Gowripalan
	...as per RFC 7527. If a looped-back DAD message is received, do not fail DAD since our own DAD message does not indicate that a neighbor has the address assigned. Test: ndp_test.TestDADResolveLoopback PiperOrigin-RevId: 363224288
2021-03-16	Do not call into Stack from LinkAddressRequest	Ghanan Gowripalan
	Calling into the stack from LinkAddressRequest is not needed as we already have a reference to the network endpoint (IPv6) or network interface (IPv4/ARP). PiperOrigin-RevId: 363213973
2021-03-15	Turn sys_thread constants into variables.	Etienne Perot
	PiperOrigin-RevId: 363092268
2021-03-15	Move `MaxIovs` back to a variable in `iovec.go`.	Etienne Perot
	PiperOrigin-RevId: 363091954
2021-03-15	Deflake proc_test_native	Fabricio Voznika
	Terminating tasks from other tests can mess up the task list of the current test. Tests were changed to look for added/removed tasks, ignoring other tasks that may exist while the test is running. PiperOrigin-RevId: 363084261
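	The "look for added/removed tasks" approach could look roughly like the sketch below, which compares two snapshots instead of asserting on the full list. `diffTasks` is a hypothetical helper, not the one used in the test.

```go
package main

import "fmt"

// diffTasks returns the task IDs that appear only in after (added) and
// only in before (removed). Tasks present in both snapshots are ignored.
func diffTasks(before, after []int) (added, removed []int) {
	inBefore := make(map[int]bool, len(before))
	for _, t := range before {
		inBefore[t] = true
	}
	inAfter := make(map[int]bool, len(after))
	for _, t := range after {
		inAfter[t] = true
		if !inBefore[t] {
			added = append(added, t)
		}
	}
	for _, t := range before {
		if !inAfter[t] {
			removed = append(removed, t)
		}
	}
	return added, removed
}

func main() {
	before := []int{1, 7, 9}
	after := []int{1, 9, 12}
	added, removed := diffTasks(before, after)
	fmt.Println(added, removed)
}
```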
2021-03-15	Make netstack (//pkg/tcpip) buildable for 32 bit	Kevin Krakauer
	Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyway. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-15	[op] Make gofer client handle return partial write length when err is nil.	Ayush Ranjan
	If there was a partial write (when not using the host FD) which did not generate an error, we were incorrectly returning the number of bytes attempted to write instead of the number of bytes actually written. PiperOrigin-RevId: 363058989
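	In sketch form, the fix amounts to returning the count reported by the backend rather than the length of the caller's buffer. The `remoteFile` interface and `shortFile` type are hypothetical stand-ins for the gofer RPC path.

```go
package main

import "fmt"

// remoteFile is a hypothetical stand-in for a write RPC to the gofer.
type remoteFile interface {
	WriteAt(p []byte, off int64) (int, error)
}

// shortFile accepts at most max bytes per call and reports no error,
// which models a partial write with a nil error.
type shortFile struct{ max int }

func (f shortFile) WriteAt(p []byte, off int64) (int, error) {
	if len(p) > f.max {
		return f.max, nil
	}
	return len(p), nil
}

// writeOnce returns the number of bytes the backend actually wrote.
// Returning len(p) here would be the bug described above.
func writeOnce(f remoteFile, p []byte, off int64) (int, error) {
	n, err := f.WriteAt(p, off)
	if err != nil {
		return n, err
	}
	return n, nil
}

func main() {
	n, err := writeOnce(shortFile{max: 4}, []byte("hello world"), 0)
	fmt.Println(n, err)
}
```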
2021-03-15	Merge pull request #5618 from iangudger:unix-transport-race	gVisor bot
	PiperOrigin-RevId: 362999220
2021-03-15	Packetimpact test for ACK to OTW Seq segments behavior in CLOSING	Zeling Feng
	TCP, in CLOSING state, MUST send an ACK with next expected SEQ number after receiving any segment with OTW SEQ number and remain in the same state. While I am here, I also changed shutdown to behave the same as other calls in posix_server. PiperOrigin-RevId: 362976955
2021-03-14	Fix race in tcp_retransmits_test	Mithun Iyer
	The test queries for RTO via TCP_INFO and applies that to the rest of the test. The RTO is estimated by processing incoming ACK. There is a race in the test where we may query for RTO before the incoming ACK was processed. Fix the race in the test by letting the DUT complete a payload receive, thus estimating RTO before proceeding to query the RTO. Bump up the time correction to reduce flakes. PiperOrigin-RevId: 362865904
2021-03-13	[perf] Run benchmarks with VFS2.	Ayush Ranjan
	The run-benchmark target would run the benchmark with VFS1. PiperOrigin-RevId: 362754188
2021-03-12	Add escapes to newlines in syzkaller instructions.	Nicolas Lacasse
	So they can be copy-pasted. PiperOrigin-RevId: 362605833
2021-03-12	Merge pull request #5663 from avagin:apt-repo	gVisor bot
	PiperOrigin-RevId: 362545342
2021-03-11	Support ICMP echo sockets on Linux DUT	Zeling Feng
	By default net.ipv4.ping_group_range is set to "1 0" and no one (not even root) can create an ICMP socket. Setting it to "0 0" allows root, which is the user we run as inside the container, to create ICMP sockets for packetimpact tests. PiperOrigin-RevId: 362454201
2021-03-11	make/release: Sign a package only if it isn't signed yet.	Andrei Vagin
	We can generate more than one apt repo for the same package. If we sign a package again, its file will change and all previously generated hashes will become invalid.
2021-03-11	Remove special casing of socket stress test	Kevin Krakauer
	With /proc/sys/net/ipv4/ip_local_port_range implemented, the socket stress test runs in a more normal time and doesn't need to sacrifice coverage to prevent timeouts. PiperOrigin-RevId: 362443366
2021-03-11	improve readability of ports package	Kevin Krakauer
	Lots of small changes: - simplify package API via Reservation type - rename some single-letter variable names that were hard to follow - rename some types PiperOrigin-RevId: 362442366
2021-03-11	fusefs: Implement default_permissions and allow_other mount options.	Rahat Mahmood
	By default, fusefs defers node permission checks to the server. The default_permissions mount option enables the usual unix permission checks based on the node owner and mode bits. Previously fusefs was incorrectly checking permissions unconditionally. Additionally, fusefs should restrict filesystem access to processes started by the mount owner to prevent the fuse daemon from gaining privilege over other processes. The allow_other mount option overrides this behaviour. Previously fusefs was incorrectly skipping this check. Updates #3229 PiperOrigin-RevId: 362419092
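	The interaction of the two options can be sketched as below. This is a deliberately simplified model: group bits and root overrides are ignored, and `mountOpts`, `inode` and `checkAccess` are hypothetical names, not the fusefs implementation.

```go
package main

import "fmt"

// mountOpts models the two mount options discussed above.
type mountOpts struct {
	ownerUID           uint32
	defaultPermissions bool
	allowOther         bool
}

// inode carries only what this sketch needs.
type inode struct {
	uid  uint32
	mode uint32 // permission bits, e.g. 0640
}

// checkAccess reports whether the kernel side lets the request through.
// want is a bitmask of 4 (read), 2 (write), 1 (execute).
func checkAccess(m mountOpts, callerUID uint32, ino inode, want uint32) bool {
	// Without allow_other, only the mount owner may use the filesystem.
	if !m.allowOther && callerUID != m.ownerUID {
		return false
	}
	// Without default_permissions, the decision is left to the server.
	if !m.defaultPermissions {
		return true
	}
	// Simplified unix check: owner bits or other bits, no group handling.
	perms := ino.mode & 7
	if callerUID == ino.uid {
		perms = (ino.mode >> 6) & 7
	}
	return perms&want == want
}

func main() {
	ino := inode{uid: 1000, mode: 0640}
	strict := mountOpts{ownerUID: 1000, defaultPermissions: true, allowOther: true}
	fmt.Println(checkAccess(strict, 1000, ino, 4))
	fmt.Println(checkAccess(strict, 1001, ino, 4))
	fmt.Println(checkAccess(mountOpts{ownerUID: 1000}, 1001, ino, 4))
}
```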
2021-03-11	Implement Merkle tree generate tool binary	Chong Cai
	This binary is used to recursively enable and generate Merkle tree files for all files and directories in a file system from inside a gVisor sandbox. PiperOrigin-RevId: 362418770
2021-03-11	Merge pull request #5654 from sethvargo:sethvargo/cancel	gVisor bot
	PiperOrigin-RevId: 362416183
2021-03-11	Clear Merkle tree files in RuntimeEnable mode	Chong Cai
	The Merkle tree files need to be cleared before enabling to avoid redundant content. PiperOrigin-RevId: 362409591