gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-05-13	Enable overlayfs_stale_read by default for runsc.	Jamie Liu
	Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:
	- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."
	- fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.
	- Experimentally on Linux 5.2:
	```
	$ cat mmap_cat_page.c
	#include <err.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(int argc, char **argv) {
	  if (argc < 2) {
	    errx(1, "syntax: %s [FILE]", argv[0]);
	  }
	  const int fd = open(argv[1], O_RDONLY);
	  if (fd < 0) {
	    err(1, "open(%s)", argv[1]);
	  }
	  const size_t page_size = sysconf(_SC_PAGE_SIZE);
	  void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
	  if (page == MAP_FAILED) {
	    err(1, "mmap");
	  }
	  for (;;) {
	    write(1, page, strnlen(page, page_size));
	    if (getc(stdin) == EOF) {
	      break;
	    }
	  }
	  return 0;
	}
	$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
	$ mkdir lowerdir upperdir workdir overlaydir
	$ echo old > lowerdir/file
	$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
	$ ./mmap_cat_page overlaydir/file
	old
	^Z
	[1]+ Stopped ./mmap_cat_page overlaydir/file
	$ echo new > overlaydir/file
	$ cat overlaydir/file
	new
	$ fg
	./mmap_cat_page overlaydir/file
	old
	```
	Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.
	New test:
	- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
	- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
	PiperOrigin-RevId: 311361267
2020-05-13	Replace test_runner.sh bash script with Go.	Ian Gudger
	PiperOrigin-RevId: 311285868
2020-05-12	Merge pull request #2678 from nybidari:iptables	gVisor bot
	PiperOrigin-RevId: 311203776
2020-05-12	Don't allow rename across different gofer or tmpfs mounts.	Nicolas Lacasse
	Fixes #2651. PiperOrigin-RevId: 311193661
2020-05-12	Merge pull request #2671 from kevinGC:skip-output	gVisor bot
	PiperOrigin-RevId: 311181084
2020-05-12	Don't call kernel.Task.Block() from netstack.SocketOperations.Write().	Jamie Liu
	kernel.Task.Block() requires that the caller is running on the task goroutine. netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call kernel.Task.Block() even if it's not running on the task goroutine. Stop doing that. PiperOrigin-RevId: 311178335
2020-05-12	iptables: support gid match for owner matching.	Nayana Bidari
	- Added support for matching gid owner and invert flag for uid and gid.
	  $ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT
	  $ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT
	  $ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP
	- Added tests for uid, gid and invert flags.
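The uid/gid matching with invert flags described in this commit can be sketched roughly as follows. This is only an illustration: `OwnerMatcher` and its fields are hypothetical names, not the types gVisor uses.

```go
package main

// OwnerMatcher is a hypothetical, simplified stand-in for an iptables
// "owner" match. It is not the type used by gVisor.
type OwnerMatcher struct {
	MatchUID, MatchGID   bool   // whether --uid-owner / --gid-owner was given
	InvertUID, InvertGID bool   // whether the option was preceded by "!"
	UID, GID             uint32 // IDs named in the rule
}

// Match reports whether a packet owned by (uid, gid) satisfies the rule.
// Each enabled check passes when "IDs are equal" differs from its invert flag.
func (m OwnerMatcher) Match(uid, gid uint32) bool {
	if m.MatchUID && (uid == m.UID) == m.InvertUID {
		return false
	}
	if m.MatchGID && (gid == m.GID) == m.InvertGID {
		return false
	}
	return true
}
```

Under these assumptions, a rule such as `-m owner ! --uid-owner root` would correspond to `OwnerMatcher{MatchUID: true, InvertUID: true, UID: 0}`, which matches every packet not owned by uid 0.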
2020-05-12	Merge pull request #2664 from lubinszARM:pr_sigfp	gVisor bot
	PiperOrigin-RevId: 311153824
2020-05-11	Internal change.	Jamie Liu
	PiperOrigin-RevId: 311046755
2020-05-11	iptables: check for truly unconditional rules	Kevin Krakauer
	We weren't properly checking whether the inserted default rule was unconditional.
2020-05-11	Add fpsimd support in sigreturn on Arm64	Bin Lu
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-11	Add fsimpl/gofer.InternalFilesystemOptions.OpenSocketsByConnecting.	Jamie Liu
	PiperOrigin-RevId: 311014995
2020-05-11	Automated rollback of changelist 310417191	Bhasker Hariharan
	PiperOrigin-RevId: 310963404
2020-05-11	Fix view.ToVectorisedView().	Bhasker Hariharan
	view.ToVectorisedView() now just returns an empty vectorised view if the view is of zero length. Earlier it would return a VectorisedView of zero length but with 1 empty view. This has been a source of bugs as lower layers don't expect zero length views in VectorisedViews. VectorisedView.AppendView() now is a no-op if the view being appended is of zero length. Fixes #2658 PiperOrigin-RevId: 310942269
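A minimal sketch of the behavior described above, using simplified stand-in types. The real `View` and `VectorisedView` have more fields and methods; only the zero-length handling is modeled here.

```go
package main

// View and VectorisedView are simplified stand-ins for the netstack
// buffer types, reduced to what is needed to show the zero-length rule.
type View []byte

type VectorisedView struct {
	views []View
	size  int
}

// ToVectorisedView returns an empty VectorisedView for a zero-length View,
// rather than a VectorisedView holding one empty View.
func (v View) ToVectorisedView() VectorisedView {
	if len(v) == 0 {
		return VectorisedView{}
	}
	return VectorisedView{views: []View{v}, size: len(v)}
}

// AppendView is a no-op when the appended View is of zero length.
func (vv *VectorisedView) AppendView(v View) {
	if len(v) == 0 {
		return
	}
	vv.views = append(vv.views, v)
	vv.size += len(v)
}

// Size returns the total number of bytes held.
func (vv *VectorisedView) Size() int { return vv.size }

// Views returns the underlying list of views.
func (vv *VectorisedView) Views() []View { return vv.views }
```

With this shape, lower layers that iterate over `Views()` never encounter a zero-length entry.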
2020-05-10	Stop avoiding preadv2 and pwritev2, and add them to the filters.	Nicolas Lacasse
	Some code paths needed these syscalls anyway, so they should be included in the filters. Given that we depend on these syscalls in some cases, there's no real reason to avoid them any more. PiperOrigin-RevId: 310829126
2020-05-08	iptables - filter packets using outgoing interface.	gVisor bot
	Enables commands with -o (--out-interface) for iptables rules. $ iptables -A OUTPUT -o eth0 -j ACCEPT PiperOrigin-RevId: 310642286
2020-05-08	Pass flags to fsimpl/host.inode.open().	Jamie Liu
	This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]") effective, and it prevents imported pipes/sockets/character devices from being opened with O_NONBLOCK unconditionally (because the underlying host FD was set to non-blocking in ImportFD()). PiperOrigin-RevId: 310596062
2020-05-08	Send ACK to OTW SEQs/unacc ACKs in CLOSE_WAIT	Zeling Feng
	This fixed the corresponding packetimpact test. PiperOrigin-RevId: 310593470
2020-05-07	Fix ARM64 build.	Adin Scannell
	The common syscall definitions mean that ARM64-exclusive files need stubs in the ARM64 build. PiperOrigin-RevId: 310446698
2020-05-07	Capture range variable in parallel subtests	Sam Balana
	Only the last test was running before since the goroutines won't be executed until after this loop. I added t.Log(test.name) and this was the result:
	TestListenNoAcceptNonUnicastV4/SourceUnspecified: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/DestUnspecified: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/DestOtherMulticast: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/SourceBroadcast: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/DestOurMulticast: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/DestBroadcast: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/SourceOtherMulticast: DestOtherMulticast
	TestListenNoAcceptNonUnicastV4/SourceOurMulticast: DestOtherMulticast
	https://github.com/golang/go/wiki/TableDrivenTests#parallel-testing
	PiperOrigin-RevId: 310440629
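The fix follows the pattern from the linked wiki page: copy the range variable inside the loop before a closure captures it. The sketch below uses plain goroutines instead of t.Run/t.Parallel so that it stays self-contained; `runAll` is a hypothetical helper, not code from this commit. Since Go 1.22 each loop iteration gets its own variable, so the explicit copy is only needed on older toolchains.

```go
package main

import "sync"

// runAll starts one goroutine per name and records the name each goroutine
// observed. The order of the result is not deterministic.
func runAll(names []string) []string {
	var (
		mu   sync.Mutex
		seen []string
		wg   sync.WaitGroup
	)
	for _, name := range names {
		// Per-iteration copy. Before Go 1.22, omitting this line meant every
		// closure shared one variable and could all observe the last name.
		name := name
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			seen = append(seen, name)
		}()
	}
	wg.Wait()
	return seen
}
```

In a table-driven test the same copy (`test := test`) goes immediately before the `t.Run` call whose function body calls `t.Parallel()`.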
2020-05-07	Allocate device numbers for VFS2 filesystems.	Jamie Liu
	Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006
2020-05-07	Automated rollback of changelist 309339316	Bhasker Hariharan
	PiperOrigin-RevId: 310417191
2020-05-07	Move pkg/sentry/vfs/{eventfd,timerfd} to new packages in pkg/sentry/fsimpl.	Nicolas Lacasse
	They don't depend on anything in VFS2, so they should be their own packages. PiperOrigin-RevId: 310416807
2020-05-07	Port signalfd to vfs2.	Nicolas Lacasse
	PiperOrigin-RevId: 310404113
2020-05-07	Fix bugs in SACK recovery.	Bhasker Hariharan
	Every call to sender.NextSeg does not need to iterate from the front of the writeList as in a given recovery episode we can cache the last nextSeg returned. There cannot be a lower sequenced segment that matches the next call to NextSeg as otherwise we would have returned that instead in the previous call. This fixes the issue of excessive CPU usage w/ large send buffers where we spend a lot of time iterating from the front of the list on every NextSeg invocation.
	Further the following other bugs were also fixed:
	* Iteration of segments never sent in NextSeg() when looking for segments for retransmission that match step1/3/4 of the NextSeg algorithm
	* Correctly setting rescueRxt only if the rescue segment was actually sent.
	* Correctly initializing rescueRxt/highRxt when entering SACK recovery.
	* Correctly re-arming the timer only on retransmissions when SACK is in use and not for every segment being sent as it was being done before.
	* Copy over xmitTime and xmitCount on segment clone.
	* Move writeNext along when skipping over SACKED segments. This is required to prevent spurious retransmissions where we end up retransmitting data that was never lost.
	PiperOrigin-RevId: 310387671
2020-05-07	Update privateunixsocket TODOs.	Dean Deng
	Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474
2020-05-07	Merge pull request #2639 from kevinGC:ipv4-frag-reassembly-test	gVisor bot
	PiperOrigin-RevId: 310380911
2020-05-07	Remove outdated TODO for VFS2 AccessAt.	Dean Deng
	Fixes #1965. PiperOrigin-RevId: 310380433
2020-05-06	Add basic incoming ipv4 fragment tests	Kevin Krakauer
	Based on ipv6's TestReceiveIPv6Fragments.
2020-05-06	Merge pull request #2570 from lubinszARM:pr_clean	gVisor bot
	PiperOrigin-RevId: 310259686
2020-05-06	Remove vfs.FileDescriptionOptions.InvalidWrite.	Jamie Liu
	Compare: https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431 PiperOrigin-RevId: 310246908
2020-05-06	Do not assume no DHCPv6 configurations	Ghanan Gowripalan
	Do not assume that networks need any DHCPv6 configurations. Instead, notify the NDP dispatcher in response to the first NDP RA's DHCPv6 flags, even if the flags indicate no DHCPv6 configurations are available. PiperOrigin-RevId: 310245068
2020-05-06	Fix runsc syscall documentation generation.	Adin Scannell
	We can register any number of tables with any number of architectures, and need not limit the definitions to the architecture in question. This allows runsc to generate documentation for all architectures simultaneously. Similarly, this simplifies the VFSv2 patching process. PiperOrigin-RevId: 310224827
2020-05-06	sniffer: fix accidental logging of good packets as bad	Kevin Krakauer
	We need to check vv.Size() instead of len(tcp), as tcp will always be 20 bytes long. PiperOrigin-RevId: 310218351
2020-05-06	Add maximum memory limit.	Nicolas Lacasse
	PiperOrigin-RevId: 310179277
2020-05-05	Internal change.	gVisor bot
	PiperOrigin-RevId: 310057834
2020-05-05	Support TCP zero window probes.	Mithun Iyer
	As per RFC 1122 4.2.2.17, when the remote advertises zero receive window, the sender needs to probe for the window-size to become non-zero starting from the next retransmission interval. The TCP connection needs to be kept open as long as the remote is acknowledging the zero window probes. We reuse the retransmission timers to support this. Fixes #1644 PiperOrigin-RevId: 310021575
2020-05-05	Update vfs2 socket TODOs.	Dean Deng
	Three updates:
	- Mark all vfs2 socket syscalls as supported.
	- Use the same dev number and ino number generator for all types of sockets, unlike in VFS1.
	- Do not use host fd for hostinet metadata.
	Fixes #1476, #1478, #1484, #1485, #2017. PiperOrigin-RevId: 309994579
2020-05-05	Update comments for synthetic gofer files in vfs2.	Dean Deng
	PiperOrigin-RevId: 309966538
2020-05-05	Return correct name for imported host files	Fabricio Voznika
	Implement PrependPath() in host.filesystem to correctly format name for host files. Updates #1672 PiperOrigin-RevId: 309959135
2020-05-05	Translate p9.NoUID/GID to OverflowUID/GID.	Jamie Liu
	p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in particular, using it for file ownership causes capabilities to be ineffective since file capabilities require that the file's KUID and KGID are mapped into the capability holder's user namespace [1], and auth.NoID is not mapped into any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the unlikely case that an application actually using the overflow KUID/KGID attempts an operation that is consequently permitted by client permission checks, the remote operation will still fail with EPERM. Since this changes the VFS2 gofer client to no longer ignore the invalid IDs entirely, this CL both permits and requires that we change synthetic mount point creation to use root credentials. [1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions. PiperOrigin-RevId: 309856455
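The translation described above amounts to a small mapping function. The sketch below is illustrative only: `translateID` is a hypothetical name, `NoID` mirrors the `uint32(-1)` value given in the commit message, and `OverflowID` assumes 65534, the usual Linux default overflow ID.

```go
package main

// These constants are assumptions for illustration. NoID mirrors the
// uint32(-1) sentinel from the commit message; OverflowID uses 65534,
// the usual Linux default overflow UID/GID.
const (
	NoID       uint32 = ^uint32(0)
	OverflowID uint32 = 65534
)

// translateID maps the "no ID" sentinel to the overflow ID and leaves
// every other ID unchanged.
func translateID(id uint32) uint32 {
	if id == NoID {
		return OverflowID
	}
	return id
}
```

The point of the mapping is that the result is always an ID that can be mapped into a user namespace, which the sentinel value cannot.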
2020-05-04	Port eventfd to VFS2.	Nicolas Lacasse
	And move sys_timerfd.go to just timerfd.go for consistency. Updates #1475. PiperOrigin-RevId: 309835029
2020-05-04	Remove kernfs.Filesystem cast from GenericDirectoryFD	Fabricio Voznika
	This allows for kernfs.Filesystem to be overridden by different implementations. Updates #1672 PiperOrigin-RevId: 309809321
2020-05-04	Merge pull request #2275 from nybidari:iptables	gVisor bot
	PiperOrigin-RevId: 309783486
2020-05-04	Add TTY support on VFS2 to runsc	Fabricio Voznika
	Updates #1623, #1487 PiperOrigin-RevId: 309777922
2020-05-04	Fix flaky monotonic time.	Adin Scannell
	This change ensures that even platforms with some TSC issues (e.g. KVM) can get reliable monotonic time by applying a lower bound on each read. PiperOrigin-RevId: 309773801
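One way to apply such a lower bound is to remember the highest value handed out so far and never return anything smaller. The following is a sketch under that assumption, not the code from this change; `monotonicClock` and its `read` field are hypothetical.

```go
package main

import "sync/atomic"

// monotonicClock wraps a time source that may occasionally step backwards.
// It is a hypothetical illustration, not gVisor's implementation.
type monotonicClock struct {
	last int64        // highest value returned so far, in nanoseconds
	read func() int64 // underlying, possibly non-monotonic, source
}

// Now returns the underlying reading, clamped so that it is never lower
// than any value previously returned by Now.
func (c *monotonicClock) Now() int64 {
	now := c.read()
	for {
		last := atomic.LoadInt64(&c.last)
		if now <= last {
			return last // the source went backwards or stalled: reuse the bound
		}
		if atomic.CompareAndSwapInt64(&c.last, last, now) {
			return now
		}
		// Another caller advanced the bound concurrently; retry.
	}
}
```

The compare-and-swap loop keeps the bound correct when several goroutines read the clock at once, at the cost of returning a repeated value whenever the source regresses.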
2020-05-01	Support for connection tracking of TCP packets.	Nayana Bidari
	Connection tracking is used to track packets in prerouting and output hooks of iptables. The NAT rules modify the tuples in connections. The connection tracking code modifies the packets by looking at the modified tuples.
2020-05-01	Regenerate SLAAC address on conflicts with the NIC	Ghanan Gowripalan
	If the NIC already has a generated SLAAC address, regenerate a new SLAAC address until one is generated that does not conflict with the NIC's existing addresses, up to a maximum of 10 attempts. This applies to both stable and temporary SLAAC addresses. Test: stack_test.TestMixedSLAACAddrConflictRegen PiperOrigin-RevId: 309495628
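The bounded retry described above can be sketched as a simple loop. The names here are hypothetical, and the limit of 10 is interpreted as ten calls to the generator in total, which is an assumption about how the commit counts attempts.

```go
package main

// maxAttempts follows the limit of 10 stated in the commit message.
const maxAttempts = 10

// generateUnique calls gen with attempt numbers 0..maxAttempts-1 until it
// yields an address that is not already in existing. It reports false if
// every attempt conflicted. gen stands in for the address generator.
func generateUnique(existing map[string]bool, gen func(attempt int) string) (string, bool) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		addr := gen(attempt)
		if !existing[addr] {
			return addr, true
		}
	}
	return "", false
}
```

Passing the attempt number to the generator lets it produce a different candidate on each retry, for example by mixing the counter into the input of a stable address derivation.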
2020-05-01	Automated rollback of changelist 308674219	Kevin Krakauer
	PiperOrigin-RevId: 309491861
2020-05-01	Port netstack, hostinet, and netlink sockets to VFS2.	Dean Deng
	All three follow the same pattern:
	1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can be shared with VFS2.
	2. Create a FileDescriptionImpl with the corresponding socket operations, rewriting the few that cannot be shared with VFS1.
	3. Set up a VFS2 socket provider that creates a socket by setting up a dentry in the global Kernel.socketMount and connecting it with a new FileDescription.
	This mostly completes the work for porting sockets to VFS2, and many syscall tests can be enabled as a result. There are several networking-related syscall tests that are still not passing:
	1. net gofer tests
	2. socketpair gofer tests
	3. sendfile tests (splice is not implemented in VFS2 yet)
	Updates #1478, #1484, #1485 PiperOrigin-RevId: 309457331