gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-07-07	Set IPv4 ID on all non-atomic datagrams	Tony Gong
	RFC 6864 imposes various restrictions on the uniqueness of the IPv4 Identification field for non-atomic datagrams, defined as an IP datagram that either can be fragmented (DF=0) or is already a fragment (MF=1 or positive fragment offset). In order to be compliant, the ID field is assigned for all non-atomic datagrams. Add a TCP unit test that induces retransmissions and checks that the IPv4 ID field is unique every time. Add basic handling of the IP_MTU_DISCOVER socket option so that the option can be used to disable PMTU discovery, effectively setting DF=0. Attempting to set the sockopt to anything other than disabled will fail because PMTU discovery is currently not implemented, and the default behavior matches that of disabled. PiperOrigin-RevId: 320081842
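The RFC 6864 rule can be sketched as a predicate plus an ID assigner. This is a minimal sketch that assumes a hypothetical ipv4Header struct and a simple counter-based generator, which are not netstack's actual types or ID scheme. A datagram is atomic only when DF=1, MF=0, and the fragment offset is 0. Every other datagram, including one sent with DF=0 because IP_MTU_DISCOVER disabled PMTU discovery, receives a fresh ID.

```go
package main

import "sync/atomic"

// ipv4Header holds only the fields relevant to RFC 6864 atomicity
// (hypothetical type, not netstack's header view).
type ipv4Header struct {
	DF         bool   // Don't Fragment flag
	MF         bool   // More Fragments flag
	FragOffset uint16 // fragment offset, in 8-byte units
	ID         uint16 // Identification field
}

// isAtomic reports whether h is an atomic datagram in the RFC 6864 sense:
// it cannot be fragmented (DF=1) and is not already a fragment
// (MF=0 and a zero fragment offset).
func isAtomic(h *ipv4Header) bool {
	return h.DF && !h.MF && h.FragOffset == 0
}

// idGenerator is a hypothetical stack-wide counter; a real stack may use
// a different (e.g. hashed, per-destination) scheme.
type idGenerator struct{ next uint32 }

// assignID sets h.ID for every non-atomic datagram, so consecutive
// non-atomic datagrams (including retransmissions) get distinct IDs.
// Atomic datagrams are left untouched, which RFC 6864 permits.
func (g *idGenerator) assignID(h *ipv4Header) {
	if isAtomic(h) {
		return
	}
	h.ID = uint16(atomic.AddUint32(&g.next, 1))
}
```

The 16-bit ID still wraps after 65536 datagrams; the sketch only shows which datagrams must get an ID at all.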
2020-07-07	icmp: When setting TransportHeader, remove from the Data portion.	Ting-Yu Wang
	The current convention is that when a header is assigned to a pkt.XxxHeader field, it gets removed from pkt.Data. ICMP does not currently follow this convention. PiperOrigin-RevId: 320078606
2020-07-07	Fix mknod and inotify syscall test	Ayush Ranjan
	This change fixes a few things:
	- creating sockets using mknod(2) is supported via vfs2
	- fsgofer can create regular files via mknod(2)
	- mode = 0 for mknod(2) is interpreted as a regular file in vfs2 as well
	Updates #2923 PiperOrigin-RevId: 320074267
2020-07-06	Call fdnotifier.UpdateFD() from fsimpl/gofer.specialFileFD.	Jamie Liu
	The fdnotifier package provides an API to a thread that continually epolls arbitrary host FDs. The set of events polled for each host FD is (intended to be) all events for which a waiter.Entry has expressed interest, as returned by waiter.Queue.Events() for the waiter.Queue registered to the given host FD. When the set of events changes (due to a change in the set of registered waiter.Entries), the mutator must call fdnotifier.UpdateFD() to recalculate the new event set and propagate it to the epoll FD. PiperOrigin-RevId: 319924719
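The invariant described above (the polled set equals the union of all registered entries' interests, recomputed on every change) can be modeled in a few lines. This is a sketch under assumptions: queue, entry, and updateFD are hypothetical stand-ins for waiter.Queue, waiter.Entry, and fdnotifier.UpdateFD, and the epoll registration is reduced to a variable.

```go
package main

// EventMask is a bitmask of readiness events (hypothetical; the values
// mirror EPOLLIN and EPOLLOUT).
type EventMask uint64

const (
	EventIn  EventMask = 0x01
	EventOut EventMask = 0x04
)

// entry is a registered waiter with the events it is interested in.
type entry struct{ mask EventMask }

// queue is a simplified stand-in for waiter.Queue.
type queue struct{ entries map[*entry]struct{} }

func newQueue() *queue { return &queue{entries: make(map[*entry]struct{})} }

func (q *queue) register(e *entry)   { q.entries[e] = struct{}{} }
func (q *queue) unregister(e *entry) { delete(q.entries, e) }

// events returns the union of all registered entries' masks, i.e. the
// set of events that must be polled for on the host FD.
func (q *queue) events() EventMask {
	var m EventMask
	for e := range q.entries {
		m |= e.mask
	}
	return m
}

// updateFD models the required recalculation: after the waiter set
// changes, recompute the event set and push it to the (simulated) epoll
// registration stored in *polled.
func updateFD(q *queue, polled *EventMask) {
	*polled = q.events()
}
```

If a mutator skipped updateFD after register or unregister, *polled would go stale, which is the class of bug the commit addresses in specialFileFD.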
2020-07-06	Ensure sync is called for readonly file	Fabricio Voznika
	Calling sync on a readonly file flushes metadata that may have been modified, like last access time. Updates #1198 PiperOrigin-RevId: 319888290
2020-07-06	Add support for SO_RCVBUF/SO_SNDBUF for AF_PACKET sockets.	Bhasker Hariharan
	Updates #2746 PiperOrigin-RevId: 319887810
2020-07-06	Fix NonBlockingWrite3 not writing b3 if b2 is zero-length.	Ting-Yu Wang
	PiperOrigin-RevId: 319882171
2020-07-06	Add inode number to synthetic dentries	Fabricio Voznika
	Reserve the MSB of ino for synthetic dentries to prevent conflicts with regular dentries. Log a warning in case the MSB is set for regular dentries. Updates #1487 PiperOrigin-RevId: 319869858
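Reserving the most significant bit partitions the 64-bit inode space into two halves. A minimal sketch, assuming a hypothetical global counter for synthetic inode numbers (the real allocator may differ):

```go
package main

import (
	"log"
	"sync/atomic"
)

// syntheticInoBit is the most significant bit of a 64-bit inode number,
// reserved for synthetic dentries.
const syntheticInoBit uint64 = 1 << 63

// nextSyntheticIno is a hypothetical counter for synthetic dentries.
var nextSyntheticIno uint64

// newSyntheticIno returns an inode number with the MSB set, so it cannot
// collide with any regular inode number that leaves the MSB clear.
func newSyntheticIno() uint64 {
	return atomic.AddUint64(&nextSyntheticIno, 1) | syntheticInoBit
}

// checkRegularIno logs a warning and returns false if a regular
// (host-backed) inode number uses the reserved bit, since it could then
// collide with a synthetic one.
func checkRegularIno(ino uint64) bool {
	if ino&syntheticInoBit != 0 {
		log.Printf("warning: regular inode number %#x has the reserved MSB set", ino)
		return false
	}
	return true
}
```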
2020-07-06	Shard some slow tests.	Ting-Yu Wang
	stack_x_test: 2m -> 20s
	tcp_x_test: 80s -> 25s
	PiperOrigin-RevId: 319828101
2020-07-06	Remove dependency on pkg/binary	Tamir Duberstein
	PiperOrigin-RevId: 319770124
2020-07-05	Add wakers synchronously	Tamir Duberstein
	Avoid a race where an arbitrary goroutine scheduling delay can cause the processor to miss events and hang indefinitely. Reduce allocations by storing processors by-value in the dispatcher, and by using a single WaitGroup rather than one per processor. PiperOrigin-RevId: 319665861
2020-07-01	Update preadv2/pwritev2 flag handling in vfs2.	Dean Deng
	We do not support RWF_SYNC/RWF_DSYNC and probably shouldn't silently accept them, since the user may incorrectly believe that we are synchronizing I/O. Remove the pwritev2 test verifying that we support these flags. gvisor.dev/issue/2601 is the tracking bug for deciding which RWF_.* flags we need and supporting them. Updates #2923, #2601. PiperOrigin-RevId: 319351286
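Rejecting unsupported flags instead of silently accepting them amounts to an allowlist check. In this sketch the flag values are the Linux RWF_* constants. The allowlist (only RWF_HIPRI, a performance hint that can be ignored safely) and the error value are assumptions, not necessarily what vfs2 uses.

```go
package main

import "errors"

// Linux RWF_* flag values for preadv2/pwritev2.
const (
	rwfHIPRI  = 0x1
	rwfDSYNC  = 0x2
	rwfSYNC   = 0x4
	rwfNOWAIT = 0x8
	rwfAPPEND = 0x10
)

// errNotSupported is a placeholder; the errno actually returned by the
// sandbox is an assumption here.
var errNotSupported = errors.New("operation not supported")

// supportedRWFlags is a hypothetical allowlist.
const supportedRWFlags = rwfHIPRI

// checkRWFlags rejects any flag outside the allowlist, so a caller never
// believes RWF_SYNC or RWF_DSYNC took effect when it did not.
func checkRWFlags(flags int) error {
	if flags&^supportedRWFlags != 0 {
		return errNotSupported
	}
	return nil
}
```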
2020-07-01	[vfs2][gofer] Fix mmap syscall test.	Ayush Ranjan
	We were not invalidating mappings when the file size changed in shared mode. Enabled the syscall test for vfs2. Updates #2923 PiperOrigin-RevId: 319346569
2020-07-01	[vfs2][gofer] Update file size to 0 on O_TRUNC	Ayush Ranjan
	Some Open:TruncateXxx syscall tests were failing because the file size was not being updated when the file was opened with O_TRUNC. Fixes Truncate tests in test/syscalls:open_test_runsc_ptrace_vfs2. Updates #2923 PiperOrigin-RevId: 319340127
2020-07-01	Remove maxSendBufferSize from vfs2.	Dean Deng
	Complements cl/315991648. PiperOrigin-RevId: 319327853
2020-07-01	Port vfs1 implementation of sync_file_range to vfs2.	Dean Deng
	Currently, we always perform a full-file sync, which could be extremely expensive for some applications. Although vfs1 did not fully support sync_file_range, there were some optimizations that allowed us to skip some unnecessary write-outs. Updates #2923, #1897. PiperOrigin-RevId: 319324213
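The kind of saving at stake can be illustrated by writing out only the dirty ranges that overlap the requested window. This is a sketch, not the vfs1 algorithm: the byteRange type and the list of dirty ranges are hypothetical. It does follow the documented sync_file_range(2) convention that nbytes == 0 means "through the end of the file".

```go
package main

import "math"

// byteRange is a half-open range [Start, End) of file offsets.
type byteRange struct{ Start, End int64 }

func (r byteRange) overlaps(o byteRange) bool {
	return r.Start < o.End && o.Start < r.End
}

// syncRange returns the dirty ranges that must be written out for a
// sync_file_range(offset, nbytes) call. nbytes == 0 extends the window to
// the end of the file. Dirty ranges outside the window are skipped
// instead of forcing a full-file sync.
func syncRange(dirty []byteRange, offset, nbytes int64) []byteRange {
	want := byteRange{Start: offset, End: offset + nbytes}
	if nbytes == 0 {
		want.End = math.MaxInt64
	}
	var out []byteRange
	for _, d := range dirty {
		if d.overlaps(want) {
			out = append(out, d)
		}
	}
	return out
}
```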
2020-07-01	TCP receive should block when in SYN-SENT state.	Mithun Iyer
	The application can choose to initiate a non-blocking connect and later block on a read, when the endpoint is still in SYN-SENT state. PiperOrigin-RevId: 319311016
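The behavioral change can be shown with a tiny state check. The sketch assumes a hypothetical endpoint type and the common convention that a "would block" error makes a blocking reader wait rather than fail. A read in SYN-SENT reports "would block" instead of "not connected", so a blocking read issued after a non-blocking connect waits for the handshake to finish.

```go
package main

import "errors"

type tcpState int

const (
	stateInitial tcpState = iota
	stateSynSent
	stateEstablished
)

var (
	errWouldBlock   = errors.New("operation would block")
	errNotConnected = errors.New("endpoint not connected")
)

// endpoint is a hypothetical, heavily simplified TCP endpoint.
type endpoint struct {
	state tcpState
	rcv   []byte
}

// read returns errWouldBlock while the handshake is still in progress,
// so a blocking caller can wait on the endpoint instead of failing.
func (e *endpoint) read() ([]byte, error) {
	switch e.state {
	case stateSynSent:
		return nil, errWouldBlock
	case stateEstablished:
		if len(e.rcv) == 0 {
			return nil, errWouldBlock
		}
		b := e.rcv
		e.rcv = nil
		return b, nil
	default:
		return nil, errNotConnected
	}
}
```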
2020-07-01	Port fallocate to VFS2.	Zach Koopmans
	PiperOrigin-RevId: 319283715
2020-07-01	Complete async signal delivery support in vfs2.	Dean Deng
	- Support FIOASYNC, FIO{SET,GET}OWN, SIOC{G,S}PGRP (refactor getting/setting owner in the process).
	- Unset signal recipient when setting owner with pid == 0 and valid owner type.
	Updates #2923. PiperOrigin-RevId: 319231420
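The "pid == 0 with a valid owner type" rule refers to F_SETOWN_EX and its struct f_owner_ex. A sketch with the Linux F_OWNER_* values and a hypothetical asyncOwner record: the type is validated first, and a valid type with pid 0 clears the recipient.

```go
package main

import "errors"

// Linux f_owner_ex.type values.
const (
	fOwnerTID  = 0
	fOwnerPID  = 1
	fOwnerPGRP = 2
)

var errInvalid = errors.New("invalid argument")

// ownerEx mirrors struct f_owner_ex.
type ownerEx struct {
	Type int32
	PID  int32
}

// asyncOwner is a hypothetical record of who receives SIGIO for a file.
type asyncOwner struct {
	set  bool
	kind int32
	id   int32
}

// setOwnerEx rejects unknown owner types; a valid type with PID == 0
// unsets the signal recipient.
func (o *asyncOwner) setOwnerEx(ex ownerEx) error {
	switch ex.Type {
	case fOwnerTID, fOwnerPID, fOwnerPGRP:
	default:
		return errInvalid
	}
	if ex.PID == 0 {
		*o = asyncOwner{}
		return nil
	}
	*o = asyncOwner{set: true, kind: ex.Type, id: ex.PID}
	return nil
}
```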
2020-06-30	Fix two bugs in TCP sender.	Bhasker Hariharan
	a) When GSO is in use, we should not cap the segment to maxPayloadSize in sender.maybeSendSegment, as the GSO logic will cap the segment to the correct size. Without this, host GSO is not used, because we end up breaking large segments into small MSS-sized segments before writing the packets to the host.
	b) The check that avoids splitting a segment that does not fit in the receiver window when there are pending segments is incorrect, because segments in writeList can be very large: we take the write call's buffer and create a single large segment, so a write of, say, 128KB is just 1 segment in the writeList. The Linux code checks whether a 1-MSS-sized segment fits in the receiver's window and, if not, does not split the current segment. gVisor's check was incorrect in that it checked whether the whole segment, which could be >>> 1 MSS, would fit in the receiver's window. This caused us to stop sending prematurely and fall back to the retransmit timer/probe from the other end to send data. This was seen when running HTTPD benchmarks, where at HEAD the benchmark took forever to run when sending large files.
	The tcp_splitseg_mss_test.go is being deleted because, as written, it does not correctly test what is intended: GSO is enabled by default, and the MSS+1 sized segment is sent because GSO is in use. A proper test will require disabling GSO on Linux and netstack, which will take a bit of work in packetimpact to do correctly. Separately, a new test should probably be written that verifies that a segment > availableWindow is not split if the availableWindow is < 1 MSS.
	Fixes #3107 PiperOrigin-RevId: 319172089
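Both fixes come down to how many bytes the sender emits from the head segment. The sketch below is a hypothetical, simplified decision function, not netstack's sender code. segLen is the head segment's size (possibly far larger than one MSS, since a whole write becomes one segment), avail is the receiver's available window, and inFlight says whether earlier data is unacknowledged. It applies the corrected one-MSS check and leaves MSS segmentation to GSO when GSO is on.

```go
package main

// segmentSendSize returns how many bytes of the head segment to send now;
// 0 means wait for the window to open rather than split off a tiny piece.
func segmentSendSize(segLen, avail, mss int, inFlight, gso bool) int {
	n := segLen
	// Without GSO, cap to one MSS here. With GSO, the GSO path performs the
	// MSS segmentation, so capping early would defeat host GSO (fix a).
	if !gso && n > mss {
		n = mss
	}
	if n <= avail {
		return n
	}
	// The segment does not fully fit. Refuse to split only when even one
	// MSS would not fit and data is in flight (fix b); the old check
	// compared the whole, possibly >> MSS, segment against the window.
	if inFlight && avail < mss {
		return 0
	}
	return avail
}
```

For example, a 128 KiB write against a 64 KiB window with GSO on sends 64 KiB immediately instead of stalling until a retransmit timer or window probe.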
2020-06-30	Fix index calculation for /proc/[pid]/cmdline.	Dean Deng
	We were truncating buf using an index relative to the middle of the slice (i.e. where envv begins), but we need to calculate the index relative to the entire slice. Updates #2923. PiperOrigin-RevId: 319154950
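The bug pattern is easy to reproduce with bytes.IndexByte on a sub-slice: the returned index is relative to the sub-slice, not to the whole buffer. The sketch uses a simplified model of cmdline, with mem[:argvLen] holding the arguments followed by the environment. If the process overwrote its arguments so that the argument area no longer ends in NUL, output runs on into the environment up to its first NUL.

```go
package main

import "bytes"

// cmdline returns simplified /proc/[pid]/cmdline contents for a process
// whose argument area is mem[:argvLen] and environment is mem[argvLen:].
func cmdline(mem []byte, argvLen int) []byte {
	if argvLen == 0 || mem[argvLen-1] == 0 {
		return mem[:argvLen]
	}
	// i is relative to the start of the environment (the middle of mem)...
	if i := bytes.IndexByte(mem[argvLen:], 0); i >= 0 {
		// ...so it must be offset by argvLen before truncating mem.
		// Truncating at mem[:i] is the class of bug fixed here.
		return mem[:argvLen+i]
	}
	return mem
}
```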
2020-06-30	Allow O_DIRECT on vfs2 tmpfs files.	Dean Deng
	Updates #2923. PiperOrigin-RevId: 319153792
2020-06-30	Add missing newline in /sys/devices/system/cpu/online	Bhasker Hariharan
	PiperOrigin-RevId: 319143410
2020-06-30	Avoid multiple atomic loads	Tamir Duberstein
	...by calling (tcp.endpoint).EndpointState only once when possible. Avoid wrapping (sleep.Waker).Assert in a useless func while I'm here. PiperOrigin-RevId: 319074149
2020-06-27	Port GETOWN, SETOWN fcntls to vfs2.	Dean Deng
	Also make some fixes to vfs1's F_SETOWN. The fcntl test now entirely passes on vfs2. Fixes #2920. PiperOrigin-RevId: 318669529
2020-06-27	Support sticky bit in vfs2.	Dean Deng
	Updates #2923. PiperOrigin-RevId: 318648128
2020-06-27	Add documentation for vfs2 inotify.	Dean Deng
	Updates #1479. PiperOrigin-RevId: 318631247
2020-06-26	IPv6 raw sockets. Needed for ip6tables.	Kevin Krakauer
	IPv6 raw sockets never include the IPv6 header. PiperOrigin-RevId: 318582989
2020-06-26	Implement SO_NO_CHECK socket option.	gVisor bot
	SO_NO_CHECK is used to skip the UDP checksum generation on a TX socket (UDP checksum is optional on IPv4). Test:
	- TestNoChecksum
	- SoNoCheckOffByDefault (UdpSocketTest)
	- SoNoCheck (UdpSocketTest)
	Fixes #3055 PiperOrigin-RevId: 318575215
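A sketch of what SO_NO_CHECK skips: the RFC 768 checksum over the IPv4 pseudo-header, UDP header, and payload. When the option is set, the field is transmitted as 0 ("no checksum"), which only IPv4 allows. The function names are hypothetical and not netstack's checksum API.

```go
package main

import "encoding/binary"

// onesComplementSum adds b to sum as big-endian 16-bit words, padding an
// odd trailing byte with zero.
func onesComplementSum(sum uint32, b []byte) uint32 {
	for len(b) >= 2 {
		sum += uint32(binary.BigEndian.Uint16(b))
		b = b[2:]
	}
	if len(b) == 1 {
		sum += uint32(b[0]) << 8
	}
	return sum
}

// udpChecksumIPv4 returns the value for the UDP checksum field. udp is the
// UDP header plus payload with the checksum field zeroed. With noCheck
// (SO_NO_CHECK) set, 0 is returned, meaning "no checksum" on IPv4.
func udpChecksumIPv4(src, dst [4]byte, udp []byte, noCheck bool) uint16 {
	if noCheck {
		return 0
	}
	var pseudo [12]byte
	copy(pseudo[0:4], src[:])
	copy(pseudo[4:8], dst[:])
	pseudo[9] = 17 // IPPROTO_UDP
	binary.BigEndian.PutUint16(pseudo[10:], uint16(len(udp)))
	sum := onesComplementSum(0, pseudo[:])
	sum = onesComplementSum(sum, udp)
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16) // fold carries
	}
	csum := ^uint16(sum)
	if csum == 0 {
		// A computed zero is sent as all ones, since 0 means "no checksum".
		csum = 0xffff
	}
	return csum
}
```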
2020-06-26	Require CAP_SYS_ADMIN in the root user namespace for TTY theft	Kevin Krakauer
	PiperOrigin-RevId: 318563543
2020-06-26	Support inotify IN_ONESHOT.	Dean Deng
	Also, while we're here, make sure that gofer inotify events are generated when files are created in remote revalidating mode. Updates #1479. PiperOrigin-RevId: 318536354
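IN_ONESHOT means a watch fires at most once and is then removed. A sketch with the Linux mask values and hypothetical watch and queue types. Following inotify(7), it also queues an IN_IGNORED event when the watch is removed automatically.

```go
package main

// Linux inotify mask bits used below.
const (
	inModify  uint32 = 0x00000002
	inIgnored uint32 = 0x00008000
	inOneshot uint32 = 0x80000000
)

// watch is a hypothetical, simplified inotify watch.
type watch struct {
	wd   int32
	mask uint32
}

type event struct {
	wd   int32
	mask uint32
}

// inotify holds watches keyed by watch descriptor and a queue of events.
type inotify struct {
	watches map[int32]*watch
	events  []event
}

// notify delivers ev to w if w is still registered and interested. An
// IN_ONESHOT watch is removed after its first event, and IN_IGNORED is
// queued to tell the reader the watch descriptor is gone.
func (in *inotify) notify(w *watch, ev uint32) {
	if _, ok := in.watches[w.wd]; !ok || w.mask&ev == 0 {
		return
	}
	in.events = append(in.events, event{wd: w.wd, mask: ev})
	if w.mask&inOneshot != 0 {
		delete(in.watches, w.wd)
		in.events = append(in.events, event{wd: w.wd, mask: inIgnored})
	}
}
```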
2020-06-26	Merge pull request #2931 from ridwanmsharif:ridwanmsharif/fuse-char-device	gVisor bot
	PiperOrigin-RevId: 318511615
2020-06-25	conntrack refactor, no behavior changes	Kevin Krakauer
	- Split connTrackForPacket into 2 functions instead of switching on flag
	- Replace hash with struct keys.
	- Remove prefixes where possible
	- Remove unused connStatus, timeout
	- Flatten ConnTrack struct a bit; some intermediate structs had no meaning outside of the context of their parent.
	- Protect conn.tcb with a mutex
	- Remove redundant error checking (e.g. checks for whether pkt.NetworkHeader is valid)
	- Clarify that HandlePacket and CreateConnFor are the expected entrypoints for ConnTrack
	PiperOrigin-RevId: 318407168
2020-06-25	Avoid an allocation in epoll	Tamir Duberstein
	PiperOrigin-RevId: 318346153
2020-06-25	Moved FUSE device under the fuse directory	Ridwan Sharif

2020-06-25	Add FUSE character device	Ridwan Sharif
	This change adds a FUSE character device backed by devtmpfs. This device will be used to establish a connection between the FUSE server daemon and fusefs. The FileDescriptionImpl methods will be implemented as we flesh out fusefs some more. The tests assert that the device can be opened and used.
2020-06-24	Fix procfs bugs in vfs2.	Dean Deng
	- Support writing to /proc/[pid]/{uid,gid}_map
	- Return EIO for writing to static files.
	Updates #2923. PiperOrigin-RevId: 318188503
2020-06-24	Port /dev/net/tun device to VFS2.	Nicolas Lacasse
	Updates #2912 #1035 PiperOrigin-RevId: 318162565
2020-06-24	Remove waiter.Entry.Context	Tamir Duberstein
	This field is redundant since state can be stored in the callback. PiperOrigin-RevId: 318134855
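The reasoning is that a closure can capture whatever state a separate Context field used to carry. A sketch using simplified stand-ins for the waiter types (hypothetical, not the package's real signatures):

```go
package main

// EventMask and the types below are simplified stand-ins for the waiter
// package, not its real API.
type EventMask uint64

// EntryCallback is notified when an event fires.
type EntryCallback interface {
	Callback(mask EventMask)
}

// callbackFunc adapts a closure to EntryCallback; per-entry state is
// captured by the closure instead of being stored in a Context field.
type callbackFunc func(EventMask)

func (f callbackFunc) Callback(mask EventMask) { f(mask) }

// Entry has no Context field.
type Entry struct {
	Callback EntryCallback
}

// newCountingEntry returns an Entry whose callback increments *n; the
// counter lives in the closure.
func newCountingEntry(n *int) *Entry {
	return &Entry{Callback: callbackFunc(func(EventMask) { *n++ })}
}
```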
2020-06-24	Add support for Stack level options.	Bhasker Hariharan
	Linux controls socket send/receive buffers using a few sysctl variables:
	- net.core.rmem_default
	- net.core.rmem_max
	- net.core.wmem_max
	- net.core.wmem_default
	- net.ipv4.tcp_rmem
	- net.ipv4.tcp_wmem
	The first 4 control the default socket buffer sizes for all sockets (raw/packet/tcp/udp) and also the maximum permitted socket buffer that can be specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF, ...). The last two control the TCP auto-tuning limits and override the defaults specified in rmem_default/wmem_default as well as the max limits.
	Netstack today only implements tcp_rmem/tcp_wmem, and incorrectly uses them both to limit the maximum size in setsockopt() and for raw/udp sockets. This changelist introduces the other 4 and updates the udp/raw sockets to use the newly introduced variables. The values for min/max match the current tcp_rmem/wmem values, and the default buffer size for UDP/RAW sockets is raised from the very low current value of 32 KiB to match the Linux value of 212KiB.
	Updates #3043 Fixes #3043 PiperOrigin-RevId: 318089805
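A sketch of applying stack-wide limits to a setsockopt(SO_RCVBUF/SO_SNDBUF) request. bufferSizeOption and clampBufferSize are hypothetical names. The 212 KiB default comes from the message above; the Min and Max values are placeholders, not the stack's real limits.

```go
package main

// bufferSizeOption is a hypothetical stack-level setting modeled on the
// net.core.{r,w}mem_* sysctls.
type bufferSizeOption struct {
	Min, Default, Max int
}

// udpRcvBuf uses the 212 KiB default from the commit message; Min and Max
// are placeholders.
var udpRcvBuf = bufferSizeOption{Min: 4 << 10, Default: 212 << 10, Max: 4 << 20}

// clampBufferSize applies the stack-wide limits to a requested socket
// buffer size, rather than reusing the TCP auto-tuning limits for every
// socket type.
func clampBufferSize(req int, opt bufferSizeOption) int {
	if req < opt.Min {
		return opt.Min
	}
	if req > opt.Max {
		return opt.Max
	}
	return req
}
```

Linux additionally doubles the requested SO_RCVBUF/SO_SNDBUF value to allow for bookkeeping overhead (see socket(7)); that step is omitted here.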
2020-06-23	Support for saving pointers to fields in the state package.	Adin Scannell
	Previously, it was not possible to encode/decode an object graph which contained a pointer to a field within another type. This was because the encoder was unable to disambiguate a pointer to an object from a pointer within the object. This CL remedies this by constructing an address map tracking the full memory range each object occupies. The encoded Refvalue message has been extended to allow references to child objects within another object. Because the encoding process may learn about object structure over time, we cannot encode any objects until the entire graph has been generated.
	This CL also updates the state package to use standard interfaces instead of reflection-based dispatch in order to improve performance overall. This includes a custom wire protocol to significantly reduce the number of allocations and take advantage of structure packing.
	As part of these changes, there are a small number of minor changes in other places of the code base:
	* The lists used during encoding are changed to use intrusive lists with the objectEncodeState directly, which required that the ilist Len() method be updated to work properly with the ElementMapper mechanism.
	* A bug is fixed in the list code wherein Remove() called on an element that is already removed can corrupt the list (removing the element if there's only a single element). Now the behavior is correct.
	* Standard error wrapping is introduced.
	* Compressio was updated to implement the new wire.Reader and wire.Writer interface methods directly. The lack of a ReadByte and WriteByte caused issues not due to interface dispatch, but because underlying slices for a Read or Write call through an interface would always escape to the heap!
	* Statify has been updated to support the new APIs.
	See README.md for a description of how the new mechanism works. PiperOrigin-RevId: 318010298
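The address map idea reduces to an interval lookup: keep objects sorted by start address, then resolve any pointer to the object whose range contains it, plus an offset. This sketch only illustrates that lookup. The object and addrMap types are hypothetical; the actual encoding is described in the state package's README.md.

```go
package main

import "sort"

// object records the address range [addr, addr+size) an encoded object
// occupies (hypothetical type).
type object struct {
	id   int
	addr uintptr
	size uintptr
}

// addrMap keeps objects sorted by start address so that any pointer,
// including one to a field inside an object, can be resolved to its
// containing object plus an offset.
type addrMap struct{ objs []object }

func (m *addrMap) add(o object) {
	i := sort.Search(len(m.objs), func(i int) bool { return m.objs[i].addr >= o.addr })
	m.objs = append(m.objs, object{})
	copy(m.objs[i+1:], m.objs[i:])
	m.objs[i] = o
}

// resolve returns the id of the object containing p and p's offset within
// it, or ok == false if no recorded object contains p.
func (m *addrMap) resolve(p uintptr) (id int, off uintptr, ok bool) {
	// Find the first object starting after p; the candidate precedes it.
	i := sort.Search(len(m.objs), func(i int) bool { return m.objs[i].addr > p })
	if i == 0 {
		return 0, 0, false
	}
	o := m.objs[i-1]
	if p >= o.addr+o.size {
		return 0, 0, false
	}
	return o.id, p - o.addr, true
}
```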
2020-06-23	Resolve remaining inotify TODOs.	Dean Deng
	Also refactor HandleDeletion(). Updates #1479. PiperOrigin-RevId: 317989000
2020-06-23	Clean up hostfs TODOs.	Dean Deng
	This CL does a handful of things:
	- Support O_DSYNC, O_SYNC
	- Support O_APPEND and document an unavoidable race condition
	- Ignore O_DIRECT; we probably don't want to allow applications to set O_DIRECT on the host fd itself.
	- Leave a TODO for supporting O_NONBLOCK, which is a simple fix once RWF_NOWAIT is supported.
	- Get rid of caching TODO; force_page_cache is not configurable for host fs in vfs1 or vfs2 after whitelist fs was removed.
	- For the remaining TODOs, link to more specific bugs.
	Fixes #1672. PiperOrigin-RevId: 317985269
2020-06-23	Add support for SO_REUSEADDR to TCP sockets/endpoints.	Ian Gudger
	For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses. gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did not allow disabling it. This change brings the SO_REUSEADDR behavior closer to the behavior implemented by Linux and adds a new SO_REUSEADDR disabled behavior. Like Linux, SO_REUSEADDR is now disabled by default. PiperOrigin-RevId: 317984380
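A simplified model of the bind-conflict rule with SO_REUSEADDR, loosely following Linux TCP. An overlapping bind is allowed only if both sockets set SO_REUSEADDR and the existing socket is not listening. Because the option is now off by default, a second bind to the same address and port fails. The boundSocket type is hypothetical, and TIME_WAIT handling and other Linux details are omitted.

```go
package main

// boundSocket is a hypothetical record of a socket already bound to a port.
type boundSocket struct {
	addr      string // "" means the wildcard address
	port      uint16
	reuseAddr bool
	listening bool
}

// addrsOverlap reports whether two local addresses can match the same
// traffic (equal, or either one is the wildcard).
func addrsOverlap(a, b string) bool {
	return a == "" || b == "" || a == b
}

// bindConflicts reports whether binding (addr, port) with the given
// SO_REUSEADDR setting conflicts with an existing bound socket.
func bindConflicts(existing boundSocket, addr string, port uint16, reuseAddr bool) bool {
	if existing.port != port || !addrsOverlap(existing.addr, addr) {
		return false
	}
	return !(existing.reuseAddr && reuseAddr && !existing.listening)
}
```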
2020-06-23	Port /dev/tty device to VFS2.	Nicolas Lacasse
	Support is limited to the functionality that exists in VFS1. Updates #2923 #1035 PiperOrigin-RevId: 317981417
2020-06-23	Complete inotify IN_EXCL_UNLINK implementation in VFS2.	Dean Deng
	Events were only skipped on parent directories after their children were unlinked; events on the unlinked file itself need to be skipped as well. As a result, all Watches.Notify() calls need to know whether the dentry where the call came from was unlinked. Updates #1479. PiperOrigin-RevId: 317979476
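Threading an "unlinked" flag through every notification lets watches that set IN_EXCL_UNLINK skip events from both the unlinked file and unlinked children. A sketch with the Linux mask values and a hypothetical watch type:

```go
package main

// Linux inotify mask bits used below.
const (
	inOpen       uint32 = 0x00000020
	inExclUnlink uint32 = 0x04000000
)

type watch struct{ mask uint32 }

// notify returns the watches that should receive ev. When the event comes
// from an unlinked dentry (the unlinked file itself or an unlinked child
// of a watched directory), watches with IN_EXCL_UNLINK are skipped.
func notify(watches []watch, ev uint32, unlinked bool) []watch {
	var out []watch
	for _, w := range watches {
		if w.mask&ev == 0 {
			continue
		}
		if unlinked && w.mask&inExclUnlink != 0 {
			continue
		}
		out = append(out, w)
	}
	return out
}
```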
2020-06-23	Support inotify in vfs2 gofer fs.	Dean Deng
	Because there is no inode structure stored in the sandbox, inotify watches must be held on the dentry. This would be an issue in the presence of hard links, where multiple dentries would need to share the same set of watches, but in VFS2, we do not support the internal creation of hard links on gofer fs. As a result, we make the assumption that every dentry corresponds to a unique inode. Furthermore, dentries can be cached and then evicted, even if the underlying file has not been deleted. We must prevent this from occurring if there are any watches that would be lost. Note that if the dentry was deleted or invalidated (d.vfsd.IsDead()), we should still destroy it along with its watches. Additionally, when a dentry’s last watch is removed, we cache it if it also has zero references. This way, the dentry can eventually be evicted from memory if it is no longer needed. This is accomplished with a new dentry method, OnZeroWatches(), which is called by Inotify.RmWatch and Inotify.Release. Note that it must be called after all inotify locks are released to avoid violating lock order. Stress tests are added to make sure that inotify operations don't deadlock with gofer.OnZeroWatches. Updates #1479. PiperOrigin-RevId: 317958034
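The eviction rule above can be written as a small predicate, together with the OnZeroWatches transition. This is a sketch with a hypothetical dentry struct that reduces refcounting, caching, and locking to plain fields.

```go
package main

// dentry is a hypothetical, simplified gofer dentry.
type dentry struct {
	refs    int
	watches int
	dead    bool // deleted or invalidated
	cached  bool
}

// evictable reports whether the dentry may be dropped. A live dentry with
// watches must be kept, since the watches live on the dentry and would be
// lost; a dead dentry is destroyed along with its watches.
func (d *dentry) evictable() bool {
	return d.dead || (d.refs == 0 && d.watches == 0)
}

// onZeroWatches runs after the last watch is removed (and, per the commit,
// after inotify locks are released). An unreferenced dentry is placed in
// the cache so it can eventually be evicted.
func (d *dentry) onZeroWatches() {
	if d.watches == 0 && d.refs == 0 {
		d.cached = true
	}
}
```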
2020-06-23	Port readahead to VFS2.	Nicolas Lacasse
	It preserves the same functionality (almost none) as in VFS1. Updates #2923 #1035 PiperOrigin-RevId: 317943522
2020-06-23	Merge pull request #2272 from lubinszARM:pr_serr_injection	gVisor bot
	PiperOrigin-RevId: 317933650
2020-06-22	Only allow regular files, sockets, pipes, and char devices to be imported.	Dean Deng
	PiperOrigin-RevId: 317796028