gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2020-09-09	Fix vfs2 pipe behavior when splicing to a non-pipe.	Dean Deng
	Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero would not actually empty the pipe. There was no guarantee that the data would actually be consumed on a splice operation unless the output file's implementation of Write/PWrite actually called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed regardless of whether CopyIn is called or not. Furthermore, the number of bytes in the IOSequence for reads is now capped at the amount of data actually available. Before, splicing to /dev/zero would always return the requested splice size without taking the actual available data into account. This change also refactors the case where an input file is spliced into an output pipe so that it follows a similar pattern, which is arguably cleaner anyway. Updates #3576. PiperOrigin-RevId: 328843954
2020-09-09	unix: return ECONNREFUSE if a socket file exists but a socket isn't bound to it	Andrei Vagin
	PiperOrigin-RevId: 328843560
2020-09-09	[go-marshal] Support for usermem.IOOpts.	Ayush Ranjan
	PiperOrigin-RevId: 328839759
2020-09-09	Improve type safety for socket options	Ghanan Gowripalan
	The existing implementation for {G,S}etSockOpt take arguments of an empty interface type which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces for socket options that may be set or queried which socket option types implement to ensure that invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. Fixes #3714. RELNOTES: n/a PiperOrigin-RevId: 328832161
2020-09-09	Add function to get error from a tcpip.Endpoint	Ghanan Gowripalan
	In an upcoming CL, socket option types are made to implement a marker interface with pointer receivers. Since this results in calling methods of an interface with a pointer, we incur an allocation when attempting to get an Endpoint's last error with the current implementation. When calling the method of an interface, the compiler is unable to determine what the interface implementation does with the pointer (since calling a method on an interface uses virtual dispatch at runtime so the compiler does not know what the interface method will do) so it allocates on the heap to be safe incase an implementation continues to hold the pointer after the functioon returns (the reference escapes the scope of the object). In the example below, the compiler does not know what b.foo does with the reference to a it allocates a on the heap as the reference to a may escape the scope of a. ``` var a int var b someInterface b.foo(&a) ``` This change removes the opportunity for that allocation. RELNOTES: n/a PiperOrigin-RevId: 328796559
2020-09-09	ip6tables: (de)serialize ip6tables structs	Kevin Krakauer
	More implementation+testing to follow. #3549. PiperOrigin-RevId: 328770160
2020-09-09	Make flag propagation automatic	Fabricio Voznika
	Use reflection and tags to provide automatic conversion from Config to flags. This makes adding new flags less error-prone, skips flags using default values (easier to read), and makes tests correctly use default flag values for test Configs. Updates #3494 PiperOrigin-RevId: 328662070
2020-09-09	Device major number greater than 2 digits in /proc/self/maps on arm64 N1 machine	Bin Lu
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-09	Support stdlib analyzers with nogo.	Adin Scannell
	This immediately revealed an escape analysis violation (!), where the sync.Map was being used in a context that escapes were not allowed. This is a relatively minor fix and is included. PiperOrigin-RevId: 328611237
2020-09-09	Remove spurious fd.IncRef().	Nicolas Lacasse
	PiperOrigin-RevId: 328583461
2020-09-09	tmpfs: Allow xattrs in the trusted namespace if creds has CAP_SYS_ADMIN.	Nicolas Lacasse
	This is needed to support the overlay opaque attribute. PiperOrigin-RevId: 328552985
2020-09-09	Use new reference count utility throughout gvisor.	Dean Deng
	This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted soon anyway. The following structs now use the new tool, with leak check enabled: devpts:rootInode fuse:inode kernfs:Dentry kernfs:dir kernfs:readonlyDir kernfs:StaticDirectory proc:fdDirInode proc:fdInfoDirInode proc:subtasksInode proc:taskInode proc:tasksInode vfs:FileDescription vfs:MountNamespace vfs:Filesystem sys:dir kernel:FSContext kernel:ProcessGroup kernel:Session shm:Shm mm:aioMappable mm:SpecialMappable transport:queue And the following use the template, but because they currently are not leak checked, a TODO is left instead of enabling leak check in this patch: kernel:FDTable tun:tunEndpoint Updates #1486. PiperOrigin-RevId: 328460377
2020-09-09	Return non-zero size for tmpfs statfs(2).	Jamie Liu
	This does not implement accepting or enforcing any size limit, which will be more complex and has performance implications; it just returns a fixed non-zero size. Updates #1936 PiperOrigin-RevId: 328428588
2020-09-09	Expose basic coverage information to userspace through kcov interface.	Dean Deng
	In Linux, a kernel configuration is set that compiles the kernel with a custom function that is called at the beginning of every basic block, which updates the memory-mapped coverage information. The Go coverage tool does not allow us to inject arbitrary instructions into basic blocks, but it does provide data that we can convert to a kcov-like format and transfer them to userspace through a memory mapping. Note that this is not a strict implementation of kcov, which is especially tricky to do because we do not have the same coverage tools available in Go that that are available for the actual Linux kernel. In Linux, a kernel configuration is set that compiles the kernel with a custom function that is called at the beginning of every basic block to write program counters to the kcov memory mapping. In Go, however, coverage tools only give us a count of basic blocks as they are executed. Every time we return to userspace, we collect the coverage information and write out PCs for each block that was executed, providing userspace with the illusion that the kcov data is always up to date. For convenience, we also generate a unique synthetic PC for each block instead of using actual PCs. Finally, we do not provide thread-specific coverage data (each kcov instance only contains PCs executed by the thread owning it); instead, we will supply data for any file specified by -- instrumentation_filter. Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo compilation to fail. PiperOrigin-RevId: 328426526
2020-09-09	Only send an ICMP error message if UDP checksum is valid.	Toshi Kikuchi
	Test: - TestV4UnknownDestination - TestV6UnknownDestination PiperOrigin-RevId: 328424137
2020-09-09	[go-marshal] Enable auto-marshalling for host tty.	Ayush Ranjan
	PiperOrigin-RevId: 328415633
2020-09-09	overlay: clonePrivateMount must pass a Dentry reference to MakeVirtualDentry.	Nicolas Lacasse
	PiperOrigin-RevId: 328410065
2020-09-09	Clarify comment on NetworkProtocolNumber.	Bhasker Hariharan
	The actual values used for this field in Netstack are actually EtherType values of the protocol in an Ethernet frame. Eg. header.IPv4ProtocolNumber is 0x0800 and not the number of the IPv4 Protocol Number itself which is 4. Similarly header.IPv6ProtocolNumber is set to 0x86DD whereas the IPv6 protocol number is 41. See: - https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml (For EtherType) - https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml (For ProtocolNumbers) PiperOrigin-RevId: 328407293
2020-09-09	remove iptables sockopt special cases	Kevin Krakauer
	iptables sockopts were kludged into an unnecessary check, this properly relegates them to the {get,set}SockOptIP functions. PiperOrigin-RevId: 328395135
2020-09-09	Add nogo support to go_binary and go_test targets.	Adin Scannell
	Updates #3374 PiperOrigin-RevId: 328378700
2020-09-09	Change "Fd" member to "FD" according to convension	gVisor bot
	PiperOrigin-RevId: 328374775
2020-09-09	Add option to replace linkAddrCache with neighborCache	Sam Balana
	This change adds an option to replace the current implementation of ARP through linkAddrCache, with an implementation of NUD through neighborCache. Switching to using NUD for both ARP and NDP is beneficial for the reasons described by RFC 4861 Section 3.1: "[Using NUD] significantly improves the robustness of packet delivery in the presence of failing routers, partially failing or partitioned links, or nodes that change their link-layer addresses. For instance, mobile nodes can move off-link without losing any connectivity due to stale ARP caches." "Unlike ARP, Neighbor Unreachability Detection detects half-link failures and avoids sending traffic to neighbors with which two-way connectivity is absent." Along with these changes exposes the API for querying and operating the neighbor cache. Operations include: - Create a static entry - List all entries - Delete all entries - Remove an entry by address This also exposes the API to change the NUD protocol constants on a per-NIC basis to allow Neighbor Discovery to operate over links with widely varying performance characteristics. See [RFC 4861 Section 10][1] for the list of constants. Finally, an API for subscribing to NUD state changes is exposed through NUDDispatcher. See [RFC 4861 Appendix C][3] for the list of edges. Tests: pkg/tcpip/network/arp:arp_test + TestDirectRequest pkg/tcpip/network/ipv6:ipv6_test + TestLinkResolution + TestNDPValidation + TestNeighorAdvertisementWithTargetLinkLayerOption + TestNeighorSolicitationResponse + TestNeighorSolicitationWithSourceLinkLayerOption + TestRouterAdvertValidation pkg/tcpip/stack:stack_test + TestCacheWaker + TestForwardingWithFakeResolver + TestForwardingWithFakeResolverManyPackets + TestForwardingWithFakeResolverManyResolutions + TestForwardingWithFakeResolverPartialTimeout + TestForwardingWithFakeResolverTwoPackets + TestIPv6SourceAddressSelectionScopeAndSameAddress [1]: https://tools.ietf.org/html/rfc4861#section-10 [2]: https://tools.ietf.org/html/rfc4861#appendix-C Fixes #1889 Fixes #1894 Fixes #1895 Fixes #1947 Fixes #1948 Fixes #1949 Fixes #1950 PiperOrigin-RevId: 328365034
2020-09-09	Support SO_LINGER socket option.	Nayana Bidari
	When SO_LINGER option is enabled, the close will not return until all the queued messages are sent and acknowledged for the socket or linger timeout is reached. If the option is not set, close will return immediately. This option is mainly supported for connection oriented protocols such as TCP. PiperOrigin-RevId: 328350576
2020-09-09	Fix TCP_LINGER2 behavior to match linux.	Bhasker Hariharan
	We still deviate a bit from linux in how long we will actually wait in FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely obvious as to why it's done that way. For now I think we can ignore that and fix it if it really is an issue. PiperOrigin-RevId: 328324922
2020-09-09	Fix deadlock in gofer direct IO.	Dean Deng
	Fixes several java runtime tests: java/nio/channels/FileChannel/directio/ReadDirect.java java/nio/channels/FileChannel/directio/PreadDirect.java Updates #3576. PiperOrigin-RevId: 328281849
2020-09-09	Automated rollback of changelist 327325153	Ghanan Gowripalan
	PiperOrigin-RevId: 328259353
2020-09-09	Flush in fsimpl/gofer.regularFileFD.OnClose() if there are no dirty pages.	Jamie Liu
	This is closer to indistinguishable from VFS1 behavior. PiperOrigin-RevId: 328256068
2020-09-09	Add check for same source in merkle tree lib	gVisor bot
	If the data is in the same Reader as the merkle tree, we should verify from the first layer in the tree, instead of from the beginning. PiperOrigin-RevId: 328230988
2020-09-09	Remove go profiling flag from dockerutil.	Zach Koopmans
	Go profiling was removed from runsc debug in a previous change. PiperOrigin-RevId: 328203826
2020-09-09	Bump build constraints to 1.17	Michael Pratt
	This enables pre-release testing with 1.16. The intention is to replace these with a nogo check before the next release. PiperOrigin-RevId: 328193911
2020-09-09	Consider loopback bound to all addresses in subnet	Ghanan Gowripalan
	When a loopback interface is configurd with an address and associated subnet, the loopback should treat all addresses in that subnet as an address it owns. This is mimicking linux behaviour as seen below: ``` $ ip addr show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ... link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever $ ping 192.0.2.1 PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data. ^C --- 192.0.2.1 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1018ms $ ping 192.0.2.2 PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data. ^C --- 192.0.2.2 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2039ms $ sudo ip addr add 192.0.2.1/24 dev lo $ ip addr show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ... link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet 192.0.2.1/24 scope global lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever $ ping 192.0.2.1 PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data. 64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms 64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms 64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms ^C --- 192.0.2.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2042ms rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms $ ping 192.0.2.2 PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data. 64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms 64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms 64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms 64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms ^C --- 192.0.2.2 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3049ms rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms ``` Test: integration_test.TestLoopbackAcceptAllInSubnet PiperOrigin-RevId: 328188546
2020-09-09	Update inotify documentation for gofer filesystem.	Dean Deng
	We now allow hard links to be created within gofer fs (see github.com/google/gvisor/commit/f20e63e31b56784c596897e86f03441f9d05f567). Update the inotify documentation accordingly. PiperOrigin-RevId: 328177485
2020-09-09	Implement GetFilesystem for verity fs	gVisor bot
	verity GetFilesystem is implemented by mounting the underlying file system, save the mount, and store both the underlying root dentry and root Merkle file dentry in verity's root dentry. PiperOrigin-RevId: 327959334
2020-09-09	[vfs] Allow mountpoint to be an existing non-directory.	Ayush Ranjan
	Unlike linux mount(2), OCI spec allows mounting on top of an existing non-directory file. PiperOrigin-RevId: 327914342
2020-09-09	stateify: Fix pretty print not printing odd numbered fields.	Ting-Yu Wang
	PiperOrigin-RevId: 327902182
2020-09-09	Provide fdReader/Writer for FileDescription	gVisor bot
	fdReader/Writer implements io.Reader/Writer so that they can be passed to Merkle tree library. PiperOrigin-RevId: 327901376
2020-09-09	Internal change.	gVisor bot
	PiperOrigin-RevId: 327892274
2020-09-09	Clarify seek behaviour for kernfs.GenericDirectoryFD.	Rahat Mahmood
	- Remove comment about GenericDirectoryFD not being compatible with dynamic directories. It is currently being used to implement dynamic directories. - Try to handle SEEK_END better than setting the offset to infinity. SEEK_END is poorly defined for dynamic directories anyways, so at least try make it work correctly for the static entries. Updates #1193. PiperOrigin-RevId: 327890128
2020-09-09	Pass overlay credentials via context in copy up.	Nicolas Lacasse
	Some VFS operations (those which operate on FDs) get their credentials via the context instead of via an explicit creds param. For these cases, we must pass the overlay credentials on the context. PiperOrigin-RevId: 327881259
2020-09-09	Make mounts ReadWrite first, then later change to ReadOnly.	Nicolas Lacasse
	This lets us create "synthetic" mountpoint directories in ReadOnly mounts during VFS setup. Also add context.WithMountNamespace, as some filesystems (like overlay) require a MountNamespace on ctx to handle vfs.Filesystem Operations. PiperOrigin-RevId: 327874971
2020-09-09	Fix parent directory creation in CreateDeviceFile.	Nicolas Lacasse
	It was not properly creating recursive directories. Added tests for this case. Updates #1196 PiperOrigin-RevId: 327850811
2020-09-09	[vfs] Create recursive dir creation util.	Ayush Ranjan
	Refactored the recursive dir creation util in runsc/boot/vfs.go to be more flexible. PiperOrigin-RevId: 327719100
2020-09-09	stateify: Fix afterLoad not being called for root object	Ting-Yu Wang
	PiperOrigin-RevId: 327711264
2020-09-09	Add reference count checking to the fsimpl/host package.	Dean Deng
	Includes a minor refactor for inode construction. Updates #1486. PiperOrigin-RevId: 327694933
2020-09-09	Consistent precondition formatting	Michael Pratt
	Our "Preconditions:" blocks are very useful to determine the input invariants, but they are bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit in simple rules: 1. Cases with a single condition are placed on a single line. 2. Cases with multiple conditions are placed in a bulleted list. This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style. PiperOrigin-RevId: 327687465
2020-09-09	Skip listening TCP ports when trying to bind a free port.	Bhasker Hariharan
	PiperOrigin-RevId: 327686558
2020-09-09	Only use the NextHeader value of the first IPv6 fragment extension header.	Arthur Sfez
	As per RFC 8200 Section 4.5: The Next Header field of the last header of the Per-Fragment headers is obtained from the Next Header field of the first fragment's Fragment header. Test: - pkg/tcpip/network/ipv6:ipv6_test - pkg/tcpip/network/ipv4:ipv4_test - pkg/tcpip/network/fragmentation:fragmentation_test Updates #2197 PiperOrigin-RevId: 327671635
2020-09-09	Use a explicit random src for RandomID.	Bhasker Hariharan
	PiperOrigin-RevId: 327659759
2020-09-09	Fix tabs in lock-ordering doc.	Nicolas Lacasse
	PiperOrigin-RevId: 327654207
2020-09-09	Move boot.Config to its own package	Fabricio Voznika
	Updates #3494 PiperOrigin-RevId: 327548511