gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-11-17	Fix SO_ERROR behavior for TCP in gVisor.	Bhasker Hariharan
	Fixes the behaviour of SO_ERROR for TCP sockets. In Linux it returns sk->sk_err, and if sk->sk_err is 0 it returns sk->sk_err_soft. In gVisor TCP, endpoint.HardError is the equivalent of sk->sk_err and endpoint.LastError holds soft errors. This change brings gVisor into alignment with Linux such that both hard and soft errors are cleared when retrieved using getsockopt(.. SO_ERROR) on a socket. Fixes #3812 PiperOrigin-RevId: 342868552
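The lookup order described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `endpoint` type, its `hardErr`/`softErr` fields, and the `soError` method are hypothetical names, not the gVisor implementation, and the clearing behaviour follows the commit message's wording.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sample errors used by the example below.
var (
	errReset       = errors.New("connection reset")
	errUnreachable = errors.New("host unreachable")
)

// endpoint is a hypothetical stand-in for a TCP endpoint; it is not the
// gVisor type. hardErr plays the role of the hard error and softErr the
// role of the soft error.
type endpoint struct {
	hardErr error
	softErr error
}

// soError mimics getsockopt(SO_ERROR) as the commit message describes it:
// the hard error wins, the soft error is the fallback, and both are
// cleared by the read.
func (e *endpoint) soError() error {
	err := e.hardErr
	if err == nil {
		err = e.softErr
	}
	e.hardErr, e.softErr = nil, nil
	return err
}

func main() {
	e := &endpoint{softErr: errUnreachable}
	fmt.Println(e.soError()) // the soft error is reported
	fmt.Println(e.soError()) // <nil>: the first read cleared it
}
```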
2020-11-12	Refactor SOL_SOCKET options	Nayana Bidari
	Store all the socket level options in a struct and call {Get/Set}SockOpt on this struct. This will avoid implementing socket level options on all endpoints. This CL contains implementing one socket level option for tcp and udp endpoints. PiperOrigin-RevId: 342203981
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRef calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
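A minimal sketch of the idea, assuming a simplified counter: the `refs` type below is hypothetical and much smaller than the refsvfs2 template, but it shows why a missing `InitRefs()` call surfaces quickly as a panic.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refs is a hypothetical reference counter, not the refsvfs2 template.
// The stored value is the real count, so zero means "uninitialized or
// already destroyed".
type refs struct {
	count int64
}

// InitRefs gives the object its first reference.
func (r *refs) InitRefs() { atomic.StoreInt64(&r.count, 1) }

// IncRef panics when called on a counter that holds no references.
func (r *refs) IncRef() {
	if v := atomic.AddInt64(&r.count, 1); v <= 1 {
		panic("IncRef on uninitialized or destroyed refs")
	}
}

// DecRef drops one reference and runs destroy when the last one is gone.
func (r *refs) DecRef(destroy func()) {
	switch v := atomic.AddInt64(&r.count, -1); {
	case v < 0:
		panic("DecRef on uninitialized or destroyed refs")
	case v == 0:
		destroy()
	}
}

func main() {
	var r refs
	r.InitRefs()
	r.IncRef()
	r.DecRef(func() { fmt.Println("not reached: one reference remains") })
	r.DecRef(func() { fmt.Println("destroyed") })
}
```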
2020-11-01	Fix returned error when deleting non-existent address	Ian Lewis
	PiperOrigin-RevId: 340149214
2020-10-29	Make RedirectTarget thread safe	Kevin Krakauer
	Fixes #4613. PiperOrigin-RevId: 339746784
2020-10-29	Keep magic constants out of netstack	Kevin Krakauer
	PiperOrigin-RevId: 339721152
2020-10-27	Add basic address deletion to netlink	Ian Lewis
	Updates #3921 PiperOrigin-RevId: 339195417
2020-10-26	Fix SCM Rights S/R reference leak.	Dean Deng
	Control messages collected when peeking into a socket were being leaked. PiperOrigin-RevId: 339114961
2020-10-23	Fix nogo tests in //pkg/sentry/socket/...	Ting-Yu Wang
	PiperOrigin-RevId: 338784921
2020-10-23	Support VFS2 save/restore.	Jamie Liu
	Inode number consistency checks are now skipped in save/restore tests for reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename. They pass in VFS1 due to the bug described in new test case SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs. Fixes #1663 PiperOrigin-RevId: 338776148
2020-10-23	[vfs] kernfs: Implement remaining InodeAttr fields.	Ayush Ranjan
	Added the following fields in kernfs.InodeAttr:
	- blockSize
	- atime
	- mtime
	- ctime
	Also resolved all TODOs for #1193. Fixes #1193 PiperOrigin-RevId: 338714527
2020-10-23	Support getsockopt for SO_ACCEPTCONN.	Nayana Bidari
	The SO_ACCEPTCONN option is used only on getsockopt(). When this option is specified, getsockopt() indicates whether socket listening is enabled for the socket. A value of zero indicates that socket listening is disabled; non-zero that it is enabled. PiperOrigin-RevId: 338703206
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order.
	With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued.
	Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker.
	Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks.
	The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection.
	This approach has several benefits over the former, including:
	- Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time.
	- Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added).
	- Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient.
	- Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected.
	Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object. Updates #1486. PiperOrigin-RevId: 338685972
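The map-based approach can be sketched roughly as follows. All names here (`liveObjects`, `register`, `unregister`, `leakReport`, `inode`) are hypothetical and chosen for illustration; the real checker is more involved.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// liveObjects is a hypothetical global registry of ref-counted objects
// that have been created but not yet destroyed. The map value carries
// logging information so the objects themselves stay small.
var (
	mu          sync.Mutex
	liveObjects = map[interface{}]string{}
)

// register records a newly created object.
func register(obj interface{}, info string) {
	mu.Lock()
	defer mu.Unlock()
	liveObjects[obj] = info
}

// unregister removes a destroyed object and panics if leak checking was
// never enabled for it.
func unregister(obj interface{}) {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := liveObjects[obj]; !ok {
		panic("unregister of an object that was never registered")
	}
	delete(liveObjects, obj)
}

// leakReport lists everything still registered, as a single sorted
// report meant to be produced at exit.
func leakReport() []string {
	mu.Lock()
	defer mu.Unlock()
	report := make([]string, 0, len(liveObjects))
	for _, info := range liveObjects {
		report = append(report, info)
	}
	sort.Strings(report)
	return report
}

// inode is a hypothetical ref-counted object.
type inode struct{ name string }

func main() {
	a, b := &inode{"a"}, &inode{"b"}
	register(a, "inode a")
	register(b, "inode b")
	unregister(a)             // a was destroyed properly
	fmt.Println(leakReport()) // only b is still registered
}
```

Because the report is built from the map at a single point in time, the result does not depend on when, or whether, the garbage collector runs.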
2020-10-15	sockets: ignore io.EOF from view.ReadAt	Andrei Vagin
	Reported-by: syzbot+5466463b7604c2902875@syzkaller.appspotmail.com PiperOrigin-RevId: 337451896
2020-10-14	Fix SCM Rights reference leaks.	Dean Deng
	Control messages should be released on Read (which ignores the control message) or zero-byte Send. Otherwise, open fds sent through the control messages will be leaked. PiperOrigin-RevId: 337110774
2020-09-30	ip6tables: redirect support	Kevin Krakauer
	Adds support for the IPv6-compatible redirect target. Redirection is a limited form of DNAT, where the destination is always the localhost. Updates #3549. PiperOrigin-RevId: 334698344
2020-09-30	Make all Target.Action implementation pointer receivers	Kevin Krakauer
	PiperOrigin-RevId: 334652998
2020-09-29	iptables: remove unused min/max NAT range fields	Kevin Krakauer
	PiperOrigin-RevId: 334531794
2020-09-29	Don't allow broadcast/multicast source address	Ghanan Gowripalan
	As per relevant IP RFCs (see code comments), broadcast (for IPv4) and multicast source addresses are not allowed. Currently checks for these are done at the transport layer, but since it is explicitly forbidden at the IP layers, check for them there. This change also removes the UDP.InvalidSourceAddress stat since there is no longer a need for it. Test: ip_test.TestSourceAddressValidation PiperOrigin-RevId: 334490971
2020-09-29	iptables: refactor to make targets extendable	Kevin Krakauer
	Like matchers, targets should use a module-like register/lookup system. This replaces the brittle switch statements we had before. The only behavior change is supporting IPT_GET_REVISION_TARGET. This makes it much easier to add IPv6 redirect in the next change. Updates #3549. PiperOrigin-RevId: 334469418
2020-09-29	Merge pull request #3875 from btw616:fix/issue-3874	gVisor bot
	PiperOrigin-RevId: 334428344
2020-09-28	Don't leak dentries returned by sockfs.NewDentry().	Jamie Liu
	PiperOrigin-RevId: 334263322
2020-09-24	Add basic stateify annotations.	Adin Scannell
	Updates #1663 PiperOrigin-RevId: 333539293
2020-09-24	Fix socket record leak in VFS2	Tiwei Bie
	VFS2 socket record is not removed from the system-wide socket table when the socket is released, which will lead to a memory leak. This patch fixes this issue. Fixes: #3874 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-20	Merge pull request #3651 from ianlewis:ip-forwarding	gVisor bot
	PiperOrigin-RevId: 332760843
2020-09-18	Count packets dropped by iptables in IPStats	Kevin Krakauer
	PiperOrigin-RevId: 332486383
2020-09-17	ip6tables: filter table support	Kevin Krakauer
	`ip6tables -t filter` is now usable. NAT support will come in a future CL. #3549 PiperOrigin-RevId: 332381801
2020-09-17	{Set,Get} SO_LINGER on all endpoints.	Nayana Bidari
	SO_LINGER is a socket level option and should be stored on all endpoints even though it is used to linger only for TCP endpoints. PiperOrigin-RevId: 332369252
2020-09-17	Complete vfs2 implementation of fallocate.	Dean Deng
	This change includes overlay, special regular gofer files, and hostfs. Fixes #3589. PiperOrigin-RevId: 332330860
2020-09-17	Return ENOPROTOOPT for all SOL_PACKET options.	Bhasker Hariharan
	This is required to make tcpdump work. tcpdump falls back to not using things like PACKET_RX_RING if setsockopt returns ENOPROTOOPT. This used to be the case before https://github.com/google/gvisor/commit/6f8fb7e0db2790ff1f5ba835780c03fe245e437f. Fixes #3981 PiperOrigin-RevId: 332326517
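The fallback pattern that depends on this error code can be sketched as below. This is not tcpdump's code: `chooseMode`, `captureMode`, and the two stand-in setsockopt functions are hypothetical, and only `syscall.ENOPROTOOPT` and `syscall.EINVAL` are real identifiers.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// captureMode names the receive path a hypothetical capture tool ends up using.
type captureMode string

const (
	ringMode captureMode = "ring"
	recvMode captureMode = "recv"
)

// chooseMode tries to enable a ring buffer through enableRing, a stand-in
// for the real setsockopt call. ENOPROTOOPT means "not supported here", so
// the caller quietly falls back; any other error is treated as fatal.
func chooseMode(enableRing func() error) (captureMode, error) {
	err := enableRing()
	switch {
	case err == nil:
		return ringMode, nil
	case errors.Is(err, syscall.ENOPROTOOPT):
		return recvMode, nil
	default:
		return "", err
	}
}

// unsupportedSetsockopt models a kernel that rejects the option cleanly.
func unsupportedSetsockopt() error { return syscall.ENOPROTOOPT }

// brokenSetsockopt models a kernel that returns some other error.
func brokenSetsockopt() error { return syscall.EINVAL }

func main() {
	mode, err := chooseMode(unsupportedSetsockopt)
	fmt.Println(mode, err) // falls back to the plain receive path

	_, err = chooseMode(brokenSetsockopt)
	fmt.Println(err != nil) // any other errno is not a clean fallback signal
}
```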
2020-09-16	Automated rollback of changelist 329526153	Nayana Bidari
	PiperOrigin-RevId: 332097286
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-08	Improve type safety for transport protocol options	Ghanan Gowripalan
	The existing implementation for TransportProtocol.{Set}Option takes arguments of an empty interface type which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces, implemented by the transport protocol option types that may be set or queried, so that invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. RELNOTES: n/a PiperOrigin-RevId: 330559811
2020-09-02	Fix Accept to not return error for sockets in accept queue.	Bhasker Hariharan
	Accept on gVisor will return an error if a socket in the accept queue was closed before Accept() was called. Linux will return the new fd even if the returned socket is already closed by the peer, say due to a RST being sent by the peer. This seems to be intentional in Linux; more details are on the GitHub issue. Fixes #3780 PiperOrigin-RevId: 329828404
2020-09-01	Automated rollback of changelist 328350576	Nayana Bidari
	PiperOrigin-RevId: 329526153
2020-08-28	Fix kernfs.Dentry reference leak.	Nicolas Lacasse
	PiperOrigin-RevId: 329036994
2020-08-27	Improve type safety for socket options	Ghanan Gowripalan
	The existing implementation for {G,S}etSockOpt takes arguments of an empty interface type which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces, implemented by the socket option types that may be set or queried, so that invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. Fixes #3714. RELNOTES: n/a PiperOrigin-RevId: 328832161
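The marker-interface technique can be illustrated with a small sketch. The interface, option, and method names below are hypothetical and do not mirror the tcpip package; they only show how separate "settable" and "gettable" markers let the compiler reject a read-only option passed to a setter.

```go
package main

import "fmt"

// settableOption and gettableOption are hypothetical marker interfaces.
// Their methods do nothing; they exist so that only approved types can be
// passed to setSockOpt and getSockOpt.
type settableOption interface{ isSettableOption() }
type gettableOption interface{ isGettableOption() }

// keepaliveIdleOption may be both set and queried.
type keepaliveIdleOption int

func (*keepaliveIdleOption) isSettableOption() {}
func (*keepaliveIdleOption) isGettableOption() {}

// infoOption is read-only: it implements only the gettable marker.
type infoOption struct{ rttMicros int }

func (*infoOption) isGettableOption() {}

type endpoint struct{ keepaliveIdle keepaliveIdleOption }

func (e *endpoint) setSockOpt(opt settableOption) error {
	switch v := opt.(type) {
	case *keepaliveIdleOption:
		e.keepaliveIdle = *v
		return nil
	}
	return fmt.Errorf("unsupported option %T", opt)
}

func (e *endpoint) getSockOpt(opt gettableOption) error {
	switch v := opt.(type) {
	case *keepaliveIdleOption:
		*v = e.keepaliveIdle
		return nil
	case *infoOption:
		v.rttMicros = 1500 // illustrative value only
		return nil
	}
	return fmt.Errorf("unsupported option %T", opt)
}

func main() {
	var e endpoint
	idle := keepaliveIdleOption(30)
	_ = e.setSockOpt(&idle)

	var got keepaliveIdleOption
	_ = e.getSockOpt(&got)
	fmt.Println(got) // 30

	// e.setSockOpt(&infoOption{}) would not compile: *infoOption does not
	// implement settableOption.
}
```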
2020-08-27	Add function to get error from a tcpip.Endpoint	Ghanan Gowripalan
	In an upcoming CL, socket option types are made to implement a marker interface with pointer receivers. Since this results in calling methods of an interface with a pointer, we incur an allocation when attempting to get an Endpoint's last error with the current implementation. When calling the method of an interface, the compiler is unable to determine what the interface implementation does with the pointer (calling a method on an interface uses virtual dispatch at runtime, so the compiler does not know what the interface method will do), so it allocates on the heap to be safe in case an implementation continues to hold the pointer after the function returns (the reference escapes the scope of the object). In the example below, the compiler does not know what b.foo does with the reference to a, so it allocates a on the heap, as the reference to a may escape the scope of a.
	```
	var a int
	var b someInterface
	b.foo(&a)
	```
	This change removes the opportunity for that allocation. RELNOTES: n/a PiperOrigin-RevId: 328796559
2020-08-27	ip6tables: (de)serialize ip6tables structs	Kevin Krakauer
	More implementation+testing to follow. #3549. PiperOrigin-RevId: 328770160
2020-08-25	Use new reference count utility throughout gvisor.	Dean Deng
	This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted soon anyway.
	The following structs now use the new tool, with leak check enabled:
	  devpts:rootInode
	  fuse:inode
	  kernfs:Dentry
	  kernfs:dir
	  kernfs:readonlyDir
	  kernfs:StaticDirectory
	  proc:fdDirInode
	  proc:fdInfoDirInode
	  proc:subtasksInode
	  proc:taskInode
	  proc:tasksInode
	  vfs:FileDescription
	  vfs:MountNamespace
	  vfs:Filesystem
	  sys:dir
	  kernel:FSContext
	  kernel:ProcessGroup
	  kernel:Session
	  shm:Shm
	  mm:aioMappable
	  mm:SpecialMappable
	  transport:queue
	And the following use the template, but because they currently are not leak checked, a TODO is left instead of enabling leak check in this patch:
	  kernel:FDTable
	  tun:tunEndpoint
	Updates #1486. PiperOrigin-RevId: 328460377
2020-08-25	remove iptables sockopt special cases	Kevin Krakauer
	iptables sockopts were kludged into an unnecessary check; this properly relegates them to the {get,set}SockOptIP functions. PiperOrigin-RevId: 328395135
2020-08-25	Support SO_LINGER socket option.	Nayana Bidari
	When SO_LINGER option is enabled, the close will not return until all the queued messages are sent and acknowledged for the socket or linger timeout is reached. If the option is not set, close will return immediately. This option is mainly supported for connection oriented protocols such as TCP. PiperOrigin-RevId: 328350576
2020-08-25	Fix TCP_LINGER2 behavior to match linux.	Bhasker Hariharan
	We still deviate a bit from linux in how long we will actually wait in FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely obvious as to why it's done that way. For now I think we can ignore that and fix it if it really is an issue. PiperOrigin-RevId: 328324922
2020-08-20	Skip listening TCP ports when trying to bind a free port.	Bhasker Hariharan
	PiperOrigin-RevId: 327686558
2020-08-19	ip6tables: move ipv4-specific logic into its own file	Kevin Krakauer
	A later change will introduce the equivalent IPv6 logic. #3549 PiperOrigin-RevId: 327499064
2020-08-17	Merge branch 'master' into ip-forwarding	Ian Lewis
	- Merges aleksej-paschenko's with HEAD
	- Adds vfs2 support for ip_forward
2020-08-17	Remove weak references from unix sockets.	Dean Deng
	The abstract socket namespace no longer holds any references on sockets. Instead, TryIncRef() is used when a socket is being retrieved in BoundEndpoint(). Abstract sockets are now responsible for removing themselves from the namespace they are in, when they are destroyed. Updates #1486. PiperOrigin-RevId: 327064173
2020-08-13	Migrate to PacketHeader API for PacketBuffer.	Ting-Yu Wang
	Formerly, when a packet was constructed or parsed, all headers were set by the client code. This almost always involved prepending to the pk.Header buffer or trimming the pk.Data portion. This is known to be prone to bugs, due to the complexity and number of the invariants assumed across netstack. In the new PacketHeader API, the client calls the Push()/Consume() methods to construct/parse an outgoing/incoming packet. All invariants, such as slicing and trimming, are maintained by the API itself. NewPacketBuffer() is introduced to create a new PacketBuffer. The zero value is no longer valid. PacketBuffer now assumes the packet is a concatenation of the following portions:
	* LinkHeader
	* NetworkHeader
	* TransportHeader
	* Data
	Any of them could be empty, or zero-length. PiperOrigin-RevId: 326507688
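To make the push/consume idea concrete, here is a toy sketch. It is not the netstack API: `packetBuffer`, `push`, `consume`, `bytes`, and the header kinds are hypothetical, simplified names, and the header sizes are arbitrary.

```go
package main

import "fmt"

// headerKind orders the header portions of a packet.
type headerKind int

const (
	linkHeader headerKind = iota
	networkHeader
	transportHeader
	numHeaders
)

// packetBuffer is a toy model of a packet as link header, network header,
// transport header, and data. It is not the netstack type.
type packetBuffer struct {
	headers [numHeaders][]byte
	data    []byte
}

func newPacketBuffer(data []byte) *packetBuffer {
	return &packetBuffer{data: data}
}

// push reserves size bytes for an outgoing header and returns them so the
// caller can fill them in. Each header may be set only once.
func (pk *packetBuffer) push(k headerKind, size int) []byte {
	if pk.headers[k] != nil {
		panic("header already set")
	}
	pk.headers[k] = make([]byte, size)
	return pk.headers[k]
}

// consume moves size bytes from the front of the data into header k while
// parsing an incoming packet. It reports false if the data is too short.
func (pk *packetBuffer) consume(k headerKind, size int) ([]byte, bool) {
	if pk.headers[k] != nil {
		panic("header already set")
	}
	if size > len(pk.data) {
		return nil, false
	}
	pk.headers[k], pk.data = pk.data[:size], pk.data[size:]
	return pk.headers[k], true
}

// bytes returns the packet as the concatenation of its portions.
func (pk *packetBuffer) bytes() []byte {
	var out []byte
	for _, h := range pk.headers {
		out = append(out, h...)
	}
	return append(out, pk.data...)
}

func main() {
	// Parsing: peel a 2-byte header off the front of the data.
	in := newPacketBuffer([]byte{0xAA, 0xBB, 1, 2, 3})
	hdr, _ := in.consume(networkHeader, 2)
	fmt.Println(hdr, in.data) // [170 187] [1 2 3]

	// Building: headers can be pushed in any order; layout stays fixed.
	out := newPacketBuffer([]byte{9})
	copy(out.push(transportHeader, 1), []byte{7})
	copy(out.push(networkHeader, 1), []byte{6})
	fmt.Println(out.bytes()) // [6 7 9]
}
```

The caller never slices or prepends by hand; the buffer keeps the header boundaries consistent on its behalf.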
2020-08-10	ip6tables: move target-specific code to targets.go	Kevin Krakauer
	This is purely moving code, no changes. netfilter.go is cluttered and targets.go is a good place for this. #3549 PiperOrigin-RevId: 325879965
2020-08-05	Add loss recovery option for TCP.	Nayana Bidari
	/proc/sys/net/ipv4/tcp_recovery is used to enable RACK loss recovery in TCP. PiperOrigin-RevId: 325157807