gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-03-04	Fix race in unix socket transport.	Ian Gudger
	transport.baseEndpoint.receiver and transport.baseEndpoint.connected are protected by transport.baseEndpoint.Mutex. In order to access them without holding the mutex, we must make a copy. Notifications must be sent without holding the mutex, so we need the values without holding the mutex.
2021-03-03	[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-19	Don't hold baseEndpoint.mu while calling EventUpdate().	Nicolas Lacasse
	This removes a three-lock deadlock between fdnotifier.notifier.mu, epoll.EventPoll.listsMu, and baseEndpoint.mu. A lock order comment was added to epoll/epoll.go. Also fix unsafe access of baseEndpoint.connected/receiver. PiperOrigin-RevId: 358515191
2021-02-18	Make socketops reflect correct sndbuf value for host UDS.	Bhasker Hariharan
	Also skips a test if the setsockopt to increase send buffer did not result in an increase. This is possible when the underlying socket is a host backed unix domain socket as in such cases gVisor does not permit increasing SO_SNDBUF. PiperOrigin-RevId: 358285158
2021-02-09	Add support for setting SO_SNDBUF for unix domain sockets.	Bhasker Hariharan
	The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-05	Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)	Ting-Yu Wang
	Panic seen at some code path like control.ExecAsync where ctx does not have a Task. Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com PiperOrigin-RevId: 355960596
2021-01-28	Change tcpip.Error to an interface	Tamir Duberstein
	This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-26	Initialize the send buffer handler in endpoint creation.	Nayana Bidari
	- This CL will initialize the function handler used for getting the send buffer size limits during endpoint creation and does not require the caller of SetSendBufferSize(..) to know the endpoint type(tcp/udp/..) PiperOrigin-RevId: 353992634
2021-01-26	Do not send SCM Rights more than once when message is truncated.	Dean Deng
	If data is sent over a stream socket that will not fit all at once, it will be sent over multiple packets. SCM Rights should only be sent with the first packet (see net/unix/af_unix.c:unix_stream_sendmsg in Linux). Reported-by: syzbot+aa26482e9c4887aff259@syzkaller.appspotmail.com PiperOrigin-RevId: 353886442
2021-01-26	Move SO_SNDBUF to socketops.	Nayana Bidari
	This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For unix sockets, we do not support setting of this option. PiperOrigin-RevId: 353871484
2021-01-20	Move Lock/UnlockPOSIX into LockFD util.	Dean Deng
	PiperOrigin-RevId: 352904728
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2020-12-14	Move SO_LINGER option to socketops.	Nayana Bidari
	PiperOrigin-RevId: 347437786
2020-12-02	[netstack] Refactor common utils out of netstack to socket package.	Ayush Ranjan
	Moved AddressAndFamily() and ConvertAddress() to socket package from netstack. This helps because these utilities are used by sibling netstack packages. Such sibling dependencies can later cause circular dependencies. Common utils shared between siblings should be moved up to the parent. PiperOrigin-RevId: 345275571
2020-11-26	[netstack] Add SOL_TCP options to SocketOptions.	Ayush Ranjan
	Ports the following options: - TCP_NODELAY - TCP_CORK - TCP_QUICKACK Also deletes the {Get/Set}SockOptBool interface methods from all implementations PiperOrigin-RevId: 344378824
2020-11-18	[netstack] Move SO_KEEPALIVE and SO_ACCEPTCONN option to SocketOptions.	Ayush Ranjan
	PiperOrigin-RevId: 343217712
2020-11-18	[netstack] Move SO_REUSEPORT and SO_REUSEADDR option to SocketOptions.	Ayush Ranjan
	This changes also introduces: - `SocketOptionsHandler` interface which can be implemented by endpoints to handle endpoint specific behavior on SetSockOpt. This is analogous to what Linux does. - `DefaultSocketOptionsHandler` which is a default implementation of the above. This is embedded in all endpoints so that we don't have to uselessly implement empty functions. Endpoints with specific behavior can override the embedded method by manually defining its own implementation. PiperOrigin-RevId: 343158301
2020-11-18	[netstack] Move SO_PASSCRED option to SocketOptions.	Ayush Ranjan
	This change also makes the following fixes: - Make SocketOptions use atomic operations instead of having to acquire/drop locks upon each get/set option. - Make documentation more consistent. - Remove tcpip.SocketOptions from socketOpsCommon because it already exists in transport.Endpoint. - Refactors get/set socket options tests to be easily extendable. PiperOrigin-RevId: 343103780
2020-11-12	Refactor SOL_SOCKET options	Nayana Bidari
	Store all the socket level options in a struct and call {Get/Set}SockOpt on this struct. This will avoid implementing socket level options on all endpoints. This CL contains implementing one socket level option for tcp and udp endpoints. PiperOrigin-RevId: 342203981
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRef calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
2020-10-26	Fix SCM Rights S/R reference leak.	Dean Deng
	Control messages collected when peeking into a socket were being leaked. PiperOrigin-RevId: 339114961
2020-10-23	Fix nogo tests in //pkg/sentry/socket/...	Ting-Yu Wang
	PiperOrigin-RevId: 338784921
2020-10-23	[vfs] kernfs: Implement remaining InodeAttr fields.	Ayush Ranjan
	Added the following fields in kernfs.InodeAttr: - blockSize - atime - mtime - ctime Also resolved all TODOs for #1193. Fixes #1193 PiperOrigin-RevId: 338714527
2020-10-23	Support getsockopt for SO_ACCEPTCONN.	Nayana Bidari
	The SO_ACCEPTCONN option is used only on getsockopt(). When this option is specified, getsockopt() indicates whether socket listening is enabled for the socket. A value of zero indicates that socket listening is disabled; non-zero that it is enabled. PiperOrigin-RevId: 338703206
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order. With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued. Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker. Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks. The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection. This approach has several benefits over the former, including: - Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time. - Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added). - Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient. - Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected. Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object. Updates #1486. PiperOrigin-RevId: 338685972
2020-10-14	Fix SCM Rights reference leaks.	Dean Deng
	Control messages should be released on Read (which ignores the control message) or zero-byte Send. Otherwise, open fds sent through the control messages will be leaked. PiperOrigin-RevId: 337110774
2020-09-29	Merge pull request #3875 from btw616:fix/issue-3874	gVisor bot
	PiperOrigin-RevId: 334428344
2020-09-28	Don't leak dentries returned by sockfs.NewDentry().	Jamie Liu
	PiperOrigin-RevId: 334263322
2020-09-24	Add basic stateify annotations.	Adin Scannell
	Updates #1663 PiperOrigin-RevId: 333539293
2020-09-24	Fix socket record leak in VFS2	Tiwei Bie
	VFS2 socket record is not removed from the system-wide socket table when the socket is released, which will lead to a memory leak. This patch fixes this issue. Fixes: #3874 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-17	{Set,Get} SO_LINGER on all endpoints.	Nayana Bidari
	SO_LINGER is a socket level option and should be stored on all endpoints even though it is used to linger only for TCP endpoints. PiperOrigin-RevId: 332369252
2020-09-16	Automated rollback of changelist 329526153	Nayana Bidari
	PiperOrigin-RevId: 332097286
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-02	Fix Accept to not return error for sockets in accept queue.	Bhasker Hariharan
	Accept on gVisor will return an error if a socket in the accept queue was closed before Accept() was called. Linux will return the new fd even if the returned socket is already closed by the peer say due to a RST being sent by the peer. This seems to be intentional in linux more details on the github issue. Fixes #3780 PiperOrigin-RevId: 329828404
2020-09-01	Automated rollback of changelist 328350576	Nayana Bidari
	PiperOrigin-RevId: 329526153
2020-08-27	Improve type safety for socket options	Ghanan Gowripalan
	The existing implementation for {G,S}etSockOpt take arguments of an empty interface type which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces for socket options that may be set or queried which socket option types implement to ensure that invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. Fixes #3714. RELNOTES: n/a PiperOrigin-RevId: 328832161
2020-08-27	Add function to get error from a tcpip.Endpoint	Ghanan Gowripalan
	In an upcoming CL, socket option types are made to implement a marker interface with pointer receivers. Since this results in calling methods of an interface with a pointer, we incur an allocation when attempting to get an Endpoint's last error with the current implementation. When calling the method of an interface, the compiler is unable to determine what the interface implementation does with the pointer (since calling a method on an interface uses virtual dispatch at runtime so the compiler does not know what the interface method will do) so it allocates on the heap to be safe incase an implementation continues to hold the pointer after the functioon returns (the reference escapes the scope of the object). In the example below, the compiler does not know what b.foo does with the reference to a it allocates a on the heap as the reference to a may escape the scope of a. ``` var a int var b someInterface b.foo(&a) ``` This change removes the opportunity for that allocation. RELNOTES: n/a PiperOrigin-RevId: 328796559
2020-08-25	Use new reference count utility throughout gvisor.	Dean Deng
	This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted soon anyway. The following structs now use the new tool, with leak check enabled: devpts:rootInode fuse:inode kernfs:Dentry kernfs:dir kernfs:readonlyDir kernfs:StaticDirectory proc:fdDirInode proc:fdInfoDirInode proc:subtasksInode proc:taskInode proc:tasksInode vfs:FileDescription vfs:MountNamespace vfs:Filesystem sys:dir kernel:FSContext kernel:ProcessGroup kernel:Session shm:Shm mm:aioMappable mm:SpecialMappable transport:queue And the following use the template, but because they currently are not leak checked, a TODO is left instead of enabling leak check in this patch: kernel:FDTable tun:tunEndpoint Updates #1486. PiperOrigin-RevId: 328460377
2020-08-25	remove iptables sockopt special cases	Kevin Krakauer
	iptables sockopts were kludged into an unnecessary check, this properly relegates them to the {get,set}SockOptIP functions. PiperOrigin-RevId: 328395135
2020-08-25	Support SO_LINGER socket option.	Nayana Bidari
	When SO_LINGER option is enabled, the close will not return until all the queued messages are sent and acknowledged for the socket or linger timeout is reached. If the option is not set, close will return immediately. This option is mainly supported for connection oriented protocols such as TCP. PiperOrigin-RevId: 328350576
2020-08-17	Remove weak references from unix sockets.	Dean Deng
	The abstract socket namespace no longer holds any references on sockets. Instead, TryIncRef() is used when a socket is being retrieved in BoundEndpoint(). Abstract sockets are now responsible for removing themselves from the namespace they are in, when they are destroyed. Updates #1486. PiperOrigin-RevId: 327064173
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-23	Marshallable socket opitons.	Ayush Ranjan
	Socket option values are now required to implement marshal.Marshallable. Co-authored-by: Rahat Mahmood <rahat@google.com> PiperOrigin-RevId: 322831612
2020-06-18	socket/unix: (*connectionedEndpoint).State() has to take the endpoint lock	Andrei Vagin
	It accesses e.receiver which is protected by the endpoint lock. WARNING: DATA RACE Write at 0x00c0006aa2b8 by goroutine 189: pkg/sentry/socket/unix/transport.(connectionedEndpoint).Connect.func1() pkg/sentry/socket/unix/transport/connectioned.go:359 +0x50 pkg/sentry/socket/unix/transport.(connectionedEndpoint).BidirectionalConnect() pkg/sentry/socket/unix/transport/connectioned.go:327 +0xa3c pkg/sentry/socket/unix/transport.(connectionedEndpoint).Connect() pkg/sentry/socket/unix/transport/connectioned.go:363 +0xca pkg/sentry/socket/unix.(socketOpsCommon).Connect() pkg/sentry/socket/unix/unix.go:420 +0x13a pkg/sentry/socket/unix.(SocketOperations).Connect() <autogenerated>:1 +0x78 pkg/sentry/syscalls/linux.Connect() pkg/sentry/syscalls/linux/sys_socket.go:286 +0x251 Previous read at 0x00c0006aa2b8 by goroutine 270: pkg/sentry/socket/unix/transport.(baseEndpoint).Connected() pkg/sentry/socket/unix/transport/unix.go:789 +0x42 pkg/sentry/socket/unix/transport.(connectionedEndpoint).State() pkg/sentry/socket/unix/transport/connectioned.go:479 +0x2f pkg/sentry/socket/unix.(socketOpsCommon).State() pkg/sentry/socket/unix/unix.go:714 +0xc3e pkg/sentry/socket/unix.(socketOpsCommon).SendMsg() pkg/sentry/socket/unix/unix.go:466 +0xc44 pkg/sentry/socket/unix.(SocketOperations).SendMsg() <autogenerated>:1 +0x173 pkg/sentry/syscalls/linux.sendTo() pkg/sentry/syscalls/linux/sys_socket.go:1121 +0x4c5 pkg/sentry/syscalls/linux.SendTo() pkg/sentry/syscalls/linux/sys_socket.go:1134 +0x87 Reported-by: syzbot+c2be37eedc672ed59a86@syzkaller.appspotmail.com PiperOrigin-RevId: 317236996
2020-06-17	Implement POSIX locks	Fabricio Voznika
	- Change FileDescriptionImpl Lock/UnlockPOSIX signature to take {start,length,whence}, so the correct offset can be calculated in the implementations. - Create PosixLocker interface to make it possible to share the same locking code from different implementations. Closes #1480 PiperOrigin-RevId: 316910286
2020-06-10	socket/unix: handle sendto address argument for connected sockets	Andrei Vagin
	In case of SOCK_SEQPACKET, it has to be ignored. In case of SOCK_STREAM, EISCONN or EOPNOTSUPP has to be returned. PiperOrigin-RevId: 315755972
2020-06-09	Implement flock(2) in VFS2	Fabricio Voznika
	LockFD is the generic implementation that can be embedded in FileDescriptionImpl implementations. Unique lock ID is maintained in vfs.FileDescription and is created on demand. Updates #1480 PiperOrigin-RevId: 315604825
2020-06-02	Check that two sockets with different types can't be connected to each other	Andrei Vagin
	PiperOrigin-RevId: 314450191
2020-05-07	Allocate device numbers for VFS2 filesystems.	Jamie Liu
	Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006
2020-05-05	Update vfs2 socket TODOs.	Dean Deng
	Three updates: - Mark all vfs2 socket syscalls as supported. - Use the same dev number and ino number generator for all types of sockets, unlike in VFS1. - Do not use host fd for hostinet metadata. Fixes #1476, #1478, #1484, 1485, #2017. PiperOrigin-RevId: 309994579