gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-09-16	fuse: Implement IterDirents for directory file description	Ridwan Sharif
	Fixes #3255. This change adds support for IterDirents. You can now use `ls` in the FUSE sandbox. Co-authored-by: Craig Chi <craigchi@google.com>
2020-09-16	Implement FUSE_RMDIR	Ridwan Sharif
	Fixes #3587 Co-authored-by: Craig Chi <craigchi@google.com>
2020-09-16	Implement FUSE_READ	Jinmou Li
	Fixes #3206
2020-09-16	Implement FUSE_MKDIR	Boyuan He
	Fixes #3392
2020-09-16	Implement FUSE_READLINK	Boyuan He
	Fixes #3316
2020-09-16	Implement FUSE_SYMLINK	Boyuan He
	Fixes #3452
2020-09-16	Implement FUSE_MKNOD	Boyuan He
	Fixes #3492
2020-09-16	Implement FUSE_RELEASE/RELEASEDIR	Boyuan He
	Fixes #3314
2020-09-16	Implement FUSE_OPEN/OPENDIR	Boyuan He
	Fixes #3174
2020-09-16	Implement FUSE_LOOKUP	Andrei Vagin
	Fixes #3231 Co-authored-by: Boyuan He <heboyuan@google.com>
2020-09-16	Extend integration test to test sequence of FUSE operation	Craig Chi
	The original FUSE integration test has limited capabilities. To test more situations, the new integration test framework introduces a protocol for communication between the testing thread and the FUSE server. In summary, this change: 1. Removes CompareResult() and splits SetExpected() into SetServerResponse() and GetServerActualRequest(). We no longer set up an expected request; instead, we retrieve the actual FUSE request made to the FUSE server and check it in the testing thread. 2. Declares a serial buffer data structure that stores the received requests and expected responses sequentially. The data structure contains a cursor that tracks how far it has been consumed, which makes sequential SetServerResponse() and GetServerActualRequest() calls possible. 3. Replaces two unidirectional pipes with one bidirectional socketpair. A protocol that begins with a FuseTestCmd is used between the testing thread and the FUSE server to provide various functionality. Fixes #3405
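A minimal Go sketch of the serial buffer from point 2, assuming it is an append-only sequence plus a read cursor. The names serialBuffer, Push and Next are hypothetical and are not the test framework's actual API.

```go
package main

import "fmt"

// serialBuffer is a hypothetical stand-in for the test's sequential store:
// items are appended in order and handed out in the same order via a cursor.
type serialBuffer struct {
	items  [][]byte
	cursor int // index of the next item to hand out
}

// Push appends an item, e.g. a received FUSE request or a canned response.
func (b *serialBuffer) Push(item []byte) {
	b.items = append(b.items, item)
}

// Next returns the item at the cursor and advances it; ok is false once
// every stored item has been consumed.
func (b *serialBuffer) Next() (item []byte, ok bool) {
	if b.cursor >= len(b.items) {
		return nil, false
	}
	item = b.items[b.cursor]
	b.cursor++
	return item, true
}

func main() {
	var responses serialBuffer
	responses.Push([]byte("LOOKUP reply"))
	responses.Push([]byte("OPEN reply"))
	for {
		r, ok := responses.Next()
		if !ok {
			break
		}
		fmt.Println(string(r))
	}
}
```

Because the cursor only moves forward, several responses can be queued before the operation under test runs, and the recorded requests can be checked afterwards in arrival order.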
2020-09-16	Rename marshal.Task to marshal.CopyContext.	Rahat Mahmood
	CopyContext is a better name for the interface because from go-marshal's perspective, the interface has nothing to do with a task. A kernel.Task happens to implement the interface, but so can other things like MemoryManager and IO sequences. PiperOrigin-RevId: 331959678
2020-09-15	Enable automated marshalling for the syscall package.	Rahat Mahmood
	PiperOrigin-RevId: 331940975
2020-09-15	Add support for OCI seccomp filters in the sandbox.	Ian Lewis
	OCI configuration includes support for specifying seccomp filters. In runc, these filter configurations are converted into seccomp BPF programs and loaded into the kernel via libseccomp. runsc needs to be a static binary, so it cannot rely on a C library and must implement this functionality in Go. The generator added here implements basic support for taking an OCI seccomp configuration and converting it into a seccomp BPF program with the same behavior as a program generated by libseccomp. - New conditional operations were added to pkg/seccomp to support the operations available in OCI. - AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect that syscalls matching the conditionals result in the provided action, not simply SCMP_RET_ALLOW. - BuildProgram in pkg/seccomp no longer panics if provided an empty list of rules. It now builds a program containing only the architecture sanity check. - ProgramBuilder now allows adding labels that are unused. However, backward jumps are still not permitted. Fixes #510 PiperOrigin-RevId: 331938697
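The rename from AllowAny/AllowValue to MatchAny/EqualTo is easier to see with a toy evaluator: a matching rule yields whatever action it carries, which need not be "allow". This Go sketch models only the matching semantics in plain Go, not BPF generation; argMatcher, matchAny, equalTo, rule and evaluate are hypothetical names and are not the pkg/seccomp API.

```go
package main

import "fmt"

// argMatcher is a hypothetical per-argument condition.
type argMatcher interface {
	matches(v uint64) bool
}

// matchAny accepts any argument value.
type matchAny struct{}

func (matchAny) matches(uint64) bool { return true }

// equalTo accepts exactly one argument value.
type equalTo uint64

func (e equalTo) matches(v uint64) bool { return uint64(e) == v }

// rule pairs argument conditions with the action taken when all hold.
type rule struct {
	args   []argMatcher
	action string // e.g. "ALLOW" or "ERRNO(EPERM)"; not necessarily ALLOW
}

// evaluate returns the action of the first rule whose conditions all match
// the syscall arguments, or defaultAction if no rule matches.
func evaluate(rules []rule, args []uint64, defaultAction string) string {
	for _, r := range rules {
		ok := true
		for i, m := range r.args {
			if i >= len(args) || !m.matches(args[i]) {
				ok = false
				break
			}
		}
		if ok {
			return r.action
		}
	}
	return defaultAction
}

func main() {
	rules := []rule{
		{args: []argMatcher{equalTo(1), matchAny{}}, action: "ERRNO(EPERM)"},
	}
	fmt.Println(evaluate(rules, []uint64{1, 42}, "ALLOW")) // ERRNO(EPERM)
	fmt.Println(evaluate(rules, []uint64{2, 42}, "ALLOW")) // ALLOW
}
```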
2020-09-15	Implement gvisor verity fs ioctl with GETFLAGS	Chong Cai
	PiperOrigin-RevId: 331905347
2020-09-15	Support setting STATX_SIZE for kernfs.InodeAttrs.	Dean Deng
	Make setting STATX_SIZE a no-op if it is valid for the given permissions and file type. Also update proc tests, which were previously overfitted. Fixes #3842. Updates #1193. PiperOrigin-RevId: 331861087
2020-09-15	Release FDTable lock before dropping the fds.	Nayana Bidari
	This is needed for SO_LINGER, where close() can block for the linger timeout. Holding the FDTable lock for that entire timeout prevents other fds from being created or deleted, so we must release the lock first and then drop the fds. PiperOrigin-RevId: 331844185
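The ordering described above (detach under the lock, close after unlocking) can be sketched in Go as follows. fdTable, closer and probeFile are hypothetical simplifications, not gVisor's kernel.FDTable; probeFile uses sync.Mutex.TryLock (Go 1.18+) to record whether the table lock was free while Close ran.

```go
package main

import (
	"fmt"
	"sync"
)

// closer stands in for a file description whose Close may block for a long
// time, e.g. a socket with SO_LINGER set.
type closer interface{ Close() }

// fdTable is a hypothetical, simplified FD table.
type fdTable struct {
	mu  sync.Mutex
	fds map[int]closer
}

// removeAll detaches every fd while holding the lock, then releases the lock
// before calling Close, so a blocking Close cannot stall other fd operations.
func (t *fdTable) removeAll() {
	t.mu.Lock()
	dropped := make([]closer, 0, len(t.fds))
	for fd, f := range t.fds {
		dropped = append(dropped, f)
		delete(t.fds, fd) // deleting during range is safe in Go
	}
	t.mu.Unlock()

	for _, f := range dropped {
		f.Close() // may block; the table lock is no longer held
	}
}

// probeFile records whether the table lock was free when Close ran.
type probeFile struct {
	t        *fdTable
	lockFree *bool
}

func (p probeFile) Close() {
	if p.t.mu.TryLock() {
		p.t.mu.Unlock()
		*p.lockFree = true
	}
}

func main() {
	tbl := &fdTable{fds: map[int]closer{}}
	free := false
	tbl.fds[3] = probeFile{t: tbl, lockFree: &free}
	tbl.removeAll()
	fmt.Println(free, len(tbl.fds)) // true 0
}
```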
2020-09-15	Read vfs2 epoll events atomically.	Jamie Liu
	Discovered by ayushranjan@: VFS2 was employing the following algorithm for fetching ready events from an epoll instance: - Create a statically sized EpollEvent slice of size 16 on the stack. - Pass it to EpollInstance.ReadEvents() to populate. - EpollInstance.ReadEvents() requeues the level-triggered events it returns back into the ready queue. - Write the results to usermem. - If the number of results was 16, call EpollInstance.ReadEvents() again in the hope of getting more. But this duplicates the requeued level-triggered events, which are still ready. So if the ready queue has >= 16 ready events, the EpollWait for loop will spin until it fills the usermem with `maxEvents` events. Fixes #3521 PiperOrigin-RevId: 331840527
2020-09-15	Merge pull request #3895 from btw616:fix/issue-3894	gVisor bot
	PiperOrigin-RevId: 331824411
2020-09-15	Fix proc.(*fdDir).IterDirents for VFS2	Tiwei Bie
	Currently the returned offset is an index, and we can't use it to find the next fd to serialize, because getdents should iterate correctly despite mutation of fds. Instead, we can return the next fd to serialize plus 2 (which accounts for "." and "..") as the offset. Fixes: #3894 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-14	Add note about gofer link(2) limitation	Fabricio Voznika
	PiperOrigin-RevId: 331648296
2020-09-14	Correct FDSize in /proc/[pid]/status.	Jamie Liu
	In Linux, FDSize is reported by fs/proc/array.c:task_state() as struct fdtable::max_fds, which fs/file.c:alloc_fdtable() sets to the underlying array's length. Follow-up changes: - Remove FDTable.GetRefs() and FDTable.GetRefsVFS2(), which are unused. - Reset FDTable.used to 0 during restore, since the subsequent calls to FDTable.setAll() increment it again, causing its value to be doubled. (After this CL, FDTable.used is only used to avoid reallocation in FDTable.GetFDs(), so this fix is not very visible.) PiperOrigin-RevId: 331588190
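The distinction above is between the capacity of the fd array and the number of open fds. This Go sketch illustrates it with a hypothetical array-backed table; the initial size of 64 and the doubling growth policy are assumptions for illustration, not Linux's or gVisor's exact allocation rules.

```go
package main

import "fmt"

// fdTable is a hypothetical array-backed table; a nil slot is free.
type fdTable struct {
	slots []*string
}

// newFDTable starts with 64 slots (an assumed initial size).
func newFDTable() *fdTable { return &fdTable{slots: make([]*string, 64)} }

// set installs name at fd, doubling the backing array as needed
// (an assumed growth policy).
func (t *fdTable) set(fd int, name string) {
	for fd >= len(t.slots) {
		grown := make([]*string, 2*len(t.slots))
		copy(grown, t.slots)
		t.slots = grown
	}
	t.slots[fd] = &name
}

// fdSize mirrors FDSize: the length of the underlying array.
func (t *fdTable) fdSize() int { return len(t.slots) }

// openCount is the number of fds in use, which generally differs from FDSize.
func (t *fdTable) openCount() int {
	n := 0
	for _, s := range t.slots {
		if s != nil {
			n++
		}
	}
	return n
}

func main() {
	tbl := newFDTable()
	tbl.set(0, "stdin")
	tbl.set(100, "log")
	fmt.Println(tbl.fdSize(), tbl.openCount()) // 128 2
}
```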
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-11	Implement copy-up-coherent mmap for VFS2 overlayfs.	Jamie Liu
	This is very similar to copy-up-coherent mmap in the VFS1 overlay, with the minor wrinkle that there is no fs.InodeOperations.Mappable(). Updates #1199 PiperOrigin-RevId: 331206314
2020-09-11	Fix host unix socket to not swallow EOF incorrectly.	Bhasker Hariharan
	Fixes a bug where, for a host-backed unix dgram socket with a receive buffer larger than the host send buffer size, we would swallow the EOF returned by the recvmsg syscall, causing read() to block forever. PiperOrigin-RevId: 331192810
2020-09-10	arm64: place an SB sequence following an ERET instruction	Bin Lu
	Some CPUs (e.g. ampere-emag) can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-09	Unlock VFS.mountMu before FilesystemImpl calls for /proc/[pid]/{mounts,mountinfo}.	Jamie Liu
	Also move VFS.MakeSyntheticMountpoint() (which is a utility wrapper around VFS.MkdirAllAt(), itself a utility wrapper around VFS.MkdirAt()) so that it is not in the middle of the implementation of these proc files. Fixes #3878 PiperOrigin-RevId: 330843106
2020-09-09	Don't write VFS2 gofer client timestamps back on dentry destruction.	Jamie Liu
	This feature is too expensive for runsc, even with setattrclunk, because fsgofer.localFile.SetAttr() ends up needing to call reopenProcFD(), incurring two string allocations for the FD pathname, an fd.FD allocation, and two calls to runtime.SetFinalizer() when the fd.FD is created and closed respectively (b/133767962) (plus the actual cost of the syscalls, which is negligible). PiperOrigin-RevId: 330843012
2020-09-09	Don't sched_setaffinity in ptrace platform.	Jamie Liu
	PiperOrigin-RevId: 330777900
2020-09-08	Implement synthetic mountpoints for kernfs.	Jamie Liu
	PiperOrigin-RevId: 330629897
2020-09-08	[vfs] overlayfs: Fix socket tests.	Ayush Ranjan
	- The BindSocketThenOpen test was expecting the wrong error when opening a socket. Fixed that. - VirtualFilesystem.BindEndpointAt should not require pop.Path.Begin.Ok() because the filesystem implementations do not need to walk to the parent dentry. This check also exists for MknodAt, MkdirAt, RmdirAt, SymlinkAt and UnlinkAt, but those filesystem implementations do need to walk to the parent dentry, so the check is valid there. Added some syscall tests to test this. PiperOrigin-RevId: 330625220
2020-09-08	Add check for both child and childMerkle ENOENT	gVisor bot
	The check in the verity walk returns an error for non-ENOENT cases, and all ENOENT results should be checked. The case where both child and childMerkle return ENOENT was missing. PiperOrigin-RevId: 330604771
2020-09-08	Implement ioctl with enable verity	gVisor bot
	ioctl with FS_IOC_ENABLE_VERITY is added to the verity file system to enable verity on a file. For a file, a Merkle tree is built from its data. For a directory, a Merkle tree is built from the root hashes of its children. PiperOrigin-RevId: 330604368
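The two cases above (tree over a file's data, tree over the children's root hashes for a directory) can be sketched in Go. The block size, SHA-256, pairing of an odd last node with itself, and child ordering are all assumptions made for this example; it is not gVisor's on-disk verity Merkle format, and fileRoot, dirRoot and merkleRoot are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockSize is an assumed data block size for this sketch.
const blockSize = 4096

// merkleRoot hashes each leaf, then repeatedly hashes adjacent pairs until a
// single root remains. An odd last node is paired with itself.
func merkleRoot(leaves [][]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(leaves))
	for i, l := range leaves {
		level[i] = sha256.Sum256(l)
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			j := i + 1
			if j == len(level) {
				j = i
			}
			var pair [64]byte
			copy(pair[:32], level[i][:])
			copy(pair[32:], level[j][:])
			next = append(next, sha256.Sum256(pair[:]))
		}
		level = next
	}
	return level[0]
}

// fileRoot builds the tree over a file's data blocks.
func fileRoot(data []byte) [32]byte {
	var blocks [][]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		blocks = append(blocks, data[off:end])
	}
	return merkleRoot(blocks)
}

// dirRoot builds the tree over the children's root hashes.
func dirRoot(childRoots [][32]byte) [32]byte {
	leaves := make([][]byte, len(childRoots))
	for i := range childRoots {
		leaves[i] = childRoots[i][:]
	}
	return merkleRoot(leaves)
}

func main() {
	a := fileRoot([]byte("hello"))
	b := fileRoot(make([]byte, 10000)) // spans three blocks
	fmt.Printf("%x\n", dirRoot([][32]byte{a, b}))
}
```

Any change to a child's data changes that child's root hash and therefore the directory's root, which is what lets a single root vouch for the whole subtree.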
2020-09-08	[vfs] overlayfs: decref VD when not using it.	Ayush Ranjan
	overlay/filesystem.go:lookupLocked() did not DecRef the VD on some error paths when it would not end up saving or using the VD. PiperOrigin-RevId: 330589742
2020-09-08	Honor readonly flag for root mount	Fabricio Voznika
	Updates #1487 PiperOrigin-RevId: 330580699
2020-09-08	Merge pull request #3856 from btw616:fix/issue-3855	gVisor bot
	PiperOrigin-RevId: 330565414
2020-09-08	Improve type safety for transport protocol options	Ghanan Gowripalan
	The existing implementation of TransportProtocol.{Set}Option takes arguments of an empty interface type, which all types (implicitly) implement, so any type may be passed to the functions. This change introduces marker interfaces for transport protocol options that may be set or queried; transport protocol option types implement them so that invalid types are caught at compile time. Separate interfaces are used so that the compiler can enforce read-only or set-only socket options. RELNOTES: n/a PiperOrigin-RevId: 330559811
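The marker-interface technique can be shown in a few lines of Go: an option type is accepted by SetOption or Option only if it implements the corresponding unexported marker method, so passing an arbitrary value no longer compiles. The interface, option and protocol names here are hypothetical and are not the actual tcpip definitions.

```go
package main

import "fmt"

// Hypothetical marker interfaces for settable and gettable options.
type SettableTransportProtocolOption interface {
	isSettableTransportProtocolOption()
}

type GettableTransportProtocolOption interface {
	isGettableTransportProtocolOption()
}

// delayOption is both settable and gettable.
type delayOption bool

func (*delayOption) isSettableTransportProtocolOption() {}
func (*delayOption) isGettableTransportProtocolOption() {}

// statsOption is read-only: it implements only the gettable marker.
type statsOption struct{ segmentsSent uint64 }

func (*statsOption) isGettableTransportProtocolOption() {}

type protocol struct {
	delay bool
	sent  uint64
}

func (p *protocol) SetOption(opt SettableTransportProtocolOption) error {
	switch v := opt.(type) {
	case *delayOption:
		p.delay = bool(*v)
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}

func (p *protocol) Option(opt GettableTransportProtocolOption) error {
	switch v := opt.(type) {
	case *delayOption:
		*v = delayOption(p.delay)
	case *statsOption:
		v.segmentsSent = p.sent
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
	return nil
}

func main() {
	p := &protocol{sent: 7}
	on := delayOption(true)
	_ = p.SetOption(&on)
	var st statsOption
	_ = p.Option(&st)
	fmt.Println(p.delay, st.segmentsSent) // true 7
	// p.SetOption(&st) would not compile: *statsOption is not settable.
	// p.SetOption(42) would not compile either.
}
```

The marker methods carry no behavior; they exist only so the type checker, rather than a runtime type switch, rejects options used in the wrong direction.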
2020-09-08	[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.	Ayush Ranjan
	PiperOrigin-RevId: 330554450
2020-09-08	Fix the use after nil check on args.MountNamespaceVFS2	Tiwei Bie
	args.MountNamespaceVFS2 is used again after the nil check; instead, mntnsVFS2, which holds the expected reference, should be used. This patch fixes that. Fixes: #3855 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-04	Simplify FD handling for container start/exec	Fabricio Voznika
	VFS1 and VFS2 host FDs have different dup'ing behavior, making it error-prone to write code for both. Change the contract so that FDs are released as they are used, so the caller can simply defer a block that closes all remaining files. This also addresses handling of partial failures. With this fix, more VFS2 tests can be enabled. Updates #1487 PiperOrigin-RevId: 330112266
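The contract above ("released as they are used", with the caller deferring one cleanup block) can be sketched in Go. The file type, take, closeRemaining, startContainer and run are hypothetical; the point is only that the callee clears each slot it takes ownership of, so the caller's deferred block closes exactly the files nobody consumed, including after a partial failure.

```go
package main

import (
	"errors"
	"fmt"
)

// file is a stand-in for a donated host FD.
type file struct {
	name   string
	closed bool
}

func (f *file) Close() { f.closed = true }

// take transfers ownership of files[i] to the callee and clears the slot.
func take(files []*file, i int) *file {
	f := files[i]
	files[i] = nil
	return f
}

// closeRemaining is the caller's deferred block: it closes unconsumed files.
func closeRemaining(files []*file) {
	for _, f := range files {
		if f != nil {
			f.Close()
		}
	}
}

// startContainer consumes stdin and then, if fail is set, returns before
// touching the other fds, simulating a partial failure. Files it took are
// its responsibility from then on.
func startContainer(files []*file, fail bool) ([]*file, error) {
	used := []*file{take(files, 0)}
	if fail {
		return used, errors.New("start failed after consuming stdin")
	}
	for i := 1; i < len(files); i++ {
		used = append(used, take(files, i))
	}
	return used, nil
}

func run(files []*file, fail bool) error {
	defer closeRemaining(files)
	_, err := startContainer(files, fail)
	return err
}

func main() {
	files := []*file{{name: "stdin"}, {name: "stdout"}, {name: "stderr"}}
	all := append([]*file(nil), files...) // keep pointers for inspection
	fmt.Println(run(files, true))
	for _, f := range all {
		fmt.Println(f.name, f.closed)
	}
}
```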
2020-09-03	Adjust input file offset when sendfile only completes a partial write.	Dean Deng
	Fixes #3779. PiperOrigin-RevId: 330057268
2020-09-02	Fix Accept to not return error for sockets in accept queue.	Bhasker Hariharan
	Accept on gVisor returns an error if a socket in the accept queue was closed before Accept() was called. Linux returns the new fd even if the returned socket has already been closed by the peer, say due to a RST sent by the peer. This seems to be intentional in Linux; more details are on the GitHub issue. Fixes #3780 PiperOrigin-RevId: 329828404
2020-09-02	[vfs] Implement xattr for overlayfs.	Ayush Ranjan
	PiperOrigin-RevId: 329825497
2020-09-02	[vfs] Fix error handling in overlayfs OpenAt.	Ayush Ranjan
	Updates #1199 PiperOrigin-RevId: 329802274
2020-09-01	Implement setattr+clunk in 9P	Fabricio Voznika
	This covers the common pattern open->read/write->close, where SetAttr needs to be called to update atime/mtime before the file is closed. Benchmark results (BM_OpenReadClose/10240, CPU time): setattr+clunk 63783 ns; VFS2 68109 ns; VFS1 72507 ns. Updates #1198 PiperOrigin-RevId: 329628461
2020-09-01	Refactor tty codebase to use master-replica terminology.	Ayush Ranjan
	Updates #2972 PiperOrigin-RevId: 329584905
2020-09-01	Fix panic when calling dup2().	Nayana Bidari
	PiperOrigin-RevId: 329572337
2020-09-01	[go-marshal] Enable auto-marshalling for fs/tty.	Ayush Ranjan
	PiperOrigin-RevId: 329564614
2020-09-01	Automated rollback of changelist 328350576	Nayana Bidari
	PiperOrigin-RevId: 329526153
2020-08-31	Don't use read-only host FD for writable gofer dentries in VFS2.	Jamie Liu
	As documented for gofer.dentry.hostFD. PiperOrigin-RevId: 329372319