gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-09-16	fuse: use safe go_marshal API for FUSE	Ridwan Sharif
	Until #3698 is resolved, this change is needed to ensure we're not corrupting memory anywhere.
2020-09-16	fuse: Implement IterDirents for directory file description	Ridwan Sharif
	Fixes #3255. This change adds support for IterDirents. You can now use `ls` in the FUSE sandbox. Co-authored-by: Craig Chi <craigchi@google.com>
2020-09-16	Implement FUSE_RMDIR	Ridwan Sharif
	Fixes #3587 Co-authored-by: Craig Chi <craigchi@google.com>
2020-09-16	Implement FUSE_READ	Jinmou Li
	Fixes #3206
2020-09-16	Implement FUSE_MKDIR	Boyuan He
	Fixes #3392
2020-09-16	Implement FUSE_READLINK	Boyuan He
	Fixes #3316
2020-09-16	Implement FUSE_SYMLINK	Boyuan He
	Fixes #3452
2020-09-16	Implement FUSE_MKNOD	Boyuan He
	Fixes #3492
2020-09-16	Implement FUSE_RELEASE/RELEASEDIR	Boyuan He
	Fixes #3314
2020-09-16	Implement FUSE_OPEN/OPENDIR	Boyuan He
	Fixes #3174
2020-09-16	Implement FUSE_LOOKUP	Andrei Vagin
	Fixes #3231 Co-authored-by: Boyuan He <heboyuan@google.com>
2020-09-16	Add function to create a fake inode in FUSE integration test	Craig Chi
	Adds a function for the testing thread to set up a fake inode with a specific path under mount point. After this function is called, each subsequent FUSE_LOOKUP request with the same path will be served with the fixed stub response. Fixes #3539
2020-09-16	Add function generating array of iovec with different FUSE structs	Craig Chi
	This commit adds a function in the newly created fuse_util library, which accepts a variable number of arguments and data structures. Fixes #3609
2020-09-16	Add functions in FUSE integration test to get metrics from FUSE server	Craig Chi
	This commit adds 3 utility functions to ensure all received requests and preset responses are consumed:
	1. Get number of unconsumed requests (received by the FUSE server but not consumed by the testing thread).
	2. Get number of unsent responses (set by the testing thread but not processed by the FUSE server).
	3. Get total bytes of the received requests (to ensure some operations don't trigger FUSE requests).
	Fixes #3607
2020-09-16	Extend integration test to test sequence of FUSE operation	Craig Chi
	Original FUSE integration test has limited capabilities. To test more situations, the new integration test framework introduces a protocol to communicate between testing thread and the FUSE server. In summary, this change includes:
	1. Remove CompareResult() and break SetExpected() into SetServerResponse() and GetServerActualRequest(). We no longer set up an expected request because we want to retrieve the actual FUSE request made to the FUSE server and check in the testing thread.
	2. Declare a serial buffer data structure to save the received requests and expected responses sequentially. The data structure contains a cursor to indicate the progress of accessing. This change makes sequential SetServerResponse() and GetServerActualRequest() possible.
	3. Replace 2 single directional pipes with 1 bi-directional socketpair. A protocol which starts with FuseTestCmd is used between the testing thread and the FUSE server to provide various functionality.
	Fixes #3405
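	The serial buffer with a cursor described in item 2 can be pictured with a minimal sketch. The real test framework is not shown in this log, so the Go type and method names below (`serialBuffer`, `add`, `next`, `remaining`) are hypothetical and only illustrate the idea of appending entries sequentially and consuming them in order.

```go
package main

import "fmt"

// serialBuffer is a hypothetical stand-in for the data structure the commit
// describes: entries are stored in arrival order and a cursor tracks how many
// have been consumed so far.
type serialBuffer struct {
	items  [][]byte
	cursor int
}

// add appends a copy of b so later changes by the caller do not affect it.
func (s *serialBuffer) add(b []byte) {
	s.items = append(s.items, append([]byte(nil), b...))
}

// next returns the entry at the cursor and advances it, or false when every
// entry has already been consumed.
func (s *serialBuffer) next() ([]byte, bool) {
	if s.cursor >= len(s.items) {
		return nil, false
	}
	b := s.items[s.cursor]
	s.cursor++
	return b, true
}

// remaining reports how many entries have been stored but not yet consumed.
func (s *serialBuffer) remaining() int {
	return len(s.items) - s.cursor
}

func main() {
	var requests serialBuffer
	requests.add([]byte("FUSE_LOOKUP"))
	requests.add([]byte("FUSE_OPEN"))

	first, _ := requests.next()
	fmt.Println(string(first), "consumed,", requests.remaining(), "left")
}
```

	A counter like `remaining` is also the kind of value a test could check at the end to confirm that no request or response was left unconsumed.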
2020-09-16	Refactor removed default test dimension	Fabricio Voznika
	ptrace was always selected as a dimension before, but not anymore. Some tests were specifying "overlay" expecting that to be in addition to the default. PiperOrigin-RevId: 332004111
2020-09-16	Rename marshal.Task to marshal.CopyContext.	Rahat Mahmood
	CopyContext is a better name for the interface because from go-marshal's perspective, the interface has nothing to do with a task. A kernel.Task happens to implement the interface, but so can other things like MemoryManager and IO sequences. PiperOrigin-RevId: 331959678
2020-09-15	Enable automated marshalling for the syscall package.	Rahat Mahmood
	PiperOrigin-RevId: 331940975
2020-09-15	Add support for OCI seccomp filters in the sandbox.	Ian Lewis
	OCI configuration includes support for specifying seccomp filters. In runc, these filter configurations are converted into seccomp BPF programs and loaded into the kernel via libseccomp. runsc needs to be a static binary so, for runsc, we cannot rely on a C library and need to implement the functionality in Go. The generator added here implements basic support for taking OCI seccomp configuration and converting it into a seccomp BPF program with the same behavior as a program generated by libseccomp.
	- New conditional operations were added to pkg/seccomp to support operations available in OCI.
	- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect that syscalls matching the conditionals result in the provided action not simply SCMP_RET_ALLOW.
	- BuildProgram in pkg/seccomp no longer panics if provided an empty list of rules. It now builds a program with the architecture sanity check only.
	- ProgramBuilder now allows adding labels that are unused. However, backwards jumps are still not permitted.
	Fixes #510 PiperOrigin-RevId: 331938697
2020-09-15	Fix GitHub issue template.	Ian Lewis
	runsc -v doesn't work. It should be runsc -version PiperOrigin-RevId: 331911035
2020-09-15	Implement gvisor verity fs ioctl with GETFLAGS	Chong Cai
	PiperOrigin-RevId: 331905347
2020-09-15	Improve syserror_test.	Jamie Liu
	- It's very difficult to prevent returnErrnoAsError and returnError from being optimized out. Instead, replace BenchmarkReturn* with BenchmarkAssign, which store to globalError.
	- Compare to a non-nil globalError in BenchmarkCompare and BenchmarkSwitch*.
	New results:
	BenchmarkAssignErrno
	BenchmarkAssignErrno-12     1000000000    0.615 ns/op
	BenchmarkAssignError
	BenchmarkAssignError-12     1000000000    0.626 ns/op
	BenchmarkCompareErrno
	BenchmarkCompareErrno-12    1000000000    0.522 ns/op
	BenchmarkCompareError
	BenchmarkCompareError-12    1000000000    3.54 ns/op
	BenchmarkSwitchErrno
	BenchmarkSwitchErrno-12     1000000000    1.45 ns/op
	BenchmarkSwitchError
	BenchmarkSwitchError-12     536315757     10.9 ns/op
	PiperOrigin-RevId: 331875387
2020-09-15	Invert dependency between the context and amutex packages.	Jamie Liu
	This is to allow the syserror package to depend on the context package in a future change. PiperOrigin-RevId: 331866252
2020-09-15	Support setting STATX_SIZE for kernfs.InodeAttrs.	Dean Deng
	Make setting STATX_SIZE a no-op, if it is valid for the given permissions and file type. Also update proc tests, which were overfitted before. Fixes #3842. Updates #1193. PiperOrigin-RevId: 331861087
2020-09-15	Move reusable IPv4 test code into a testutil module and refactor it	Arthur Sfez
	The refactor aims to simplify the package, by replacing the Go channel with a PacketBuffer slice. This code will be reused by tests for IPv6 fragmentation. PiperOrigin-RevId: 331860411
2020-09-15	Release FDTable lock before dropping the fds.	Nayana Bidari
	This is needed for SO_LINGER, where close() blocks for the linger timeout. Holding the FDTable lock for the entire timeout prevents other fds from being created or deleted, so we have to release the lock first and then drop the fds. PiperOrigin-RevId: 331844185
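	The locking pattern described above can be sketched as follows. This is not the gVisor FDTable code; `fdTable`, `file`, `removeAll` and `install` are hypothetical, simplified names. The point is only that entries are detached while the lock is held and released after it is dropped, so a release that blocks does not stall other table operations.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// file is a hypothetical file description whose release may block, for
// example a socket that lingers on close.
type file struct {
	name   string
	linger time.Duration
}

// release simulates a close that blocks for the linger timeout.
func (f *file) release() { time.Sleep(f.linger) }

// fdTable is a hypothetical, simplified descriptor table.
type fdTable struct {
	mu    sync.Mutex
	files map[int]*file
}

// removeAll detaches every fd while holding the lock, then releases the
// detached files only after the lock has been dropped.
func (t *fdTable) removeAll() int {
	t.mu.Lock()
	dropped := make([]*file, 0, len(t.files))
	for fd, f := range t.files {
		dropped = append(dropped, f)
		delete(t.files, fd)
	}
	t.mu.Unlock()

	for _, f := range dropped {
		f.release() // may block; the table lock is no longer held
	}
	return len(dropped)
}

// install adds a new fd; it only needs the lock briefly.
func (t *fdTable) install(fd int, f *file) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.files[fd] = f
}

func main() {
	tbl := &fdTable{files: map[int]*file{
		3: {name: "lingering socket", linger: 10 * time.Millisecond},
	}}
	fmt.Println("dropped:", tbl.removeAll())
}
```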
2020-09-15	Read vfs2 epoll events atomically.	Jamie Liu
	Discovered by ayushranjan@: VFS2 was employing the following algorithm for fetching ready events from an epoll instance:
	- Create a statically sized EpollEvent slice on the stack of size 16.
	- Pass that to EpollInstance.ReadEvents() to populate.
	- EpollInstance.ReadEvents() requeues level-triggered events that it returns back into the ready queue.
	- Write the results to usermem.
	- If the number of results was 16, call EpollInstance.ReadEvents() again in the hopes of getting more. But this will cause duplication of the "requeued" ready level-triggered events.
	So if the ready queue has >= 16 ready events, the EpollWait for loop will spin until it fills the usermem with `maxEvents` events. Fixes #3521 PiperOrigin-RevId: 331840527
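	The duplication described above can be reproduced with a small model. The types and functions below (`epollInstance`, `readEvents`, `chunkedWait`, `atomicWait`) are hypothetical simplifications, not the VFS2 code, and reading everything in a single call is an assumption about what "atomically" means here, based only on the commit title.

```go
package main

import "fmt"

type event struct {
	fd             int
	levelTriggered bool
}

// epollInstance is a hypothetical model holding only a ready queue.
type epollInstance struct {
	ready []event
}

// readEvents moves up to len(dst) ready events into dst and requeues the
// level-triggered ones, since they stay ready until the condition clears.
func (e *epollInstance) readEvents(dst []event) int {
	n := len(e.ready)
	if n > len(dst) {
		n = len(dst)
	}
	batch := append([]event(nil), e.ready[:n]...)
	e.ready = e.ready[n:]
	copy(dst, batch)
	for _, ev := range batch {
		if ev.levelTriggered {
			e.ready = append(e.ready, ev)
		}
	}
	return n
}

// chunkedWait models the old algorithm: read through a 16-entry buffer and
// call readEvents again whenever the buffer came back full.
func chunkedWait(e *epollInstance, maxEvents int) []event {
	var out []event
	var buf [16]event
	for len(out) < maxEvents {
		want := maxEvents - len(out)
		if want > len(buf) {
			want = len(buf)
		}
		n := e.readEvents(buf[:want])
		out = append(out, buf[:n]...)
		if n < len(buf) {
			break
		}
	}
	return out
}

// atomicWait reads at most maxEvents events in one call, so an event that is
// requeued cannot be returned twice by the same wait.
func atomicWait(e *epollInstance, maxEvents int) []event {
	out := make([]event, maxEvents)
	return out[:e.readEvents(out)]
}

// newInstance builds an instance with n ready level-triggered events.
func newInstance(n int) *epollInstance {
	e := &epollInstance{}
	for fd := 0; fd < n; fd++ {
		e.ready = append(e.ready, event{fd: fd, levelTriggered: true})
	}
	return e
}

func main() {
	fmt.Println("chunked:", len(chunkedWait(newInstance(16), 64))) // 64, with duplicates
	fmt.Println("atomic: ", len(atomicWait(newInstance(16), 64)))  // 16
}
```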
2020-09-15	RFC: design for a 9P replacement	Jamie Liu
	Tentatively `lisafs` (LInux SAndbox FileSystem). PiperOrigin-RevId: 331839246
2020-09-15	Merge pull request #3895 from btw616:fix/issue-3894	gVisor bot
	PiperOrigin-RevId: 331824411
2020-09-15	Don't conclude broadcast from route destination	Ghanan Gowripalan
	The routing table (in its current form) should not be used to make decisions about whether a remote address is a broadcast address or not (for IPv4). Note, a destination subnet does not always map to a network. E.g. RouterA may have a route to 192.168.0.0/22 through RouterB, but RouterB may be configured with 4x /24 subnets on 4 different interfaces. See https://github.com/google/gvisor/issues/3938. PiperOrigin-RevId: 331819868
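	The RouterA/RouterB example can be checked with a short calculation. The helper `subnetBroadcast` below is a hypothetical name written for this sketch; it uses only the standard library `net` package. The broadcast address implied by the /22 route differs from the broadcast address of the /24 networks that actually exist behind RouterB.

```go
package main

import (
	"fmt"
	"net"
)

// subnetBroadcast returns the IPv4 broadcast address of a CIDR prefix by
// setting every host bit of the network address to 1.
func subnetBroadcast(cidr string) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ip := ipnet.IP.To4()
	if ip == nil || len(ipnet.Mask) != net.IPv4len {
		return nil, fmt.Errorf("%s is not an IPv4 prefix", cidr)
	}
	out := make(net.IP, net.IPv4len)
	for i := range ip {
		out[i] = ip[i] | ^ipnet.Mask[i]
	}
	return out, nil
}

func main() {
	// The route RouterA holds, and one of the networks RouterB really has.
	for _, cidr := range []string{"192.168.0.0/22", "192.168.0.0/24"} {
		bc, err := subnetBroadcast(cidr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cidr, "broadcast:", bc)
	}
}
```

	Here 192.168.0.255 is a broadcast address on one of RouterB's /24 networks but looks like an ordinary host address inside the /22 route, which is why the route destination alone cannot settle the question.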
2020-09-15	Fix proc.(*fdDir).IterDirents for VFS2	Tiwei Bie
	Currently the returned offset is an index, and we can't use it to find the next fd to serialize, because getdents should iterate correctly despite mutation of fds. Instead, we can return the next fd to serialize plus 2 (which accounts for "." and "..") as the offset. Fixes: #3894 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
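	The offset scheme described above can be sketched in a few lines. `iterFDs` and its parameters are hypothetical names, not the proc filesystem code; the sketch assumes offsets 0 and 1 stand for "." and "..", and that any larger offset means "next fd to serialize, plus 2".

```go
package main

import (
	"fmt"
	"sort"
)

// iterFDs emits up to max directory entries starting at offset and returns
// the entry names together with the offset to resume from. Because the offset
// encodes an fd number rather than an index, closing or opening fds between
// calls does not shift which entry comes next.
func iterFDs(fds map[int]bool, offset, max int) ([]string, int) {
	var names []string
	if offset == 0 && len(names) < max {
		names = append(names, ".")
		offset = 1
	}
	if offset == 1 && len(names) < max {
		names = append(names, "..")
		offset = 2
	}
	sorted := make([]int, 0, len(fds))
	for fd := range fds {
		sorted = append(sorted, fd)
	}
	sort.Ints(sorted)
	for _, fd := range sorted {
		if len(names) >= max {
			break
		}
		if fd < offset-2 {
			continue // already serialized by an earlier call
		}
		names = append(names, fmt.Sprint(fd))
		offset = fd + 1 + 2 // next fd to serialize, plus 2 for "." and ".."
	}
	return names, offset
}

func main() {
	fds := map[int]bool{0: true, 1: true, 2: true, 5: true}
	names, next := iterFDs(fds, 0, 4)
	fmt.Println(names, next)

	// Mutate the table between calls: close fd 0 and open fd 3.
	delete(fds, 0)
	fds[3] = true
	names, next = iterFDs(fds, next, 4)
	fmt.Println(names, next)
}
```

	With an index-based offset, the second call in `main` would resume at index 4 of the mutated listing and skip fd 2; with the fd-based offset it resumes at fd 2.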
2020-09-14	Add note about gofer link(2) limitation	Fabricio Voznika
	PiperOrigin-RevId: 331648296
2020-09-14	Store multicast memberships in a set	Tamir Duberstein
	This is simpler and more performant. PiperOrigin-RevId: 331639978
2020-09-14	Test RST handling in TIME_WAIT.	Mithun Iyer
	gVisor stack ignores RSTs when in TIME_WAIT which is not the default Linux behavior. Add a packetimpact test to test the same. Also update code comments to reflect the rationale for the current gVisor behavior. PiperOrigin-RevId: 331629879
2020-09-14	Correct FDSize in /proc/[pid]/status.	Jamie Liu
	In Linux, FDSize is fs/proc/array.c:task_state() => struct fdtable::max_fds, which is set to the underlying array's length in fs/file.c:alloc_fdtable(). Follow-up changes:
	- Remove FDTable.GetRefs() and FDTable.GetRefsVFS2(), which are unused.
	- Reset FDTable.used to 0 during restore, since the subsequent calls to FDTable.setAll() increment it again, causing its value to be doubled. (After this CL, FDTable.used is only used to avoid reallocation in FDTable.GetFDs(), so this fix is not very visible.)
	PiperOrigin-RevId: 331588190
2020-09-14	Fix modprobe dependency	Kevin Krakauer
	The modprobe command only takes 1 module per invocation. The second module name is being passed as a module parameter. PiperOrigin-RevId: 331585765
2020-09-12	Cap reassembled IPv6 packets at 65535 octets	Toshi Kikuchi
	IPv4 can accept 65536-octet reassembled packets. Test:
	- ipv4_test.TestInvalidFragments
	- ipv4_test.TestReceiveFragments
	- ipv6.TestInvalidIPv6Fragments
	- ipv6.TestReceiveIPv6Fragments
	Fixes #3770 PiperOrigin-RevId: 331382977
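	The IPv6 cap named in the commit subject follows from the 16-bit Payload Length field, which cannot describe more than 65535 octets. A minimal sketch of such a bounds check is given below; `fragmentFits` and `maxIPv6Payload` are hypothetical names, and the actual check in the netstack is not shown in this log.

```go
package main

import "fmt"

// maxIPv6Payload is the largest value the 16-bit IPv6 Payload Length field
// can hold.
const maxIPv6Payload = 65535

// fragmentFits reports whether a fragment that starts at byte offset `offset`
// and carries `length` bytes keeps the reassembled payload within the cap.
func fragmentFits(offset, length int) bool {
	return offset >= 0 && length >= 0 && offset+length <= maxIPv6Payload
}

func main() {
	// The fragment offset field counts 8-octet units in 13 bits, so the
	// largest possible byte offset is 8191 * 8 = 65528.
	const lastOffset = 8191 * 8
	fmt.Println(fragmentFits(lastOffset, 7)) // true: ends exactly at 65535
	fmt.Println(fragmentFits(lastOffset, 8)) // false: would reach 65536
}
```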
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-11	Check that we have access to the trusted.* xattr namespace directly.	Nicolas Lacasse
	These operations require CAP_SYS_ADMIN in the root user namespace. There's no easy way to check that other than trying the operation and seeing what happens. PiperOrigin-RevId: 331242256
2020-09-11	Use correct test device name in Fuchsia packetimpact	Amanda Tait
	Packetimpact on Fuchsia was formerly using the Linux test device name. This change fixes that. PiperOrigin-RevId: 331211518
2020-09-11	Make nogo more robust to variety of stdlib layouts.	Michael Pratt
	PiperOrigin-RevId: 331206424
2020-09-11	Implement copy-up-coherent mmap for VFS2 overlayfs.	Jamie Liu
	This is very similar to copy-up-coherent mmap in the VFS1 overlay, with the minor wrinkle that there is no fs.InodeOperations.Mappable(). Updates #1199 PiperOrigin-RevId: 331206314
2020-09-11	Fix host unix socket to not swallow EOF incorrectly.	Bhasker Hariharan
	Fixes an error where, for a host-backed unix dgram socket with a receive buffer larger than the host send buffer size, we would end up swallowing EOF from the recvmsg syscall, causing read() to block forever. PiperOrigin-RevId: 331192810
2020-09-11	Clean up image construction	Tamir Duberstein
	- Skip `docker inspect`; `docker pull` is idempotent
	- Remove unnecessary CMD directives in Dockerfiles
	- Run bazel before building images to catch errors sooner
	PiperOrigin-RevId: 331107815
2020-09-10	[vfs] Disable inode number equality check for overlayfs.	Ayush Ranjan
	Overlayfs does not persist a directory's inode number even while it is mounted. See fs/overlayfs/inode.c:ovl_map_dev_ino(). VFS2 generates a new inode number for directories every time in lookup. PiperOrigin-RevId: 331045037
2020-09-10	[vfs] Add vfs2 runtime tests.	Ayush Ranjan
	PiperOrigin-RevId: 330981912
2020-09-10	Merge pull request #3892 from lubinszARM:pr_n1_02	gVisor bot
	PiperOrigin-RevId: 330973856
2020-09-10	[vfs] Disable nlink tests for overlayfs.	Ayush Ranjan
	Overlayfs intentionally does not compute nlink for directories (because it can be really expensive). Linux returns 1, VFS2 returns 2 and VFS1 actually calculates the correct value. PiperOrigin-RevId: 330967139
2020-09-10	Fix typo, remove duplicate word.	gVisor bot
	PiperOrigin-RevId: 330898705
2020-09-10	arm64: place an SB sequence following an ERET instruction	Bin Lu
	Some CPUs (e.g. ampere-emag) can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. Signed-off-by: Bin Lu <bin.lu@arm.com>