gvisor - Container Runtime Sandbox

Age	Commit message	Author
2019-07-30	ext: Migrate from using fileReader custom interface to using io.Reader.	Ayush Ranjan
	This removes the need to hold the io.Reader offset state (which is held by the vfs.FileDescriptor struct anyway). It is also odd to use an io.Reader because we use io.ReaderAt to interact with the device, so making an io.ReaderAt wrapper makes more sense. Most importantly, it removes the complexity of extracting the file reader from a regular file implementation and then using it. Now we can just use the regular file implementation as a reader, which is more intuitive. PiperOrigin-RevId: 260846927
2019-07-30	ext: block map file reader implementation.	Ayush Ranjan
	Also adds stress tests for block map reader and intensifies extent reader tests. PiperOrigin-RevId: 260838177
2019-07-30	Remove unused const variables	Ian Lewis
	PiperOrigin-RevId: 260824989
2019-07-30	Pass ProtocolAddress instead of its fields	Tamir Duberstein
	PiperOrigin-RevId: 260803517
2019-07-30	Merge pull request #607 from DarcySail:master	gVisor bot
	PiperOrigin-RevId: 260783254
2019-07-30	Add feature to launch Sentry from an open host FD.	Zach Koopmans
	Adds feature to launch from an open host FD instead of a binary_path. The FD should point to a valid executable and most likely be statically compiled. If the executable is not statically compiled, the loader will search along the interpreter paths, which must be able to be resolved in the Sandbox's file system or start will fail. PiperOrigin-RevId: 260756825
2019-07-30	Change syscall.POLL to syscall.PPOLL.	Haibo Xu
	syscall.POLL is not supported on arm64; use syscall.PPOLL to support both x86 and arm64. refs #63 Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I2c81a063d3ec4e7e6b38fe62f17a0924977f505e COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/543 from xiaobo55x:master ba598263fd3748d1addd48e4194080aa12085164 PiperOrigin-RevId: 260752049
2019-07-29	Migrate from using io.ReadSeeker to io.ReaderAt.	Ayush Ranjan
	This provides the following benefits:
	- We can now use the pkg/fd package, which does not take ownership of the file descriptor, so it does not close the fd when garbage collected. This reduces the scope for errors from unexpected garbage collection of os.File.
	- It enforces the offset parameter in every read call. The read neither affects the fd offset nor is affected by it, which reduces the scope for errors from using stale offsets when reading.
	- We no longer need to serialize the usage of any global file descriptor. This drops the mutual exclusion requirement, reducing complexity and contention.
	PiperOrigin-RevId: 260635174
2019-07-30	Combine multiple epoll events copies	Hang Su
	Allocate a larger memory buffer and combine multiple copies into one copy, to reduce the number of copies from kernel memory to user memory. Signed-off-by: Hang Su <darcy.sh@antfin.com>
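A rough sketch of the batching idea follows, under stated assumptions: userMem and its copyOut method are hypothetical stand-ins for a kernel-to-user copy, and the 12-byte event layout is an illustrative encoding rather than the sentry's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// event is a simplified epoll event (illustrative layout, 12 bytes encoded).
type event struct {
	Events uint32
	Data   uint64
}

const eventSize = 12

// marshal writes the event into the first eventSize bytes of dst.
func (e event) marshal(dst []byte) {
	binary.LittleEndian.PutUint32(dst[0:4], e.Events)
	binary.LittleEndian.PutUint64(dst[4:12], e.Data)
}

// userMem is a hypothetical stand-in for user memory that counts copies.
type userMem struct {
	buf    []byte
	copies int
}

func (u *userMem) copyOut(off int, src []byte) {
	copy(u.buf[off:], src)
	u.copies++
}

// copyOneByOne performs one copy per event.
func copyOneByOne(u *userMem, evs []event) {
	var tmp [eventSize]byte
	for i, e := range evs {
		e.marshal(tmp[:])
		u.copyOut(i*eventSize, tmp[:])
	}
}

// copyBatched serializes all events into one larger buffer, then copies once.
func copyBatched(u *userMem, evs []event) {
	buf := make([]byte, len(evs)*eventSize)
	for i, e := range evs {
		e.marshal(buf[i*eventSize:])
	}
	u.copyOut(0, buf)
}

func main() {
	evs := []event{{1, 10}, {4, 11}, {1, 12}}
	a := &userMem{buf: make([]byte, len(evs)*eventSize)}
	b := &userMem{buf: make([]byte, len(evs)*eventSize)}
	copyOneByOne(a, evs)
	copyBatched(b, evs)
	fmt.Println("copies:", a.copies, "vs", b.copies)
}
```

Both paths leave identical bytes in the destination; only the number of copy operations differs.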
2019-07-29	ext: extent reader implementation.	Ayush Ranjan
	PiperOrigin-RevId: 260629559
2019-07-29	ext: inode implementations.	Ayush Ranjan
	PiperOrigin-RevId: 260624470
2019-07-29	Use x/sys/unix for sentry/host interaction; abi is for guest/sentry.	Christopher Koch
	PiperOrigin-RevId: 260613864
2019-07-29	Rate limit the unimplemented syscall event handler.	Nicolas Lacasse
	This introduces two new types of Emitters: 1. MultiEmitter, which will forward events to other registered Emitters, and 2. RateLimitedEmitter, which will forward events to a wrapped Emitter, subject to given rate limits. The methods in the eventchannel package itself act like a MultiEmitter, but the package is not actually an Emitter. Now we have a DefaultEmitter, and the methods in eventchannel simply forward calls to the DefaultEmitter. The unimplemented syscall handler now uses a RateLimitedEmitter that wraps the DefaultEmitter. PiperOrigin-RevId: 260612770
2019-07-29	Fix flaky stat.cc test.	Zach Koopmans
	This test flaked on my current CL. Linux makes no guarantee that two inodes will be consecutive (overflows happen). https://github.com/avagin/linux-task-diag/blob/master/fs/inode.c#L880 PiperOrigin-RevId: 260608240
2019-07-29	Move runtimes tests to appropriate directory.	Samantha Sample
	PiperOrigin-RevId: 260577765
2019-07-29	Add iptables types for syscalls tests.	Kevin Krakauer
	Unfortunately, Linux's ip_tables.h header doesn't compile in C++ because it implicitly converts from void* to struct xt_entry_target*. C allows this, but C++ does not. So we have to re-implement many types ourselves. Relevant code here: https://github.com/torvalds/linux/blob/master/include/uapi/linux/netfilter_ipv4/ip_tables.h#L222 PiperOrigin-RevId: 260565570
2019-07-26	runsc: propagate the alsologtostderr to sub-commands	Andrei Vagin
	PiperOrigin-RevId: 260239119
2019-07-26	Add debug symbols to published runsc binary	Fabricio Voznika
	This allows published binary to be debugged if needed. PiperOrigin-RevId: 260228367
2019-07-26	Merge pull request #452 from zhangningdlut:chris_test_pidns	gVisor bot
	PiperOrigin-RevId: 260220279
2019-07-26	Publish Dockerfiles and test-runner binaries for running language tests.	Samantha Sample
	By following the directions in the README file, these Dockerfiles can be built and used to run native language tests for their respective runtimes. PiperOrigin-RevId: 260174430
2019-07-25	Automated rollback of changelist 255679453	Fabricio Voznika
	PiperOrigin-RevId: 260047477
2019-07-24	ext: filesystem boilerplate code.	Ayush Ranjan
	PiperOrigin-RevId: 259865366
2019-07-24	ext: Add tests for root directory inode.	Ayush Ranjan
	PiperOrigin-RevId: 259856442
2019-07-24	ext: testing environment setup with VFS2 support.	Ayush Ranjan
	PiperOrigin-RevId: 259835948
2019-07-24	Add support for a subnet prefix length on interface network addresses	Chris Kuiper
	This allows the user code to add a network address with a subnet prefix length. The prefix length value is stored in the network endpoint and provided back to the user in the ProtocolAddress type. PiperOrigin-RevId: 259807693
2019-07-24	Use different pidns among different containers	chris.zn
	Previously, the different containers in a sandbox shared a single pid namespace. As a result, a container could see the processes of another container in the same sandbox. This patch uses a different pid namespace for each container. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23	ext: Inode creation logic.	Ayush Ranjan
	PiperOrigin-RevId: 259666476
2019-07-23	ext: Add ext2 and ext3 tiny images.	Ayush Ranjan
	PiperOrigin-RevId: 259657917
2019-07-23	ext: Added extent tree building logic.	Ayush Ranjan
	PiperOrigin-RevId: 259628657
2019-07-23	Give each container a distinct MountNamespace.	Nicolas Lacasse
	This keeps all container filesystems completely separate from each other (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-23	Make runAllTests() consistent with listTests().	Brett Landau
	This change has the listTests() function return a string slice of all the tests. Originally, I planned not to modify the listTests() function and instead capture the output of it and then iterate through the captured output. I decided against this approach as most of the test binaries already produce a slice as they collect tests through filepath.Walk(). Now I use this slice and return it so that I can iterate through in runAllTests() and also when printing out the tests. PiperOrigin-RevId: 259599782
2019-07-23	Deduplicate EndpointState.connected some	Tamir Duberstein
	This fixes a bug introduced in cl/251934850 that caused connect-accept-close-connect races to result in the second connect call failing when it should have succeeded. PiperOrigin-RevId: 259584525
2019-07-22	Fix up and add some iptables ABI.	Kevin Krakauer
	PiperOrigin-RevId: 259437060
2019-07-22	Merge pull request #571 from lubinszARM:pr_loader	gVisor bot
	PiperOrigin-RevId: 259427074
2019-07-22	kvm: fix race between machine.Put and machine.Get	Andrei Vagin
	m.available.Signal() has to be called under m.mu.RLock, otherwise it can race with machine.Get:
	m.Get                     | m.Put
	-------------------------------------------------
	m.mu.Lock()               |
	Searching available vcpu  |
	                          | m.available.Signal()
	m.available.Wait          |
	PiperOrigin-RevId: 259394051
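The lost-wakeup hazard in the interleaving above is a general property of condition variables: a Signal issued between the waiter's check and its Wait is missed unless the signaller holds the lock. A minimal sketch of the safe pattern follows; it uses a plain sync.Mutex and a hypothetical pool type rather than the KVM machine's actual locking.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified stand-in for a set of reusable vCPUs.
type pool struct {
	mu        sync.Mutex
	available *sync.Cond
	free      []int
}

func newPool(ids ...int) *pool {
	p := &pool{free: ids}
	p.available = sync.NewCond(&p.mu)
	return p
}

// Get blocks until an id is free and then takes it.
func (p *pool) Get() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.free) == 0 {
		// Wait releases mu while suspended and reacquires it on wakeup.
		p.available.Wait()
	}
	id := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return id
}

// Put returns an id and signals while holding mu, so the signal cannot fall
// into the gap between a waiter's emptiness check and its call to Wait.
func (p *pool) Put(id int) {
	p.mu.Lock()
	p.free = append(p.free, id)
	p.available.Signal()
	p.mu.Unlock()
}

func main() {
	p := newPool(0) // a single shared id maximizes contention
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				p.Put(p.Get())
			}
		}()
	}
	wg.Wait()
	fmt.Println("free ids:", len(p.free))
}
```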
2019-07-22	Prototype integration of runtime language tests for Node.js into gVisor.	Samantha Sample
	This is the first version of a testing program to be used by gVisor for including language testing in its presubmits. It works when run in the same manner as the image and integration tests, as described in their README file. PiperOrigin-RevId: 259392416
2019-07-22	Fix struct statx field alignment.	Jamie Liu
	PiperOrigin-RevId: 259376740
2019-07-21	Add ARM64 support to pkg/sentry/loader	Bin Lu
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-07-19	Create the initial binary for each of the 5 runtime's test-runner.	Brett Landau
	Repeated code is planned to be factored out to improve clarity and readability. PiperOrigin-RevId: 259059978
2019-07-19	Merge pull request #450 from Pixep:feature/add-clock-boottime-as-monotonic	gVisor bot
	PiperOrigin-RevId: 258996346
2019-07-19	Handle interfaceAddr and NIC options separately for IP_MULTICAST_IF	Chris Kuiper
	This tweaks the handling code for IP_MULTICAST_IF to ignore the InterfaceAddr if a NICID is given. PiperOrigin-RevId: 258982541
2019-07-18	net/tcp/setsockopt: implement setsockopt(fd, SOL_TCP, TCP_INQ)	Andrei Vagin
	PiperOrigin-RevId: 258859507
2019-07-18	Sentry virtual filesystem, v2	Jamie Liu
	Major differences from the current ("v1") sentry VFS:
	- Path resolution is Filesystem-driven (FilesystemImpl methods call vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a Dirent tree and calls fs.InodeOperations methods to populate it). This drastically improves performance, primarily by reducing overhead from inefficient synchronization and indirection. It also makes it possible to implement remote filesystem protocols that translate FS system calls into single RPCs, rather than having to make (at least) one RPC per path component, significantly reducing the latency of remote filesystems (especially during cold starts and for uncacheable shared filesystems).
	- Mounts are correctly represented as a separate check based on contextual state (current mount) rather than direct replacement in a fs.Dirent tree. This makes it possible to support (non-recursive) bind mounts and mount namespaces.
	Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem that exists primarily to demonstrate intended filesystem implementation patterns and for benchmarking:
	BenchmarkVFS1TmpfsStat/1-6            3000000      497 ns/op
	BenchmarkVFS1TmpfsStat/2-6            2000000      676 ns/op
	BenchmarkVFS1TmpfsStat/3-6            2000000      904 ns/op
	BenchmarkVFS1TmpfsStat/8-6            1000000     1944 ns/op
	BenchmarkVFS1TmpfsStat/64-6            100000    14067 ns/op
	BenchmarkVFS1TmpfsStat/100-6            50000    21700 ns/op
	BenchmarkVFS2MemfsStat/1-6           10000000      197 ns/op
	BenchmarkVFS2MemfsStat/2-6            5000000      233 ns/op
	BenchmarkVFS2MemfsStat/3-6            5000000      268 ns/op
	BenchmarkVFS2MemfsStat/8-6            3000000      477 ns/op
	BenchmarkVFS2MemfsStat/64-6            500000     2592 ns/op
	BenchmarkVFS2MemfsStat/100-6           300000     4045 ns/op
	BenchmarkVFS1TmpfsMountStat/1-6       2000000      679 ns/op
	BenchmarkVFS1TmpfsMountStat/2-6       2000000      912 ns/op
	BenchmarkVFS1TmpfsMountStat/3-6       1000000     1113 ns/op
	BenchmarkVFS1TmpfsMountStat/8-6       1000000     2118 ns/op
	BenchmarkVFS1TmpfsMountStat/64-6       100000    14251 ns/op
	BenchmarkVFS1TmpfsMountStat/100-6      100000    22397 ns/op
	BenchmarkVFS2MemfsMountStat/1-6       5000000      317 ns/op
	BenchmarkVFS2MemfsMountStat/2-6       5000000      361 ns/op
	BenchmarkVFS2MemfsMountStat/3-6       5000000      387 ns/op
	BenchmarkVFS2MemfsMountStat/8-6       3000000      582 ns/op
	BenchmarkVFS2MemfsMountStat/64-6       500000     2699 ns/op
	BenchmarkVFS2MemfsMountStat/100-6      300000     4133 ns/op
	From this we can infer that, on this machine:
	- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
	- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a difference of about 6x.
	- The cost of crossing a mount boundary is about 80ns in VFS2 (MemfsMountStat/1 does approximately the same amount of work as MemfsStat/2, except that it also crosses a mount boundary). This is an inescapable cost of the separate mount lookup needed to support bind mounts and mount namespaces.
	PiperOrigin-RevId: 258853946
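The inferences in the commit message can be approximately reproduced from the quoted ns/op figures. The sketch below is one plausible way to do the arithmetic (a two-point slope for per-component cost, the depth-1 result minus one component for constant cost); the commit does not say which samples its author used, so the results land near, not exactly on, the quoted numbers.

```go
package main

import "fmt"

// sample is one benchmark result: path depth and measured ns/op,
// taken from the commit message.
type sample struct {
	depth   int
	nsPerOp float64
}

// perComponent estimates the marginal cost of one path component as the
// slope between two samples.
func perComponent(a, b sample) float64 {
	return (b.nsPerOp - a.nsPerOp) / float64(b.depth-a.depth)
}

// constantCost estimates the depth-independent cost by removing the
// per-component share from a sample.
func constantCost(s sample, slope float64) float64 {
	return s.nsPerOp - slope*float64(s.depth)
}

// mountCrossing compares MountStat/1 with Stat/2, which do about the same
// work except for the mount boundary crossing.
func mountCrossing(mountStat1, stat2 sample) float64 {
	return mountStat1.nsPerOp - stat2.nsPerOp
}

func main() {
	vfs1Lo, vfs1Hi := sample{1, 497}, sample{100, 21700}
	vfs2Lo, vfs2Hi := sample{1, 197}, sample{100, 4045}

	s1 := perComponent(vfs1Lo, vfs1Hi)
	s2 := perComponent(vfs2Lo, vfs2Hi)
	fmt.Printf("per-component: VFS1 %.1f ns, VFS2 %.1f ns\n", s1, s2)
	fmt.Printf("constant:      VFS1 %.1f ns, VFS2 %.1f ns\n",
		constantCost(vfs1Lo, s1), constantCost(vfs2Lo, s2))
	fmt.Printf("mount crossing (VFS2): %.0f ns\n",
		mountCrossing(sample{1, 317}, sample{2, 233}))
}
```

The VFS2 per-component estimate is sensitive to the pair of samples chosen: depths 1 and 3 give 35.5 ns, while depths 1 and 100 give a somewhat higher value.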
2019-07-17	sys_time: Wrap comments to 80 columns	Adrien Leravat

2019-07-17	Take copyMu in Revalidate	Michael Pratt
	copyMu is required to read child.overlay.upper. PiperOrigin-RevId: 258662209
2019-07-17	Separate O_DSYNC and O_SYNC.	Jamie Liu
	PiperOrigin-RevId: 258657913
2019-07-17	ext: disklayout: extents support.	Ayush Ranjan
	PiperOrigin-RevId: 258657776
2019-07-17	Merge pull request #504 from matthyx:master	gVisor bot
	PiperOrigin-RevId: 258654826
2019-07-17	ext: Filesystem init implementation.	Ayush Ranjan
	PiperOrigin-RevId: 258645957
2019-07-17	Merge pull request #355 from zhuangel:master	gVisor bot
	PiperOrigin-RevId: 258643966