gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-10-27	Add basic address deletion to netlink	Ian Lewis
	Updates #3921 PiperOrigin-RevId: 339195417
2020-10-26	Implement command IPC_STAT for semctl.	Jing Chen
	PiperOrigin-RevId: 339166854
2020-10-26	Fix SCM Rights S/R reference leak.	Dean Deng
	Control messages collected when peeking into a socket were being leaked. PiperOrigin-RevId: 339114961
2020-10-24	Avoid excessive save/restore cycles in socket_ipv4_udp_unbound tests.	Jamie Liu
	PiperOrigin-RevId: 338805321
2020-10-23	Support VFS2 save/restore.	Jamie Liu
	Inode number consistency checks are now skipped in save/restore tests for reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename. They pass in VFS1 due to the bug described in new test case SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs. Fixes #1663 PiperOrigin-RevId: 338776148
2020-10-23	Fix socket_ipv4_udp_unbound_loopback_test_linux	Zach Koopmans
	Handle "Resource temporarily unavailable" EAGAIN errors with a select call before calling recvmsg. Also rename similar helper call from "RecvMsgTimeout" to "RecvTimeout", because it calls "recv". PiperOrigin-RevId: 338761695
2020-10-23	iptables testing: handle EINTR on calls to accept().	Kevin Krakauer
	This caused test flakes. PiperOrigin-RevId: 338758723
2020-10-23	[runtime tests] Exclude flaky tests.	Ayush Ranjan
	Also updated a test which only fails with VFS1. PiperOrigin-RevId: 338704940
2020-10-23	Support getsockopt for SO_ACCEPTCONN.	Nayana Bidari
	The SO_ACCEPTCONN option is used only on getsockopt(). When this option is specified, getsockopt() indicates whether socket listening is enabled for the socket. A value of zero indicates that socket listening is disabled; non-zero that it is enabled. PiperOrigin-RevId: 338703206
2020-10-23	Decrement e.synRcvdCount once handshake is complete.	Bhasker Hariharan
	Earlier the count was dropped only after calling e.deliverAccepted. This led to an issue where there were no connections in SYN-RCVD state for the listening endpoint, but e.synRcvdCount was not zero because it was reduced only when handleSynSegment returned, after deliverAccepted had returned. The issue shows up as an occasional drop of the Nth SYN for a listen backlog of size N, that is, the SYN that would fill the backlog. This happens when the new SYN arrives after the previously completed endpoint has been delivered to the accept queue but before synRcvdCount has been decremented, because the goroutine running handleSynSegment has not yet completed. PiperOrigin-RevId: 338690646
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order.
	With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued.
	Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker.
	Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks.
	The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection.
	This approach has several benefits over the former, including:
	- Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time.
	- Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added).
	- Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient.
	- Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected.
	Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object.
	Updates #1486. PiperOrigin-RevId: 338685972
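The map-based approach described in this commit can be sketched roughly as below. This is a minimal illustration, not gVisor's actual code: the `leakRegistry` type, its methods, and the `object` type are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// leakRegistry is a hypothetical stand-in for the global map of live
// reference-counted objects. Values hold extra logging information so
// the ref count struct itself can stay small.
type leakRegistry struct {
	mu   sync.Mutex
	live map[interface{}]string
}

func newLeakRegistry() *leakRegistry {
	return &leakRegistry{live: make(map[interface{}]string)}
}

// Register records obj as live. It is called once, when obj is created.
func (r *leakRegistry) Register(obj interface{}, info string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.live[obj] = info
}

// Unregister removes obj when it is destroyed. It panics if obj was never
// registered, which catches objects that skipped leak-check setup.
func (r *leakRegistry) Unregister(obj interface{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.live[obj]; !ok {
		panic("leak check: unregistering an object that was never registered")
	}
	delete(r.live, obj)
}

// Leaked returns the info strings of all objects still registered, sorted
// so that the exit-time report is deterministic.
func (r *leakRegistry) Leaked() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, len(r.live))
	for _, info := range r.live {
		out = append(out, info)
	}
	sort.Strings(out)
	return out
}

// object is a hypothetical reference-counted object.
type object struct{ name string }

func main() {
	reg := newLeakRegistry()
	a, b := &object{"a"}, &object{"b"}
	reg.Register(a, "object a")
	reg.Register(b, "object b")
	reg.Unregister(a) // a is destroyed properly; b never is.
	fmt.Println("leaked at exit:", reg.Leaked())
}
```

Because the report is produced by walking the map at exit, it does not depend on when, or whether, the garbage collector runs.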
2020-10-22	Pass NetworkInterface to LinkAddressRequest	Ghanan Gowripalan
	Previously a link endpoint was passed to stack.LinkAddressResolver.LinkAddressRequest. With this change, implementations that want a route for the link address request may find one through the stack. Other implementations that want to send a packet without a route may continue to do so using the network interface directly.
	Test:
	- arp_test.TestLinkAddressRequest
	- ipv6.TestLinkAddressRequest
	PiperOrigin-RevId: 338577474
2020-10-20	test/runtime: set the NOFILE soft rlimit to 32K	Andrei Vagin
	The python:test_subprocess test enumerates all possible file descriptors and times out if the limit is too high. Docker is known to set this limit to 1M by default, but on native Linux the limit will be between 1K and 32K. PiperOrigin-RevId: 338197239
2020-10-20	[runtime tests] Update exclude files.	Ayush Ranjan
	bhaskerh@ fixed a bunch of the EADDRINUSE flakes in #3662 so we should unexclude them. I have also tested other flaky tests on this list and removed those that do not flake anymore. PiperOrigin-RevId: 338158545
2020-10-19	Merge pull request #4510 from btw616:fix/issue-4509	gVisor bot
	PiperOrigin-RevId: 337971497
2020-10-19	Fix runsc tests on VFS2 overlay.	Jamie Liu
	- Check the sticky bit in overlay.filesystem.UnlinkAt(). Fixes StickyTest.StickyBitPermDenied.
	- When configuring a VFS2 overlay in runsc, copy the lower layer's root owner/group/mode to the upper layer's root (as in the VFS1 equivalent, boot.addOverlay()). This makes the overlay root owned by UID/GID 65534 with mode 0755 rather than owned by UID/GID 0 with mode 01777. Fixes CreateTest.CreateFailsOnUnpermittedDir, which assumes that the test cannot create files in /.
	- MknodTest.UnimplementedTypesReturnError assumes that the creation of device special files is not supported. However, while the VFS2 gofer client still doesn't support device special files, VFS2 tmpfs does, and in the overlay test dimension mknod() targets a tmpfs upper layer. The test initially has all capabilities, including CAP_MKNOD, so its creation of these files succeeds. Constrain these tests to VFS1.
	- Rename overlay.nonDirectoryFD to overlay.regularFileFD and only use it for regular files, using the original FD for pipes and device special files. This is more consistent with Linux (which gets the original inode_operations, and therefore file_operations, for these file types from ovl_fill_inode() => init_special_inode()) and fixes remaining mknod and pipe tests.
	- Read/write 1KB at a time in PipeTest.Streaming, rather than 4 bytes. This isn't strictly necessary, but it makes the test less obnoxiously slow on ptrace.
	Fixes #4407 PiperOrigin-RevId: 337971042
2020-10-16	Use POSIX interval timers in flock test.	Dean Deng
	ualarm(2) is obsolete. Move IntervalTimer into a test util, where it can be used by flock tests. These tests were flaky with TSAN, probably because it slowed the tests down enough that the alarm was expiring before flock() was called. Use an interval timer so that even if we miss the first alarm (or more), flock() is still guaranteed to be interrupted. PiperOrigin-RevId: 337578751
2020-10-15	sockets: ignore io.EOF from view.ReadAt	Andrei Vagin
	Reported-by: syzbot+5466463b7604c2902875@syzkaller.appspotmail.com PiperOrigin-RevId: 337451896
2020-10-15	Syncing packetimpact tests in different directories	Zeling Feng
	By exposing an ALL_TESTS list in defs.bzl we can make sure all packetimpact users agree on the list of all tests. A drawback of this approach is that we have to keep a list of packetimpact_testbench rules in the BUILD file. A helper, validate_all_tests, has been added to help keep the BUILD and .bzl files in sync. PiperOrigin-RevId: 337411839
2020-10-14	Disable strace+debug when explicitly requested	Tiwei Bie
	Currently strace+debug is always enabled because the setting from the upper layer isn't passed to _syscall_test(). This negatively affects the performance tests. This patch fixes the issue. The "debug" argument of _syscall_test() is also made mandatory to prevent this from happening again.
	//test/perf:getpid_benchmark_runsc_kvm
	-----------------------------------------------------
	Benchmark           Time          CPU      Iterations
	-----------------------------------------------------
	Before:
	BM_Getpid       28119 ns     28157 ns          25926
	After:
	BM_Getpid         947 ns       939 ns         777778
	Fixes #4509 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-10-12	gvisor/test: Set nogotsan for native tests	Andrei Vagin
	Tests are written in C++ and there is no reason to run them with gotsan without gVisor. PiperOrigin-RevId: 336783276
2020-10-09	TCP Receive window advertisement fixes.	Bhasker Hariharan
	The fix in commit 028e045da93b7c1c26417e80e4b4e388b86a713d was incorrect: it can cause the right edge of the window to shrink when we announce a zero window because the receive buffer is full, since that is done before the check for whether the window is being shrunk by the selected window. Further, the window was calculated purely from available space, but when we are receiving full-sized segments it makes more sense to use the actual bytes being held. This CL changes the calculation to use the lower of the total available space and the available space in the maximal window we could advertise minus the actual payload bytes being held. This change also cleans up the code so that the window selection logic is not duplicated between getSendParams() and windowCrossedACKThresholdLocked. PiperOrigin-RevId: 336404827
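One possible reading of the window rule in this commit message is sketched below. It is an interpretation of the prose only, not the actual netstack code; the function name `selectWindow` and its parameters are hypothetical.

```go
package main

import "fmt"

// selectWindow returns the smaller of (a) the total available buffer space
// and (b) the maximal advertisable window minus the payload bytes currently
// held. Negative intermediate values are clamped to zero.
// All names here are illustrative assumptions.
func selectWindow(availableSpace, maxWindow, payloadHeld int) int {
	if availableSpace < 0 {
		availableSpace = 0
	}
	byPayload := maxWindow - payloadHeld
	if byPayload < 0 {
		byPayload = 0
	}
	if availableSpace < byPayload {
		return availableSpace
	}
	return byPayload
}

func main() {
	// Plenty of window left: limited by available buffer space.
	fmt.Println(selectWindow(4096, 65535, 1000))
	// Lots of buffer space: limited by payload already held.
	fmt.Println(selectWindow(60000, 65535, 10000))
	// Held payload exceeds the maximal window: advertise zero.
	fmt.Println(selectWindow(100, 1000, 2000))
}
```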
2020-10-09	Add gvisor webhook configuration	Kevin Krakauer
	PiperOrigin-RevId: 336393190
2020-10-09	Add parsers golang benchmarks.	Zach Koopmans
	Add parser and formatting for golang benchmarks for docker benchmarks. Change adds a library for printing and parsing Test parameters and metrics. Benchmarks use the library to print parameters in the Benchmark title (e.g. the name field in b.Run()), and to report CustomMetrics. Parser uses the library to parse printed data from benchmark output and put it into BigQuery structs. PiperOrigin-RevId: 336365628
2020-10-09	Set expect_failure flags on tests that currently fail on fuchsia	Zeling Feng
	PiperOrigin-RevId: 336350318
2020-10-09	test/syscall/iptables: don't use designated initializers	Andrei Vagin
	test/syscalls/linux/iptables.cc:130:3: error: C99 designator 'name' outside aggregate initializer
	  130 | };
	      |
	PiperOrigin-RevId: 336331738
2020-10-07	Add staticcheck and staticstyle analyzers.	Adin Scannell
	This change also adds support to go_stateify for detecting an appropriate receiver name, avoiding a large number of false positives. PiperOrigin-RevId: 335994587
2020-10-07	[runtime-tests] Exclude failing test due to expired cert.	Ayush Ranjan
	PiperOrigin-RevId: 335927821
2020-10-06	Add support for IPv6 fragmentation	Arthur Sfez
	Most of the IPv4 fragmentation code was moved into the fragmentation package, where it is reused by IPv6 fragmentation.
	Test:
	- pkg/tcpip/network/ipv4:ipv4_test
	- pkg/tcpip/network/ipv6:ipv6_test
	- pkg/tcpip/network/fragmentation:fragmentation_test
	Fixes #4389 PiperOrigin-RevId: 335714280
2020-10-06	Implement membarrier(2) commands other than *_SYNC_CORE.	Jamie Liu
	Updates #267 PiperOrigin-RevId: 335713923
2020-10-05	Enable more VFS2 tests	Fabricio Voznika
	Updates #1487 PiperOrigin-RevId: 335516732
2020-10-05	Remove reference to deleted script	Kevin Krakauer
	PiperOrigin-RevId: 335516625
2020-10-05	Internal change.	gVisor bot
	PiperOrigin-RevId: 335429072
2020-10-03	Fix kcov enabling and disabling procedures.	Dean Deng
	- When the KCOV_ENABLE_TRACE ioctl is called with the trace kind KCOV_TRACE_PC, the kcov mode should be set to KCOV_MODE_TRACE_PC.
	- When the owning task of kcov exits, the memory mapping should not be cleared so it can be used by other tasks.
	- Add more tests (also tested on native Linux kcov).
	PiperOrigin-RevId: 335202585
2020-10-02	Actually disable nodejs test parallel/test-fs-write-stream-double-close.	Jamie Liu
	PiperOrigin-RevId: 335070320
2020-09-30	Ensure proctor is built as pure Go binary.	Adin Scannell
	PiperOrigin-RevId: 334716351
2020-09-30	ip6tables: redirect support	Kevin Krakauer
	Adds support for the IPv6-compatible redirect target. Redirection is a limited form of DNAT, where the destination is always the localhost. Updates #3549. PiperOrigin-RevId: 334698344
2020-09-29	Stop depending on go_binary targets.	Adin Scannell
	Closes #3374 PiperOrigin-RevId: 334505627
2020-09-29	Add /proc/[pid]/cwd	Fabricio Voznika
	PiperOrigin-RevId: 334478850
2020-09-29	iptables: refactor to make targets extendable	Kevin Krakauer
	Like matchers, targets should use a module-like register/lookup system. This replaces the brittle switch statements we had before. The only behavior change is supporting IPT_GET_REVISION_TARGET. This makes it much easier to add IPv6 redirect in the next change. Updates #3549. PiperOrigin-RevId: 334469418
2020-09-29	Migrates uses of deprecated map types to recommended types.	gVisor bot
	PiperOrigin-RevId: 334419854
2020-09-28	Fix lingering of TCP socket in the initial state.	Nayana Bidari
	When a socket has SO_LINGER set and is close()'d in the initial state, close() should not linger and should return immediately. PiperOrigin-RevId: 334263149
2020-09-28	Support creating protocol instances with Stack ref	Ghanan Gowripalan
	Network or transport protocols may want to reach the stack. Support this by letting the stack create the protocol instances so it can pass a reference to itself at protocol creation time. Note, protocols do not yet use the stack in this CL but later CLs will make use of the stack from protocols. PiperOrigin-RevId: 334260210
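The idea of letting the stack construct protocol instances so that each one receives a reference to the stack can be sketched as below. The types and names (`Stack`, `Protocol`, `ProtocolFactory`, `NewStack`, `fakeProtocol`) are simplified assumptions for illustration and do not reproduce the real netstack API.

```go
package main

import "fmt"

// Protocol is a hypothetical network or transport protocol.
type Protocol interface {
	Name() string
}

// ProtocolFactory creates a protocol instance for the given stack.
type ProtocolFactory func(s *Stack) Protocol

// Stack owns the protocol instances it creates.
type Stack struct {
	protocols map[string]Protocol
}

// NewStack calls each factory with the stack itself, so every protocol
// gets a stack reference at creation time.
func NewStack(factories ...ProtocolFactory) *Stack {
	s := &Stack{protocols: make(map[string]Protocol)}
	for _, f := range factories {
		p := f(s)
		s.protocols[p.Name()] = p
	}
	return s
}

// fakeProtocol keeps the stack reference for later use.
type fakeProtocol struct {
	stack *Stack
}

func (p *fakeProtocol) Name() string { return "fake" }

func newFakeProtocol(s *Stack) Protocol {
	return &fakeProtocol{stack: s}
}

func main() {
	s := NewStack(newFakeProtocol)
	p := s.protocols["fake"].(*fakeProtocol)
	fmt.Println("protocol holds its stack:", p.stack == s)
}
```

Passing factories rather than ready-made instances avoids a chicken-and-egg problem: the protocol needs the stack, and the stack needs its protocols.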
2020-09-28	Support inotify in overlayfs.	Dean Deng
	Fixes #1479, #317. PiperOrigin-RevId: 334258052
2020-09-27	Clean up kcov.	Dean Deng
	Previously, we did not check the kcov mode when performing task work. As a result, disabling kcov did not do anything. Also avoid expensive atomic RMW when consuming coverage data. We don't need the swap if the value is already zero (which is most of the time), and it is ok if there are slight inconsistencies due to a race between coverage data generation (incrementing the value) and consumption (reading a nonzero value and writing zero). PiperOrigin-RevId: 334049207
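The consumption pattern described here (a plain atomic load, followed by a store only when the value is nonzero, instead of an unconditional atomic swap) can be sketched as follows. The function name `consume` and the use of a `[]uint32` counter slice are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// consume scans coverage counters and returns the indices that were hit.
// It uses a cheap atomic load first and writes zero only for nonzero
// entries, rather than an atomic swap on every entry. A concurrent
// increment between the load and the store may be lost, which the commit
// message treats as an acceptable inconsistency.
func consume(counters []uint32) []int {
	var hit []int
	for i := range counters {
		if atomic.LoadUint32(&counters[i]) == 0 {
			continue // The common case: nothing to reset, no RMW needed.
		}
		atomic.StoreUint32(&counters[i], 0)
		hit = append(hit, i)
	}
	return hit
}

func main() {
	counters := []uint32{0, 3, 0, 1}
	fmt.Println("hit indices:", consume(counters))
	fmt.Println("after consume:", counters)
}
```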
2020-09-25	Disable flaky java11 tests.	Jamie Liu
	Regarding ThreadCpuTimeArray.java: The test starts 10 threads, each of which does some computation, then blocks. When all threads are blocked, the test sleeps for 200ms, then checks that less than 100ns of CPU time in userspace elapse over the course of the sleep; AFAICT, the 100ns of slop is because a thread indicates that it's in the WAITING state before it actually blocks, and because signals can cause threads to be temporarily woken. gVisor's CPU clocks have a granularity of 10ms (the interval of Kernel.cpuClockTicker is //pkg/abi/linux.ClockTick), so a single tick pushes the test over the threshold. PiperOrigin-RevId: 333830287
2020-09-24	Fix Nginx Startup and Size Benchmarks.	Zach Koopmans
	Changes in Nginx Benchmarks in network_tests also affect Startup/Size Nginx Benchmarks. Make sure the commands line up. PiperOrigin-RevId: 333543697
2020-09-24	Change segment/pending queue to use receive buffer limits.	Bhasker Hariharan
	segment_queue today has its own standalone limit of MaxUnprocessedSegments, but this can be a problem because in UnlockUser() we do not release the lock while there are segments to be processed. As handleSegments dequeues packets, more can keep getting queued, and we never release the lock. This can keep happening even if the receive buffer is full, because nothing can read() until we release the lock. Further, having a separate limit for pending segments makes it harder to track memory usage. Unifying the limits makes it easier to reason about memory in use and makes the overall buffer behaviour more consistent. PiperOrigin-RevId: 333508122
2020-09-24	test/syscall/mknod: Don't use a hard-coded file name	Andrei Vagin
	PiperOrigin-RevId: 333461380
2020-09-23	Clean up inotify tests.	Dean Deng
	Mostly simplifies SKIP_IF statements and adds some more documentation. Also, mknod is now supported by gofer fs, so remove SKIP_IFs related to this. PiperOrigin-RevId: 333449932