gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-09-10	arm64: place an SB sequence following an ERET instruction	Bin Lu
	Some CPUs (e.g. ampere-emag) can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-09	Add note about kubeadm to the FAQ	Ian Lewis
	Fixes #3277 PiperOrigin-RevId: 330853338
2020-09-09	Unlock VFS.mountMu before FilesystemImpl calls for /proc/[pid]/{mounts,mountinfo}.	Jamie Liu
	Also move VFS.MakeSyntheticMountpoint() (which is a utility wrapper around VFS.MkdirAllAt(), itself a utility wrapper around VFS.MkdirAt()) to not be in the middle of the implementation of these proc files. Fixes #3878 PiperOrigin-RevId: 330843106
2020-09-09	Don't write VFS2 gofer client timestamps back on dentry destruction.	Jamie Liu
	This feature is too expensive for runsc, even with setattrclunk, because fsgofer.localFile.SetAttr() ends up needing to call reopenProcFD(), incurring two string allocations for the FD pathname, an fd.FD allocation, and two calls to runtime.SetFinalizer() when the fd.FD is created and closed respectively (b/133767962) (plus the actual cost of the syscalls, which is negligible). PiperOrigin-RevId: 330843012
2020-09-09	Merge pull request #3886 from avagin:github-act-feature	gVisor bot
	PiperOrigin-RevId: 330841374
2020-09-09	github: Don't build the Go branch for feature branches	Andrei Vagin
	We can't actually push the Go branch on pushes to feature branches. Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09	Merge pull request #3880 from avagin:github-act-feature	gVisor bot
	PiperOrigin-RevId: 330802067
2020-09-09	Don't sched_setaffinity in ptrace platform.	Jamie Liu
	PiperOrigin-RevId: 330777900
2020-09-09	github: run actions for feature branches	Andrei Vagin
	Signed-off-by: Andrei Vagin <avagin@google.com>
2020-09-09	Fix formatting for Kubernetes tutorial	Ian Lewis
	PiperOrigin-RevId: 330745430
2020-09-09	Add syntax highlighting to website	Ian Lewis
	Adds a syntax highlighting theme css so that code snippets are highlighted properly. PiperOrigin-RevId: 330733737
2020-09-08	Add a Docker Compose tutorial	Ian Lewis
	Adds a Docker Compose tutorial to the website that shows how to start a Wordpress site and includes information about how to get DNS working. Fixes #115 PiperOrigin-RevId: 330652842
2020-09-08	Implement synthetic mountpoints for kernfs.	Jamie Liu
	PiperOrigin-RevId: 330629897
2020-09-08	[vfs] overlayfs: Fix socket tests.	Ayush Ranjan
	- BindSocketThenOpen test was expecting the incorrect error when opening a socket. Fixed that. - VirtualFilesystem.BindEndpointAt should not require pop.Path.Begin.Ok() because the filesystem implementations do not need to walk to the parent dentry. This check also exists for MknodAt, MkdirAt, RmdirAt, SymlinkAt and UnlinkAt, but those filesystem implementations do need to walk to the parent dentry, so that check is valid. Added some syscall tests to test this. PiperOrigin-RevId: 330625220
2020-09-08	Add check for both child and childMerkle ENOENT	gVisor bot
	The check in verity walk returns error for non ENOENT cases, and all ENOENT results should be checked. This case was missing. PiperOrigin-RevId: 330604771
2020-09-08	Implement ioctl with enable verity	gVisor bot
	ioctl with FS_IOC_ENABLE_VERITY is added to verity file system to enable a file as verity file. For a file, a Merkle tree is built with its data. For a directory, a Merkle tree is built with the root hashes of its children. PiperOrigin-RevId: 330604368
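The Merkle tree construction mentioned above can be illustrated with a minimal sketch. It assumes SHA-256 and a 4096-byte block size, and the `merkleRoot` helper is hypothetical; the tree layout used by the verity file system itself may well differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockSize is an assumed data block size for this sketch.
const blockSize = 4096

// merkleRoot hashes each block of data to form the leaves, then repeatedly
// hashes adjacent pairs of nodes until a single root hash remains.
func merkleRoot(data []byte) [sha256.Size]byte {
	var level [][sha256.Size]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		level = append(level, sha256.Sum256(data[off:end]))
	}
	if len(level) == 0 {
		// Empty input still gets a well-defined root.
		level = append(level, sha256.Sum256(nil))
	}
	for len(level) > 1 {
		var next [][sha256.Size]byte
		for i := 0; i < len(level); i += 2 {
			h := sha256.New()
			h.Write(level[i][:])
			if i+1 < len(level) {
				h.Write(level[i+1][:])
			}
			var sum [sha256.Size]byte
			copy(sum[:], h.Sum(nil))
			next = append(next, sum)
		}
		level = next
	}
	return level[0]
}

func main() {
	data := make([]byte, 3*blockSize)
	fmt.Printf("root before: %x\n", merkleRoot(data))
	data[blockSize+1] ^= 0xff // flip bits in the second block
	fmt.Printf("root after:  %x\n", merkleRoot(data))
}
```

Changing any byte of the input changes the root, which is what makes the root hash usable for integrity checks. For a directory, the same idea would be applied to the root hashes of its children instead of data blocks.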
2020-09-08	[vfs] overlayfs: decref VD when not using it.	Ayush Ranjan
	overlay/filesystem.go:lookupLocked() did not DecRef the VD on some error paths when it would not end up saving or using the VD. PiperOrigin-RevId: 330589742
2020-09-08	Honor readonly flag for root mount	Fabricio Voznika
	Updates #1487 PiperOrigin-RevId: 330580699
2020-09-08	Increase resolution timeout for TestCacheResolution	Sam Balana
	Fixes pkg/tcpip/stack:stack_test flake experienced while running TestCacheResolution with gotsan. This occurs when the test-runner takes longer than the resolution timeout to call linkAddrCache.get. In this test we don't care about the resolution timeout, so set it to the maximum and rely on test-runner timeouts to avoid deadlocks. PiperOrigin-RevId: 330566250
2020-09-08	Merge pull request #3856 from btw616:fix/issue-3855	gVisor bot
	PiperOrigin-RevId: 330565414
2020-09-08	Fix data race in tcp.GetSockOpt.	Bhasker Hariharan
	e.ID can't be read without holding e.mu. GetSockOpt was reading e.ID when looking up OriginalDst without holding e.mu. PiperOrigin-RevId: 330562293
2020-09-08	Improve type safety for transport protocol options	Ghanan Gowripalan
	The existing implementation for TransportProtocol.{Set}Option takes arguments of an empty interface type which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces for transport protocol options that may be set or queried which transport protocol option types implement to ensure that invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. RELNOTES: n/a PiperOrigin-RevId: 330559811
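A minimal sketch of the marker-interface pattern this commit describes. All names here (`SettableOption`, `GettableOption`, `DelayEnabled`, `protocol`) are hypothetical and chosen for illustration; they are not taken from the gVisor source.

```go
package main

import "fmt"

// SettableOption is a hypothetical marker interface for options that may be set.
type SettableOption interface {
	isSettableOption()
}

// GettableOption is a hypothetical marker interface for options that may be queried.
type GettableOption interface {
	isGettableOption()
}

// DelayEnabled is an illustrative option that can be both set and queried.
type DelayEnabled bool

func (*DelayEnabled) isSettableOption() {}
func (*DelayEnabled) isGettableOption() {}

// protocol stands in for a transport protocol implementation.
type protocol struct {
	delay DelayEnabled
}

// SetOption accepts only types implementing SettableOption, so passing an
// arbitrary value such as *int is rejected by the compiler.
func (p *protocol) SetOption(opt SettableOption) error {
	switch v := opt.(type) {
	case *DelayEnabled:
		p.delay = *v
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}

// Option accepts only types implementing GettableOption.
func (p *protocol) Option(opt GettableOption) error {
	switch v := opt.(type) {
	case *DelayEnabled:
		*v = p.delay
		return nil
	default:
		return fmt.Errorf("unknown option %T", opt)
	}
}

func main() {
	p := &protocol{}
	on := DelayEnabled(true)
	if err := p.SetOption(&on); err != nil {
		panic(err)
	}
	var got DelayEnabled
	if err := p.Option(&got); err != nil {
		panic(err)
	}
	fmt.Println("delay enabled:", got)
}
```

An option type that implements only one of the two marker methods would be read-only or set-only, with misuse caught at compile time rather than at run time.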
2020-09-08	[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.	Ayush Ranjan
	PiperOrigin-RevId: 330554450
2020-09-08	Fix the use after nil check on args.MountNamespaceVFS2	Tiwei Bie
	args.MountNamespaceVFS2 is used again after the nil check; mntnsVFS2, which holds the expected reference, should be used instead. This patch fixes this issue. Fixes: #3855 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-07	Fix make_apt script.	Ayush Ranjan
	This change makes the following fixes: - When creating a test repo.key, create a secret keyring as other workflows also use secret keyrings only. - We should not be using both --keyring and --secret-keyring options. Just use --secret-keyring. - Pass homedir to all gpg commands. dpkg-sig takes an arg -g which stands for gpgopts. So we need to pass the homedir there too. PiperOrigin-RevId: 330443280
2020-09-04	Simplify FD handling for container start/exec	Fabricio Voznika
	VFS1 and VFS2 host FDs have different dupping behavior, making it error-prone to code for both. Change the contract so that FDs are released as they are used, so the caller can simply defer a block that closes all remaining files. This also addresses handling of partial failures. With this fix, more VFS2 tests can be enabled. Updates #1487 PiperOrigin-RevId: 330112266
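The "release as used, defer-close the rest" contract can be sketched as follows. This is an illustration only: `fakeFD`, `fdSet`, `container`, and the `failAt` parameter are hypothetical stand-ins and do not mirror the runsc code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeFD is a hypothetical stand-in for a host file descriptor.
type fakeFD struct {
	name   string
	closed bool
}

func (f *fakeFD) Close() error {
	f.closed = true
	return nil
}

// fdSet is the list of files handed from the caller to the callee.
type fdSet []io.Closer

// container takes ownership of the files it consumes.
type container struct {
	owned []io.Closer
}

// start consumes files one at a time. Each consumed slot is set to nil so
// the caller no longer owns it. failAt simulates a partial failure.
func (c *container) start(files fdSet, failAt int) error {
	for i, f := range files {
		if i == failAt {
			return errors.New("simulated failure")
		}
		c.owned = append(c.owned, f)
		files[i] = nil // ownership transferred
	}
	return nil
}

// closeRemaining closes every file the callee did not take.
func closeRemaining(files fdSet) {
	for _, f := range files {
		if f != nil {
			f.Close()
		}
	}
}

func main() {
	stdin, stdout, stderr := &fakeFD{name: "stdin"}, &fakeFD{name: "stdout"}, &fakeFD{name: "stderr"}
	files := fdSet{stdin, stdout, stderr}
	c := &container{}
	func() {
		defer closeRemaining(files)
		fmt.Println("start error:", c.start(files, 2))
	}()
	for _, f := range []*fakeFD{stdin, stdout, stderr} {
		fmt.Printf("%s closed by caller: %v\n", f.name, f.closed)
	}
}
```

Because consumed slots are cleared, a single deferred cleanup is correct on both the success path and on a partial failure.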
2020-09-03	Adjust input file offset when sendfile only completes a partial write.	Dean Deng
	Fixes #3779. PiperOrigin-RevId: 330057268
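A minimal sketch of the behavior named in the subject line: when the output accepts only part of the data, the input offset should advance by the number of bytes written, not the number read. The `memFile`, `shortWriter`, and `sendfile` names are hypothetical and this is not the sentry implementation.

```go
package main

import (
	"fmt"
	"io"
)

// memFile is a hypothetical in-memory input file.
type memFile []byte

func (m memFile) ReadAt(p []byte, off int64) (int, error) {
	if off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// shortWriter accepts at most limit bytes per Write, like a nearly full socket buffer.
type shortWriter struct {
	data  []byte
	limit int
}

func (w *shortWriter) Write(p []byte) (int, error) {
	if len(p) > w.limit {
		w.data = append(w.data, p[:w.limit]...)
		return w.limit, io.ErrShortWrite
	}
	w.data = append(w.data, p...)
	return len(p), nil
}

// sendfile copies up to count bytes from in at *offset to out, and advances
// *offset only by the number of bytes actually written.
func sendfile(out io.Writer, in io.ReaderAt, offset *int64, count int) (int, error) {
	buf := make([]byte, count)
	nr, err := in.ReadAt(buf, *offset)
	if err != nil && err != io.EOF {
		return 0, err
	}
	nw, err := out.Write(buf[:nr])
	*offset += int64(nw) // not nr: unwritten bytes must be re-read next time
	return nw, err
}

func main() {
	in := memFile("hello world")
	out := &shortWriter{limit: 4}
	var off int64
	for off < int64(len(in)) {
		n, _ := sendfile(out, in, &off, 8)
		fmt.Printf("wrote %d bytes, offset now %d\n", n, off)
	}
	fmt.Printf("output: %q\n", out.data)
}
```

Advancing by the read count instead would silently skip the bytes the writer did not accept.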
2020-09-03	Fix the release workflow.	Ayush Ranjan
	PiperOrigin-RevId: 330049242
2020-09-03	Use fine-grained mutex for stack.cleanupEndpoints.	Bhasker Hariharan
	stack.cleanupEndpoints is protected by stack.mu, but that can cause contention because the stack mutex is already acquired in many hot paths (new endpoint creation, cleanup, etc.). Moving this to a fine-grained mutex should reduce contention on stack.mu. PiperOrigin-RevId: 330026151
2020-09-03	Use atomic.Value for Stack.tcpProbeFunc.	Jamie Liu
	b/166980357#comment56 shows:
	- 837 goroutines blocked in:
	    gvisor/pkg/sync/sync.(RWMutex).Lock
	    gvisor/pkg/tcpip/stack/stack.(Stack).StartTransportEndpointCleanup
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).cleanupLocked
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).completeWorkerLocked
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).protocolMainLoop.func1
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).protocolMainLoop
	- 695 goroutines blocked in:
	    gvisor/pkg/sync/sync.(RWMutex).Lock
	    gvisor/pkg/tcpip/stack/stack.(Stack).CompleteTransportEndpointCleanup
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).cleanupLocked
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).completeWorkerLocked
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).protocolMainLoop.func1
	    gvisor/pkg/tcpip/transport/tcp/tcp.(endpoint).protocolMainLoop
	- 3882 goroutines blocked in:
	    gvisor/pkg/sync/sync.(RWMutex).Lock
	    gvisor/pkg/tcpip/stack/stack.(Stack).GetTCPProbe
	    gvisor/pkg/tcpip/transport/tcp/tcp.newEndpoint
	    gvisor/pkg/tcpip/transport/tcp/tcp.(protocol).NewEndpoint
	    gvisor/pkg/tcpip/stack/stack.(Stack).NewEndpoint
	All of these are contending on Stack.mu. Stack.StartTransportEndpointCleanup() and Stack.CompleteTransportEndpointCleanup() insert/delete TransportEndpoints in a map (Stack.cleanupEndpoints), and the former also does endpoint unregistration while holding Stack.mu, so it's not immediately clear how feasible it is to replace the map with a mutex-less implementation or how much doing so would help. However, Stack.GetTCPProbe() just reads a function object (Stack.tcpProbeFunc) that is almost always nil (as far as I can tell, Stack.AddTCPProbe() is only called in tests), and it's called for every new TCP endpoint. So converting it to an atomic.Value should significantly reduce contention on Stack.mu, improving TCP endpoint creation latency and allowing TCP endpoint cleanup to proceed. PiperOrigin-RevId: 330004140
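The idea of reading a rarely set callback through `atomic.Value` instead of a mutex can be sketched as below. The `Stack`, `ProbeFunc`, and method names are simplified, hypothetical versions for illustration and do not match the real netstack signatures.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ProbeFunc is a hypothetical callback type.
type ProbeFunc func(state string)

// Stack keeps the probe in an atomic.Value so readers take no lock.
type Stack struct {
	probe atomic.Value // always stores a ProbeFunc
}

// AddProbe installs a probe.
func (s *Stack) AddProbe(f ProbeFunc) {
	s.probe.Store(f)
}

// RemoveProbe stores a typed nil function. atomic.Value rejects an untyped
// nil and requires every stored value to have the same concrete type.
func (s *Stack) RemoveProbe() {
	s.probe.Store(ProbeFunc(nil))
}

// GetProbe returns the current probe, or nil if none is installed.
// It is safe to call concurrently with AddProbe and RemoveProbe.
func (s *Stack) GetProbe() ProbeFunc {
	f, _ := s.probe.Load().(ProbeFunc)
	return f
}

func main() {
	s := &Stack{}
	fmt.Println("probe installed:", s.GetProbe() != nil)

	calls := 0
	s.AddProbe(func(string) { calls++ })
	if p := s.GetProbe(); p != nil {
		p("established")
	}
	s.RemoveProbe()
	fmt.Println("calls:", calls, "probe installed:", s.GetProbe() != nil)
}
```

The hot path (`GetProbe`) is a single atomic load, so it no longer queues behind writers of an unrelated mutex.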
2020-09-03	Run getdents_benchmark with fewer files.	Nicolas Lacasse
	This test regularly times out when "shared" filesystem is enabled. PiperOrigin-RevId: 329950622
2020-09-03	Avoid grpc_impl	Tamir Duberstein
	PiperOrigin-RevId: 329902747
2020-09-02	Update version in cni tutorial	Ian Lewis
	Update the cniVersion used in the CNI tutorial so that it works with containerd 1.2. Containerd 1.2 includes a version of the cri plugin (release/1.2) that, in turn, includes a version of the cni library (0.6.0) that only supports up to 0.3.1. https://github.com/containernetworking/cni/blob/v0.6.0/pkg/version/version.go#L38 PiperOrigin-RevId: 329837188
2020-09-02	Add support to run packetimpact tests against Fuchsia	Zeling Feng
	blaze test <test_name>_fuchsia_test will run the corresponding packetimpact test against fuchsia. PiperOrigin-RevId: 329835290
2020-09-02	Fix Accept to not return error for sockets in accept queue.	Bhasker Hariharan
	Accept on gVisor will return an error if a socket in the accept queue was closed before Accept() was called. Linux will return the new fd even if the returned socket is already closed by the peer, say due to a RST being sent by the peer. This seems to be intentional in Linux; more details are on the GitHub issue. Fixes #3780 PiperOrigin-RevId: 329828404
2020-09-02	[vfs] Implement xattr for overlayfs.	Ayush Ranjan
	PiperOrigin-RevId: 329825497
2020-09-02	[vfs] Fix error handling in overlayfs OpenAt.	Ayush Ranjan
	Updates #1199 PiperOrigin-RevId: 329802274
2020-09-02	Update Go version constraint on sync/spin_unsafe.go.	Jamie Liu
	PiperOrigin-RevId: 329801584
2020-09-02	Improve sync.SeqCount performance.	Jamie Liu
	- Make sync.SeqCountEpoch not a struct. This allows sync.SeqCount.BeginRead() to be inlined.
	- Mark sync.SeqAtomicLoad<T> nosplit to mitigate the Go compiler's refusal to inline it. (Best I could get was "cost 92 exceeds budget 80".)
	- Use runtime-guided spinning in SeqCount.BeginRead().
	Benchmarks:
	name                               old time/op  new time/op  delta
	pkg:pkg/sync/sync goos:linux goarch:amd64
	SeqCountWriteUncontended-12        8.24ns ± 0%  11.40ns ± 0%  +38.35%  (p=0.000 n=10+10)
	SeqCountReadUncontended-12         0.33ns ± 0%  0.14ns ± 3%   -57.77%  (p=0.000 n=7+8)
	pkg:pkg/sync/seqatomictest/seqatomic goos:linux goarch:amd64
	SeqAtomicLoadIntUncontended-12     0.64ns ± 1%  0.41ns ± 1%   -36.40%  (p=0.000 n=10+8)
	SeqAtomicTryLoadIntUncontended-12  0.18ns ± 4%  0.18ns ± 1%   ~        (p=0.206 n=10+8)
	AtomicValueLoadIntUncontended-12   0.27ns ± 3%  0.27ns ± 0%   -1.77%   (p=0.000 n=10+8)
	(atomic.Value.Load is, of course, inlined. We would expect an uncontended inline SeqAtomicLoad<int> to perform identically to SeqAtomicTryLoad<int>.)
	The "regression" in BenchmarkSeqCountWriteUncontended, despite this CL changing nothing in that path, is attributed to microarchitectural subtlety; the benchmark loop is unchanged except for its address:
	Before this CL:
	:0 0x4e62d1 48ffc2 INCQ DX
	:0 0x4e62d4 48399110010000 CMPQ DX, 0x110(CX)
	:0 0x4e62db 7e26 JLE 0x4e6303
	:0 0x4e62dd 90 NOPL
	:0 0x4e62de bb01000000 MOVL $0x1, BX
	:0 0x4e62e3 f00fc118 LOCK XADDL BX, 0(AX)
	:0 0x4e62e7 ffc3 INCL BX
	:0 0x4e62e9 0fbae300 BTL $0x0, BX
	:0 0x4e62ed 733a JAE 0x4e6329
	:0 0x4e62ef 90 NOPL
	:0 0x4e62f0 bb01000000 MOVL $0x1, BX
	:0 0x4e62f5 f00fc118 LOCK XADDL BX, 0(AX)
	:0 0x4e62f9 ffc3 INCL BX
	:0 0x4e62fb 0fbae300 BTL $0x0, BX
	:0 0x4e62ff 73d0 JAE 0x4e62d1
	After this CL:
	:0 0x4e6361 48ffc2 INCQ DX
	:0 0x4e6364 48399110010000 CMPQ DX, 0x110(CX)
	:0 0x4e636b 7e26 JLE 0x4e6393
	:0 0x4e636d 90 NOPL
	:0 0x4e636e bb01000000 MOVL $0x1, BX
	:0 0x4e6373 f00fc118 LOCK XADDL BX, 0(AX)
	:0 0x4e6377 ffc3 INCL BX
	:0 0x4e6379 0fbae300 BTL $0x0, BX
	:0 0x4e637d 733a JAE 0x4e63b9
	:0 0x4e637f 90 NOPL
	:0 0x4e6380 bb01000000 MOVL $0x1, BX
	:0 0x4e6385 f00fc118 LOCK XADDL BX, 0(AX)
	:0 0x4e6389 ffc3 INCL BX
	:0 0x4e638b 0fbae300 BTL $0x0, BX
	:0 0x4e638f 73d0 JAE 0x4e6361
	PiperOrigin-RevId: 329754148
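For readers unfamiliar with sequence counters, here is a simplified sketch of the technique being benchmarked. It is not the gVisor `sync.SeqCount` implementation: the `pair` type is hypothetical, the protected fields use plain atomics instead of `SeqAtomicLoad`, and `runtime.Gosched()` stands in for the runtime-guided spinning the commit mentions. Writers are assumed to be serialized externally.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// SeqCount is a simplified sequence counter. An odd epoch means a writer is
// inside its critical section.
type SeqCount struct {
	epoch uint32
}

// BeginRead waits until no write is in progress and returns the epoch.
func (s *SeqCount) BeginRead() uint32 {
	for {
		e := atomic.LoadUint32(&s.epoch)
		if e&1 == 0 {
			return e
		}
		runtime.Gosched() // simplification of runtime-guided spinning
	}
}

// ReadOk reports whether no writer ran since BeginRead returned e.
func (s *SeqCount) ReadOk(e uint32) bool {
	return atomic.LoadUint32(&s.epoch) == e
}

// BeginWrite makes the epoch odd.
func (s *SeqCount) BeginWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 == 0 {
		panic("BeginWrite during writer critical section")
	}
}

// EndWrite makes the epoch even again.
func (s *SeqCount) EndWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 != 0 {
		panic("EndWrite outside writer critical section")
	}
}

// pair holds two values that readers must observe as a consistent snapshot.
// The uint64 fields come first so they are 64-bit aligned on 32-bit targets.
type pair struct {
	a, b uint64
	seq  SeqCount
}

// Store updates both fields; callers must serialize writers themselves.
func (p *pair) Store(a, b uint64) {
	p.seq.BeginWrite()
	atomic.StoreUint64(&p.a, a)
	atomic.StoreUint64(&p.b, b)
	p.seq.EndWrite()
}

// Load retries until it reads both fields without an intervening write.
func (p *pair) Load() (uint64, uint64) {
	for {
		e := p.seq.BeginRead()
		a := atomic.LoadUint64(&p.a)
		b := atomic.LoadUint64(&p.b)
		if p.seq.ReadOk(e) {
			return a, b
		}
	}
}

func main() {
	var p pair
	p.Store(3, 4)
	a, b := p.Load()
	fmt.Println(a, b)
}
```

The write path here is an atomic add followed by a parity check, which is the same shape as the `LOCK XADDL` / `BTL` sequence in the disassembly; readers never write shared state, which is why the uncontended read path can be so cheap.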
2020-09-02	Add Docs to nginx benchmark.	Zach Koopmans
	Adds docs to nginx and refactors both Httpd and Nginx benchmarks. Key changes: - Add docs and make nginx tests the same as httpd (reverse, all docs, etc.). - Make requests scale on c * b.N -> a request per thread. This works well with both --test.benchtime=10m (do a run that lasts at least 10m) and --test.benchtime=10x (do b.N = 10). - Remove a doc from both tests (1000Kb) as 1024Kb exists. PiperOrigin-RevId: 329751091
2020-09-02	[runtime tests] Exclude flaky nodejs test	Ayush Ranjan
	PiperOrigin-RevId: 329749191
2020-09-02	Merge pull request #3822 from btw616:fix/issue-3821	gVisor bot
	PiperOrigin-RevId: 329710371
2020-09-01	Fix statfs test for opensource.	Zach Koopmans
	PiperOrigin-RevId: 329638946
2020-09-01	Implement setattr+clunk in 9P	Fabricio Voznika
	This is to cover the common pattern: open->read/write->close, where SetAttr needs to be called to update atime/mtime before the file is closed. Benchmark results for BM_OpenReadClose/10240 (CPU): setattr+clunk: 63783 ns; VFS2: 68109 ns; VFS1: 72507 ns. Updates #1198 PiperOrigin-RevId: 329628461
2020-09-01	Fix handling of unacceptable ACKs during close.	Mithun Iyer
	On receiving an ACK with an unacceptable ACK number in a closing state, TCP needs to reply with an ACK carrying the correct seq and ack numbers and remain in the same state. This change is as per RFC793 page 37, but with the difference that it does not apply to the ESTABLISHED state, just as in Linux. Also add more tests to check for OTW sequence numbers and unacceptable ack numbers in these states. Fixes #3785 PiperOrigin-RevId: 329616283
2020-09-01	Test opening file handles with different permissions.	Dean Deng
	These were problematic for vfs2 gofers before correctly implementing separate read/write handles. PiperOrigin-RevId: 329613261
2020-09-01	Refactor tty codebase to use master-replica terminology.	Ayush Ranjan
	Updates #2972 PiperOrigin-RevId: 329584905
2020-09-01	Fix panic when calling dup2().	Nayana Bidari
	PiperOrigin-RevId: 329572337
2020-09-01	[go-marshal] Enable auto-marshalling for fs/tty.	Ayush Ranjan
	PiperOrigin-RevId: 329564614
2020-09-01	Let flags be overridden from OCI annotations	Fabricio Voznika
	This allows runsc flags to be set per sandbox instance. For example, K8s pod annotations can be used to enable --debug for a single pod, making troubleshooting much easier. Similarly, features like --vfs2 can be enabled for experimentation without affecting other pods in the node. Closes #3494 PiperOrigin-RevId: 329542815
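One way per-sandbox flag overrides from annotations could work is sketched below using the standard `flag` package. The annotation prefix `dev.gvisor.flag.`, the `config` type, and the `overrideFlags` helper are assumptions made for this example and are not confirmed by the commit message.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// flagPrefix is an assumed annotation prefix for this sketch.
const flagPrefix = "dev.gvisor.flag."

// config bundles a flag set with the values it controls (hypothetical).
type config struct {
	fs    *flag.FlagSet
	Debug *bool
	VFS2  *bool
}

func newConfig() *config {
	fs := flag.NewFlagSet("runsc", flag.ContinueOnError)
	return &config{
		fs:    fs,
		Debug: fs.Bool("debug", false, "enable debug logging"),
		VFS2:  fs.Bool("vfs2", false, "enable VFS2"),
	}
}

// overrideFlags applies every annotation carrying flagPrefix to the flag
// set and ignores all other annotations. Unknown flag names or unparsable
// values are reported as errors.
func overrideFlags(fs *flag.FlagSet, annotations map[string]string) error {
	for key, val := range annotations {
		name := strings.TrimPrefix(key, flagPrefix)
		if name == key {
			continue // not a flag annotation
		}
		if err := fs.Set(name, val); err != nil {
			return fmt.Errorf("setting flag %q from annotation: %w", name, err)
		}
	}
	return nil
}

func main() {
	c := newConfig()
	annotations := map[string]string{
		flagPrefix + "debug": "true",
		"example.com/owner":  "team-a", // unrelated annotation, ignored
	}
	if err := overrideFlags(c.fs, annotations); err != nil {
		panic(err)
	}
	fmt.Println("debug:", *c.Debug, "vfs2:", *c.VFS2)
}
```

Routing the override through `FlagSet.Set` reuses each flag's own parsing and validation, so an annotation cannot set a flag that does not exist.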