AgeCommit messageAuthor
2020-09-09Use fine-grained mutex for stack.cleanupEndpoints.Bhasker Hariharan
stack.cleanupEndpoints is protected by stack.mu, but that can cause contention because the stack mutex is already acquired in many hot paths during new endpoint creation, cleanup, etc. Moving this to a fine-grained mutex should reduce contention on stack.mu. PiperOrigin-RevId: 330026151
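The idea of moving one map out from under a hot, coarse lock can be sketched as follows. This is a minimal illustration, not the gVisor code: the `stack` and `endpoint` types, the `cleanupMu` field name, and all method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint is a hypothetical stand-in for a transport endpoint.
type endpoint struct{ id int }

// stack is a simplified, hypothetical model of a network stack.
type stack struct {
	// mu protects general stack state and is taken on many hot paths.
	mu     sync.RWMutex
	routes []string

	// cleanupMu protects only cleanupEndpoints, so cleanup bookkeeping
	// never waits behind holders of mu.
	cleanupMu        sync.Mutex
	cleanupEndpoints map[*endpoint]struct{}
}

func newStack() *stack {
	return &stack{cleanupEndpoints: make(map[*endpoint]struct{})}
}

// startCleanup records ep as being cleaned up.
func (s *stack) startCleanup(ep *endpoint) {
	s.cleanupMu.Lock()
	s.cleanupEndpoints[ep] = struct{}{}
	s.cleanupMu.Unlock()
}

// completeCleanup forgets ep once its cleanup has finished.
func (s *stack) completeCleanup(ep *endpoint) {
	s.cleanupMu.Lock()
	delete(s.cleanupEndpoints, ep)
	s.cleanupMu.Unlock()
}

// pendingCleanups reports how many endpoints are still being cleaned up.
func (s *stack) pendingCleanups() int {
	s.cleanupMu.Lock()
	defer s.cleanupMu.Unlock()
	return len(s.cleanupEndpoints)
}

func main() {
	s := newStack()
	ep := &endpoint{id: 1}
	s.startCleanup(ep)
	fmt.Println("pending after start:", s.pendingCleanups())
	s.completeCleanup(ep)
	fmt.Println("pending after complete:", s.pendingCleanups())
}
```

The trade-off in this sketch is one extra lock to reason about in exchange for cleanup paths that no longer serialize against every other user of `mu`.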
2020-09-09Use atomic.Value for Stack.tcpProbeFunc.Jamie Liu
b/166980357#comment56 shows:

- 837 goroutines blocked in:
    gvisor/pkg/sync/sync.(*RWMutex).Lock
    gvisor/pkg/tcpip/stack/stack.(*Stack).StartTransportEndpointCleanup
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).cleanupLocked
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).completeWorkerLocked
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop.func1
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop
- 695 goroutines blocked in:
    gvisor/pkg/sync/sync.(*RWMutex).Lock
    gvisor/pkg/tcpip/stack/stack.(*Stack).CompleteTransportEndpointCleanup
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).cleanupLocked
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).completeWorkerLocked
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop.func1
    gvisor/pkg/tcpip/transport/tcp/tcp.(*endpoint).protocolMainLoop
- 3882 goroutines blocked in:
    gvisor/pkg/sync/sync.(*RWMutex).Lock
    gvisor/pkg/tcpip/stack/stack.(*Stack).GetTCPProbe
    gvisor/pkg/tcpip/transport/tcp/tcp.newEndpoint
    gvisor/pkg/tcpip/transport/tcp/tcp.(*protocol).NewEndpoint
    gvisor/pkg/tcpip/stack/stack.(*Stack).NewEndpoint

All of these are contending on Stack.mu. Stack.StartTransportEndpointCleanup() and Stack.CompleteTransportEndpointCleanup() insert/delete TransportEndpoints in a map (Stack.cleanupEndpoints), and the former also does endpoint unregistration while holding Stack.mu, so it's not immediately clear how feasible it is to replace the map with a mutex-less implementation or how much doing so would help. However, Stack.GetTCPProbe() just reads a function object (Stack.tcpProbeFunc) that is almost always nil (as far as I can tell, Stack.AddTCPProbe() is only called in tests), and it's called for every new TCP endpoint. So converting it to an atomic.Value should significantly reduce contention on Stack.mu, improving TCP endpoint creation latency and allowing TCP endpoint cleanup to proceed.

PiperOrigin-RevId: 330004140
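A lock-free read of a rarely set callback can be sketched with `sync/atomic.Value` as below. This is an illustration under assumptions, not the actual change: the `stack` type, the `probeFunc` signature, and the method names are hypothetical. Note that `atomic.Value.Store` panics on an untyped nil, so the sketch clears the probe by storing a typed nil function.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// probeFunc is a hypothetical probe callback type.
type probeFunc func(state string)

// stack is a hypothetical, simplified stack that keeps an optional probe.
type stack struct {
	// probe always holds a value of type probeFunc (possibly a nil one).
	probe atomic.Value
}

// addProbe installs f; readers see it without taking any lock.
func (s *stack) addProbe(f probeFunc) { s.probe.Store(f) }

// removeProbe clears the probe by storing a typed nil.
func (s *stack) removeProbe() { s.probe.Store(probeFunc(nil)) }

// getProbe returns the installed probe, or nil if none was ever stored.
func (s *stack) getProbe() probeFunc {
	// Load returns a nil interface before the first Store; the two-value
	// type assertion turns that into a nil probeFunc.
	f, _ := s.probe.Load().(probeFunc)
	return f
}

func main() {
	s := &stack{}
	fmt.Println("probe unset:", s.getProbe() == nil)

	calls := 0
	s.addProbe(func(string) { calls++ })
	if p := s.getProbe(); p != nil {
		p("established")
	}
	fmt.Println("probe calls:", calls)

	s.removeProbe()
	fmt.Println("probe unset again:", s.getProbe() == nil)
}
```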
2020-09-09Run gentdents_benchmark with fewer files.Nicolas Lacasse
This test regularly times out when "shared" filesystem is enabled. PiperOrigin-RevId: 329950622
2020-09-09Avoid grpc_implTamir Duberstein
PiperOrigin-RevId: 329902747
2020-09-09Update version in cni tutorialIan Lewis
Update the cniVersion used in the CNI tutorial so that it works with containerd 1.2. Containerd 1.2 includes a version of the cri plugin (release/1.2) that, in turn, includes a version of the cni library (0.6.0) that only supports up to 0.3.1. https://github.com/containernetworking/cni/blob/v0.6.0/pkg/version/version.go#L38 PiperOrigin-RevId: 329837188
2020-09-09Add support to run packetimpact tests against FuchsiaZeling Feng
blaze test <test_name>_fuchsia_test will run the corresponding packetimpact test against fuchsia. PiperOrigin-RevId: 329835290
2020-09-09Fix Accept to not return error for sockets in accept queue.Bhasker Hariharan
Accept on gVisor returns an error if a socket in the accept queue was closed before Accept() was called. Linux returns the new fd even if the returned socket has already been closed by the peer, say due to a RST being sent by the peer. This seems to be intentional in Linux; more details are on the GitHub issue. Fixes #3780 PiperOrigin-RevId: 329828404
2020-09-09[vfs] Implement xattr for overlayfs.Ayush Ranjan
PiperOrigin-RevId: 329825497
2020-09-09[vfs] Fix error handling in overlayfs OpenAt.Ayush Ranjan
Updates #1199 PiperOrigin-RevId: 329802274
2020-09-09Update Go version constraint on sync/spin_unsafe.go.Jamie Liu
PiperOrigin-RevId: 329801584
2020-09-09Improve sync.SeqCount performance.Jamie Liu
- Make sync.SeqCountEpoch not a struct. This allows sync.SeqCount.BeginRead() to be inlined.
- Mark sync.SeqAtomicLoad<T> nosplit to mitigate the Go compiler's refusal to inline it. (Best I could get was "cost 92 exceeds budget 80".)
- Use runtime-guided spinning in SeqCount.BeginRead().

Benchmarks:

  name                               old time/op   new time/op   delta
  pkg:pkg/sync/sync goos:linux goarch:amd64
  SeqCountWriteUncontended-12        8.24ns ± 0%   11.40ns ± 0%  +38.35%  (p=0.000 n=10+10)
  SeqCountReadUncontended-12         0.33ns ± 0%   0.14ns ± 3%   -57.77%  (p=0.000 n=7+8)
  pkg:pkg/sync/seqatomictest/seqatomic goos:linux goarch:amd64
  SeqAtomicLoadIntUncontended-12     0.64ns ± 1%   0.41ns ± 1%   -36.40%  (p=0.000 n=10+8)
  SeqAtomicTryLoadIntUncontended-12  0.18ns ± 4%   0.18ns ± 1%   ~        (p=0.206 n=10+8)
  AtomicValueLoadIntUncontended-12   0.27ns ± 3%   0.27ns ± 0%   -1.77%   (p=0.000 n=10+8)

(atomic.Value.Load is, of course, inlined. We would expect an uncontended inline SeqAtomicLoad<int> to perform identically to SeqAtomicTryLoad<int>.)

The "regression" in BenchmarkSeqCountWriteUncontended, despite this CL changing nothing in that path, is attributed to microarchitectural subtlety; the benchmark loop is unchanged except for its address:

Before this CL:

  :0  0x4e62d1  48ffc2          INCQ DX
  :0  0x4e62d4  48399110010000  CMPQ DX, 0x110(CX)
  :0  0x4e62db  7e26            JLE 0x4e6303
  :0  0x4e62dd  90              NOPL
  :0  0x4e62de  bb01000000      MOVL $0x1, BX
  :0  0x4e62e3  f00fc118        LOCK XADDL BX, 0(AX)
  :0  0x4e62e7  ffc3            INCL BX
  :0  0x4e62e9  0fbae300        BTL $0x0, BX
  :0  0x4e62ed  733a            JAE 0x4e6329
  :0  0x4e62ef  90              NOPL
  :0  0x4e62f0  bb01000000      MOVL $0x1, BX
  :0  0x4e62f5  f00fc118        LOCK XADDL BX, 0(AX)
  :0  0x4e62f9  ffc3            INCL BX
  :0  0x4e62fb  0fbae300        BTL $0x0, BX
  :0  0x4e62ff  73d0            JAE 0x4e62d1

After this CL:

  :0  0x4e6361  48ffc2          INCQ DX
  :0  0x4e6364  48399110010000  CMPQ DX, 0x110(CX)
  :0  0x4e636b  7e26            JLE 0x4e6393
  :0  0x4e636d  90              NOPL
  :0  0x4e636e  bb01000000      MOVL $0x1, BX
  :0  0x4e6373  f00fc118        LOCK XADDL BX, 0(AX)
  :0  0x4e6377  ffc3            INCL BX
  :0  0x4e6379  0fbae300        BTL $0x0, BX
  :0  0x4e637d  733a            JAE 0x4e63b9
  :0  0x4e637f  90              NOPL
  :0  0x4e6380  bb01000000      MOVL $0x1, BX
  :0  0x4e6385  f00fc118        LOCK XADDL BX, 0(AX)
  :0  0x4e6389  ffc3            INCL BX
  :0  0x4e638b  0fbae300        BTL $0x0, BX
  :0  0x4e638f  73d0            JAE 0x4e6361

PiperOrigin-RevId: 329754148
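For readers unfamiliar with sequence counters, the read/retry protocol that BeginRead() takes part in can be sketched as follows. This is a simplified illustration, not the pkg/sync implementation: all names are hypothetical, `runtime.Gosched()` stands in for the runtime-guided spinning mentioned above, and the protected fields are accessed atomically so that the sketch stays race-free. Writers are assumed to be serialized externally.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// seqCount is a simplified sequence counter. An odd epoch means that a
// writer is inside its critical section.
type seqCount struct{ epoch uint32 }

// beginRead waits until no writer is active and returns the epoch seen.
func (s *seqCount) beginRead() uint32 {
	for {
		e := atomic.LoadUint32(&s.epoch)
		if e&1 == 0 {
			return e
		}
		runtime.Gosched() // stand-in for smarter spinning
	}
}

// readOK reports whether no writer ran since beginRead returned e.
func (s *seqCount) readOK(e uint32) bool {
	return atomic.LoadUint32(&s.epoch) == e
}

// beginWrite marks the start of a writer critical section (epoch odd).
func (s *seqCount) beginWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 == 0 {
		panic("beginWrite called inside a writer critical section")
	}
}

// endWrite marks the end of a writer critical section (epoch even).
func (s *seqCount) endWrite() {
	if atomic.AddUint32(&s.epoch, 1)&1 != 0 {
		panic("endWrite called outside a writer critical section")
	}
}

// point holds two values that readers must observe as a consistent pair.
// The uint64 fields come first to keep them 64-bit aligned.
type point struct {
	x, y uint64
	seq  seqCount
}

func (p *point) set(x, y uint64) {
	p.seq.beginWrite()
	atomic.StoreUint64(&p.x, x)
	atomic.StoreUint64(&p.y, y)
	p.seq.endWrite()
}

// get retries until it reads x and y with no intervening write.
func (p *point) get() (uint64, uint64) {
	for {
		e := p.seq.beginRead()
		x := atomic.LoadUint64(&p.x)
		y := atomic.LoadUint64(&p.y)
		if p.seq.readOK(e) {
			return x, y
		}
	}
}

func main() {
	p := &point{}
	p.set(3, 4)
	x, y := p.get()
	fmt.Println(x, y)
}
```

Readers never block writers in this scheme; they only pay for a retry when a write overlaps the read.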
2020-09-09Add Docs to nginx benchmark.Zach Koopmans
Adds docs to nginx and refactors both Httpd and Nginx benchmarks. Key changes:
- Add docs and make nginx tests the same as httpd (reverse, all docs, etc.).
- Make requests scale on c * b.N -> a request per thread. This works well with both --test.benchtime=10m (do a run that lasts at least 10m) and --test.benchtime=10x (do b.N = 10).
- Remove a doc from both tests (1000Kb) as 1024Kb exists.
PiperOrigin-RevId: 329751091
2020-09-09[runtime tests] Exclude flaky nodejs testAyush Ranjan
PiperOrigin-RevId: 329749191
2020-09-09Dup stdio FDs for VFS2 when starting a child containerTiwei Bie
Currently the stdio FDs are not dupped and will be closed unexpectedly in VFS2 when starting a child container. This patch fixes this issue. Fixes: #3821 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-09Fix statfs test for opensource.Zach Koopmans
PiperOrigin-RevId: 329638946
2020-09-09Implement setattr+clunk in 9PFabricio Voznika
This is to cover the common pattern: open->read/write->close, where SetAttr needs to be called to update atime/mtime before the file is closed.

Benchmark results:
  BM_OpenReadClose/10240 CPU
  setattr+clunk: 63783 ns
  VFS2:          68109 ns
  VFS1:          72507 ns

Updates #1198 PiperOrigin-RevId: 329628461
2020-09-09Fix handling of unacceptable ACKs during close.Mithun Iyer
On receiving an ACK with an unacceptable ACK number in a closing state, TCP needs to reply with an ACK carrying the correct seq and ack numbers and remain in the same state. This change follows RFC 793, page 37, except that it does not apply to the ESTABLISHED state, just as in Linux. Also add more tests to check for OTW sequence numbers and unacceptable ack numbers in these states. Fixes #3785 PiperOrigin-RevId: 329616283
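The acceptability check described here can be sketched with wraparound-safe sequence arithmetic. This is a simplified reading of the RFC 793 rules, not the gVisor TCP code: the `seq` and `ackAction` types, the constant names, and `classifyACK` are hypothetical, and the sketch ignores window updates and state-specific handling.

```go
package main

import "fmt"

// seq is a TCP sequence number; comparisons must tolerate wraparound.
type seq uint32

// lessThan reports whether a precedes b in modulo-2^32 sequence space.
func (a seq) lessThan(b seq) bool { return int32(a-b) < 0 }

// ackAction is a hypothetical classification of an incoming ACK number.
type ackAction int

const (
	// processACK: SND.UNA < SEG.ACK <= SND.NXT, new data is acknowledged.
	processACK ackAction = iota
	// ignoreDuplicate: SEG.ACK <= SND.UNA, nothing new is acknowledged.
	ignoreDuplicate
	// replyWithACK: SEG.ACK > SND.NXT, the ACK covers data never sent.
	// The receiver answers with an ACK carrying its current seq and ack
	// numbers, drops the segment, and stays in the same state.
	replyWithACK
)

// classifyACK decides how to treat segAck given the send window edges.
func classifyACK(sndUna, sndNxt, segAck seq) ackAction {
	switch {
	case sndNxt.lessThan(segAck):
		return replyWithACK
	case !sndUna.lessThan(segAck):
		return ignoreDuplicate
	default:
		return processACK
	}
}

func main() {
	// The send window straddles the 2^32 wraparound point.
	una, nxt := seq(0xfffffff0), seq(0x10)
	fmt.Println(classifyACK(una, nxt, 0x05) == processACK)
	fmt.Println(classifyACK(una, nxt, 0x20) == replyWithACK)
	fmt.Println(classifyACK(una, nxt, 0xffffffe0) == ignoreDuplicate)
}
```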
2020-09-09Test opening file handles with different permissions.Dean Deng
These were problematic for vfs2 gofers before correctly implementing separate read/write handles. PiperOrigin-RevId: 329613261
2020-09-09Refactor tty codebase to use master-replica terminology.Ayush Ranjan
Updates #2972 PiperOrigin-RevId: 329584905
2020-09-09Fix panic when calling dup2().Nayana Bidari
PiperOrigin-RevId: 329572337
2020-09-09[go-marshal] Enable auto-marshalling for fs/tty.Ayush Ranjan
PiperOrigin-RevId: 329564614
2020-09-09Let flags be overridden from OCI annotationsFabricio Voznika
This allows runsc flags to be set per sandbox instance. For example, K8s pod annotations can be used to enable --debug for a single pod, making troubleshooting much easier. Similarly, features like --vfs2 can be enabled for experimentation without affecting other pods in the node. Closes #3494 PiperOrigin-RevId: 329542815
2020-09-09Automated rollback of changelist 328350576Nayana Bidari
PiperOrigin-RevId: 329526153
2020-09-09Use 1080p background image.Ian Lewis
This makes the background image on the top page 1/3 as big and allows it to load in roughly half the time. PiperOrigin-RevId: 329462030
2020-09-09Fix bug in bazel build benchmark.Zach Koopmans
PiperOrigin-RevId: 329409802
2020-09-09Change nogo failures to test failures, instead of build failures.Adin Scannell
PiperOrigin-RevId: 329408633
2020-09-09Set errno on response when syscall actually failsJay Zhuang
This prevents setting stale errno on responses. Also fixes TestDiscardsUDPPacketsWithMcastSourceAddressV6 to use correct multicast addresses in test. Fixes #3793 PiperOrigin-RevId: 329391155
2020-09-09Don't use read-only host FD for writable gofer dentries in VFS2.Jamie Liu
As documented for gofer.dentry.hostFD. PiperOrigin-RevId: 329372319
2020-09-09Remove __fuchsia__ definesTamir Duberstein
These mostly guard linux-only headers; check for linux instead. PiperOrigin-RevId: 329362762
2020-09-09Implement walk in gvisor verity fsgVisor bot
Implement directory walking in the gVisor verity file system. At each step, the child dentry is verified against a verified parent root hash. PiperOrigin-RevId: 329358747
2020-09-09stateify: Bring back struct field and type names in pretty printTing-Yu Wang
PiperOrigin-RevId: 329349158
2020-09-09Run syscall tests in uts namespaces.Rahat Mahmood
Some syscall tests, namely uname_test_*, modify the host and domain name, which changes the execution environment and can have unintended consequences for other tests. For example, modifying the hostname causes some networking tests to fail DNS lookups. Run all syscall tests in their own UTS namespaces to isolate these changes. PiperOrigin-RevId: 329348127
2020-09-09Add code search badgeIan Lewis
PiperOrigin-RevId: 329042549
2020-09-09Fix kernfs.Dentry reference leak.Nicolas Lacasse
PiperOrigin-RevId: 329036994
2020-09-09Include command output on errorTamir Duberstein
Currently the logs produce TestOne: packetimpact_test.go:182: listing devices on ... container: process terminated with status: 126 which is not actionable; presumably the `ip` command output is interesting. PiperOrigin-RevId: 329032105
2020-09-09Don't bind loopback to all IPs in an IPv6 subnetGhanan Gowripalan
An earlier change considered the loopback bound to all addresses in an assigned subnet. This should only have been done for IPv4 to maintain compatibility with Linux:

```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms

$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
^C
--- 2001:db8::2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms

$ sudo ip addr add 2001:db8::1/64 dev lo
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
64 bytes from 2001:db8::1: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 2001:db8::1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 2001:db8::1: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 2001:db8::1: icmp_seq=4 ttl=64 time=0.071 ms
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.055/0.068/0.074/0.007 ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
From 2001:db8::1 icmp_seq=1 Destination unreachable: No route
From 2001:db8::1 icmp_seq=2 Destination unreachable: No route
From 2001:db8::1 icmp_seq=3 Destination unreachable: No route
From 2001:db8::1 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:db8::2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3070ms
```

Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 329011566
2020-09-09Implement StatFS for various VFS2 filesystems.Rahat Mahmood
This mainly involved enabling kernfs' client filesystems to provide a StatFS implementation. Fixes #3411, #3515. PiperOrigin-RevId: 329009864
2020-09-09Improve type safety for network protocol optionsGhanan Gowripalan
The existing implementations of NetworkProtocol.{Set}Option take arguments of an empty interface type, which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces for network protocol options that may be set or queried. Network protocol option types implement these interfaces, so invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. PiperOrigin-RevId: 328980359
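The marker-interface technique can be sketched as below. Every name here (`settableOption`, `gettableOption`, `defaultTTL`, `packetsSeen`, `protocol`) is hypothetical and chosen for illustration; none is taken from the tcpip package.

```go
package main

import "fmt"

// settableOption marks option types that may be passed to setOption.
type settableOption interface{ isSettableOption() }

// gettableOption marks option types that may be passed to option.
type gettableOption interface{ isGettableOption() }

// defaultTTL is a hypothetical option that can be both set and queried.
type defaultTTL uint8

func (*defaultTTL) isSettableOption() {}
func (*defaultTTL) isGettableOption() {}

// packetsSeen is a hypothetical read-only option: it implements only the
// gettable marker, so passing it to setOption does not compile.
type packetsSeen uint64

func (*packetsSeen) isGettableOption() {}

// protocol is a hypothetical protocol holding the option values.
type protocol struct {
	ttl  defaultTTL
	seen packetsSeen
}

// setOption accepts only types that carry the settable marker.
func (p *protocol) setOption(opt settableOption) error {
	switch v := opt.(type) {
	case *defaultTTL:
		p.ttl = *v
		return nil
	default:
		return fmt.Errorf("unknown settable option %T", opt)
	}
}

// option accepts only types that carry the gettable marker.
func (p *protocol) option(opt gettableOption) error {
	switch v := opt.(type) {
	case *defaultTTL:
		*v = p.ttl
		return nil
	case *packetsSeen:
		*v = p.seen
		return nil
	default:
		return fmt.Errorf("unknown gettable option %T", opt)
	}
}

func main() {
	p := &protocol{seen: 42}
	ttl := defaultTTL(64)
	fmt.Println(p.setOption(&ttl))

	var seen packetsSeen
	fmt.Println(p.option(&seen), seen)

	// p.setOption(&seen) // compile error: *packetsSeen is not settable
	// p.setOption("ttl") // compile error: string is not an option type
}
```

With `interface{}` parameters both commented-out calls would compile and fail only at run time; the marker methods move that failure to compile time.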
2020-09-09Fix EOF handling for splice.Dean Deng
Also, add corresponding EOF tests for splice/sendfile. Discovered by syzkaller. PiperOrigin-RevId: 328975990
2020-09-09fix panic when calling SO_ORIGINAL_DST without initializing iptablesKevin Krakauer
Reported-by: syzbot+074ec22c42305725b79f@syzkaller.appspotmail.com PiperOrigin-RevId: 328963899
2020-09-09Add test demonstrating accept bugTamir Duberstein
Updates #3780. PiperOrigin-RevId: 328922573
2020-09-09Use a single NetworkEndpoint per addressGhanan Gowripalan
This change was already done as of https://github.com/google/gvisor/commit/1736b2208f but https://github.com/google/gvisor/commit/a174aa7597 conflicted with that change and it was missed in reviews. This change fixes the conflict. PiperOrigin-RevId: 328920372
2020-09-09[go-marshal] Enable auto-marshalling for tundev.Ayush Ranjan
PiperOrigin-RevId: 328863725
2020-09-09Fix vfs2 pipe behavior when splicing to a non-pipe.Dean Deng
Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero would not actually empty the pipe. There was no guarantee that the data would actually be consumed on a splice operation unless the output file's implementation of Write/PWrite actually called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed regardless of whether CopyIn is called or not. Furthermore, the number of bytes in the IOSequence for reads is now capped at the amount of data actually available. Before, splicing to /dev/zero would always return the requested splice size without taking the actual available data into account. This change also refactors the case where an input file is spliced into an output pipe so that it follows a similar pattern, which is arguably cleaner anyway. Updates #3576. PiperOrigin-RevId: 328843954
2020-09-09unix: return ECONNREFUSED if a socket file exists but a socket isn't bound to itAndrei Vagin
PiperOrigin-RevId: 328843560
2020-09-09[go-marshal] Support for usermem.IOOpts.Ayush Ranjan
PiperOrigin-RevId: 328839759
2020-09-09Improve type safety for socket optionsGhanan Gowripalan
The existing implementations of {G,S}etSockOpt take arguments of an empty interface type, which all types (implicitly) implement; any type may be passed to the functions. This change introduces marker interfaces for socket options that may be set or queried. Socket option types implement these interfaces, so invalid types are caught at compile time. Different interfaces are used to allow the compiler to enforce read-only or set-only socket options. Fixes #3714. RELNOTES: n/a PiperOrigin-RevId: 328832161
2020-09-09beef up write syscall testsJinmou Li
Added a few tests for write(2) and pwrite(2).

1. Regular files

For write(2):
- writing zero bytes should not move the offset
- writing non-zero bytes should advance the offset by the exact amount written
- writing non-zero bytes after an lseek() should advance the offset by the exact amount written, starting from the seek position
- writing non-zero bytes with O_APPEND should leave the offset at the original EOF plus the exact amount written

For pwrite(2), the offset is not affected when:
- pwrite writes zero bytes
- pwrite writes non-zero bytes

For EOF, added a test asserting that the EOF (indicated by lseek(SEEK_END)) is updated properly after writing non-zero bytes.

2. Symlink

Added one pwrite64() call for symlink, written as a counterpart of the existing test using pread64().
2020-09-09Fix BadSocketPair for open source.Zach Koopmans
The BadSocketPair test can return several errnos (EPERM, ESOCKTNOSUPPORT, EAFNOSUPPORT), meaning the test is just too specific. Checking that the syscall fails is appropriate. PiperOrigin-RevId: 328813071
2020-09-09Skip IPv6UDPUnboundSocketNetlinkTest on native linuxGhanan Gowripalan
...while we figure out if we want to consider the loopback interface bound to all IPs in an assigned IPv6 subnet, or not (to maintain compatibility with Linux). PiperOrigin-RevId: 328807974