AgeCommit message (Collapse)Author
2021-03-29[perf] Reduce contention in ptrace.threadPool.lookupOrCreate().Ayush Ranjan
lookupOrCreate is called from subprocess.switchToApp() and subprocess.syscall(). lookupOrCreate() looks for a thread already created for the current TID. If a thread exists (common case), it returns immediately. Otherwise it creates a new one. This change switches to using a sync.RWMutex. The initial thread existence lookup is now done only with the read lock. So multiple successful lookups can occur concurrently. Only when a new thread is created will it acquire the lock for writing and update the map (which is not the common case). Discovered in mutex profiles from the various ptrace benchmarks. Example: https://gvisor.dev/profile/gvisor-buildkite/fd14bfad-b30f-44dc-859b-80ebac50beb4/843827db-da50-4dc9-a2ea-ecf734dde2d5/tmp/profile/ptrace/BenchmarkFio/operation.write/blockSize.4K/filesystem.tmpfs/benchmarks/fio/mutex.pprof/flamegraph PiperOrigin-RevId: 365612094
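The locking pattern described above can be sketched as follows. This is a minimal illustration, not the actual ptrace code: the `thread` and `threadPool` types and their fields are hypothetical stand-ins, and the re-check under the write lock is a general precaution for the case where two callers could race on the same key.

```go
package main

import (
	"fmt"
	"sync"
)

// thread is a hypothetical stand-in for the per-TID state kept by the pool.
type thread struct {
	tid int32
}

// threadPool maps TIDs to threads. Names are illustrative only.
type threadPool struct {
	mu      sync.RWMutex
	threads map[int32]*thread
}

func newThreadPool() *threadPool {
	return &threadPool{threads: make(map[int32]*thread)}
}

// lookupOrCreate returns the thread for tid, creating it on first use.
func (p *threadPool) lookupOrCreate(tid int32) *thread {
	// Fast path (common case): only the read lock is taken, so many
	// successful lookups can proceed concurrently.
	p.mu.RLock()
	t, ok := p.threads[tid]
	p.mu.RUnlock()
	if ok {
		return t
	}

	// Slow path: take the write lock to update the map.
	p.mu.Lock()
	defer p.mu.Unlock()
	// Re-check, since another goroutine may have inserted the entry
	// between RUnlock and Lock.
	if t, ok := p.threads[tid]; ok {
		return t
	}
	t = &thread{tid: tid}
	p.threads[tid] = t
	return t
}

func main() {
	p := newThreadPool()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.lookupOrCreate(42)
		}()
	}
	wg.Wait()
	fmt.Println("threads in pool:", len(p.threads))
}
```

The benefit depends on lookups greatly outnumbering creations; if most calls took the slow path, the extra read-lock round trip would only add overhead.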
2021-03-25Use seqfile.SeqHandles correctly in VFS1 /proc/net/.Jamie Liu
Before this change:
```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = EOF
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```
After this change:
```
$ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
$ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024
#1: read(128) = 128
#2: read(1024) = 256
```
Fixes #5732 PiperOrigin-RevId: 365178386
2021-03-25Lock TaskSet mutex for writing in ptraceClone().Jamie Liu
This is necessary since ptraceClone() mutates tracer.ptraceTracees. PiperOrigin-RevId: 365152396
2021-03-25setgid: skip tests when we can't find usable GIDsKevin Krakauer
PiperOrigin-RevId: 365092320
2021-03-24Fix path to runsc in CNI tutorial.Ian Lewis
PiperOrigin-RevId: 364931406
2021-03-24Fix highlighting sidebar menu on the websiteIan Lewis
Highlighting previously highlighted multiple items in the sidebar if they had the same page name (not full URL). This change simplifies this by adding the highlight class in the Jekyll template rather than in JavaScript, and highlights only the correct page. PiperOrigin-RevId: 364931350
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc., it felt cleaner to just clean up the call sites to notify for both events. This is what Linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
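A minimal sketch of the idea, assuming the common Linux values for the poll bits: a call site that signals readability or writability notifies with both the classic and the "normal" bit set, so a waiter that asked for either one is woken. The constant and function names below are hypothetical and are not gVisor's waiter API.

```go
package main

import "fmt"

// Poll event bits as defined on most Linux architectures.
const (
	pollIn     = 0x0001
	pollOut    = 0x0004
	pollRdNorm = 0x0040
	pollWrNorm = 0x0100

	// Masks a notifying call site would use so that both spellings of
	// "readable" and "writable" are covered (names are illustrative).
	readableEvents = pollIn | pollRdNorm
	writableEvents = pollOut | pollWrNorm
)

// ready returns the subset of requested events present in notified.
func ready(requested, notified uint16) uint16 {
	return requested & notified
}

func main() {
	// A waiter that only asked for POLLRDNORM is still satisfied by a
	// readable notification.
	fmt.Printf("%#x\n", ready(pollRdNorm, readableEvents))
	// A waiter that asked for POLLWRNORM is not woken by readability.
	fmt.Printf("%#x\n", ready(pollWrNorm, readableEvents))
}
```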
2021-03-24Fix data race in fdbased when accessing fanoutID.Bhasker Hariharan
PiperOrigin-RevId: 364859173
2021-03-24Unexpose immutable fields in stack.RouteNick Brown
This change sets the inner `routeInfo` struct to be a named private member and replaces direct access with access through getters. Note that direct access to the fields of `routeInfo` is still possible through the `RouteInfo` struct. Fixes #4902 PiperOrigin-RevId: 364822872
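The shape of this change can be sketched as below. It is a simplified illustration only: the field names and the use of plain strings for addresses are assumptions, and the real `stack.Route` carries more state.

```go
package main

import "fmt"

// routeInfo holds fields that must not change after the route is created.
// Field names here are illustrative.
type routeInfo struct {
	localAddress  string
	remoteAddress string
}

// Route keeps routeInfo as a named, unexported member instead of embedding
// it, so callers outside the package cannot reach the fields directly.
type Route struct {
	routeInfo routeInfo
}

// NewRoute is a hypothetical constructor for this sketch.
func NewRoute(local, remote string) *Route {
	return &Route{routeInfo: routeInfo{localAddress: local, remoteAddress: remote}}
}

// LocalAddress returns the route's local address.
func (r *Route) LocalAddress() string { return r.routeInfo.localAddress }

// RemoteAddress returns the route's remote address.
func (r *Route) RemoteAddress() string { return r.routeInfo.remoteAddress }

func main() {
	r := NewRoute("10.0.0.1", "10.0.0.2")
	fmt.Println(r.LocalAddress(), r.RemoteAddress())
}
```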
2021-03-23Merge pull request #5677 from avagin:kvm-mmiogVisor bot
PiperOrigin-RevId: 364728696
2021-03-23Move the code that manages floating-point state to a separate packageAndrei Vagin
This change is inspired by Adin's cl/355256448. PiperOrigin-RevId: 364695931
2021-03-23Add --file-access-mounts flagFabricio Voznika
--file-access-mounts flag is similar to --file-access, but controls non-root mounts that were previously mounted in shared mode only. This gives more flexibility to control how mounts are shared within a container. PiperOrigin-RevId: 364669882
2021-03-23setgid directory support in goferfsKevin Krakauer
Also adds support for clearing the setuid bit when appropriate (writing, truncating, changing size, changing UID, or changing GID). VFS2 only. PiperOrigin-RevId: 364661835
2021-03-23Skip checklocks analysis for stateify generated code.Rahat Mahmood
Stateify methods are always called without holding the appropriate locks. The system is paused and we know there will be no mutations when we call Save/Load, so this is perfectly safe. However, checklocks can't know about this, and it will always complain. Mark stateify generated methods that touch struct fields as "checklocksignore" to avoid this. PiperOrigin-RevId: 364610241
2021-03-23Allow FSETXATTR/FGETXATTR host calls for VerityChong Cai
These host calls are needed for Verity fs to generate/verify hashes. PiperOrigin-RevId: 364598180
2021-03-23Use constant (TestInitialSequenceNumber) instead of integer (789) in tests.Nayana Bidari
PiperOrigin-RevId: 364596526
2021-03-23Split fio read/write and randread/randwrite operationsZach Koopmans
The fio benchmark was changed to a fixed-size read/write amount because the timed benchmark was overwhelming machine memory on tmpfs mounts. Now rand(read|write) operations are prohibitively long, leading to timeouts. Split the benchmarks as they were in python bm-tools: the read/write as fixed-size (1GB) and the rand(read|write) as timed operations (15s). PiperOrigin-RevId: 364584436
2021-03-23Explicitly allow martian loopback packetsGhanan Gowripalan
...instead of opting out of them. Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. A previous change provided the facility to deny these martian loopback packets, but this change requires clients to opt in to accepting them, since martian loopback packets are not meant to be accepted, as per RFC 1122 section 3.2.1.3.g: (g) { 127, <any> } Internal host loopback address. Addresses of this form MUST NOT appear outside a host. PiperOrigin-RevId: 364581174
2021-03-22Update apt repository to limit to supported architectures.Adin Scannell
Fixes #5703 PiperOrigin-RevId: 364492235
2021-03-22[lisa] Support dynamic types for all types.Ayush Ranjan
We were only supporting dynamic struct types. With this change, users can make any type dynamic. The tool (correctly) just blindly generates the remaining methods needed to implement Marshallable using the 3 methods defined by the user on the dynamic type. This is helpful in situations like: `type StringArray []string`. Added a test for such a use case. PiperOrigin-RevId: 364463164
2021-03-22Fix logs for packetimpact tests cleanupZeling Feng
- Don't clean up containers in Network.Cleanup, otherwise containers will be killed and removed several times.
- Don't set AutoRemove for containers. This will prevent the confusing 'removal already in progress' messages.
Fixes #3795 PiperOrigin-RevId: 364404414
2021-03-22Return tcpip.Error from (*Stack).GetMainNICAddressGhanan Gowripalan
PiperOrigin-RevId: 364381970
2021-03-22Emit comment about build tags in gomarshal generated files.Rahat Mahmood
This may be useful for tracking down where build tags come from and understanding tag import issues in generated files. PiperOrigin-RevId: 364374931
2021-03-22Avoid calling sync on each write in writethrough mode.Nicolas Lacasse
PiperOrigin-RevId: 364370595
2021-03-22Fix and merge tcp_{outside_the_window,tcp_unacc_seq_ack}_closingZeling Feng
The tests were not using the correct windowSize so the testing segments were actually within the window for seqNumOffset=0 tests. The issue is already fixed by #5674. PiperOrigin-RevId: 364252630
2021-03-18Translate syserror when validating partial IO errorsFabricio Voznika
syserror allows packages to register translators for errors. These translators should be called prior to checking if the error is valid, otherwise it may not account for possible errors that can be returned from different packages, e.g. safecopy.BusError => syserror.EFAULT. Second attempt, it passes tests now :-) PiperOrigin-RevId: 363714508
2021-03-18Address post submit comments for fs benchmarks.Zach Koopmans
Also, drop fio total reads/writes to 1GB as 10GB is prohibitively slow. PiperOrigin-RevId: 363714060
2021-03-18Skip /dev submount hack on VFS2.Jamie Liu
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:
```
"mounts": [
  ...
  {
    "destination": "/dev",
    "type": "tmpfs",
    "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
    "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ]
  },
  ...
  {
    "destination": "/dev/shm",
    "type": "tmpfs",
    "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
    "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=67108864" ]
  },
  ...
```
(This is mostly consistent with how Linux is usually configured, except that /dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer implements OCI-runtime-spec-undocumented behavior to create /dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently switches /dev to devtmpfs. In VFS1, this is necessary to get device files like /dev/null at all, since VFS1 doesn't support real device special files, only what is hardcoded in devfs. VFS2 does support device special files, but using devtmpfs is the easiest way to get pre-created files in /dev.)
runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1, this appears to be to avoid introducing a submount overlay for /dev, and is mostly fine since the typical mode for the /dev/shm mount is ~consistent with the mode of the /dev/shm directory provided by devfs (modulo the sticky bit). In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs' /dev/shm mode is correct for the mount point but not the mount. So turn off this behavior for VFS2.
After this change:
```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..
$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```
Fixes #5687 PiperOrigin-RevId: 363699385
2021-03-17Do not use martian loopback packets in testsGhanan Gowripalan
Transport demuxer and UDP tests should not use a loopback address as the source address for packets injected into the stack as martian loopback packets will be dropped in a later change. PiperOrigin-RevId: 363479681
2021-03-17Drop loopback traffic from outside of the stackGhanan Gowripalan
Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. Test: integration_test.TestExternalLoopbackTraffic PiperOrigin-RevId: 363461242
2021-03-16kvm: prefault a floating point state before restoring itAndrei Vagin
If physical pages of a memory region are not mapped yet, the kernel will trigger KVM_EXIT_MMIO and we will map physical pages in bluepillHandler(). An instruction that triggered a fault will not be re-executed; it will be emulated in the kernel, but the kernel can't emulate complex instructions like xsave and xrstor. We can touch the memory with simple instructions to work around this problem.
2021-03-16Fix tcp_fin_retransmission_netstack_testZeling Feng
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT unconditionally. Fixing the state machine will give us back the retransmission of FIN. PiperOrigin-RevId: 363301883
2021-03-16Fix a race with synRcvdCount and acceptMithun Iyer
There is a race in handling new incoming connections on a listening endpoint that causes the endpoint to reply to more incoming SYNs than what is permitted by the listen backlog. The race occurs when there is a successful passive connection handshake and the synRcvdCount counter is decremented, followed by the endpoint being delivered to the accept queue. In the window of time between synRcvdCount decrementing and the endpoint being enqueued for accept, new incoming SYNs can be handled without honoring the listen backlog value, as the backlog could be perceived as not full. Fixes #5637 PiperOrigin-RevId: 363279372
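One way to close a window of this kind is to make the counter decrement and the enqueue a single atomic step with respect to the backlog check. The sketch below is a deliberately simplified model under that assumption; the `listener` type and its methods are hypothetical, and this is not necessarily how the netstack fix is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a toy model of a listening endpoint.
type listener struct {
	mu      sync.Mutex
	backlog int
	synRcvd int   // handshakes in progress
	acceptQ []int // completed connections awaiting accept (IDs only)
}

// admitSYN reports whether a new SYN may be answered. The check counts
// in-progress handshakes and queued connections under one lock.
func (l *listener) admitSYN() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.synRcvd+len(l.acceptQ) >= l.backlog {
		return false
	}
	l.synRcvd++
	return true
}

// handshakeDone moves a connection from "in progress" to the accept queue.
// Both updates happen while holding mu, so admitSYN can never observe the
// intermediate state in which the connection is counted in neither place.
func (l *listener) handshakeDone(id int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.synRcvd--
	l.acceptQ = append(l.acceptQ, id)
}

// accept removes and returns the oldest queued connection, if any.
func (l *listener) accept() (int, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if len(l.acceptQ) == 0 {
		return 0, false
	}
	id := l.acceptQ[0]
	l.acceptQ = l.acceptQ[1:]
	return id, true
}

func main() {
	l := &listener{backlog: 1}
	fmt.Println(l.admitSYN()) // first SYN fits in the backlog
	l.handshakeDone(1)
	fmt.Println(l.admitSYN()) // backlog still full until accept is called
}
```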
2021-03-16setgid directory support in overlayfsKevin Krakauer
PiperOrigin-RevId: 363276495
2021-03-16Unexport methods on NDPOptionGhanan Gowripalan
They are not used outside of the header package. PiperOrigin-RevId: 363237708
2021-03-16Detect looped-back NDP DAD messagesGhanan Gowripalan
...as per RFC 7527. If a looped-back DAD message is received, do not fail DAD since our own DAD message does not indicate that a neighbor has the address assigned. Test: ndp_test.TestDADResolveLoopback PiperOrigin-RevId: 363224288
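RFC 7527 detects looped-back probes by attaching a random nonce to each DAD message and comparing it against nonces in received messages. A minimal sketch of that comparison follows; the `dadState` type, its methods, and the 6-byte nonce length chosen here are assumptions for illustration rather than netstack's implementation.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
)

// dadState remembers the nonce sent with our own DAD probe.
type dadState struct {
	nonce []byte
}

// newDADState generates a fresh random nonce for an outgoing probe.
func newDADState() (*dadState, error) {
	n := make([]byte, 6)
	if _, err := rand.Read(n); err != nil {
		return nil, err
	}
	return &dadState{nonce: n}, nil
}

// isLoopedBack reports whether a received DAD message carries the nonce we
// sent, meaning it is our own probe reflected back to us.
func (s *dadState) isLoopedBack(rxNonce []byte) bool {
	return len(rxNonce) != 0 && bytes.Equal(rxNonce, s.nonce)
}

// indicatesConflict reports whether a received DAD message should fail DAD.
// A message without a nonce, or with a different one, is treated as coming
// from a neighbor.
func (s *dadState) indicatesConflict(rxNonce []byte) bool {
	return !s.isLoopedBack(rxNonce)
}

func main() {
	s, err := newDADState()
	if err != nil {
		panic(err)
	}
	fmt.Println("own probe fails DAD:", s.indicatesConflict(s.nonce))
	fmt.Println("nonce-less probe fails DAD:", s.indicatesConflict(nil))
}
```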
2021-03-16Do not call into Stack from LinkAddressRequestGhanan Gowripalan
Calling into the stack from LinkAddressRequest is not needed as we already have a reference to the network endpoint (IPv6) or network interface (IPv4/ARP). PiperOrigin-RevId: 363213973
2021-03-15Turn sys_thread constants into variables.Etienne Perot
PiperOrigin-RevId: 363092268
2021-03-15Move `MaxIovs` back to a variable in `iovec.go`.Etienne Perot
PiperOrigin-RevId: 363091954
2021-03-15Deflake proc_test_nativeFabricio Voznika
Terminating tasks from other tests can interfere with the task list of the current test. Tests were changed to look for added/removed tasks, ignoring other tasks that may exist while the test is running. PiperOrigin-RevId: 363084261
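The approach can be sketched as a set difference between two snapshots of the task list. The `diff` helper and the use of integer task IDs are assumptions for this illustration and are not taken from the test itself.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet converts a slice of task IDs into a set.
func toSet(ids []int) map[int]bool {
	s := make(map[int]bool, len(ids))
	for _, id := range ids {
		s[id] = true
	}
	return s
}

// diff returns the task IDs that appear only in after (added) and only in
// before (removed), each sorted in ascending order.
func diff(before, after []int) (added, removed []int) {
	inBefore, inAfter := toSet(before), toSet(after)
	for id := range inAfter {
		if !inBefore[id] {
			added = append(added, id)
		}
	}
	for id := range inBefore {
		if !inAfter[id] {
			removed = append(removed, id)
		}
	}
	sort.Ints(added)
	sort.Ints(removed)
	return added, removed
}

func main() {
	before := []int{1, 2, 3}
	after := []int{2, 3, 4, 5}
	added, removed := diff(before, after)
	fmt.Println(added, removed)
}
```

A test built this way would assert that the tasks it created are present in `added` rather than comparing the whole list, so unrelated tasks that come and go do not cause failures.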
2021-03-15Make netstack (//pkg/tcpip) buildable for 32 bitKevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyway. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-15[op] Make gofer client handle return partial write length when err is nil.Ayush Ranjan
If there was a partial write (when not using the host FD) which did not generate an error, we were incorrectly returning the number of bytes attempted to write instead of the number of bytes actually written. PiperOrigin-RevId: 363058989
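The bug class can be illustrated with a toy backend that accepts only part of a buffer without reporting an error. The `limitedFile` type and `write` function are hypothetical and are not the gofer client's code.

```go
package main

import "fmt"

// limitedFile is a hypothetical backend that accepts at most max bytes per
// call and reports no error when it writes less than requested.
type limitedFile struct {
	max  int
	data []byte
}

// writeAt appends as much of p as the backend allows and returns the
// number of bytes actually stored.
func (f *limitedFile) writeAt(p []byte) (int, error) {
	n := len(p)
	if n > f.max {
		n = f.max
	}
	f.data = append(f.data, p[:n]...)
	return n, nil
}

// write forwards to the backend and reports the number of bytes actually
// written. Returning len(p) here whenever err is nil would be the mistake
// described in the commit message.
func write(f *limitedFile, p []byte) (int, error) {
	n, err := f.writeAt(p)
	return n, err
}

func main() {
	f := &limitedFile{max: 4}
	n, err := write(f, []byte("hello world"))
	fmt.Println(n, err, string(f.data))
}
```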
2021-03-15Merge pull request #5618 from iangudger:unix-transport-racegVisor bot
PiperOrigin-RevId: 362999220
2021-03-15Packetimpact test for ACK to OTW Seq segments behavior in CLOSINGZeling Feng
TCP, in CLOSING state, MUST send an ACK with next expected SEQ number after receiving any segment with OTW SEQ number and remain in the same state. While I am here, I also changed shutdown to behave the same as other calls in posix_server. PiperOrigin-RevId: 362976955
2021-03-14Fix race in tcp_retransmits_testMithun Iyer
The test queries for RTO via TCP_INFO and applies that to the rest of the test. The RTO is estimated by processing incoming ACK. There is a race in the test where we may query for RTO before the incoming ACK was processed. Fix the race in the test by letting the DUT complete a payload receive, thus estimating RTO before proceeding to query the RTO. Bump up the time correction to reduce flakes. PiperOrigin-RevId: 362865904
2021-03-13[perf] Run benchmarks with VFS2.Ayush Ranjan
The run-benchmark target would run the benchmark with VFS1. PiperOrigin-RevId: 362754188
2021-03-12Add escapes to newlines in syzkaller instructions.Nicolas Lacasse
So they can be copy-pasted. PiperOrigin-RevId: 362605833
2021-03-12Merge pull request #5663 from avagin:apt-repogVisor bot
PiperOrigin-RevId: 362545342
2021-03-11Support ICMP echo sockets on Linux DUTZeling Feng
By default net.ipv4.ping_group_range is set to "1 0" and no one (not even root) can create an ICMP socket. Setting it to "0 0" allows root, which is what we run as inside the container, to create ICMP sockets for packetimpact tests. PiperOrigin-RevId: 362454201
2021-03-11make/release: Sign a package only if it isn't signed yet.Andrei Vagin
We can generate more than one apt repo for the same package. If we sign a package again, its file will change and all hashes that were generated before will become invalid.