summaryrefslogtreecommitdiffhomepage
path: root/pkg
AgeCommit message (Collapse)Author
2020-07-30Revert change to default buffer size.Bhasker Hariharan
In https://github.com/google/gvisor/commit/ca6bded95dbce07f9683904b4b768dfc2d4a09b2 we reduced the default buffer size to 32KB. This mostly works fine except at high throughput where we hit zero window very quickly and the TCP receive buffer moderation is not able to grow the window. This can be seen in the benchmarks where with a 32KB buffer and 100 connections downloading a 10MB file we get about 30 requests/s vs the 1MB buffer gives us about 53 requests/s. A proper fix requires a few changes to when we send a zero window as well as when we decide to send a zero window update. Today we consider available space below 1MSS as zero and send an update when it crosses 1MSS of available space. This is way too low and results in the window staying very small once we hit a zero window condition as we keep sending updates with size barely over 1MSS. Linux and BSD are smarter about this and use different thresholds. We should separately update our logic to match linux or BSD so that we don't send window updates that are really tiny or wait until we drop below 1MSS to advertise a zero window. PiperOrigin-RevId: 324087019
2020-07-30Enforce fragment block size and validate argsGhanan Gowripalan
Allow configuring fragmentation.Fragmentation with a fragment block size which will be enforced when processing fragments. Also validate arguments when processing fragments. Test: - fragmentation.TestErrors - ipv6_test.TestReceiveIPv6Fragments - ipv4_test.TestReceiveIPv6Fragments PiperOrigin-RevId: 324081521
2020-07-30Implement neighbor unreachability detection for ARP and NDP.Sam Balana
This change implements the Neighbor Unreachability Detection (NUD) state machine, as per RFC 4861 [1]. The state machine operates on a single neighbor in the local network. This requires the state machine to be implemented on each entry of the neighbor table. This change also adds, but does not expose, several APIs. The first API is for performing basic operations on the neighbor table: - Create a static entry - List all entries - Delete all entries - Remove an entry by address The second API is used for changing the NUD protocol constants on a per-NIC basis to allow Neighbor Discovery to operate over links with widely varying performance characteristics. See [RFC 4861 Section 10][2] for the list of constants. Finally, the last API is for allowing users to subscribe to NUD state changes. See [RFC 4861 Appendix C][3] for the list of edges. [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Tests: pkg/tcpip/stack:stack_test - TestNeighborCacheAddStaticEntryThenOverflow - TestNeighborCacheClear - TestNeighborCacheClearThenOverflow - TestNeighborCacheConcurrent - TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress - TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress - TestNeighborCacheEntry - TestNeighborCacheEntryNoLinkAddress - TestNeighborCacheGetConfig - TestNeighborCacheKeepFrequentlyUsed - TestNeighborCacheNotifiesWaker - TestNeighborCacheOverflow - TestNeighborCacheOverwriteWithStaticEntryThenOverflow - TestNeighborCacheRemoveEntry - TestNeighborCacheRemoveEntryThenOverflow - TestNeighborCacheRemoveStaticEntry - TestNeighborCacheRemoveStaticEntryThenOverflow - TestNeighborCacheRemoveWaker - TestNeighborCacheReplace - TestNeighborCacheResolutionFailed - TestNeighborCacheResolutionTimeout - TestNeighborCacheSetConfig - TestNeighborCacheStaticResolution - TestEntryAddsAndClearsWakers - TestEntryDelayToProbe - TestEntryDelayToReachableWhenSolicitedOverrideConfirmation - TestEntryDelayToReachableWhenUpperLevelConfirmation - TestEntryDelayToStaleWhenConfirmationWithDifferentAddress - TestEntryDelayToStaleWhenProbeWithDifferentAddress - TestEntryFailedGetsDeleted - TestEntryIncompleteToFailed - TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt - TestEntryIncompleteToReachable - TestEntryIncompleteToReachableWithRouterFlag - TestEntryIncompleteToStale - TestEntryInitiallyUnknown - TestEntryProbeToFailed - TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress - TestEntryProbeToReachableWhenSolicitedOverrideConfirmation - TestEntryProbeToStaleWhenConfirmationWithDifferentAddress - TestEntryProbeToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride - TestEntryReachableToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenTimeout - TestEntryStaleToDelay - TestEntryStaleToReachableWhenSolicitedOverrideConfirmation - TestEntryStaleToStaleWhenOverrideConfirmation - TestEntryStaleToStaleWhenProbeUpdateAddress - TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress - TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress - TestEntryStaysReachableWhenConfirmationWithRouterFlag - TestEntryStaysReachableWhenProbeWithSameAddress - TestEntryStaysStaleWhenProbeWithSameAddress - TestEntryUnknownToIncomplete - TestEntryUnknownToStale - TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress pkg/tcpip/stack:stack_x_test - TestDefaultNUDConfigurations - TestNUDConfigurationFailsForNotSupported - TestNUDConfigurationsBaseReachableTime - TestNUDConfigurationsDelayFirstProbeTime - TestNUDConfigurationsMaxMulticastProbes - TestNUDConfigurationsMaxRandomFactor - TestNUDConfigurationsMaxUnicastProbes - TestNUDConfigurationsMinRandomFactor - TestNUDConfigurationsRetransmitTimer - TestNUDConfigurationsUnreachableTime - TestNUDStateReachableTime - TestNUDStateRecomputeReachableTime - TestSetNUDConfigurationFailsForBadNICID - TestSetNUDConfigurationFailsForNotSupported [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Updates #1889 Updates #1894 Updates #1895 Updates #1947 Updates #1948 Updates #1949 Updates #1950 PiperOrigin-RevId: 324070795
2020-07-30Use brodcast MAC for broadcast IPv4 packetsGhanan Gowripalan
When sending packets to a known network's broadcast address, use the broadcast MAC address. Test: - stack_test.TestOutgoingSubnetBroadcast - udp_test.TestOutgoingSubnetBroadcast PiperOrigin-RevId: 324062407
2020-07-30Have dockerutil.Wait* respect the context deadlineKevin Krakauer
PiperOrigin-RevId: 324044634
2020-07-30Fix SETOWN_EX return value.Dean Deng
Return on success should be 0, not size of the struct copied out. PiperOrigin-RevId: 324029193
2020-07-29Force registration for EPOLLHUP, not EPOLLRDHUP, in vfs2's epoll.Jamie Liu
Compare Linux's fs/eventpoll.c:do_epoll_ctl(). I don't know where EPOLLRDHUP came from. PiperOrigin-RevId: 323874419
2020-07-28Redirect TODO to GitHub issuesFabricio Voznika
PiperOrigin-RevId: 323715260
2020-07-28ip6tables testingKevin Krakauer
We skip gVisor tests for now, as ip6tables aren't yet implemented.
2020-07-27Fix strace for epoll event arrays.Jamie Liu
PiperOrigin-RevId: 323491461
2020-07-27Merge pull request #3201 from lubinszARM:pr_sys64_2gVisor bot
PiperOrigin-RevId: 323456118
2020-07-27Merge pull request #3299 from lubinszARM:pr_asidgVisor bot
PiperOrigin-RevId: 323455097
2020-07-27Add ability to send unicast ARP requests and Neighbor SolicitationsSam Balana
The previous implementation of LinkAddressRequest only supported sending broadcast ARP requests and multicast Neighbor Solicitations. The ability to send these packets as unicast is required for Neighbor Unreachability Detection. Tests: pkg/tcpip/network/arp:arp_test - TestLinkAddressRequest pkg/tcpip/network/ipv6:ipv6_test - TestLinkAddressRequest Updates #1889 Updates #1894 Updates #1895 Updates #1947 Updates #1948 Updates #1949 Updates #1950 PiperOrigin-RevId: 323451569
2020-07-27Fix memory accounting in TCP pending segment queue.Bhasker Hariharan
TCP now tracks the overhead of the segment structure itself in it's out-of-order queue (pending). This is required to ensure that a malicious sender sending 1 byte out-of-order segments cannot queue like 1000's of segments which bloat up memory usage. We also reduce the default receive window to 32KB. With TCP moderation there is no need to keep this window at 1MB which means that for new connections the default out-of-order queue will be small unless the application actually reads the data that is being sent. This prevents a sender from just maliciously filling up pending buf with lots of tiny out-of-order segments. PiperOrigin-RevId: 323450913
2020-07-27Merge pull request #2958 from lubinszARM:pr_vfs2_1gVisor bot
PiperOrigin-RevId: 323443142
2020-07-27Add device implementation for /dev/fuseRidwan Sharif
This PR adds the following: - [x] Marshall-able structs for fuse headers - [x] Data structures needed in /dev/fuse to communicate with the daemon server - [x] Implementation of the device interface - [x] Go unit tests This change adds the `/dev/fuse` implementation. `Connection` controls the communication between the server and the sentry. The FUSE server uses the `FileDescription` interface to interact with the Sentry. The Sentry implmenetation of fusefs, uses `Connection` and the Connection interface to interact with the Server. All communication messages are in the form of `go_marshal` backed structs defined in the ABI package. This change also adds some go unit tests that test (pretty basically) the interfaces and should be used as an example of an end to end FUSE operation. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/3083 from ridwanmsharif:ridwanmsharif/fuse-device-impl 69aa2ce970004938fe9f918168dfe57636ab856e PiperOrigin-RevId: 323428180
2020-07-27Move platform.File in memmapAndrei Vagin
The subsequent systrap changes will need to import memmap from the platform package. PiperOrigin-RevId: 323409486
2020-07-26Add profiling to dockerutilZach Koopmans
Adds profiling with `runsc debug` or pprof to dockerutil. All targets using dockerutil should now be able to use profiling. In addition, modifies existing benchmarks to use profiling. PiperOrigin-RevId: 323298634
2020-07-26allow guest user applications read CNTVCT_EL0/CNTFRQ_EL0Bin Lu
At present, when doing syscall_kvm test, we need to enable the function of ESR_ELx_SYS64_ISS_SYS_CNTVCT/ESR_ELx_SYS64_ISS_SYS_CNTFRQ to successfully pass the test. I set CNTKCTL_EL1.EL0VCTEN==1/CNTKCTL_EL1.EL0PCTEN==1, so that the related cases can passed. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-24Enable automated marshalling for netstack.Ayush Ranjan
PiperOrigin-RevId: 322954792
2020-07-23Merge pull request #3142 from tanjianfeng:fix-3141gVisor bot
PiperOrigin-RevId: 322937495
2020-07-23Merge pull request #3317 from sevki:patch-2gVisor bot
PiperOrigin-RevId: 322928424
2020-07-23Add permission checks to vfs2 truncate.Dean Deng
- Check write permission on truncate(2). Unlike ftruncate(2), truncate(2) fails if the user does not have write permissions on the file. - For gofers under InteropModeShared, check file type before making a truncate request. We should fail early and avoid making an rpc when possible. Furthermore, depending on the remote host's failure may give us unexpected behavior--if the host converts the truncate request to an ftruncate syscall on an open fd, we will get EINVAL instead of EISDIR. Updates #2923. PiperOrigin-RevId: 322913569
2020-07-23FileDescription is hard to spell.Dean Deng
Fix typos. PiperOrigin-RevId: 322913282
2020-07-23Add AfterFunc to tcpip.ClockSam Balana
Changes the API of tcpip.Clock to also provide a method for scheduling and rescheduling work after a specified duration. This change also implements the AfterFunc method for existing implementations of tcpip.Clock. This is the groundwork required to mock time within tests. All references to CancellableTimer has been replaced with the tcpip.Job interface, allowing for custom implementations of scheduling work. This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or use tcpip.CancellableTimer. Migration plan: 1. Add AfterFunc(d, f) to tcpip.Clock 2. Replace references of tcpip.CancellableTimer with tcpip.Job 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob. PiperOrigin-RevId: 322906897
2020-07-23Implement get/set_robust_list.Nicolas Lacasse
PiperOrigin-RevId: 322904430
2020-07-23Merge pull request #3024 from ridwanmsharif:ridwanmsharif/fuse-stub-implgVisor bot
PiperOrigin-RevId: 322890087
2020-07-23Add task work mechanism.Dean Deng
Like task_work in Linux, this allows us to register callbacks to be executed before returning to userspace. This is needed for kcov support, which requires coverage information to be up-to-date whenever we are in user mode. We will provide coverage data through the kcov interface to enable coverage-directed fuzzing in syzkaller. One difference from Linux is that task work cannot queue work before the transition to userspace that it precedes; queued work will be picked up before the next transition. PiperOrigin-RevId: 322889984
2020-07-23kvm-tls-2:add the preservation of user-TLS in the Arm64 kvm platformlubinszARM
This patch load/save TLS for the container application. Related issue: full context-switch supporting for Arm64 #1238 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/2761 from lubinszARM:pr_tls_2 cb5dbca1c9c3f378002406da7a58887f9b5032b3 PiperOrigin-RevId: 322887044
2020-07-23iptables: use keyed array literalsKevin Krakauer
PiperOrigin-RevId: 322882426
2020-07-23Use mode supplied by the mount optionsRidwan Sharif
2020-07-23Added stub FUSE filesystemRidwan Sharif
Allow FUSE filesystems to be mounted using libfuse. The appropriate flags and mount options are parsed and understood by fusefs.
2020-07-23Merge pull request #3207 from kevinGC:icmp-connectgVisor bot
PiperOrigin-RevId: 322853192
2020-07-23Fix wildcard bind for raw socket.Bhasker Hariharan
Fixes #3334 PiperOrigin-RevId: 322846384
2020-07-23Marshallable socket opitons.Ayush Ranjan
Socket option values are now required to implement marshal.Marshallable. Co-authored-by: Rahat Mahmood <rahat@google.com> PiperOrigin-RevId: 322831612
2020-07-23Port sendfile to vfs2.Nicolas Lacasse
And do some refactoring of the wait logic in sendfile/splice/tee. Updates #1035 #2923 PiperOrigin-RevId: 322815521
2020-07-23[vfs2][gofer] Fix update attributes race condition.Ayush Ranjan
We were getting the file attributes before locking the metadataMu which was causing stale updates to the file attributes. Fixes OpenTest_AppendConcurrentWrite. Updates #2923 PiperOrigin-RevId: 322804438
2020-07-22make connect(2) fail when dest is unreachableKevin Krakauer
Previously, ICMP destination unreachable datagrams were ignored by TCP endpoints. This caused connect to hang when an intermediate router couldn't find a route to the host. This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby image test would try to install the sinatra gem and hang indefinitely attempting to use an IPv6 address. Fixes #3079.
2020-07-22iptables: don't NAT existing connectionsKevin Krakauer
Fixes a NAT bug that manifested as: - A SYN was sent from gVisor to another host, unaffected by iptables. - The corresponding SYN/ACK was NATted by a PREROUTING REDIRECT rule despite being part of the existing connection. - The socket that sent the SYN never received the SYN/ACK and thus a connection could not be established. We handle this (as Linux does) by tracking all connections, inserting a no-op conntrack rule for new connections with no rules of their own. Needed for istio support (#170).
2020-07-22iptables: replace maps with arraysKevin Krakauer
For iptables users, Check() is a hot path called for every packet one or more times. Let's avoid a bunch of map lookups. PiperOrigin-RevId: 322678699
2020-07-22[vfs2][tmpfs] Implement O_APPENDAyush Ranjan
Updates #2923 PiperOrigin-RevId: 322671489
2020-07-22Add O_APPEND support in vfs2 gofer.Ayush Ranjan
Helps in fixing open syscall tests: AppendConcurrentWrite and AppendOnly. We also now update the file size for seekable special files (regular files) which we were not doing earlier. Updates #2923 PiperOrigin-RevId: 322670843
2020-07-22Support for receiving outbound packets in AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 322665518
2020-07-21p9: fix `registry.get` ob1 bugSevki
2020-07-20Add standard entrypoints for test targets.Adin Scannell
PiperOrigin-RevId: 322265513
2020-07-20add asid support to Arm64Bin Lu
Support the operation of asid, so that I can optimize tlb performance by combining with nG. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-16Add support to return protocol in recvmsg for AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 321690756
2020-07-16Add ethernet broadcast address constantGhanan Gowripalan
PiperOrigin-RevId: 321620517
2020-07-15Merge pull request #3236 from craig08:fuse-kernfs-inode-stat-add-ctxgVisor bot
PiperOrigin-RevId: 321496734
2020-07-15iptables: remove check for NetworkHeaderKevin Krakauer
This is no longer necessary, as we always set NetworkHeader before calling iptables.Check. PiperOrigin-RevId: 321461978