summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2020-08-05Merge release-20200622.1-336-g0e6f7a12c (automated)gVisor bot
2020-08-04Update variables for implementation of RACK in TCPNayana Bidari
RACK (Recent Acknowledgement) is a new loss detection algorithm in TCP. These are the fields which should be stored on connections to implement RACK algorithm. PiperOrigin-RevId: 324948703
2020-08-04Merge release-20200622.1-329-g00993130e (automated)gVisor bot
2020-08-04Use 1 fragmentation component per IP stackGhanan Gowripalan
This will help manage memory consumption by IP reassembly when receiving IP fragments on multiple network endpoints. Previously, each endpoint would cap memory consumption at 4MB, but with this change, each IP stack will cap memory consumption at 4MB. No behaviour changes. PiperOrigin-RevId: 324913904
2020-08-03Merge release-20200622.1-313-gb2ae7ea1b (automated)gVisor bot
2020-08-03Plumbing context.Context to DecRef() and Release().Nayana Bidari
context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-31Merge release-20200622.1-300-ga7d9aa6d5 (automated)gVisor bot
2020-07-31Support fragments from different sourcesGhanan Gowripalan
Prevent fragments with different source-destination pairs from conflicting with each other. Test: - ipv6_test.TestReceiveIPv6Fragments - ipv4_test.TestReceiveIPv6Fragments PiperOrigin-RevId: 324283246
2020-07-31iptables: support SO_ORIGINAL_DSTKevin Krakauer
Envoy (#170) uses this to get the original destination of redirected packets.
2020-07-30Fix TCP CurrentConnected counter updates.Mithun Iyer
CurrentConnected counter is incorrectly decremented on close of an endpoint which is still not connected. Fixes #3443 PiperOrigin-RevId: 324155171
2020-07-30Revert change to default buffer size.Bhasker Hariharan
In https://github.com/google/gvisor/commit/ca6bded95dbce07f9683904b4b768dfc2d4a09b2 we reduced the default buffer size to 32KB. This mostly works fine except at high throughput where we hit zero window very quickly and the TCP receive buffer moderation is not able to grow the window. This can be seen in the benchmarks where with a 32KB buffer and 100 connections downloading a 10MB file we get about 30 requests/s vs the 1MB buffer gives us about 53 requests/s. A proper fix requires a few changes to when we send a zero window as well as when we decide to send a zero window update. Today we consider available space below 1MSS as zero and send an update when it crosses 1MSS of available space. This is way too low and results in the window staying very small once we hit a zero window condition as we keep sending updates with size barely over 1MSS. Linux and BSD are smarter about this and use different thresholds. We should separately update our logic to match linux or BSD so that we don't send window updates that are really tiny or wait until we drop below 1MSS to advertise a zero window. PiperOrigin-RevId: 324087019
2020-07-30Enforce fragment block size and validate argsGhanan Gowripalan
Allow configuring fragmentation.Fragmentation with a fragment block size which will be enforced when processing fragments. Also validate arguments when processing fragments. Test: - fragmentation.TestErrors - ipv6_test.TestReceiveIPv6Fragments - ipv4_test.TestReceiveIPv6Fragments PiperOrigin-RevId: 324081521
2020-07-30Implement neighbor unreachability detection for ARP and NDP.Sam Balana
This change implements the Neighbor Unreachability Detection (NUD) state machine, as per RFC 4861 [1]. The state machine operates on a single neighbor in the local network. This requires the state machine to be implemented on each entry of the neighbor table. This change also adds, but does not expose, several APIs. The first API is for performing basic operations on the neighbor table: - Create a static entry - List all entries - Delete all entries - Remove an entry by address The second API is used for changing the NUD protocol constants on a per-NIC basis to allow Neighbor Discovery to operate over links with widely varying performance characteristics. See [RFC 4861 Section 10][2] for the list of constants. Finally, the last API is for allowing users to subscribe to NUD state changes. See [RFC 4861 Appendix C][3] for the list of edges. [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Tests: pkg/tcpip/stack:stack_test - TestNeighborCacheAddStaticEntryThenOverflow - TestNeighborCacheClear - TestNeighborCacheClearThenOverflow - TestNeighborCacheConcurrent - TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress - TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress - TestNeighborCacheEntry - TestNeighborCacheEntryNoLinkAddress - TestNeighborCacheGetConfig - TestNeighborCacheKeepFrequentlyUsed - TestNeighborCacheNotifiesWaker - TestNeighborCacheOverflow - TestNeighborCacheOverwriteWithStaticEntryThenOverflow - TestNeighborCacheRemoveEntry - TestNeighborCacheRemoveEntryThenOverflow - TestNeighborCacheRemoveStaticEntry - TestNeighborCacheRemoveStaticEntryThenOverflow - TestNeighborCacheRemoveWaker - TestNeighborCacheReplace - TestNeighborCacheResolutionFailed - TestNeighborCacheResolutionTimeout - TestNeighborCacheSetConfig - TestNeighborCacheStaticResolution - TestEntryAddsAndClearsWakers - TestEntryDelayToProbe - TestEntryDelayToReachableWhenSolicitedOverrideConfirmation - TestEntryDelayToReachableWhenUpperLevelConfirmation - TestEntryDelayToStaleWhenConfirmationWithDifferentAddress - TestEntryDelayToStaleWhenProbeWithDifferentAddress - TestEntryFailedGetsDeleted - TestEntryIncompleteToFailed - TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt - TestEntryIncompleteToReachable - TestEntryIncompleteToReachableWithRouterFlag - TestEntryIncompleteToStale - TestEntryInitiallyUnknown - TestEntryProbeToFailed - TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress - TestEntryProbeToReachableWhenSolicitedOverrideConfirmation - TestEntryProbeToStaleWhenConfirmationWithDifferentAddress - TestEntryProbeToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride - TestEntryReachableToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenTimeout - TestEntryStaleToDelay - TestEntryStaleToReachableWhenSolicitedOverrideConfirmation - TestEntryStaleToStaleWhenOverrideConfirmation - TestEntryStaleToStaleWhenProbeUpdateAddress - TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress - TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress - TestEntryStaysReachableWhenConfirmationWithRouterFlag - TestEntryStaysReachableWhenProbeWithSameAddress - TestEntryStaysStaleWhenProbeWithSameAddress - TestEntryUnknownToIncomplete - TestEntryUnknownToStale - TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress pkg/tcpip/stack:stack_x_test - TestDefaultNUDConfigurations - TestNUDConfigurationFailsForNotSupported - TestNUDConfigurationsBaseReachableTime - TestNUDConfigurationsDelayFirstProbeTime - TestNUDConfigurationsMaxMulticastProbes - TestNUDConfigurationsMaxRandomFactor - TestNUDConfigurationsMaxUnicastProbes - TestNUDConfigurationsMinRandomFactor - TestNUDConfigurationsRetransmitTimer - TestNUDConfigurationsUnreachableTime - TestNUDStateReachableTime - TestNUDStateRecomputeReachableTime - TestSetNUDConfigurationFailsForBadNICID - TestSetNUDConfigurationFailsForNotSupported [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Updates #1889 Updates #1894 Updates #1895 Updates #1947 Updates #1948 Updates #1949 Updates #1950 PiperOrigin-RevId: 324070795
2020-07-30Use brodcast MAC for broadcast IPv4 packetsGhanan Gowripalan
When sending packets to a known network's broadcast address, use the broadcast MAC address. Test: - stack_test.TestOutgoingSubnetBroadcast - udp_test.TestOutgoingSubnetBroadcast PiperOrigin-RevId: 324062407
2020-07-28Redirect TODO to GitHub issuesFabricio Voznika
PiperOrigin-RevId: 323715260
2020-07-27Merge release-20200622.1-240-g8dbf428a1 (automated)gVisor bot
2020-07-27Add ability to send unicast ARP requests and Neighbor SolicitationsSam Balana
The previous implementation of LinkAddressRequest only supported sending broadcast ARP requests and multicast Neighbor Solicitations. The ability to send these packets as unicast is required for Neighbor Unreachability Detection. Tests: pkg/tcpip/network/arp:arp_test - TestLinkAddressRequest pkg/tcpip/network/ipv6:ipv6_test - TestLinkAddressRequest Updates #1889 Updates #1894 Updates #1895 Updates #1947 Updates #1948 Updates #1949 Updates #1950 PiperOrigin-RevId: 323451569
2020-07-27Merge release-20200622.1-239-gca6bded95 (automated)gVisor bot
2020-07-27Fix memory accounting in TCP pending segment queue.Bhasker Hariharan
TCP now tracks the overhead of the segment structure itself in it's out-of-order queue (pending). This is required to ensure that a malicious sender sending 1 byte out-of-order segments cannot queue like 1000's of segments which bloat up memory usage. We also reduce the default receive window to 32KB. With TCP moderation there is no need to keep this window at 1MB which means that for new connections the default out-of-order queue will be small unless the application actually reads the data that is being sent. This prevents a sender from just maliciously filling up pending buf with lots of tiny out-of-order segments. PiperOrigin-RevId: 323450913
2020-07-24Merge release-20200622.1-208-g82a5cada5 (automated)gVisor bot
2020-07-23Add AfterFunc to tcpip.ClockSam Balana
Changes the API of tcpip.Clock to also provide a method for scheduling and rescheduling work after a specified duration. This change also implements the AfterFunc method for existing implementations of tcpip.Clock. This is the groundwork required to mock time within tests. All references to CancellableTimer has been replaced with the tcpip.Job interface, allowing for custom implementations of scheduling work. This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or use tcpip.CancellableTimer. Migration plan: 1. Add AfterFunc(d, f) to tcpip.Clock 2. Replace references of tcpip.CancellableTimer with tcpip.Job 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob. PiperOrigin-RevId: 322906897
2020-07-23Merge release-20200622.1-200-gdd530eeef (automated)gVisor bot
2020-07-23iptables: use keyed array literalsKevin Krakauer
PiperOrigin-RevId: 322882426
2020-07-23Merge release-20200622.1-198-gfc26b3764 (automated)gVisor bot
2020-07-23Merge pull request #3207 from kevinGC:icmp-connectgVisor bot
PiperOrigin-RevId: 322853192
2020-07-23Merge release-20200622.1-196-g20b556e62 (automated)gVisor bot
2020-07-23Fix wildcard bind for raw socket.Bhasker Hariharan
Fixes #3334 PiperOrigin-RevId: 322846384
2020-07-23Merge release-20200622.1-191-g36257e6b7 (automated)gVisor bot
2020-07-22make connect(2) fail when dest is unreachableKevin Krakauer
Previously, ICMP destination unreachable datagrams were ignored by TCP endpoints. This caused connect to hang when an intermediate router couldn't find a route to the host. This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby image test would try to install the sinatra gem and hang indefinitely attempting to use an IPv6 address. Fixes #3079.
2020-07-22iptables: don't NAT existing connectionsKevin Krakauer
Fixes a NAT bug that manifested as: - A SYN was sent from gVisor to another host, unaffected by iptables. - The corresponding SYN/ACK was NATted by a PREROUTING REDIRECT rule despite being part of the existing connection. - The socket that sent the SYN never received the SYN/ACK and thus a connection could not be established. We handle this (as Linux does) by tracking all connections, inserting a no-op conntrack rule for new connections with no rules of their own. Needed for istio support (#170).
2020-07-22Merge release-20200622.1-187-gbd98f8201 (automated)gVisor bot
2020-07-22iptables: replace maps with arraysKevin Krakauer
For iptables users, Check() is a hot path called for every packet one or more times. Let's avoid a bunch of map lookups. PiperOrigin-RevId: 322678699
2020-07-22Merge release-20200622.1-184-g71bf90c55 (automated)gVisor bot
2020-07-22Support for receiving outbound packets in AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 322665518
2020-07-17Merge release-20200622.1-173-gdcf6ddc27 (automated)gVisor bot
2020-07-16Add support to return protocol in recvmsg for AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 321690756
2020-07-16Merge release-20200622.1-171-gc66991ad7 (automated)gVisor bot
2020-07-16Add ethernet broadcast address constantGhanan Gowripalan
PiperOrigin-RevId: 321620517
2020-07-15Merge release-20200622.1-167-ge92f38ff0 (automated)gVisor bot
2020-07-15iptables: remove check for NetworkHeaderKevin Krakauer
This is no longer necessary, as we always set NetworkHeader before calling iptables.Check. PiperOrigin-RevId: 321461978
2020-07-15Merge release-20200622.1-164-gdb653bb34 (automated)gVisor bot
2020-07-15Merge release-20200622.1-163-g857d03f25 (automated)gVisor bot
2020-07-15fdbased: Vectorized write for packet; relax writev syscall filter.Ting-Yu Wang
Now it calls pkt.Data.ToView() when writing the packet. This may require copying when the packet is large, which puts the worse case in an even worse situation. This sent out in a separate preparation change as it requires syscall filter changes. This change will be followed by the change for the adoption of the new PacketHeader API. PiperOrigin-RevId: 321447003
2020-07-15Add support for SO_ERROR to packet sockets.Bhasker Hariharan
Packet sockets also seem to allow double binding and do not return an error on linux. This was tested by running the syscall test in a linux namespace as root and the current test DoubleBind fails@HEAD. Passes after this change. Updates #173 PiperOrigin-RevId: 321445137
2020-07-15Merge release-20200622.1-162-gfef90c61c (automated)gVisor bot
2020-07-15Fix minor bugs in a couple of interface IOCTLs.Bhasker Hariharan
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks tcpdump as it tries to interpret the packets incorrectly. Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which fails with an EINVAL since we don't implement it. For now change it to return EOPNOTSUPP to indicate that we don't support the query rather than return EINVAL. NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type field while NIC capabilities are more like the device features which can be queried using SIOCETHTOOL but not modified and NIC Flags are fields that can be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc. Updates #2746 PiperOrigin-RevId: 321436525
2020-07-13Merge pull request #2672 from amscanne:shim-integratedgVisor bot
PiperOrigin-RevId: 321053634
2020-07-13Merge release-20200622.1-106-ga287309d9 (automated)gVisor bot
2020-07-13Fix recvMMsgDispatcher not slicing link header correctly.Ting-Yu Wang
PiperOrigin-RevId: 321035635
2020-07-13Merge release-20200622.1-97-g43c209f48 (automated)gVisor bot