summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2021-02-18Remove deprecated NUD types Failed and FailedEntryLookupsSam Balana
Completes the soft migration to Unreachable state by removing the Failed state and the the FailedEntryLookups StatCounter. Fixes #4667 PiperOrigin-RevId: 358226380
2021-02-17Move Name() out of netstack Matcher. It can live in the sentry.Kevin Krakauer
PiperOrigin-RevId: 358078157
2021-02-17[infra] Split tcpip/integration test targets to aid investigation.Ayush Ranjan
tcpip integration tests have been flaky lately. They usually run in 20 seconds and have a 60 seconds timeout. Sometimes they timeout which could be due to a bug or deadlock. To further investigate it might be helpful to split the targets and see which test is causing the flake. Added a new tcpip/tests/utils package to hold all common utilities across all tests. PiperOrigin-RevId: 358012936
2021-02-11[rack] TLP: ACK Processing and PTO scheduling.Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3 Fixes #5085 PiperOrigin-RevId: 357125037
2021-02-11[netstack] Fix recovery entry and exit checks.Ayush Ranjan
Entry check: - Earlier implementation was preventing us from entering recovery even if SND.UNA is lost but dupAckCount is still below threshold. Fixed that. - We should only enter recovery when at least one more byte of data beyond the highest byte that was outstanding when fast retransmit was last entered is acked. Added that check. Exit check: - Earlier we were checking if SEG.ACK is in range [SND.UNA, SND.NXT]. The intention was to check if any unacknowledged data was ACKed. Note that (SEG.ACK - 1) is actually the sequence number which was ACKed. So we were incorrectly including (SND.UNA - 1) in the range. Fixed the check to now be (SEG.ACK - 1) in range [SND.UNA, SND.NXT). Additionally, moved a RACK specific test to the rack tests file. Added tests for the changes I made. PiperOrigin-RevId: 357091322
2021-02-11Let sentry understand tcpip.ErrMalformedHeaderKevin Krakauer
Added a LINT IfChange/ThenChange check to catch this in the future. PiperOrigin-RevId: 357077564
2021-02-10RACK: Fix re-transmitting the segment twice when entering recovery.Nayana Bidari
TestRACKWithDuplicateACK is flaky as the reorder window can expire before receiving three duplicate ACKs which will result in sending the first unacknowledged segment twice: when reorder timer expired and again after receiving the third duplicate ACK. This CL will fix this behavior and will not resend the segment again if it was already re-transmittted when reorder timer expired. Update the TestRACKWithDuplicateACK to test that the first segment is considered as lost and is re-transmitted. PiperOrigin-RevId: 356855168
2021-02-10Fix broken IFTTT link in tcpip.Ayush Ranjan
PiperOrigin-RevId: 356852625
2021-02-09Add support for setting SO_SNDBUF for unix domain sockets.Bhasker Hariharan
The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-09Move network internal code to internal packageGhanan Gowripalan
Utilities written to be common across IPv4/IPv6 are not planned to be available for public use. https://golang.org/doc/go1.4#internalpackages PiperOrigin-RevId: 356554862
2021-02-09Deprecate Failed state in favor of Unreachable stateSam Balana
... as per RFC 7048. The Failed state is an internal state that is not specified by any RFC; replacing it with the Unreachable state enables us to expose this state while keeping our terminology consistent with RFC 4861 and RFC 7048. Unreachable state replaces all internal references for Failed state. However unlike the Failed state, change events are dispatched when moving into Unreachable state. This gives developers insight into whether a neighbor entry failed address resolution or whether it was explicitly removed. The Failed state will be removed entirely once all references to it are removed. This is done to avoid a Fuchsia roll failure. Updates #4667 PiperOrigin-RevId: 356554104
2021-02-09add IPv4 options processing for forwarding and reassemblyJulian Elischer
IPv4 forwarding and reassembly needs support for option processing and regular processing also needs options to be processed before being passed to the transport layer. This patch extends option processing to those cases and provides additional testing. A small change to the ICMP error generation API code was required to allow it to know when a packet was being forwarded or not. Updates #4586 PiperOrigin-RevId: 356446681
2021-02-08Remove unnecessary lockingGhanan Gowripalan
The thing the lock protects will never be accessed concurrently. PiperOrigin-RevId: 356423331
2021-02-08Allow UDP sockets connect()ing to port 0Zeling Feng
We previously return EINVAL when connecting to port 0, however this is not the observed behavior on Linux. One of the observable effects after connecting to port 0 on Linux is that getpeername() will fail with ENOTCONN. PiperOrigin-RevId: 356413451
2021-02-08Support performing DAD for any addressGhanan Gowripalan
...as long as the network protocol supports duplicate address detection. This CL provides the facilities for a netstack integrator to perform DAD. DHCP recommends that clients effectively perform DAD before accepting an offer. As per RFC 2131 section 4.4.1 pg 38, The client SHOULD perform a check on the suggested address to ensure that the address is not already in use. For example, if the client is on a network that supports ARP, the client may issue an ARP request for the suggested request. The implementation of ARP-based IPv4 DAD effectively operates the same as IPv6's NDP DAD - using ARP requests and responses in place of NDP neighbour solicitations and advertisements, respectively. DAD performed by calls to (*Stack).CheckDuplicateAddress don't interfere with DAD performed when a new IPv6 address is added. This is so that integrator requests to check for duplicate addresses aren't unexpectedly aborted when addresses are removed. A network package internal package provides protocol agnostic DAD state management that specific protocols that provide DAD can use. Fixes #4550. Tests: - internal/ip_test.* - integration_test.TestDAD - arp_test.TestDADARPRequestPacket - ipv6.TestCheckDuplicateAddress PiperOrigin-RevId: 356405593
2021-02-08RACK: Detect lossNayana Bidari
Detect packet loss using reorder window and re-transmit them after the reorder timer expires. PiperOrigin-RevId: 356321786
2021-02-06Remove linkAddrCacheGhanan Gowripalan
It was replaced by NUD/neighborCache. Fixes #4658. PiperOrigin-RevId: 356085221
2021-02-06Synchronously send packets over pipe link endpointGhanan Gowripalan
Before this change, packets were delivered asynchronously to the remote end of a pipe. This was to avoid a deadlock during link resolution where the stack would attempt to double-lock a mutex (see removed comments in the parent commit for details). As of https://github.com/google/gvisor/commit/4943347137, we do not hold locks while sending link resolution probes so the deadlock will no longer occur. PiperOrigin-RevId: 356066224
2021-02-06Use fine grained locks while sending NDP packetsGhanan Gowripalan
Previously when sending NDP DAD or RS messages, we would hold a shared lock which lead to deadlocks (due to synchronous packet loooping (e.g. pipe and loopback link endpoints)) and lock contention. Writing packets may be an expensive operation which could prevent other goroutines from doing meaningful work if a shared lock is held while writing packets. This change upates the NDP DAD/RS timers to not hold shared locks while sending packets. PiperOrigin-RevId: 356053146
2021-02-06Remove (*stack.Stack).FindNetworkEndpointGhanan Gowripalan
The network endpoints only look for other network endpoints of the same kind. Since the network protocols keeps track of all endpoints, go through the protocol to find an endpoint with an address instead of the stack. PiperOrigin-RevId: 356051498
2021-02-06Use fine grained locks while sending NUD probesGhanan Gowripalan
Previously when sending probe messages, we would hold a shared lock which lead to deadlocks (due to synchronous packet loooping (e.g. pipe and loopback link endpoints)) and lock contention. Writing packets may be an expensive operation which could prevent other goroutines from doing meaningful work if a shared lock is held while writing packets. This change upates the NUD timers to not hold shared locks while sending packets. PiperOrigin-RevId: 356048697
2021-02-06Use embedded mutex pattern in neighbor cache/entryGhanan Gowripalan
Also while I'm here, update neighbor cahce/entry tests to use the stack's RNG instead of creating a neigbor cache/entry specific one. PiperOrigin-RevId: 356040581
2021-02-06Unexpose NICGhanan Gowripalan
The NIC structure is not to be used outside of the stack package directly. PiperOrigin-RevId: 356036737
2021-02-06Check local address directly through NICGhanan Gowripalan
Network endpoints that wish to check addresses on another NIC-local network endpoint may now do so through the NetworkInterface. This fixes a lock ordering issue between NIC removal and link resolution. Before this change: NIC Removal takes the stack lock, neighbor cache lock then neighbor entries' locks. When performing IPv4 link resolution, we take the entry lock then ARP would try check IPv4 local addresses through the stack which tries to obtain the stack's lock. Now that ARP can check IPv4 addreses through the NIC, we avoid the lock ordering issue, while also removing the need for stack to lookup the NIC. PiperOrigin-RevId: 356034245
2021-02-05Batch write packets after iptables checksGhanan Gowripalan
After IPTables checks a batch of packets, we can write packets that are not dropped or locally destined as a batch instead of individually. This previously caused a bug since WritePacket* functions expect to take ownership of passed PacketBuffer{List}. WritePackets assumed the list of PacketBuffers will not be invalidated when calling WritePacket for each PacketBuffer in the list, but this is not true. WritePacket may add the passed PacketBuffer into a different list which would modify the PacketBuffer in such a way that it no longer points to the next PacketBuffer to write. Example: Given a PB list of PB_a -> PB_b -> PB_c WritePackets may be iterating over the list and calling WritePacket for each PB. When WritePacket takes PB_a, it may add it to a new list which would update pointers such that PB_a no longer points to PB_b. Test: integration_test.TestIPTableWritePackets PiperOrigin-RevId: 355969560
2021-02-05Refactor locally delivered packetsGhanan Gowripalan
Make it clear that failing to parse a looped back is not a packet sending error but a malformed received packet error. FindNetworkEndpoint returns nil when no network endpoint is found instead of an error. PiperOrigin-RevId: 355954946
2021-02-04Lock ConnTrack before initializing bucketsGhanan Gowripalan
PiperOrigin-RevId: 355751801
2021-02-03Add a function to enable RACK in tests.Nayana Bidari
- Adds a function to enable RACK in tests. - RACK update functions are guarded behind the flag tcpRecovery. PiperOrigin-RevId: 355435973
2021-02-02Rename HandleNDupAcks in TCP.Nayana Bidari
Rename HandleNDupAcks() to HandleLossDetected() as it will enter this when is detected after: - reorder window expires and TLP (in case of RACK) - dupAckCount >= 3 PiperOrigin-RevId: 355237858
2021-02-02Add support for rate limiting out of window ACKs.Bhasker Hariharan
Netstack today will send dupACK's with no rate limit for incoming out of window segments. This can result in ACK loops for example if a TCP socket connects to itself (actually permitted by TCP). Where the ACK sent in response to packets being out of order itself gets considered as an out of window segment resulting in another ACK being generated. PiperOrigin-RevId: 355206877
2021-02-01Refactor HandleControlPacket/SockErrorGhanan Gowripalan
...to remove the need for the transport layer to deduce the type of error it received. Rename HandleControlPacket to HandleError as HandleControlPacket only handles errors. tcpip.SockError now holds a tcpip.SockErrorCause interface that different errors can implement. PiperOrigin-RevId: 354994306
2021-01-31Default to NUD/neighborCache instead of linkAddrCacheGhanan Gowripalan
This change flips gvisor to use Neighbor unreachability detection by default to populate the neighbor table as defined by RFC 4861 section 7. Although RFC 4861 is targeted at IPv6, the same algorithm is used for link resolution on IPv4 networks using ARP. Integrators may still use the legacy link address cache by setting stack.Options.UseLinkAddrCache to true; stack.Options.UseNeighborCache is now unused and will be removed. A later change will remove linkAddrCache and associated code. Updates #4658. PiperOrigin-RevId: 354850531
2021-01-31Use closure for IPv6 testContext cleanupGhanan Gowripalan
PiperOrigin-RevId: 354827491
2021-01-31Remove NICs before closing their link endpointsGhanan Gowripalan
...in IPv6 ICMP tests. A channel link endpoint's channel is closed when the link endpoint is closed. When the stack tries to send packets through a NIC with a closed channel endpoint, a panic will occur when attempting to write to a closed channel (https://golang.org/ref/spec#Close). To make sure the stack does not try to send packets through a NIC, we remove it. PiperOrigin-RevId: 354822085
2021-01-31Use different neighbor tables per network endpointGhanan Gowripalan
This stores each protocol's neighbor state separately. This change also removes the need for each neighbor entry to keep track of their own link address resolver now that all the entries in a cache will use the same resolver. PiperOrigin-RevId: 354818155
2021-01-31Hide neighbor table kind from NetworkEndpointGhanan Gowripalan
The network endpoint should not need to have logic to handle different kinds of neighbor tables. Network endpoints can let the NIC know about differnt neighbor discovery messages and let the NIC decide which table to update. This allows us to remove the LinkAddressCache interface. PiperOrigin-RevId: 354812584
2021-01-30Extract route table from Stack lockTamir Duberstein
PiperOrigin-RevId: 354746864
2021-01-30Implement LinkAddressResolver on NetworkEndpointsGhanan Gowripalan
This removes the need to provide the link address request with the NIC the request is being performed on since the NetworkEndpoints already have a reference to the NIC. PiperOrigin-RevId: 354721940
2021-01-29Make fragmentation return a reassembled PacketBufferTing-Yu Wang
This allows later decoupling of the backing network buffer implementation. PiperOrigin-RevId: 354643297
2021-01-29Clear IGMPv1 present flag on NIC downGhanan Gowripalan
This is dynamic state that can be re-learned when the NIC comes back up. Test: ipv4_test.TestIgmpV1Present PiperOrigin-RevId: 354630921
2021-01-29Refresh delayed report timers on query messagesGhanan Gowripalan
...as per As per RFC 2236 section 3 page 3 (for IGMPv2) and RFC 2710 section 4 page 5 (for MLDv1). See comments in code for more details. Test: ip_test.TestHandleQuery PiperOrigin-RevId: 354603068
2021-01-29Discard invalid Neighbor AdvertisementsPeter Johnston
...per RFC 4861 s7.1.2. Startblock: has LGTM from sbalana and then add reviewer ghanan PiperOrigin-RevId: 354539026
2021-01-28Avoid locking when route doesn't require resolutionGhanan Gowripalan
When a route does not need to resolve a remote link address to send a packet, avoid having to obtain the pending packets queue's lock. PiperOrigin-RevId: 354456280
2021-01-28RACK: Update reorder window.Nayana Bidari
After receiving an ACK(cumulative or selective), RACK will update the reorder window which is used as a settling time before marking the packet as lost. This change will add an init function to initialize the variables in RACK and also store the reference to sender in rackControl. The reorder window is calculated as per rfc: https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.2 Step 4. PiperOrigin-RevId: 354453528
2021-01-28Acquire entry lock with cache lock heldTamir Duberstein
Avoid a race condition in which an entry is acquired while it is being evicted by overlapping the entry lock with the cache lock. PiperOrigin-RevId: 354452639
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28Do not use clockwork for faketimeGhanan Gowripalan
Clockwork does not support timers being reset/stopped from different goroutines. Our current use of clockwork causes data races and gotsan complains about clockwork. This change uses our own implementation of faketime, avoiding data races. PiperOrigin-RevId: 354428208
2021-01-28Respect SO_BINDTODEVICE in unconnected UDP writesMarina Ciocea
Previously, sending on an unconnected UDP socket would ignore the SO_BINDTODEVICE option. Send on the configured interface when an UDP socket is bound to an interface through setsockop SO_BINDTODEVICE. Add packetimpact tests exercising UDP reads and writes with every combination of bound/unbound, broadcast/multicast/unicast destination, and bound/not-bound to device. PiperOrigin-RevId: 354299670
2021-01-27Confirm neighbor reachability with TCP ACKsGhanan Gowripalan
As per RFC 4861 section 7.3.1, A neighbor is considered reachable if the node has recently received a confirmation that packets sent recently to the neighbor were received by its IP layer. Positive confirmation can be gathered in two ways: hints from upper-layer protocols that indicate a connection is making "forward progress", or receipt of a Neighbor Advertisement message that is a response to a Neighbor Solicitation message. This change adds support for TCP to let the IP/link layers know that a neighbor is reachable. Test: integration_test.TestTCPConfirmNeighborReachability PiperOrigin-RevId: 354222833
2021-01-27Rename anonymous struct "mu"Tamir Duberstein
This clarifies that there is a lock involved. PiperOrigin-RevId: 354213848