summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/tcpip.go
AgeCommit message (Collapse)Author
2021-04-20Move SO_RCVBUF to socketops.Nayana Bidari
Fixes #2926, #674 PiperOrigin-RevId: 369457123
2021-04-16Enlarge port range and fix integer overflowKevin Krakauer
Also count failed TCP port allocations PiperOrigin-RevId: 368939619
2021-04-15Add field support to the sentry metrics.Nayana Bidari
Fields allow counter metrics to have multiple tabular values. At most one field is supported at the moment. PiperOrigin-RevId: 368767040
2021-04-09iptables: support postrouting hook and SNAT targetToshi Kikuchi
The current SNAT implementation has several limitations: - SNAT source port has to be specified. It is not optional. - SNAT source port range is not supported. - SNAT for UDP is a one-way translation. No response packets are handled (because conntrack doesn't support UDP currently). - SNAT and REDIRECT can't work on the same connection. Fixes #5489 PiperOrigin-RevId: 367750325
2021-04-05Fix listen backlog handling to be in parity with LinuxMithun Iyer
- Change the accept queue full condition for a listening endpoint to only honor completed (and delivered) connections. - Use syncookies if the number of incomplete connections is beyond listen backlog. This also cleans up the SynThreshold option code as that is no longer used with this change. - Added a new stack option to unconditionally generate syncookies. Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux. - Enable keeping of incomplete connections beyond listen backlog. - Drop incoming SYNs only if the accept queue is filled up. - Drop incoming ACKs that complete handshakes when accept queue is full - Enable the stack to accept one more connection than programmed by listen backlog. - Handle backlog argument being zero, negative for listen, as Linux. - Add syscall and packetimpact tests to reflect the changes above. - Remove TCPConnectBacklog test which is polling for completed connections on the client side which is not reflective of whether the accept queue is filled up by the test. The modified syscall test in this CL addresses testing of connecting sockets. Fixes #3153 PiperOrigin-RevId: 366935921
2021-03-03Export stats that were forgottenArthur Sfez
While I'm here, simplify the comments and unify naming of certain stats across protocols. PiperOrigin-RevId: 360728849
2021-02-18Validate IGMP packetsArthur Sfez
This change also adds support for Router Alert option processing on incoming packets, a new stat for Router Alert option, and exports all the IP-option related stats. Fixes #5491 PiperOrigin-RevId: 358238123
2021-02-11[rack] TLP: ACK Processing and PTO scheduling.Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3 Fixes #5085 PiperOrigin-RevId: 357125037
2021-02-10Fix broken IFTTT link in tcpip.Ayush Ranjan
PiperOrigin-RevId: 356852625
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28Respect SO_BINDTODEVICE in unconnected UDP writesMarina Ciocea
Previously, sending on an unconnected UDP socket would ignore the SO_BINDTODEVICE option. Send on the configured interface when an UDP socket is bound to an interface through setsockop SO_BINDTODEVICE. Add packetimpact tests exercising UDP reads and writes with every combination of bound/unbound, broadcast/multicast/unicast destination, and bound/not-bound to device. PiperOrigin-RevId: 354299670
2021-01-27Add support for more fields in netstack for TCP_INFONayana Bidari
This CL adds support for the following fields: - RTT, RTTVar, RTO - send congestion window (sndCwnd) and send slow start threshold (sndSsthresh) - congestion control state(CaState) - ReorderSeen PiperOrigin-RevId: 354195361
2021-01-26Implement error on pointersTamir Duberstein
This improves type-assertion safety. PiperOrigin-RevId: 353931228
2021-01-26Move SO_SNDBUF to socketops.Nayana Bidari
This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For unix sockets, we do not support setting of this option. PiperOrigin-RevId: 353871484
2021-01-25Add per endpoint ARP statisticsArthur Sfez
The ARP stat NetworkUnreachable was removed, and was replaced by InterfaceHasNoLocalAddress. No stats are recorded when dealing with an missing endpoint (ErrNotConnected) (because if there is no endpoint, there is no valid per-endpoint stats). PiperOrigin-RevId: 353759462
2021-01-22Define tcpip.Payloader in terms of io.ReaderTamir Duberstein
Fixes #1509. PiperOrigin-RevId: 353295589
2021-01-19Per NIC NetworkEndpoint statisticsArthur Sfez
To facilitate the debugging of multi-homed setup, track Network protocols statistics for each endpoint. Note that the original stack-wide stats still exist. A new type of statistic counter is introduced, which track two versions of a stat at the same time. This lets a network endpoint increment both the local stat and the stack-wide stat at the same time. Fixes #4605 PiperOrigin-RevId: 352663276
2021-01-15Remove count argument from tcpip.Endpoint.ReadTamir Duberstein
The same intent can be specified via the io.Writer. PiperOrigin-RevId: 352098747
2021-01-14Add stats for ARPArthur Sfez
Fixes #4963 Startblock: has LGTM from sbalana and then add reviewer ghanan PiperOrigin-RevId: 351886320
2021-01-13Do not resolve remote link address at transport layerGhanan Gowripalan
Link address resolution is performed at the link layer (if required) so we can defer it from the transport layer. When link resolution is required, packets will be queued and sent once link resolution completes. If link resolution fails, the transport layer will receive a control message indicating that the stack failed to route the packet. tcpip.Endpoint.Write no longer returns a channel now that writes do not wait for link resolution at the transport layer. tcpip.ErrNoLinkAddress is no longer used so it is removed. Removed calls to stack.Route.ResolveWith from the transport layer so that link resolution is performed when a route is created in response to an incoming packet (e.g. to complete TCP handshakes or send a RST). Tests: - integration_test.TestForwarding - integration_test.TestTCPLinkResolutionFailure Fixes #4458 RELNOTES: n/a PiperOrigin-RevId: 351684158
2021-01-07netstack: Refactor tcpip.Endpoint.ReadTing-Yu Wang
Read now takes a destination io.Writer, count, options. Keeping the method name Read, in contrast to the Write method. This enables: * direct transfer of views under VV * zero copy It also eliminates the need for sentry to keep a slice of view because userspace had requested a read that is smaller than the view returned, removing the complexity there. Read/Peek/ReadPacket are now consolidated together and some duplicate code is removed. PiperOrigin-RevId: 350636322
2020-12-22Move SO_BINDTODEVICE to socketops.Nayana Bidari
PiperOrigin-RevId: 348696094
2020-12-21Prefer matching labels and longest matching prefixGhanan Gowripalan
...when performing source address selection for IPv6. These are defined in RFC 6724 section 5 rule 6 (prefer matching label) and rule 8 (use longest matching prefix). This change also considers ULA of global scope instead of its own scope, as per RFC 6724 section 3.1: Also, note that ULAs are considered as global, not site-local, scope but are handled via the prefix policy table as discussed in Section 10.6. Test: stack_test.TestIPv6SourceAddressSelectionScope Startblock: has LGTM from peterjohnston and then add reviewer brunodalbo PiperOrigin-RevId: 348580996
2020-12-17[netstack] Implement MSG_ERRQUEUE flag for recvmsg(2).Ayush Ranjan
Introduces the per-socket error queue and the necessary cmsg mechanisms. PiperOrigin-RevId: 348028508
2020-12-14Move SO_LINGER option to socketops.Nayana Bidari
PiperOrigin-RevId: 347437786
2020-12-14Move SO_ERROR and SO_OOBINLINE option to socketops.Nayana Bidari
SO_OOBINLINE option is set/get as boolean value, which is the same as linux. As we currently do not support disabling this option, we always return it as true. PiperOrigin-RevId: 347413905
2020-12-11[netstack] Decouple tcpip.ControlMessages from the IP control messges.Ayush Ranjan
tcpip.ControlMessages can not contain Linux specific structures which makes it painful to convert back and forth from Linux to tcpip back to Linux when passing around control messages in hostinet and raw sockets. Now we convert to the Linux version of the control message as soon as we are out of tcpip. PiperOrigin-RevId: 347027065
2020-12-09Add support for IP_RECVORIGDSTADDR IP option.Bhasker Hariharan
Fixes #5004 PiperOrigin-RevId: 346643745
2020-12-09[netstack] Make tcpip.Error savable.Ayush Ranjan
Earlier we could not save tcpip.Error objects in structs because upon restore the constant's address changes in netstack's error translation map and translating the error would panic because the map is based on the address of the tcpip.Error instead of the error itself. Now I made that translations map use the error message as key instead of the address. Added relevant synchronization mechanisms to protect the structure and initialize it upon restore. PiperOrigin-RevId: 346590485
2020-12-07Export IGMP statsArthur Sfez
PiperOrigin-RevId: 346197760
2020-12-02Extract ICMPv4/v6 specific stats to their own typesArthur Sfez
This change lets us split the v4 stats from the v6 stats, which will be useful when adding stats for each network endpoint. PiperOrigin-RevId: 345322615
2020-11-26[netstack] Add SOL_TCP options to SocketOptions.Ayush Ranjan
Ports the following options: - TCP_NODELAY - TCP_CORK - TCP_QUICKACK Also deletes the {Get/Set}SockOptBool interface methods from all implementations PiperOrigin-RevId: 344378824
2020-11-25[netstack] Add SOL_IP and SOL_IPV6 options to SocketOptions.Ayush Ranjan
We will use SocketOptions for all kinds of options, not just SOL_SOCKET options because (1) it is consistent with Linux which defines all option variables on the top level socket struct, (2) avoid code complexity. Appropriate checks have been added for matching option level to the endpoint type. Ported the following options to this new utility: - IP_MULTICAST_LOOP - IP_RECVTOS - IPV6_RECVTCLASS - IP_PKTINFO - IP_HDRINCL - IPV6_V6ONLY Changes in behavior (these are consistent with what Linux does AFAICT): - Now IP_MULTICAST_LOOP can be set for TCP (earlier it was a noop) but does not affect the endpoint itself. - We can now getsockopt IP_HDRINCL (earlier we would get an error). - Now we return ErrUnknownProtocolOption if SOL_IP or SOL_IPV6 options are used on unix sockets. - Now we return ErrUnknownProtocolOption if SOL_IPV6 options are used on non AF_INET6 endpoints. This change additionally makes the following modifications: - Add State() uint32 to commonEndpoint because both tcpip.Endpoint and transport.Endpoint interfaces have it. It proves to be quite useful. - Gets rid of SocketOptionsHandler.IsListening(). It was an anomaly as it was not a handler. It is now implemented on netstack itself. - Gets rid of tcp.endpoint.EndpointInfo and directly embeds stack.TransportEndpointInfo. There was an unnecessary level of embedding which served no purpose. - Removes some checks dual_stack_test.go that used the errors from GetSockOptBool(tcpip.V6OnlyOption) to confirm some state. This is not consistent with the new design and also seemed to be testing the implementation instead of behavior. PiperOrigin-RevId: 344354051
2020-11-25Support listener-side MLDv1Ghanan Gowripalan
...as defined by RFC 2710. Querier (router)-side MLDv1 is not yet supported. The core state machine is shared with IGMPv2. This is guarded behind a flag (ipv6.Options.MLDEnabled). Tests: ip_test.TestMGP* Bug #4861 PiperOrigin-RevId: 344344095
2020-11-24Extract IGMPv2 core state machineGhanan Gowripalan
The IGMPv2 core state machine can be shared with MLDv1 since they are almost identical, ignoring specific addresses, constants and packets. Bug #4682, #4861 PiperOrigin-RevId: 344102615
2020-11-19Perform IGMPv2 when joining IPv4 multicast groupsRyan Heacock
Added headers, stats, checksum parsing capabilities from RFC 2236 describing IGMPv2. IGMPv2 state machine is implemented for each condition, sending and receiving IGMP Membership Reports and Leave Group messages with backwards compatibility with IGMPv1 routers. Test: * Implemented igmp header parser and checksum calculator in header/igmp_test.go * ipv4/igmp_test.go tests incoming and outgoing IGMP messages and pathways. * Added unit test coverage for IGMPv2 RFC behavior + IGMPv1 backwards compatibility in ipv4/igmp_test.go. Fixes #4682 PiperOrigin-RevId: 343408809
2020-11-19Remove unused NoChecksumOptionBruno Dal Bo
Migration to unified socket options left this behind. PiperOrigin-RevId: 343305434
2020-11-18[netstack] Move SO_KEEPALIVE and SO_ACCEPTCONN option to SocketOptions.Ayush Ranjan
PiperOrigin-RevId: 343217712
2020-11-18[netstack] Move SO_REUSEPORT and SO_REUSEADDR option to SocketOptions.Ayush Ranjan
This changes also introduces: - `SocketOptionsHandler` interface which can be implemented by endpoints to handle endpoint specific behavior on SetSockOpt. This is analogous to what Linux does. - `DefaultSocketOptionsHandler` which is a default implementation of the above. This is embedded in all endpoints so that we don't have to uselessly implement empty functions. Endpoints with specific behavior can override the embedded method by manually defining its own implementation. PiperOrigin-RevId: 343158301
2020-11-18[netstack] Move SO_PASSCRED option to SocketOptions.Ayush Ranjan
This change also makes the following fixes: - Make SocketOptions use atomic operations instead of having to acquire/drop locks upon each get/set option. - Make documentation more consistent. - Remove tcpip.SocketOptions from socketOpsCommon because it already exists in transport.Endpoint. - Refactors get/set socket options tests to be easily extendable. PiperOrigin-RevId: 343103780
2020-11-12Refactor SOL_SOCKET optionsNayana Bidari
Store all the socket level options in a struct and call {Get/Set}SockOpt on this struct. This will avoid implementing socket level options on all endpoints. This CL contains implementing one socket level option for tcp and udp endpoints. PiperOrigin-RevId: 342203981
2020-11-12Move packet handling to NetworkEndpointGhanan Gowripalan
The NIC should not hold network-layer state or logic - network packet handling/forwarding should be performed at the network layer instead of the NIC. Fixes #4688 PiperOrigin-RevId: 342166985
2020-10-27Add support for Timestamp and RecordRoute IP optionsJulian Elischer
IPv4 options extend the size of the IP header and have a basic known format. The framework can process that format without needing to know about every possible option. We can add more code to handle additional option types as we need them. Bad options or mangled option entries can result in ICMP Parameter Problem packets. The first types we support are the Timestamp option and the Record Route option, included in this change. The options are processed at several points in the packet flow within the Network stack, with slightly different requirements. The framework includes a mechanism to control this at each point. Support has been added for such points which are only present in upcoming CLs such as during packet forwarding and fragmentation. With this change, 'ping -R' and 'ping -T' work against gVisor and Fuchsia. $ ping -R 192.168.1.2 PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data. 64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.990 ms NOP RR: 192.168.1.1 192.168.1.2 192.168.1.1 $ ping -T tsprespec 192.168.1.2 192.168.1.1 192.168.1.2 PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data. 64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.20 ms TS: 192.168.1.2 71486821 absolute 192.168.1.1 746 Unit tests included for generic options, Timestamp options and Record Route options. PiperOrigin-RevId: 339379076
2020-10-27Add basic address deletion to netlinkIan Lewis
Updates #3921 PiperOrigin-RevId: 339195417
2020-10-23Support getsockopt for SO_ACCEPTCONN.Nayana Bidari
The SO_ACCEPTCONN option is used only on getsockopt(). When this option is specified, getsockopt() indicates whether socket listening is enabled for the socket. A value of zero indicates that socket listening is disabled; non-zero that it is enabled. PiperOrigin-RevId: 338703206
2020-10-16Enable IPv6 WriteHeaderIncludedPacketGhanan Gowripalan
Allow writing an IPv6 packet where the IPv6 header is a provided by the user. * Introduce an error to let callers know a header is malformed. We previously useed tcpip.ErrInvalidOptionValue but that did not seem appropriate for generic malformed header errors. * Populate network header in WriteHeaderIncludedPacket IPv4's implementation of WriteHeaderIncludedPacket did not previously populate the packet buffer's network header. This change fixes that. Fixes #4527 Test: ip_test.TestWriteHeaderIncludedPacket PiperOrigin-RevId: 337534548
2020-09-29Don't allow broadcast/multicast source addressGhanan Gowripalan
As per relevant IP RFCS (see code comments), broadcast (for IPv4) and multicast addresses are not allowed. Currently checks for these are done at the transport layer, but since it is explicitly forbidden at the IP layers, check for them there. This change also removes the UDP.InvalidSourceAddress stat since there is no longer a need for it. Test: ip_test.TestSourceAddressValidation PiperOrigin-RevId: 334490971
2020-09-29Move IP state from NIC to NetworkEndpoint/ProtocolGhanan Gowripalan
* Add network address to network endpoints. Hold network-specific state in the NetworkEndpoint instead of the stack. This results in the stack no longer needing to "know" about the network endpoints and special case certain work for various endpoints (e.g. IPv6 DAD). * Provide NetworkEndpoints with an NetworkInterface interface. Instead of just passing the NIC ID of a NIC, pass an interface so the network endpoint may query other information about the NIC such as whether or not it is a loopback device. * Move NDP code and state to the IPv6 package. NDP is IPv6 specific so there is no need for it to live in the stack. * Control forwarding through NetworkProtocols instead of Stack Forwarding should be controlled on a per-network protocol basis so forwarding configurations are now controlled through network protocols. * Remove stack.referencedNetworkEndpoint. Now that addresses are exposed via AddressEndpoint and only one NetworkEndpoint is created per interface, there is no need for a referenced NetworkEndpoint. * Assume network teardown methods are infallible. Fixes #3871, #3916 PiperOrigin-RevId: 334319433
2020-09-26Remove generic ICMP errorsGhanan Gowripalan
Generic ICMP errors were required because the transport dispatcher was given the responsibility of sending ICMP errors in response to transport packet delivery failures. Instead, the transport dispatcher should let network layer know it failed to deliver a packet (and why) and let the network layer make the decision as to what error to send (if any). Fixes #4068 PiperOrigin-RevId: 333962333
2020-09-23Extract ICMP error sender from UDPJulian Elischer
Store transport protocol number on packet buffers for use in ICMP error generation. Updates #2211. PiperOrigin-RevId: 333252762