summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/stack
AgeCommit message (Collapse)Author
2019-11-06Use PacketBuffers, rather than VectorisedViews, in netstack.Kevin Krakauer
PacketBuffers are analogous to Linux's sk_buff. They hold all information about a packet, headers, and payload. This is important for: * iptables to access various headers of packets * Preventing the clutter of passing different net and link headers along with VectorisedViews to packet handling functions. This change only affects the incoming packet path, and a future change will change the outgoing path. Benchmark Regular PacketBufferPtr PacketBufferConcrete -------------------------------------------------------------------------------- BM_Recvmsg 400.715MB/s 373.676MB/s 396.276MB/s BM_Sendmsg 361.832MB/s 333.003MB/s 335.571MB/s BM_Recvfrom 453.336MB/s 393.321MB/s 381.650MB/s BM_Sendto 378.052MB/s 372.134MB/s 341.342MB/s BM_SendmsgTCP/0/1k 353.711MB/s 316.216MB/s 322.747MB/s BM_SendmsgTCP/0/2k 600.681MB/s 588.776MB/s 565.050MB/s BM_SendmsgTCP/0/4k 995.301MB/s 888.808MB/s 941.888MB/s BM_SendmsgTCP/0/8k 1.517GB/s 1.274GB/s 1.345GB/s BM_SendmsgTCP/0/16k 1.872GB/s 1.586GB/s 1.698GB/s BM_SendmsgTCP/0/32k 1.017GB/s 1.020GB/s 1.133GB/s BM_SendmsgTCP/0/64k 475.626MB/s 584.587MB/s 627.027MB/s BM_SendmsgTCP/0/128k 416.371MB/s 503.434MB/s 409.850MB/s BM_SendmsgTCP/0/256k 323.449MB/s 449.599MB/s 388.852MB/s BM_SendmsgTCP/0/512k 243.992MB/s 267.676MB/s 314.474MB/s BM_SendmsgTCP/0/1M 95.138MB/s 95.874MB/s 95.417MB/s BM_SendmsgTCP/0/2M 96.261MB/s 94.977MB/s 96.005MB/s BM_SendmsgTCP/0/4M 96.512MB/s 95.978MB/s 95.370MB/s BM_SendmsgTCP/0/8M 95.603MB/s 95.541MB/s 94.935MB/s BM_SendmsgTCP/0/16M 94.598MB/s 94.696MB/s 94.521MB/s BM_SendmsgTCP/0/32M 94.006MB/s 94.671MB/s 94.768MB/s BM_SendmsgTCP/0/64M 94.133MB/s 94.333MB/s 94.746MB/s BM_SendmsgTCP/0/128M 93.615MB/s 93.497MB/s 93.573MB/s BM_SendmsgTCP/0/256M 93.241MB/s 95.100MB/s 93.272MB/s BM_SendmsgTCP/1/1k 303.644MB/s 316.074MB/s 308.430MB/s BM_SendmsgTCP/1/2k 537.093MB/s 584.962MB/s 529.020MB/s BM_SendmsgTCP/1/4k 882.362MB/s 939.087MB/s 892.285MB/s BM_SendmsgTCP/1/8k 1.272GB/s 1.394GB/s 1.296GB/s BM_SendmsgTCP/1/16k 1.802GB/s 2.019GB/s 1.830GB/s BM_SendmsgTCP/1/32k 2.084GB/s 2.173GB/s 2.156GB/s BM_SendmsgTCP/1/64k 2.515GB/s 2.463GB/s 2.473GB/s BM_SendmsgTCP/1/128k 2.811GB/s 3.004GB/s 2.946GB/s BM_SendmsgTCP/1/256k 3.008GB/s 3.159GB/s 3.171GB/s BM_SendmsgTCP/1/512k 2.980GB/s 3.150GB/s 3.126GB/s BM_SendmsgTCP/1/1M 2.165GB/s 2.233GB/s 2.163GB/s BM_SendmsgTCP/1/2M 2.370GB/s 2.219GB/s 2.453GB/s BM_SendmsgTCP/1/4M 2.005GB/s 2.091GB/s 2.214GB/s BM_SendmsgTCP/1/8M 2.111GB/s 2.013GB/s 2.109GB/s BM_SendmsgTCP/1/16M 1.902GB/s 1.868GB/s 1.897GB/s BM_SendmsgTCP/1/32M 1.655GB/s 1.665GB/s 1.635GB/s BM_SendmsgTCP/1/64M 1.575GB/s 1.547GB/s 1.575GB/s BM_SendmsgTCP/1/128M 1.524GB/s 1.584GB/s 1.580GB/s BM_SendmsgTCP/1/256M 1.579GB/s 1.607GB/s 1.593GB/s PiperOrigin-RevId: 278940079
2019-11-06Validate incoming NDP Router Advertisements, as per RFC 4861 section 6.1.2Ghanan Gowripalan
This change validates incoming NDP Router Advertisements as per RFC 4861 section 6.1.2. It also includes the skeleton to handle Router Advertiements that arrive on some NIC. Tests: Unittest to make sure only valid NDP Router Advertisements are received/ not dropped. PiperOrigin-RevId: 278891972
2019-10-30Deep copy dispatcher views.Kevin Krakauer
When VectorisedViews were passed up the stack from packet_dispatchers, we were passing a sub-slice of the dispatcher's views fields. The dispatchers then immediately set those views to nil. This wasn't caught before because every implementer copied the data in these views before returning. PiperOrigin-RevId: 277615351
2019-10-30Store endpoints inside multiPortEndpoint in a sorted orderAndrei Vagin
It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665
2019-10-29Add Close and Wait methods to stack.Ian Gudger
Link endpoints still don't have a unified way to be requested to stop. Updates #837 PiperOrigin-RevId: 277398952
2019-10-29Add endpoint tracking to the stack.Ian Gudger
In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633
2019-10-29Allow waiting for Endpoint worker goroutines to finish.Ian Gudger
Updates #837 PiperOrigin-RevId: 277325162
2019-10-28Use the user supplied TCP MSS when creating a new active socketGhanan Gowripalan
This change supports using a user supplied TCP MSS for new active TCP connections. Note, the user supplied MSS must be less than or equal to the maximum possible MSS for a TCP connection's route. If it is greater than the maximum possible MSS, the maximum possible MSS will be used as the connection's MSS instead. This change does not use this user supplied MSS for connections accepted from listening sockets - that will come in a later change. Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user supplied MSS if it is not greater than the maximum possible MSS for the route. PiperOrigin-RevId: 277185125
2019-10-24Use interface-specific NDP configurations instead of the stack-wide default.Ghanan Gowripalan
This change makes it so that NDP work is done using the per-interface NDP configurations instead of the stack-wide default NDP configurations to correctly implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating as a host device), and RFC 4862 section 5.1. Test: Test that we can set NDP configurations on a per-interface basis without affecting the configurations of other interfaces or the stack-wide default. Also make sure that after the configurations are updated, the updated configurations are used for NDP processes (e.g. Duplicate Address Detection). PiperOrigin-RevId: 276525661
2019-10-23Inform netstack integrator when Duplicate Address Detection completesGhanan Gowripalan
This change introduces a new interface, stack.NDPDispatcher. It can be implemented by the netstack integrator to receive NDP related events. As of this change, only DAD related events are supported. Tests: Existing tests were modified to use the NDPDispatcher's DAD events for DAD tests where it needed to wait for DAD completing (failing and resolving). PiperOrigin-RevId: 276338733
2019-10-22Respect new PrimaryEndpointBehavior when addresses gets promoted to permanentGhanan Gowripalan
This change makes sure that when an address which is already known by a NIC and has kind = permanentExpired gets promoted to permanent, the new PrimaryEndpointBehavior is respected. PiperOrigin-RevId: 276136317
2019-10-22netstack/tcp: software segmentation offloadAndrei Vagin
Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows to generate multiple tcp packets and send them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prepandable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677
2019-10-22Auto-generate an IPv6 link-local address based on the NIC's MAC Address.Ghanan Gowripalan
This change adds support for optionally auto-generating an IPv6 link-local address based on the NIC's MAC Address on NIC enable. Note, this change will not break existing uses of netstack as the default configuration for the stack options is set in such a way that a link-local address will not be auto-generated unless the stack is explicitly configured. See `stack.Options` for more details. Specifically, see `stack.Options.AutoGenIPv6LinkLocal`. Tests: Tests to make sure that the IPb6 link-local address is only auto-generated if the stack is specifically configured to do so. Also tests to make sure that an auto-generated address goes through the DAD process. PiperOrigin-RevId: 276059813
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-10-18Store primary endpoints in a sliceTamir Duberstein
There's no need for a linked list here. PiperOrigin-RevId: 275565920
2019-10-18Remove restrictions on the sending addressTamir Duberstein
It is quite legal to send from the ANY address (it is required for DHCP). I can't figure out why the broadcast address was included here, so removing that as well. PiperOrigin-RevId: 275541954
2019-10-17NDP Neighbor Solicitations sent during DAD must have an IP hop limit of 255Ghanan Gowripalan
NDP Neighbor Solicitations sent during Duplicate Address Detection must have an IP hop limit of 255, as all NDP Neighbor Solicitations should have. Test: Test that DAD messages have the IPv6 hop limit field set to 255. PiperOrigin-RevId: 275321680
2019-10-16Do Duplicate Address Detection on permanent IPv6 addresses.Ghanan Gowripalan
This change adds support for Duplicate Address Detection on IPv6 addresses as defined by RFC 4862 section 5.4. Note, this change will not break existing uses of netstack as the default configuration for the stack options is set in such a way that DAD will not be performed. See `stack.Options` and `stack.NDPConfigurations` for more details. Tests: Tests to make sure that the DAD process properly resolves or fails. That is, tests make sure that DAD resolves only if: - No other node is performing DAD for the same address - No other node owns the same address PiperOrigin-RevId: 275189471
2019-10-15Set NDP hop limit in accordance with RFC 4861Tamir Duberstein
...and do not populate link address cache at dispatch. This partially reverts 313c767b0001bf6271405f1b765b60a334d6e911, which caused malformed packets (e.g. NDP Neighbor Adverts with incorrect hop limit values) to populate the address cache. In particular, this masked a bug that was introduced to the Neighbor Advert generation code in 7c1587e3401a010d1865df61dbaf117c77dd062e. PiperOrigin-RevId: 274865182
2019-10-14Internal change.gVisor bot
PiperOrigin-RevId: 274700093
2019-10-14Reorder BUILD license and load functions in netstack.Kevin Krakauer
PiperOrigin-RevId: 274672346
2019-10-09Internal change.gVisor bot
PiperOrigin-RevId: 273861936
2019-10-07Implement IP_TTL.Ian Gudger
Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341
2019-10-03Implement proper local broadcast behaviorChris Kuiper
The behavior for sending and receiving local broadcast (255.255.255.255) traffic is as follows: Outgoing -------- * A broadcast packet sent on a socket that is bound to an interface goes out that interface * A broadcast packet sent on an unbound socket follows the route table to select the outgoing interface + if an explicit route entry exists for 255.255.255.255/32, use that one + else use the default route * Broadcast packets are looped back and delivered following the rules for incoming packets (see next). This is the same behavior as for multicast packets, except that it cannot be disabled via sockopt. Incoming -------- * Sockets wishing to receive broadcast packets must bind to either INADDR_ANY (0.0.0.0) or INADDR_BROADCAST (255.255.255.255). No other socket receives broadcast packets. * Broadcast packets are multiplexed to all sockets matching it. This is the same behavior as for multicast packets. * A socket can bind to 255.255.255.255:<port> and then receive its own broadcast packets sent to 255.255.255.255:<port> In addition, this change implicitly fixes an issue with multicast reception. If two sockets want to receive a given multicast stream and one is bound to ANY while the other is bound to the multicast address, only one of them will receive the traffic. PiperOrigin-RevId: 272792377
2019-09-30Fix bugs in PickEphemeralPort for TCP.Bhasker Hariharan
Netstack always picks a random start point everytime PickEphemeralPort is called. While this is required for UDP so that DNS requests go out through a randomized set of ports it is not required for TCP. Infact Linux explicitly hashes the (srcip, dstip, dstport) and a one time secret initialized at start of the application to get a random offset. But to ensure it doesn't start from the same point on every scan it uses a static hint that is incremented by 2 in every call to pick ephemeral ports. The reason for 2 is Linux seems to split the port ranges where active connects seem to use even ones while odd ones are used by listening sockets. This CL implements a similar strategy where we use a hash + hint to generate the offset to start the search for a free Ephemeral port. This ensures that we cycle through the available port space in order for repeated connects to the same destination and significantly reduces the chance of picking a recently released port. PiperOrigin-RevId: 272058370
2019-09-27Implement SO_BINDTODEVICE sockoptgVisor bot
PiperOrigin-RevId: 271644926
2019-09-25Remove centralized registration of protocols.Kevin Krakauer
Also removes the need for protocol names. PiperOrigin-RevId: 271186030
2019-09-24Return only primary addresses in Stack.NICInfo()Chris Kuiper
Non-primary addresses are used for endpoints created to accept multicast and broadcast packets, as well as "helper" endpoints (0.0.0.0) that allow sending packets when no proper address has been assigned yet (e.g., for DHCP). These addresses are not real addresses from a user point of view and should not be part of the NICInfo() value. Also see b/127321246 for more info. This switches NICInfo() to call a new NIC.PrimaryAddresses() function. To still allow an option to get all addresses (mostly for testing) I added Stack.GetAllAddresses() and NIC.AllAddresses(). In addition, the return value for GetMainNICAddress() was changed for the case where the NIC has no primary address. Instead of returning an error here, it now returns an empty AddressWithPrefix() value. The rational for this change is that it is a valid case for a NIC to have no primary addresses. Lastly, I refactored the code based on the new additions. PiperOrigin-RevId: 270971764
2019-09-24Simplify ICMPRateLimiterTamir Duberstein
https://github.com/golang/time/commit/c4c64ca added SetBurst upstream. PiperOrigin-RevId: 270925077
2019-09-23netstack: convert more socket options to {Set,Get}SockOptIntAndrei Vagin
PiperOrigin-RevId: 270763208
2019-09-20Allow waiting for LinkEndpoint worker goroutines to finish.Ian Gudger
Previously, the only safe way to use an fdbased endpoint was to leak the FD. This change makes it possible to safely close the FD. This is the first step towards having stoppable stacks. Updates #837 PiperOrigin-RevId: 270346582
2019-09-17Automated rollback of changelist 268047073Ghanan Gowripalan
PiperOrigin-RevId: 269658971
2019-09-12Implement splice methods for pipes and sockets.Adin Scannell
This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructoring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elide locks and use the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207
2019-09-12Remove go_test from go_stateify and go_marshalMichael Pratt
They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-09-12Automated rollback of changelist 268047073Ghanan Gowripalan
PiperOrigin-RevId: 268757842
2019-09-09Join IPv6 all-nodes and solicited-node multicast addresses where appropriate.Ghanan Gowripalan
The IPv6 all-nodes multicast address will be joined on NIC enable, and the appropriate IPv6 solicited-node multicast address will be joined when IPv6 addresses are added. Tests: Test receiving packets destined to the IPv6 link-local all-nodes multicast address and the IPv6 solicted node address of an added IPv6 address. PiperOrigin-RevId: 268047073
2019-09-06Remove reundant global tcpip.LinkEndpointID.Ian Gudger
PiperOrigin-RevId: 267709597
2019-09-04Handle subnet and broadcast addresses correctly with NIC.subnetsChris Kuiper
This also renames "subnet" to "addressRange" to avoid any more confusion with an interface IP's subnet. Lastly, this also removes the Stack.ContainsSubnet(..) API since it isn't used by anyone. Plus the same information can be obtained from Stack.NICAddressRanges(). PiperOrigin-RevId: 267229843
2019-09-03Make UDP traceroute work.Bhasker Hariharan
Adds support to generate Port Unreachable messages for UDP datagrams received on a port for which there is no valid endpoint. Fixes #703 PiperOrigin-RevId: 267034418
2019-08-30Fix data race accessing referencedNetworkEndpoint.kindChris Kuiper
Wrapping "kind" into atomic access functions. Fixes #789 PiperOrigin-RevId: 266485501
2019-08-28Export generated linkAddrEntryEntryTamir Duberstein
PiperOrigin-RevId: 266000128
2019-08-27Populate link address cache at dispatchTamir Duberstein
This allows the stack to learn remote link addresses on incoming packets, reducing the need to ARP to send responses. This also reduces the number of round trips to the system clock, since that may also prove to be performance-sensitive. Fixes #739. PiperOrigin-RevId: 265815816
2019-08-26Prevent a network endpoint to send/rcv if its address was removedChris Kuiper
This addresses the problem where an endpoint has its address removed but still has outstanding references held by routes used in connected TCP/UDP sockets which prevent the removal of the endpoint. The fix adds a new "expired" flag to the referenced network endpoint, which is set when an endpoint has its address removed. Incoming packets are not delivered to an expired endpoint (unless in promiscuous mode), while sending outgoing packets triggers an error to the caller (unless in spoofing mode). In addition, a few helper functions were added to stack_test.go to reduce code duplications. PiperOrigin-RevId: 265514326
2019-08-21Use tcpip.Subnet in tcpip.RouteTamir Duberstein
This is the first step in replacing some of the redundant types with the standard library equivalents. PiperOrigin-RevId: 264706552
2019-08-16netstack: disconnect an unix socket only if the address family is AF_UNSPECAndrei Vagin
Linux allows to call connect for ANY and the zero port. PiperOrigin-RevId: 263892534
2019-08-16Add subnet checking to NIC.findEndpoint and consolidate with NIC.getRefChris Kuiper
This adds the same logic to NIC.findEndpoint that is already done in NIC.getRef. Since this makes the two functions very similar they were combined into one with the originals being wrappers. PiperOrigin-RevId: 263864708
2019-08-14Replace uinptr with int64 when returning lengthsTamir Duberstein
This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916
2019-08-08netstack: Don't start endpoint goroutines too soon on restore.Rahat Mahmood
Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutine may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429
2019-08-02Plumbing for iptables sockopts.Kevin Krakauer
PiperOrigin-RevId: 261413396
2019-08-02Automated rollback of changelist 261191548Rahat Mahmood
PiperOrigin-RevId: 261373749