summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/stack/stack.go
AgeCommit message (Collapse)Author
2020-07-15Fix minor bugs in a couple of interface IOCTLs.Bhasker Hariharan
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks tcpdump as it tries to interpret the packets incorrectly. Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which fails with an EINVAL since we don't implement it. For now change it to return EOPNOTSUPP to indicate that we don't support the query rather than return EINVAL. NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type field while NIC capabilities are more like the device features which can be queried using SIOCETHTOOL but not modified and NIC Flags are fields that can be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc. Updates #2746 PiperOrigin-RevId: 321436525
2020-07-13garbage collect connectionsKevin Krakauer
As in Linux, we must periodically clean up unused connections. PiperOrigin-RevId: 321003353
2020-06-24Add support for Stack level options.Bhasker Hariharan
Linux controls socket send/receive buffers using a few sysctl variables - net.core.rmem_default - net.core.rmem_max - net.core.wmem_max - net.core.wmem_default - net.ipv4.tcp_rmem - net.ipv4.tcp_wmem The first 4 control the default socket buffer sizes for all sockets raw/packet/tcp/udp and also the maximum permitted socket buffer that can be specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF,...). The last two control the TCP auto-tuning limits and override the default specified in rmem_default/wmem_default as well as the max limits. Netstack today only implements tcp_rmem/tcp_wmem and incorrectly uses it to limit the maximum size in setsockopt() as well as uses it for raw/udp sockets. This changelist introduces the other 4 and updates the udp/raw sockets to use the newly introduced variables. The values for min/max match the current tcp_rmem/wmem values and the default value buffers for UDP/RAW sockets is updated to match the linux value of 212KiB up from the really low current value of 32 KiB. Updates #3043 Fixes #3043 PiperOrigin-RevId: 318089805
2020-06-23Add support for SO_REUSEADDR to TCP sockets/endpoints.Ian Gudger
For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses. gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did not allow disabling it. This change brings the SO_REUSEADDR behavior closer to the behavior implemented by Linux and adds a new SO_REUSEADDR disabled behavior. Like Linux, SO_REUSEADDR is now disabled by default. PiperOrigin-RevId: 317984380
2020-06-18Cleanup tcp.timer and tcpip.RouteGhanan Gowripalan
When a tcp.timer or tcpip.Route is no longer used, clean up its resources so that unused memory may be released. PiperOrigin-RevId: 317046582
2020-06-10Add support for SO_REUSEADDR to UDP sockets/endpoints.Ian Gudger
On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same address, but only delivers packets to the most recently bound socket. This differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP sockets will likely need an almost completely independent implementation. SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These interactions are tested fairly extensively and all but one particularly odd one (that honestly seems like a bug) behave the same on gVisor and Linux. PiperOrigin-RevId: 315844832
2020-06-09Handle removed NIC in NDP timer for packet txGhanan Gowripalan
NDP packets are sent periodically from NDP timers. These timers do not hold the NIC lock when sending packets as the packet write operation may take some time. While the lock is not held, the NIC may be removed by some other goroutine. This change handles that scenario gracefully. Test: stack_test.TestRemoveNICWhileHandlingRSTimer PiperOrigin-RevId: 315524143
2020-06-05Fix copylocks error about copying IPTables.Ting-Yu Wang
IPTables.connections contains a sync.RWMutex. Copying it will trigger copylocks analysis. Tested by manually enabling nogo tests. sync.RWMutex is added to IPTables for the additional race condition discovered. PiperOrigin-RevId: 314817019
2020-06-03Pass PacketBuffer as pointer.Ting-Yu Wang
Historically we've been passing PacketBuffer by shallow copying through out the stack. Right now, this is only correct as the caller would not use PacketBuffer after passing into the next layer in netstack. With new buffer management effort in gVisor/netstack, PacketBuffer will own a Buffer (to be added). Internally, both PacketBuffer and Buffer may have pointers and shallow copying shouldn't be used. Updates #2404. PiperOrigin-RevId: 314610879
2020-05-15Minor formatting updates for gvisor.dev.Adin Scannell
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense in one place. * Drop "user-space kernel" and use "application kernel". The term "user-space kernel" is confusing when some platform implementation do not run in user-space (instead running in guest ring zero). * Clear up the relationship between the Platform page in the user guide and the Platform page in the architecture guide, and ensure they are cross-linked. * Restore the call-to-action quick start link in the main page, and drop the GitHub link (which also appears in the top-right). * Improve image formatting by centering all doc and blog images, and move the image captions to the alt text. PiperOrigin-RevId: 311845158
2020-05-08iptables - filter packets using outgoing interface.gVisor bot
Enables commands with -o (--out-interface) for iptables rules. $ iptables -A OUTPUT -o eth0 -j ACCEPT PiperOrigin-RevId: 310642286
2020-05-01Support for connection tracking of TCP packets.Nayana Bidari
Connection tracking is used to track packets in prerouting and output hooks of iptables. The NAT rules modify the tuples in connections. The connection tracking code modifies the packets by looking at the modified tuples.
2020-04-28Support IPv6 Privacy Extensions for SLAACGhanan Gowripalan
Support generating temporary (short-lived) IPv6 SLAAC addresses to address privacy concerns outlined in RFC 4941. Tests: - stack_test.TestAutoGenTempAddr - stack_test.TestNoAutoGenTempAddrForLinkLocal - stack_test.TestAutoGenTempAddrRegen - stack_test.TestAutoGenTempAddrRegenTimerUpdates - stack_test.TestNoAutoGenTempAddrWithoutStableAddr - stack_test.TestAutoGenAddrInResponseToDADConflicts PiperOrigin-RevId: 308915566
2020-03-24Add support for setting TCP segment hash.Bhasker Hariharan
This allows the link layer endpoints to consistenly hash a TCP segment to a single underlying queue in case a link layer endpoint does support multiple underlying queues. Updates #231 PiperOrigin-RevId: 302760664
2020-03-24Move tcpip.PacketBuffer and IPTables to stack package.Bhasker Hariharan
This is a precursor to be being able to build an intrusive list of PacketBuffers for use in queuing disciplines being implemented. Updates #2214 PiperOrigin-RevId: 302677662
2020-03-10The packet forwarding should resolve the link address if necessary.gVisor bot
Fixes #1510 Test: - stack_test.TestForwardingWithStaticResolver - stack_test.TestForwardingWithFakeResolver - stack_test.TestForwardingWithNoResolver - stack_test.TestForwardingWithFakeResolverPartialTimeout - stack_test.TestForwardingWithFakeResolverTwoPackets - stack_test.TestForwardingWithFakeResolverManyPackets - stack_test.TestForwardingWithFakeResolverManyResolutions PiperOrigin-RevId: 300182570
2020-03-03Fix datarace on TransportEndpointInfo.ID and clean up semantics.Ian Gudger
Ensures that all access to TransportEndpointInfo.ID is either: * In a function ending in a Locked suffix. * While holding the appropriate mutex. This primary affects the checkV4Mapped method on affected endpoints, which has been renamed to checkV4MappedLocked. Also document the method and change its argument to be a value instead of a pointer which had caused some awkwardness. This race was possible in the udp and icmp endpoints between Connect and uses of TransportEndpointInfo.ID including in both itself and Bind. The tcp endpoint did not suffer from this bug, but benefited from better documentation. Updates #357 PiperOrigin-RevId: 298682913
2020-02-24Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.Ian Gudger
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to test this change. Also fix a race when a socket in SYN-RCVD is closed. This is also required to test this change. PiperOrigin-RevId: 296922548
2020-02-21Implement tap/tun device in vfs.Ting-Yu Wang
PiperOrigin-RevId: 296526279
2020-02-21Attach LinkEndpoint to NetworkDispatcher immediatelyGhanan Gowripalan
Tests: stack_test.TestAttachToLinkEndpointImmediately PiperOrigin-RevId: 296474068
2020-02-20Support disabling a NICgVisor bot
- Disabled NICs will have their associated NDP state cleared. - Disabled NICs will not accept incoming packets. - Writes through a Route with a disabled NIC will return an invalid endpoint state error. - stack.Stack.FindRoute will not return a route with a disabled NIC. - NIC's Running flag will report the NIC's enabled status. Tests: - stack_test.TestDisableUnknownNIC - stack_test.TestDisabledNICsNICInfoAndCheckNIC - stack_test.TestRoutesWithDisabledNIC - stack_test.TestRouteWritePacketWithDisabledNIC - stack_test.TestStopStartSolicitingRouters - stack_test.TestCleanupNDPState - stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable - stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable PiperOrigin-RevId: 296298588
2020-02-11Disallow duplicate NIC names.gVisor bot
PiperOrigin-RevId: 294500858
2020-02-04Support RTM_NEWADDR and RTM_GETLINK in (rt)netlink.Ting-Yu Wang
PiperOrigin-RevId: 293271055
2020-01-21Merge pull request #1558 from kevinGC:iptables-write-input-dropgVisor bot
PiperOrigin-RevId: 290793754
2020-01-15Solicit IPv6 routers when a NIC becomes enabled as a hostGhanan Gowripalan
This change adds support to send NDP Router Solicitation messages when a NIC becomes enabled as a host, as per RFC 4861 section 6.3.7. Note, Router Solicitations will only be sent when the stack has forwarding disabled. Tests: Unittests to make sure that the initial Router Solicitations are sent as configured. The tests also validate the sent Router Solicitations' fields. PiperOrigin-RevId: 289964095
2020-01-14Implement {g,s}etsockopt(IP_RECVTOS) for UDP socketsTamir Duberstein
PiperOrigin-RevId: 289718534
2020-01-13Do Source Address Selection when choosing an IPv6 source addressGhanan Gowripalan
Do Source Address Selection when choosing an IPv6 source address as per RFC 6724 section 5 rules 1-3: 1) Prefer same address 2) Prefer appropriate scope 3) Avoid deprecated addresses. A later change will update Source Address Selection to follow rules 4-8. Tests: Rule 1 & 2: stack.TestIPv6SourceAddressSelectionScopeAndSameAddress, Rule 3: stack.TestAutoGenAddrTimerDeprecation, stack.TestAutoGenAddrDeprecateFromPI PiperOrigin-RevId: 289559373
2020-01-13Allow dual stack sockets to operate on AF_INETTamir Duberstein
Fixes #1490 Fixes #1495 PiperOrigin-RevId: 289523250
2020-01-09New sync package.Ian Gudger
* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2020-01-09Change BindToDeviceOption to store NICIDEyal Soha
This makes it possible to call the sockopt from go even when the NIC has no name. PiperOrigin-RevId: 288955236
2020-01-09Allow clients to store an opaque NICContext with NICsBert Muthalaly
...retrievable later via stack.NICInfo(). Clients of this library can use it to add metadata that should be tracked alongside a NIC, to avoid having to keep a map[tcpip.NICID]metadata mirroring stack.Stack's nic map. PiperOrigin-RevId: 288924900
2020-01-08Combine various Create*NIC methods into CreateNICWithOptions.Bert Muthalaly
PiperOrigin-RevId: 288779416
2020-01-08Add NIC.isLoopback()Bert Muthalaly
...enabling us to remove the "CreateNamedLoopbackNIC" variant of CreateNIC and all the plumbing to connect it through to where the value is read in FindRoute. PiperOrigin-RevId: 288713093
2020-01-07Support deprecating SLAAC addresses after the preferred lifetimeGhanan Gowripalan
Support deprecating network endpoints on a NIC. If an endpoint is deprecated, it should not be used for new connections unless a more preferred endpoint is not available, or unless the deprecated endpoint was explicitly requested. Test: Test that deprecated endpoints are only returned when more preferred endpoints are not available and SLAAC addresses are deprecated after its preferred lifetime PiperOrigin-RevId: 288562705
2020-01-07Disable auto-generation of IPv6 link-local addresses for loopback NICsGhanan Gowripalan
Test: Test that an IPv6 link-local address is not auto-generated for loopback NICs, even when it is enabled for non-loopback NICS. PiperOrigin-RevId: 288519591
2020-01-06Pass the NIC-internal name to the NIC name function when generating opaque IIDsGhanan Gowripalan
Pass the NIC-internal name to the NIC name function when generating opaque IIDs so implementations can use the name that was provided when the NIC was created. Previously, explicit NICID to NIC name resolution was required from the netstack integrator. Tests: Test that the name provided when creating a NIC is passed to the NIC name function when generating opaque IIDs. PiperOrigin-RevId: 288395359
2020-01-03Support generating opaque interface identifiers as defined by RFC 7217Ghanan Gowripalan
Support generating opaque interface identifiers as defined by RFC 7217 for auto-generated IPv6 link-local addresses. Opaque interface identifiers will also be used for IPv6 addresses auto-generated via SLAAC in a later change. Note, this change does not handle retries in response to DAD conflicts yet. That will also come in a later change. Tests: Test that when configured to generated opaque IIDs, they are properly generated as outlined by RFC 7217. PiperOrigin-RevId: 288035349
2019-12-26Automated rollback of changelist 287029703gVisor bot
PiperOrigin-RevId: 287217899
2019-12-24Enable IP_RECVTOS socket option for datagram socketsRyan Heacock
Added the ability to get/set the IP_RECVTOS socket option on UDP endpoints. If enabled, TOS from the incoming Network Header passed as ancillary data in the ControlMessages. Test: * Added unit test to udp_test.go that tests getting/setting as well as verifying that we receive expected TOS from incoming packet. * Added a syscall test PiperOrigin-RevId: 287029703
2019-12-23Clear any host-specific NDP state when becoming a routerGhanan Gowripalan
This change supports clearing all host-only NDP state when NICs become routers. All discovered routers, discovered on-link prefixes and auto-generated addresses will be invalidated when becoming a router. This is because normally, routers do not process Router Advertisements to discover routers or on-link prefixes, and do not do SLAAC. Tests: Unittest to make sure that all discovered routers, discovered prefixes and auto-generated addresses get invalidated when transitioning from a host to a router. PiperOrigin-RevId: 286902309
2019-11-14Use PacketBuffers for outgoing packets.Kevin Krakauer
PiperOrigin-RevId: 280455453
2019-11-07Add support for TIME_WAIT timeout.Bhasker Hariharan
This change adds explicit support for honoring the 2MSL timeout for sockets in TIME_WAIT state. It also adds support for the TCP_LINGER2 option that allows modification of the FIN_WAIT2 state timeout duration for a given socket. It also adds an option to modify the Stack wide TIME_WAIT timeout but this is only for testing. On Linux this is fixed at 60s. Further, we also now correctly process RST's in CLOSE_WAIT and close the socket similar to linux without moving it to error state. We also now handle SYN in ESTABLISHED state as per RFC5961#section-4.1. Earlier we would just drop these SYNs. Which can result in some tests that pass on linux to fail on gVisor. Netstack now honors TIME_WAIT correctly as well as handles the following cases correctly. - TCP RSTs in TIME_WAIT are ignored. - A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT and a dup ACK is sent in response to the FIN as the dup FIN indicates potential loss of the original final ACK. - An out of order segment during TIME_WAIT generates a dup ACK. - A new SYN w/ a sequence number > the highest sequence number in the previous connection closes the TIME_WAIT early and opens a new connection. Further to make the SYN case work correctly the ISN (Initial Sequence Number) generation for Netstack has been updated to be as per RFC. Its not a pure random number anymore and follows the recommendation in https://tools.ietf.org/html/rfc6528#page-3. The current hash used is not a cryptographically secure hash function. A separate change will update the hash function used to Siphash similar to what is used in Linux. PiperOrigin-RevId: 279106406
2019-11-06Rename nicid to nicID to follow go-readability initialismsGhanan Gowripalan
https://github.com/golang/go/wiki/CodeReviewComments#initialisms This change does not introduce any new functionality. It just renames variables from `nicid` to `nicID`. PiperOrigin-RevId: 278992966
2019-11-06Use PacketBuffers, rather than VectorisedViews, in netstack.Kevin Krakauer
PacketBuffers are analogous to Linux's sk_buff. They hold all information about a packet, headers, and payload. This is important for: * iptables to access various headers of packets * Preventing the clutter of passing different net and link headers along with VectorisedViews to packet handling functions. This change only affects the incoming packet path, and a future change will change the outgoing path. Benchmark Regular PacketBufferPtr PacketBufferConcrete -------------------------------------------------------------------------------- BM_Recvmsg 400.715MB/s 373.676MB/s 396.276MB/s BM_Sendmsg 361.832MB/s 333.003MB/s 335.571MB/s BM_Recvfrom 453.336MB/s 393.321MB/s 381.650MB/s BM_Sendto 378.052MB/s 372.134MB/s 341.342MB/s BM_SendmsgTCP/0/1k 353.711MB/s 316.216MB/s 322.747MB/s BM_SendmsgTCP/0/2k 600.681MB/s 588.776MB/s 565.050MB/s BM_SendmsgTCP/0/4k 995.301MB/s 888.808MB/s 941.888MB/s BM_SendmsgTCP/0/8k 1.517GB/s 1.274GB/s 1.345GB/s BM_SendmsgTCP/0/16k 1.872GB/s 1.586GB/s 1.698GB/s BM_SendmsgTCP/0/32k 1.017GB/s 1.020GB/s 1.133GB/s BM_SendmsgTCP/0/64k 475.626MB/s 584.587MB/s 627.027MB/s BM_SendmsgTCP/0/128k 416.371MB/s 503.434MB/s 409.850MB/s BM_SendmsgTCP/0/256k 323.449MB/s 449.599MB/s 388.852MB/s BM_SendmsgTCP/0/512k 243.992MB/s 267.676MB/s 314.474MB/s BM_SendmsgTCP/0/1M 95.138MB/s 95.874MB/s 95.417MB/s BM_SendmsgTCP/0/2M 96.261MB/s 94.977MB/s 96.005MB/s BM_SendmsgTCP/0/4M 96.512MB/s 95.978MB/s 95.370MB/s BM_SendmsgTCP/0/8M 95.603MB/s 95.541MB/s 94.935MB/s BM_SendmsgTCP/0/16M 94.598MB/s 94.696MB/s 94.521MB/s BM_SendmsgTCP/0/32M 94.006MB/s 94.671MB/s 94.768MB/s BM_SendmsgTCP/0/64M 94.133MB/s 94.333MB/s 94.746MB/s BM_SendmsgTCP/0/128M 93.615MB/s 93.497MB/s 93.573MB/s BM_SendmsgTCP/0/256M 93.241MB/s 95.100MB/s 93.272MB/s BM_SendmsgTCP/1/1k 303.644MB/s 316.074MB/s 308.430MB/s BM_SendmsgTCP/1/2k 537.093MB/s 584.962MB/s 529.020MB/s BM_SendmsgTCP/1/4k 882.362MB/s 939.087MB/s 892.285MB/s BM_SendmsgTCP/1/8k 1.272GB/s 1.394GB/s 1.296GB/s BM_SendmsgTCP/1/16k 1.802GB/s 2.019GB/s 1.830GB/s BM_SendmsgTCP/1/32k 2.084GB/s 2.173GB/s 2.156GB/s BM_SendmsgTCP/1/64k 2.515GB/s 2.463GB/s 2.473GB/s BM_SendmsgTCP/1/128k 2.811GB/s 3.004GB/s 2.946GB/s BM_SendmsgTCP/1/256k 3.008GB/s 3.159GB/s 3.171GB/s BM_SendmsgTCP/1/512k 2.980GB/s 3.150GB/s 3.126GB/s BM_SendmsgTCP/1/1M 2.165GB/s 2.233GB/s 2.163GB/s BM_SendmsgTCP/1/2M 2.370GB/s 2.219GB/s 2.453GB/s BM_SendmsgTCP/1/4M 2.005GB/s 2.091GB/s 2.214GB/s BM_SendmsgTCP/1/8M 2.111GB/s 2.013GB/s 2.109GB/s BM_SendmsgTCP/1/16M 1.902GB/s 1.868GB/s 1.897GB/s BM_SendmsgTCP/1/32M 1.655GB/s 1.665GB/s 1.635GB/s BM_SendmsgTCP/1/64M 1.575GB/s 1.547GB/s 1.575GB/s BM_SendmsgTCP/1/128M 1.524GB/s 1.584GB/s 1.580GB/s BM_SendmsgTCP/1/256M 1.579GB/s 1.607GB/s 1.593GB/s PiperOrigin-RevId: 278940079
2019-11-06Validate incoming NDP Router Advertisements, as per RFC 4861 section 6.1.2Ghanan Gowripalan
This change validates incoming NDP Router Advertisements as per RFC 4861 section 6.1.2. It also includes the skeleton to handle Router Advertiements that arrive on some NIC. Tests: Unittest to make sure only valid NDP Router Advertisements are received/ not dropped. PiperOrigin-RevId: 278891972
2019-10-30Store endpoints inside multiPortEndpoint in a sorted orderAndrei Vagin
It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665
2019-10-29Add Close and Wait methods to stack.Ian Gudger
Link endpoints still don't have a unified way to be requested to stop. Updates #837 PiperOrigin-RevId: 277398952
2019-10-29Add endpoint tracking to the stack.Ian Gudger
In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633
2019-10-24Use interface-specific NDP configurations instead of the stack-wide default.Ghanan Gowripalan
This change makes it so that NDP work is done using the per-interface NDP configurations instead of the stack-wide default NDP configurations to correctly implement RFC 4861 section 6.3.2 (note here, a host is a single NIC operating as a host device), and RFC 4862 section 5.1. Test: Test that we can set NDP configurations on a per-interface basis without affecting the configurations of other interfaces or the stack-wide default. Also make sure that after the configurations are updated, the updated configurations are used for NDP processes (e.g. Duplicate Address Detection). PiperOrigin-RevId: 276525661
2019-10-23Inform netstack integrator when Duplicate Address Detection completesGhanan Gowripalan
This change introduces a new interface, stack.NDPDispatcher. It can be implemented by the netstack integrator to receive NDP related events. As of this change, only DAD related events are supported. Tests: Existing tests were modified to use the NDPDispatcher's DAD events for DAD tests where it needed to wait for DAD completing (failing and resolving). PiperOrigin-RevId: 276338733