summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2021-01-07netstack: Refactor tcpip.Endpoint.ReadTing-Yu Wang
Read now takes a destination io.Writer, count, options. Keeping the method name Read, in contrast to the Write method. This enables: * direct transfer of views under VV * zero copy It also eliminates the need for sentry to keep a slice of view because userspace had requested a read that is smaller than the view returned, removing the complexity there. Read/Peek/ReadPacket are now consolidated together and some duplicate code is removed. PiperOrigin-RevId: 350636322
2021-01-06Do not filter frames in ethernet link endpointGhanan Gowripalan
Ethernet frames are usually filtered at the hardware-level so there is no need to filter the frames in software. For test purposes, a new link endpoint was introduced to filter frames based on their destination. PiperOrigin-RevId: 350422941
2021-01-06Support add/remove IPv6 multicast group sock optGhanan Gowripalan
IPv4 was always supported but UDP never supported joining/leaving IPv6 multicast groups via socket options. Add: IPPROTO_IPV6, IPV6_JOIN_GROUP/IPV6_ADD_MEMBERSHIP Remove: IPPROTO_IPV6, IPV6_LEAVE_GROUP/IPV6_DROP_MEMBERSHIP Test: integration_test.TestUDPAddRemoveMembershipSocketOption PiperOrigin-RevId: 350396072
2020-12-22Move SO_BINDTODEVICE to socketops.Nayana Bidari
PiperOrigin-RevId: 348696094
2020-12-22Correctly log sniffed ARP packetsTamir Duberstein
This condition was inverted in 360006d. PiperOrigin-RevId: 348679088
2020-12-22Invoke address resolution upon subsequent traffic to Failed neighborPeter Johnston
Removes the period of time in which subseqeuent traffic to a Failed neighbor immediately fails with ErrNoLinkAddress. A Failed neighbor is one in which address resolution fails; or in other words, the neighbor's IP address cannot be translated to a MAC address. This means removing the Failed state for linkAddrCache and allowing transitiong out of Failed into Incomplete for neighborCache. Previously, both caches would transition entries to Failed after address resolution fails. In this state, any subsequent traffic requested within an unreachable time would immediately fail with ErrNoLinkAddress. This does not follow RFC 4861 section 7.3.3: If address resolution fails, the entry SHOULD be deleted, so that subsequent traffic to that neighbor invokes the next-hop determination procedure again. Invoking next-hop determination at this point ensures that alternate default routers are tried. The API for getting a link address for a given address, whether through the link address cache or the neighbor table, is updated to optionally take a callback which will be called when address resolution completes. This allows `Route` to handle completing link resolution internally, so callers of (*Route).Resolve (e.g. endpoints) don’t have to keep track of when it completes and update the Route accordingly. This change also removes the wakers from LinkAddressCache, NeighborCache, and Route in favor of the callbacks, and callers that previously used a waker can now just pass a callback to (*Route).Resolve that will notify the waker on resolution completion. Fixes #4796 Startblock: has LGTM from sbalana and then add reviewer ghanan PiperOrigin-RevId: 348597478
2020-12-21Prefer matching labels and longest matching prefixGhanan Gowripalan
...when performing source address selection for IPv6. These are defined in RFC 6724 section 5 rule 6 (prefer matching label) and rule 8 (use longest matching prefix). This change also considers ULA of global scope instead of its own scope, as per RFC 6724 section 3.1: Also, note that ULAs are considered as global, not site-local, scope but are handled via the prefix policy table as discussed in Section 10.6. Test: stack_test.TestIPv6SourceAddressSelectionScope Startblock: has LGTM from peterjohnston and then add reviewer brunodalbo PiperOrigin-RevId: 348580996
2020-12-21Don't modify a packet header when it can be used by other endpointsAndrei Vagin
Reported-by: syzbot+48c43f82fe7738fceae9@syzkaller.appspotmail.com PiperOrigin-RevId: 348540796
2020-12-21RLock Endpoint in raw.Endpoint.HandlePacketKevin Krakauer
PiperOrigin-RevId: 348530530
2020-12-17[netstack] Implement IP(V6)_RECVERR socket option.Ayush Ranjan
PiperOrigin-RevId: 348055514
2020-12-17[netstack] Implement MSG_ERRQUEUE flag for recvmsg(2).Ayush Ranjan
Introduces the per-socket error queue and the necessary cmsg mechanisms. PiperOrigin-RevId: 348028508
2020-12-17Remove duplicate `return`Tamir Duberstein
PiperOrigin-RevId: 347974624
2020-12-16Cleanup locking in multicast group protocol testsGhanan Gowripalan
Startblock: has LGTM from asfez and then add reviewer tamird PiperOrigin-RevId: 347928471
2020-12-16Automated rollback of changelist 346565589gVisor bot
PiperOrigin-RevId: 347911316
2020-12-16Add support to count the number of packets SACKed.Nayana Bidari
sacked_out is required in RACK to check the number of duplicate acknowledgements during updating the reorder window. If there is no reordering and the value for sacked_out is greater than the classic threshold value 3, then reorder window is set to zero. It is calculated by counting the number of segments sacked in the ACK and is reduced when a cumulative ACK is received which covers the SACK blocks. This value is set to zero when the connection enters recovery. PiperOrigin-RevId: 347872246
2020-12-16Ensure correctness of saved receive windowMithun Iyer
When the scaled receive window size > 65535 (max uint16), we advertise the scaled value as 65535, but are not adjusting the saved receive window value when doing so. This would keep our current window calculation logic to be incorrect, as the saved receive window value is different from what was advertised. Fixes #4903 PiperOrigin-RevId: 347771340
2020-12-15Validate router alert's data lengthGhanan Gowripalan
RFC 2711 specifies that the router alert's length field is always 2 so we should make sure only 2 bytes are read from a router alert option's data field. Test: header.TestIPv6OptionsExtHdrIterErr PiperOrigin-RevId: 347727876
2020-12-15Don't split enabled flag across multicast group stateGhanan Gowripalan
Startblock: has LGTM from asfez and then add reviewer brunodalbo PiperOrigin-RevId: 347716242
2020-12-15Fix error code for connect in raw sockets.Nayana Bidari
PiperOrigin-RevId: 347650354
2020-12-15Fix a data race in packetEPsTing-Yu Wang
packetEPs may get into a state that `len < cap`, casuing append() modifying the original slice storage. Reported-by: syzbot+978dd0e9c2600ab7a76b@syzkaller.appspotmail.com PiperOrigin-RevId: 347634351
2020-12-14Move SO_LINGER option to socketops.Nayana Bidari
PiperOrigin-RevId: 347437786
2020-12-14Move SO_ERROR and SO_OOBINLINE option to socketops.Nayana Bidari
SO_OOBINLINE option is set/get as boolean value, which is the same as linux. As we currently do not support disabling this option, we always return it as true. PiperOrigin-RevId: 347413905
2020-12-12Reduce the memory overhead in IP fragment managementToshi Kikuchi
- Deep-copy pkt.Data and hold it instead of shallow-copy (vv.Clone). This allows the pkt's backing array, which includes the header portion, to be freed. - Remove fragHeap. The fragments are now held in holes struct instead. - Stop reserving the initial capacity of holes slice. PiperOrigin-RevId: 347198744
2020-12-12Introduce IPv6 extension header serialization facilitiesBruno Dal Bo
Adds IPv6 extension header serializer and Hop by Hop options serializer. Add RouterAlert option serializer and use it in MLD. Fixed #4996 Startblock: has LGTM from marinaciocea and then add reviewer ghanan PiperOrigin-RevId: 347174537
2020-12-11Fix panic when IPv4 address is used in sendmsg for IPv6 socketsNayana Bidari
We do not rely on error for getsockopt options(which have boolean values) anymore. This will cause issue in sendmsg where we used to return error for IPV6_V6Only option. Fix the panic by returning error (for sockets other than TCP and UDP) if the address does not match the type(AF_INET/AF_INET6) of the socket. PiperOrigin-RevId: 347063838
2020-12-11[netstack] Decouple tcpip.ControlMessages from the IP control messges.Ayush Ranjan
tcpip.ControlMessages can not contain Linux specific structures which makes it painful to convert back and forth from Linux to tcpip back to Linux when passing around control messages in hostinet and raw sockets. Now we convert to the Linux version of the control message as soon as we are out of tcpip. PiperOrigin-RevId: 347027065
2020-12-10Disable host reassembly for fragments.Bhasker Hariharan
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket to ensure that fragments are delivered inorder to the right dispatcher. But this prevents fragments from being delivered to gvisor at all and makes testing of gvisor's fragment reassembly code impossible. The potential impact from this is minimal since IP Fragmentation is not really that prevelant and in cases where we do get fragments we may deliver the fragment out of order to the TCP layer as multiple network dispatchers may process the fragments and deliver a reassembled fragment after the next packet has been delivered to the TCP endpoint. While not desirable I believe the impact from this is minimal due to low prevalence of fragmentation. Also removed PktType and Hatype fields when binding the socket as these are not used when binding. Its just confusing to have them specified. See: https://man7.org/linux/man-pages/man7/packet.7.html "Fields used for binding are sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex." Fixes #5055 PiperOrigin-RevId: 346919439
2020-12-10Use specified source address for IGMP/MLD packetsGhanan Gowripalan
This change also considers interfaces and network endpoints enabled up up to the point all work to disable them are complete. This was needed so that protocols can perform shutdown work while being disabled (e.g. sending a packet which requires the endpoint to be enabled to obtain a source address). Bug #4682, #4861 Fixes #4888 Startblock: has LGTM from peterjohnston and then add reviewer brunodalbo PiperOrigin-RevId: 346869702
2020-12-09Add support for IP_RECVORIGDSTADDR IP option.Bhasker Hariharan
Fixes #5004 PiperOrigin-RevId: 346643745
2020-12-09[netstack] Make tcpip.Error savable.Ayush Ranjan
Earlier we could not save tcpip.Error objects in structs because upon restore the constant's address changes in netstack's error translation map and translating the error would panic because the map is based on the address of the tcpip.Error instead of the error itself. Now I made that translations map use the error message as key instead of the address. Added relevant synchronization mechanisms to protect the structure and initialize it upon restore. PiperOrigin-RevId: 346590485
2020-12-09Do not perform IGMP/MLD on loopback interfacesGhanan Gowripalan
The loopback interface will never have any neighbouring nodes so advertising its interest in multicast groups is unnecessary. Bug #4682, #4861 Startblock: has LGTM from asfez and then add reviewer tamird PiperOrigin-RevId: 346587604
2020-12-09Cap UDP payload size to length informed in UDP headerBruno Dal Bo
startblock: has LGTM from peterjohnston and then add reviewer ghanan,tamird PiperOrigin-RevId: 346565589
2020-12-09export MountTempDirectoryZeling Feng
PiperOrigin-RevId: 346487763
2020-12-07Export IGMP statsArthur Sfez
PiperOrigin-RevId: 346197760
2020-12-07Remove stale commentSam Balana
Removes comment lines about MaxUnsolicitedReportDelay. This is already documented in the comment for GenericMulticastProtocolOptions. PiperOrigin-RevId: 346185053
2020-12-05Fix zero receive window advertisements.Mithun Iyer
With the recent changes db36d948fa63ce950d94a5e8e9ebc37956543661, we try to balance the receive window advertisements between payload lengths vs segment overhead length. This works fine when segment size are much higher than the overhead, but not otherwise. In cases where the segment length is smaller than the segment overhead, we may end up not advertising zero receive window for long time and end up tail-dropping segments. This is especially pronounced when application socket reads are slow or stopped. In this change we do not grow the right edge of the receive window for smaller segment sizes similar to Linux. Also, we keep track of the socket buffer usage and let the window grow if the application is actively reading data. Fixes #4903 PiperOrigin-RevId: 345832012
2020-12-04Remove stack.ReadOnlyAddressableEndpointStateGhanan Gowripalan
Startblock: has LGTM from asfez and then add reviewer tamird PiperOrigin-RevId: 345815146
2020-12-04Introduce IPv4 options serializer and add RouterAlert to IGMPBruno Dal Bo
PiperOrigin-RevId: 345701623
2020-12-03Make `stack.Route` thread safePeter Johnston
Currently we rely on the user to take the lock on the endpoint that owns the route, in order to modify it safely. We can instead move `Route.RemoteLinkAddress` under `Route`'s mutex, and allow non-locking and thread-safe access to other fields of `Route`. PiperOrigin-RevId: 345461586
2020-12-03Support partitions for other tests.Adin Scannell
PiperOrigin-RevId: 345399936
2020-12-02Extract ICMPv4/v6 specific stats to their own typesArthur Sfez
This change lets us split the v4 stats from the v6 stats, which will be useful when adding stats for each network endpoint. PiperOrigin-RevId: 345322615
2020-12-02Abandon reassembly of a packet if fragments overlapArthur Sfez
However, receiving duplicated fragments will not cause reassembly to fail. This is what Linux does too: https://github.com/torvalds/linux/blob/38525c6/net/ipv4/inet_fragment.c#L355 PiperOrigin-RevId: 345309546
2020-12-02[netstack] Add back EndpointInfo struct in tcp.Ayush Ranjan
This was removed in an earlier commit. This should remain as it allows to add tcp-only state to be exposed. PiperOrigin-RevId: 345246155
2020-12-01Deflake stack_test.TestRouterSolicitationGhanan Gowripalan
...by using the fake clock. TestRouterSolicitation no longer runs its sub-tests in parallel now that the sub-tests are not long-running - the fake clock simulates time moving forward. PiperOrigin-RevId: 345165794
2020-12-01Correctly lock when listing neighbor entriesGhanan Gowripalan
PiperOrigin-RevId: 345162450
2020-12-01Track join count in multicast group protocol stateGhanan Gowripalan
Before this change, the join count and the state for IGMP/MLD was held across different types which required multiple locks to be held when accessing a multicast group's state. Bug #4682, #4861 Fixes #4916 PiperOrigin-RevId: 345019091
2020-11-30Fix deadlock in UDP handleControlPacket path.Bhasker Hariharan
Fixing the sendto deadlock exposed yet another deadlock where a lock inversion occurs on the handleControlPacket path where e.mu and demuxer.epsByNIC.mu are acquired in reverse order from say when RegisterTransportEndpoint is called in endpoint.Connect(). This fix sidesteps the issue by just making endpoint.state an atomic and gets rid of the need to acquire e.mu in e.HandleControlPacket. PiperOrigin-RevId: 344939895
2020-11-30Add more fragment reassembly testsToshi Kikuchi
These tests check if a maximum-sized (64k) packet is reassembled without receiving a fragment with MF flag set to zero. PiperOrigin-RevId: 344913172
2020-11-30Perform IGMP/MLD when the NIC is enabled/disabledGhanan Gowripalan
Test: ip_test.TestMGPWithNICLifecycle Bug #4682, #4861 PiperOrigin-RevId: 344888091
2020-11-27Don't add a temporary address to send DAD/RS packetsGhanan Gowripalan
Bug #4803 PiperOrigin-RevId: 344553664