summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/transport
AgeCommit message (Collapse)Author
2021-09-13Separate IPv4 ToS & IPv6 TClass in dgram endpointGhanan Gowripalan
Setting the ToS for IPv4 packets (SOL_IP, IP_TOS) should not affect the Traffic Class of IPv6 packets (SOL_IPV6, IPV6_TCLASS). Also only return the ToS value XOR Traffic Class as a packet cannot be both an IPv4 and an IPv6 packet; It is invalid to return both the IPv4 ToS and IPv6 Traffic Class control messages when reading packets. Updates #6389. PiperOrigin-RevId: 396399096
2021-09-09Remove linux-compat loopback hacks from packet endpointGhanan Gowripalan
Previously, gVisor did not represent loopback devices as an ethernet device as Linux does. To maintain Linux API compatibility for packet sockets, a workaround was used to add an ethernet header if a link header was not already present in the packet buffer delivered to a packet endpoint. However, this workaround is a bug for non-ethernet based interfaces; not all links use an ethernet header (e.g. pure L3/TUN interfaces). As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents loopback devices as an ethernet-based device so this workaround can now be removed. BUG: https://fxbug.dev/81592 Updates #6530, #6531. PiperOrigin-RevId: 395819151
2021-09-07Remove protocolMainLoop unused return valueArthur Sfez
PiperOrigin-RevId: 395325998
2021-09-01Support sending with packet socketsGhanan Gowripalan
...through the loopback interface, only. This change only supports sending on packet sockets through the loopback interface as the loopback interface is the only interface used in packet socket syscall tests - the other link endpoints are not excercised with the existing test infrastructure. Support for sending on packet sockets through the other interfaces will be added as needed. BUG: https://fxbug.dev/81592 PiperOrigin-RevId: 394368899
2021-09-01Out-of-order segment should not block in-sequence segments.Bhasker Hariharan
For a small receive buffer the first out-of-order segment will get accepted and fill up the receive buffer today. This change now includes the size of the out-of-order segment when checking whether to queue the out of order segment or not. PiperOrigin-RevId: 394351309
2021-09-01Extract network datagram endpoint common facilitiesGhanan Gowripalan
...from the UDP endpoint. Datagram-based transport endpoints (e.g. UDP, RAW IP) can share a lot of their write path due to the datagram-based nature of these endpoints. Extract the common facilities from UDP so they can be shared with other transport endpoints (in a later change). Test: UDP syscall tests. PiperOrigin-RevId: 394347774
2021-08-30Avoid pseudo endpoint for TSVal generationZeling Feng
PiperOrigin-RevId: 393808461
2021-08-26Centralize TCP timestamp logicTamir Duberstein
Remove freestanding functions that convert time values to raw integers; centralize time->uint32 logic in methods on tcp.endpoint. Importantly, the knowledge that TSVal is in milliseconds now lives in adjacent functions rather than being spread around various files. Incidental cleanup: - Remove unused constant - Remove redundant conversion - Remove redundant parentheses - Add missing error check PiperOrigin-RevId: 393184768
2021-08-26Avoid unhandled error warningsTamir Duberstein
PiperOrigin-RevId: 393104589
2021-08-26Remove unused argumentTamir Duberstein
PiperOrigin-RevId: 393100095
2021-08-26Pass must-not-be-nil by valueTamir Duberstein
PiperOrigin-RevId: 393095246
2021-08-25Improve TestTimestampSynCookiesZeling Feng
.. by advancing the clock so that NowMonotonic does not return 0. PiperOrigin-RevId: 393005373
2021-08-25Avoid the appearance of allocationTamir Duberstein
PiperOrigin-RevId: 393004533
2021-08-24Measure RTT during handshake since Linux does the sameZeling Feng
Some tcp unit tests are affected by this change: - Some retransmission tests assumed RTO=1s when connection is established. This is no longer true because minRTO was set to 3s in tests so now RTO becomes 3s after the first updateRTO call. Set minRTO=1s for these tests. - Some RACK enabled tests are affected because now that RTT is initialized, and the estimated RTT is quite small, spurious TLP might be sent out and causing flakes, introduce an artificial delay for these tests so that the estimated RTT is larger. PiperOrigin-RevId: 392768725
2021-08-19Add loopback interface as an ethernet-based deviceGhanan Gowripalan
...to match Linux behaviour. We can see evidence of Linux representing loopback as an ethernet-based device below: ``` # EUI-48 based MAC addresses. $ ip link show lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # tcpdump showing ethernet frames when sniffing loopback and logging the # link-type as EN10MB (Ethernet). $ sudo tcpdump -i lo -e -c 2 -n tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0 03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0 2 packets captured 116 packets received by filter 0 packets dropped by kernel ``` Wireshark shows a similar result as the tcpdump example above. Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162 PiperOrigin-RevId: 391836719
2021-08-19Use a hash function to generate tcp timestamp offsetZeling Feng
Also fix an option parsing error in checker.TCPTimestampChecker while I am here. PiperOrigin-RevId: 391828329
2021-08-18Split TCP secrets from Stack to tcp.protocolZeling Feng
Use different secrets for different purposes (port picking, ISN generation, tsOffset generation) and moved the secrets from stack.Stack to tcp.protocol. PiperOrigin-RevId: 391641238
2021-08-13Free multicastMemberships on UDP endpoint closeGhanan Gowripalan
tcpip.Endpoint.Close is documented to free all resources associated with an endpoint so we don't need to create an empty map to clear the multicast memberships. PiperOrigin-RevId: 390609826
2021-08-12Add support for TCP send buffer auto tuning.Nayana Bidari
Send buffer size in TCP indicates the amount of bytes available for the sender to transmit. This change will allow TCP to update the send buffer size when - TCP enters established state. - ACK is received. The auto tuning is disabled when the send buffer size is set with the SO_SNDBUF option. PiperOrigin-RevId: 390312274
2021-07-21Add metric to count number of segments acknowledged by DSACK.Nayana Bidari
- Creates new metric "/tcp/segments_acked_with_dsack" to count the number of segments acked with DSACK. - Added check to verify the metric is getting incremented when a DSACK is sent in the unit tests. PiperOrigin-RevId: 386135949
2021-07-20Enable RACK by default in netstack.Nayana Bidari
PiperOrigin-RevId: 385944428
2021-07-20Expose local address from raw socketsGhanan Gowripalan
PiperOrigin-RevId: 385940836
2021-07-20Add go:build directives as required by Go 1.17's gofmt.Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-14Set tcp endpoint state atomicallyTamir Duberstein
PiperOrigin-RevId: 384776517
2021-07-12netstack: move SO_SNDBUF/RCVBUF clamping logic out of //pkg/tcpipKevin Krakauer
- Keeps Linux-specific behavior out of //pkg/tcpip - Makes it clearer that clamping is done only for setsockopt calls from users - Removes code duplication PiperOrigin-RevId: 384389809
2021-07-08Do not queue zero sized segments.Bhasker Hariharan
Commit 16b751b6c610ec2c5a913cb8a818e9239ee7da71 introduced a bug where writes of zero size would end up queueing a zero sized segment which will cause the sandbox to panic when trying to send a zero sized segment(e.g. after an RTO) as netstack asserts that the all non FIN segments have size > 0. This change adds the check for a zero sized payload back to avoid queueing such segments. The associated test panics without the fix and passes with it. PiperOrigin-RevId: 383677884
2021-07-01Mix checklocks and atomic analyzers.Adin Scannell
This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-06-28netstack: deflake TestSynRcvdBadSeqNumberKevin Krakauer
There was a race wherein Accept() could fail, then the handshake would complete, and then a waiter would be created to listen for the handshake. In such cases, no notification was ever sent and the test timed out. PiperOrigin-RevId: 381913041
2021-06-25Remove sndQueue as its pointless now.Bhasker Hariharan
sndQueue made sense when the worker goroutine and the syscall context held different locks. Now both lock the endpoint lock before doing anything which means adding to sndQueue is pointless as we move it to writeList immediately after that in endpoint.Write() by calling e.drainSendQueue. PiperOrigin-RevId: 381523177
2021-06-22Wake up Writers when tcp socket is shutdown for writes.Bhasker Hariharan
PiperOrigin-RevId: 380967023
2021-06-22netstack: further deflake tcp_testKevin Krakauer
There are unnecessarily short timeouts in several places. Note: a later change will switch tcp_test to fake clocks intead of the built-in `time` package. PiperOrigin-RevId: 380935400
2021-06-21clean up tcpdump TODOsKevin Krakauer
tcpdump is largely supported. We've also chose not to implement writeable AF_PACKET sockets, and there's a bug specifically for promiscuous mode (#3333). Fixes #173. PiperOrigin-RevId: 380733686
2021-06-21netstack: don't ACK SYNs in TIME-WAITKevin Krakauer
It was possible for a SYN to arrive after the endpoint sent an ACK as part of the transition to TIME-WAIT, but before returning from handleSegmentsLocked(). This caused the SYN to be dequeued and ACK'd despite the change in EndpointState. Deflakes TestTCPTimeWaitNewSyn. Tested with: blaze test --config=gotsan --runs_per_test 10000 \ //third_party/gvisor/pkg/tcpip/transport/tcp:tcp_x_test -j 2000 \ // --test_filter TestTCPTimeWaitNewSyn PiperOrigin-RevId: 380639808
2021-06-17raw sockets: don't overwrite destination addressKevin Krakauer
Also makes the behavior of raw sockets WRT fragmentation clearer, and makes the ICMPv4 header-length check explicit. Fixes #3160. PiperOrigin-RevId: 380033450
2021-06-16Fix broken hdrincl testKevin Krakauer
Fixes #3159. PiperOrigin-RevId: 379814096
2021-06-14Cleanup iptables bug TODOsKevin Krakauer
There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-04Honor data and FIN from the ACK completing handshakeMithun Iyer
If the ACK completing the handshake has FIN or data, requeue the segment for further processing by the newly established endpoint. Otherwise, the segments would have to be retransmitted by the peer to be processed by the established endpoint. Doing this, keeps the behavior in parity with Linux. This also addresses a test flake with TCPNonBlockingConnectClose where the ACK (completing the handshake) and multiple retransmitted FINACKs from the peer could be dropped by the listener, when using syncookies and the accept queue is full. The handshake could eventually get completed with a retransmitted FINACK, without actual processing of FIN. This can cause the poll with POLLRDHUP on the accepted socket to sometimes time out before the next FINACK retransmission. PiperOrigin-RevId: 377651695
2021-06-01Ensure full shutdown of endpoint on notifyCloseMithun Iyer
Address a race with non-blocking connect and socket close, causing the FIN (because of socket close) to not be sent out, even after completing the handshake. The race occurs with this sequence: (1) endpoint Connect starts handshake, sending out SYN (2) handshake complete() releases endpoint lock, waiting on sleeper.Fetch() (3) endpoint Close acquires endpoint lock, does not enqueue FIN (as the endpoint is not yet connected) and asserts notifyClose (4) SYNACK from peer gets enqueued asserting newSegmentWaker (5) handshake complete() re-aqcuires lock, first processes newSegmentWaker event, transitions to ESTABLISHED and proceeds to protocolMainLoop() (6) protocolMainLoop() exits while processing notifyClose When the execution follows the above sequence, no FIN is sent to the peer. This causes the listener side to have a half-open connection sitting in the accept queue. Fix this by ensuring that the protocolMainLoop() performs clean shutdown when the endpoint state is still ESTABLISHED. This would not be a bug, if during handshake complete(), sleeper.Fetch() prioritized notificationWaker over newSegmentWaker. In that case, the handshake would not have completed in (5) above. Fixes #6067 PiperOrigin-RevId: 376994395
2021-06-01Ignore RST received for a TCP listenerMithun Iyer
The current implementation has a bug where TCP listener does not ignore RSTs from the peer. While handling RST+ACK from the peer, this bug can complete handshakes that use syncookies. This results in half-open connection delivered to the accept queue. Fixes #6076 PiperOrigin-RevId: 376868749
2021-05-27Support SO_BINDTODEVICE in ICMP socketsSam Balana
Adds support for the SO_BINDTODEVICE socket option in ICMP sockets with an accompanying packetimpact test to exercise use of this socket option. Adds a unit test to exercise the NIC selection logic introduced by this change. The remaining unit tests for ICMP sockets need to be added in a subsequent CL. See https://gvisor.dev/issues/5623 for the list of remaining unit tests. Adds a "timeout" field to PacketimpactTestInfo, necessary due to the long runtime of the newly added packetimpact test. Fixes #5678 Fixes #4896 Updates #5623 Updates #5681 Updates #5763 Updates #5956 Updates #5966 Updates #5967 PiperOrigin-RevId: 376271581
2021-05-27Use fake clocks in all testsTamir Duberstein
...except TCP tests and NDP tests that mutate globals. These will be undertaken later. Updates #5940. PiperOrigin-RevId: 376145608
2021-05-26Use the stack RNG everywhereTamir Duberstein
...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666
2021-05-26Move presence methods from segment to TCPFlagsTamir Duberstein
PiperOrigin-RevId: 376001032
2021-05-26Use the stack clock everywhereTamir Duberstein
Updates #5939. Updates #6012. RELNOTES: n/a PiperOrigin-RevId: 375931554
2021-05-25Use opaque types to represent timeTamir Duberstein
Introduce tcpip.MonotonicTime; replace int64 in tcpip.Clock method returns with time.Time and MonotonicTime to improve type safety and ensure that monotonic clock readings are never compared to wall clock readings. PiperOrigin-RevId: 375775907
2021-05-22Remove detritusTamir Duberstein
- Unused constants - Unused functions - Unused arguments - Unkeyed literals - Unnecessary conversions PiperOrigin-RevId: 375253464
2021-05-21Add aggregated NIC statsArthur Sfez
This change also includes miscellaneous improvements: * UnknownProtocolRcvdPackets has been separated into two stats, to specify at which layer the unknown protocol was found (L3 or L4) * MalformedRcvdPacket is not aggregated across every endpoint anymore. Doing it this way did not add useful information, and it was also error-prone (example: ipv6 forgot to increment this aggregated stat, it only incremented its own ipv6.MalformedPacketsReceived). It is now only incremented the NIC. * Removed TestStatsString test which was outdated and had no real utility. PiperOrigin-RevId: 375057472
2021-05-20Add protocol state to TCPINFOMithun Iyer
Add missing protocol state to TCPINFO struct and update packetimpact. This re-arranges the TCP state definitions to align with Linux. Fixes #478 PiperOrigin-RevId: 374996751
2021-05-13Apply SWS avoidance to ACKs with window updatesMithun Iyer
When recovering from a zero-receive-window situation, and asked to send out an ACK, ensure that we apply SWS avoidance in our window updates. Fixes #5984 PiperOrigin-RevId: 373689578
2021-05-12Fix not calling decRef on merged segmentsTing-Yu Wang
This code path is for outgoing packets, and we don't currently do memory accounting on this path. So it wasn't breaking anything. This change did not add a test for ref-counting issue fixed, but will switch to the leak-checking ref-counter later when all ref-counting issues are fixed. PiperOrigin-RevId: 373447913