summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2021-07-15netstack: support SO_RCVBUFFORCEKevin Krakauer
TCP is fully supported. As with SO_RCVBUF, other transport protocols perform no-ops per DefaultSocketOptionsHandler.OnSetReceiveBufferSize. PiperOrigin-RevId: 385023239
2021-07-14Set tcp endpoint state atomicallyTamir Duberstein
PiperOrigin-RevId: 384776517
2021-07-13netstack: atomically update buffer sizesKevin Krakauer
Previously, two calls to set the send or receive buffer size could have raced and left state wherein: - The actual size depended on one call - The value returned by getsockopt() depended on the other PiperOrigin-RevId: 384508720
2021-07-13Deflake TestRouterSolicitationGhanan Gowripalan
Before this change, transmission of the first router solicitation races with the adding of an IPv6 link-local address. This change creates the NIC in the disabled state and is only enabled after the address is added (if required) to avoid this race. PiperOrigin-RevId: 384493553
2021-07-12netstack: move SO_SNDBUF/RCVBUF clamping logic out of //pkg/tcpipKevin Krakauer
- Keeps Linux-specific behavior out of //pkg/tcpip - Makes it clearer that clamping is done only for setsockopt calls from users - Removes code duplication PiperOrigin-RevId: 384389809
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-12Prevent interleaving in sniffer pcap outputTamir Duberstein
Remove "partial write" handling as io.Writer.Write is not permitted to return a nil error on partial writes, and this code was already panicking on non-nil errors. PiperOrigin-RevId: 384289970
2021-07-08Do not queue zero sized segments.Bhasker Hariharan
Commit 16b751b6c610ec2c5a913cb8a818e9239ee7da71 introduced a bug where writes of zero size would end up queueing a zero sized segment which will cause the sandbox to panic when trying to send a zero sized segment(e.g. after an RTO) as netstack asserts that the all non FIN segments have size > 0. This change adds the check for a zero sized payload back to avoid queueing such segments. The associated test panics without the fix and passes with it. PiperOrigin-RevId: 383677884
2021-07-07Move time.Now() call to snifferTamir Duberstein
PiperOrigin-RevId: 383481745
2021-07-07Use time package-level variableTamir Duberstein
PiperOrigin-RevId: 383426091
2021-07-02Discover more specific routes as per RFC 4191Ghanan Gowripalan
More-specific route discovery allows hosts to pick a more appropriate router for off-link destinations. Fixes #6172. PiperOrigin-RevId: 382779880
2021-07-01Mix checklocks and atomic analyzers.Adin Scannell
This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01Fix bug with TCP bind w/ SO_REUSEADDR.Bhasker Hariharan
In gVisor today its possible that when trying to bind a TCP socket w/ SO_REUSEADDR specified and requesting the kernel pick a port by setting port to zero can result in a previously bound port being returned. This behaviour is incorrect as the user is clearly requesting a free port. The behaviour is fine when the user explicity specifies a port. This change now checks if the user specified a port when making a port reservation for a TCP port and only returns unbound ports even if SO_REUSEADDR was specified. Fixes #6209 PiperOrigin-RevId: 382607638
2021-06-30Implement fmt.Stringer for NDPRoutePreferenceGhanan Gowripalan
PiperOrigin-RevId: 382427879
2021-06-30[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.Zach Koopmans
Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29Support parsing NDP Route Information optionGhanan Gowripalan
This change prepares for a later change which supports the NDP Route Information option to discover more-specific routes, as per RFC 4191. Updates #6172. PiperOrigin-RevId: 382225812
2021-06-29Merge pull request #6085 from liornm:fix-tun-no_pigVisor bot
PiperOrigin-RevId: 382202462
2021-06-29[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVALZach Koopmans
Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-29Fix TUN IFF_NO_PI bugliornm
When TUN is created with IFF_NO_PI flag, there will be no Ethernet header and no packet info, therefore, both read and write will fail. This commit fix this bug.
2021-06-28netstack: deflake TestSynRcvdBadSeqNumberKevin Krakauer
There was a race wherein Accept() could fail, then the handshake would complete, and then a waiter would be created to listen for the handshake. In such cases, no notification was ever sent and the test timed out. PiperOrigin-RevId: 381913041
2021-06-25Remove sndQueue as its pointless now.Bhasker Hariharan
sndQueue made sense when the worker goroutine and the syscall context held different locks. Now both lock the endpoint lock before doing anything which means adding to sndQueue is pointless as we move it to writeList immediately after that in endpoint.Write() by calling e.drainSendQueue. PiperOrigin-RevId: 381523177
2021-06-24Incrementally update checksum when NAT-ingGhanan Gowripalan
...instead of calculating a fresh checksum to avoid re-calcalculating a checksum on unchanged bytes. Fixes #5340. PiperOrigin-RevId: 381403888
2021-06-24Refactor default router state to off-link route stateGhanan Gowripalan
This change prepares for a later change which supports the NDP Route Information option to discover more-specific routes, as per RFC 4191. The newly introduced off-link route state will be used to hold both the state for default routers (which is a default (off-link) route through the router, and more-specific routes (which are routes through some router to some destination subnet more specific than the IPv6 empty subnet). Updates #6172. PiperOrigin-RevId: 381403761
2021-06-24Internal change.Jamie Liu
PiperOrigin-RevId: 381375705
2021-06-22Wake up Writers when tcp socket is shutdown for writes.Bhasker Hariharan
PiperOrigin-RevId: 380967023
2021-06-22netstack: further deflake tcp_testKevin Krakauer
There are unnecessarily short timeouts in several places. Note: a later change will switch tcp_test to fake clocks intead of the built-in `time` package. PiperOrigin-RevId: 380935400
2021-06-21clean up tcpdump TODOsKevin Krakauer
tcpdump is largely supported. We've also chose not to implement writeable AF_PACKET sockets, and there's a bug specifically for promiscuous mode (#3333). Fixes #173. PiperOrigin-RevId: 380733686
2021-06-21Use fake clocks in NDP testsGhanan Gowripalan
Updates #5940. PiperOrigin-RevId: 380668609
2021-06-21netstack: don't ACK SYNs in TIME-WAITKevin Krakauer
It was possible for a SYN to arrive after the endpoint sent an ACK as part of the transition to TIME-WAIT, but before returning from handleSegmentsLocked(). This caused the SYN to be dequeued and ACK'd despite the change in EndpointState. Deflakes TestTCPTimeWaitNewSyn. Tested with: blaze test --config=gotsan --runs_per_test 10000 \ //third_party/gvisor/pkg/tcpip/transport/tcp:tcp_x_test -j 2000 \ // --test_filter TestTCPTimeWaitNewSyn PiperOrigin-RevId: 380639808
2021-06-18Add endpoints to map only if registerEndpoint succeeds.Bhasker Hariharan
epsByNIC.registerEndpoint can add a multiportEndpoint to its map of nic->multiportEndpoint even if multiport.Endpoint.singleRegisterEndpoint failed. Same for transportDemuxer.singleRegisterEndpoint which ends up adding an entry to nic->epsByNIC even if epsByNIC.registerEndpoint fails. These breaks an invariant which the code assumes that a multiportEndpoint/endpointsByNIC always have at least one valid entry. PiperOrigin-RevId: 380310115
2021-06-18Include off-link route's preference in update eventsGhanan Gowripalan
RFC 4191 supports the notion of a preference value for default routers and more-specific routes, so update the OffLinkRouteUpdate event to include this preference value so integrators may prioritize routes based on a route's advertised preference value. Note, more-specific route discovery is not supported yet, but will be in a later change. Updates #6172. Test: ndp_test.TestRouterDiscovery PiperOrigin-RevId: 380243716
2021-06-17raw sockets: don't overwrite destination addressKevin Krakauer
Also makes the behavior of raw sockets WRT fragmentation clearer, and makes the ICMPv4 header-length check explicit. Fixes #3160. PiperOrigin-RevId: 380033450
2021-06-16Fix broken hdrincl testKevin Krakauer
Fixes #3159. PiperOrigin-RevId: 379814096
2021-06-14Support parsing Prf field in RAsGhanan Gowripalan
This change prepares for a later change which actually handles the Prf field in RAs to discover default routers with preference values, as per RFC 4191. Updates #6172. Test: header_test.TestNDPRouterAdvert PiperOrigin-RevId: 379421710
2021-06-14Rename DefaultRouter event to OffLinkRoute eventGhanan Gowripalan
This change prepares for a later change which supports the NDP Route Information option to discover more-specific routes, as per RFC 4191. Updates #6172. PiperOrigin-RevId: 379361330
2021-06-14Cleanup iptables bug TODOsKevin Krakauer
There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-14Always accept discovered configurations from NDPGhanan Gowripalan
Before this change, the NDPDispatcher was allowed to "cancel" the discovery of default routers/prefixes and auto-generate addresses. No use case exists for this today so we drop this for now. If a use case comes up in the future, we should instead invalidate the discovered configuration through the stack instead of during discovery. PiperOrigin-RevId: 379327009
2021-06-09Merge pull request #6128 from kevinGC:fanout-pidgVisor bot
PiperOrigin-RevId: 378506076
2021-06-09Avoid fanout group collisions with best effortKevin Krakauer
Running multiple instances of netstack in the same network namespace can cause collisions when enabling packet fanout for fdbased endpoints. The only bulletproof fix is to run in different network namespaces, but by using `getpid()` instead of 0 as the fanout ID starting point we can avoid collisions in the common case, particularly when testing/experimenting. Addresses #6124
2021-06-07Exclusively lock IPv6 EP when modifying addressesGhanan Gowripalan
...as address add/removal updates multicast group memberships and NDP state. This partially reverts the change made to the IPv6 endpoint in https://github.com/google/gvisor/commit/ebebb3059f7c5dbe42af85715f1c51c. PiperOrigin-RevId: 378061726
2021-06-05Use the NIC packets arrived at when filteringGhanan Gowripalan
As per https://linux.die.net/man/8/iptables, ``` Parameters -i, --in-interface [!] name Name of an interface via which a packet was received (only for packets entering the INPUT, FORWARD and PREROUTING chains). ``` Before this change, iptables would use the NIC that a packet was delivered to after forwarding a packet locally (when forwarding is enabled) instead of the NIC the packet arrived at. Updates #170, #3549. Test: iptables_test.TestInputHookWithLocalForwarding PiperOrigin-RevId: 377714971
2021-06-04Honor data and FIN from the ACK completing handshakeMithun Iyer
If the ACK completing the handshake has FIN or data, requeue the segment for further processing by the newly established endpoint. Otherwise, the segments would have to be retransmitted by the peer to be processed by the established endpoint. Doing this, keeps the behavior in parity with Linux. This also addresses a test flake with TCPNonBlockingConnectClose where the ACK (completing the handshake) and multiple retransmitted FINACKs from the peer could be dropped by the listener, when using syncookies and the accept queue is full. The handshake could eventually get completed with a retransmitted FINACK, without actual processing of FIN. This can cause the poll with POLLRDHUP on the accepted socket to sometimes time out before the next FINACK retransmission. PiperOrigin-RevId: 377651695
2021-06-01Ensure full shutdown of endpoint on notifyCloseMithun Iyer
Address a race with non-blocking connect and socket close, causing the FIN (because of socket close) to not be sent out, even after completing the handshake. The race occurs with this sequence: (1) endpoint Connect starts handshake, sending out SYN (2) handshake complete() releases endpoint lock, waiting on sleeper.Fetch() (3) endpoint Close acquires endpoint lock, does not enqueue FIN (as the endpoint is not yet connected) and asserts notifyClose (4) SYNACK from peer gets enqueued asserting newSegmentWaker (5) handshake complete() re-aqcuires lock, first processes newSegmentWaker event, transitions to ESTABLISHED and proceeds to protocolMainLoop() (6) protocolMainLoop() exits while processing notifyClose When the execution follows the above sequence, no FIN is sent to the peer. This causes the listener side to have a half-open connection sitting in the accept queue. Fix this by ensuring that the protocolMainLoop() performs clean shutdown when the endpoint state is still ESTABLISHED. This would not be a bug, if during handshake complete(), sleeper.Fetch() prioritized notificationWaker over newSegmentWaker. In that case, the handshake would not have completed in (5) above. Fixes #6067 PiperOrigin-RevId: 376994395
2021-06-01Ignore RST received for a TCP listenerMithun Iyer
The current implementation has a bug where TCP listener does not ignore RSTs from the peer. While handling RST+ACK from the peer, this bug can complete handshakes that use syncookies. This results in half-open connection delivered to the accept queue. Fixes #6076 PiperOrigin-RevId: 376868749
2021-05-28Clean up warningsTamir Duberstein
- Typos - Unused arguments - Useless conversions PiperOrigin-RevId: 376362730
2021-05-27Support SO_BINDTODEVICE in ICMP socketsSam Balana
Adds support for the SO_BINDTODEVICE socket option in ICMP sockets with an accompanying packetimpact test to exercise use of this socket option. Adds a unit test to exercise the NIC selection logic introduced by this change. The remaining unit tests for ICMP sockets need to be added in a subsequent CL. See https://gvisor.dev/issues/5623 for the list of remaining unit tests. Adds a "timeout" field to PacketimpactTestInfo, necessary due to the long runtime of the newly added packetimpact test. Fixes #5678 Fixes #4896 Updates #5623 Updates #5681 Updates #5763 Updates #5956 Updates #5966 Updates #5967 PiperOrigin-RevId: 376271581
2021-05-27Speed up TestBindToDeviceDistributionKevin Krakauer
Testing only TestBindToDeviceDistribution decreased from 24s to 11s, and with TSAN from 186s to 21s. Note: using `t.Parallel()` actually slows the test down. PiperOrigin-RevId: 376251420
2021-05-27Use fake clocks in all testsTamir Duberstein
...except TCP tests and NDP tests that mutate globals. These will be undertaken later. Updates #5940. PiperOrigin-RevId: 376145608
2021-05-27Avoid warningsTamir Duberstein
- Don't shadow package name - Don't defer in a loop - Remove unnecessary type conversion PiperOrigin-RevId: 376137822
2021-05-26Use the stack RNG everywhereTamir Duberstein
...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666