path: root/pkg/tcpip/transport
Age | Commit message | Author
2021-03-24 | Add POLLRDNORM/POLLWRNORM support. | Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc., it felt cleaner to just clean up the call sites to notify for both events. This is what Linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
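A minimal sketch of why notifying for both events is sufficient. The bit values are the poll(2) values used on Linux x86; the combined masks `ReadableEvents` and `WritableEvents` and the `wakes` helper are hypothetical names for illustration, not the identifiers used in gVisor.

```go
package main

import "fmt"

// Event bits as defined for poll(2) on Linux x86.
const (
	POLLIN     = 0x0001
	POLLOUT    = 0x0004
	POLLRDNORM = 0x0040
	POLLWRNORM = 0x0100
)

// Hypothetical combined masks that a call site would notify with.
const (
	ReadableEvents = POLLIN | POLLRDNORM
	WritableEvents = POLLOUT | POLLWRNORM
)

// wakes reports whether a waiter registered for interest is woken by a
// notification that carries the notified mask.
func wakes(interest, notified int) bool {
	return interest&notified != 0
}

func main() {
	// A waiter that asked only for POLLRDNORM misses a POLLIN-only notification.
	fmt.Println(wakes(POLLRDNORM, POLLIN))
	// Notifying with both bits reaches POLLIN and POLLRDNORM waiters alike.
	fmt.Println(wakes(POLLRDNORM, ReadableEvents))
}
```

With this shape, no special casing is needed in the poll syscall itself, because every notification already carries both bits.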
2021-03-24 | Unexpose immutable fields in stack.Route | Nick Brown
This change sets the inner `routeInfo` struct to be a named private member and replaces direct access with access through getters. Note that direct access to the fields of `routeInfo` is still possible through the `RouteInfo` struct. Fixes #4902 PiperOrigin-RevId: 364822872
2021-03-23 | Use constant (TestInitialSequenceNumber) instead of integer (789) in tests. | Nayana Bidari
PiperOrigin-RevId: 364596526
2021-03-17 | Do not use martian loopback packets in tests | Ghanan Gowripalan
Transport demuxer and UDP tests should not use a loopback address as the source address for packets injected into the stack as martian loopback packets will be dropped in a later change. PiperOrigin-RevId: 363479681
2021-03-16 | Fix tcp_fin_retransmission_netstack_test | Zeling Feng
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT unconditionally. Fixing the state machine will give us back the retransmission of FIN. PiperOrigin-RevId: 363301883
2021-03-16 | Fix a race with synRcvdCount and accept | Mithun Iyer
There is a race in handling new incoming connections on a listening endpoint that causes the endpoint to reply to more incoming SYNs than the listen backlog permits. The race occurs when a passive connection handshake succeeds and the synRcvdCount counter is decremented before the endpoint is delivered to the accept queue. In the window of time between synRcvdCount decrementing and the endpoint being enqueued for accept, new incoming SYNs can be handled without honoring the listen backlog value, as the backlog could be perceived as not full. Fixes #5637 PiperOrigin-RevId: 363279372
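One way to close a window like this is to make the counter decrement and the enqueue a single critical section, so the backlog check never sees the in-between state. The sketch below is an illustration under that assumption; the `listener` type, its fields, and the exact backlog rule are hypothetical and do not claim to match the actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a hypothetical, simplified listening endpoint.
type listener struct {
	mu      sync.Mutex
	backlog int   // maximum of in-progress plus queued connections
	synRcvd int   // handshakes in progress
	acceptQ []int // completed connections waiting for accept
}

// admitSYN reports whether a new SYN may be answered.
func (l *listener) admitSYN() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.synRcvd+len(l.acceptQ) >= l.backlog {
		return false
	}
	l.synRcvd++
	return true
}

// completeHandshake moves a connection from "in progress" to the accept
// queue in one critical section, so admitSYN never observes the moment
// where the counter has dropped but the queue has not yet grown.
func (l *listener) completeHandshake(conn int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.synRcvd--
	l.acceptQ = append(l.acceptQ, conn)
}

func main() {
	l := &listener{backlog: 1}
	fmt.Println(l.admitSYN()) // first SYN fits in the backlog
	l.completeHandshake(1)
	fmt.Println(l.admitSYN()) // backlog is still full: the connection is queued
}
```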
2021-03-15 | Make netstack (//pkg/tcpip) buildable for 32 bit | Kevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-11 | improve readability of ports package | Kevin Krakauer
Lots of small changes:
- simplify package API via Reservation type
- rename some single-letter variable names that were hard to follow
- rename some types
PiperOrigin-RevId: 362442366
2021-03-09 | Give TCP flags a dedicated type | Zeling Feng
- Implement Stringer for it so that we can improve error messages.
- Use TCPFlags throughout the code base. There used to be a mixed usage of byte, uint8 and int as TCP flags.
PiperOrigin-RevId: 361940150
2021-03-08 | Implement /proc/sys/net/ipv4/ip_local_port_range | Kevin Krakauer
Speeds up the socket stress tests by a couple orders of magnitude. PiperOrigin-RevId: 361721050
2021-03-05 | Increment the counters when sending Echo requests | Arthur Sfez
Updates #5597 PiperOrigin-RevId: 361252003
2021-03-04 | Nit fix: Should use maxTimeout in backoffTimer | Ting-Yu Wang
The only user is in (*handshake).complete and it specifies MaxRTO, so there are no behavior changes. PiperOrigin-RevId: 360954447
2021-03-03 | Make dedicated methods for data operations in PacketBuffer | Ting-Yu Wang
One of the preparations for decoupling the underlying buffer implementation. There are still some methods that are tied to VectorisedView, and they will be changed gradually in later CLs. This CL also introduces a new ICMPv6ChecksumParams to replace the long list of parameters when calling ICMPv6Checksum, aiming to be more descriptive. PiperOrigin-RevId: 360778149
2021-03-03 | Add checklocks analyzer. | Bhasker Hariharan
This validates that if a struct field is annotated with "// +checklocks:mu", where "mu" is a mutex field in the same struct, then the field is only accessed with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name>. For more details please refer to README.md. PiperOrigin-RevId: 360729328
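A small sketch of what an annotated struct might look like, using the annotation form quoted in the message. The `counter` type is hypothetical; the annotation is an ordinary comment, so the program compiles and runs with or without the analyzer.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type with one mutex-guarded field.
type counter struct {
	mu sync.Mutex
	// +checklocks:mu
	value int
}

// Inc accesses value only while mu is held, which is the property the
// analyzer is described as checking.
func (c *counter) Inc() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
	return c.value
}

func main() {
	var c counter
	c.Inc()
	fmt.Println(c.Inc()) // prints 2
}
```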
2021-03-01 | tcp: endpoint.Write has to send all data that has been read from payload | Andrei Vagin
io.ReadFull returns the number of bytes copied and an error if fewer bytes were read. PiperOrigin-RevId: 360247614
2021-02-26 | Fix panic due to zero length writes in TCP. | Bhasker Hariharan
There is a short race in Write where an endpoint can transition from a writable to a non-writable state (due to, say, an incoming RST) between the time we release the endpoint lock and the time we reacquire it after copying the payload. In such a case, if the write happens to be a zero-sized write, we end up trying to call sendData() even though nothing was queued. This can panic when trying to enable/disable TCP timers if the endpoint had already transitioned to a CLOSED/ERROR state due to the incoming RST, as we clean up timers when the protocol goroutine terminates. Sadly the race window is small enough that my attempts at reproducing the panic in a syscall test have not been successful. PiperOrigin-RevId: 359887905
2021-02-26 | Use closure to avoid manual unlocking | Tamir Duberstein
Also increase refcount of raw.endpoint.route while in use. Avoid allocating an array of size zero. PiperOrigin-RevId: 359797788
2021-02-25 | RACK: recovery logic should check for receive window before re-transmitting. | Nayana Bidari
Use maybeSendSegment while sending segments in RACK recovery which checks if the receiver has space and splits the segments when the segment size is greater than MSS. PiperOrigin-RevId: 359641097
2021-02-25 | Remove deadlock in raw.endpoint caused by recursive read locking | Kevin Krakauer
Prevents the following deadlock:
- Raw packet is sent via e.Write(), which read locks e.mu
- Connect() is called, blocking on write locking e.mu
- The packet is routed to loopback and back to e.HandlePacket(), which read locks e.mu
Per the sync.RWMutex documentation, this deadlocks: "If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock."
Also, release eps.mu earlier in deliverRawPacket. PiperOrigin-RevId: 359600926
2021-02-11 | [rack] TLP: ACK Processing and PTO scheduling. | Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3 Fixes #5085 PiperOrigin-RevId: 357125037
2021-02-11 | [netstack] Fix recovery entry and exit checks. | Ayush Ranjan
Entry check:
- Earlier implementation was preventing us from entering recovery even if SND.UNA is lost but dupAckCount is still below threshold. Fixed that.
- We should only enter recovery when at least one more byte of data beyond the highest byte that was outstanding when fast retransmit was last entered is acked. Added that check.
Exit check:
- Earlier we were checking if SEG.ACK is in range [SND.UNA, SND.NXT]. The intention was to check if any unacknowledged data was ACKed. Note that (SEG.ACK - 1) is actually the sequence number which was ACKed. So we were incorrectly including (SND.UNA - 1) in the range. Fixed the check to now be (SEG.ACK - 1) in range [SND.UNA, SND.NXT).
Additionally, moved a RACK specific test to the rack tests file. Added tests for the changes I made. PiperOrigin-RevId: 357091322
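The corrected exit check can be illustrated with wraparound-safe sequence arithmetic. This is a sketch under the assumption of 32-bit sequence numbers compared modulo 2^32; the `seq` type and function names are hypothetical and chosen for this example only.

```go
package main

import "fmt"

// seq is a hypothetical 32-bit TCP sequence number.
type seq uint32

// inRange reports whether v lies in the half-open range [a, b), using
// unsigned subtraction so that the comparison survives wraparound.
func inRange(v, a, b seq) bool {
	return v-a < b-a
}

// acksOutstandingData applies the check described above:
// (SEG.ACK - 1) in [SND.UNA, SND.NXT).
func acksOutstandingData(segAck, sndUna, sndNxt seq) bool {
	return inRange(segAck-1, sndUna, sndNxt)
}

func main() {
	// SEG.ACK == SND.UNA acknowledges nothing new; the old closed-range
	// check on SEG.ACK would have accepted it.
	fmt.Println(acksOutstandingData(100, 100, 200)) // false
	// SEG.ACK == SND.NXT acknowledges the last outstanding byte.
	fmt.Println(acksOutstandingData(200, 100, 200)) // true
	// Beyond SND.NXT is out of range.
	fmt.Println(acksOutstandingData(201, 100, 200)) // false
}
```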
2021-02-10 | RACK: Fix re-transmitting the segment twice when entering recovery. | Nayana Bidari
TestRACKWithDuplicateACK is flaky as the reorder window can expire before receiving three duplicate ACKs, which will result in sending the first unacknowledged segment twice: when the reorder timer expires and again after receiving the third duplicate ACK. This CL fixes this behavior and will not resend the segment if it was already re-transmitted when the reorder timer expired. Update TestRACKWithDuplicateACK to test that the first segment is considered as lost and is re-transmitted. PiperOrigin-RevId: 356855168
2021-02-09 | Add support for setting SO_SNDBUF for unix domain sockets. | Bhasker Hariharan
The limits for snd/rcv buffers for unix domain sockets are controlled by the following sysctls on Linux:
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all, which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change, for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before), where SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match Linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-08 | Allow UDP sockets connect()ing to port 0 | Zeling Feng
We previously returned EINVAL when connecting to port 0, however this is not the observed behavior on Linux. One of the observable effects after connecting to port 0 on Linux is that getpeername() will fail with ENOTCONN. PiperOrigin-RevId: 356413451
2021-02-08 | RACK: Detect loss | Nayana Bidari
Detect packet loss using the reorder window and re-transmit the lost packets after the reorder timer expires. PiperOrigin-RevId: 356321786
2021-02-03 | Add a function to enable RACK in tests. | Nayana Bidari
- Adds a function to enable RACK in tests.
- RACK update functions are guarded behind the flag tcpRecovery.
PiperOrigin-RevId: 355435973
2021-02-02 | Rename HandleNDupAcks in TCP. | Nayana Bidari
Rename HandleNDupAcks() to HandleLossDetected() as it will be entered when loss is detected after:
- reorder window expires and TLP (in case of RACK)
- dupAckCount >= 3
PiperOrigin-RevId: 355237858
2021-02-02 | Add support for rate limiting out of window ACKs. | Bhasker Hariharan
Netstack today will send dupACKs with no rate limit for incoming out of window segments. This can result in ACK loops, for example if a TCP socket connects to itself (actually permitted by TCP): the ACK sent in response to packets being out of order is itself considered an out of window segment, resulting in another ACK being generated. PiperOrigin-RevId: 355206877
2021-02-01 | Refactor HandleControlPacket/SockError | Ghanan Gowripalan
...to remove the need for the transport layer to deduce the type of error it received. Rename HandleControlPacket to HandleError as HandleControlPacket only handles errors. tcpip.SockError now holds a tcpip.SockErrorCause interface that different errors can implement. PiperOrigin-RevId: 354994306
2021-01-31 | Hide neighbor table kind from NetworkEndpoint | Ghanan Gowripalan
The network endpoint should not need to have logic to handle different kinds of neighbor tables. Network endpoints can let the NIC know about different neighbor discovery messages and let the NIC decide which table to update. This allows us to remove the LinkAddressCache interface. PiperOrigin-RevId: 354812584
2021-01-28 | RACK: Update reorder window. | Nayana Bidari
After receiving an ACK (cumulative or selective), RACK will update the reorder window, which is used as a settling time before marking the packet as lost. This change will add an init function to initialize the variables in RACK and also store the reference to sender in rackControl. The reorder window is calculated as per the RACK draft: https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.2 Step 4. PiperOrigin-RevId: 354453528
2021-01-28 | Change tcpip.Error to an interface | Tamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28 | Respect SO_BINDTODEVICE in unconnected UDP writes | Marina Ciocea
Previously, sending on an unconnected UDP socket would ignore the SO_BINDTODEVICE option. Send on the configured interface when a UDP socket is bound to an interface through setsockopt SO_BINDTODEVICE. Add packetimpact tests exercising UDP reads and writes with every combination of bound/unbound, broadcast/multicast/unicast destination, and bound/not-bound to device. PiperOrigin-RevId: 354299670
2021-01-27 | Confirm neighbor reachability with TCP ACKs | Ghanan Gowripalan
As per RFC 4861 section 7.3.1, A neighbor is considered reachable if the node has recently received a confirmation that packets sent recently to the neighbor were received by its IP layer. Positive confirmation can be gathered in two ways: hints from upper-layer protocols that indicate a connection is making "forward progress", or receipt of a Neighbor Advertisement message that is a response to a Neighbor Solicitation message. This change adds support for TCP to let the IP/link layers know that a neighbor is reachable. Test: integration_test.TestTCPConfirmNeighborReachability PiperOrigin-RevId: 354222833
2021-01-27 | Add support for more fields in netstack for TCP_INFO | Nayana Bidari
This CL adds support for the following fields:
- RTT, RTTVar, RTO
- send congestion window (sndCwnd) and send slow start threshold (sndSsthresh)
- congestion control state (CaState)
- ReorderSeen
PiperOrigin-RevId: 354195361
2021-01-26 | Initialize the send buffer handler in endpoint creation. | Nayana Bidari
This CL initializes the function handler used for getting the send buffer size limits during endpoint creation, so the caller of SetSendBufferSize(..) does not need to know the endpoint type (tcp/udp/..). PiperOrigin-RevId: 353992634
2021-01-26 | Implement error on pointers | Tamir Duberstein
This improves type-assertion safety. PiperOrigin-RevId: 353931228
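The safety gain can be sketched as follows: when the interface methods have pointer receivers, only the pointer type satisfies the interface, so a type switch or assertion on the value type is rejected by the compiler instead of silently never matching. The `Error` interface and error types below are hypothetical stand-ins for illustration.

```go
package main

import "fmt"

// Error is a hypothetical stand-in for an error interface.
type Error interface {
	String() string
}

// ErrConnectionRefused and ErrTimeout are hypothetical error types whose
// methods are declared on the pointer receiver.
type ErrConnectionRefused struct{}

func (*ErrConnectionRefused) String() string { return "connection refused" }

type ErrTimeout struct{}

func (*ErrTimeout) String() string { return "operation timed out" }

// describe switches on the concrete error type.
func describe(err Error) string {
	switch err.(type) {
	case *ErrConnectionRefused:
		return "refused"
	// Adding "case ErrConnectionRefused:" here would fail to compile,
	// because the value type does not implement Error.
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(&ErrConnectionRefused{})) // prints "refused"
	fmt.Println(describe(&ErrTimeout{}))           // prints "other"
}
```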
2021-01-26 | Fix couple of potential route leaks. | Bhasker Hariharan
connect() can be invoked multiple times on UDP/RAW sockets and in such a case we should release the cached route from the previous connect. Fixes #5359 PiperOrigin-RevId: 353919891
2021-01-26 | Drop nicID from transport endpoint reg/cleanup fns | Ghanan Gowripalan
...as it is unused. PiperOrigin-RevId: 353896981
2021-01-26 | Move SO_SNDBUF to socketops. | Nayana Bidari
This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For unix sockets, we do not support setting of this option. PiperOrigin-RevId: 353871484
2021-01-25 | Unlock tcp endpoint on zero-length atomic writes | Tamir Duberstein
Rewrite tcp.endpoint.Write to avoid manual locking and unlocking. This should prevent similar mistakes in the future. PiperOrigin-RevId: 353675734
2021-01-22 | Define tcpip.Payloader in terms of io.Reader | Tamir Duberstein
Fixes #1509. PiperOrigin-RevId: 353295589
2021-01-15 | Remove count argument from tcpip.Endpoint.Read | Tamir Duberstein
The same intent can be specified via the io.Writer. PiperOrigin-RevId: 352098747
2021-01-15 | [rack] Retransmit the probe segment after the probe timer expires. | Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.2. Fixes #5084 PiperOrigin-RevId: 352093473
2021-01-15 | Populate EgressRoute, GSO, Netproto for batch writes | Ghanan Gowripalan
We loop over the list of packets anyway, so setting these isn't expensive. Now that they are populated only by the link endpoint that uses them, TCP does not need to. PiperOrigin-RevId: 352090853
2021-01-14 | Remove impossible errors | Tamir Duberstein
Commit 25b5ec7 moved link address resolution out of the transport layer; special handling of link address resolution is no longer necessary in tcp. PiperOrigin-RevId: 351839254
2021-01-13 | Do not resolve remote link address at transport layer | Ghanan Gowripalan
Link address resolution is performed at the link layer (if required) so we can defer it from the transport layer. When link resolution is required, packets will be queued and sent once link resolution completes. If link resolution fails, the transport layer will receive a control message indicating that the stack failed to route the packet.
tcpip.Endpoint.Write no longer returns a channel now that writes do not wait for link resolution at the transport layer. tcpip.ErrNoLinkAddress is no longer used so it is removed.
Removed calls to stack.Route.ResolveWith from the transport layer so that link resolution is performed when a route is created in response to an incoming packet (e.g. to complete TCP handshakes or send a RST).
Tests:
- integration_test.TestForwarding
- integration_test.TestTCPLinkResolutionFailure
Fixes #4458 RELNOTES: n/a PiperOrigin-RevId: 351684158
2021-01-13 | Clean up the dummy network interface used by UDP tests | Arthur Sfez
It is now composed by a NetworkInterface interface which lets us delete the methods we don't need. PiperOrigin-RevId: 351613267
2021-01-13 | [rack] TLP: Recovery detection. | Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.6 Fixes #5131 PiperOrigin-RevId: 351558449
2021-01-12 | Drop TransportEndpointID from HandleControlPacket | Ghanan Gowripalan
When a control packet is delivered, it is delivered to a transport endpoint with a matching stack.TransportEndpointID so there is no need to pass the ID to the endpoint as it already knows its ID. PiperOrigin-RevId: 351497588