summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/transport/tcp/connect.go
AgeCommit message (Collapse)Author
2020-02-24Add support for tearing down protocol dispatchers and TIME_WAIT endpoints.Ian Gudger
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to test this change. Also fix a race when a socket in SYN-RCVD is closed. This is also required to test this change. PiperOrigin-RevId: 296922548
2020-02-04Add socket connection stress test.Ian Gudger
Tests 65k connection attempts on common types of sockets to check for port leaks. Also fixes a bug where dual-stack sockets wouldn't properly re-queue segments received while closing. PiperOrigin-RevId: 293241166
2020-01-29Add support for TCP_DEFER_ACCEPT.Bhasker Hariharan
PiperOrigin-RevId: 292233574
2020-01-21Add a new TCP stat for current open connections.Mithun Iyer
Such a stat accounts for all connections that are currently established and not yet transitioned to close state. Also fix bug in double increment of CurrentEstablished stat. Fixes #1579 PiperOrigin-RevId: 290827365
2020-01-15Bugfix to terminate the protocol loop on StateError.Bhasker Hariharan
The change to introduce worker goroutines can cause the endpoint to transition to StateError and we should terminate the loop rather than let the endpoint transition to a CLOSED state as we do in case the endpoint enters TIME-WAIT/CLOSED. Moving to a closed state would cause the actual error to not be propagated to any read() calls etc. PiperOrigin-RevId: 289923568
2020-01-14Changes TCP packet dispatch to use a pool of goroutines.Bhasker Hariharan
All inbound segments for connections in ESTABLISHED state are delivered to the endpoint's queue but for every segment delivered we also queue the endpoint for processing to a selected processor. This ensures that when there are a large number of connections in ESTABLISHED state the inbound packets are all handled by a small number of goroutines and significantly reduces the amount of work the goscheduler has to perform. We let connections in other states follow the current path where the endpoint's goroutine directly handles the segments. Updates #231 PiperOrigin-RevId: 289728325
2020-01-09New sync package.Ian Gudger
* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2019-12-17Internal change.gVisor bot
PiperOrigin-RevId: 286003946
2019-12-11Add support for TCP_USER_TIMEOUT option.Bhasker Hariharan
The implementation follows the linux behavior where specifying a TCP_USER_TIMEOUT will cause the resend timer to honor the user specified timeout rather than the default rto based timeout. Further it alters when connections are timedout due to keepalive failures. It does not alter the behavior of when keepalives are sent. This is as per the linux behavior. PiperOrigin-RevId: 285099795
2019-12-06Add TCP stats for connection close and keep-alive timeouts.Mithun Iyer
Fix bugs in updates to TCP CurrentEstablished stat. Fixes #1277 PiperOrigin-RevId: 284292459
2019-12-06Fix flakiness in tcp_test.Bhasker Hariharan
This change marks the socket as ESTABLISHED and creates the receiver and sender the moment we send the final ACK in case of an active TCP handshake or when we receive the final ACK for a passive TCP handshake. Before this change there was a short window in which an ACK can be received and processed but the state on the socket is not yet ESTABLISHED. This can be seen in TestConnectBindToDevice which is flaky because sometimes the socket is in SYN-SENT and not ESTABLISHED even though the other side has already received the final ACK of the handshake. PiperOrigin-RevId: 284277713
2019-12-03Fix panic due to early transition to Closed.Bhasker Hariharan
The code in rcv.consumeSegment incorrectly transitions to CLOSED state from LAST-ACK before the final ACK for the FIN. Further if receiving a segment changes a socket to a closed state then we should not invoke the sender as the socket is now closed and sending any segments is incorrect. PiperOrigin-RevId: 283625300
2019-11-25Set transport and network headers on outbound packets.Kevin Krakauer
These are necessary for iptables to read and parse headers for packet filtering. PiperOrigin-RevId: 282372811
2019-11-22Internal change.Adin Scannell
PiperOrigin-RevId: 282068093
2019-11-22Use PacketBuffers with GSO.Kevin Krakauer
PiperOrigin-RevId: 282045221
2019-11-22Add segment dequeue check while emptying segment queue.Mithun Iyer
PiperOrigin-RevId: 282023891
2019-11-15Handle in-flight TCP segments when moving to CLOSE.Mithun Iyer
As we move to CLOSE state from LAST-ACK or TIME-WAIT, ensure that we re-match all in-flight segments to any listening endpoint. Also fix LISTEN state handling of any ACK segments as per RFC793. Fixes #1153 PiperOrigin-RevId: 280703556
2019-11-14Use PacketBuffers for outgoing packets.Kevin Krakauer
PiperOrigin-RevId: 280455453
2019-11-13Fix flaky behaviour during S/R.Bhasker Hariharan
PiperOrigin-RevId: 280280156
2019-11-07Add support for TIME_WAIT timeout.Bhasker Hariharan
This change adds explicit support for honoring the 2MSL timeout for sockets in TIME_WAIT state. It also adds support for the TCP_LINGER2 option that allows modification of the FIN_WAIT2 state timeout duration for a given socket. It also adds an option to modify the Stack wide TIME_WAIT timeout but this is only for testing. On Linux this is fixed at 60s. Further, we also now correctly process RST's in CLOSE_WAIT and close the socket similar to linux without moving it to error state. We also now handle SYN in ESTABLISHED state as per RFC5961#section-4.1. Earlier we would just drop these SYNs. Which can result in some tests that pass on linux to fail on gVisor. Netstack now honors TIME_WAIT correctly as well as handles the following cases correctly. - TCP RSTs in TIME_WAIT are ignored. - A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT and a dup ACK is sent in response to the FIN as the dup FIN indicates potential loss of the original final ACK. - An out of order segment during TIME_WAIT generates a dup ACK. - A new SYN w/ a sequence number > the highest sequence number in the previous connection closes the TIME_WAIT early and opens a new connection. Further to make the SYN case work correctly the ISN (Initial Sequence Number) generation for Netstack has been updated to be as per RFC. Its not a pure random number anymore and follows the recommendation in https://tools.ietf.org/html/rfc6528#page-3. The current hash used is not a cryptographically secure hash function. A separate change will update the hash function used to Siphash similar to what is used in Linux. PiperOrigin-RevId: 279106406
2019-10-28Use the user supplied TCP MSS when creating a new active socketGhanan Gowripalan
This change supports using a user supplied TCP MSS for new active TCP connections. Note, the user supplied MSS must be less than or equal to the maximum possible MSS for a TCP connection's route. If it is greater than the maximum possible MSS, the maximum possible MSS will be used as the connection's MSS instead. This change does not use this user supplied MSS for connections accepted from listening sockets - that will come in a later change. Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user supplied MSS if it is not greater than the maximum possible MSS for the route. PiperOrigin-RevId: 277185125
2019-10-24Remove the amss field from tcpip.tcp.handshake as it was unusedGhanan Gowripalan
The amss field in the tcpip.tcp.handshake was not used anywhere. Removed it to not cause confusion with the amss field in the tcpip.tcp.endpoint struct, which was documented to be used (and is actually being used) for the same purpose. PiperOrigin-RevId: 276577088
2019-10-23Merge pull request #641 from tanjianfeng:mastergVisor bot
PiperOrigin-RevId: 276380008
2019-10-22netstack/tcp: software segmentation offloadAndrei Vagin
Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows to generate multiple tcp packets and send them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prepandable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677
2019-10-15epsocket: support /proc/net/snmpJianfeng Tan
Netstack has its own stats, we use this to fill /proc/net/snmp. Note that some metrics are not recorded in Netstack, which will be shown as 0 in the proc file. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15netstack: add counters for tcp CurrEstab and EstabResetsJianfeng Tan
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-14Internal change.gVisor bot
PiperOrigin-RevId: 274700093
2019-10-09Internal change.gVisor bot
PiperOrigin-RevId: 273861936
2019-10-07Implement IP_TTL.Ian Gudger
Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341
2019-09-04Fix RST generation bugs.Bhasker Hariharan
There are a few cases addressed by this change - We no longer generate a RST in response to a RST packet. - When we receive a RST we cleanup and release all reservations immediately as the connection is now aborted. - An ACK received by a listening socket generates a RST when SYN cookies are not in-use. The only reason an ACK should land at the listening socket is if we are using SYN cookies otherwise the goroutine for the handshake in progress should have gotten the packet and it should never have arrived at the listening endpoint. - Also fixes the error returned when a connection times out due to a Keepalive timer expiration from ECONNRESET to a ETIMEDOUT. PiperOrigin-RevId: 267238427
2019-06-13Add support for TCP receive buffer auto tuning.Bhasker Hariharan
The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accomodate for the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto- tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-10Fixes to listen backlog handling.Bhasker Hariharan
Changes netstack to confirm to current linux behaviour where if the backlog is full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto backlog connections to be in SYN-RCVD state as long as the backlog is not full. We also now drop a SYN if syn cookies are in use and the backlog for the listening endpoint is full. Added new tests to confirm the behaviour. Also reverted the change to increase the backlog in TcpPortReuseMultiThread syscall test. Fixes #236 PiperOrigin-RevId: 252500462
2019-06-06Track and export socket state.Rahat Mahmood
This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850
2019-06-04Fix data race in synRcvdState.Bhasker Hariharan
When checking the length of the acceptedChan we should hold the endpoint mutex otherwise a syn received while the listening socket is being closed can result in a data race where the cleanupLocked routine sets acceptedChan to nil while a handshake goroutine in progress could try and check it at the same time. PiperOrigin-RevId: 251537697
2019-05-30Fixes to TCP listen behavior.Bhasker Hariharan
Netstack listen loop can get stuck if cookies are in-use and the app is slow to accept incoming connections. Further we continue to complete handshake for a connection even if the backlog is full. This creates a problem when a lots of connections come in rapidly and we end up with lots of completed connections just hanging around to be delivered. These fixes change netstack behaviour to mirror what linux does as described here in the following article http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK and not complete the handshake if the backlog is full. This will result in the connection staying in a half-complete state. Eventually the sender will retransmit the ACK and if backlog has space we will transition to a connected state and deliver the endpoint. Similarly when cookies are in use we do not try and create an endpoint unless there is space in the accept queue to accept the newly created endpoint. If there is no space then we again silently drop the ACK as we can just recreate it when the ACK is retransmitted by the peer. We also now use the backlog to cap the size of the SYN-RCVD queue for a given endpoint. So at any time there can be N connections in the backlog and N in a SYN-RCVD state if the application is not accepting connections. Any new SYNs will be dropped. This CL also fixes another small bug where we mark a new endpoint which has not completed handshake as connected. We should wait till handshake successfully completes before marking it connected. Updates #236 PiperOrigin-RevId: 250717817
2019-05-03Implement support for SACK based recovery(RFC 6675).Bhasker Hariharan
PiperOrigin-RevId: 246536003 Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-09Add TCP checksum verification.Bhasker Hariharan
PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-03-28netstack/fdbased: add generic segmentation offload (GSO) supportAndrei Vagin
The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-26netstack: Don't exclude length when a pseudo-header checksum is calculatedAndrei Vagin
This is a preparation for GSO changes (cl/234508902). RELNOTES[gofers]: Refactor checksum code to include length, which it already did, but in a convoluted way. Should be a no-op. PiperOrigin-RevId: 240460794 Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2
2019-03-14Remove duplicate TCP flag definitionsTamir Duberstein
PiperOrigin-RevId: 238467634 Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9
2019-02-11Do not drop packets w/ missing TCP timestamps.Bhasker Hariharan
RFC7323 recommends that if the timestamp option was negotiated then all packets should carry a TCP Timestamp and any packets that do not should be dropped. Netstack implemented this behaviour. Linux OTOH does not and will accept such packets. This change makes Netstack behaviour compatible with Linux. Also now that we allow such packets, we do need to update RTO calculations based on these packets even if timestamp option is enabled. PiperOrigin-RevId: 233432268 Change-Id: I9f4742ae6b63930ac3b5e37d8c238761e6a4b29f
2018-12-13transport/tcp: remove unused error return valuesIan Gudger
PiperOrigin-RevId: 225421480 Change-Id: I1e9259b0b7e8490164e830b73338a615129c7f0e
2018-12-05sentry: skip waiting for undrain for netstack TCP endpoints in error state.Zhaozhong Ni
PiperOrigin-RevId: 224214981 Change-Id: I4c1dd5b1c856f7a4f9866a5dda44a5297e92486a
2018-11-05Merge segments in sender's writeListIan Gudger
PiperOrigin-RevId: 220185891 Change-Id: Iaea73fd7b2fa8c399b989cdcaabf4885f370df4b
2018-10-31Fix a race where keepalives could be sent while there is pending dataIan Gudger
PiperOrigin-RevId: 219571556 Change-Id: I5a1042c1cb05eb2711eb01627fd298bad6c543a6
2018-10-19Use correct company name in copyright headerIan Gudger
PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-09-28Block for link address resolutionSepehr Raissian
Previously, if address resolution for UDP or Ping sockets required sending packets using Write in Transport layer, Resolve would return ErrWouldBlock and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution would run in background. Further calls to Write using same address would also return ErrNoLinkAddress until resolution has been completed successfully. Since Write is not allowed to block and System Calls need to be interruptible in System Call layer, the caller to Write is responsible for blocking upon return of ErrWouldBlock. Now, when startAddressResolution is called a notification channel for the completion of the address resolution is returned. The channel will traverse up to the calling function of Write as well as ErrNoLinkAddress. Once address resolution is complete (success or not) the channel is closed. The caller would call Write again to send packets and check if address resolution was compeleted successfully or not. Fixes google/gvisor#5 Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93 PiperOrigin-RevId: 214962200
2018-09-14Pass buffer.Prependable by valueTamir Duberstein
PiperOrigin-RevId: 213053370 Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562