Age | Commit message (Collapse) | Author |
|
All inbound segments for connections in ESTABLISHED state are delivered to the
endpoint's queue but for every segment delivered we also queue the endpoint for
processing to a selected processor. This ensures that when there are a large
number of connections in ESTABLISHED state the inbound packets are all handled
by a small number of goroutines and significantly reduces the amount of work the
goscheduler has to perform.
We let connections in other states follow the current path where the
endpoint's goroutine directly handles the segments.
Updates #231
PiperOrigin-RevId: 289728325
|
|
This is a band-aid fix for now to prevent panics.
PiperOrigin-RevId: 289078453
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
The implementation follows the linux behavior where specifying
a TCP_USER_TIMEOUT will cause the resend timer to honor the
user specified timeout rather than the default rto based timeout.
Further it alters when connections are timedout due to keepalive
failures. It does not alter the behavior of when keepalives are
sent. This is as per the linux behavior.
PiperOrigin-RevId: 285099795
|
|
Fix bugs in updates to TCP CurrentEstablished stat.
Fixes #1277
PiperOrigin-RevId: 284292459
|
|
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
PiperOrigin-RevId: 273861936
|
|
Add missing state transition to LastAck, which should happen when the
endpoint has already recieved a FIN from the remote side, and is
sending its own FIN.
PiperOrigin-RevId: 265568314
|
|
This change just introduces different congestion control states and
ensures the sender.state is updated to reflect the current state
of the connection.
It is not used for any decisions yet but this is required before
algorithms like Eiffel/PRR can be implemented.
Fixes #394
PiperOrigin-RevId: 262638292
|
|
PiperOrigin-RevId: 261373749
|
|
Export some readily-available fields for TCP_INFO and stub out the rest.
PiperOrigin-RevId: 261191548
|
|
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.
This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.
The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf
NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.
Updates #230
PiperOrigin-RevId: 253168283
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.
PiperOrigin-RevId: 252889093
|
|
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).
For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.
PiperOrigin-RevId: 251934850
|
|
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.
PiperOrigin-RevId: 251743020
|
|
PiperOrigin-RevId: 246536003
Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.
Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received
and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
|
|
PiperOrigin-RevId: 239417224
Change-Id: I14a9adc31a6330a79a6156c105969cd5f1f63d20
|
|
See: https://tools.ietf.org/html/rfc6691#section-2
PiperOrigin-RevId: 239305632
Change-Id: Ie8eb912a43332e6490045dc95570709c5b81855e
|
|
PiperOrigin-RevId: 238467634
Change-Id: If4cd8efff7386fbee1195f051d15549b495910a9
|
|
PiperOrigin-RevId: 236945145
Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b
|
|
This change does not make use of SACK information but adds support to track
SACK information and store it in the endpoint.
The actual SACK based recovery will be in a separate CL.
Part of commits to add RFC 6675 support to Netstack.
PiperOrigin-RevId: 235612264
Change-Id: I261f94844d7bad5abda803152ce6cc6125a467ff
|
|
RFC7323 recommends that if the timestamp option was negotiated
then all packets should carry a TCP Timestamp and any packets that
do not should be dropped.
Netstack implemented this behaviour. Linux OTOH does not and will
accept such packets. This change makes Netstack behaviour compatible
with Linux.
Also now that we allow such packets, we do need to update RTO calculations
based on these packets even if timestamp option is enabled.
PiperOrigin-RevId: 233432268
Change-Id: I9f4742ae6b63930ac3b5e37d8c238761e6a4b29f
|
|
PiperOrigin-RevId: 224033418
Change-Id: I780be973e8be68ac93e8c9e7a100002e912f40d2
|
|
PiperOrigin-RevId: 221484739
Change-Id: I44c71f79f99d0d00a2e70a7f06d7024a62a5de0a
|
|
Previously, TCP_NODELAY was always enabled and we would lie about it being
configurable. TCP_NODELAY is now disabled by default (to match Linux) in the
socket layer so that non-gVisor users don't automatically start using this
questionable optimization.
PiperOrigin-RevId: 221368472
Change-Id: Ib0240f66d94455081f4e0ca94f09d9338b2c1356
|
|
PiperOrigin-RevId: 221117846
Change-Id: I2a43fd8135b1d1194ff81e98644ce6b6182ece50
|
|
PiperOrigin-RevId: 220185891
Change-Id: Iaea73fd7b2fa8c399b989cdcaabf4885f370df4b
|
|
PiperOrigin-RevId: 219571556
Change-Id: I5a1042c1cb05eb2711eb01627fd298bad6c543a6
|
|
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
|
|
From RFC7323#Section-4
The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an
implicit assumption that at most one RTTM will be sampled per RTT. When
multiple RTTMs per RTT are available to update the RTT estimator, an
implementation SHOULD try to adhere to the spirit of the history specified in
[RFC6298]. An implementation suggestion is detailed in Appendix G.
From RFC7323#appendix-G
Appendix G. RTO Calculation Modification
Taking multiple RTT samples per window would shorten the history calculated
by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a
similar history as originally intended by [RFC6298].
It is roughly known how many samples a congestion window worth of data will
yield, not accounting for ACK compression, and ACK losses. Such events will
result in more history of the path being reflected in the final value for
RTO, and are uncritical. This modification will ensure that a similar amount
of time is taken into account for the RTO estimation, regardless of how many
samples are taken per window:
ExpectedSamples = ceiling(FlightSize / (SMSS * 2))
alpha' = alpha / ExpectedSamples
beta' = beta / ExpectedSamples
Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".
Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and
beta' instead:
RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|
SRTT <- (1 - alpha') * SRTT + alpha' * R'
(for each sample R')
PiperOrigin-RevId: 213644795
Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e
|
|
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx)
needs to turn around and call WritePacket (tx) with its VectorisedView.
Also removes the restriction on having VectorisedViews with multiple views in the write path.
PiperOrigin-RevId: 211728717
Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
|
|
PiperOrigin-RevId: 211670620
Change-Id: Ia8a3d8ae53a7fece1dee08ee9c74964bd7f71bb7
|
|
PiperOrigin-RevId: 211504634
Change-Id: I9a7bcbbdd40e5036894930f709278725ef477293
|
|
This CL implements CUBIC as described in https://tools.ietf.org/html/rfc8312.
PiperOrigin-RevId: 207353142
Change-Id: I329cbf3277f91127e99e488f07d906f6779c6603
|
|
PiperOrigin-RevId: 207125440
Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49
|
|
PiperOrigin-RevId: 207037226
Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428
|
|
PiperOrigin-RevId: 207007153
Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9
|
|
This CL also puts the congestion control logic behind an
interface so that we can easily swap it out for say CUBIC
in the future.
PiperOrigin-RevId: 205732848
Change-Id: I891cdfd17d4d126b658b5faa0c6bd6083187944b
|
|
PiperOrigin-RevId: 203958972
Change-Id: Ia6fe16547539296d48e2c6731edacdd96bd6e93c
|
|
Fixes #27
PiperOrigin-RevId: 203825288
Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633
|
|
There is a subtle bug where during cleanup with unread data a FIN can
be converted to a RST, at that point the entire connection should be
aborted as we're not expecting any ACKs to the RST.
PiperOrigin-RevId: 202691271
Change-Id: Idae70800208ca26e07a379bc6b2b8090805d0a22
|
|
PiperOrigin-RevId: 202151720
Change-Id: I0491172c436bbb32b977f557953ba0bc41cfe299
|
|
PiperOrigin-RevId: 201596247
Change-Id: Id22f47b2cdcbe14aa0d930f7807ba75f91a56724
|
|
Today when we transmit a RST it's happening during the time-wait
flow. Because a FIN is allowed to advance the acceptable ACK window
we're incorrectly doing that for a RST.
PiperOrigin-RevId: 197637565
Change-Id: I080190b06bd0225326cd68c1fbf37bd3fdbd414e
|
|
PiperOrigin-RevId: 195307689
Change-Id: I499f19af49875a43214797d63376f20ae788d2f4
|
|
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
|