Age | Commit message (Collapse) | Author |
|
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather
than hack these on in sys_poll etc it felt cleaner to just cleanup
the call sites to notify for both events. This is what linux does
as well.
Fixes #5544
PiperOrigin-RevId: 364859977
|
|
This change sets the inner `routeInfo` struct to be a named private member
and replaces direct access with access through getters. Note that direct
access to the fields of `routeInfo` is still possible through the `RouteInfo`
struct.
Fixes #4902
PiperOrigin-RevId: 364822872
|
|
PiperOrigin-RevId: 364596526
|
|
Transport demuxer and UDP tests should not use a loopback address as the
source address for packets injected into the stack as martian loopback
packets will be dropped in a later change.
PiperOrigin-RevId: 363479681
|
|
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT
unconditionally. Fixing the state machine will give us back the retransmission
of FIN.
PiperOrigin-RevId: 363301883
|
|
There is a race in handling new incoming connections on a listening
endpoint that causes the endpoint to reply to more incoming SYNs than
what is permitted by the listen backlog.
The race occurs when there is a successful passive connection handshake
and the synRcvdCount counter is decremented, followed by the endpoint
delivered to the accept queue. In the window of time between
synRcvdCount decrementing and the endpoint being enqueued for accept,
new incoming SYNs can be handled without honoring the listen backlog
value, as the backlog could be perceived not full.
Fixes #5637
PiperOrigin-RevId: 363279372
|
|
Doing so involved breaking dependencies between //pkg/tcpip and the rest
of gVisor, which are discouraged anyways.
Tested on the Go branch via:
gvisor.dev/gvisor/pkg/tcpip/...
Addresses #1446.
PiperOrigin-RevId: 363081778
|
|
Lots of small changes:
- simplify package API via Reservation type
- rename some single-letter variable names that were hard to follow
- rename some types
PiperOrigin-RevId: 362442366
|
|
- Implement Stringer for it so that we can improve error messages.
- Use TCPFlags through the code base. There used to be a mixed usage of byte,
uint8 and int as TCP flags.
PiperOrigin-RevId: 361940150
|
|
Speeds up the socket stress tests by a couple orders of magnitude.
PiperOrigin-RevId: 361721050
|
|
Updates #5597
PiperOrigin-RevId: 361252003
|
|
The only user is in (*handshake).complete and it specifies MaxRTO, so there is
no behavior changes.
PiperOrigin-RevId: 360954447
|
|
One of the preparation to decouple underlying buffer implementation.
There are still some methods that tie to VectorisedView, and they will be
changed gradually in later CLs.
This CL also introduce a new ICMPv6ChecksumParams to replace long list of
parameters when calling ICMPv6Checksum, aiming to be more descriptive.
PiperOrigin-RevId: 360778149
|
|
This validates that struct fields if annotated with "// checklocks:mu" where
"mu" is a mutex field in the same struct then access to the field is only
done with "mu" locked.
All types that are guarded by a mutex must be annotated with
// +checklocks:<mutex field name>
For more details please refer to README.md.
PiperOrigin-RevId: 360729328
|
|
io.Reader.ReadFull returns the number of bytes copied and an error if fewer
bytes were read.
PiperOrigin-RevId: 360247614
|
|
There is a short race where in Write an endpoint can transition from writable
to non-writable state due to say an incoming RST during the time we release
the endpoint lock and reacquire after copying the payload. In such a case
if the write happens to be a zero sized write we end up trying to call
sendData() even though nothing was queued.
This can panic when trying to enable/disable TCP timers if the endpoint had
already transitioned to a CLOSED/ERROR state due to the incoming RST as we
cleanup timers when the protocol goroutine terminates.
Sadly the race window is small enough that my attempts at reproducing the panic
in a syscall test has not been successful.
PiperOrigin-RevId: 359887905
|
|
Also increase refcount of raw.endpoint.route while in use.
Avoid allocating an array of size zero.
PiperOrigin-RevId: 359797788
|
|
Use maybeSendSegment while sending segments in RACK recovery which checks if
the receiver has space and splits the segments when the segment size is
greater than MSS.
PiperOrigin-RevId: 359641097
|
|
Prevents the following deadlock:
- Raw packet is sent via e.Write(), which read locks e.mu
- Connect() is called, blocking on write locking e.mu
- The packet is routed to loopback and back to e.HandlePacket(), which read
locks e.mu
Per the atomic.RWMutex documentation, this deadlocks:
"If a goroutine holds a RWMutex for reading and another goroutine might call
Lock, no goroutine should expect to be able to acquire a read lock until the
initial read lock is released. In particular, this prohibits recursive read
locking. This is to ensure that the lock eventually becomes available; a blocked
Lock call excludes new readers from acquiring the lock."
Also, release eps.mu earlier in deliverRawPacket.
PiperOrigin-RevId: 359600926
|
|
This change implements TLP details enumerated in
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3
Fixes #5085
PiperOrigin-RevId: 357125037
|
|
Entry check:
- Earlier implementation was preventing us from entering recovery even if
SND.UNA is lost but dupAckCount is still below threshold. Fixed that.
- We should only enter recovery when at least one more byte of data beyond the
highest byte that was outstanding when fast retransmit was last entered is
acked. Added that check.
Exit check:
- Earlier we were checking if SEG.ACK is in range [SND.UNA, SND.NXT]. The
intention was to check if any unacknowledged data was ACKed. Note that
(SEG.ACK - 1) is actually the sequence number which was ACKed. So we were
incorrectly including (SND.UNA - 1) in the range. Fixed the check to now be
(SEG.ACK - 1) in range [SND.UNA, SND.NXT).
Additionally, moved a RACK specific test to the rack tests file.
Added tests for the changes I made.
PiperOrigin-RevId: 357091322
|
|
TestRACKWithDuplicateACK is flaky as the reorder window can expire before
receiving three duplicate ACKs which will result in sending the first
unacknowledged segment twice: when reorder timer expired and again after
receiving the third duplicate ACK.
This CL will fix this behavior and will not resend the segment again if it was
already re-transmittted when reorder timer expired.
Update the TestRACKWithDuplicateACK to test that the first segment is
considered as lost and is re-transmitted.
PiperOrigin-RevId: 356855168
|
|
The limits for snd/rcv buffers for unix domain socket is controlled by the
following sysctls on linux
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
Today in gVisor we do not expose these sysctls but we do support setting the
equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor
can be used without netstack, with hostinet or even without any networking stack
at all. Which means ideally these sysctls need to live as globals in gVisor.
But rather than make this a big change for now we hardcode the limits in the
AF_UNIX implementation itself (which in itself is better than where we were
before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial
limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use
today.
Updates #5132
PiperOrigin-RevId: 356665498
|
|
We previously return EINVAL when connecting to port 0, however this is not the
observed behavior on Linux. One of the observable effects after connecting to
port 0 on Linux is that getpeername() will fail with ENOTCONN.
PiperOrigin-RevId: 356413451
|
|
Detect packet loss using reorder window and re-transmit them after the reorder
timer expires.
PiperOrigin-RevId: 356321786
|
|
- Adds a function to enable RACK in tests.
- RACK update functions are guarded behind the flag tcpRecovery.
PiperOrigin-RevId: 355435973
|
|
Rename HandleNDupAcks() to HandleLossDetected() as it will enter this when
is detected after:
- reorder window expires and TLP (in case of RACK)
- dupAckCount >= 3
PiperOrigin-RevId: 355237858
|
|
Netstack today will send dupACK's with no rate limit for incoming out of
window segments. This can result in ACK loops for example if a TCP socket
connects to itself (actually permitted by TCP). Where the ACK sent in
response to packets being out of order itself gets considered as an out
of window segment resulting in another ACK being generated.
PiperOrigin-RevId: 355206877
|
|
...to remove the need for the transport layer to deduce the type of
error it received.
Rename HandleControlPacket to HandleError as HandleControlPacket only
handles errors.
tcpip.SockError now holds a tcpip.SockErrorCause interface that
different errors can implement.
PiperOrigin-RevId: 354994306
|
|
The network endpoint should not need to have logic to handle different
kinds of neighbor tables. Network endpoints can let the NIC know about
differnt neighbor discovery messages and let the NIC decide which table
to update.
This allows us to remove the LinkAddressCache interface.
PiperOrigin-RevId: 354812584
|
|
After receiving an ACK(cumulative or selective), RACK will update the reorder
window which is used as a settling time before marking the packet as lost.
This change will add an init function to initialize the variables in RACK and
also store the reference to sender in rackControl.
The reorder window is calculated as per rfc:
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.2 Step 4.
PiperOrigin-RevId: 354453528
|
|
This makes it possible to add data to types that implement tcpip.Error.
ErrBadLinkEndpoint is removed as it is unused.
PiperOrigin-RevId: 354437314
|
|
Previously, sending on an unconnected UDP socket would ignore the
SO_BINDTODEVICE option. Send on the configured interface when an UDP socket
is bound to an interface through setsockop SO_BINDTODEVICE.
Add packetimpact tests exercising UDP reads and writes with every combination
of bound/unbound, broadcast/multicast/unicast destination, and bound/not-bound
to device.
PiperOrigin-RevId: 354299670
|
|
As per RFC 4861 section 7.3.1,
A neighbor is considered reachable if the node has recently received
a confirmation that packets sent recently to the neighbor were
received by its IP layer. Positive confirmation can be gathered in
two ways: hints from upper-layer protocols that indicate a connection
is making "forward progress", or receipt of a Neighbor Advertisement
message that is a response to a Neighbor Solicitation message.
This change adds support for TCP to let the IP/link layers know that a
neighbor is reachable.
Test: integration_test.TestTCPConfirmNeighborReachability
PiperOrigin-RevId: 354222833
|
|
This CL adds support for the following fields:
- RTT, RTTVar, RTO
- send congestion window (sndCwnd) and send slow start threshold (sndSsthresh)
- congestion control state(CaState)
- ReorderSeen
PiperOrigin-RevId: 354195361
|
|
- This CL will initialize the function handler used for getting the send
buffer size limits during endpoint creation and does not require the caller of
SetSendBufferSize(..) to know the endpoint type(tcp/udp/..)
PiperOrigin-RevId: 353992634
|
|
This improves type-assertion safety.
PiperOrigin-RevId: 353931228
|
|
connect() can be invoked multiple times on UDP/RAW sockets and in such
a case we should release the cached route from the previous connect.
Fixes #5359
PiperOrigin-RevId: 353919891
|
|
...as it is unused.
PiperOrigin-RevId: 353896981
|
|
This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For
unix sockets, we do not support setting of this option.
PiperOrigin-RevId: 353871484
|
|
Rewrite tcp.endpoint.Write to avoid manual locking and unlocking. This should
prevent similar mistakes in the future.
PiperOrigin-RevId: 353675734
|
|
Fixes #1509.
PiperOrigin-RevId: 353295589
|
|
The same intent can be specified via the io.Writer.
PiperOrigin-RevId: 352098747
|
|
This change implements TLP details enumerated in
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.2.
Fixes #5084
PiperOrigin-RevId: 352093473
|
|
We loop over the list of packets anyways so setting these aren't
expensive.
Now that they are populated only by the link endpoint that uses them,
TCP does not need to.
PiperOrigin-RevId: 352090853
|
|
Commit 25b5ec7 moved link address resolution out of the transport layer;
special handling of link address resolution is no longer necessary in tcp.
PiperOrigin-RevId: 351839254
|
|
Link address resolution is performed at the link layer (if required) so
we can defer it from the transport layer. When link resolution is
required, packets will be queued and sent once link resolution
completes. If link resolution fails, the transport layer will receive a
control message indicating that the stack failed to route the packet.
tcpip.Endpoint.Write no longer returns a channel now that writes do not
wait for link resolution at the transport layer.
tcpip.ErrNoLinkAddress is no longer used so it is removed.
Removed calls to stack.Route.ResolveWith from the transport layer so
that link resolution is performed when a route is created in response
to an incoming packet (e.g. to complete TCP handshakes or send a RST).
Tests:
- integration_test.TestForwarding
- integration_test.TestTCPLinkResolutionFailure
Fixes #4458
RELNOTES: n/a
PiperOrigin-RevId: 351684158
|
|
It is now composed by a NetworkInterface interface which lets us delete
the methods we don't need.
PiperOrigin-RevId: 351613267
|
|
This change implements TLP details enumerated in
https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.6
Fixes #5131
PiperOrigin-RevId: 351558449
|
|
When a control packet is delivered, it is delivered to a transport
endpoint with a matching stack.TransportEndpointID so there is no
need to pass the ID to the endpoint as it already knows its ID.
PiperOrigin-RevId: 351497588
|