Age | Commit message (Collapse) | Author |
|
If the ACK completing the handshake has FIN or data, requeue the segment
for further processing by the newly established endpoint. Otherwise,
the segments would have to be retransmitted by the peer to be processed
by the established endpoint. Doing this, keeps the behavior in parity
with Linux.
This also addresses a test flake with TCPNonBlockingConnectClose where
the ACK (completing the handshake) and multiple retransmitted FINACKs
from the peer could be dropped by the listener, when using syncookies
and the accept queue is full. The handshake could eventually get
completed with a retransmitted FINACK, without actual processing of
FIN. This can cause the poll with POLLRDHUP on the accepted socket to
sometimes time out before the next FINACK retransmission.
PiperOrigin-RevId: 377651695
|
|
Address a race with non-blocking connect and socket close, causing the
FIN (because of socket close) to not be sent out, even after completing
the handshake.
The race occurs with this sequence:
(1) endpoint Connect starts handshake, sending out SYN
(2) handshake complete() releases endpoint lock, waiting on sleeper.Fetch()
(3) endpoint Close acquires endpoint lock, does not enqueue FIN (as the
endpoint is not yet connected) and asserts notifyClose
(4) SYNACK from peer gets enqueued asserting newSegmentWaker
(5) handshake complete() re-aqcuires lock, first processes newSegmentWaker
event, transitions to ESTABLISHED and proceeds to protocolMainLoop()
(6) protocolMainLoop() exits while processing notifyClose
When the execution follows the above sequence, no FIN is sent to the peer.
This causes the listener side to have a half-open connection sitting in
the accept queue.
Fix this by ensuring that the protocolMainLoop() performs clean shutdown
when the endpoint state is still ESTABLISHED.
This would not be a bug, if during handshake complete(), sleeper.Fetch()
prioritized notificationWaker over newSegmentWaker. In that case, the
handshake would not have completed in (5) above.
Fixes #6067
PiperOrigin-RevId: 376994395
|
|
The current implementation has a bug where TCP listener does not ignore
RSTs from the peer. While handling RST+ACK from the peer, this bug can
complete handshakes that use syncookies. This results in half-open
connection delivered to the accept queue.
Fixes #6076
PiperOrigin-RevId: 376868749
|
|
Adds support for the SO_BINDTODEVICE socket option in ICMP sockets with an
accompanying packetimpact test to exercise use of this socket option.
Adds a unit test to exercise the NIC selection logic introduced by this change.
The remaining unit tests for ICMP sockets need to be added in a subsequent CL.
See https://gvisor.dev/issues/5623 for the list of remaining unit tests.
Adds a "timeout" field to PacketimpactTestInfo, necessary due to the long
runtime of the newly added packetimpact test.
Fixes #5678
Fixes #4896
Updates #5623
Updates #5681
Updates #5763
Updates #5956
Updates #5966
Updates #5967
PiperOrigin-RevId: 376271581
|
|
...except TCP tests and NDP tests that mutate globals. These will be
undertaken later.
Updates #5940.
PiperOrigin-RevId: 376145608
|
|
...except in tests.
Note this replaces some uses of a cryptographic RNG with a plain RNG.
PiperOrigin-RevId: 376070666
|
|
PiperOrigin-RevId: 376001032
|
|
Updates #5939.
Updates #6012.
RELNOTES: n/a
PiperOrigin-RevId: 375931554
|
|
Introduce tcpip.MonotonicTime; replace int64 in tcpip.Clock method
returns with time.Time and MonotonicTime to improve type safety and
ensure that monotonic clock readings are never compared to wall clock
readings.
PiperOrigin-RevId: 375775907
|
|
- Unused constants
- Unused functions
- Unused arguments
- Unkeyed literals
- Unnecessary conversions
PiperOrigin-RevId: 375253464
|
|
This change also includes miscellaneous improvements:
* UnknownProtocolRcvdPackets has been separated into two stats, to
specify at which layer the unknown protocol was found (L3 or L4)
* MalformedRcvdPacket is not aggregated across every endpoint anymore.
Doing it this way did not add useful information, and it was also error-prone
(example: ipv6 forgot to increment this aggregated stat, it only
incremented its own ipv6.MalformedPacketsReceived). It is now only incremented
the NIC.
* Removed TestStatsString test which was outdated and had no real
utility.
PiperOrigin-RevId: 375057472
|
|
Add missing protocol state to TCPINFO struct and update packetimpact.
This re-arranges the TCP state definitions to align with Linux.
Fixes #478
PiperOrigin-RevId: 374996751
|
|
When recovering from a zero-receive-window situation, and asked to
send out an ACK, ensure that we apply SWS avoidance in our window
updates.
Fixes #5984
PiperOrigin-RevId: 373689578
|
|
This code path is for outgoing packets, and we don't currently do memory
accounting on this path. So it wasn't breaking anything.
This change did not add a test for ref-counting issue fixed, but will switch to
the leak-checking ref-counter later when all ref-counting issues are fixed.
PiperOrigin-RevId: 373447913
|
|
On receiving an ICMP error during handshake, the error is propagated
by reading `endpoint.lastError`. This can race with the socket layer
invoking getsockopt() with SO_ERROR where the same value is read and
cleared, causing the handshake to bail out with a non-error state.
Fix the race by checking for lastError state and failing the
handshake with ErrConnectionAborted if the lastError was read and
cleared by say SO_ERROR.
The race mentioned in the bug, is caught only with the newly added
tcp_test unit test, where we have control over stopping/resuming
protocol loop. Adding a packetimpact test as well for sanity testing
of ICMP error handling during handshake.
Fixes #5922
PiperOrigin-RevId: 372135662
|
|
Listen backlog value is 1 more than what is configured by the socket
layer listen call. TestListenBacklogFull expects this behavior which
is incorrect as it directly invokes endpoint Listen and with
cl/369974744, backlog++ logic is moved to the callers of Listen().
This test passes sometimes, because the handshakes could overlap causing
the last SYN to arrive at the listener before the previous handshake is
enqueued to the accept queue. In such a case the accept queue is still
not full and the SYN is replied to. The final ACK of this last handshake
would get dropped eventually.
PiperOrigin-RevId: 372041827
|
|
PiperOrigin-RevId: 372021039
|
|
...as all GSO capable endpoints must implement GSOEndpoint.
PiperOrigin-RevId: 371804175
|
|
PiperOrigin-RevId: 371231148
|
|
In https://github.com/google/gvisor/commit/f075522849fa a check to increase zero
to a minimum backlog length was removed from sys_socket.go to bring it in parity
with linux and then in tcp/endpoint.go we bump backlog by 1. But this broke
calling listen on a AF_UNIX socket w/ a zero backlog as in linux it does allow 1
connection even with a zero backlog.
This was caught by a php runtime test socket_abstract_path.phpt.
PiperOrigin-RevId: 369974744
|
|
With this change, GSO options no longer needs to be passed around as
a function argument in the write path.
This change is done in preparation for a later change that defers
segmentation, and may change GSO options for a packet as it flows
down the stack.
Updates #170.
PiperOrigin-RevId: 369774872
|
|
Fixes #2926, #674
PiperOrigin-RevId: 369457123
|
|
This is done for IPv4, UDP and TCP headers.
This also changes the packet checkers used in tests to error on
zero-checksum, not sure why it was allowed before.
And while I'm here, make comments' case consistent.
RELNOTES: n/a
Fixes #5049
PiperOrigin-RevId: 369383862
|
|
This change replaces individual private members in tcp.endpoint with a single
private TCPEndpointState member.
Some internal substructures within endpoint (receiver, sender) have been broken
into a public substructure (which is then copied into the TCPEndpointState
returned from completeState()) alongside other private fields.
Fixes #4466
PiperOrigin-RevId: 369329514
|
|
- Added delay to increase the RTT: In DSACK tests with RACK enabled and low
RTT, TLP can be detected before sending ACK and the tests flake. Increasing
the RTT will ensure that TLP does not happen before the ACK is sent.
- Fix TestRACKOnePacketTailLoss: The ACK does not contain DSACK, which means
either the original or retransmission (probe) was lost and SACKRecovery count
must be incremented.
Before: http://sponge2/c9bd51de-f72f-481c-a7f3-e782e7524883
After: http://sponge2/1307a796-103a-4a45-b699-e8d239220ed1
PiperOrigin-RevId: 369305720
|
|
Also count failed TCP port allocations
PiperOrigin-RevId: 368939619
|
|
Reduce the ephemeral port range, which decreases the calls to makeEP.
PiperOrigin-RevId: 368748379
|
|
This was semi-automated -- there are many addresses that were not replaced.
Future commits should clean those up.
Parse4 and Parse6 were given their own package because //pkg/test can introduce
dependency cycles, as it depends transitively on //pkg/tcpip and some other
netstack packages.
PiperOrigin-RevId: 368726528
|
|
Fix a race where the ACK completing the handshake can be dropped by
a closing listener without RST to the peer. The listener close would
reset the accepted queue and that causes the connecting endpoint
in SYNRCVD state to drop the ACK thinking the queue if filled up.
PiperOrigin-RevId: 368165509
|
|
Holding this lock can cause the user's callback to deadlock if it
attempts to inspect the accept queue.
PiperOrigin-RevId: 368068334
|
|
Some other cleanup while I'm here:
- Remove unused arguments
- Handle some unhandled errors
- Remove redundant casts
- Remove redundant parens
- Avoid shadowing `hash` package name
PiperOrigin-RevId: 367816161
|
|
Use a linked list with cached length and capacity. The current channel
is already composed with a mutex and condition variable, and is never
used for its channel-like properties. Channels also require eager
allocation equal to their capacity, which a linked list does not.
PiperOrigin-RevId: 367766626
|
|
Move maxListenBacklog check to the caller of endpoint Listen so that it
is applicable to Unix domain sockets as well.
This was changed in cl/366935921.
Reported-by: syzbot+a35ae7cdfdde0c41cf7a@syzkaller.appspotmail.com
PiperOrigin-RevId: 367728052
|
|
Both code paths perform this check; extract it and remove the comment
that suggests it is unique to one of the paths.
PiperOrigin-RevId: 367666160
|
|
Both callers of this function still drop this error on the floor, but
progress is progress.
Updates #4690.
PiperOrigin-RevId: 367604788
|
|
- Change the accept queue full condition for a listening endpoint
to only honor completed (and delivered) connections.
- Use syncookies if the number of incomplete connections is beyond
listen backlog. This also cleans up the SynThreshold option code
as that is no longer used with this change.
- Added a new stack option to unconditionally generate syncookies.
Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux.
- Enable keeping of incomplete connections beyond listen backlog.
- Drop incoming SYNs only if the accept queue is filled up.
- Drop incoming ACKs that complete handshakes when accept queue is full
- Enable the stack to accept one more connection than programmed by
listen backlog.
- Handle backlog argument being zero, negative for listen, as Linux.
- Add syscall and packetimpact tests to reflect the changes above.
- Remove TCPConnectBacklog test which is polling for completed
connections on the client side which is not reflective of whether
the accept queue is filled up by the test. The modified syscall test
in this CL addresses testing of connecting sockets.
Fixes #3153
PiperOrigin-RevId: 366935921
|
|
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather
than hack these on in sys_poll etc it felt cleaner to just cleanup
the call sites to notify for both events. This is what linux does
as well.
Fixes #5544
PiperOrigin-RevId: 364859977
|
|
This change sets the inner `routeInfo` struct to be a named private member
and replaces direct access with access through getters. Note that direct
access to the fields of `routeInfo` is still possible through the `RouteInfo`
struct.
Fixes #4902
PiperOrigin-RevId: 364822872
|
|
PiperOrigin-RevId: 364596526
|
|
Transport demuxer and UDP tests should not use a loopback address as the
source address for packets injected into the stack as martian loopback
packets will be dropped in a later change.
PiperOrigin-RevId: 363479681
|
|
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT
unconditionally. Fixing the state machine will give us back the retransmission
of FIN.
PiperOrigin-RevId: 363301883
|
|
There is a race in handling new incoming connections on a listening
endpoint that causes the endpoint to reply to more incoming SYNs than
what is permitted by the listen backlog.
The race occurs when there is a successful passive connection handshake
and the synRcvdCount counter is decremented, followed by the endpoint
delivered to the accept queue. In the window of time between
synRcvdCount decrementing and the endpoint being enqueued for accept,
new incoming SYNs can be handled without honoring the listen backlog
value, as the backlog could be perceived not full.
Fixes #5637
PiperOrigin-RevId: 363279372
|
|
Doing so involved breaking dependencies between //pkg/tcpip and the rest
of gVisor, which are discouraged anyways.
Tested on the Go branch via:
gvisor.dev/gvisor/pkg/tcpip/...
Addresses #1446.
PiperOrigin-RevId: 363081778
|
|
Lots of small changes:
- simplify package API via Reservation type
- rename some single-letter variable names that were hard to follow
- rename some types
PiperOrigin-RevId: 362442366
|
|
- Implement Stringer for it so that we can improve error messages.
- Use TCPFlags through the code base. There used to be a mixed usage of byte,
uint8 and int as TCP flags.
PiperOrigin-RevId: 361940150
|
|
Speeds up the socket stress tests by a couple orders of magnitude.
PiperOrigin-RevId: 361721050
|
|
Updates #5597
PiperOrigin-RevId: 361252003
|
|
The only user is in (*handshake).complete and it specifies MaxRTO, so there is
no behavior changes.
PiperOrigin-RevId: 360954447
|
|
One of the preparation to decouple underlying buffer implementation.
There are still some methods that tie to VectorisedView, and they will be
changed gradually in later CLs.
This CL also introduce a new ICMPv6ChecksumParams to replace long list of
parameters when calling ICMPv6Checksum, aiming to be more descriptive.
PiperOrigin-RevId: 360778149
|
|
This validates that struct fields if annotated with "// checklocks:mu" where
"mu" is a mutex field in the same struct then access to the field is only
done with "mu" locked.
All types that are guarded by a mutex must be annotated with
// +checklocks:<mutex field name>
For more details please refer to README.md.
PiperOrigin-RevId: 360729328
|