Age | Commit message (Collapse) | Author |
|
Every call to sender.NextSeg does not need to iterate from the
front of the writeList as in a given recovery episode we can cache
the last nextSeg returned. There cannot be a lower sequenced segment
that matches the next call to NextSeg as otherwise we would have
returned that instead in the previous call.
This fixes the issue of excessive CPU usage w/ large send buffers
where we spend a lot of time iterating from the front of the list on
every NextSeg invocation.
Further the following other bugs were also fixed:
* Iteration of segments never sent in NextSeg() when looking for segments for
retransmission that match step1/3/4 of the NextSeg algorithm
* Correctly setting rescueRxt only if the rescue segment was actually sent.
* Correctly initializing rescueRxt/highRxt when entering SACK recovery.
* Correctly re-arming the timer only on retransmissions when SACK is in use
and not for every segment being sent as it was being done before.
* Copy over xmitTime and xmitCount on segment clone.
* Move writeNext along when skipping over SACKED segments. This is required
to prevent spurious retransmissions where we end up retransmitting data
that was never lost.
PiperOrigin-RevId: 310387671
|
|
As per RFC 1122 4.2.2.17, when the remote advertizes zero receive window,
the sender needs to probe for the window-size to become non-zero starting
from the next retransmission interval. The TCP connection needs to be kept
open as long as the remote is acknowledging the zero window probes.
We reuse the retransmission timers to support this.
Fixes #1644
PiperOrigin-RevId: 310021575
|
|
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
|
|
PiperOrigin-RevId: 309491861
|
|
Updates #231
PiperOrigin-RevId: 309323808
|
|
Poll for metric updates as immediately trying to read them can sometimes be
flaky if due to goroutine scheduling the check happens before the sender has got
a chance to update the corresponding sent metric.
PiperOrigin-RevId: 308712817
|
|
PiperOrigin-RevId: 308674219
|
|
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
PiperOrigin-RevId: 308163542
|
|
This change adds a layer of abstraction around the internal Docker APIs,
and eliminates all direct dependencies on Dockerfiles in the infrastructure.
A subsequent change will automated the generation of local images (with
efficient caching). Note that this change drops the use of bazel container
rules, as that experiment does not seem to be viable.
PiperOrigin-RevId: 308095430
|
|
Right now, sentry panics in this case:
panic: close of nil channel
goroutine 67 [running]:
pkg/tcpip/transport/tcp/tcp.(*endpoint).listen(0xc0000ce000, 0x9, 0x0)
pkg/tcpip/transport/tcp/endpoint.go:2208 +0x170
pkg/tcpip/transport/tcp/tcp.(*endpoint).Listen(0xc0000ce000, 0x9, 0xc0003a1ad0)
pkg/tcpip/transport/tcp/endpoint.go:2179 +0x50
Fixes #2468
PiperOrigin-RevId: 307896725
|
|
PiperOrigin-RevId: 307598974
|
|
PiperOrigin-RevId: 307477185
|
|
Fixed to match RFC 793 page 69.
Fixes #1607
PiperOrigin-RevId: 307334892
|
|
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
|
|
This previously changed in 305699233, but this behaviour turned out to
be load bearing.
PiperOrigin-RevId: 307033802
|
|
When the listening socket is read shutdown, we need to reset all pending
and incoming connections. Ensure that the endpoint is not cleaned up
from the demuxer and subsequent bind to same port does not go through.
PiperOrigin-RevId: 306958038
|
|
This change makes SynRcvdCountThreshold and the global synRcvdCount into a stack
configurable value. This is required because in cases like mod_proxy which
create multiple Stack instances the count will be a global value that impacts
all Stack instances.
Further the tests relied on modifying the global threshold to simulate tests
where we want to verify SYN cookie based behaviour. This lead to data races due
to the global being modified/read without locks or atomics.
PiperOrigin-RevId: 306947723
|
|
Remove useless casts and duplicate return statements.
PiperOrigin-RevId: 306627916
|
|
Attempt to redeliver TCP segments that are enqueued into a closing
TCP endpoint. This was being done for Established endpoints but not
for those that are listening or performing connection handshake.
Fixes #2417
PiperOrigin-RevId: 306598155
|
|
Tests now use a MinRTO of 3s instead of default 200ms. This reduced flakiness in
a lot of the congestion control/recovery tests which were flaky due to
retransmit timer firing too early in case the test executors were overloaded.
This change also bumps some of the timeouts in tests which were too sensitive to
timer variations and reduces the number of slow start iterations which can
make the tests run for too long and also trigger retansmit timeouts etc if
the executor is overloaded.
PiperOrigin-RevId: 306562645
|
|
PiperOrigin-RevId: 305807868
|
|
PiperOrigin-RevId: 305699233
|
|
Updates #2243
|
|
Updates #2243
|
|
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.
This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.
While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.
Updates #231
PiperOrigin-RevId: 304731742
|
|
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
|
|
PiperOrigin-RevId: 302924789
|
|
This allows the link layer endpoints to consistenly hash a TCP
segment to a single underlying queue in case a link layer endpoint
does support multiple underlying queues.
Updates #231
PiperOrigin-RevId: 302760664
|
|
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.
Updates #2214
PiperOrigin-RevId: 302677662
|
|
PiperOrigin-RevId: 302110328
|
|
PiperOrigin-RevId: 301859066
|
|
Updates #231, #357
PiperOrigin-RevId: 301833669
|
|
workMu is removed and e.mu is now a mutex that supports TryLock. The packet
processing path tries to lock the mutex and if its locked it will just queue the
packet and move on. The endpoint.UnlockUser() will process any backlog of
packets before unlocking the socket.
This simplifies the locking inside tcp endpoints a lot. Further the
endpoint.LockUser() implements spinning as long as the lock is not held by
another syscall goroutine. This ensures low latency as not spinning leads to the
task thread being put to sleep if the lock is held by the packet dispatch
path. This is suboptimal as the lower layer rarely holds the lock for long so
implementing spinning here helps.
If the lock is held by another task goroutine then we just proceed to call
LockUser() and the task could be put to sleep.
The protocol goroutines themselves just call e.mu.Lock() and block if the
lock is currently not available.
Updates #231, #357
PiperOrigin-RevId: 301808349
|
|
This will aid in segment reordering detection.
Updates #691
PiperOrigin-RevId: 301692638
|
|
PiperOrigin-RevId: 300467253
|
|
Atomically close the endpoint. Before this change, it was possible for
multiple callers to perform duplicate work.
PiperOrigin-RevId: 300462110
|
|
Endpoints which were being terminated in an ERROR state or were moved to CLOSED
by the worker goroutine do not run cleanupLocked() as that should already be run
by the worker termination. But when making that change we made the mistake of
not removing the endpoint from the danglingEndpoints which is normally done in
cleanupLocked().
As a result these endpoints are leaked since a reference is held to them in the
danglingEndpoints array forever till Stack is torn down.
PiperOrigin-RevId: 300438426
|
|
From RFC 793 s3.9 p61 Event Processing:
CLOSE Call during TIME-WAIT: return with "error: connection closing"
Fixes #1603
PiperOrigin-RevId: 299401353
|
|
By putting slices into the pool, the slice header escapes. This can be avoided
by not putting the slice header into the pool.
This removes an allocation from the TCP segment send path.
PiperOrigin-RevId: 299215480
|
|
Properly discard segments from the segment heap.
PiperOrigin-RevId: 298704074
|
|
Ensures that all access to TransportEndpointInfo.ID is either:
* In a function ending in a Locked suffix.
* While holding the appropriate mutex.
This primary affects the checkV4Mapped method on affected endpoints, which has
been renamed to checkV4MappedLocked. Also document the method and change its
argument to be a value instead of a pointer which had caused some awkwardness.
This race was possible in the udp and icmp endpoints between Connect and uses
of TransportEndpointInfo.ID including in both itself and Bind.
The tcp endpoint did not suffer from this bug, but benefited from better
documentation.
Updates #357
PiperOrigin-RevId: 298682913
|
|
PiperOrigin-RevId: 298451319
|
|
Call stack.Close on stacks when we are done with them in tcp_test. This avoids
leaking resources and reduces the test's flakiness when race/gotsan is enabled.
It also provides test coverage for the race also fixed in this change, which
can be reliably triggered with the stack.Close change (and without the other
changes) when race/gotsan is enabled.
The race was possible when calling Abort (via stack.Close) on an endpoint
processing a SYN segment as part of a passive connect.
Updates #1564
PiperOrigin-RevId: 297685432
|
|
PiperOrigin-RevId: 297638665
|
|
TestCurrentConnectedIncrement fails consistently under gotsan due to the sleep
to check metrics is exactly the same as the TIME-WAIT duration. Under gotsan
things can be slow enough that the increment test is done before the protocol
goroutine is run after the TIME-WAIT timer expires and does its cleanup.
Increasing the sleep from 1s to 1.2s makes the test pass consistently.
PiperOrigin-RevId: 297160181
|
|
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.
Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.
PiperOrigin-RevId: 296922548
|
|
Added the ability to get/set the IP_RECVTCLASS socket option on UDP endpoints.
If enabled, traffic class from the incoming Network Header passed as ancillary
data in the ControlMessages.
Adding Get/SetSockOptBool to decrease the overhead of getting/setting simple
options. (This was absorbed in a CL that will be landing before this one).
Test:
* Added unit test to udp_test.go that tests getting/setting as well as
verifying that we receive expected TOS from incoming packet.
* Added a syscall test for verifying getting/setting
* Removed test skip for existing syscall test to enable end to end test.
PiperOrigin-RevId: 295840218
|
|
PiperOrigin-RevId: 294952610
|
|
These were out-of-band notes that can help provide additional context
and simplify automated imports.
PiperOrigin-RevId: 293525915
|
|
From RFC 793 s3.9 p58 Event Processing:
If RECEIVE Call arrives in CLOSED state and the user has access to such a
connection, the return should be "error: connection does not exist"
Fixes #1598
PiperOrigin-RevId: 293494287
|