Age | Commit message (Collapse) | Author |
|
This fixes a race that occurs while the endpoint is being unregistered
and the transport demuxer attempts to match the incoming packet to any
endpoint. The race specifically occurs when the unregistration (and
deletion of the endpoint) occurs, after a successful endpointsByNIC
lookup and before the endpoints map is further looked up with ingress
NICID of the packet.
The fix is to notify the caller of lookup-with-NICID failure, so that
the logic falls through to handling unknown destination packets.
For TCP this can mean replying back with RST.
The syscall test in this CL catches this race as the ACK completing the
handshake could get silently dropped on a listener close, causing no
RST sent to the peer and timing out the poll waiting for POLLHUP.
Fixes #5850
PiperOrigin-RevId: 369023779
|
|
Also count failed TCP port allocations
PiperOrigin-RevId: 368939619
|
|
Fields allow counter metrics to have multiple tabular values.
At most one field is supported at the moment.
PiperOrigin-RevId: 368767040
|
|
Reduce the ephemeral port range, which decreases the calls to makeEP.
PiperOrigin-RevId: 368748379
|
|
This was semi-automated -- there are many addresses that were not replaced.
Future commits should clean those up.
Parse4 and Parse6 were given their own package because //pkg/test can introduce
dependency cycles, as it depends transitively on //pkg/tcpip and some other
netstack packages.
PiperOrigin-RevId: 368726528
|
|
Netstack is supposed to be somewhat independent of the rest of gVisor, and
others should be able to use it without pulling in excessive dependencies.
Currently, there is no way to fight dependency creep besides careful code
review.
This change introduces a test rule `netstack_deps_check` that ensures the target
only relies on gVisor targets and a short allowlist of external dependencies.
Users who add a dependency will see an error and have to manually update the
allowlist.
The set of packages to test comes from //runsc, as it uses packages we would
expect users to commonly rely on. It was generated via:
$ find ./runsc -name BUILD | xargs grep tcpip | awk '{print $2}' | sort | uniq
(Note: We considered giving //pkg/tcpip it's own go.mod, but this breaks go
tooling.)
PiperOrigin-RevId: 368456711
|
|
Fix a race where the ACK completing the handshake can be dropped by
a closing listener without RST to the peer. The listener close would
reset the accepted queue and that causes the connecting endpoint
in SYNRCVD state to drop the ACK thinking the queue if filled up.
PiperOrigin-RevId: 368165509
|
|
Holding this lock can cause the user's callback to deadlock if it
attempts to inspect the accept queue.
PiperOrigin-RevId: 368068334
|
|
Some other cleanup while I'm here:
- Remove unused arguments
- Handle some unhandled errors
- Remove redundant casts
- Remove redundant parens
- Avoid shadowing `hash` package name
PiperOrigin-RevId: 367816161
|
|
Use a linked list with cached length and capacity. The current channel
is already composed with a mutex and condition variable, and is never
used for its channel-like properties. Channels also require eager
allocation equal to their capacity, which a linked list does not.
PiperOrigin-RevId: 367766626
|
|
The current SNAT implementation has several limitations:
- SNAT source port has to be specified. It is not optional.
- SNAT source port range is not supported.
- SNAT for UDP is a one-way translation. No response packets
are handled (because conntrack doesn't support UDP currently).
- SNAT and REDIRECT can't work on the same connection.
Fixes #5489
PiperOrigin-RevId: 367750325
|
|
Move maxListenBacklog check to the caller of endpoint Listen so that it
is applicable to Unix domain sockets as well.
This was changed in cl/366935921.
Reported-by: syzbot+a35ae7cdfdde0c41cf7a@syzkaller.appspotmail.com
PiperOrigin-RevId: 367728052
|
|
To match the V4 variant.
PiperOrigin-RevId: 367691981
|
|
Both code paths perform this check; extract it and remove the comment
that suggests it is unique to one of the paths.
PiperOrigin-RevId: 367666160
|
|
Both callers of this function still drop this error on the floor, but
progress is progress.
Updates #4690.
PiperOrigin-RevId: 367604788
|
|
As per RFC 3927 section 7 and RFC 4291 section 2.5.6.
Test: forward_test.TestMulticastForwarding
PiperOrigin-RevId: 367519336
|
|
See comments inline code for rationale.
Test: ip_test.TestJoinLeaveAllRoutersGroup
PiperOrigin-RevId: 367449434
|
|
...as per RFC 2710 section 5 page 10.
Test: ipv6_test.TestMLDSkipProtocol
PiperOrigin-RevId: 367031126
|
|
- Change the accept queue full condition for a listening endpoint
to only honor completed (and delivered) connections.
- Use syncookies if the number of incomplete connections is beyond
listen backlog. This also cleans up the SynThreshold option code
as that is no longer used with this change.
- Added a new stack option to unconditionally generate syncookies.
Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux.
- Enable keeping of incomplete connections beyond listen backlog.
- Drop incoming SYNs only if the accept queue is filled up.
- Drop incoming ACKs that complete handshakes when accept queue is full
- Enable the stack to accept one more connection than programmed by
listen backlog.
- Handle backlog argument being zero, negative for listen, as Linux.
- Add syscall and packetimpact tests to reflect the changes above.
- Remove TCPConnectBacklog test which is polling for completed
connections on the client side which is not reflective of whether
the accept queue is filled up by the test. The modified syscall test
in this CL addresses testing of connecting sockets.
Fixes #3153
PiperOrigin-RevId: 366935921
|
|
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather
than hack these on in sys_poll etc it felt cleaner to just cleanup
the call sites to notify for both events. This is what linux does
as well.
Fixes #5544
PiperOrigin-RevId: 364859977
|
|
PiperOrigin-RevId: 364859173
|
|
This change sets the inner `routeInfo` struct to be a named private member
and replaces direct access with access through getters. Note that direct
access to the fields of `routeInfo` is still possible through the `RouteInfo`
struct.
Fixes #4902
PiperOrigin-RevId: 364822872
|
|
PiperOrigin-RevId: 364596526
|
|
...instead of opting out of them.
Loopback traffic should be stack-local but gVisor has some clients
that depend on the ability to receive loopback traffic that originated
from outside of the stack. Because of this, we guard this change behind
IP protocol options.
A previous change provided the facility to deny these martian loopback
packets but this change requires client to opt-in to accepting martian
loopback packets as accepting martian loopback packets are not meant
to be accepted, as per RFC 1122 section 3.2.1.3.g:
(g) { 127, <any> }
Internal host loopback address. Addresses of this form
MUST NOT appear outside a host.
PiperOrigin-RevId: 364581174
|
|
PiperOrigin-RevId: 364381970
|
|
Transport demuxer and UDP tests should not use a loopback address as the
source address for packets injected into the stack as martian loopback
packets will be dropped in a later change.
PiperOrigin-RevId: 363479681
|
|
Loopback traffic should be stack-local but gVisor has some clients
that depend on the ability to receive loopback traffic that originated
from outside of the stack. Because of this, we guard this change behind
IP protocol options.
Test: integration_test.TestExternalLoopbackTraffic
PiperOrigin-RevId: 363461242
|
|
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT
unconditionally. Fixing the state machine will give us back the retransmission
of FIN.
PiperOrigin-RevId: 363301883
|
|
There is a race in handling new incoming connections on a listening
endpoint that causes the endpoint to reply to more incoming SYNs than
what is permitted by the listen backlog.
The race occurs when there is a successful passive connection handshake
and the synRcvdCount counter is decremented, followed by the endpoint
delivered to the accept queue. In the window of time between
synRcvdCount decrementing and the endpoint being enqueued for accept,
new incoming SYNs can be handled without honoring the listen backlog
value, as the backlog could be perceived not full.
Fixes #5637
PiperOrigin-RevId: 363279372
|
|
They are not used outside of the header package.
PiperOrigin-RevId: 363237708
|
|
...as per RFC 7527.
If a looped-back DAD message is received, do not fail DAD since our own
DAD message does not indicate that a neighbor has the address assigned.
Test: ndp_test.TestDADResolveLoopback
PiperOrigin-RevId: 363224288
|
|
Calling into the stack from LinkAddressRequest is not needed as we
already have a reference to the network endpoint (IPv6) or network
interface (IPv4/ARP).
PiperOrigin-RevId: 363213973
|
|
Doing so involved breaking dependencies between //pkg/tcpip and the rest
of gVisor, which are discouraged anyways.
Tested on the Go branch via:
gvisor.dev/gvisor/pkg/tcpip/...
Addresses #1446.
PiperOrigin-RevId: 363081778
|
|
Lots of small changes:
- simplify package API via Reservation type
- rename some single-letter variable names that were hard to follow
- rename some types
PiperOrigin-RevId: 362442366
|
|
- Implement Stringer for it so that we can improve error messages.
- Use TCPFlags through the code base. There used to be a mixed usage of byte,
uint8 and int as TCP flags.
PiperOrigin-RevId: 361940150
|
|
Speeds up the socket stress tests by a couple orders of magnitude.
PiperOrigin-RevId: 361721050
|
|
Updates #5597
PiperOrigin-RevId: 361252003
|
|
IPv4 would violate the lock ordering of protocol > endpoint when closing
network endpoints by calling `ipv4.protocol.forgetEndpoint` while
holding the network endpoint lock.
PiperOrigin-RevId: 361232817
|
|
The integrator may be interested in who owns a duplicate address so
pass this information (if available) along.
Fixes #5605.
PiperOrigin-RevId: 361213556
|
|
While I'm here, update NDPDispatcher.OnDuplicateAddressDetectionStatus to
take a DADResult and rename it to OnDuplicateAddressDetectionResult.
Fixes #5606.
PiperOrigin-RevId: 360965416
|
|
The only user is in (*handshake).complete and it specifies MaxRTO, so there is
no behavior changes.
PiperOrigin-RevId: 360954447
|
|
clientEP.Connect may fail because serverEP was not listening.
PiperOrigin-RevId: 360780667
|
|
One of the preparation to decouple underlying buffer implementation.
There are still some methods that tie to VectorisedView, and they will be
changed gradually in later CLs.
This CL also introduce a new ICMPv6ChecksumParams to replace long list of
parameters when calling ICMPv6Checksum, aiming to be more descriptive.
PiperOrigin-RevId: 360778149
|
|
Changes the neighbor_cache_test.go tests to always assert UpdatedAtNanos.
Completes the assertion of UpdatedAtNanos in every NUD test, a field that was
historically not checked due to the lack of a deterministic, controllable
clock. This is no longer true with the tcpip.Clock interface. While the tests
have been adjusted to use Clock, asserting by the UpdatedAtNanos was neglected.
Fixes #4663
PiperOrigin-RevId: 360730077
|
|
This validates that struct fields if annotated with "// checklocks:mu" where
"mu" is a mutex field in the same struct then access to the field is only
done with "mu" locked.
All types that are guarded by a mutex must be annotated with
// +checklocks:<mutex field name>
For more details please refer to README.md.
PiperOrigin-RevId: 360729328
|
|
While I'm here, simplify the comments and unify naming of certain stats
across protocols.
PiperOrigin-RevId: 360728849
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in the following places:
- pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities
are not yet available in golang.org/x/sys.
- syscall.Stat_t is still used in some places because os.FileInfo.Sys() still
returns it and not unix.Stat_t.
Updates #214
PiperOrigin-RevId: 360701387
|
|
Prevent the situation where callers to (*stack).GetLinkAddress provide
incorrect arguments and are unable to observe this condition.
Updates #5583.
PiperOrigin-RevId: 360481557
|
|
io.Reader.ReadFull returns the number of bytes copied and an error if fewer
bytes were read.
PiperOrigin-RevId: 360247614
|
|
There is a short race where in Write an endpoint can transition from writable
to non-writable state due to say an incoming RST during the time we release
the endpoint lock and reacquire after copying the payload. In such a case
if the write happens to be a zero sized write we end up trying to call
sendData() even though nothing was queued.
This can panic when trying to enable/disable TCP timers if the endpoint had
already transitioned to a CLOSED/ERROR state due to the incoming RST as we
cleanup timers when the protocol goroutine terminates.
Sadly the race window is small enough that my attempts at reproducing the panic
in a syscall test has not been successful.
PiperOrigin-RevId: 359887905
|