Age | Commit message (Collapse) | Author |
|
Setting the ToS for IPv4 packets (SOL_IP, IP_TOS) should not affect the
Traffic Class of IPv6 packets (SOL_IPV6, IPV6_TCLASS).
Also only return the ToS value XOR Traffic Class as a packet cannot be
both an IPv4 and an IPv6 packet; It is invalid to return both the IPv4
ToS and IPv6 Traffic Class control messages when reading packets.
Updates #6389.
PiperOrigin-RevId: 396399096
|
|
Previously, gVisor did not represent loopback devices as an ethernet
device as Linux does. To maintain Linux API compatibility for packet
sockets, a workaround was used to add an ethernet header if a link
header was not already present in the packet buffer delivered to a
packet endpoint.
However, this workaround is a bug for non-ethernet based interfaces; not
all links use an ethernet header (e.g. pure L3/TUN interfaces).
As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents
loopback devices as an ethernet-based device so this workaround can
now be removed.
BUG: https://fxbug.dev/81592
Updates #6530, #6531.
PiperOrigin-RevId: 395819151
|
|
PiperOrigin-RevId: 395325998
|
|
...through the loopback interface, only.
This change only supports sending on packet sockets through the loopback
interface as the loopback interface is the only interface used in packet
socket syscall tests - the other link endpoints are not excercised with
the existing test infrastructure.
Support for sending on packet sockets through the other interfaces will
be added as needed.
BUG: https://fxbug.dev/81592
PiperOrigin-RevId: 394368899
|
|
For a small receive buffer the first out-of-order segment will get accepted and
fill up the receive buffer today. This change now includes the size of the
out-of-order segment when checking whether to queue the out of order segment or
not.
PiperOrigin-RevId: 394351309
|
|
...from the UDP endpoint.
Datagram-based transport endpoints (e.g. UDP, RAW IP) can share a lot
of their write path due to the datagram-based nature of these endpoints.
Extract the common facilities from UDP so they can be shared with other
transport endpoints (in a later change).
Test: UDP syscall tests.
PiperOrigin-RevId: 394347774
|
|
PiperOrigin-RevId: 393808461
|
|
Remove freestanding functions that convert time values to raw integers;
centralize time->uint32 logic in methods on tcp.endpoint. Importantly,
the knowledge that TSVal is in milliseconds now lives in adjacent
functions rather than being spread around various files.
Incidental cleanup:
- Remove unused constant
- Remove redundant conversion
- Remove redundant parentheses
- Add missing error check
PiperOrigin-RevId: 393184768
|
|
PiperOrigin-RevId: 393104589
|
|
PiperOrigin-RevId: 393100095
|
|
PiperOrigin-RevId: 393095246
|
|
.. by advancing the clock so that NowMonotonic does not return 0.
PiperOrigin-RevId: 393005373
|
|
PiperOrigin-RevId: 393004533
|
|
Some tcp unit tests are affected by this change:
- Some retransmission tests assumed RTO=1s when connection is established. This
is no longer true because minRTO was set to 3s in tests so now RTO becomes 3s
after the first updateRTO call. Set minRTO=1s for these tests.
- Some RACK enabled tests are affected because now that RTT is initialized, and
the estimated RTT is quite small, spurious TLP might be sent out and causing
flakes, introduce an artificial delay for these tests so that the estimated
RTT is larger.
PiperOrigin-RevId: 392768725
|
|
...to match Linux behaviour.
We can see evidence of Linux representing loopback as an ethernet-based
device below:
```
# EUI-48 based MAC addresses.
$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# tcpdump showing ethernet frames when sniffing loopback and logging the
# link-type as EN10MB (Ethernet).
$ sudo tcpdump -i lo -e -c 2 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0
03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0
2 packets captured
116 packets received by filter
0 packets dropped by kernel
```
Wireshark shows a similar result as the tcpdump example above.
Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162
PiperOrigin-RevId: 391836719
|
|
Also fix an option parsing error in checker.TCPTimestampChecker while I am here.
PiperOrigin-RevId: 391828329
|
|
Use different secrets for different purposes (port picking,
ISN generation, tsOffset generation) and moved the secrets
from stack.Stack to tcp.protocol.
PiperOrigin-RevId: 391641238
|
|
tcpip.Endpoint.Close is documented to free all resources associated
with an endpoint so we don't need to create an empty map to clear
the multicast memberships.
PiperOrigin-RevId: 390609826
|
|
Send buffer size in TCP indicates the amount of bytes available for the sender
to transmit. This change will allow TCP to update the send buffer size when
- TCP enters established state.
- ACK is received.
The auto tuning is disabled when the send buffer size is set with the
SO_SNDBUF option.
PiperOrigin-RevId: 390312274
|
|
- Creates new metric "/tcp/segments_acked_with_dsack" to count the number of
segments acked with DSACK.
- Added check to verify the metric is getting incremented when a DSACK is sent
in the unit tests.
PiperOrigin-RevId: 386135949
|
|
PiperOrigin-RevId: 385944428
|
|
PiperOrigin-RevId: 385940836
|
|
PiperOrigin-RevId: 385894869
|
|
PiperOrigin-RevId: 384776517
|
|
- Keeps Linux-specific behavior out of //pkg/tcpip
- Makes it clearer that clamping is done only for setsockopt calls from users
- Removes code duplication
PiperOrigin-RevId: 384389809
|
|
Commit 16b751b6c610ec2c5a913cb8a818e9239ee7da71 introduced a bug where writes of
zero size would end up queueing a zero sized segment which will cause the
sandbox to panic when trying to send a zero sized segment(e.g. after an RTO) as
netstack asserts that the all non FIN segments have size > 0.
This change adds the check for a zero sized payload back to avoid queueing
such segments. The associated test panics without the fix and passes with it.
PiperOrigin-RevId: 383677884
|
|
This change makes the checklocks analyzer considerable more powerful, adding:
* The ability to traverse complex structures, e.g. to have multiple nested
fields as part of the annotation.
* The ability to resolve simple anonymous functions and closures, and perform
lock analysis across these invocations. This does not apply to closures that
are passed elsewhere, since it is not possible to know the context in which
they might be invoked.
* The ability to annotate return values in addition to receivers and other
parameters, with the same complex structures noted above.
* Ignoring locking semantics for "fresh" objects, i.e. objects that are
allocated in the local frame (typically a new-style function).
* Sanity checking of locking state across block transitions and returns, to
ensure that no unexpected locks are held.
Note that initially, most of these findings are excluded by a comprehensive
nogo.yaml. The findings that are included are fundamental lock violations.
The changes here should be relatively low risk, minor refactorings to either
include necessary annotations to simplify the code structure (in general
removing closures in favor of methods) so that the analyzer can be easily
track the lock state.
This change additional includes two changes to nogo itself:
* Sanity checking of all types to ensure that the binary and ast-derived
types have a consistent objectpath, to prevent the bug above from occurring
silently (and causing much confusion). This also requires a trick in
order to ensure that serialized facts are consumable downstream. This can
be removed with https://go-review.googlesource.com/c/tools/+/331789 merged.
* A minor refactoring to isolation the objdump settings in its own package.
This was originally used to implement the sanity check above, but this
information is now being passed another way. The minor refactor is preserved
however, since it cleans up the code slightly and is minimal risk.
PiperOrigin-RevId: 382613300
|
|
There was a race wherein Accept() could fail, then the handshake would complete,
and then a waiter would be created to listen for the handshake. In such cases,
no notification was ever sent and the test timed out.
PiperOrigin-RevId: 381913041
|
|
sndQueue made sense when the worker goroutine and the syscall context held
different locks. Now both lock the endpoint lock before doing anything which
means adding to sndQueue is pointless as we move it to writeList immediately
after that in endpoint.Write() by calling e.drainSendQueue.
PiperOrigin-RevId: 381523177
|
|
PiperOrigin-RevId: 380967023
|
|
There are unnecessarily short timeouts in several places.
Note: a later change will switch tcp_test to fake clocks intead of the built-in
`time` package.
PiperOrigin-RevId: 380935400
|
|
tcpdump is largely supported. We've also chose not to implement writeable
AF_PACKET sockets, and there's a bug specifically for promiscuous mode (#3333).
Fixes #173.
PiperOrigin-RevId: 380733686
|
|
It was possible for a SYN to arrive after the endpoint sent an ACK as part of
the transition to TIME-WAIT, but before returning from handleSegmentsLocked().
This caused the SYN to be dequeued and ACK'd despite the change in
EndpointState.
Deflakes TestTCPTimeWaitNewSyn.
Tested with:
blaze test --config=gotsan --runs_per_test 10000 \
//third_party/gvisor/pkg/tcpip/transport/tcp:tcp_x_test -j 2000 \
// --test_filter TestTCPTimeWaitNewSyn
PiperOrigin-RevId: 380639808
|
|
Also makes the behavior of raw sockets WRT fragmentation clearer, and makes the
ICMPv4 header-length check explicit.
Fixes #3160.
PiperOrigin-RevId: 380033450
|
|
Fixes #3159.
PiperOrigin-RevId: 379814096
|
|
There are many references to unimplemented iptables features that link to #170,
but that bug is about Istio support specifically. Istio is supported, so the
references should change.
Some TODOs are addressed, some removed because they are not features requested
by users, and some are left as implementation notes.
Fixes #170.
PiperOrigin-RevId: 379328488
|
|
If the ACK completing the handshake has FIN or data, requeue the segment
for further processing by the newly established endpoint. Otherwise,
the segments would have to be retransmitted by the peer to be processed
by the established endpoint. Doing this, keeps the behavior in parity
with Linux.
This also addresses a test flake with TCPNonBlockingConnectClose where
the ACK (completing the handshake) and multiple retransmitted FINACKs
from the peer could be dropped by the listener, when using syncookies
and the accept queue is full. The handshake could eventually get
completed with a retransmitted FINACK, without actual processing of
FIN. This can cause the poll with POLLRDHUP on the accepted socket to
sometimes time out before the next FINACK retransmission.
PiperOrigin-RevId: 377651695
|
|
Address a race with non-blocking connect and socket close, causing the
FIN (because of socket close) to not be sent out, even after completing
the handshake.
The race occurs with this sequence:
(1) endpoint Connect starts handshake, sending out SYN
(2) handshake complete() releases endpoint lock, waiting on sleeper.Fetch()
(3) endpoint Close acquires endpoint lock, does not enqueue FIN (as the
endpoint is not yet connected) and asserts notifyClose
(4) SYNACK from peer gets enqueued asserting newSegmentWaker
(5) handshake complete() re-aqcuires lock, first processes newSegmentWaker
event, transitions to ESTABLISHED and proceeds to protocolMainLoop()
(6) protocolMainLoop() exits while processing notifyClose
When the execution follows the above sequence, no FIN is sent to the peer.
This causes the listener side to have a half-open connection sitting in
the accept queue.
Fix this by ensuring that the protocolMainLoop() performs clean shutdown
when the endpoint state is still ESTABLISHED.
This would not be a bug, if during handshake complete(), sleeper.Fetch()
prioritized notificationWaker over newSegmentWaker. In that case, the
handshake would not have completed in (5) above.
Fixes #6067
PiperOrigin-RevId: 376994395
|
|
The current implementation has a bug where TCP listener does not ignore
RSTs from the peer. While handling RST+ACK from the peer, this bug can
complete handshakes that use syncookies. This results in half-open
connection delivered to the accept queue.
Fixes #6076
PiperOrigin-RevId: 376868749
|
|
Adds support for the SO_BINDTODEVICE socket option in ICMP sockets with an
accompanying packetimpact test to exercise use of this socket option.
Adds a unit test to exercise the NIC selection logic introduced by this change.
The remaining unit tests for ICMP sockets need to be added in a subsequent CL.
See https://gvisor.dev/issues/5623 for the list of remaining unit tests.
Adds a "timeout" field to PacketimpactTestInfo, necessary due to the long
runtime of the newly added packetimpact test.
Fixes #5678
Fixes #4896
Updates #5623
Updates #5681
Updates #5763
Updates #5956
Updates #5966
Updates #5967
PiperOrigin-RevId: 376271581
|
|
...except TCP tests and NDP tests that mutate globals. These will be
undertaken later.
Updates #5940.
PiperOrigin-RevId: 376145608
|
|
...except in tests.
Note this replaces some uses of a cryptographic RNG with a plain RNG.
PiperOrigin-RevId: 376070666
|
|
PiperOrigin-RevId: 376001032
|
|
Updates #5939.
Updates #6012.
RELNOTES: n/a
PiperOrigin-RevId: 375931554
|
|
Introduce tcpip.MonotonicTime; replace int64 in tcpip.Clock method
returns with time.Time and MonotonicTime to improve type safety and
ensure that monotonic clock readings are never compared to wall clock
readings.
PiperOrigin-RevId: 375775907
|
|
- Unused constants
- Unused functions
- Unused arguments
- Unkeyed literals
- Unnecessary conversions
PiperOrigin-RevId: 375253464
|
|
This change also includes miscellaneous improvements:
* UnknownProtocolRcvdPackets has been separated into two stats, to
specify at which layer the unknown protocol was found (L3 or L4)
* MalformedRcvdPacket is not aggregated across every endpoint anymore.
Doing it this way did not add useful information, and it was also error-prone
(example: ipv6 forgot to increment this aggregated stat, it only
incremented its own ipv6.MalformedPacketsReceived). It is now only incremented
the NIC.
* Removed TestStatsString test which was outdated and had no real
utility.
PiperOrigin-RevId: 375057472
|
|
Add missing protocol state to TCPINFO struct and update packetimpact.
This re-arranges the TCP state definitions to align with Linux.
Fixes #478
PiperOrigin-RevId: 374996751
|
|
When recovering from a zero-receive-window situation, and asked to
send out an ACK, ensure that we apply SWS avoidance in our window
updates.
Fixes #5984
PiperOrigin-RevId: 373689578
|
|
This code path is for outgoing packets, and we don't currently do memory
accounting on this path. So it wasn't breaking anything.
This change did not add a test for ref-counting issue fixed, but will switch to
the leak-checking ref-counter later when all ref-counting issues are fixed.
PiperOrigin-RevId: 373447913
|