Age | Commit message (Collapse) | Author |
|
Added headers, stats, checksum parsing capabilities from RFC 2236 describing
IGMPv2.
IGMPv2 state machine is implemented for each condition, sending and receiving
IGMP Membership Reports and Leave Group messages with backwards compatibility
with IGMPv1 routers.
Test:
* Implemented igmp header parser and checksum calculator in header/igmp_test.go
* ipv4/igmp_test.go tests incoming and outgoing IGMP messages and pathways.
* Added unit test coverage for IGMPv2 RFC behavior + IGMPv1 backwards
compatibility in ipv4/igmp_test.go.
Fixes #4682
PiperOrigin-RevId: 343408809
|
|
We would like to track locks ordering to detect ordering violations. Detecting
violations is much simpler if mutexes must be unlocked by the same goroutine
that locked them.
Thus, as a first step to tracking lock ordering, add this lock/unlock
requirement to gVisor's sync.Mutex. This is more strict than the Go standard
library's sync.Mutex, but initial testing indicates only a single lock that is
used across goroutines. The new sync.CrossGoroutineMutex relaxes the
requirement (but will not provide lock order checking).
Due to the additional overhead, enforcement is only enabled with the
"checklocks" build tag. Build with this tag using:
bazel build --define=gotags=checklocks ...
From my spot-checking, this has no changed inlining properties when disabled.
Updates #4804
PiperOrigin-RevId: 343370200
|
|
PiperOrigin-RevId: 343217712
|
|
This changes also introduces:
- `SocketOptionsHandler` interface which can be implemented by endpoints to
handle endpoint specific behavior on SetSockOpt. This is analogous to what
Linux does.
- `DefaultSocketOptionsHandler` which is a default implementation of the above.
This is embedded in all endpoints so that we don't have to uselessly
implement empty functions. Endpoints with specific behavior can override the
embedded method by manually defining its own implementation.
PiperOrigin-RevId: 343158301
|
|
PiperOrigin-RevId: 343152780
|
|
PiperOrigin-RevId: 343146856
|
|
Packets should be properly routed when sending packets to addresses
in the loopback subnet which are not explicitly assigned to the loopback
interface.
Tests:
- integration_test.TestLoopbackAcceptAllInSubnetUDP
- integration_test.TestLoopbackAcceptAllInSubnetTCP
PiperOrigin-RevId: 343135643
|
|
This change also makes the following fixes:
- Make SocketOptions use atomic operations instead of having to acquire/drop
locks upon each get/set option.
- Make documentation more consistent.
- Remove tcpip.SocketOptions from socketOpsCommon because it already exists
in transport.Endpoint.
- Refactors get/set socket options tests to be easily extendable.
PiperOrigin-RevId: 343103780
|
|
If the endpoint is in StateError but e.hardErrorLocked() returns
nil then return ErrClosedForRecieve.
This can happen if a concurrent write on the same endpoint was in progress
when the endpoint transitioned to an error state.
PiperOrigin-RevId: 343018257
|
|
In UDP endpoint.Write() sendUDP is called with e.mu Rlocked. But if this happens
to send a datagram over loopback which ends up generating an ICMP response of
say ErrNoPortReachable, the handling of the response in HandleControlPacket also
acquires e.mu using RLock. This is mostly fine unless there is a competing
caller trying to acquire e.mu in exclusive mode using Lock(). This will deadlock
as a caller waiting in Lock() disallows an new RLocks() to ensure it can
actually acquire the Lock.
This is documented here https://golang.org/pkg/sync/#RWMutex.
This change releases the endpoint mutex before calling sendUDP to resolve the
possibility of the deadlock.
Reported-by: syzbot+537989797548c66e8ee3@syzkaller.appspotmail.com
Reported-by: syzbot+eb0b73b4ab486f7673ba@syzkaller.appspotmail.com
PiperOrigin-RevId: 342894148
|
|
Fixes the behaviour of SO_ERROR for tcp sockets where in linux it returns
sk->sk_err and if sk->sk_err is 0 then it returns sk->sk_soft_err. In gVisor TCP
we endpoint.HardError is the equivalent of sk->sk_err and endpoint.LastError
holds soft errors. This change brings this into alignment with Linux such that
both hard/soft errors are cleared when retrieved using getsockopt(.. SO_ERROR)
is called on a socket.
Fixes #3812
PiperOrigin-RevId: 342868552
|
|
PiperOrigin-RevId: 342700744
|
|
PiperOrigin-RevId: 342669574
|
|
Detect if the ACK is a duplicate and update in RACK.
PiperOrigin-RevId: 342332569
|
|
The current implementation of loss recovery algorithms SACK and NewReno are in
the same file(snd.go). The functions have multiple checks to see which one is
currently used by the endpoint. This CL will refactor and separate the
implementation of existing recovery algorithms which will help us to implement
new recovery algorithms(such as RACK) with less changes to the existing code.
There is no change in the behavior.
PiperOrigin-RevId: 342312166
|
|
Store all the socket level options in a struct and call {Get/Set}SockOpt on
this struct. This will avoid implementing socket level options on all
endpoints. This CL contains implementing one socket level option for tcp and
udp endpoints.
PiperOrigin-RevId: 342203981
|
|
The NIC should not hold network-layer state or logic - network packet
handling/forwarding should be performed at the network layer instead
of the NIC.
Fixes #4688
PiperOrigin-RevId: 342166985
|
|
Most packets don't have options but they are an integral part of the
standard. Teaching the ipv4 code how to handle them will simplify future
testing and use. Because Options are so rare it is worth making sure
that the extra work is kept out of the fast path as much as possible.
Prior to this change, all usages of the IHL field of the IPv4Fields/Encode
system set it to the same constant value except in a couple of tests
for bad values. From this change IHL will not be a constant as it will
depend on the size of any Options. Since ipv4.Encode() now handles the
options it becomes a possible source of errors to let the callers set
this value, so remove it entirely and calculate the value from the size
of the Options if present (or not) therefore guaranteeing a correct value.
Fixes #4709
RELNOTES: n/a
PiperOrigin-RevId: 341864765
|
|
This Notify was added as part of cl/279106406; but notifying `EventHUp`
in `FIN_WAIT2` is incorrect, as we want to only notify later on
`TIME_WAIT` or a reset. However, we do need to notify any blocked
waiters of an activity on the endpoint with `EventIn`|`EventOut`.
PiperOrigin-RevId: 341490913
|
|
This avoids a needless allocation.
Updates #231
PiperOrigin-RevId: 341113160
|
|
Port 0 is not meant to identify any remote port so attempting to send
a packet to it should return an error.
PiperOrigin-RevId: 341009528
|
|
* Remove stack.Route from incoming packet path.
There is no need to pass around a stack.Route during the incoming path
of a packet. Instead, pass around the packet's link/network layer
information in the packet buffer since all layers may need this
information.
* Support address bound and outgoing packet NIC in routes.
When forwarding is enabled, the source address of a packet may be bound
to a different interface than the outgoing interface. This change
updates stack.Route to hold both NICs so that one can be used to write
packets while the other is used to check if the route's bound address
is valid. Note, we need to hold the address's interface so we can check
if the address is a spoofed address.
* Introduce the concept of a local route.
Local routes are routes where the packet never needs to leave the stack;
the destination is stack-local. We can now route between interfaces
within a stack if the packet never needs to leave the stack, even when
forwarding is disabled.
* Always obtain a route from the stack before sending a packet.
If a packet needs to be sent in response to an incoming packet, a route
must be obtained from the stack to ensure the stack is configured to
send packets to the packet's source from the packet's destination.
* Enable spoofing if a stack may send packets from unowned addresses.
This change required changes to some netgophers since previously,
promiscuous mode was enough to let the netstack respond to all
incoming packets regardless of the packet's destination address. Now
that a stack.Route is not held for each incoming packet, finding a route
may fail with local addresses we don't own but accepted packets for
while in promiscuous mode. Since we also want to be able to send from
any address (in response the received promiscuous mode packets), we need
to enable spoofing.
* Skip transport layer checksum checks for locally generated packets.
If a packet is locally generated, the stack can safely assume that no
errors were introduced while being locally routed since the packet is
never sent out the wire.
Some bugs fixed:
- transport layer checksum was never calculated after NAT.
- handleLocal didn't handle routing across interfaces.
- stack didn't support forwarding across interfaces.
- always consult the routing table before creating an endpoint.
Updates #4688
Fixes #3906
PiperOrigin-RevId: 340943442
|
|
This was occasionally causing tests to get stuck due to races with the save
process, during which the same mutex is acquired.
PiperOrigin-RevId: 340789616
|
|
Without releasing the mutex, operations on the endpoint following a
nonblocking connect will not make progress until connect is complete.
PiperOrigin-RevId: 340467654
|
|
PiperOrigin-RevId: 340274194
|
|
PiperOrigin-RevId: 339945377
|
|
PiperOrigin-RevId: 339750876
|
|
TCP endpoint unconditionly binds to v4 even when the stack only supports v6.
PiperOrigin-RevId: 339739392
|
|
Refactor TCP handshake code so that when connect is initiated, the initial SYN
is sent before creating a goroutine to handle the rest of the handshake (which
blocks). Similarly, the initial SYN-ACK is sent inline when SYN is received
during accept.
Some additional cleanup is done as well.
Eventually we would like to complete connections in the dispatcher without
requiring a wakeup to complete the handshake. This refactor makes that easier.
Updates #231
PiperOrigin-RevId: 339675182
|
|
IPv4 options extend the size of the IP header and have a basic known
format. The framework can process that format without needing to know
about every possible option. We can add more code to handle additional
option types as we need them. Bad options or mangled option entries
can result in ICMP Parameter Problem packets. The first types we
support are the Timestamp option and the Record Route option, included
in this change.
The options are processed at several points in the packet flow within
the Network stack, with slightly different requirements. The framework
includes a mechanism to control this at each point. Support has been
added for such points which are only present in upcoming CLs such as
during packet forwarding and fragmentation.
With this change, 'ping -R' and 'ping -T' work against gVisor and Fuchsia.
$ ping -R 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.990 ms
NOP
RR: 192.168.1.1
192.168.1.2
192.168.1.1
$ ping -T tsprespec 192.168.1.2 192.168.1.1 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.20 ms
TS: 192.168.1.2 71486821 absolute
192.168.1.1 746
Unit tests included for generic options, Timestamp options
and Record Route options.
PiperOrigin-RevId: 339379076
|
|
This change wakes up any waiters when we receive an ICMP port unreachable
control packet on an UDP socket as well as sets waiter.EventErr in
the result returned by Readiness() when e.lastError is not nil.
The latter is required where an epoll()/poll() is done after the error
is already handled since we will never notify again in such cases.
PiperOrigin-RevId: 339370469
|
|
Drain the notification channel after first accept as in case the first accept
never blocked then the notification for the first accept will still be in the
channel causing the second accept to fail as it will try to wait on the channel
and return immediately due to the older notification even though there is no
connection yet in the accept queue.
PiperOrigin-RevId: 338710062
|
|
The SO_ACCEPTCONN option is used only on getsockopt(). When this option is
specified, getsockopt() indicates whether socket listening is enabled for
the socket. A value of zero indicates that socket listening is disabled;
non-zero that it is enabled.
PiperOrigin-RevId: 338703206
|
|
Earlier the count was dropped only after calling e.deliverAccepted. This lead to
an issue where there were no connections in SYN-RCVD state for the listening
endpoint but e.synRcvdCount would not be zero because it was being reduced only
when handleSynSegment returned after deliverAccepted returned.
This issue is seen when the Nth SYN for a listen backlog of size N which would
cause the listen backlog to be full gets dropped occasionally. This happens when
the new SYN comes at when the previous completed endpoint has been delivered to
the accept queue but the synRcvdCount hasn't yet been decremented because the
goroutine running handleSynSegment has not yet completed.
PiperOrigin-RevId: 338690646
|
|
Previously a link endpoint was passed to
stack.LinkAddressResolver.LinkAddressRequest. With this change,
implementations that want a route for the link address request may
find one through the stack. Other implementations that want to send
a packet without a route may continue to do so using the network
interface directly.
Test: - arp_test.TestLinkAddressRequest
- ipv6.TestLinkAddressRequest
PiperOrigin-RevId: 338577474
|
|
PiperOrigin-RevId: 338168977
|
|
//pkg/tcpip/stack:stack_x_test_nogo
//pkg/tcpip/transport/raw:raw_nogo
PiperOrigin-RevId: 338153265
|
|
|
|
The fix in commit 028e045da93b7c1c26417e80e4b4e388b86a713d was incorrect as
it can cause the right edge of the window to shrink when we announce
a zero window due to receive buffer being full as its done before the check
for seeing if the window is being shrunk because of the selected window.
Further the window was calculated purely on available space but in cases where
we are getting full sized segments it makes more sense to use the actual bytes
being held. This CL changes to use the lower of the total available space vs
the available space in the maximal window we could advertise minus the actual
payload bytes being held.
This change also cleans up the code so that the window selection logic is
not duplicated between getSendParams() and windowCrossedACKThresholdLocked.
PiperOrigin-RevId: 336404827
|
|
RACK detects packet reordering by checking if the sender received ACK for
the packet which has the sequence number less than the already acknowledged
packets.
PiperOrigin-RevId: 336397526
|
|
PiperOrigin-RevId: 336339194
|
|
PiperOrigin-RevId: 336304024
|
|
When a response needs to be sent to an incoming packet, the stack should
consult its neighbour table to determine the remote address's link
address.
When an entry does not exist in the stack's neighbor table, the stack
should queue the packet while link resolution completes. See comments.
PiperOrigin-RevId: 336185457
|
|
The IPv4 RFCs are specific (though obtuse) that an echo response
packet needs to contain all the options from the echo request,
much as if it been routed back to the sender, though apparently
with a new TTL. They suggest copying the incoming packet header
to achieve this so that is what this patch does.
PiperOrigin-RevId: 335559176
|
|
We can get the network endpoint directly from the NIC.
This is a preparatory CL for when a Route needs to hold a dedicated NIC
as its output interface. This is because when forwarding is enabled,
packets may be sent from a NIC different from the NIC a route's local
address is associated with.
PiperOrigin-RevId: 335484500
|
|
We are currently tracking the minimum RTT for RACK as smoothed RTT. As per RFC
minimum RTT can be a global minimum of all RTTs or filtered value of recent
RTT measurements. In this cl minimum RTT is updated to global minimum of all
RTTs for the connection.
PiperOrigin-RevId: 335061518
|
|
Adds support for the IPv6-compatible redirect target. Redirection is a limited
form of DNAT, where the destination is always the localhost.
Updates #3549.
PiperOrigin-RevId: 334698344
|
|
As per relevant IP RFCS (see code comments), broadcast (for IPv4) and
multicast addresses are not allowed. Currently checks for these are
done at the transport layer, but since it is explicitly forbidden at
the IP layers, check for them there.
This change also removes the UDP.InvalidSourceAddress stat since there
is no longer a need for it.
Test: ip_test.TestSourceAddressValidation
PiperOrigin-RevId: 334490971
|
|
Currently expired IP fragments are discarded only if another fragment for the
same IP datagram is received after timeout or the total size of the fragment
queue exceeded a predefined value.
Test: fragmentation.TestReassemblingTimeout
Fixes #3960
PiperOrigin-RevId: 334423710
|
|
* Remove Capabilities and NICID methods from NetworkEndpoint.
* Remove linkEP and stack parameters from NetworkProtocol.NewEndpoint.
The LinkEndpoint can be fetched from the NetworkInterface. The stack
is passed to the NetworkProtocol when it is created so the
NetworkEndpoint can get it from its protocol.
* Remove stack parameter from TransportProtocol.NewEndpoint.
Like the NetworkProtocol/Endpoint, the stack is passed to the
TransportProtocol when it is created.
PiperOrigin-RevId: 334332721
|