Age | Commit message (Collapse) | Author |
|
Formerly, when a packet is constructed or parsed, all headers are set by the
client code. This almost always involved prepending to pk.Header buffer or
trimming pk.Data portion. This is known to prone to bugs, due to the complexity
and number of the invariants assumed across netstack to maintain.
In the new PacketHeader API, client will call Push()/Consume() method to
construct/parse an outgoing/incoming packet. All invariants, such as slicing
and trimming, are maintained by the API itself.
NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no
longer valid.
PacketBuffer now assumes the packet is a concatenation of following portions:
* LinkHeader
* NetworkHeader
* TransportHeader
* Data
Any of them could be empty, or zero-length.
PiperOrigin-RevId: 326507688
|
|
NetworkEndpoints set the number on outgoing packets in Write() and
NetworkProtocols set them on incoming packets in Parse().
Needed for #3549.
PiperOrigin-RevId: 325938745
|
|
Packets MUST NOT use a non-unicast source address for ICMP
Echo Replies.
Test: integration_test.TestPingMulticastBroadcast
PiperOrigin-RevId: 325634380
|
|
When a Neighbor Solicitation is received, a neighbor entry is created with the
remote host's link layer address, but without a link layer address resolver. If
the host decides to send a packet addressed to the IP address of that neighbor
entry, Address Resolution starts with a nil pointer to the link layer address
resolver. This causes the netstack to panic and crash.
This change ensures that when a packet is sent in that situation, the link
layer address resolver will be set before Address Resolution begins.
Tests:
pkg/tcpip/stack:stack_test
+ TestEntryUnknownToStaleToProbeToReachable
- TestNeighborCacheEntryNoLinkAddress
Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950
PiperOrigin-RevId: 325516471
|
|
Test:
- stack_test.TestJoinLeaveMulticastOnNICEnableDisable
- integration_test.TestIncomingMulticastAndBroadcast
PiperOrigin-RevId: 325185259
|
|
Test: integration_test.TestIncomingSubnetBroadcast
PiperOrigin-RevId: 325135617
|
|
Updates #231
PiperOrigin-RevId: 325097683
|
|
RACK (Recent Acknowledgement) is a new loss detection
algorithm in TCP. These are the fields which should be
stored on connections to implement RACK algorithm.
PiperOrigin-RevId: 324948703
|
|
Envoy (#170) uses this to get the original destination of redirected
packets.
|
|
This change implements the Neighbor Unreachability Detection (NUD) state
machine, as per RFC 4861 [1]. The state machine operates on a single neighbor
in the local network. This requires the state machine to be implemented on each
entry of the neighbor table.
This change also adds, but does not expose, several APIs. The first API is for
performing basic operations on the neighbor table:
- Create a static entry
- List all entries
- Delete all entries
- Remove an entry by address
The second API is used for changing the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][2] for the list of
constants.
Finally, the last API is for allowing users to subscribe to NUD state changes.
See [RFC 4861 Appendix C][3] for the list of edges.
[1]: https://tools.ietf.org/html/rfc4861
[2]: https://tools.ietf.org/html/rfc4861#section-10
[3]: https://tools.ietf.org/html/rfc4861#appendix-C
Tests:
pkg/tcpip/stack:stack_test
- TestNeighborCacheAddStaticEntryThenOverflow
- TestNeighborCacheClear
- TestNeighborCacheClearThenOverflow
- TestNeighborCacheConcurrent
- TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress
- TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress
- TestNeighborCacheEntry
- TestNeighborCacheEntryNoLinkAddress
- TestNeighborCacheGetConfig
- TestNeighborCacheKeepFrequentlyUsed
- TestNeighborCacheNotifiesWaker
- TestNeighborCacheOverflow
- TestNeighborCacheOverwriteWithStaticEntryThenOverflow
- TestNeighborCacheRemoveEntry
- TestNeighborCacheRemoveEntryThenOverflow
- TestNeighborCacheRemoveStaticEntry
- TestNeighborCacheRemoveStaticEntryThenOverflow
- TestNeighborCacheRemoveWaker
- TestNeighborCacheReplace
- TestNeighborCacheResolutionFailed
- TestNeighborCacheResolutionTimeout
- TestNeighborCacheSetConfig
- TestNeighborCacheStaticResolution
- TestEntryAddsAndClearsWakers
- TestEntryDelayToProbe
- TestEntryDelayToReachableWhenSolicitedOverrideConfirmation
- TestEntryDelayToReachableWhenUpperLevelConfirmation
- TestEntryDelayToStaleWhenConfirmationWithDifferentAddress
- TestEntryDelayToStaleWhenProbeWithDifferentAddress
- TestEntryFailedGetsDeleted
- TestEntryIncompleteToFailed
- TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt
- TestEntryIncompleteToReachable
- TestEntryIncompleteToReachableWithRouterFlag
- TestEntryIncompleteToStale
- TestEntryInitiallyUnknown
- TestEntryProbeToFailed
- TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress
- TestEntryProbeToReachableWhenSolicitedOverrideConfirmation
- TestEntryProbeToStaleWhenConfirmationWithDifferentAddress
- TestEntryProbeToStaleWhenProbeWithDifferentAddress
- TestEntryReachableToStaleWhenConfirmationWithDifferentAddress
- TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride
- TestEntryReachableToStaleWhenProbeWithDifferentAddress
- TestEntryReachableToStaleWhenTimeout
- TestEntryStaleToDelay
- TestEntryStaleToReachableWhenSolicitedOverrideConfirmation
- TestEntryStaleToStaleWhenOverrideConfirmation
- TestEntryStaleToStaleWhenProbeUpdateAddress
- TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress
- TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress
- TestEntryStaysReachableWhenConfirmationWithRouterFlag
- TestEntryStaysReachableWhenProbeWithSameAddress
- TestEntryStaysStaleWhenProbeWithSameAddress
- TestEntryUnknownToIncomplete
- TestEntryUnknownToStale
- TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress
pkg/tcpip/stack:stack_x_test
- TestDefaultNUDConfigurations
- TestNUDConfigurationFailsForNotSupported
- TestNUDConfigurationsBaseReachableTime
- TestNUDConfigurationsDelayFirstProbeTime
- TestNUDConfigurationsMaxMulticastProbes
- TestNUDConfigurationsMaxRandomFactor
- TestNUDConfigurationsMaxUnicastProbes
- TestNUDConfigurationsMinRandomFactor
- TestNUDConfigurationsRetransmitTimer
- TestNUDConfigurationsUnreachableTime
- TestNUDStateReachableTime
- TestNUDStateRecomputeReachableTime
- TestSetNUDConfigurationFailsForBadNICID
- TestSetNUDConfigurationFailsForNotSupported
[1]: https://tools.ietf.org/html/rfc4861
[2]: https://tools.ietf.org/html/rfc4861#section-10
[3]: https://tools.ietf.org/html/rfc4861#appendix-C
Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950
PiperOrigin-RevId: 324070795
|
|
When sending packets to a known network's broadcast address, use the
broadcast MAC address.
Test:
- stack_test.TestOutgoingSubnetBroadcast
- udp_test.TestOutgoingSubnetBroadcast
PiperOrigin-RevId: 324062407
|
|
PiperOrigin-RevId: 323715260
|
|
The previous implementation of LinkAddressRequest only supported sending
broadcast ARP requests and multicast Neighbor Solicitations. The ability to
send these packets as unicast is required for Neighbor Unreachability
Detection.
Tests:
pkg/tcpip/network/arp:arp_test
- TestLinkAddressRequest
pkg/tcpip/network/ipv6:ipv6_test
- TestLinkAddressRequest
Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950
PiperOrigin-RevId: 323451569
|
|
Changes the API of tcpip.Clock to also provide a method for scheduling and
rescheduling work after a specified duration. This change also implements the
AfterFunc method for existing implementations of tcpip.Clock.
This is the groundwork required to mock time within tests. All references to
CancellableTimer has been replaced with the tcpip.Job interface, allowing for
custom implementations of scheduling work.
This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or
use tcpip.CancellableTimer. Migration plan:
1. Add AfterFunc(d, f) to tcpip.Clock
2. Replace references of tcpip.CancellableTimer with tcpip.Job
3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel
4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule
5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob.
PiperOrigin-RevId: 322906897
|
|
PiperOrigin-RevId: 322882426
|
|
PiperOrigin-RevId: 322853192
|
|
Previously, ICMP destination unreachable datagrams were ignored by TCP
endpoints. This caused connect to hang when an intermediate router
couldn't find a route to the host.
This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby
image test would try to install the sinatra gem and hang indefinitely
attempting to use an IPv6 address.
Fixes #3079.
|
|
Fixes a NAT bug that manifested as:
- A SYN was sent from gVisor to another host, unaffected by iptables.
- The corresponding SYN/ACK was NATted by a PREROUTING REDIRECT rule
despite being part of the existing connection.
- The socket that sent the SYN never received the SYN/ACK and thus a
connection could not be established.
We handle this (as Linux does) by tracking all connections, inserting a
no-op conntrack rule for new connections with no rules of their own.
Needed for istio support (#170).
|
|
For iptables users, Check() is a hot path called for every packet one or more
times. Let's avoid a bunch of map lookups.
PiperOrigin-RevId: 322678699
|
|
Updates #173
PiperOrigin-RevId: 322665518
|
|
This is no longer necessary, as we always set NetworkHeader before calling
iptables.Check.
PiperOrigin-RevId: 321461978
|
|
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks
tcpdump as it tries to interpret the packets incorrectly.
Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.
NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type
field while NIC capabilities are more like the device features which can be
queried using SIOCETHTOOL but not modified and NIC Flags are fields that can
be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc.
Updates #2746
PiperOrigin-RevId: 321436525
|
|
PiperOrigin-RevId: 321053634
|
|
As in Linux, we must periodically clean up unused connections.
PiperOrigin-RevId: 321003353
|
|
sleep.Waker's fields are modified as values.
PiperOrigin-RevId: 320873451
|
|
The current convention is when a header is set to pkt.XxxHeader field, it
gets removed from pkt.Data. ICMP does not currently follow this convention.
PiperOrigin-RevId: 320078606
|
|
stack_x_test: 2m -> 20s
tcp_x_test: 80s -> 25s
PiperOrigin-RevId: 319828101
|
|
- Split connTrackForPacket into 2 functions instead of switching on flag
- Replace hash with struct keys.
- Remove prefixes where possible
- Remove unused connStatus, timeout
- Flatten ConnTrack struct a bit - some intermediate structs had no meaning
outside of the context of their parent.
- Protect conn.tcb with a mutex
- Remove redundant error checking (e.g. when is pkt.NetworkHeader valid)
- Clarify that HandlePacket and CreateConnFor are the expected entrypoints for
ConnTrack
PiperOrigin-RevId: 318407168
|
|
Linux controls socket send/receive buffers using a few sysctl variables
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_max
- net.core.wmem_default
- net.ipv4.tcp_rmem
- net.ipv4.tcp_wmem
The first 4 control the default socket buffer sizes for all sockets
raw/packet/tcp/udp and also the maximum permitted socket buffer that can be
specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF,...).
The last two control the TCP auto-tuning limits and override the default
specified in rmem_default/wmem_default as well as the max limits.
Netstack today only implements tcp_rmem/tcp_wmem and incorrectly uses it
to limit the maximum size in setsockopt() as well as uses it for raw/udp
sockets.
This changelist introduces the other 4 and updates the udp/raw sockets to use
the newly introduced variables. The values for min/max match the current
tcp_rmem/wmem values and the default value buffers for UDP/RAW sockets is
updated to match the linux value of 212KiB up from the really low current value
of 32 KiB.
Updates #3043
Fixes #3043
PiperOrigin-RevId: 318089805
|
|
For TCP sockets, SO_REUSEADDR relaxes the rules for binding addresses.
gVisor/netstack already supported a behavior similar to SO_REUSEADDR, but did
not allow disabling it. This change brings the SO_REUSEADDR behavior closer to
the behavior implemented by Linux and adds a new SO_REUSEADDR disabled
behavior. Like Linux, SO_REUSEADDR is now disabled by default.
PiperOrigin-RevId: 317984380
|
|
Users that never set iptables rules shouldn't incur the iptables performance
cost. Suggested by Ian (@iangudger).
PiperOrigin-RevId: 317232921
|
|
Metadata was useful for debugging and safety, but enough tests exist that we
should see failures when (de)serialization is broken. It made stack
initialization more cumbersome and it's also getting in the way of ip6tables.
PiperOrigin-RevId: 317210653
|
|
When a tcp.timer or tcpip.Route is no longer used, clean up its
resources so that unused memory may be released.
PiperOrigin-RevId: 317046582
|
|
... to help reduce flakes.
When waiting for an event to occur, use a timeout of 10s. When waiting
for an event to not occur, use a timeout of 1s.
Test: Ran test locally w/ run count of 1000 with and without gotsan.
PiperOrigin-RevId: 316998128
|
|
Tentative addresses should not be used when finding a route. This change
fixes a bug where a tentative address may have been used.
Test: stack_test.TestDADResolve
PiperOrigin-RevId: 315997624
|
|
On UDP sockets, SO_REUSEADDR allows multiple sockets to bind to the same
address, but only delivers packets to the most recently bound socket. This
differs from the behavior of SO_REUSEADDR on TCP sockets. SO_REUSEADDR for TCP
sockets will likely need an almost completely independent implementation.
SO_REUSEADDR has some odd interactions with the similar SO_REUSEPORT. These
interactions are tested fairly extensively and all but one particularly odd
one (that honestly seems like a bug) behave the same on gVisor and Linux.
PiperOrigin-RevId: 315844832
|
|
NDP packets are sent periodically from NDP timers. These timers do not
hold the NIC lock when sending packets as the packet write operation
may take some time. While the lock is not held, the NIC may be removed
by some other goroutine. This change handles that scenario gracefully.
Test: stack_test.TestRemoveNICWhileHandlingRSTimer
PiperOrigin-RevId: 315524143
|
|
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.
This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.
PiperOrigin-RevId: 315179302
|
|
PiperOrigin-RevId: 315041419
|
|
Loopback traffic is not affected by rules in the PREROUTING chain.
This change is also necessary for istio's envoy to talk to other
components in the same pod.
|
|
IPTables.connections contains a sync.RWMutex. Copying it will trigger copylocks
analysis. Tested by manually enabling nogo tests.
sync.RWMutex is added to IPTables for the additional race condition discovered.
PiperOrigin-RevId: 314817019
|
|
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.
With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.
Updates #2404.
PiperOrigin-RevId: 314610879
|
|
PiperOrigin-RevId: 313842690
|
|
Updates #2404.
PiperOrigin-RevId: 313834784
|
|
|
|
The specified LinkEndpoint is not being used in a significant way.
No behavior change, existing tests pass.
This change is a breaking change.
PiperOrigin-RevId: 313496602
|
|
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
in one place.
* Drop "user-space kernel" and use "application kernel". The term "user-space
kernel" is confusing when some platform implementation do not run in
user-space (instead running in guest ring zero).
* Clear up the relationship between the Platform page in the user guide and the
Platform page in the architecture guide, and ensure they are cross-linked.
* Restore the call-to-action quick start link in the main page, and drop the
GitHub link (which also appears in the top-right).
* Improve image formatting by centering all doc and blog images, and move the
image captions to the alt text.
PiperOrigin-RevId: 311845158
|
|
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT
PiperOrigin-RevId: 310642286
|
|
Do not assume that networks need any DHCPv6 configurations. Instead,
notify the NDP dispatcher in response to the first NDP RA's DHCPv6
flags, even if the flags indicate no DHCPv6 configurations are
available.
PiperOrigin-RevId: 310245068
|
|
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
|