Age | Commit message (Collapse) | Author |
|
This change adds an option to replace the current implementation of ARP through
linkAddrCache, with an implementation of NUD through neighborCache. Switching
to using NUD for both ARP and NDP is beneficial for the reasons described by
RFC 4861 Section 3.1:
"[Using NUD] significantly improves the robustness of packet delivery in the
presence of failing routers, partially failing or partitioned links, or nodes
that change their link-layer addresses. For instance, mobile nodes can move
off-link without losing any connectivity due to stale ARP caches."
"Unlike ARP, Neighbor Unreachability Detection detects half-link failures and
avoids sending traffic to neighbors with which two-way connectivity is
absent."
Along with these changes exposes the API for querying and operating the
neighbor cache. Operations include:
- Create a static entry
- List all entries
- Delete all entries
- Remove an entry by address
This also exposes the API to change the NUD protocol constants on a per-NIC
basis to allow Neighbor Discovery to operate over links with widely varying
performance characteristics. See [RFC 4861 Section 10][1] for the list of
constants.
Finally, an API for subscribing to NUD state changes is exposed through
NUDDispatcher. See [RFC 4861 Appendix C][3] for the list of edges.
Tests:
pkg/tcpip/network/arp:arp_test
+ TestDirectRequest
pkg/tcpip/network/ipv6:ipv6_test
+ TestLinkResolution
+ TestNDPValidation
+ TestNeighorAdvertisementWithTargetLinkLayerOption
+ TestNeighorSolicitationResponse
+ TestNeighorSolicitationWithSourceLinkLayerOption
+ TestRouterAdvertValidation
pkg/tcpip/stack:stack_test
+ TestCacheWaker
+ TestForwardingWithFakeResolver
+ TestForwardingWithFakeResolverManyPackets
+ TestForwardingWithFakeResolverManyResolutions
+ TestForwardingWithFakeResolverPartialTimeout
+ TestForwardingWithFakeResolverTwoPackets
+ TestIPv6SourceAddressSelectionScopeAndSameAddress
[1]: https://tools.ietf.org/html/rfc4861#section-10
[2]: https://tools.ietf.org/html/rfc4861#appendix-C
Fixes #1889
Fixes #1894
Fixes #1895
Fixes #1947
Fixes #1948
Fixes #1949
Fixes #1950
PiperOrigin-RevId: 328365034
|
|
The NetworkEndpoint does not need to be created for each address.
Most of the work the NetworkEndpoint does is address agnostic.
PiperOrigin-RevId: 326759605
|
|
NetworkEndpoints set the number on outgoing packets in Write() and
NetworkProtocols set them on incoming packets in Parse().
Needed for #3549.
PiperOrigin-RevId: 325938745
|
|
The previous implementation of LinkAddressRequest only supported sending
broadcast ARP requests and multicast Neighbor Solicitations. The ability to
send these packets as unicast is required for Neighbor Unreachability
Detection.
Tests:
pkg/tcpip/network/arp:arp_test
- TestLinkAddressRequest
pkg/tcpip/network/ipv6:ipv6_test
- TestLinkAddressRequest
Updates #1889
Updates #1894
Updates #1895
Updates #1947
Updates #1948
Updates #1949
Updates #1950
PiperOrigin-RevId: 323451569
|
|
Previously, ICMP destination unreachable datagrams were ignored by TCP
endpoints. This caused connect to hang when an intermediate router
couldn't find a route to the host.
This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby
image test would try to install the sinatra gem and hang indefinitely
attempting to use an IPv6 address.
Fixes #3079.
|
|
Updates #173
PiperOrigin-RevId: 322665518
|
|
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks
tcpdump as it tries to interpret the packets incorrectly.
Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.
NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type
field while NIC capabilities are more like the device features which can be
queried using SIOCETHTOOL but not modified and NIC Flags are fields that can
be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc.
Updates #2746
PiperOrigin-RevId: 321436525
|
|
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.
This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.
PiperOrigin-RevId: 315179302
|
|
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.
With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.
Updates #2404.
PiperOrigin-RevId: 314610879
|
|
Updates #2404.
PiperOrigin-RevId: 313834784
|
|
The specified LinkEndpoint is not being used in a significant way.
No behavior change, existing tests pass.
This change is a breaking change.
PiperOrigin-RevId: 313496602
|
|
Updates #231
PiperOrigin-RevId: 309323808
|
|
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.
This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.
While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.
Updates #231
PiperOrigin-RevId: 304731742
|
|
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.
Updates #2214
PiperOrigin-RevId: 302677662
|
|
When a NIC is removed, attempt to disable the NIC first to cleanup
dynamic state and stop ongoing periodic tasks (e.g. IPv6 router
solicitations, DAD) so that a removed NIC does not attempt to send
packets.
Tests:
- stack_test.TestRemoveUnknownNIC
- stack_test.TestRemoveNIC
- stack_test.TestDADStop
- stack_test.TestCleanupNDPState
- stack_test.TestRouteWithDownNIC
- stack_test.TestStopStartSolicitingRouters
PiperOrigin-RevId: 300805857
|
|
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.
Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.
PiperOrigin-RevId: 296922548
|
|
|
|
|
|
ending up with the wrong chains and is indexing -1 into rules.
|
|
PacketLooping is already a member on the passed Route.
PiperOrigin-RevId: 288721500
|
|
PiperOrigin-RevId: 282045221
|
|
PiperOrigin-RevId: 280455453
|
|
https://github.com/golang/go/wiki/CodeReviewComments#initialisms
This change does not introduce any new functionality. It just renames variables
from `nicid` to `nicID`.
PiperOrigin-RevId: 278992966
|
|
PacketBuffers are analogous to Linux's sk_buff. They hold all information about
a packet, headers, and payload. This is important for:
* iptables to access various headers of packets
* Preventing the clutter of passing different net and link headers along with
VectorisedViews to packet handling functions.
This change only affects the incoming packet path, and a future change will
change the outgoing path.
Benchmark Regular PacketBufferPtr PacketBufferConcrete
--------------------------------------------------------------------------------
BM_Recvmsg 400.715MB/s 373.676MB/s 396.276MB/s
BM_Sendmsg 361.832MB/s 333.003MB/s 335.571MB/s
BM_Recvfrom 453.336MB/s 393.321MB/s 381.650MB/s
BM_Sendto 378.052MB/s 372.134MB/s 341.342MB/s
BM_SendmsgTCP/0/1k 353.711MB/s 316.216MB/s 322.747MB/s
BM_SendmsgTCP/0/2k 600.681MB/s 588.776MB/s 565.050MB/s
BM_SendmsgTCP/0/4k 995.301MB/s 888.808MB/s 941.888MB/s
BM_SendmsgTCP/0/8k 1.517GB/s 1.274GB/s 1.345GB/s
BM_SendmsgTCP/0/16k 1.872GB/s 1.586GB/s 1.698GB/s
BM_SendmsgTCP/0/32k 1.017GB/s 1.020GB/s 1.133GB/s
BM_SendmsgTCP/0/64k 475.626MB/s 584.587MB/s 627.027MB/s
BM_SendmsgTCP/0/128k 416.371MB/s 503.434MB/s 409.850MB/s
BM_SendmsgTCP/0/256k 323.449MB/s 449.599MB/s 388.852MB/s
BM_SendmsgTCP/0/512k 243.992MB/s 267.676MB/s 314.474MB/s
BM_SendmsgTCP/0/1M 95.138MB/s 95.874MB/s 95.417MB/s
BM_SendmsgTCP/0/2M 96.261MB/s 94.977MB/s 96.005MB/s
BM_SendmsgTCP/0/4M 96.512MB/s 95.978MB/s 95.370MB/s
BM_SendmsgTCP/0/8M 95.603MB/s 95.541MB/s 94.935MB/s
BM_SendmsgTCP/0/16M 94.598MB/s 94.696MB/s 94.521MB/s
BM_SendmsgTCP/0/32M 94.006MB/s 94.671MB/s 94.768MB/s
BM_SendmsgTCP/0/64M 94.133MB/s 94.333MB/s 94.746MB/s
BM_SendmsgTCP/0/128M 93.615MB/s 93.497MB/s 93.573MB/s
BM_SendmsgTCP/0/256M 93.241MB/s 95.100MB/s 93.272MB/s
BM_SendmsgTCP/1/1k 303.644MB/s 316.074MB/s 308.430MB/s
BM_SendmsgTCP/1/2k 537.093MB/s 584.962MB/s 529.020MB/s
BM_SendmsgTCP/1/4k 882.362MB/s 939.087MB/s 892.285MB/s
BM_SendmsgTCP/1/8k 1.272GB/s 1.394GB/s 1.296GB/s
BM_SendmsgTCP/1/16k 1.802GB/s 2.019GB/s 1.830GB/s
BM_SendmsgTCP/1/32k 2.084GB/s 2.173GB/s 2.156GB/s
BM_SendmsgTCP/1/64k 2.515GB/s 2.463GB/s 2.473GB/s
BM_SendmsgTCP/1/128k 2.811GB/s 3.004GB/s 2.946GB/s
BM_SendmsgTCP/1/256k 3.008GB/s 3.159GB/s 3.171GB/s
BM_SendmsgTCP/1/512k 2.980GB/s 3.150GB/s 3.126GB/s
BM_SendmsgTCP/1/1M 2.165GB/s 2.233GB/s 2.163GB/s
BM_SendmsgTCP/1/2M 2.370GB/s 2.219GB/s 2.453GB/s
BM_SendmsgTCP/1/4M 2.005GB/s 2.091GB/s 2.214GB/s
BM_SendmsgTCP/1/8M 2.111GB/s 2.013GB/s 2.109GB/s
BM_SendmsgTCP/1/16M 1.902GB/s 1.868GB/s 1.897GB/s
BM_SendmsgTCP/1/32M 1.655GB/s 1.665GB/s 1.635GB/s
BM_SendmsgTCP/1/64M 1.575GB/s 1.547GB/s 1.575GB/s
BM_SendmsgTCP/1/128M 1.524GB/s 1.584GB/s 1.580GB/s
BM_SendmsgTCP/1/256M 1.579GB/s 1.607GB/s 1.593GB/s
PiperOrigin-RevId: 278940079
|
|
When VectorisedViews were passed up the stack from packet_dispatchers, we were
passing a sub-slice of the dispatcher's views fields. The dispatchers then
immediately set those views to nil.
This wasn't caught before because every implementer copied the data in these
views before returning.
PiperOrigin-RevId: 277615351
|
|
It is required to guarantee the same order of endpoints after save/restore.
PiperOrigin-RevId: 277598665
|
|
Updates #837
PiperOrigin-RevId: 277325162
|
|
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.
The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.
Nginx test results:
Server Software: nginx/1.15.9
Server Hostname: 10.138.0.2
Server Port: 8080
Document Path: /10m.txt
Document Length: 10485760 bytes
w/o gso:
Concurrency Level: 5
Time taken for tests: 5.491 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 18.21 [#/sec] (mean)
Time per request: 274.525 [ms] (mean)
Time per request: 54.905 [ms] (mean, across all concurrent requests)
Transfer rate: 186508.03 [Kbytes/sec] received
sw-gso:
Concurrency Level: 5
Time taken for tests: 3.852 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 25.96 [#/sec] (mean)
Time per request: 192.576 [ms] (mean)
Time per request: 38.515 [ms] (mean, across all concurrent requests)
Transfer rate: 265874.92 [Kbytes/sec] received
w/o gso:
$ ./tcp_benchmark --client --duration 15 --ideal
[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
software gso:
$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
PiperOrigin-RevId: 276112677
|
|
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.
Binding isn't supported yet.
PiperOrigin-RevId: 275909366
|
|
PiperOrigin-RevId: 274700093
|
|
Also removes the need for protocol names.
PiperOrigin-RevId: 271186030
|
|
Previously, the only safe way to use an fdbased endpoint was to leak the FD.
This change makes it possible to safely close the FD.
This is the first step towards having stoppable stacks.
Updates #837
PiperOrigin-RevId: 270346582
|
|
PiperOrigin-RevId: 267709597
|
|
Adds support to generate Port Unreachable messages for UDP
datagrams received on a port for which there is no valid
endpoint.
Fixes #703
PiperOrigin-RevId: 267034418
|
|
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.
PiperOrigin-RevId: 259807693
|
|
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.
PiperOrigin-RevId: 257903078
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
|
|
Having raw socket code together will make it easier to add support for other raw
network protocols. Currently, only ICMP uses the raw endpoint. However, adding
support for other protocols such as UDP shouldn't be much more difficult than
adding a few switch cases.
PiperOrigin-RevId: 241564875
Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
|
|
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.
Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received
and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
|
|
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default
route are looped back. In order to implement this switch, support for sending
and looping back multicast packets on the default route had to be implemented.
For now we only support IPv4 multicast.
PiperOrigin-RevId: 237534603
Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
|
|
Broadly, this change:
* Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`.
* Passes the network-layer (IP) header up the stack to the transport endpoint,
which can pass it up to the socket layer. This allows a raw socket to return
the entire IP packet to users.
* Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer
that enable incoming packets to be delivered to raw endpoints. New raw sockets
of other protocols (not ICMP) just need to register with the stack.
* Enables ping.endpoint to return IP headers when created via SOCK_RAW.
PiperOrigin-RevId: 235993280
Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
|
|
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.
PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
|
|
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
|
|
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).
Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.
Note that every callsite passes arguments in the order (src, dst).
PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
|
|
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
|
|
Previously, if address resolution for UDP or Ping sockets required sending
packets using Write in Transport layer, Resolve would return ErrWouldBlock
and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution
would run in background. Further calls to Write using same address would also
return ErrNoLinkAddress until resolution has been completed successfully.
Since Write is not allowed to block and System Calls need to be
interruptible in System Call layer, the caller to Write is responsible for
blocking upon return of ErrWouldBlock.
Now, when startAddressResolution is called a notification channel for
the completion of the address resolution is returned.
The channel will traverse up to the calling function of Write as well as
ErrNoLinkAddress. Once address resolution is complete (success or not) the
channel is closed. The caller would call Write again to send packets and
check if address resolution was compeleted successfully or not.
Fixes google/gvisor#5
Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93
PiperOrigin-RevId: 214962200
|
|
This allows a NetworkDispatcher to implement transparent bridging,
assuming all implementations of LinkEndpoint.WritePacket call eth.Encode
with header.EthernetFields.SrcAddr set to the passed
Route.LocalLinkAddress, if it is provided.
PiperOrigin-RevId: 213686651
Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
|
|
PiperOrigin-RevId: 213053370
Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
|