summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/link/channel
AgeCommit message (Collapse)Author
2021-09-09Remove linux-compat loopback hacks from packet endpointGhanan Gowripalan
Previously, gVisor did not represent loopback devices as an ethernet device as Linux does. To maintain Linux API compatibility for packet sockets, a workaround was used to add an ethernet header if a link header was not already present in the packet buffer delivered to a packet endpoint. However, this workaround is a bug for non-ethernet based interfaces; not all links use an ethernet header (e.g. pure L3/TUN interfaces). As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents loopback devices as an ethernet-based device so this workaround can now be removed. BUG: https://fxbug.dev/81592 Updates #6530, #6531. PiperOrigin-RevId: 395819151
2021-08-27Add LinkEndpoint.WriteRawPacket with stubsGhanan Gowripalan
...returning unsupported errors. PiperOrigin-RevId: 393388991
2021-05-22Remove detritusTamir Duberstein
- Unused constants - Unused functions - Unused arguments - Unkeyed literals - Unnecessary conversions PiperOrigin-RevId: 375253464
2021-05-03Convey GSO capabilities through GSOEndpointGhanan Gowripalan
...as all GSO capable endpoints must implement GSOEndpoint. PiperOrigin-RevId: 371804175
2021-04-21Only carry GSO options in the packet bufferGhanan Gowripalan
With this change, GSO options no longer needs to be passed around as a function argument in the write path. This change is done in preparation for a later change that defers segmentation, and may change GSO options for a packet as it flows down the stack. Updates #170. PiperOrigin-RevId: 369774872
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-15Only pass stack.Route's fields to LinkEndpointsGhanan Gowripalan
stack.Route is used to send network packets and resolve link addresses. A LinkEndpoint does not need to do either of these and only needs the route's fields at the time of the packet write request. Since LinkEndpoints only need the route's fields when writing packets, pass a stack.RouteInfo instead. PiperOrigin-RevId: 352108405
2020-12-22Invoke address resolution upon subsequent traffic to Failed neighborPeter Johnston
Removes the period of time in which subseqeuent traffic to a Failed neighbor immediately fails with ErrNoLinkAddress. A Failed neighbor is one in which address resolution fails; or in other words, the neighbor's IP address cannot be translated to a MAC address. This means removing the Failed state for linkAddrCache and allowing transitiong out of Failed into Incomplete for neighborCache. Previously, both caches would transition entries to Failed after address resolution fails. In this state, any subsequent traffic requested within an unreachable time would immediately fail with ErrNoLinkAddress. This does not follow RFC 4861 section 7.3.3: If address resolution fails, the entry SHOULD be deleted, so that subsequent traffic to that neighbor invokes the next-hop determination procedure again. Invoking next-hop determination at this point ensures that alternate default routers are tried. The API for getting a link address for a given address, whether through the link address cache or the neighbor table, is updated to optionally take a callback which will be called when address resolution completes. This allows `Route` to handle completing link resolution internally, so callers of (*Route).Resolve (e.g. endpoints) don’t have to keep track of when it completes and update the Route accordingly. This change also removes the wakers from LinkAddressCache, NeighborCache, and Route in favor of the callbacks, and callers that previously used a waker can now just pass a callback to (*Route).Resolve that will notify the waker on resolution completion. Fixes #4796 Startblock: has LGTM from sbalana and then add reviewer ghanan PiperOrigin-RevId: 348597478
2020-11-25Make stack.Route safe to access concurrentlyGhanan Gowripalan
Multiple goroutines may use the same stack.Route concurrently so the stack.Route should make sure that any functions called on it are thread-safe. Fixes #4073 PiperOrigin-RevId: 344320491
2020-11-18Introduce stack.WritePacketToRemote, remove LinkEndpoint.WriteRawPacketBruno Dal Bo
Redefine stack.WritePacket into stack.WritePacketToRemote which lets the NIC decide whether to append link headers. PiperOrigin-RevId: 343071742
2020-08-13Migrate to PacketHeader API for PacketBuffer.Ting-Yu Wang
Formerly, when a packet is constructed or parsed, all headers are set by the client code. This almost always involved prepending to pk.Header buffer or trimming pk.Data portion. This is known to prone to bugs, due to the complexity and number of the invariants assumed across netstack to maintain. In the new PacketHeader API, client will call Push()/Consume() method to construct/parse an outgoing/incoming packet. All invariants, such as slicing and trimming, are maintained by the API itself. NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no longer valid. PacketBuffer now assumes the packet is a concatenation of following portions: * LinkHeader * NetworkHeader * TransportHeader * Data Any of them could be empty, or zero-length. PiperOrigin-RevId: 326507688
2020-07-22Support for receiving outbound packets in AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 322665518
2020-07-15Fix minor bugs in a couple of interface IOCTLs.Bhasker Hariharan
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks tcpdump as it tries to interpret the packets incorrectly. Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which fails with an EINVAL since we don't implement it. For now change it to return EOPNOTSUPP to indicate that we don't support the query rather than return EINVAL. NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type field while NIC capabilities are more like the device features which can be queried using SIOCETHTOOL but not modified and NIC Flags are fields that can be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc. Updates #2746 PiperOrigin-RevId: 321436525
2020-06-03Pass PacketBuffer as pointer.Ting-Yu Wang
Historically we've been passing PacketBuffer by shallow copying through out the stack. Right now, this is only correct as the caller would not use PacketBuffer after passing into the next layer in netstack. With new buffer management effort in gVisor/netstack, PacketBuffer will own a Buffer (to be added). Internally, both PacketBuffer and Buffer may have pointers and shallow copying shouldn't be used. Updates #2404. PiperOrigin-RevId: 314610879
2020-05-27Remove linkEP from DeliverNetworkPacketSam Balana
The specified LinkEndpoint is not being used in a significant way. No behavior change, existing tests pass. This change is a breaking change. PiperOrigin-RevId: 313496602
2020-04-14Reduce flakiness in tcp_test.Bhasker Hariharan
Tests now use a MinRTO of 3s instead of default 200ms. This reduced flakiness in a lot of the congestion control/recovery tests which were flaky due to retransmit timer firing too early in case the test executors were overloaded. This change also bumps some of the timeouts in tests which were too sensitive to timer variations and reduces the number of slow start iterations which can make the tests run for too long and also trigger retansmit timeouts etc if the executor is overloaded. PiperOrigin-RevId: 306562645
2020-04-03Refactor software GSO code.Bhasker Hariharan
Software GSO implementation currently has a complicated code path with implicit assumptions that all packets to WritePackets carry same Data and it does this to avoid allocations on the path etc. But this makes it hard to reuse the WritePackets API. This change breaks all such assumptions by introducing a new Vectorised View API ReadToVV which can be used to cleanly split a VV into multiple independent VVs. Further this change also makes packet buffers linkable to form an intrusive list. This allows us to get rid of the array of packet buffers that are passed in the WritePackets API call and replace it with a list of packet buffers. While this code does introduce some more allocations in the benchmarks it doesn't cause any degradation. Updates #231 PiperOrigin-RevId: 304731742
2020-03-24Move tcpip.PacketBuffer and IPTables to stack package.Bhasker Hariharan
This is a precursor to be being able to build an intrusive list of PacketBuffers for use in queuing disciplines being implemented. Updates #2214 PiperOrigin-RevId: 302677662
2020-02-21Implement tap/tun device in vfs.Ting-Yu Wang
PiperOrigin-RevId: 296526279
2020-01-31Use multicast Ethernet address for multicast NDPGhanan Gowripalan
As per RFC 2464 section 7, an IPv6 packet with a multicast destination address is transmitted to the mapped Ethernet multicast address. Test: - ipv6.TestLinkResolution - stack_test.TestDADResolve - stack_test.TestRouterSolicitation PiperOrigin-RevId: 292610529
2020-01-27Refactor to hide C from channel.Endpoint.Ting-Yu Wang
This is to aid later implementation for /dev/net/tun device. PiperOrigin-RevId: 291746025
2020-01-27Standardize on tools directory.Adin Scannell
PiperOrigin-RevId: 291745021
2019-11-23Cleanup visibility.Adin Scannell
PiperOrigin-RevId: 282194656
2019-11-22Use PacketBuffers with GSO.Kevin Krakauer
PiperOrigin-RevId: 282045221
2019-11-14Use PacketBuffers for outgoing packets.Kevin Krakauer
PiperOrigin-RevId: 280455453
2019-11-06Use PacketBuffers, rather than VectorisedViews, in netstack.Kevin Krakauer
PacketBuffers are analogous to Linux's sk_buff. They hold all information about a packet, headers, and payload. This is important for: * iptables to access various headers of packets * Preventing the clutter of passing different net and link headers along with VectorisedViews to packet handling functions. This change only affects the incoming packet path, and a future change will change the outgoing path. Benchmark Regular PacketBufferPtr PacketBufferConcrete -------------------------------------------------------------------------------- BM_Recvmsg 400.715MB/s 373.676MB/s 396.276MB/s BM_Sendmsg 361.832MB/s 333.003MB/s 335.571MB/s BM_Recvfrom 453.336MB/s 393.321MB/s 381.650MB/s BM_Sendto 378.052MB/s 372.134MB/s 341.342MB/s BM_SendmsgTCP/0/1k 353.711MB/s 316.216MB/s 322.747MB/s BM_SendmsgTCP/0/2k 600.681MB/s 588.776MB/s 565.050MB/s BM_SendmsgTCP/0/4k 995.301MB/s 888.808MB/s 941.888MB/s BM_SendmsgTCP/0/8k 1.517GB/s 1.274GB/s 1.345GB/s BM_SendmsgTCP/0/16k 1.872GB/s 1.586GB/s 1.698GB/s BM_SendmsgTCP/0/32k 1.017GB/s 1.020GB/s 1.133GB/s BM_SendmsgTCP/0/64k 475.626MB/s 584.587MB/s 627.027MB/s BM_SendmsgTCP/0/128k 416.371MB/s 503.434MB/s 409.850MB/s BM_SendmsgTCP/0/256k 323.449MB/s 449.599MB/s 388.852MB/s BM_SendmsgTCP/0/512k 243.992MB/s 267.676MB/s 314.474MB/s BM_SendmsgTCP/0/1M 95.138MB/s 95.874MB/s 95.417MB/s BM_SendmsgTCP/0/2M 96.261MB/s 94.977MB/s 96.005MB/s BM_SendmsgTCP/0/4M 96.512MB/s 95.978MB/s 95.370MB/s BM_SendmsgTCP/0/8M 95.603MB/s 95.541MB/s 94.935MB/s BM_SendmsgTCP/0/16M 94.598MB/s 94.696MB/s 94.521MB/s BM_SendmsgTCP/0/32M 94.006MB/s 94.671MB/s 94.768MB/s BM_SendmsgTCP/0/64M 94.133MB/s 94.333MB/s 94.746MB/s BM_SendmsgTCP/0/128M 93.615MB/s 93.497MB/s 93.573MB/s BM_SendmsgTCP/0/256M 93.241MB/s 95.100MB/s 93.272MB/s BM_SendmsgTCP/1/1k 303.644MB/s 316.074MB/s 308.430MB/s BM_SendmsgTCP/1/2k 537.093MB/s 584.962MB/s 529.020MB/s BM_SendmsgTCP/1/4k 882.362MB/s 939.087MB/s 892.285MB/s BM_SendmsgTCP/1/8k 1.272GB/s 1.394GB/s 1.296GB/s BM_SendmsgTCP/1/16k 1.802GB/s 2.019GB/s 1.830GB/s BM_SendmsgTCP/1/32k 2.084GB/s 2.173GB/s 2.156GB/s BM_SendmsgTCP/1/64k 2.515GB/s 2.463GB/s 2.473GB/s BM_SendmsgTCP/1/128k 2.811GB/s 3.004GB/s 2.946GB/s BM_SendmsgTCP/1/256k 3.008GB/s 3.159GB/s 3.171GB/s BM_SendmsgTCP/1/512k 2.980GB/s 3.150GB/s 3.126GB/s BM_SendmsgTCP/1/1M 2.165GB/s 2.233GB/s 2.163GB/s BM_SendmsgTCP/1/2M 2.370GB/s 2.219GB/s 2.453GB/s BM_SendmsgTCP/1/4M 2.005GB/s 2.091GB/s 2.214GB/s BM_SendmsgTCP/1/8M 2.111GB/s 2.013GB/s 2.109GB/s BM_SendmsgTCP/1/16M 1.902GB/s 1.868GB/s 1.897GB/s BM_SendmsgTCP/1/32M 1.655GB/s 1.665GB/s 1.635GB/s BM_SendmsgTCP/1/64M 1.575GB/s 1.547GB/s 1.575GB/s BM_SendmsgTCP/1/128M 1.524GB/s 1.584GB/s 1.580GB/s BM_SendmsgTCP/1/256M 1.579GB/s 1.607GB/s 1.593GB/s PiperOrigin-RevId: 278940079
2019-10-22netstack/tcp: software segmentation offloadAndrei Vagin
Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows to generate multiple tcp packets and send them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prepandable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-09-20Allow waiting for LinkEndpoint worker goroutines to finish.Ian Gudger
Previously, the only safe way to use an fdbased endpoint was to leak the FD. This change makes it possible to safely close the FD. This is the first step towards having stoppable stacks. Updates #837 PiperOrigin-RevId: 270346582
2019-09-06Remove reundant global tcpip.LinkEndpointID.Ian Gudger
PiperOrigin-RevId: 267709597
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-18netstack: use a proper network protocol to set gso.L3HdrLenAndrei Vagin
It is possible to create a listening socket which will accept IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber for all accepted endpoints, even if they handle IPv4 connections. This means that we can't use endpoint.netProto to set gso.L3HdrLen. PiperOrigin-RevId: 244227948 Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
2019-03-28netstack/fdbased: add generic segmentation offload (GSO) supportAndrei Vagin
The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-01-31Remove license commentsMichael Pratt
Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2018-11-14Rename incorrectly named (dst, src) arguments in DeliverNetworkPacket prototypeBert Muthalaly
...to (remote, local), reflecting the (correct) names in the implementation of DeliverNetworkPacket (see tcpip/stack/nic.go). Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering; since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in the parameter names. Note that every callsite passes arguments in the order (src, dst). PiperOrigin-RevId: 221514396 Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
2018-10-23Track paths and provide a rename hook.Adin Scannell
This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-19Use correct company name in copyright headerIan Gudger
PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-09-19Pass local link address to DeliverNetworkPacketBert Muthalaly
This allows a NetworkDispatcher to implement transparent bridging, assuming all implementations of LinkEndpoint.WritePacket call eth.Encode with header.EthernetFields.SrcAddr set to the passed Route.LocalLinkAddress, if it is provided. PiperOrigin-RevId: 213686651 Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-14Pass buffer.Prependable by valueTamir Duberstein
PiperOrigin-RevId: 213053370 Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
2018-09-12Always pass buffer.VectorisedView by valueTamir Duberstein
PiperOrigin-RevId: 212757571 Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-05Update {LinkEndpoint,NetworkEndpoint}#WritePacket to take a VectorisedViewBert Muthalaly
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx) needs to turn around and call WritePacket (tx) with its VectorisedView. Also removes the restriction on having VectorisedViews with multiple views in the write path. PiperOrigin-RevId: 211728717 Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
2018-09-04Automated rollback of changelist 211156845Bhasker Hariharan
PiperOrigin-RevId: 211525182 Change-Id: I462c20328955c77ecc7bfd8ee803ac91f15858e6
2018-08-31Automated rollback of changelist 211103930Googler
PiperOrigin-RevId: 211156845 Change-Id: Ie28011d7eb5f45f3a0158dbee2a68c5edf22f6e0
2018-08-31ipv6: ICMP supportTamir Duberstein
This CL does NDP link-address discovery for IPv6. It includes several small changes necessary to get linux to talk to this implementation. In particular, a hop limit of 255 is necessary for ICMPv6. PiperOrigin-RevId: 211103930 Change-Id: If25370ab84c6b1decfb15de917f3b0020f2c4e0e
2018-07-27stateify: support explicit annotation mode; convert refs and stack packages.Zhaozhong Ni
We have been unnecessarily creating too many savable types implicitly. PiperOrigin-RevId: 206334201 Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8
2018-07-09Switch netstack licenses to Apache 2.0.Nicolas Lacasse
Fixes #27 PiperOrigin-RevId: 203825288 Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633
2018-05-22sentry: Add simple SIOCGIFFLAGS support (IFF_RUNNING and IFF_PROMIS).Kevin Krakauer
Establishes a way of communicating interface flags between netstack and epsocket. More flags can be added over time. PiperOrigin-RevId: 197616669 Change-Id: I230448c5fb5b7d2e8d69b41a451eb4e1096a0e30
2018-04-28Check in gVisor.Googler
PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463