summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/stack
AgeCommit message (Collapse)Author
2019-09-20Allow waiting for LinkEndpoint worker goroutines to finish.Ian Gudger
Previously, the only safe way to use an fdbased endpoint was to leak the FD. This change makes it possible to safely close the FD. This is the first step towards having stoppable stacks. Updates #837 PiperOrigin-RevId: 270346582
2019-09-17Automated rollback of changelist 268047073Ghanan Gowripalan
PiperOrigin-RevId: 269658971
2019-09-12Implement splice methods for pipes and sockets.Adin Scannell
This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructoring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elide locks and use the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207
2019-09-12Remove go_test from go_stateify and go_marshalMichael Pratt
They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-09-12Automated rollback of changelist 268047073Ghanan Gowripalan
PiperOrigin-RevId: 268757842
2019-09-09Join IPv6 all-nodes and solicited-node multicast addresses where appropriate.Ghanan Gowripalan
The IPv6 all-nodes multicast address will be joined on NIC enable, and the appropriate IPv6 solicited-node multicast address will be joined when IPv6 addresses are added. Tests: Test receiving packets destined to the IPv6 link-local all-nodes multicast address and the IPv6 solicted node address of an added IPv6 address. PiperOrigin-RevId: 268047073
2019-09-06Remove reundant global tcpip.LinkEndpointID.Ian Gudger
PiperOrigin-RevId: 267709597
2019-09-04Handle subnet and broadcast addresses correctly with NIC.subnetsChris Kuiper
This also renames "subnet" to "addressRange" to avoid any more confusion with an interface IP's subnet. Lastly, this also removes the Stack.ContainsSubnet(..) API since it isn't used by anyone. Plus the same information can be obtained from Stack.NICAddressRanges(). PiperOrigin-RevId: 267229843
2019-09-03Make UDP traceroute work.Bhasker Hariharan
Adds support to generate Port Unreachable messages for UDP datagrams received on a port for which there is no valid endpoint. Fixes #703 PiperOrigin-RevId: 267034418
2019-08-30Fix data race accessing referencedNetworkEndpoint.kindChris Kuiper
Wrapping "kind" into atomic access functions. Fixes #789 PiperOrigin-RevId: 266485501
2019-08-28Export generated linkAddrEntryEntryTamir Duberstein
PiperOrigin-RevId: 266000128
2019-08-27Populate link address cache at dispatchTamir Duberstein
This allows the stack to learn remote link addresses on incoming packets, reducing the need to ARP to send responses. This also reduces the number of round trips to the system clock, since that may also prove to be performance-sensitive. Fixes #739. PiperOrigin-RevId: 265815816
2019-08-26Prevent a network endpoint to send/rcv if its address was removedChris Kuiper
This addresses the problem where an endpoint has its address removed but still has outstanding references held by routes used in connected TCP/UDP sockets which prevent the removal of the endpoint. The fix adds a new "expired" flag to the referenced network endpoint, which is set when an endpoint has its address removed. Incoming packets are not delivered to an expired endpoint (unless in promiscuous mode), while sending outgoing packets triggers an error to the caller (unless in spoofing mode). In addition, a few helper functions were added to stack_test.go to reduce code duplications. PiperOrigin-RevId: 265514326
2019-08-21Use tcpip.Subnet in tcpip.RouteTamir Duberstein
This is the first step in replacing some of the redundant types with the standard library equivalents. PiperOrigin-RevId: 264706552
2019-08-16netstack: disconnect an unix socket only if the address family is AF_UNSPECAndrei Vagin
Linux allows to call connect for ANY and the zero port. PiperOrigin-RevId: 263892534
2019-08-16Add subnet checking to NIC.findEndpoint and consolidate with NIC.getRefChris Kuiper
This adds the same logic to NIC.findEndpoint that is already done in NIC.getRef. Since this makes the two functions very similar they were combined into one with the originals being wrappers. PiperOrigin-RevId: 263864708
2019-08-14Replace uinptr with int64 when returning lengthsTamir Duberstein
This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916
2019-08-08netstack: Don't start endpoint goroutines too soon on restore.Rahat Mahmood
Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutine may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429
2019-08-02Plumbing for iptables sockopts.Kevin Krakauer
PiperOrigin-RevId: 261413396
2019-08-02Automated rollback of changelist 261191548Rahat Mahmood
PiperOrigin-RevId: 261373749
2019-08-01Implement getsockopt(TCP_INFO).Rahat Mahmood
Export some readily-available fields for TCP_INFO and stub out the rest. PiperOrigin-RevId: 261191548
2019-07-30Pass ProtocolAddress instead of its fieldsTamir Duberstein
PiperOrigin-RevId: 260803517
2019-07-24Add support for a subnet prefix length on interface network addressesChris Kuiper
This allows the user code to add a network address with a subnet prefix length. The prefix length value is stored in the network endpoint and provided back to the user in the ProtocolAddress type. PiperOrigin-RevId: 259807693
2019-07-18net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)Andrei Vagin
PiperOrigin-RevId: 258859507
2019-07-12Add IPPROTO_RAW, which allows raw sockets to write IP headers.Kevin Krakauer
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to manipulate the kernel's tables, but it doesn't actually use any of the functionality. Blegh. PiperOrigin-RevId: 257903078
2019-06-13Add support for TCP receive buffer auto tuning.Bhasker Hariharan
The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accomodate for the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto- tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-06Track and export socket state.Rahat Mahmood
This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850
2019-06-05Adjust route when looping multicast packetsChris Kuiper
Multicast packets are special in that their destination address does not identify a specific interface. When sending out such a packet the multicast address is the remote address, but for incoming packets it is the local address. Hence, when looping a multicast packet, the route needs to be tweaked to reflect this. PiperOrigin-RevId: 251739298
2019-05-24Remove unused wakersTamir Duberstein
These wakers are uselessly allocated and passed around; nothing ever listens for notifications on them. The code here appears to be vestigial, so removing it and allowing a nil waker to be passed seems appropriate. PiperOrigin-RevId: 249879320 Change-Id: Icd209fb77cc0dd4e5c49d7a9f2adc32bf88b4b71
2019-05-22UDP and TCP raw socket support.Kevin Krakauer
PiperOrigin-RevId: 249511348 Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-03Implement support for SACK based recovery(RFC 6675).Bhasker Hariharan
PiperOrigin-RevId: 246536003 Change-Id: I118b745f45040be9c70cb6a1028acdb06c78d8c9
2019-05-03Proper cleanup of sockets that used REUSEPORTChris Kuiper
Fixed a small logic error that broke proper accounting of MultiPortEndpoints. PiperOrigin-RevId: 246502126 Change-Id: I1a7d6ea134f811612e545676212899a3707bc2c2
2019-05-02Support reception of multicast data on more than one socketChris Kuiper
This requires two changes: 1) Support for more than one socket to join a given multicast group. 2) Duplicate delivery of incoming multicast packets to all sockets listening for it. In addition, I tweaked the code (and added a test) to disallow duplicates IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does it. PiperOrigin-RevId: 246437315 Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-26Make raw sockets a toggleable feature disabled by default.Kevin Krakauer
PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-09Add TCP checksum verification.Bhasker Hariharan
PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-04-02Add a raw socket transport endpoint and use it for raw ICMP sockets.Kevin Krakauer
Having raw socket code together will make it easier to add support for other raw network protocols. Currently, only ICMP uses the raw endpoint. However, adding support for other protocols such as UDP shouldn't be much more difficult than adding a few switch cases. PiperOrigin-RevId: 241564875 Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-03-28netstack/fdbased: add generic segmentation offload (GSO) supportAndrei Vagin
The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500. Here are numbers for the nginx-1m test: runsc: 579330.01 [Kbytes/sec] received runsc-gso: 1794121.66 [Kbytes/sec] received runc: 2122139.06 [Kbytes/sec] received and for tcp_benchmark: $ tcp_benchmark --duration 15 --ideal [ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal [ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec $ tcp_benchmark --client --duration 15 --ideal --gso 65536 [ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-26netstack: Don't exclude length when a pseudo-header checksum is calculatedAndrei Vagin
This is a preparation for GSO changes (cl/234508902). RELNOTES[gofers]: Refactor checksum code to include length, which it already did, but in a convoluted way. Should be a no-op. PiperOrigin-RevId: 240460794 Change-Id: I537381bc670b5a9f5d70a87aa3eb7252e8f5ace2
2019-03-19Add layer 2 stats (tx, rx) X (packets, bytes) to netstackBert Muthalaly
PiperOrigin-RevId: 239194420 Change-Id: Ie193e8ac2b7a6db21195ac85824a335930483971
2019-03-12Make HandleLocal apply to all non-loopback interfaces.Ian Gudger
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify the implementations. This has the benefit of making HandleLocal apply even when the fdbased link endpoint isn't in use. In addition, move looping logic to route creation so that it doesn't need to be run for each packet. This should improve performance. PiperOrigin-RevId: 238099480 Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-08Implement IP_MULTICAST_LOOP.Ian Gudger
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default route are looped back. In order to implement this switch, support for sending and looping back multicast packets on the default route had to be implemented. For now we only support IPv4 multicast. PiperOrigin-RevId: 237534603 Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-05Remove unused commit() function argument to Bind.Kevin Krakauer
PiperOrigin-RevId: 236926132 Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181
2019-02-28Map IPv{4,6} addresses to ethernet addressesTamir Duberstein
...in accordance with RFCs 1112 and 2464. Fixes IPv4 multicast when IP_MULTICAST_IF is specified. Don't return ErrNoRoute when no route is needed. Don't set Route.NextHop when no route is needed. PiperOrigin-RevId: 236199813 Change-Id: I48ed33e1b7f760deaa37e18ad7f1b8b62819ab43
2019-02-27Ping support via IPv4 raw sockets.Kevin Krakauer
Broadly, this change: * Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. * Passes the network-layer (IP) header up the stack to the transport endpoint, which can pass it up to the socket layer. This allows a raw socket to return the entire IP packet to users. * Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer that enable incoming packets to be delivered to raw endpoints. New raw sockets of other protocols (not ICMP) just need to register with the stack. * Enables ping.endpoint to return IP headers when created via SOCK_RAW. PiperOrigin-RevId: 235993280 Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-26Adds a WriteRawPacket method to the InjectableLinkEndpoint interface.Googler
Also exposes ipv4.MaxTotalSize since it is a generally useful constant. PiperOrigin-RevId: 235799755 Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
2019-02-26Use more conservative locking in NIC.DeliverNetworkPacketAmanda Tait
An earlier CL excessively minimizes the period in which it holds a lock on NIC. This earlier CL had done this out of the mistaken impression it fixed a broken test, when in fact it just reduced the rate of failure of a flaky test in tcp_test.go. This new change holds the lock on NIC for the duration of the loop over n.endpoints. PiperOrigin-RevId: 235732487 Change-Id: I53ee6df264f093ddc4d29e9acdcba6b4838cb112
2019-02-25Fix race condition in NIC.DeliverNetworkPacketAmanda Tait
cl/234850781 introduced a race condition in NIC.DeliverNetworkPacket by failing to hold a lock. This change fixes this regressesion by acquiring a read lock before iterating through n.endpoints, and then releasing the lock once iteration is complete. PiperOrigin-RevId: 235549770 Change-Id: Ib0133288be512d478cf759c3314dc95ec3205d4b