summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/transport
AgeCommit message (Collapse)Author
2020-07-31Merge release-20200622.1-300-ga7d9aa6d5 (automated)gVisor bot
2020-07-31iptables: support SO_ORIGINAL_DSTKevin Krakauer
Envoy (#170) uses this to get the original destination of redirected packets.
2020-07-30Fix TCP CurrentConnected counter updates.Mithun Iyer
CurrentConnected counter is incorrectly decremented on close of an endpoint which is still not connected. Fixes #3443 PiperOrigin-RevId: 324155171
2020-07-30Revert change to default buffer size.Bhasker Hariharan
In https://github.com/google/gvisor/commit/ca6bded95dbce07f9683904b4b768dfc2d4a09b2 we reduced the default buffer size to 32KB. This mostly works fine except at high throughput where we hit zero window very quickly and the TCP receive buffer moderation is not able to grow the window. This can be seen in the benchmarks where with a 32KB buffer and 100 connections downloading a 10MB file we get about 30 requests/s vs the 1MB buffer gives us about 53 requests/s. A proper fix requires a few changes to when we send a zero window as well as when we decide to send a zero window update. Today we consider available space below 1MSS as zero and send an update when it crosses 1MSS of available space. This is way too low and results in the window staying very small once we hit a zero window condition as we keep sending updates with size barely over 1MSS. Linux and BSD are smarter about this and use different thresholds. We should separately update our logic to match linux or BSD so that we don't send window updates that are really tiny or wait until we drop below 1MSS to advertise a zero window. PiperOrigin-RevId: 324087019
2020-07-30Use brodcast MAC for broadcast IPv4 packetsGhanan Gowripalan
When sending packets to a known network's broadcast address, use the broadcast MAC address. Test: - stack_test.TestOutgoingSubnetBroadcast - udp_test.TestOutgoingSubnetBroadcast PiperOrigin-RevId: 324062407
2020-07-28Redirect TODO to GitHub issuesFabricio Voznika
PiperOrigin-RevId: 323715260
2020-07-27Merge release-20200622.1-239-gca6bded95 (automated)gVisor bot
2020-07-27Fix memory accounting in TCP pending segment queue.Bhasker Hariharan
TCP now tracks the overhead of the segment structure itself in it's out-of-order queue (pending). This is required to ensure that a malicious sender sending 1 byte out-of-order segments cannot queue like 1000's of segments which bloat up memory usage. We also reduce the default receive window to 32KB. With TCP moderation there is no need to keep this window at 1MB which means that for new connections the default out-of-order queue will be small unless the application actually reads the data that is being sent. This prevents a sender from just maliciously filling up pending buf with lots of tiny out-of-order segments. PiperOrigin-RevId: 323450913
2020-07-24Merge release-20200622.1-208-g82a5cada5 (automated)gVisor bot
2020-07-23Add AfterFunc to tcpip.ClockSam Balana
Changes the API of tcpip.Clock to also provide a method for scheduling and rescheduling work after a specified duration. This change also implements the AfterFunc method for existing implementations of tcpip.Clock. This is the groundwork required to mock time within tests. All references to CancellableTimer has been replaced with the tcpip.Job interface, allowing for custom implementations of scheduling work. This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or use tcpip.CancellableTimer. Migration plan: 1. Add AfterFunc(d, f) to tcpip.Clock 2. Replace references of tcpip.CancellableTimer with tcpip.Job 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob. PiperOrigin-RevId: 322906897
2020-07-23Merge release-20200622.1-198-gfc26b3764 (automated)gVisor bot
2020-07-23Merge pull request #3207 from kevinGC:icmp-connectgVisor bot
PiperOrigin-RevId: 322853192
2020-07-23Merge release-20200622.1-196-g20b556e62 (automated)gVisor bot
2020-07-23Fix wildcard bind for raw socket.Bhasker Hariharan
Fixes #3334 PiperOrigin-RevId: 322846384
2020-07-22make connect(2) fail when dest is unreachableKevin Krakauer
Previously, ICMP destination unreachable datagrams were ignored by TCP endpoints. This caused connect to hang when an intermediate router couldn't find a route to the host. This manifested as a Kokoro error when Docker IPv6 was enabled. The Ruby image test would try to install the sinatra gem and hang indefinitely attempting to use an IPv6 address. Fixes #3079.
2020-07-22Merge release-20200622.1-184-g71bf90c55 (automated)gVisor bot
2020-07-22Support for receiving outbound packets in AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 322665518
2020-07-17Merge release-20200622.1-173-gdcf6ddc27 (automated)gVisor bot
2020-07-16Add support to return protocol in recvmsg for AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 321690756
2020-07-15Merge release-20200622.1-163-g857d03f25 (automated)gVisor bot
2020-07-15Add support for SO_ERROR to packet sockets.Bhasker Hariharan
Packet sockets also seem to allow double binding and do not return an error on linux. This was tested by running the syscall test in a linux namespace as root and the current test DoubleBind fails@HEAD. Passes after this change. Updates #173 PiperOrigin-RevId: 321445137
2020-07-13Merge release-20200622.1-97-g43c209f48 (automated)gVisor bot
2020-07-13garbage collect connectionsKevin Krakauer
As in Linux, we must periodically clean up unused connections. PiperOrigin-RevId: 321003353
2020-07-11Merge release-20200622.1-90-g216dcebc0 (automated)gVisor bot
2020-07-11Stub out SO_DETACH_FILTER.Bhasker Hariharan
Updates #2746 PiperOrigin-RevId: 320757963
2020-07-10Merge release-20200622.1-89-g5df3a8fed (automated)gVisor bot
2020-07-09Discard multicast UDP source address.gVisor bot
RFC-1122 (and others) specify that UDP should not receive datagrams that have a source address that is a multicast address. Packets should never be received FROM a multicast address. See also, RFC 768: 'User Datagram Protocol' J. Postel, ISI, 28 August 1980 A UDP datagram received with an invalid IP source address (e.g., a broadcast or multicast address) must be discarded by UDP or by the IP layer (see rfc 1122 Section 3.2.1.3). This CL does not address TCP or broadcast which is more complicated. Also adds a test for both ipv6 and ipv4 UDP. Fixes #3154 PiperOrigin-RevId: 320547674
2020-07-09Merge release-20200622.1-88-g5946f1118 (automated)gVisor bot
2020-07-09Add support for IP_HDRINCL IP option for raw sockets.Bhasker Hariharan
Updates #2746 Fixes #3158 PiperOrigin-RevId: 320497190
2020-07-08Avoid accidental zero-checksumTamir Duberstein
PiperOrigin-RevId: 320250773
2020-07-07Merge release-20200622.1-76-g76c7bc51b (automated)gVisor bot
2020-07-07Set IPv4 ID on all non-atomic datagramsTony Gong
RFC 6864 imposes various restrictions on the uniqueness of the IPv4 Identification field for non-atomic datagrams, defined as an IP datagram that either can be fragmented (DF=0) or is already a fragment (MF=1 or positive fragment offset). In order to be compliant, the ID field is assigned for all non-atomic datagrams. Add a TCP unit test that induces retransmissions and checks that the IPv4 ID field is unique every time. Add basic handling of the IP_MTU_DISCOVER socket option so that the option can be used to disable PMTU discovery, effectively setting DF=0. Attempting to set the sockopt to anything other than disabled will fail because PMTU discovery is currently not implemented, and the default behavior matches that of disabled. PiperOrigin-RevId: 320081842
2020-07-07Merge release-20200622.1-75-g7e4d2d63e (automated)gVisor bot
2020-07-07icmp: When setting TransportHeader, remove from the Data portion.Ting-Yu Wang
The current convention is when a header is set to pkt.XxxHeader field, it gets removed from pkt.Data. ICMP does not currently follow this convention. PiperOrigin-RevId: 320078606
2020-07-07Merge release-20200622.1-69-gb0f656184 (automated)gVisor bot
2020-07-06Add support for SO_RCVBUF/SO_SNDBUF for AF_PACKET sockets.Bhasker Hariharan
Updates #2746 PiperOrigin-RevId: 319887810
2020-07-06Shard some slow tests.Ting-Yu Wang
stack_x_test: 2m -> 20s tcp_x_test: 80s -> 25s PiperOrigin-RevId: 319828101
2020-07-06Merge release-20200622.1-63-g043e5dddd (automated)gVisor bot
2020-07-06Remove dependency on pkg/binaryTamir Duberstein
PiperOrigin-RevId: 319770124
2020-07-05Merge release-20200622.1-62-g0c1353866 (automated)gVisor bot
2020-07-05Add wakers synchronouslyTamir Duberstein
Avoid a race where an arbitrary goroutine scheduling delay can cause the processor to miss events and hang indefinitely. Reduce allocations by storing processors by-value in the dispatcher, and by using a single WaitGroup rather than one per processor. PiperOrigin-RevId: 319665861
2020-07-01Merge release-20200622.1-54-g31b27adf9 (automated)gVisor bot
2020-07-01TCP receive should block when in SYN-SENT state.Mithun Iyer
The application can choose to initiate a non-blocking connect and later block on a read, when the endpoint is still in SYN-SENT state. PiperOrigin-RevId: 319311016
2020-07-01Merge release-20200622.1-47-gc9446f053 (automated)gVisor bot
2020-06-30Fix two bugs in TCP sender.Bhasker Hariharan
a) When GSO is in use we should not cap the segment to maxPayloadSize in sender.maybeSendSegment as the GSO logic will cap the segment to the correct size. Without this the host GSO is not used as we end up breaking up large segments into small MSS sized segments before writing the packets to the host. b) The check to not split a segment due to it not fitting in the receiver window when there are pending segments is incorrect as segments in writeList can be really large as we just take the write call's buffer size and create a single large segment. So a write of say 128KB will just be 1 segment in the writeList. The linux code checks if 1 MSS sized segments fits in the receiver's window and if not then does not split the current segment. gVisor's check was incorrect that it was checking if the whole segment which could be >>> 1 MSS would fit in the receiver's window. This was causing us to prematurely stop sending and falling back to retransmit timer/probe from the other end to send data. This was seen when running HTTPD benchmarks where @ HEAD when sending large files the benchmark was taking forever to run. The tcp_splitseg_mss_test.go is being deleted as the test as written doesn't test what is intended correctly. This is because GSO is enabled by default and the reason the MSS+1 sized segment is sent is because GSO is in use. A proper test will require disabling GSO on linux and netstack which is going to take a bit of work in packetimpact to do it correctly. Separately a new test probably should be written that verifies that a segment > availableWindow is not split if the availableWindow is < 1 MSS. Fixes #3107 PiperOrigin-RevId: 319172089
2020-06-30Merge release-20200622.1-42-g4784ed46e (automated)gVisor bot
2020-06-30Avoid multiple atomic loadsTamir Duberstein
...by calling (*tcp.endpoint).EndpointState only once when possible. Avoid wrapping (*sleep.Waker).Assert in a useless func while I'm here. PiperOrigin-RevId: 319074149
2020-06-27Merge release-20200622.1-34-g66d166544 (automated)gVisor bot
2020-06-26IPv6 raw sockets. Needed for ip6tables.Kevin Krakauer
IPv6 raw sockets never include the IPv6 header. PiperOrigin-RevId: 318582989
2020-06-27Merge release-20200622.1-33-g8dbeac53c (automated)gVisor bot