gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-05-13	Migrate PacketBuffer to use pkg/buffer	Ting-Yu Wang
	Benchmark iperf3: Before After native->runsc 5.14 5.01 (Gbps) runsc->native 4.15 4.07 (Gbps) It did introduce overhead, mainly at the bridge between pkg/buffer and VectorisedView, the ExtractVV method. Once endpoints start migrating away from VV, this overhead will be gone. Updates #2404 PiperOrigin-RevId: 373651666
2021-05-13	Rename SetForwarding to SetForwardingDefaultAndAllNICs	Ghanan Gowripalan
	...to make it clear to callers that all interfaces are updated with the forwarding flag and that future NICs will be created with the new forwarding state. PiperOrigin-RevId: 373618435
2021-05-12	Send ICMP errors when unable to forward fragmented packets	Nick Brown
	Before this change, we would silently drop packets when the packet was too big to be sent out through the NIC (and, for IPv4 packets, if DF was set). This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when the packet can't be fragmented. PiperOrigin-RevId: 373480078
2021-05-12	Fix not calling decRef on merged segments	Ting-Yu Wang
	This code path is for outgoing packets, and we don't currently do memory accounting on this path. So it wasn't breaking anything. This change did not add a test for ref-counting issue fixed, but will switch to the leak-checking ref-counter later when all ref-counting issues are fixed. PiperOrigin-RevId: 373447913
2021-05-11	Merge pull request #5694 from kevinGC:align32	gVisor bot
	PiperOrigin-RevId: 373271579
2021-05-11	Change AcquireAssignedAddress to use RLock.	Bhasker Hariharan
	This is a hot path for all incoming packets and we don't need an exclusive lock here as we are not modifying any of the fields protected by mu here. PiperOrigin-RevId: 373181254
2021-05-11	Move multicounter testutil functions out of network/ip	Arthur Sfez
	This is in preparation of having aggregated NIC stats at the stack level. These validation functions will be needed outside of the network layer packages to test aggregated NIC stats. PiperOrigin-RevId: 373180565
2021-05-11	Process Hop-by-Hop header when forwarding IPv6 packets	Nick Brown
	Currently, we process IPv6 extension headers when receiving packets but not when forwarding them. This is fine for the most part, with with one exception: RFC 8200 requires that we process the Hop-by-Hop headers even while forwarding packets. This CL adds that support by invoking the Hop-by-hop logic performed when receiving packets during forwarding as well. PiperOrigin-RevId: 373145478
2021-05-06	fix rebase error	Kevin Krakauer

2021-05-06	Moved to atomicbitops and renamed files	Kevin Krakauer

2021-05-06	Fix alignment issue with 64-bit atomics on 32 bit machines	Kevin Krakauer

2021-05-06	Solicit routers as long as RAs are handled	Ghanan Gowripalan
	...to conform with Linux's `accept_ra` sysctl option. ``` accept_ra - INTEGER Accept Router Advertisements; autoconfigure using them. It also determines whether or not to transmit Router Solicitations. If and only if the functional setting is to accept Router Advertisements, Router Solicitations will be transmitted. Possible values are: 0 Do not accept Router Advertisements. 1 Accept Router Advertisements if forwarding is disabled. 2 Overrule forwarding behaviour. Accept Router Advertisements even if forwarding is enabled. Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. ``` With this change, routers may be solicited even if the stack is has forwarding enabled, as long as the interface is configured to handle RAs when forwarding is enabled. PiperOrigin-RevId: 372406501
2021-05-05	Allow handling RAs when forwarding is enabled	Ghanan Gowripalan
	...to conform with Linux's `accept_ra` sysctl option. ``` accept_ra - INTEGER Accept Router Advertisements; autoconfigure using them. It also determines whether or not to transmit Router Solicitations. If and only if the functional setting is to accept Router Advertisements, Router Solicitations will be transmitted. Possible values are: 0 Do not accept Router Advertisements. 1 Accept Router Advertisements if forwarding is disabled. 2 Overrule forwarding behaviour. Accept Router Advertisements even if forwarding is enabled. Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. ``` PiperOrigin-RevId: 372214640
2021-05-05	Send ICMP errors when the network is unreachable	Nick Brown
	Before this change, we would silently drop packets when unable to determine a route to the destination host. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when unable to reach the destination. Startblock: has LGTM from asfez and then add reviewer ghanan PiperOrigin-RevId: 372214051
2021-05-05	Don't cleanup NDP state when enabling forwarding	Ghanan Gowripalan
	...to match linux behaviour: ``` $ sudo sysctl net.ipv6.conf.eno1.forwarding net.ipv6.conf.eno1.forwarding = 0 $ ip addr list dev eno1 2: eno1: <...> ... inet6 PREFIX:TEMP_IID/64 scope global temporary dynamic valid_lft 209363sec preferred_lft 64024sec inet6 PREFIX:GLOBAL_STABLE_IID/64 scope global dynamic mngtmpaddr ... valid_lft 209363sec preferred_lft 209363sec inet6 fe80::LINKLOCAL_STABLE_IID/64 scope link valid_lft forever preferred_lft forever $ sudo sysctl -w "net.ipv6.conf.all.forwarding=1" net.ipv6.conf.all.forwarding = 1 $ sudo sysctl net.ipv6.conf.eno1.forwarding net.ipv6.conf.eno1.forwarding = 1 $ ip addr list dev eno1 2: eno1: <...> ... inet6 PREFIX:TEMP_IID/64 scope global temporary dynamic valid_lft 209339sec preferred_lft 64000sec inet6 PREFIX:GLOBAL_STABLE_IID/64 scope global dynamic mngtmpaddr ... valid_lft 209339sec preferred_lft 209339sec inet6 fe80::LINKLOCAL_STABLE_IID/64 scope link valid_lft forever preferred_lft forever $ ip -6 route list ... PREFIX::/64 dev eno1 proto ra metric 100 expires 209241sec pref medium default via fe80::ROUTER_IID dev eno1 proto ra ... ``` PiperOrigin-RevId: 372146689
2021-05-05	Fix a race in reading last seen ICMP error during handshake	Mithun Iyer
	On receiving an ICMP error during handshake, the error is propagated by reading `endpoint.lastError`. This can race with the socket layer invoking getsockopt() with SO_ERROR where the same value is read and cleared, causing the handshake to bail out with a non-error state. Fix the race by checking for lastError state and failing the handshake with ErrConnectionAborted if the lastError was read and cleared by say SO_ERROR. The race mentioned in the bug, is caught only with the newly added tcp_test unit test, where we have control over stopping/resuming protocol loop. Adding a packetimpact test as well for sanity testing of ICMP error handling during handshake. Fixes #5922 PiperOrigin-RevId: 372135662
2021-05-04	Fix tcp_test listen backlog expectation	Mithun Iyer
	Listen backlog value is 1 more than what is configured by the socket layer listen call. TestListenBacklogFull expects this behavior which is incorrect as it directly invokes endpoint Listen and with cl/369974744, backlog++ logic is moved to the callers of Listen(). This test passes sometimes, because the handshakes could overlap causing the last SYN to arrive at the listener before the previous handshake is enqueued to the accept queue. In such a case the accept queue is still not full and the SYN is replied to. The final ACK of this last handshake would get dropped eventually. PiperOrigin-RevId: 372041827
2021-05-04	Use cmp.Diff for tcpip.Error comparison	Mithun Iyer
	PiperOrigin-RevId: 372021039
2021-05-03	Implement standard clock safely	Ghanan Gowripalan
	Previously, tcpip.StdClock depended on linking with the unexposed method time.now to implement tcpip.Clock using the time package. This change updates the standard clock to not require manually linking to this unexported method and use publicly documented functions from the time package. PiperOrigin-RevId: 371805101
2021-05-03	Convey GSO capabilities through GSOEndpoint	Ghanan Gowripalan
	...as all GSO capable endpoints must implement GSOEndpoint. PiperOrigin-RevId: 371804175
2021-05-03	netstack: Add a test for mixed Push/Consume	Ting-Yu Wang
	Not really designed to be used this way, but it works and it's been relied upon. Add a test. PiperOrigin-RevId: 371802756
2021-04-30	Comment ip package in a single place	Ghanan Gowripalan
	Fixes the below linting error: ``` From Golint: > Package ip has package comment defined in multiple places: > duplicate_address_detection.go > generic_multicast_protocol.go ``` PiperOrigin-RevId: 371430486
2021-04-29	Fix up TODOs in the code	Fabricio Voznika
	PiperOrigin-RevId: 371231148
2021-04-29	netstack: Rename pkt.Data().TrimFront() to DeleteFront(), and ...	Ting-Yu Wang
	... it may now invalidate backing slice references This is currently safe because TrimFront() in VectorisedView only shrinks the view. This may not hold under the a different buffer implementation. Reordering method calls order to allow this. PiperOrigin-RevId: 371167610
2021-04-27	Remove uses of the binary package from networking code.	Rahat Mahmood
	Co-Author: ayushranjan PiperOrigin-RevId: 370785009
2021-04-22	Fix AF_UNIX listen() w/ zero backlog.	Bhasker Hariharan
	In https://github.com/google/gvisor/commit/f075522849fa a check to increase zero to a minimum backlog length was removed from sys_socket.go to bring it in parity with linux and then in tcp/endpoint.go we bump backlog by 1. But this broke calling listen on a AF_UNIX socket w/ a zero backlog as in linux it does allow 1 connection even with a zero backlog. This was caught by a php runtime test socket_abstract_path.phpt. PiperOrigin-RevId: 369974744
2021-04-21	Only carry GSO options in the packet buffer	Ghanan Gowripalan
	With this change, GSO options no longer needs to be passed around as a function argument in the write path. This change is done in preparation for a later change that defers segmentation, and may change GSO options for a packet as it flows down the stack. Updates #170. PiperOrigin-RevId: 369774872
2021-04-20	Move SO_RCVBUF to socketops.	Nayana Bidari
	Fixes #2926, #674 PiperOrigin-RevId: 369457123
2021-04-20	Expose header methods that validate checksums	Arthur Sfez
	This is done for IPv4, UDP and TCP headers. This also changes the packet checkers used in tests to error on zero-checksum, not sure why it was allowed before. And while I'm here, make comments' case consistent. RELNOTES: n/a Fixes #5049 PiperOrigin-RevId: 369383862
2021-04-19	De-duplicate TCP state in TCPEndpointState vs tcp.endpoint	Nick Brown
	This change replaces individual private members in tcp.endpoint with a single private TCPEndpointState member. Some internal substructures within endpoint (receiver, sender) have been broken into a public substructure (which is then copied into the TCPEndpointState returned from completeState()) alongside other private fields. Fixes #4466 PiperOrigin-RevId: 369329514
2021-04-19	Fix TCP RACK flaky unit tests.	Nayana Bidari
	- Added delay to increase the RTT: In DSACK tests with RACK enabled and low RTT, TLP can be detected before sending ACK and the tests flake. Increasing the RTT will ensure that TLP does not happen before the ACK is sent. - Fix TestRACKOnePacketTailLoss: The ACK does not contain DSACK, which means either the original or retransmission (probe) was lost and SACKRecovery count must be incremented. Before: http://sponge2/c9bd51de-f72f-481c-a7f3-e782e7524883 After: http://sponge2/1307a796-103a-4a45-b699-e8d239220ed1 PiperOrigin-RevId: 369305720
2021-04-17	Avoid ignoring incoming packet by demuxer on endpoint lookup failure	Mithun Iyer
	This fixes a race that occurs while the endpoint is being unregistered and the transport demuxer attempts to match the incoming packet to any endpoint. The race specifically occurs when the unregistration (and deletion of the endpoint) occurs, after a successful endpointsByNIC lookup and before the endpoints map is further looked up with ingress NICID of the packet. The fix is to notify the caller of lookup-with-NICID failure, so that the logic falls through to handling unknown destination packets. For TCP this can mean replying back with RST. The syscall test in this CL catches this race as the ACK completing the handshake could get silently dropped on a listener close, causing no RST sent to the peer and timing out the poll waiting for POLLHUP. Fixes #5850 PiperOrigin-RevId: 369023779
2021-04-16	Enlarge port range and fix integer overflow	Kevin Krakauer
	Also count failed TCP port allocations PiperOrigin-RevId: 368939619
2021-04-15	Add field support to the sentry metrics.	Nayana Bidari
	Fields allow counter metrics to have multiple tabular values. At most one field is supported at the moment. PiperOrigin-RevId: 368767040
2021-04-15	Reduce tcp_x_test runtime and memory usage	Kevin Krakauer
	Reduce the ephemeral port range, which decreases the calls to makeEP. PiperOrigin-RevId: 368748379
2021-04-15	Use nicer formatting for IP addresses in tests	Kevin Krakauer
	This was semi-automated -- there are many addresses that were not replaced. Future commits should clean those up. Parse4 and Parse6 were given their own package because //pkg/test can introduce dependency cycles, as it depends transitively on //pkg/tcpip and some other netstack packages. PiperOrigin-RevId: 368726528
2021-04-14	Automatically enforce limited netstack dependencies	Kevin Krakauer
	Netstack is supposed to be somewhat independent of the rest of gVisor, and others should be able to use it without pulling in excessive dependencies. Currently, there is no way to fight dependency creep besides careful code review. This change introduces a test rule `netstack_deps_check` that ensures the target only relies on gVisor targets and a short allowlist of external dependencies. Users who add a dependency will see an error and have to manually update the allowlist. The set of packages to test comes from //runsc, as it uses packages we would expect users to commonly rely on. It was generated via: $ find ./runsc -name BUILD \| xargs grep tcpip \| awk '{print $2}' \| sort \| uniq (Note: We considered giving //pkg/tcpip it's own go.mod, but this breaks go tooling.) PiperOrigin-RevId: 368456711
2021-04-13	Fix listener close, client connect race	Mithun Iyer
	Fix a race where the ACK completing the handshake can be dropped by a closing listener without RST to the peer. The listener close would reset the accepted queue and that causes the connecting endpoint in SYNRCVD state to drop the ACK thinking the queue if filled up. PiperOrigin-RevId: 368165509
2021-04-12	Drop locks before calling waiterQueue.Notify	Tamir Duberstein
	Holding this lock can cause the user's callback to deadlock if it attempts to inspect the accept queue. PiperOrigin-RevId: 368068334
2021-04-10	Use the SecureRNG to generate listener nonces	Tamir Duberstein
	Some other cleanup while I'm here: - Remove unused arguments - Handle some unhandled errors - Remove redundant casts - Remove redundant parens - Avoid shadowing `hash` package name PiperOrigin-RevId: 367816161
2021-04-10	Don't store accepted endpoints in a channel	Tamir Duberstein
	Use a linked list with cached length and capacity. The current channel is already composed with a mutex and condition variable, and is never used for its channel-like properties. Channels also require eager allocation equal to their capacity, which a linked list does not. PiperOrigin-RevId: 367766626
2021-04-09	iptables: support postrouting hook and SNAT target	Toshi Kikuchi
	The current SNAT implementation has several limitations: - SNAT source port has to be specified. It is not optional. - SNAT source port range is not supported. - SNAT for UDP is a one-way translation. No response packets are handled (because conntrack doesn't support UDP currently). - SNAT and REDIRECT can't work on the same connection. Fixes #5489 PiperOrigin-RevId: 367750325
2021-04-09	Move maxListenBacklog check to sentry	Mithun Iyer
	Move maxListenBacklog check to the caller of endpoint Listen so that it is applicable to Unix domain sockets as well. This was changed in cl/366935921. Reported-by: syzbot+a35ae7cdfdde0c41cf7a@syzkaller.appspotmail.com PiperOrigin-RevId: 367728052
2021-04-09	Rename IsV6LinkLocalAddress to IsV6LinkLocalUnicastAddress	Ghanan Gowripalan
	To match the V4 variant. PiperOrigin-RevId: 367691981
2021-04-09	Remove duplicate accept queue fullness check	Tamir Duberstein
	Both code paths perform this check; extract it and remove the comment that suggests it is unique to one of the paths. PiperOrigin-RevId: 367666160
2021-04-09	Propagate SYN handling error	Tamir Duberstein
	Both callers of this function still drop this error on the floor, but progress is progress. Updates #4690. PiperOrigin-RevId: 367604788
2021-04-08	Do not forward link-local packets	Ghanan Gowripalan
	As per RFC 3927 section 7 and RFC 4291 section 2.5.6. Test: forward_test.TestMulticastForwarding PiperOrigin-RevId: 367519336
2021-04-08	Join all routers group when forwarding is enabled	Ghanan Gowripalan
	See comments inline code for rationale. Test: ip_test.TestJoinLeaveAllRoutersGroup PiperOrigin-RevId: 367449434
2021-04-06	Do not perform MLD for certain multicast scopes	Ghanan Gowripalan
	...as per RFC 2710 section 5 page 10. Test: ipv6_test.TestMLDSkipProtocol PiperOrigin-RevId: 367031126
2021-04-05	Fix listen backlog handling to be in parity with Linux	Mithun Iyer
	- Change the accept queue full condition for a listening endpoint to only honor completed (and delivered) connections. - Use syncookies if the number of incomplete connections is beyond listen backlog. This also cleans up the SynThreshold option code as that is no longer used with this change. - Added a new stack option to unconditionally generate syncookies. Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux. - Enable keeping of incomplete connections beyond listen backlog. - Drop incoming SYNs only if the accept queue is filled up. - Drop incoming ACKs that complete handshakes when accept queue is full - Enable the stack to accept one more connection than programmed by listen backlog. - Handle backlog argument being zero, negative for listen, as Linux. - Add syscall and packetimpact tests to reflect the changes above. - Remove TCPConnectBacklog test which is polling for completed connections on the client side which is not reflective of whether the accept queue is filled up by the test. The modified syscall test in this CL addresses testing of connecting sockets. Fixes #3153 PiperOrigin-RevId: 366935921