gvisor - Container Runtime Sandbox

Age	Commit message	Author
2019-10-29	Add endpoint tracking to the stack.	Ian Gudger
	In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633
2019-10-29	Allow waiting for Endpoint worker goroutines to finish.	Ian Gudger
	Updates #837 PiperOrigin-RevId: 277325162
2019-10-28	Use the user supplied TCP MSS when creating a new active socket	Ghanan Gowripalan
	This change supports using a user supplied TCP MSS for new active TCP connections. Note, the user supplied MSS must be less than or equal to the maximum possible MSS for a TCP connection's route. If it is greater than the maximum possible MSS, the maximum possible MSS will be used as the connection's MSS instead. This change does not use this user supplied MSS for connections accepted from listening sockets - that will come in a later change. Test: Test that outgoing TCP SYN segments contain a TCP MSS option with the user supplied MSS if it is not greater than the maximum possible MSS for the route. PiperOrigin-RevId: 277185125
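The clamping rule described in this commit message can be sketched as below. This is only an illustration, not the netstack code: the function name `effectiveMSS` and the convention that a zero `userMSS` means "not set" are assumptions made for the example.

```go
package main

import "fmt"

// effectiveMSS returns the MSS to use for a new active connection.
// userMSS is the value supplied by the user; routeMaxMSS is the maximum
// possible MSS for the connection's route. Treating userMSS == 0 as
// "no value supplied" is an assumption of this sketch.
func effectiveMSS(userMSS, routeMaxMSS uint16) uint16 {
	if userMSS == 0 || userMSS > routeMaxMSS {
		// Missing or too large: use the maximum possible MSS instead.
		return routeMaxMSS
	}
	return userMSS
}

func main() {
	fmt.Println(effectiveMSS(1000, 1460)) // user value fits the route
	fmt.Println(effectiveMSS(9000, 1460)) // user value exceeds the route maximum
}
```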
2019-10-25	Convert DelayOption to the newer/faster SockOpt int type.	Ian Gudger
	DelayOption is set on all new endpoints in gVisor. PiperOrigin-RevId: 276746791
2019-10-24	Remove the amss field from tcpip.tcp.handshake as it was unused	Ghanan Gowripalan
	The amss field in the tcpip.tcp.handshake was not used anywhere. Removed it to not cause confusion with the amss field in the tcpip.tcp.endpoint struct, which was documented to be used (and is actually being used) for the same purpose. PiperOrigin-RevId: 276577088
2019-10-23	Merge pull request #641 from tanjianfeng:master	gVisor bot
	PiperOrigin-RevId: 276380008
2019-10-22	netstack/tcp: software segmentation offload	Andrei Vagin
	Right now, we send each TCP packet separately, making one system call per packet. This patch allows generating multiple TCP packets and sending them with sendmmsg. The arguable part of this CL is how to handle multiple headers. This CL adds the next field to the Prependable buffer.
	Nginx test results:
	Server Software: nginx/1.15.9
	Server Hostname: 10.138.0.2
	Server Port: 8080
	Document Path: /10m.txt
	Document Length: 10485760 bytes
	w/o gso:
	Concurrency Level: 5
	Time taken for tests: 5.491 seconds
	Complete requests: 100
	Failed requests: 0
	Total transferred: 1048600200 bytes
	HTML transferred: 1048576000 bytes
	Requests per second: 18.21 [#/sec] (mean)
	Time per request: 274.525 [ms] (mean)
	Time per request: 54.905 [ms] (mean, across all concurrent requests)
	Transfer rate: 186508.03 [Kbytes/sec] received
	sw-gso:
	Concurrency Level: 5
	Time taken for tests: 3.852 seconds
	Complete requests: 100
	Failed requests: 0
	Total transferred: 1048600200 bytes
	HTML transferred: 1048576000 bytes
	Requests per second: 25.96 [#/sec] (mean)
	Time per request: 192.576 [ms] (mean)
	Time per request: 38.515 [ms] (mean, across all concurrent requests)
	Transfer rate: 265874.92 [Kbytes/sec] received
	w/o gso:
	$ ./tcp_benchmark --client --duration 15 --ideal
	[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
	software gso:
	$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
	[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
	PiperOrigin-RevId: 276112677
2019-10-15	epsocket: support /proc/net/snmp	Jianfeng Tan
	Netstack has its own stats, which we use to fill /proc/net/snmp. Note that some metrics are not recorded in Netstack and will be shown as 0 in the proc file. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15	netstack: add counters for tcp CurrEstab and EstabResets	Jianfeng Tan
	Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-14	Internal change.	gVisor bot
	PiperOrigin-RevId: 274700093
2019-10-14	Reorder BUILD license and load functions in netstack.	Kevin Krakauer
	PiperOrigin-RevId: 274672346
2019-10-09	Internal change.	gVisor bot
	PiperOrigin-RevId: 273861936
2019-10-07	Implement IP_TTL.	Ian Gudger
	Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341
2019-09-30	Fix bugs in PickEphemeralPort for TCP.	Bhasker Hariharan
	Netstack always picks a random start point every time PickEphemeralPort is called. While this is required for UDP so that DNS requests go out through a randomized set of ports, it is not required for TCP. In fact, Linux explicitly hashes the (srcip, dstip, dstport) and a one-time secret initialized at start of the application to get a random offset. But to ensure it doesn't start from the same point on every scan, it uses a static hint that is incremented by 2 in every call to pick ephemeral ports. The reason for 2 is that Linux seems to split the port ranges so that active connects use even ports while odd ones are used by listening sockets. This CL implements a similar strategy where we use a hash + hint to generate the offset to start the search for a free ephemeral port. This ensures that we cycle through the available port space in order for repeated connects to the same destination and significantly reduces the chance of picking a recently released port. PiperOrigin-RevId: 272058370
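The hash + hint strategy described in this commit message can be sketched roughly as follows. This is not the netstack implementation: the `portPicker` type, the FNV hash, the port range constants, and the increment of 1 per call are all assumptions chosen for illustration, and the sketch is not safe for concurrent use.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	firstEphemeral = 16000                  // assumed start of the ephemeral range
	numEphemeral   = 65536 - firstEphemeral // number of ports in the range
)

// portPicker is a hypothetical holder for the one-time secret and the hint.
type portPicker struct {
	secret uint32 // initialized once at startup
	hint   uint32 // incremented on every pick
}

// offset hashes the destination tuple together with the secret.
func (p *portPicker) offset(srcIP, dstIP string, dstPort uint16) uint32 {
	h := fnv.New32a()
	h.Write([]byte(srcIP))
	h.Write([]byte(dstIP))
	var buf [6]byte
	binary.BigEndian.PutUint16(buf[:2], dstPort)
	binary.BigEndian.PutUint32(buf[2:], p.secret)
	h.Write(buf[:])
	return h.Sum32()
}

// pick scans the ephemeral range starting at hash+hint and returns the
// first port for which inUse reports false.
func (p *portPicker) pick(srcIP, dstIP string, dstPort uint16, inUse func(uint16) bool) (uint16, bool) {
	start := (p.offset(srcIP, dstIP, dstPort) + p.hint) % numEphemeral
	p.hint++
	for i := uint32(0); i < numEphemeral; i++ {
		port := uint16(firstEphemeral + (start+i)%numEphemeral)
		if !inUse(port) {
			return port, true
		}
	}
	return 0, false
}

func main() {
	p := &portPicker{secret: 0x5eed}
	free := func(uint16) bool { return false }
	a, _ := p.pick("10.0.0.1", "10.0.0.2", 80, free)
	b, _ := p.pick("10.0.0.1", "10.0.0.2", 80, free)
	fmt.Println(a, b) // repeated connects walk the port space in order
}
```

Because the hash is stable for a given destination and only the hint changes, repeated connects to the same destination start their search one port further along each time, which is what keeps a just-released port from being picked again immediately.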
2019-09-27	Implement SO_BINDTODEVICE sockopt	gVisor bot
	PiperOrigin-RevId: 271644926
2019-09-25	Remove centralized registration of protocols.	Kevin Krakauer
	Also removes the need for protocol names. PiperOrigin-RevId: 271186030
2019-09-23	netstack: convert more socket options to {Set,Get}SockOptInt	Andrei Vagin
	PiperOrigin-RevId: 270763208
2019-09-12	Implement splice methods for pipes and sockets.	Adin Scannell
	This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructuring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elides locks and uses the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207
2019-09-12	Remove go_test from go_stateify and go_marshal	Michael Pratt
	They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-09-06	Remove redundant global tcpip.LinkEndpointID.	Ian Gudger
	PiperOrigin-RevId: 267709597
2019-09-04	Fix RST generation bugs.	Bhasker Hariharan
	This change addresses a few cases:
	- We no longer generate a RST in response to a RST packet.
	- When we receive a RST we clean up and release all reservations immediately, as the connection is now aborted.
	- An ACK received by a listening socket generates a RST when SYN cookies are not in use. The only reason an ACK should land at the listening socket is if we are using SYN cookies; otherwise the goroutine for the handshake in progress should have gotten the packet and it should never have arrived at the listening endpoint.
	- Also fixes the error returned when a connection times out due to a Keepalive timer expiration, from ECONNRESET to ETIMEDOUT.
	PiperOrigin-RevId: 267238427
2019-09-03	Make UDP traceroute work.	Bhasker Hariharan
	Adds support to generate Port Unreachable messages for UDP datagrams received on a port for which there is no valid endpoint. Fixes #703 PiperOrigin-RevId: 267034418
2019-08-26	netstack/tcp: Add LastAck transition.	Rahat Mahmood
	Add missing state transition to LastAck, which should happen when the endpoint has already received a FIN from the remote side, and is sending its own FIN. PiperOrigin-RevId: 265568314
2019-08-21	Use tcpip.Subnet in tcpip.Route	Tamir Duberstein
	This is the first step in replacing some of the redundant types with the standard library equivalents. PiperOrigin-RevId: 264706552
2019-08-16	netstack: disconnect a unix socket only if the address family is AF_UNSPEC	Andrei Vagin
	Linux allows calling connect with the ANY address and port zero. PiperOrigin-RevId: 263892534
2019-08-15	netstack: move resumption logic into *_state.go	Tamir Duberstein
	13a98df rearranged some of this code in a way that broke compilation of the netstack-only export at github.com/google/netstack because _state.go files are not included in that export. This commit moves resumption logic back into _state.go, fixing the compilation breakage. PiperOrigin-RevId: 263601629
2019-08-14	Replace uintptr with int64 when returning lengths	Tamir Duberstein
	This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916
2019-08-14	Improve SendMsg performance.	Bhasker Hariharan
	SendMsg before this change would copy all the data over into a new slice even if the underlying socket could only accept a small amount of data. This is really inefficient with non-blocking sockets and under high throughput, where large writes could get ErrWouldBlock, or if there was, say, a timeout associated with the sendmsg() syscall. With this change we delay copying bytes in until they are needed and only copy what can potentially be sent/held in the socket buffer, reducing the need to repeatedly copy data over. Also a minor fix to change state to FIN-WAIT-1 when shutdown(..., SHUT_WR) is called instead of when we transmit the actual FIN. Otherwise the socket could remain in CONNECTED state even though the user has called shutdown() on the socket. Updates #627 PiperOrigin-RevId: 263430505
2019-08-09	Add congestion control states to sender.	Bhasker Hariharan
	This change just introduces different congestion control states and ensures the sender.state is updated to reflect the current state of the connection. It is not used for any decisions yet but this is required before algorithms like Eiffel/PRR can be implemented. Fixes #394 PiperOrigin-RevId: 262638292
2019-08-08	netstack: Don't start endpoint goroutines too soon on restore.	Rahat Mahmood
	Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutines may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429
2019-08-06	Fix for a panic due to writing to a closed accept channel.	Bhasker Hariharan
	This can happen because endpoint.Close() closes the accept channel first and then drains/resets any accepted but not delivered connections. But there can be connections that are connected but not delivered to the channel as the channel was full. But closing the channel can cause these writes to fail with a write to a closed channel. The correct solution is to abort any connections in SYN-RCVD state and drain/abort all completed connections before closing the accept channel. PiperOrigin-RevId: 261951132
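The ordering described as the correct solution (abort pending handshakes, drain completed connections, and only then close the channel) can be sketched as below. This is a single-goroutine illustration under assumed names (`conn`, `listener`, `pending`); it does not reproduce the locking or the handshake goroutines of the real endpoint.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a connection owned by a listener.
type conn struct {
	id      int
	aborted bool
}

func (c *conn) abort() { c.aborted = true }

// listener is a hypothetical listening endpoint.
type listener struct {
	acceptedChan chan *conn // completed connections awaiting accept()
	pending      []*conn    // connections still in SYN-RCVD
}

// close aborts everything that could still write to acceptedChan and
// drains what is already queued before closing the channel, so no
// sender is left to hit a closed channel. It returns the abort count.
func (l *listener) close() int {
	aborted := 0
	// 1. Abort handshakes that have not been delivered yet.
	for _, c := range l.pending {
		c.abort()
		aborted++
	}
	l.pending = nil
	// 2. Drain and abort completed but unaccepted connections.
	for {
		select {
		case c := <-l.acceptedChan:
			c.abort()
			aborted++
		default:
			// 3. Nothing left to deliver: closing is now safe.
			close(l.acceptedChan)
			return aborted
		}
	}
}

func main() {
	l := &listener{acceptedChan: make(chan *conn, 2)}
	l.acceptedChan <- &conn{id: 1}
	l.acceptedChan <- &conn{id: 2}
	l.pending = append(l.pending, &conn{id: 3})
	fmt.Println("aborted:", l.close())
}
```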
2019-08-02	Plumbing for iptables sockopts.	Kevin Krakauer
	PiperOrigin-RevId: 261413396
2019-08-02	Automated rollback of changelist 261191548	Rahat Mahmood
	PiperOrigin-RevId: 261373749
2019-08-01	Implement getsockopt(TCP_INFO).	Rahat Mahmood
	Export some readily-available fields for TCP_INFO and stub out the rest. PiperOrigin-RevId: 261191548
2019-07-23	Deduplicate EndpointState.connected some	Tamir Duberstein
	This fixes a bug introduced in cl/251934850 that caused connect-accept-close-connect races to result in the second connect call failing when it should have succeeded. PiperOrigin-RevId: 259584525
2019-07-18	net/tcp/setsockopt: implement setsockopt(fd, SOL_TCP, TCP_INQ)	Andrei Vagin
	PiperOrigin-RevId: 258859507
2019-07-16	Internal change.	gVisor bot
	PiperOrigin-RevId: 258424489
2019-07-12	Stub out support for TCP_MAXSEG.	Bhasker Hariharan
	Adds support to set/get the TCP_MAXSEG value but does not really change the segment sizes emitted by netstack or alter the MSS advertised by the endpoint. This is currently being added only to unblock iperf3 on gVisor. Plumbing this correctly requires a bit more work which will come in separate CLs. PiperOrigin-RevId: 257859112
2019-07-03	netstack/udp: connect with the AF_UNSPEC address family means disconnect	Andrei Vagin
	PiperOrigin-RevId: 256433283
2019-06-21	Fix the logic for sending zero window updates.	Bhasker Hariharan
	Today we have the logic split in two places between endpoint Read() and the worker goroutine which actually sends a zero window. This change makes it so that when a zero window ACK is sent we set a flag in the endpoint which can be read by the endpoint to decide if it should notify the worker to send a nonZeroWindow update. The worker now does not do the check again but instead sends an ACK and flips the flag right away. Similarly today when SO_RECVBUF is set the SetSockOpt call has logic to decide if a zero window update is required. Rather than do that we move the logic to the worker goroutine and it can check the zeroWindow flag and send an update if required. PiperOrigin-RevId: 254505447
2019-06-13	Add support for TCP receive buffer auto tuning.	Bhasker Hariharan
	The implementation is similar to Linux, where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accommodate the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the Linux receive buffer auto-tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283
2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-12	Add support for TCP_CONGESTION socket option.	Bhasker Hariharan
	This CL also cleans up the error returned for setting congestion control which was incorrectly returning EINVAL instead of ENOENT. PiperOrigin-RevId: 252889093
2019-06-10	Fixes to listen backlog handling.	Bhasker Hariharan
	Changes netstack to conform to current Linux behaviour where if the backlog is full then we drop the SYN and do not send a SYN-ACK. Similarly we allow up to backlog connections to be in SYN-RCVD state as long as the backlog is not full. We also now drop a SYN if SYN cookies are in use and the backlog for the listening endpoint is full. Added new tests to confirm the behaviour. Also reverted the change to increase the backlog in TcpPortReuseMultiThread syscall test. Fixes #236 PiperOrigin-RevId: 252500462
2019-06-06	Track and export socket state.	Rahat Mahmood
	This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850
2019-06-05	netstack/tcp: fix calculating a number of outstanding packets	Andrei Vagin
	In case of GSO, a segment can contain more than one packet and we need to use the pCount() helper to get the number of packets. PiperOrigin-RevId: 251743020
2019-06-04	Fix data race in synRcvdState.	Bhasker Hariharan
	When checking the length of the acceptedChan we should hold the endpoint mutex; otherwise a SYN received while the listening socket is being closed can result in a data race where the cleanupLocked routine sets acceptedChan to nil while a handshake goroutine in progress could try to check it at the same time. PiperOrigin-RevId: 251537697
2019-06-03	Delete debug log lines left by mistake.	Bhasker Hariharan
	Updates #236 PiperOrigin-RevId: 251337915
2019-05-31	Disable certain tests that are flaky under race detector.	Bhasker Hariharan
	PiperOrigin-RevId: 250976665
2019-05-31	Change segment queue limit to be of fixed size.	Bhasker Hariharan
	Netstack sets the unprocessed segment queue size to match the receive buffer size. This is not required as this queue only needs to hold enough for a short duration before the endpoint goroutine can process it. Updates #230 PiperOrigin-RevId: 250976323