gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-22	Trigger poll/epoll events on zero-length hostinet sendmsg	Ian Lewis
	Fixes #2726 PiperOrigin-RevId: 380753516
2021-06-21	clean up tcpdump TODOs	Kevin Krakauer
	tcpdump is largely supported. We've also chose not to implement writeable AF_PACKET sockets, and there's a bug specifically for promiscuous mode (#3333). Fixes #173. PiperOrigin-RevId: 380733686
2021-06-17	remove outdated ip6tables TODOs	Kevin Krakauer
	IPv6 SO_ORIGINAL_DST is supported, and the flag check as-written will detect when other flags are needed. Fixes #3549. PiperOrigin-RevId: 380059115
2021-06-16	[syserror] Refactor linuxerr and error package.	Zach Koopmans
	Move Error struct to pkg/errors package for use in multiple places. Move linuxerr static definitions under pkg/errors/linuxerr. Add a lookup list for quick lookup of errors.Error by errno. This is useful when converting syserror errors and unix.Errno/syscall.Errrno values to errors.Error. Update benchmarks routines to include conversions. The below benchmarks show *errors.Error usage to be comparable to using unix.Errno. BenchmarkAssignUnix BenchmarkAssignUnix-32 787875022 1.284 ns/op BenchmarkAssignLinuxerr BenchmarkAssignLinuxerr-32 1000000000 1.209 ns/op BenchmarkAssignSyserror BenchmarkAssignSyserror-32 759269229 1.429 ns/op BenchmarkCompareUnix BenchmarkCompareUnix-32 1000000000 1.310 ns/op BenchmarkCompareLinuxerr BenchmarkCompareLinuxerr-32 1000000000 1.241 ns/op BenchmarkCompareSyserror BenchmarkCompareSyserror-32 147196165 8.248 ns/op BenchmarkSwitchUnix BenchmarkSwitchUnix-32 373233556 3.664 ns/op BenchmarkSwitchLinuxerr BenchmarkSwitchLinuxerr-32 476323929 3.294 ns/op BenchmarkSwitchSyserror BenchmarkSwitchSyserror-32 39293408 29.62 ns/op BenchmarkReturnUnix BenchmarkReturnUnix-32 1000000000 0.5042 ns/op BenchmarkReturnLinuxerr BenchmarkReturnLinuxerr-32 1000000000 0.8152 ns/op BenchmarkConvertUnixLinuxerr BenchmarkConvertUnixLinuxerr-32 739948875 1.547 ns/op BenchmarkConvertUnixLinuxerrZero BenchmarkConvertUnixLinuxerrZero-32 977733974 1.489 ns/op PiperOrigin-RevId: 379806801
2021-06-14	Cleanup iptables bug TODOs	Kevin Krakauer
	There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-07	Remove unsupported syscall event for setsockopt(*, SOL_SOCKET, SO_OOBINLINE).	Nicolas Lacasse
	Netstack behaves as if SO_OOBINLINE is always set, and was logging an unsupported syscall event if the app tries to disable it. We don't have a real use case for TCP urgent mechanisms (and RFC6093 says apps SHOULD NOT use it). This CL keeps the current behavior, but removes the unsupported syscall event. Fixes #6123 PiperOrigin-RevId: 378026059
2021-05-26	Use the stack RNG everywhere	Tamir Duberstein
	...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666
2021-05-25	Merge pull request #6027 from liornm:fix-unused-flag	gVisor bot
	PiperOrigin-RevId: 375740504
2021-05-21	Add aggregated NIC stats	Arthur Sfez
	This change also includes miscellaneous improvements: * UnknownProtocolRcvdPackets has been separated into two stats, to specify at which layer the unknown protocol was found (L3 or L4) * MalformedRcvdPacket is not aggregated across every endpoint anymore. Doing it this way did not add useful information, and it was also error-prone (example: ipv6 forgot to increment this aggregated stat, it only incremented its own ipv6.MalformedPacketsReceived). It is now only incremented the NIC. * Removed TestStatsString test which was outdated and had no real utility. PiperOrigin-RevId: 375057472
2021-05-21	Clean-up netstack metrics descriptions	Arthur Sfez
	PiperOrigin-RevId: 375051638
2021-05-20	Add protocol state to TCPINFO	Mithun Iyer
	Add missing protocol state to TCPINFO struct and update packetimpact. This re-arranges the TCP state definitions to align with Linux. Fixes #478 PiperOrigin-RevId: 374996751
2021-05-19	Send ICMP errors when link address resolution fails	Nick Brown
	Before this change, we would silently drop packets when link resolution failed. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when link resolution fails. PiperOrigin-RevId: 374699789
2021-05-19	Allow use of IFF_ONE_QUEUE	liornm
	Before fix, use of this flag causes an error. It affects applications like OpenVPN which sets this flag for legacy reasons. According to linux/if_tun.h "This flag has no real effect".
2021-05-14	Don't read forwarding from netstack in sentry	Ghanan Gowripalan
	https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables: ip_forward - BOOLEAN 0 - disabled (default) not 0 - enabled Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers) /proc/sys/net/ipv4/ip_forward only does work when its value is changed and always returns the last written value. The last written value may not reflect the current state of the netstack (e.g. when `ip_forward` was written a value of "1" then disable forwarding on an interface) so there is no need for sentry to probe netstack to get the current forwarding state of interfaces. ``` ~$ cat /proc/sys/net/ipv4/ip_forward 0 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo sysctl -w net.ipv4.conf.wlp1s0.forwarding=0 net.ipv4.conf.wlp1s0.forwarding = 0 ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo bash -c "echo 0 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a \| grep ipv4 \| grep forward sysctl: unable to open directory "/proc/sys/fs/binfmt_misc/" net.ipv4.conf.all.forwarding = 0 net.ipv4.conf.default.forwarding = 0 net.ipv4.conf.eno1.forwarding = 0 net.ipv4.conf.lo.forwarding = 0 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 0 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 0 ``` In the above example we can see that writing "1" to /proc/sys/net/ipv4/ip_forward configures the stack to be a router (all interfaces are configured to enable forwarding). However, if we manually update an interace (`wlp1s0`) to not forward packets, /proc/sys/net/ipv4/ip_forward continues to return the last written value of "1", even though not all interfaces will forward packets. Also note that writing the same value twice has no effect; work is performed iff the value changes. This change also removes the 'unset' state from sentry's ip forwarding data structures as an 'unset' ip forwarding value is the same as leaving forwarding disabled as the stack is always brought up with forwarding initially disabled; disabling forwarding on a newly created stack is a no-op. PiperOrigin-RevId: 373853106
2021-05-13	Rename SetForwarding to SetForwardingDefaultAndAllNICs	Ghanan Gowripalan
	...to make it clear to callers that all interfaces are updated with the forwarding flag and that future NICs will be created with the new forwarding state. PiperOrigin-RevId: 373618435
2021-05-12	Fix TODO comments.	Ian Lewis
	Fix TODO comments referring to incorrect issue numbers. Also fix the link in issue reviver comments to include the right url fragment. PiperOrigin-RevId: 373491821
2021-05-12	Send ICMP errors when unable to forward fragmented packets	Nick Brown
	Before this change, we would silently drop packets when the packet was too big to be sent out through the NIC (and, for IPv4 packets, if DF was set). This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when the packet can't be fragmented. PiperOrigin-RevId: 373480078
2021-05-11	[syserror] Refactor abi/linux.Errno	Zach Koopmans
	PiperOrigin-RevId: 373265454
2021-05-11	Process Hop-by-Hop header when forwarding IPv6 packets	Nick Brown
	Currently, we process IPv6 extension headers when receiving packets but not when forwarding them. This is fine for the most part, with with one exception: RFC 8200 requires that we process the Hop-by-Hop headers even while forwarding packets. This CL adds that support by invoking the Hop-by-hop logic performed when receiving packets during forwarding as well. PiperOrigin-RevId: 373145478
2021-05-05	Send ICMP errors when the network is unreachable	Nick Brown
	Before this change, we would silently drop packets when unable to determine a route to the destination host. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when unable to reach the destination. Startblock: has LGTM from asfez and then add reviewer ghanan PiperOrigin-RevId: 372214051
2021-04-27	Remove uses of the binary package from networking code.	Rahat Mahmood
	Co-Author: ayushranjan PiperOrigin-RevId: 370785009
2021-04-23	hostinet: parse the timeval structure from a SO_TIMESTAMP control message	Andrei Vagin
	PiperOrigin-RevId: 370181621
2021-04-22	Fix AF_UNIX listen() w/ zero backlog.	Bhasker Hariharan
	In https://github.com/google/gvisor/commit/f075522849fa a check to increase zero to a minimum backlog length was removed from sys_socket.go to bring it in parity with linux and then in tcp/endpoint.go we bump backlog by 1. But this broke calling listen on a AF_UNIX socket w/ a zero backlog as in linux it does allow 1 connection even with a zero backlog. This was caught by a php runtime test socket_abstract_path.phpt. PiperOrigin-RevId: 369974744
2021-04-21	Only carry GSO options in the packet buffer	Ghanan Gowripalan
	With this change, GSO options no longer needs to be passed around as a function argument in the write path. This change is done in preparation for a later change that defers segmentation, and may change GSO options for a packet as it flows down the stack. Updates #170. PiperOrigin-RevId: 369774872
2021-04-20	Move SO_RCVBUF to socketops.	Nayana Bidari
	Fixes #2926, #674 PiperOrigin-RevId: 369457123
2021-04-16	Enlarge port range and fix integer overflow	Kevin Krakauer
	Also count failed TCP port allocations PiperOrigin-RevId: 368939619
2021-04-09	iptables: support postrouting hook and SNAT target	Toshi Kikuchi
	The current SNAT implementation has several limitations: - SNAT source port has to be specified. It is not optional. - SNAT source port range is not supported. - SNAT for UDP is a one-way translation. No response packets are handled (because conntrack doesn't support UDP currently). - SNAT and REDIRECT can't work on the same connection. Fixes #5489 PiperOrigin-RevId: 367750325
2021-03-29	[syserror] Split usermem package	Zach Koopmans
	Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-24	Add POLLRDNORM/POLLWRNORM support.	Bhasker Hariharan
	On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-15	Make netstack (//pkg/tcpip) buildable for 32 bit	Kevin Krakauer
	Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-15	Merge pull request #5618 from iangudger:unix-transport-race	gVisor bot
	PiperOrigin-RevId: 362999220
2021-03-08	Implement /proc/sys/net/ipv4/ip_local_port_range	Kevin Krakauer
	Speeds up the socket stress tests by a couple orders of magnitude. PiperOrigin-RevId: 361721050
2021-03-04	Fix race in unix socket transport.	Ian Gudger
	transport.baseEndpoint.receiver and transport.baseEndpoint.connected are protected by transport.baseEndpoint.Mutex. In order to access them without holding the mutex, we must make a copy. Notifications must be sent without holding the mutex, so we need the values without holding the mutex.
2021-03-03	Export stats that were forgotten	Arthur Sfez
	While I'm here, simplify the comments and unify naming of certain stats across protocols. PiperOrigin-RevId: 360728849
2021-03-03	[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.	Ayush Ranjan
	The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-19	Don't hold baseEndpoint.mu while calling EventUpdate().	Nicolas Lacasse
	This removes a three-lock deadlock between fdnotifier.notifier.mu, epoll.EventPoll.listsMu, and baseEndpoint.mu. A lock order comment was added to epoll/epoll.go. Also fix unsafe access of baseEndpoint.connected/receiver. PiperOrigin-RevId: 358515191
2021-02-18	Make socketops reflect correct sndbuf value for host UDS.	Bhasker Hariharan
	Also skips a test if the setsockopt to increase send buffer did not result in an increase. This is possible when the underlying socket is a host backed unix domain socket as in such cases gVisor does not permit increasing SO_SNDBUF. PiperOrigin-RevId: 358285158
2021-02-18	Validate IGMP packets	Arthur Sfez
	This change also adds support for Router Alert option processing on incoming packets, a new stat for Router Alert option, and exports all the IP-option related stats. Fixes #5491 PiperOrigin-RevId: 358238123
2021-02-17	Move Name() out of netstack Matcher. It can live in the sentry.	Kevin Krakauer
	PiperOrigin-RevId: 358078157
2021-02-11	[rack] TLP: ACK Processing and PTO scheduling.	Ayush Ranjan
	This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3 Fixes #5085 PiperOrigin-RevId: 357125037
2021-02-09	Add support for setting SO_SNDBUF for unix domain sockets.	Bhasker Hariharan
	The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-05	Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)	Ting-Yu Wang
	Panic seen at some code path like control.ExecAsync where ctx does not have a Task. Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com PiperOrigin-RevId: 355960596
2021-02-01	Refactor HandleControlPacket/SockError	Ghanan Gowripalan
	...to remove the need for the transport layer to deduce the type of error it received. Rename HandleControlPacket to HandleError as HandleControlPacket only handles errors. tcpip.SockError now holds a tcpip.SockErrorCause interface that different errors can implement. PiperOrigin-RevId: 354994306
2021-01-28	Change tcpip.Error to an interface	Tamir Duberstein
	This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28	Propagate reader error in ReadFrom	Tamir Duberstein
	This was removed in 6c0e1d9cfe6adbfbb32e7020d6426608ac63ad37 but turns out to be crucial to prevent flaky behaviour in sendfile. PiperOrigin-RevId: 354434144
2021-01-27	Add support for more fields in netstack for TCP_INFO	Nayana Bidari
	This CL adds support for the following fields: - RTT, RTTVar, RTO - send congestion window (sndCwnd) and send slow start threshold (sndSsthresh) - congestion control state(CaState) - ReorderSeen PiperOrigin-RevId: 354195361
2021-01-26	Initialize the send buffer handler in endpoint creation.	Nayana Bidari
	- This CL will initialize the function handler used for getting the send buffer size limits during endpoint creation and does not require the caller of SetSendBufferSize(..) to know the endpoint type(tcp/udp/..) PiperOrigin-RevId: 353992634
2021-01-26	Do not send SCM Rights more than once when message is truncated.	Dean Deng
	If data is sent over a stream socket that will not fit all at once, it will be sent over multiple packets. SCM Rights should only be sent with the first packet (see net/unix/af_unix.c:unix_stream_sendmsg in Linux). Reported-by: syzbot+aa26482e9c4887aff259@syzkaller.appspotmail.com PiperOrigin-RevId: 353886442
2021-01-26	Move SO_SNDBUF to socketops.	Nayana Bidari
	This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For unix sockets, we do not support setting of this option. PiperOrigin-RevId: 353871484