summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/socket
AgeCommit message (Collapse)Author
2021-06-14Cleanup iptables bug TODOsKevin Krakauer
There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-07Remove unsupported syscall event for setsockopt(*, SOL_SOCKET, SO_OOBINLINE).Nicolas Lacasse
Netstack behaves as if SO_OOBINLINE is always set, and was logging an unsupported syscall event if the app tries to disable it. We don't have a real use case for TCP urgent mechanisms (and RFC6093 says apps SHOULD NOT use it). This CL keeps the current behavior, but removes the unsupported syscall event. Fixes #6123 PiperOrigin-RevId: 378026059
2021-05-26Use the stack RNG everywhereTamir Duberstein
...except in tests. Note this replaces some uses of a cryptographic RNG with a plain RNG. PiperOrigin-RevId: 376070666
2021-05-25Merge pull request #6027 from liornm:fix-unused-flaggVisor bot
PiperOrigin-RevId: 375740504
2021-05-21Add aggregated NIC statsArthur Sfez
This change also includes miscellaneous improvements: * UnknownProtocolRcvdPackets has been separated into two stats, to specify at which layer the unknown protocol was found (L3 or L4) * MalformedRcvdPacket is not aggregated across every endpoint anymore. Doing it this way did not add useful information, and it was also error-prone (example: ipv6 forgot to increment this aggregated stat, it only incremented its own ipv6.MalformedPacketsReceived). It is now only incremented the NIC. * Removed TestStatsString test which was outdated and had no real utility. PiperOrigin-RevId: 375057472
2021-05-21Clean-up netstack metrics descriptionsArthur Sfez
PiperOrigin-RevId: 375051638
2021-05-20Add protocol state to TCPINFOMithun Iyer
Add missing protocol state to TCPINFO struct and update packetimpact. This re-arranges the TCP state definitions to align with Linux. Fixes #478 PiperOrigin-RevId: 374996751
2021-05-19Send ICMP errors when link address resolution failsNick Brown
Before this change, we would silently drop packets when link resolution failed. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when link resolution fails. PiperOrigin-RevId: 374699789
2021-05-19Allow use of IFF_ONE_QUEUEliornm
Before fix, use of this flag causes an error. It affects applications like OpenVPN which sets this flag for legacy reasons. According to linux/if_tun.h "This flag has no real effect".
2021-05-14Don't read forwarding from netstack in sentryGhanan Gowripalan
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables: ip_forward - BOOLEAN 0 - disabled (default) not 0 - enabled Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers) /proc/sys/net/ipv4/ip_forward only does work when its value is changed and always returns the last written value. The last written value may not reflect the current state of the netstack (e.g. when `ip_forward` was written a value of "1" then disable forwarding on an interface) so there is no need for sentry to probe netstack to get the current forwarding state of interfaces. ``` ~$ cat /proc/sys/net/ipv4/ip_forward 0 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo sysctl -w net.ipv4.conf.wlp1s0.forwarding=0 net.ipv4.conf.wlp1s0.forwarding = 0 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo bash -c "echo 0 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward sysctl: unable to open directory "/proc/sys/fs/binfmt_misc/" net.ipv4.conf.all.forwarding = 0 net.ipv4.conf.default.forwarding = 0 net.ipv4.conf.eno1.forwarding = 0 net.ipv4.conf.lo.forwarding = 0 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 0 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 0 ``` In the above example we can see that writing "1" to /proc/sys/net/ipv4/ip_forward configures the stack to be a router (all interfaces are configured to enable forwarding). However, if we manually update an interace (`wlp1s0`) to not forward packets, /proc/sys/net/ipv4/ip_forward continues to return the last written value of "1", even though not all interfaces will forward packets. Also note that writing the same value twice has no effect; work is performed iff the value changes. This change also removes the 'unset' state from sentry's ip forwarding data structures as an 'unset' ip forwarding value is the same as leaving forwarding disabled as the stack is always brought up with forwarding initially disabled; disabling forwarding on a newly created stack is a no-op. PiperOrigin-RevId: 373853106
2021-05-13Rename SetForwarding to SetForwardingDefaultAndAllNICsGhanan Gowripalan
...to make it clear to callers that all interfaces are updated with the forwarding flag and that future NICs will be created with the new forwarding state. PiperOrigin-RevId: 373618435
2021-05-12Fix TODO comments.Ian Lewis
Fix TODO comments referring to incorrect issue numbers. Also fix the link in issue reviver comments to include the right url fragment. PiperOrigin-RevId: 373491821
2021-05-12Send ICMP errors when unable to forward fragmented packetsNick Brown
Before this change, we would silently drop packets when the packet was too big to be sent out through the NIC (and, for IPv4 packets, if DF was set). This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when the packet can't be fragmented. PiperOrigin-RevId: 373480078
2021-05-11[syserror] Refactor abi/linux.ErrnoZach Koopmans
PiperOrigin-RevId: 373265454
2021-05-11Process Hop-by-Hop header when forwarding IPv6 packetsNick Brown
Currently, we process IPv6 extension headers when receiving packets but not when forwarding them. This is fine for the most part, with with one exception: RFC 8200 requires that we process the Hop-by-Hop headers even while forwarding packets. This CL adds that support by invoking the Hop-by-hop logic performed when receiving packets during forwarding as well. PiperOrigin-RevId: 373145478
2021-05-05Send ICMP errors when the network is unreachableNick Brown
Before this change, we would silently drop packets when unable to determine a route to the destination host. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when unable to reach the destination. Startblock: has LGTM from asfez and then add reviewer ghanan PiperOrigin-RevId: 372214051
2021-04-27Remove uses of the binary package from networking code.Rahat Mahmood
Co-Author: ayushranjan PiperOrigin-RevId: 370785009
2021-04-23hostinet: parse the timeval structure from a SO_TIMESTAMP control messageAndrei Vagin
PiperOrigin-RevId: 370181621
2021-04-22Fix AF_UNIX listen() w/ zero backlog.Bhasker Hariharan
In https://github.com/google/gvisor/commit/f075522849fa a check to increase zero to a minimum backlog length was removed from sys_socket.go to bring it in parity with linux and then in tcp/endpoint.go we bump backlog by 1. But this broke calling listen on a AF_UNIX socket w/ a zero backlog as in linux it does allow 1 connection even with a zero backlog. This was caught by a php runtime test socket_abstract_path.phpt. PiperOrigin-RevId: 369974744
2021-04-21Only carry GSO options in the packet bufferGhanan Gowripalan
With this change, GSO options no longer needs to be passed around as a function argument in the write path. This change is done in preparation for a later change that defers segmentation, and may change GSO options for a packet as it flows down the stack. Updates #170. PiperOrigin-RevId: 369774872
2021-04-20Move SO_RCVBUF to socketops.Nayana Bidari
Fixes #2926, #674 PiperOrigin-RevId: 369457123
2021-04-16Enlarge port range and fix integer overflowKevin Krakauer
Also count failed TCP port allocations PiperOrigin-RevId: 368939619
2021-04-09iptables: support postrouting hook and SNAT targetToshi Kikuchi
The current SNAT implementation has several limitations: - SNAT source port has to be specified. It is not optional. - SNAT source port range is not supported. - SNAT for UDP is a one-way translation. No response packets are handled (because conntrack doesn't support UDP currently). - SNAT and REDIRECT can't work on the same connection. Fixes #5489 PiperOrigin-RevId: 367750325
2021-03-29[syserror] Split usermem packageZach Koopmans
Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-15Make netstack (//pkg/tcpip) buildable for 32 bitKevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-15Merge pull request #5618 from iangudger:unix-transport-racegVisor bot
PiperOrigin-RevId: 362999220
2021-03-08Implement /proc/sys/net/ipv4/ip_local_port_rangeKevin Krakauer
Speeds up the socket stress tests by a couple orders of magnitude. PiperOrigin-RevId: 361721050
2021-03-04Fix race in unix socket transport.Ian Gudger
transport.baseEndpoint.receiver and transport.baseEndpoint.connected are protected by transport.baseEndpoint.Mutex. In order to access them without holding the mutex, we must make a copy. Notifications must be sent without holding the mutex, so we need the values without holding the mutex.
2021-03-03Export stats that were forgottenArthur Sfez
While I'm here, simplify the comments and unify naming of certain stats across protocols. PiperOrigin-RevId: 360728849
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-19Don't hold baseEndpoint.mu while calling EventUpdate().Nicolas Lacasse
This removes a three-lock deadlock between fdnotifier.notifier.mu, epoll.EventPoll.listsMu, and baseEndpoint.mu. A lock order comment was added to epoll/epoll.go. Also fix unsafe access of baseEndpoint.connected/receiver. PiperOrigin-RevId: 358515191
2021-02-18Make socketops reflect correct sndbuf value for host UDS.Bhasker Hariharan
Also skips a test if the setsockopt to increase send buffer did not result in an increase. This is possible when the underlying socket is a host backed unix domain socket as in such cases gVisor does not permit increasing SO_SNDBUF. PiperOrigin-RevId: 358285158
2021-02-18Validate IGMP packetsArthur Sfez
This change also adds support for Router Alert option processing on incoming packets, a new stat for Router Alert option, and exports all the IP-option related stats. Fixes #5491 PiperOrigin-RevId: 358238123
2021-02-17Move Name() out of netstack Matcher. It can live in the sentry.Kevin Krakauer
PiperOrigin-RevId: 358078157
2021-02-11[rack] TLP: ACK Processing and PTO scheduling.Ayush Ranjan
This change implements TLP details enumerated in https://tools.ietf.org/html/draft-ietf-tcpm-rack-08#section-7.5.3 Fixes #5085 PiperOrigin-RevId: 357125037
2021-02-09Add support for setting SO_SNDBUF for unix domain sockets.Bhasker Hariharan
The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-05Replace TaskFromContext(ctx).Kernel() with KernelFromContext(ctx)Ting-Yu Wang
Panic seen at some code path like control.ExecAsync where ctx does not have a Task. Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com PiperOrigin-RevId: 355960596
2021-02-01Refactor HandleControlPacket/SockErrorGhanan Gowripalan
...to remove the need for the transport layer to deduce the type of error it received. Rename HandleControlPacket to HandleError as HandleControlPacket only handles errors. tcpip.SockError now holds a tcpip.SockErrorCause interface that different errors can implement. PiperOrigin-RevId: 354994306
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28Propagate reader error in ReadFromTamir Duberstein
This was removed in 6c0e1d9cfe6adbfbb32e7020d6426608ac63ad37 but turns out to be crucial to prevent flaky behaviour in sendfile. PiperOrigin-RevId: 354434144
2021-01-27Add support for more fields in netstack for TCP_INFONayana Bidari
This CL adds support for the following fields: - RTT, RTTVar, RTO - send congestion window (sndCwnd) and send slow start threshold (sndSsthresh) - congestion control state(CaState) - ReorderSeen PiperOrigin-RevId: 354195361
2021-01-26Initialize the send buffer handler in endpoint creation.Nayana Bidari
- This CL will initialize the function handler used for getting the send buffer size limits during endpoint creation and does not require the caller of SetSendBufferSize(..) to know the endpoint type(tcp/udp/..) PiperOrigin-RevId: 353992634
2021-01-26Do not send SCM Rights more than once when message is truncated.Dean Deng
If data is sent over a stream socket that will not fit all at once, it will be sent over multiple packets. SCM Rights should only be sent with the first packet (see net/unix/af_unix.c:unix_stream_sendmsg in Linux). Reported-by: syzbot+aa26482e9c4887aff259@syzkaller.appspotmail.com PiperOrigin-RevId: 353886442
2021-01-26Move SO_SNDBUF to socketops.Nayana Bidari
This CL moves {S,G}etsockopt of SO_SNDBUF from all endpoints to socketops. For unix sockets, we do not support setting of this option. PiperOrigin-RevId: 353871484
2021-01-25Add per endpoint ARP statisticsArthur Sfez
The ARP stat NetworkUnreachable was removed, and was replaced by InterfaceHasNoLocalAddress. No stats are recorded when dealing with an missing endpoint (ErrNotConnected) (because if there is no endpoint, there is no valid per-endpoint stats). PiperOrigin-RevId: 353759462
2021-01-22Define tcpip.Payloader in terms of io.ReaderTamir Duberstein
Fixes #1509. PiperOrigin-RevId: 353295589
2021-01-21iptables: support matching the input interface nameToshi Kikuchi
We have support for the output interface name, but not for the input interface name. This change adds the support for the input interface name, and adds the test cases for it. Fixes #5300 PiperOrigin-RevId: 353179389
2021-01-20Remove unimplemented message for SO_LINGERNayana Bidari
- Removes the unimplemented message for SO_LINGER - Fix the length for IP_PKTINFO option PiperOrigin-RevId: 352917611
2021-01-20Move Lock/UnlockPOSIX into LockFD util.Dean Deng
PiperOrigin-RevId: 352904728