summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/socket
AgeCommit message (Collapse)Author
2019-06-13Add support for TCP receive buffer auto tuning.Bhasker Hariharan
The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accomodate for the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto- tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283
2019-06-13Plumb context through more layers of filesytem.Ian Gudger
All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE.Rahat Mahmood
SO_TYPE was already implemented for everything but netlink sockets. PiperOrigin-RevId: 253138157
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-12Add support for TCP_CONGESTION socket option.Bhasker Hariharan
This CL also cleans up the error returned for setting congestion control which was incorrectly returning EINVAL instead of ENOENT. PiperOrigin-RevId: 252889093
2019-06-10Store more information in the kernel socket table.Rahat Mahmood
Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895
2019-06-06Use common definition of SockType.Rahat Mahmood
SockType isn't specific to unix domain sockets, and the current definition basically mirrors the linux ABI's definition. PiperOrigin-RevId: 251956740
2019-06-06Track and export socket state.Rahat Mahmood
This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850
2019-06-03gvisor/sock/unix: pass creds when a message is sent between unconnected socketsAndrei Vagin
and don't report a sender address if it doesn't have one PiperOrigin-RevId: 251371284
2019-05-30Fixes to TCP listen behavior.Bhasker Hariharan
Netstack listen loop can get stuck if cookies are in-use and the app is slow to accept incoming connections. Further we continue to complete handshake for a connection even if the backlog is full. This creates a problem when a lots of connections come in rapidly and we end up with lots of completed connections just hanging around to be delivered. These fixes change netstack behaviour to mirror what linux does as described here in the following article http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html Now when cookies are not in-use Netstack will silently drop the ACK to a SYN-ACK and not complete the handshake if the backlog is full. This will result in the connection staying in a half-complete state. Eventually the sender will retransmit the ACK and if backlog has space we will transition to a connected state and deliver the endpoint. Similarly when cookies are in use we do not try and create an endpoint unless there is space in the accept queue to accept the newly created endpoint. If there is no space then we again silently drop the ACK as we can just recreate it when the ACK is retransmitted by the peer. We also now use the backlog to cap the size of the SYN-RCVD queue for a given endpoint. So at any time there can be N connections in the backlog and N in a SYN-RCVD state if the application is not accepting connections. Any new SYNs will be dropped. This CL also fixes another small bug where we mark a new endpoint which has not completed handshake as connected. We should wait till handshake successfully completes before marking it connected. Updates #236 PiperOrigin-RevId: 250717817
2019-05-30gvisor: socket() returns EPROTONOSUPPORT if protocol is not supportedAndrei Vagin
PiperOrigin-RevId: 250426407
2019-05-22UDP and TCP raw socket support.Kevin Krakauer
PiperOrigin-RevId: 249511348 Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-21Add basic plumbing for splice and stub implementation.Adin Scannell
This does not actually implement an efficient splice or sendfile. Rather, it adds a generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-04-29Implement the MSG_CTRUNC msghdr flag for Unix sockets.Ian Gudger
Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-19Add support for the MSG_TRUNC msghdr flag.Ian Gudger
The MSG_TRUNC flag is set in the msghdr when a message is truncated. Fixes google/gvisor#200 PiperOrigin-RevId: 244440486 Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-18Only emit unimplemented syscall events for unsupported values.Ian Gudger
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER when attempting to set unsupported values. PiperOrigin-RevId: 244229675 Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
2019-04-17Convert poll/select to operate more directly on linux.PollFDMichael Pratt
Current, doPoll copies the user struct pollfd array into a []syscalls.PollFD, which contains internal kdefs.FD and waiter.EventMask types. While these are currently binary-compatible with the Linux versions, we generally discourage copying directly to internal types (someone may inadvertantly change kdefs.FD to uint64). Instead, copy directly to a []linux.PollFD, which will certainly be binary compatible. Most of syscalls/polling.go is included directly into syscalls/linux/sys_poll.go, as it can then operate directly on linux.PollFD. The additional syscalls.PollFD type is providing little value. I've also added explicit conversion functions for waiter.EventMask, which creates the possibility of a different binary format. PiperOrigin-RevId: 244042947 Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
2019-04-11Use open fids when fstat()ing gofer files.Jamie Liu
PiperOrigin-RevId: 243018347 Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37
2019-04-09Add TCP checksum verification.Bhasker Hariharan
PiperOrigin-RevId: 242704699 Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
2019-03-28Add ICMP statsBert Muthalaly
PiperOrigin-RevId: 240848882 Change-Id: I23dd4599f073263437aeab357c3f767e1a432b82
2019-03-20Record sockets created during accept(2) for all families.Rahat Mahmood
Track new sockets created during accept(2) in the socket table for all families. Previously we were only doing this for unix domain sockets. PiperOrigin-RevId: 239475550 Change-Id: I16f009f24a06245bfd1d72ffd2175200f837c6ac
2019-03-19Fix data race in netlink send buffer sizeFabricio Voznika
PiperOrigin-RevId: 239221041 Change-Id: Icc19e32a00fa89167447ab2f45e90dcfd61bea04
2019-03-09Fix getsockopt(IP_MULTICAST_IF).Ian Gudger
getsockopt(IP_MULTICAST_IF) only supports struct in_addr. Also adds support for setsockopt(IP_MULTICAST_IF) with struct in_addr. PiperOrigin-RevId: 237620230 Change-Id: I75e7b5b3e08972164eb1906f43ddd67aedffc27c
2019-03-08Make IP_MULTICAST_LOOP and IP_MULTICAST_TTL allow setting int or char.Ian Gudger
This is the correct Linux behavior, and at least PHP depends on it. PiperOrigin-RevId: 237565639 Change-Id: I931af09c8ed99a842cf70d22bfe0b65e330c4137
2019-03-08Implement IP_MULTICAST_LOOP.Ian Gudger
IP_MULTICAST_LOOP controls whether or not multicast packets sent on the default route are looped back. In order to implement this switch, support for sending and looping back multicast packets on the default route had to be implemented. For now we only support IPv4 multicast. PiperOrigin-RevId: 237534603 Change-Id: I490ac7ff8e8ebef417c7eb049a919c29d156ac1c
2019-03-05Add new retransmissions and recovery related metrics.Bhasker Hariharan
PiperOrigin-RevId: 236945145 Change-Id: I051760d95154ea5574c8bb6aea526f488af5e07b
2019-03-05Remove unused commit() function argument to Bind.Kevin Krakauer
PiperOrigin-RevId: 236926132 Change-Id: I5cf103f22766e6e65a581de780c7bb9ca0fa3181
2019-02-27Ping support via IPv4 raw sockets.Kevin Krakauer
Broadly, this change: * Enables sockets to be created via `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. * Passes the network-layer (IP) header up the stack to the transport endpoint, which can pass it up to the socket layer. This allows a raw socket to return the entire IP packet to users. * Adds functions to stack.TransportProtocol, stack.Stack, stack.transportDemuxer that enable incoming packets to be delivered to raw endpoints. New raw sockets of other protocols (not ICMP) just need to register with the stack. * Enables ping.endpoint to return IP headers when created via SOCK_RAW. PiperOrigin-RevId: 235993280 Change-Id: I60ed994f5ff18b2cbd79f063a7fdf15d093d845a
2019-02-26Fix bad mergeFabricio Voznika
PiperOrigin-RevId: 235818534 Change-Id: I99f7e3fd1dc808b35f7a08b96b7c3226603ab808
2019-02-20Implement Broadcast supportAmanda Tait
This change adds support for the SO_BROADCAST socket option in gVisor Netstack. This support includes getsockopt()/setsockopt() functionality for both UDP and TCP endpoints (the latter being a NOOP), dispatching broadcast messages up and down the stack, and route finding/creation for broadcast packets. Finally, a suite of tests have been implemented, exercising this functionality through the Linux syscall API. PiperOrigin-RevId: 234850781 Change-Id: If3e666666917d39f55083741c78314a06defb26c
2019-02-19netstack: Add SIOCGSTAMP support.Kevin Krakauer
Ping sometimes uses this instead of SO_TIMESTAMP. PiperOrigin-RevId: 234699590 Change-Id: Ibec9c34fa0d443a931557a2b1b1ecd83effe7765
2019-02-15Implement IP_MULTICAST_IF.Ian Gudger
This allows setting a default send interface for IPv4 multicast. IPv6 support will come later. PiperOrigin-RevId: 234251379 Change-Id: I65922341cd8b8880f690fae3eeb7ddfa47c8c173
2019-02-15Move SO_TIMESTAMP from different transport endpoints to epsocket.Kevin Krakauer
SO_TIMESTAMP is reimplemented in ping and UDP sockets (and needs to be added for TCP), but can just be implemented in epsocket for simplicity. This will also make SIOCGSTAMP easier to implement. PiperOrigin-RevId: 234179300 Change-Id: Ib5ea0b1261dc218c1a8b15a65775de0050fe3230
2019-02-15Redirect FIXME to more appropriate bugFabricio Voznika
PiperOrigin-RevId: 234147487 Change-Id: I779a6012832bb94a6b89f5bcc7d821b40ae969cc
2019-02-07Plumb IP_ADD_MEMBERSHIP and IP_DROP_MEMBERSHIP to netstack.Ian Gudger
Also includes a few fixes for IPv4 multicast support. IPv6 support is coming in a followup CL. PiperOrigin-RevId: 233008638 Change-Id: If7dae6222fef43fda48033f0292af77832d95e82
2019-02-07Implement /proc/net/unix.Rahat Mahmood
PiperOrigin-RevId: 232948478 Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23
2019-01-31Remove license commentsMichael Pratt
Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-14Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.Nicolas Lacasse
More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2018-12-28Implement SO_REUSEPORT for TCP and UDP socketsAndrei Vagin
This option allows multiple sockets to be bound to the same port. Incoming packets are distributed to sockets using a hash based on source and destination addresses. This means that all packets from one sender will be received by the same server socket. PiperOrigin-RevId: 227153413 Change-Id: I59b6edda9c2209d5b8968671e9129adb675920cf
2018-12-26Plumb IP_MULTICAST_TTL to netstack.Ian Gudger
PiperOrigin-RevId: 226993086 Change-Id: I71757f231436538081d494da32ca69f709bc71c7
2018-12-21Stub out SO_OOBINLINE.Ian Gudger
We don't explicitly support out-of-band data and treat it like normal in-band data. This is equilivent to SO_OOBINLINE being enabled, so always report that it is enabled. PiperOrigin-RevId: 226572742 Change-Id: I4c30ccb83265e76c30dea631cbf86822e6ee1c1b
2018-12-21Implement SO_KEEPALIVE, TCP_KEEPIDLE, and TCP_KEEPINTVL.Ian Gudger
Within gVisor, plumb new socket options to netstack. Within netstack, fix GetSockOpt and SetSockOpt return value logic. PiperOrigin-RevId: 226532229 Change-Id: If40734e119eed633335f40b4c26facbebc791c74
2018-12-17Fix recv blocking for connectionless Unix sockets.Ian Gudger
Connectionless Unix sockets (DGRAM Unix sockets created with the socket system call) inherently only have a read queue. They do not establish bidirectional connections, instead, the connect system call only sets a default send location. Writes give the data to the other endpoint which has its own read queue. To simplify the code, connectionless Unix sockets still get read and write queues, but the write queue is a dummy and never waited on. The read queue is the connectionless endpoint's queue. This change fixes a bug where the dummy queue was incorrectly set as the read queue and the endpoint's queue was incorrectly set as the write queue. This meant that read notifications went to the dummy queue and were black holed. PiperOrigin-RevId: 225921042 Change-Id: I8d9059def787a2c3c305185b92d05093fbd2be2a
2018-12-14Move fdnotifier package to reduce internal confusion.Adin Scannell
PiperOrigin-RevId: 225632398 Change-Id: I909e7e2925aa369adc28e844c284d9a6108e85ce
2018-12-14Implement SO_SNDTIMEOIan Gudger
PiperOrigin-RevId: 225620490 Change-Id: Ia726107b3f58093a5f881634f90b071b32d2c269
2018-12-13Fix WAITALL and RCVTIMEO interactionIan Gudger
PiperOrigin-RevId: 225424296 Change-Id: I60fcc2b859339dca9963cb32227a287e719ab765
2018-12-10Implement MSG_WAITALLIan Gudger
MSG_WAITALL requests that recv family calls do not perform short reads. It only has an effect for SOCK_STREAM sockets, other types ignore it. PiperOrigin-RevId: 224918540 Change-Id: Id97fbf972f1f7cbd4e08eec0138f8cbdf1c94fe7
2018-12-09Stub out TCP_QUICKACKIan Gudger
PiperOrigin-RevId: 224696233 Change-Id: I45c425d9e32adee5dcce29ca7439a06567b26014