summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/socket
AgeCommit message (Collapse)Author
2020-01-08More GH comments.Kevin Krakauer
2020-01-08Return correct length with MSG_TRUNC for unix sockets.Ian Lewis
This change calls a new Truncate method on the EndpointReader in RecvMsg for both netlink and unix sockets. This allows readers such as sockets to peek at the length of data without actually reading it to a buffer. Fixes #993 #1240 PiperOrigin-RevId: 288800167
2020-01-08Addressed GH commentsKevin Krakauer
2020-01-08Fix slice bounds out of range panic in parsing socket control message.Ting-Yu Wang
Panic found by syzakller. PiperOrigin-RevId: 288799046
2020-01-08Getting a panic when running tests. For some reason the filter table isKevin Krakauer
ending up with the wrong chains and is indexing -1 into rules.
2020-01-08Introduce tcpip.SockOptBoolTamir Duberstein
...and port V6OnlyOption to it. PiperOrigin-RevId: 288789451
2020-01-08Built dead-simple traversal, but now getting depedency cycle error :'(Kevin Krakauer
2020-01-08Rename tcpip.SockOpt{,Int}Tamir Duberstein
PiperOrigin-RevId: 288772878
2020-01-08First commit -- re-adding DROPKevin Krakauer
2020-01-08Comment cleanup.Kevin Krakauer
2020-01-08Minor fixes to comments and loggingKevin Krakauer
2020-01-08Write simple ACCEPT rules to the filter table.Kevin Krakauer
This gets us closer to passing the iptables tests and opens up iptables so it can be worked on by multiple people. A few restrictions are enforced for security (i.e. we don't want to let users write a bunch of iptables rules and then just not enforce them): - Only the filter table is writable. - Only ACCEPT rules with no matching criteria can be added.
2019-12-26Automated rollback of changelist 287029703gVisor bot
PiperOrigin-RevId: 287217899
2019-12-24Enable IP_RECVTOS socket option for datagram socketsRyan Heacock
Added the ability to get/set the IP_RECVTOS socket option on UDP endpoints. If enabled, TOS from the incoming Network Header passed as ancillary data in the ControlMessages. Test: * Added unit test to udp_test.go that tests getting/setting as well as verifying that we receive expected TOS from incoming packet. * Added a syscall test PiperOrigin-RevId: 287029703
2019-12-12unix: allow to bind unix sockets only to AF_UNIX addressesAndrei Vagin
Reported-by: syzbot+2c0bcfd87fb4e8b7b009@syzkaller.appspotmail.com PiperOrigin-RevId: 285228312
2019-12-11Add support for TCP_USER_TIMEOUT option.Bhasker Hariharan
The implementation follows the linux behavior where specifying a TCP_USER_TIMEOUT will cause the resend timer to honor the user specified timeout rather than the default rto based timeout. Further it alters when connections are timedout due to keepalive failures. It does not alter the behavior of when keepalives are sent. This is as per the linux behavior. PiperOrigin-RevId: 285099795
2019-12-10Deduplicate and simplify control message processing for recvmsg and sendmsg.Dean Deng
Also, improve performance by calculating how much space is needed before making an allocation for sendmsg in hostinet. PiperOrigin-RevId: 284898581
2019-12-10Let socket.ControlMessages Release() the underlying transport.ControlMessages.Dean Deng
PiperOrigin-RevId: 284804370
2019-12-10Make comments clearer for control message handling.Dean Deng
PiperOrigin-RevId: 284791600
2019-12-06Add TCP stats for connection close and keep-alive timeouts.Mithun Iyer
Fix bugs in updates to TCP CurrentEstablished stat. Fixes #1277 PiperOrigin-RevId: 284292459
2019-12-03Remove TODO for obsolete bug.Zach Koopmans
PiperOrigin-RevId: 283571456
2019-12-03Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the latter. Fixes #1188 PiperOrigin-RevId: 283550925
2019-12-02Support sending IP_TOS and IPV6_TCLASS control messages with hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the former. PiperOrigin-RevId: 283346737
2019-11-27Add support for receiving TOS and TCLASS control messages in hostinet.Dean Deng
This involves allowing getsockopt/setsockopt for the corresponding socket options, as well as allowing hostinet to process control messages received from the actual recvmsg syscall. PiperOrigin-RevId: 282851425
2019-11-07Add support for TIME_WAIT timeout.Bhasker Hariharan
This change adds explicit support for honoring the 2MSL timeout for sockets in TIME_WAIT state. It also adds support for the TCP_LINGER2 option that allows modification of the FIN_WAIT2 state timeout duration for a given socket. It also adds an option to modify the Stack wide TIME_WAIT timeout but this is only for testing. On Linux this is fixed at 60s. Further, we also now correctly process RST's in CLOSE_WAIT and close the socket similar to linux without moving it to error state. We also now handle SYN in ESTABLISHED state as per RFC5961#section-4.1. Earlier we would just drop these SYNs. Which can result in some tests that pass on linux to fail on gVisor. Netstack now honors TIME_WAIT correctly as well as handles the following cases correctly. - TCP RSTs in TIME_WAIT are ignored. - A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT and a dup ACK is sent in response to the FIN as the dup FIN indicates potential loss of the original final ACK. - An out of order segment during TIME_WAIT generates a dup ACK. - A new SYN w/ a sequence number > the highest sequence number in the previous connection closes the TIME_WAIT early and opens a new connection. Further to make the SYN case work correctly the ISN (Initial Sequence Number) generation for Netstack has been updated to be as per RFC. Its not a pure random number anymore and follows the recommendation in https://tools.ietf.org/html/rfc6528#page-3. The current hash used is not a cryptographically secure hash function. A separate change will update the hash function used to Siphash similar to what is used in Linux. PiperOrigin-RevId: 279106406
2019-11-04Add NETLINK_KOBJECT_UEVENT socket supportMichael Pratt
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events. gVisor doesn't have any device events, so our sockets don't need to do anything once created. systemd's device manager needs to be able to create one of these sockets. It also wants to install a BPF filter on the socket. Since we'll never send any messages, the filter would never be invoked, thus we just fake it out. Fixes #1117 Updates #1119 PiperOrigin-RevId: 278405893
2019-11-01Add SO_PASSCRED support to netlink socketsMichael Pratt
Since we only supporting sending messages from the kernel, the peer is always the kernel, simplifying handling. There are currently no known users of SO_PASSCRED that would actually receive messages from gVisor, but adding full support is barely more work than stubbing out fake support. Updates #1117 Fixes #1119 PiperOrigin-RevId: 277981465
2019-10-29Add endpoint tracking to the stack.Ian Gudger
In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633
2019-10-25Convert DelayOption to the newer/faster SockOpt int type.Ian Gudger
DelayOption is set on all new endpoints in gVisor. PiperOrigin-RevId: 276746791
2019-10-23Merge pull request #641 from tanjianfeng:mastergVisor bot
PiperOrigin-RevId: 276380008
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-10-16Reorder BUILD license and load functions in gvisor.Kevin Krakauer
PiperOrigin-RevId: 275139066
2019-10-16Merge pull request #736 from tanjianfeng:fix-unixgVisor bot
PiperOrigin-RevId: 275114157
2019-10-15epsocket: support /proc/net/snmpJianfeng Tan
Netstack has its own stats, we use this to fill /proc/net/snmp. Note that some metrics are not recorded in Netstack, which will be shown as 0 in the proc file. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15netstack: add counters for tcp CurrEstab and EstabResetsJianfeng Tan
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15hostinet: support /proc/net/snmp and /proc/net/devJianfeng Tan
For hostinet, we inherit the data from host procfs. To to that, we cache the fds for these files for later reads. Fixes #506 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
2019-10-14Internal change.gVisor bot
PiperOrigin-RevId: 274700093
2019-10-10Allow for zero byte iovec with MSG_PEEK | MSG_TRUNC in recvmsg.Ian Lewis
This allows for peeking at the length of the next message on a netlink socket without pulling it off the socket's buffer/queue, allowing tools like 'ip' to work. This CL also fixes an issue where dump_done_errno was not included in the NLMSG_DONE messages payload. Issue #769 PiperOrigin-RevId: 274068637
2019-10-10Fix bugs in fragment handling.Bhasker Hariharan
Strengthen the header.IPv4.IsValid check to correctly check for IHL/TotalLength fields. Also add a check to make sure fragmentOffsets + size of the fragment do not cause a wrap around for the end of the fragment. PiperOrigin-RevId: 274049313
2019-10-09Internal change.gVisor bot
PiperOrigin-RevId: 273861936
2019-10-07Implement IP_TTL.Ian Gudger
Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341
2019-10-07Rename epsocket to netstack.Kevin Krakauer
PiperOrigin-RevId: 273365058
2019-09-27Implement SO_BINDTODEVICE sockoptgVisor bot
PiperOrigin-RevId: 271644926
2019-09-26Make raw socket tests pass in environments with or without CAP_NET_RAW.Kevin Krakauer
PiperOrigin-RevId: 271442321
2019-09-23netstack: convert more socket options to {Set,Get}SockOptIntAndrei Vagin
PiperOrigin-RevId: 270763208
2019-09-23internal BUILD file cleanup.gVisor bot
PiperOrigin-RevId: 270680704
2019-09-19Remove defer from hot path and ensure Atomic is applied consistently.Adin Scannell
PiperOrigin-RevId: 270114317
2019-09-12Implement splice methods for pipes and sockets.Adin Scannell
This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructoring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elide locks and use the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207
2019-09-12Remove go_test from go_stateify and go_marshalMichael Pratt
They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-08-30Return correct buffer size for ioctl(socket, FIONREAD)Fabricio Voznika
Ioctl was returning just the buffer size from epsocket.endpoint and it was not considering data from epsocket.SocketOperations that was read from the endpoint, but not yet sent to the caller. PiperOrigin-RevId: 266485461