summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/socket/hostinet
AgeCommit message (Collapse)Author
2020-09-29Merge pull request #3875 from btw616:fix/issue-3874gVisor bot
PiperOrigin-RevId: 334428344
2020-09-28Don't leak dentries returned by sockfs.NewDentry().Jamie Liu
PiperOrigin-RevId: 334263322
2020-09-24Fix socket record leak in VFS2Tiwei Bie
VFS2 socket record is not removed from the system-wide socket table when the socket is released, which will lead to a memory leak. This patch fixes this issue. Fixes: #3874 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-20Merge pull request #3651 from ianlewis:ip-forwardinggVisor bot
PiperOrigin-RevId: 332760843
2020-09-17Complete vfs2 implementation of fallocate.Dean Deng
This change includes overlay, special regular gofer files, and hostfs. Fixes #3589. PiperOrigin-RevId: 332330860
2020-09-11Move the 'marshal' and 'primitive' packages to the 'pkg' directory.Rahat Mahmood
PiperOrigin-RevId: 331256608
2020-08-17Merge branch 'master' into ip-forwardingIan Lewis
- Merges aleksej-paschenko's with HEAD - Adds vfs2 support for ip_forward
2020-08-05Add loss recovery option for TCP.Nayana Bidari
/proc/sys/net/ipv4/tcp_recovery is used to enable RACK loss recovery in TCP. PiperOrigin-RevId: 325157807
2020-08-03Plumbing context.Context to DecRef() and Release().Nayana Bidari
context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-23Merge pull request #3142 from tanjianfeng:fix-3141gVisor bot
PiperOrigin-RevId: 322937495
2020-07-23Marshallable socket opitons.Ayush Ranjan
Socket option values are now required to implement marshal.Marshallable. Co-authored-by: Rahat Mahmood <rahat@google.com> PiperOrigin-RevId: 322831612
2020-07-15hostinet: fix fd leak in fdnotifier for VFS2Tiwei Bie
When we failed to create the new socket after adding the fd to fdnotifier, we should remove the fd from fdnotifier, because we are going to close the fd directly. Fixes: #3241 Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-07-06hostinet: fix not specifying family fieldJianfeng Tan
Creating sockets by hostinet with VFS2 fails due to triggerring a seccomp violation. In essence, we fails to pass down the field of family. We fix this by passing down this field, family. Fixes #3141 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2020-07-01Port fallocate to VFS2.Zach Koopmans
PiperOrigin-RevId: 319283715
2020-06-18Remove various uses of 'whitelist'Michael Pratt
Updates #2972 PiperOrigin-RevId: 317113059
2020-06-17Implement POSIX locksFabricio Voznika
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to take {start,length,whence}, so the correct offset can be calculated in the implementations. - Create PosixLocker interface to make it possible to share the same locking code from different implementations. Closes #1480 PiperOrigin-RevId: 316910286
2020-06-09Implement flock(2) in VFS2Fabricio Voznika
LockFD is the generic implementation that can be embedded in FileDescriptionImpl implementations. Unique lock ID is maintained in vfs.FileDescription and is created on demand. Updates #1480 PiperOrigin-RevId: 315604825
2020-05-19Fix flaky udp tests by polling before reading.Dean Deng
On native Linux, calling recv/read right after send/write sometimes returns EWOULDBLOCK, if the data has not made it to the receiving socket (even though the endpoints are on the same host). Poll before reading to avoid this. Making this change also uncovered a hostinet bug (gvisor.dev/issue/2726), which is noted in this CL. PiperOrigin-RevId: 312320587
2020-05-07Allocate device numbers for VFS2 filesystems.Jamie Liu
Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006
2020-05-05Update vfs2 socket TODOs.Dean Deng
Three updates: - Mark all vfs2 socket syscalls as supported. - Use the same dev number and ino number generator for all types of sockets, unlike in VFS1. - Do not use host fd for hostinet metadata. Fixes #1476, #1478, #1484, 1485, #2017. PiperOrigin-RevId: 309994579
2020-05-01Port netstack, hostinet, and netlink sockets to VFS2.Dean Deng
All three follow the same pattern: 1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can be shared with VFS2. 2. Create a FileDescriptionImpl with the corresponding socket operations, rewriting the few that cannot be shared with VFS1. 3. Set up a VFS2 socket provider that creates a socket by setting up a dentry in the global Kernel.socketMount and connecting it with a new FileDescription. This mostly completes the work for porting sockets to VFS2, and many syscall tests can be enabled as a result. There are several networking-related syscall tests that are still not passing: 1. net gofer tests 2. socketpair gofer tests 2. sendfile tests (splice is not implemented in VFS2 yet) Updates #1478, #1484, #1485 PiperOrigin-RevId: 309457331
2020-02-13Internal change.gVisor bot
PiperOrigin-RevId: 294952610
2020-02-05Add notes to relevant tests.Adin Scannell
These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-02-04Support RTM_NEWADDR and RTM_GETLINK in (rt)netlink.Ting-Yu Wang
PiperOrigin-RevId: 293271055
2020-02-04VFS2 gofer clientJamie Liu
Updates #1198 Opening host pipes (by spinning in fdpipe) and host sockets is not yet complete, and will be done in a future CL. Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels of backportability: - "Cache policies" are replaced by InteropMode, which control the behavior of timestamps in addition to caching. Under InteropModeExclusive (analogous to cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough), client timestamps are *not* written back to the server (it is not possible in 9P or Linux for clients to set ctime, so writing back client-authoritative timestamps results in incoherence between atime/mtime and ctime). Under InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps are not used at all (remote filesystem clocks are authoritative). cacheNone is translated to InteropModeShared + new option filesystemOptions.specialRegularFiles. - Under InteropModeShared, "unstable attribute" reloading for permission checks, lookup, and revalidation are fused, which is feasible in VFS2 since gofer.filesystem controls path resolution. This results in a ~33% reduction in RPCs for filesystem operations compared to cacheRemoteRevalidating. For example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails revalidation, resulting in the instantiation of a new dentry: VFS1 RPCs: getattr("/") // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr() walkgetattr("/", "foo") = fid1 // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate() clunk(fid1) getattr("/foo") // CheckPermission walkgetattr("/foo", "bar") = fid2 // Revalidate clunk(fid2) getattr("/foo/bar") // CheckPermission walkgetattr("/foo/bar", "baz") = fid3 // Revalidate clunk(fid3) walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup getattr("/foo/bar/baz") // linux.stat() => gofer.inodeOperations.UnstableAttr() VFS2 RPCs: getattr("/") // gofer.filesystem.walkExistingLocked() walkgetattr("/", "foo") = fid1 // gofer.filesystem.stepExistingLocked() clunk(fid1) // No getattr: walkgetattr already updated metadata for permission check walkgetattr("/foo", "bar") = fid2 clunk(fid2) walkgetattr("/foo/bar", "baz") = fid3 // No clunk: fid3 used for new gofer.dentry // No getattr: walkgetattr already updated metadata for stat() - gofer.filesystem.unlinkAt() does not require instantiation of a dentry that represents the file to be deleted. Updates #898. - gofer.regularFileFD.OnClose() skips Tflushf for regular files under InteropModeExclusive, as it's nonsensical to request a remote file flush without flushing locally-buffered writes to that remote file first. - Symlink targets are cached when InteropModeShared is not in effect. - p9.QID.Path (which is already required to be unique for each file within a server, and is accordingly already synthesized from device/inode numbers in all known gofers) is used as-is for inode numbers, rather than being mapped along with attr.RDev in the client to yet another synthetic inode number. - Relevant parts of fsutil.CachingInodeOperations are inlined directly into gofer package code. This avoids having to duplicate part of its functionality in fsutil.HostMappable. PiperOrigin-RevId: 293190213
2020-01-27Update package locations.Adin Scannell
Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27Standardize on tools directory.Adin Scannell
PiperOrigin-RevId: 291745021
2019-12-10Deduplicate and simplify control message processing for recvmsg and sendmsg.Dean Deng
Also, improve performance by calculating how much space is needed before making an allocation for sendmsg in hostinet. PiperOrigin-RevId: 284898581
2019-12-03Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the latter. Fixes #1188 PiperOrigin-RevId: 283550925
2019-12-02Support sending IP_TOS and IPV6_TCLASS control messages with hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the former. PiperOrigin-RevId: 283346737
2019-11-27Add support for receiving TOS and TCLASS control messages in hostinet.Dean Deng
This involves allowing getsockopt/setsockopt for the corresponding socket options, as well as allowing hostinet to process control messages received from the actual recvmsg syscall. PiperOrigin-RevId: 282851425
2019-10-29Add endpoint tracking to the stack.Ian Gudger
In the future this will replace DanglingEndpoints. DanglingEndpoints must be kept for now due to issues with save/restore. This is arguably a cleaner design and allows the stack to know which transport endpoints might still be using its link endpoints. Updates #837 PiperOrigin-RevId: 277386633
2019-10-27Add /proc/sys/net/ipv4/ip_forwardaleksej
2019-10-23Merge pull request #641 from tanjianfeng:mastergVisor bot
PiperOrigin-RevId: 276380008
2019-10-16Reorder BUILD license and load functions in gvisor.Kevin Krakauer
PiperOrigin-RevId: 275139066
2019-10-15hostinet: support /proc/net/snmp and /proc/net/devJianfeng Tan
For hostinet, we inherit the data from host procfs. To to that, we cache the fds for these files for later reads. Fixes #506 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
2019-08-19hostinet: fix parsing route netlink messageJianfeng Tan
We wrongly parses output interface as gateway address. The fix is straightforward. Fixes #638 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ia4bab31f3c238b0278ea57ab22590fad00eaf061 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/684 from tanjianfeng:fix-638 b940e810367ad1273519bfa594f4371bdd293e83 PiperOrigin-RevId: 264211336
2019-08-08Return a well-defined socket address type from socket funtions.Rahat Mahmood
Previously we were representing socket addresses as an interface{}, which allowed any type which could be binary.Marshal()ed to be used as a socket address. This is fine when the address is passed to userspace via the linux ABI, but is problematic when used from within the sentry such as by networking procfs files. PiperOrigin-RevId: 262460640
2019-08-08netstack: Don't start endpoint goroutines too soon on restore.Rahat Mahmood
Endpoint protocol goroutines were previously started as part of loading the endpoint. This is potentially too soon, as resources used by these goroutine may not have been loaded. Protocol goroutines may perform meaningful work as soon as they're started (ex: incoming connect) which can cause them to indirectly access resources that haven't been loaded yet. This CL defers resuming all protocol goroutines until the end of restore. PiperOrigin-RevId: 262409429
2019-08-02Plumbing for iptables sockopts.Kevin Krakauer
PiperOrigin-RevId: 261413396
2019-07-31Basic support for 'ip route'Ian Lewis
Implements support for RTM_GETROUTE requests for netlink sockets. Fixes #507 PiperOrigin-RevId: 261051045
2019-07-15Support /proc/net/devJianfeng Tan
This proc file reports the stats of interfaces. We could use ifconfig command to check the result. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827 COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/ PiperOrigin-RevId: 258303936
2019-07-02Remove map from fd_map, change to fd_table.Adin Scannell
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire package to itself and didn't serve much use (it was freely cast between types, and served as more of an annoyance than providing any protection.) Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup operation, and 10-15 ns per concurrent lookup operation of savings. This also fixes two tangential usage issues with the FDMap. Namely, non-atomic use of NewFDFrom and associated calls to Remove (that are both racy and fail to drop the reference on the underlying file.) PiperOrigin-RevId: 256285890
2019-06-27Complete pipe support on overlayfsFabricio Voznika
Get/Set pipe size and ioctl support were missing from overlayfs. It required moving the pipe.Sizer interface to fs so that overlay could get access. Fixes #318 PiperOrigin-RevId: 255511125
2019-06-18gvisor/fs: don't update file.offset for sockets, pipes, etcAndrei Vagin
sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644
2019-06-13Implement getsockopt() SO_DOMAIN, SO_PROTOCOL and SO_TYPE.Rahat Mahmood
SO_TYPE was already implemented for everything but netlink sockets. PiperOrigin-RevId: 253138157
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-10Store more information in the kernel socket table.Rahat Mahmood
Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895
2019-06-06Use common definition of SockType.Rahat Mahmood
SockType isn't specific to unix domain sockets, and the current definition basically mirrors the linux ABI's definition. PiperOrigin-RevId: 251956740
2019-06-06Track and export socket state.Rahat Mahmood
This is necessary for implementing network diagnostic interfaces like /proc/net/{tcp,udp,unix} and sock_diag(7). For pass-through endpoints such as hostinet, we obtain the socket state from the backend. For netstack, we add explicit tracking of TCP states. PiperOrigin-RevId: 251934850