summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/fs
AgeCommit message (Collapse)Author
2020-04-02Fix typo in TODO comments.Dean Deng
PiperOrigin-RevId: 304508083
2020-03-31Include original copyUp error in panic if cleanupUpper fails.Nicolas Lacasse
When copyUp fails, we attempt to clean up the upper filesystem by removing any files that have already been copied-up. If the cleanup fails, we panic because the "overlay filesystem is in an inconsistent state". This CL adds the original copy-up error to the panic information, to hopefully make it easier to track down how the overlay filesystem got into the inconsistent state. PiperOrigin-RevId: 304053370
2020-03-26Use host-defined file owner and mode, when possible, for imported fds.Dean Deng
Using the host-defined file owner matches VFS1. It is more correct to use the host-defined mode, since the cached value may become out of date. However, kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are in-memory so retrieving mode should not fail. Therefore, if the host syscall fails, we rely on a cached value instead. Updates #1672. PiperOrigin-RevId: 303220864
2020-03-19Remove the "frozen" bit from dirents.Zach Koopmans
Frozen was to lock down changes to the host filesystem for hostFS. Now that hostFS is gone, it can be removed. PiperOrigin-RevId: 301907923
2020-03-18Port imported TTY fds to vfs2.Dean Deng
Refactor fs/host.TTYFileOperations so that the relevant functionality can be shared with VFS2 (fsimpl/host.ttyFD). Incorporate host.defaultFileFD into the default host.fileDescription. This way, there is no need for a separate default_file.go. As in vfs1, the TTY file implementation can be built on top of this default and override operations as necessary (PRead/Read/PWrite/Write, Release, Ioctl). Note that these changes still need to be plumbed into runsc, which refers to imported TTYs in control/proc.go:ExecAsync. Updates #1672. PiperOrigin-RevId: 301718157
2020-03-17Remove HostFS from Sentry.Zach Koopmans
PiperOrigin-RevId: 301402181
2020-03-14Plumb VFS2 imported fds into virtual filesystem.Dean Deng
- When setting up the virtual filesystem, mount a host.filesystem to contain all files that need to be imported. - Make read/preadv syscalls to the host in cases where preadv2 may not be supported yet (likewise for writing). - Make save/restore functions in kernel/kernel.go return early if vfs2 is enabled. PiperOrigin-RevId: 300922353
2020-03-13Fix oom_score_adj.Jamie Liu
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj). - Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu and TaskSet.mu in the wrong order (via Task.ExitState()). PiperOrigin-RevId: 300814698
2020-03-09Move /proc/net to /proc/PID/net, and make /proc/net -> /proc/self/net.Ting-Yu Wang
Issue #1833 PiperOrigin-RevId: 299998105
2020-03-06Add plumbing for importing fds in VFS2, along with non-socket, non-TTY impl.Dean Deng
In VFS2, imported file descriptors are stored in a kernfs-based filesystem. Upon calling ImportFD, the host fd can be accessed in two ways: 1. a FileDescription that can be added to the FDTable, and 2. a Dentry in the host.filesystem mount, which we will want to access through magic symlinks in /proc/[pid]/fd/. An implementation of the kernfs.Inode interface stores a unique host fd. This inode can be inserted into file descriptions as well as dentries. This change also plumbs in three FileDescriptionImpls corresponding to fds for sockets, TTYs, and other files (only the latter is implemented here). These implementations will mostly make corresponding syscalls to the host. Where possible, the logic is ported over from pkg/sentry/fs/host. Updates #1672 PiperOrigin-RevId: 299417263
2020-03-06Prevent memory leaks in ilistTamir Duberstein
When list elements are removed from a list but not discarded, it becomes important to invalidate the references they hold to their former neighbors to prevent memory leaks. PiperOrigin-RevId: 299412421
2020-03-05Stub oom_score_adj and oom_score.Ian Lewis
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts writes of values -1000 to 1000 and persists the value with the task. New tasks inherit the parent's oom_score_adj. oom_score is a read-only stub that always returns the value '0'. Issue #202 PiperOrigin-RevId: 299245355
2020-03-02Take write lock when removing xattrMichael Pratt
PiperOrigin-RevId: 298380654
2020-02-28Hide /dev/net/tun when using hostinet.Ting-Yu Wang
/dev/net/tun does not currently work with hostinet. This has caused some program starts failing because it thinks the feature exists. PiperOrigin-RevId: 297876196
2020-02-25Fix DATA RACE in fs.MayDelete.Adin Scannell
MayDelete must lock the directory also, otherwise concurrent renames may race. Note that this also changes the methods to be aligned with the actual Remove and RemoveDirectory methods to minimize confusion when reading the code. (It was hard to see that resolution was correct.) PiperOrigin-RevId: 297258304
2020-02-25Fix mount refcount issue.Adin Scannell
Each mount is holds a reference on a root Dirent, but the mount itself may live beyond it's own reference. This means that a call to Root() can come after the associated reference has been dropped. Instead of introducing a separate layer of references for mount objects, we simply change the Root() method to use TryIncRef() and allow it to return nil if the mount is already gone. This requires updating a small number of callers and minimizes the change (since VFSv2 will replace this code shortly). PiperOrigin-RevId: 297174230
2020-02-21Implement tap/tun device in vfs.Ting-Yu Wang
PiperOrigin-RevId: 296526279
2020-02-20Initial network namespace support.gVisor bot
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-14Plumb VFS2 inside the SentrygVisor bot
- Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950
2020-02-11Ensure fsimpl/gofer.dentryPlatformFile.hostFileMapper is initialized.gVisor bot
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because the historically high cost of defer means that we avoid it in hot paths, including much of MM; defer is much cheaper as of Go 1.14, but still a measurable overhead.) PiperOrigin-RevId: 294560316
2020-02-11Prevent DATA RACE in UnstableAttr.Adin Scannell
The slaveInodeOperations is currently copying the object when truncate is called (which is a no-op). This may result in a (unconsequential) data race when being modified concurrently. PiperOrigin-RevId: 294484276
2020-02-10Add context to comments.Adin Scannell
PiperOrigin-RevId: 294295852
2020-02-07Support listxattr and removexattr syscalls.Dean Deng
Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-05Add notes to relevant tests.Adin Scannell
These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-02-04VFS2 gofer clientJamie Liu
Updates #1198 Opening host pipes (by spinning in fdpipe) and host sockets is not yet complete, and will be done in a future CL. Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels of backportability: - "Cache policies" are replaced by InteropMode, which control the behavior of timestamps in addition to caching. Under InteropModeExclusive (analogous to cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough), client timestamps are *not* written back to the server (it is not possible in 9P or Linux for clients to set ctime, so writing back client-authoritative timestamps results in incoherence between atime/mtime and ctime). Under InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps are not used at all (remote filesystem clocks are authoritative). cacheNone is translated to InteropModeShared + new option filesystemOptions.specialRegularFiles. - Under InteropModeShared, "unstable attribute" reloading for permission checks, lookup, and revalidation are fused, which is feasible in VFS2 since gofer.filesystem controls path resolution. This results in a ~33% reduction in RPCs for filesystem operations compared to cacheRemoteRevalidating. For example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails revalidation, resulting in the instantiation of a new dentry: VFS1 RPCs: getattr("/") // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr() walkgetattr("/", "foo") = fid1 // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate() clunk(fid1) getattr("/foo") // CheckPermission walkgetattr("/foo", "bar") = fid2 // Revalidate clunk(fid2) getattr("/foo/bar") // CheckPermission walkgetattr("/foo/bar", "baz") = fid3 // Revalidate clunk(fid3) walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup getattr("/foo/bar/baz") // linux.stat() => gofer.inodeOperations.UnstableAttr() VFS2 RPCs: getattr("/") // gofer.filesystem.walkExistingLocked() walkgetattr("/", "foo") = fid1 // gofer.filesystem.stepExistingLocked() clunk(fid1) // No getattr: walkgetattr already updated metadata for permission check walkgetattr("/foo", "bar") = fid2 clunk(fid2) walkgetattr("/foo/bar", "baz") = fid3 // No clunk: fid3 used for new gofer.dentry // No getattr: walkgetattr already updated metadata for stat() - gofer.filesystem.unlinkAt() does not require instantiation of a dentry that represents the file to be deleted. Updates #898. - gofer.regularFileFD.OnClose() skips Tflushf for regular files under InteropModeExclusive, as it's nonsensical to request a remote file flush without flushing locally-buffered writes to that remote file first. - Symlink targets are cached when InteropModeShared is not in effect. - p9.QID.Path (which is already required to be unique for each file within a server, and is accordingly already synthesized from device/inode numbers in all known gofers) is used as-is for inode numbers, rather than being mapped along with attr.RDev in the client to yet another synthetic inode number. - Relevant parts of fsutil.CachingInodeOperations are inlined directly into gofer package code. This avoids having to duplicate part of its functionality in fsutil.HostMappable. PiperOrigin-RevId: 293190213
2020-02-04Add support for sentry internal pipe for gofer mountsFabricio Voznika
Internal pipes are supported similarly to how internal UDS is done. It is also controlled by the same flag. Fixes #1102 PiperOrigin-RevId: 293150045
2020-01-31Internal change.gVisor bot
PiperOrigin-RevId: 292587459
2020-01-30Enforce splice offset limitsMichael Pratt
Splice must not allow negative offsets. Writes also must not allow offset + size to overflow int64. Reads are similarly broken, but not just in splice (b/148095030). Reported-by: syzbot+0e1ff0b95fb2859b4190@syzkaller.appspotmail.com PiperOrigin-RevId: 292361208
2020-01-28fs/splice: don't report partial errors for special filesAndrei Vagin
Special files can have additional requirements for granularity. For example, read from eventfd returns EINVAL if a size is less 8 bytes. Reported-by: syzbot+3905f5493bec08eb7b02@syzkaller.appspotmail.com PiperOrigin-RevId: 292002926
2020-01-27Update package locations.Adin Scannell
Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27Standardize on tools directory.Adin Scannell
PiperOrigin-RevId: 291745021
2020-01-21Rename DowngradableRWMutex to RWmutex.Ian Gudger
Also renames TMutex to Mutex. These custom mutexes aren't any worse than the standard library versions (same code), so having both seems redundant. PiperOrigin-RevId: 290873587
2020-01-21Add line break to /proc/net filesFabricio Voznika
Some files were missing the last line break. PiperOrigin-RevId: 290808898
2020-01-18Include the cgroup name in the superblock options in /proc/self/mountinfo.Nicolas Lacasse
Java 11 parses /proc/self/mountinfo for cgroup information. Java 11.0.4 uses the mount path to determine what cgroups existed, but Java 11.0.5 reads the cgroup names from the superblock options. This CL adds the cgroup name to the superblock options if the filesystem type is "cgroup". Since gVisor doesn't actually support cgroups yet, we just infer the cgroup name from the path. PiperOrigin-RevId: 290434323
2020-01-17Fix data race in MountNamespace.resolve.Nicolas Lacasse
We must hold fs.renameMu to access Dirent.parent. PiperOrigin-RevId: 290340804
2020-01-17Fix data race in tty.queue.readableSize.Nicolas Lacasse
We were setting queue.readable without holding the lock. PiperOrigin-RevId: 290306922
2020-01-17Add explanation for implementation of BSD full file locks.Dean Deng
PiperOrigin-RevId: 290272560
2020-01-16Remove unused rpcinet.Adin Scannell
PiperOrigin-RevId: 290198756
2020-01-16Implement setxattr for overlays.Dean Deng
PiperOrigin-RevId: 290186303
2020-01-16Add IfChange/ThenChange reminders in fs/procFabricio Voznika
There is a lot of code duplication for VFSv2 and this serves as remind to keep the copies in sync. Updates #1195 PiperOrigin-RevId: 290139234
2020-01-16Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.Dean Deng
There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
2020-01-16Add remaining /proc/* and /proc/sys/* filesFabricio Voznika
Except for one under /proc/sys/net/ipv4/tcp_sack. /proc/pid/* is still incomplete. Updates #1195 PiperOrigin-RevId: 290120438
2020-01-09New sync package.Ian Gudger
* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2020-01-07fs/splice: don't report a partialResult error if there is no data lossAndrei Vagin
PiperOrigin-RevId: 288642552
2020-01-06Convert memfs into proto-tmpfs.Nicolas Lacasse
- Renamed memfs to tmpfs. - Copied fileRangeSet bits from fs/fsutil/ to fsimpl/tmpfs/ - Changed tmpfs to be backed by filemem instead of byte slice. - regularFileReadWriter uses a sync.Pool, similar to gofer client. PiperOrigin-RevId: 288356380
2019-12-19Make masterInodeOperations.Truncate take a pointer receiver.Nicolas Lacasse
Otherwise a copy happens, which triggers a data race when reading masterInodeOperations.SimpleFileOperations.uattr, which must be accessed with a lock held. PiperOrigin-RevId: 286464473
2019-12-18Add Mems_allowed to /proc/PID/statusMichael Pratt
PiperOrigin-RevId: 286248378
2019-12-17Merge pull request #1394 from zhuangel:bindlockgVisor bot
PiperOrigin-RevId: 286051631
2019-12-16Merge pull request #1392 from zhuangel:bindleakgVisor bot
PiperOrigin-RevId: 285874181
2019-12-16Implement checks for get/setxattr at the syscall layer.Dean Deng
Add checks for input arguments, file type, permissions, etc. that match the Linux implementation. A call to get/setxattr that passes all the checks will still currently return EOPNOTSUPP. Actual support will be added in following commits. Only allow user.* extended attributes for the time being. PiperOrigin-RevId: 285835159