Age | Commit message (Collapse) | Author |
|
Refactor fs/host.TTYFileOperations so that the relevant functionality can be
shared with VFS2 (fsimpl/host.ttyFD).
Incorporate host.defaultFileFD into the default host.fileDescription. This way,
there is no need for a separate default_file.go. As in vfs1, the TTY file
implementation can be built on top of this default and override operations as
necessary (PRead/Read/PWrite/Write, Release, Ioctl).
Note that these changes still need to be plumbed into runsc, which refers to
imported TTYs in control/proc.go:ExecAsync.
Updates #1672.
PiperOrigin-RevId: 301718157
|
|
PiperOrigin-RevId: 301402181
|
|
- When setting up the virtual filesystem, mount a host.filesystem to contain
all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
enabled.
PiperOrigin-RevId: 300922353
|
|
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj).
- Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu
and TaskSet.mu in the wrong order (via Task.ExitState()).
PiperOrigin-RevId: 300814698
|
|
Issue #1833
PiperOrigin-RevId: 299998105
|
|
In VFS2, imported file descriptors are stored in a kernfs-based filesystem.
Upon calling ImportFD, the host fd can be accessed in two ways:
1. a FileDescription that can be added to the FDTable, and
2. a Dentry in the host.filesystem mount, which we will want to access through
magic symlinks in /proc/[pid]/fd/.
An implementation of the kernfs.Inode interface stores a unique host fd. This
inode can be inserted into file descriptions as well as dentries.
This change also plumbs in three FileDescriptionImpls corresponding to fds for
sockets, TTYs, and other files (only the latter is implemented here).
These implementations will mostly make corresponding syscalls to the host.
Where possible, the logic is ported over from pkg/sentry/fs/host.
Updates #1672
PiperOrigin-RevId: 299417263
|
|
When list elements are removed from a list but not discarded, it becomes
important to invalidate the references they hold to their former
neighbors to prevent memory leaks.
PiperOrigin-RevId: 299412421
|
|
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts
writes of values -1000 to 1000 and persists the value with the task. New tasks
inherit the parent's oom_score_adj.
oom_score is a read-only stub that always returns the value '0'.
Issue #202
PiperOrigin-RevId: 299245355
|
|
PiperOrigin-RevId: 298380654
|
|
/dev/net/tun does not currently work with hostinet. This has caused some
program starts failing because it thinks the feature exists.
PiperOrigin-RevId: 297876196
|
|
MayDelete must lock the directory also, otherwise concurrent renames may
race. Note that this also changes the methods to be aligned with the actual
Remove and RemoveDirectory methods to minimize confusion when reading the
code. (It was hard to see that resolution was correct.)
PiperOrigin-RevId: 297258304
|
|
Each mount is holds a reference on a root Dirent, but the mount itself may
live beyond it's own reference. This means that a call to Root() can come
after the associated reference has been dropped.
Instead of introducing a separate layer of references for mount objects,
we simply change the Root() method to use TryIncRef() and allow it to return
nil if the mount is already gone. This requires updating a small number of
callers and minimizes the change (since VFSv2 will replace this code shortly).
PiperOrigin-RevId: 297174230
|
|
PiperOrigin-RevId: 296526279
|
|
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as it is now.
Before the userspace is able to create device, the default loopback device can
be used to test.
/proc/net and /sys/net will still be connected to the root network stack; this
is the same behavior now.
Issue #1833
PiperOrigin-RevId: 296309389
|
|
- Added fsbridge package with interface that can be used to open
and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup
Updates #1623
PiperOrigin-RevId: 295183950
|
|
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because
the historically high cost of defer means that we avoid it in hot paths,
including much of MM; defer is much cheaper as of Go 1.14, but still a
measurable overhead.)
PiperOrigin-RevId: 294560316
|
|
The slaveInodeOperations is currently copying the object when
truncate is called (which is a no-op). This may result in a
(unconsequential) data race when being modified concurrently.
PiperOrigin-RevId: 294484276
|
|
PiperOrigin-RevId: 294295852
|
|
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.
PiperOrigin-RevId: 293899385
|
|
These were out-of-band notes that can help provide additional context
and simplify automated imports.
PiperOrigin-RevId: 293525915
|
|
Updates #1198
Opening host pipes (by spinning in fdpipe) and host sockets is not yet
complete, and will be done in a future CL.
Major differences from VFS1 gofer client (sentry/fs/gofer), with varying levels
of backportability:
- "Cache policies" are replaced by InteropMode, which control the behavior of
timestamps in addition to caching. Under InteropModeExclusive (analogous to
cacheAll) and InteropModeWritethrough (analogous to cacheAllWritethrough),
client timestamps are *not* written back to the server (it is not possible in
9P or Linux for clients to set ctime, so writing back client-authoritative
timestamps results in incoherence between atime/mtime and ctime). Under
InteropModeShared (analogous to cacheRemoteRevalidating), client timestamps
are not used at all (remote filesystem clocks are authoritative). cacheNone
is translated to InteropModeShared + new option
filesystemOptions.specialRegularFiles.
- Under InteropModeShared, "unstable attribute" reloading for permission
checks, lookup, and revalidation are fused, which is feasible in VFS2 since
gofer.filesystem controls path resolution. This results in a ~33% reduction
in RPCs for filesystem operations compared to cacheRemoteRevalidating. For
example, consider stat("/foo/bar/baz") where "/foo/bar/baz" fails
revalidation, resulting in the instantiation of a new dentry:
VFS1 RPCs:
getattr("/") // fs.MountNamespace.FindLink() => fs.Inode.CheckPermission() => gofer.inodeOperations.check() => gofer.inodeOperations.UnstableAttr()
walkgetattr("/", "foo") = fid1 // fs.Dirent.walk() => gofer.session.Revalidate() => gofer.cachePolicy.Revalidate()
clunk(fid1)
getattr("/foo") // CheckPermission
walkgetattr("/foo", "bar") = fid2 // Revalidate
clunk(fid2)
getattr("/foo/bar") // CheckPermission
walkgetattr("/foo/bar", "baz") = fid3 // Revalidate
clunk(fid3)
walkgetattr("/foo/bar", "baz") = fid4 // fs.Dirent.walk() => gofer.inodeOperations.Lookup
getattr("/foo/bar/baz") // linux.stat() => gofer.inodeOperations.UnstableAttr()
VFS2 RPCs:
getattr("/") // gofer.filesystem.walkExistingLocked()
walkgetattr("/", "foo") = fid1 // gofer.filesystem.stepExistingLocked()
clunk(fid1)
// No getattr: walkgetattr already updated metadata for permission check
walkgetattr("/foo", "bar") = fid2
clunk(fid2)
walkgetattr("/foo/bar", "baz") = fid3
// No clunk: fid3 used for new gofer.dentry
// No getattr: walkgetattr already updated metadata for stat()
- gofer.filesystem.unlinkAt() does not require instantiation of a dentry that
represents the file to be deleted. Updates #898.
- gofer.regularFileFD.OnClose() skips Tflushf for regular files under
InteropModeExclusive, as it's nonsensical to request a remote file flush
without flushing locally-buffered writes to that remote file first.
- Symlink targets are cached when InteropModeShared is not in effect.
- p9.QID.Path (which is already required to be unique for each file within a
server, and is accordingly already synthesized from device/inode numbers in
all known gofers) is used as-is for inode numbers, rather than being mapped
along with attr.RDev in the client to yet another synthetic inode number.
- Relevant parts of fsutil.CachingInodeOperations are inlined directly into
gofer package code. This avoids having to duplicate part of its functionality
in fsutil.HostMappable.
PiperOrigin-RevId: 293190213
|
|
Internal pipes are supported similarly to how internal UDS is done.
It is also controlled by the same flag.
Fixes #1102
PiperOrigin-RevId: 293150045
|
|
PiperOrigin-RevId: 292587459
|
|
Splice must not allow negative offsets. Writes also must not allow offset +
size to overflow int64. Reads are similarly broken, but not just in splice
(b/148095030).
Reported-by: syzbot+0e1ff0b95fb2859b4190@syzkaller.appspotmail.com
PiperOrigin-RevId: 292361208
|
|
Special files can have additional requirements for granularity.
For example, read from eventfd returns EINVAL if a size is less 8 bytes.
Reported-by: syzbot+3905f5493bec08eb7b02@syzkaller.appspotmail.com
PiperOrigin-RevId: 292002926
|
|
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.
PiperOrigin-RevId: 291811289
|
|
PiperOrigin-RevId: 291745021
|
|
Also renames TMutex to Mutex.
These custom mutexes aren't any worse than the standard library versions (same
code), so having both seems redundant.
PiperOrigin-RevId: 290873587
|
|
Some files were missing the last line break.
PiperOrigin-RevId: 290808898
|
|
Java 11 parses /proc/self/mountinfo for cgroup information. Java 11.0.4 uses
the mount path to determine what cgroups existed, but Java 11.0.5 reads the
cgroup names from the superblock options.
This CL adds the cgroup name to the superblock options if the filesystem type
is "cgroup". Since gVisor doesn't actually support cgroups yet, we just infer
the cgroup name from the path.
PiperOrigin-RevId: 290434323
|
|
We must hold fs.renameMu to access Dirent.parent.
PiperOrigin-RevId: 290340804
|
|
We were setting queue.readable without holding the lock.
PiperOrigin-RevId: 290306922
|
|
PiperOrigin-RevId: 290272560
|
|
PiperOrigin-RevId: 290198756
|
|
PiperOrigin-RevId: 290186303
|
|
There is a lot of code duplication for VFSv2 and this
serves as remind to keep the copies in sync.
Updates #1195
PiperOrigin-RevId: 290139234
|
|
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.
Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.
Extended attributes will be implemented in future commits.
PiperOrigin-RevId: 290121300
|
|
Except for one under /proc/sys/net/ipv4/tcp_sack.
/proc/pid/* is still incomplete.
Updates #1195
PiperOrigin-RevId: 290120438
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
PiperOrigin-RevId: 288642552
|
|
- Renamed memfs to tmpfs.
- Copied fileRangeSet bits from fs/fsutil/ to fsimpl/tmpfs/
- Changed tmpfs to be backed by filemem instead of byte slice.
- regularFileReadWriter uses a sync.Pool, similar to gofer client.
PiperOrigin-RevId: 288356380
|
|
Otherwise a copy happens, which triggers a data race when reading
masterInodeOperations.SimpleFileOperations.uattr, which must be accessed with a
lock held.
PiperOrigin-RevId: 286464473
|
|
PiperOrigin-RevId: 286248378
|
|
PiperOrigin-RevId: 286051631
|
|
PiperOrigin-RevId: 285874181
|
|
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.
Only allow user.* extended attributes for the time being.
PiperOrigin-RevId: 285835159
|
|
Copy up parent when binding UDS on overlayfs is supported in commit
02ab1f187cd24c67b754b004229421d189cee264.
But the using of copyUp in overlayBind will cause sentry stuck, reason
is dead lock in renameMu.
1 [Process A] Invoke a Unix socket bind operation
renameMu is hold in fs.(*Dirent).genericCreate by process A
2 [Process B] Invoke a read syscall on /proc/task/mounts
waitng on Lock of renameMu in fs.(*MountNamespace).FindMount
3 [Process A] Continue Unix socket bind operation
wating on RLock of renameMu in fs.copyUp
Root cause is recursive reading lock of reanmeMu in bind call trace,
if there are writing lock between the two reading lock, then deadlock
occured.
Fixes #1397
|
|
After the finalizer optimize in 76039f895995c3fe0deef5958f843868685ecc38
commit, clientFile needs to closed before finalizer release it.
The clientFile is not closed if it is created via
gofer.(*inodeOperations).Bind, this will cause fd leak which is hold
by gofer process.
Fixes #1396
Signed-off-by: Yong He <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
PiperOrigin-RevId: 284606233
|
|
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.
PiperOrigin-RevId: 284263850
|