|
This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and
vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted
soon anyway.
The following structs now use the new tool, with leak check enabled:
devpts:rootInode
fuse:inode
kernfs:Dentry
kernfs:dir
kernfs:readonlyDir
kernfs:StaticDirectory
proc:fdDirInode
proc:fdInfoDirInode
proc:subtasksInode
proc:taskInode
proc:tasksInode
vfs:FileDescription
vfs:MountNamespace
vfs:Filesystem
sys:dir
kernel:FSContext
kernel:ProcessGroup
kernel:Session
shm:Shm
mm:aioMappable
mm:SpecialMappable
transport:queue
The following also use the template, but because they are not currently leak
checked, this patch leaves a TODO instead of enabling leak checking:
kernel:FDTable
tun:tunEndpoint
Updates #1486.
PiperOrigin-RevId: 328460377
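A minimal sketch of what a refcount with leak checking might look like, assuming a global registry of live objects. The names (Refs, InitRefs, LeakedObjects) are hypothetical and are not the refs_vfs2 template's actual API. Objects register when initialized and deregister when their count reaches zero. Anything still registered when the check runs is reported as leaked.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// Hypothetical leak-check registry: objects are recorded when their count is
// initialized and removed when it drops to zero.
var (
	liveMu      sync.Mutex
	liveObjects = map[*Refs]string{}
)

// Refs is a hypothetical stand-in for a type generated from a refcount
// template; it is not the actual refs_vfs2 API.
type Refs struct {
	count int64
}

// InitRefs sets the count to 1 and registers r under name for leak checking.
func (r *Refs) InitRefs(name string) {
	atomic.StoreInt64(&r.count, 1)
	liveMu.Lock()
	liveObjects[r] = name
	liveMu.Unlock()
}

// IncRef takes an additional reference; the caller must already hold one.
func (r *Refs) IncRef() {
	if v := atomic.AddInt64(&r.count, 1); v <= 1 {
		panic(fmt.Sprintf("IncRef on released object (count now %d)", v))
	}
}

// DecRef drops a reference and runs destroy once the last one is gone.
func (r *Refs) DecRef(destroy func()) {
	switch v := atomic.AddInt64(&r.count, -1); {
	case v < 0:
		panic("DecRef called with no references held")
	case v == 0:
		liveMu.Lock()
		delete(liveObjects, r)
		liveMu.Unlock()
		if destroy != nil {
			destroy()
		}
	}
}

// LeakedObjects returns, sorted, the names of objects still holding references.
func LeakedObjects() []string {
	liveMu.Lock()
	defer liveMu.Unlock()
	names := make([]string, 0, len(liveObjects))
	for _, name := range liveObjects {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	var fd, session Refs
	fd.InitRefs("vfs:FileDescription")
	session.InitRefs("kernel:Session")
	fd.IncRef()
	fd.DecRef(nil)
	destroyed := false
	fd.DecRef(func() { destroyed = true })
	// session is never released, so leak checking reports it.
	fmt.Println(destroyed, LeakedObjects())
}
```

Running main prints `true [kernel:Session]`: the FileDescription stand-in was destroyed, while the Session stand-in was never released.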
|
|
Unlike Linux mount(2), the OCI spec allows mounting on top of an existing
non-directory file.
PiperOrigin-RevId: 327914342
|
|
This lets us create "synthetic" mountpoint directories in ReadOnly mounts
during VFS setup.
Also add context.WithMountNamespace, as some filesystems (like overlay) require
a MountNamespace on ctx to handle vfs.Filesystem operations.
PiperOrigin-RevId: 327874971
|
|
Our "Preconditions:" blocks are very useful for determining input invariants,
but they are a bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).
I've reformatted all of the cases to fit in simple rules:
1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.
This format has been added to the style guide.
I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.
PiperOrigin-RevId: 327687465
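An illustration of the two rules on a hypothetical dentry type: one condition on a single line, several conditions as a bulleted list. The `*` bullet marker is an assumption about the style guide's exact list syntax.

```go
package main

import "fmt"

// dentry is a hypothetical type used only to show the comment format.
type dentry struct {
	isDir    bool
	children map[string]*dentry
}

// child returns the child of d named name, or nil if there is none.
//
// Preconditions: d.isDir.
func (d *dentry) child(name string) *dentry {
	return d.children[name]
}

// insertChild adds child to d under name.
//
// Preconditions:
// * d.isDir.
// * d.children has no entry for name.
func (d *dentry) insertChild(name string, child *dentry) {
	d.children[name] = child
}

func main() {
	root := &dentry{isDir: true, children: map[string]*dentry{}}
	root.insertChild("etc", &dentry{isDir: true, children: map[string]*dentry{}})
	fmt.Println(root.child("etc") != nil, root.child("usr") == nil)
}
```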
|
|
Updates #1035
PiperOrigin-RevId: 327351475
|
|
A context is now passed to DecRef() and Release(), which is
needed for the SO_LINGER implementation.
PiperOrigin-RevId: 324672584
|
|
This is mostly syscall plumbing; VFS2 already implements the internals of
mounts. In addition to the syscall definitions, the following mount-related
mechanisms are updated:
- Implement MS_NOATIME for VFS2, but only for tmpfs and goferfs. The other VFS2
filesystems don't implement node-level timestamps yet.
- Implement the 'mode', 'uid' and 'gid' mount options for VFS2's tmpfs.
- Plumb mount namespace ownership, which is necessary for checking appropriate
capabilities during mount(2).
Updates #1035
PiperOrigin-RevId: 315035352
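A sketch of parsing the 'mode', 'uid' and 'gid' options from a mount(2) data string such as "mode=0755,uid=1000,gid=1000". The tmpfsOpts type and parseTmpfsOpts function are hypothetical; mode is parsed as octal, and this sketch rejects any other option.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tmpfsOpts holds the parsed options; the names are hypothetical.
type tmpfsOpts struct {
	mode uint32
	uid  uint32
	gid  uint32
}

// parseTmpfsOpts parses a comma-separated mount data string, starting from
// defaults. mode is octal and limited to permission and setuid/setgid/sticky bits.
func parseTmpfsOpts(data string, defaults tmpfsOpts) (tmpfsOpts, error) {
	opts := defaults
	if data == "" {
		return opts, nil
	}
	for _, kv := range strings.Split(data, ",") {
		key, val, ok := strings.Cut(kv, "=")
		if !ok {
			return opts, fmt.Errorf("option %q has no value", kv)
		}
		switch key {
		case "mode":
			m, err := strconv.ParseUint(val, 8, 32)
			if err != nil || m&^0o7777 != 0 {
				return opts, fmt.Errorf("invalid mode %q", val)
			}
			opts.mode = uint32(m)
		case "uid", "gid":
			id, err := strconv.ParseUint(val, 10, 32)
			if err != nil {
				return opts, fmt.Errorf("invalid %s %q", key, val)
			}
			if key == "uid" {
				opts.uid = uint32(id)
			} else {
				opts.gid = uint32(id)
			}
		default:
			// This sketch handles only mode, uid and gid.
			return opts, fmt.Errorf("unknown option %q", key)
		}
	}
	return opts, nil
}

func main() {
	// 01777 is an illustrative default for the root directory mode.
	opts, err := parseTmpfsOpts("mode=0755,uid=1000,gid=1000", tmpfsOpts{mode: 0o1777})
	fmt.Printf("%o %d %d %v\n", opts.mode, opts.uid, opts.gid, err)
}
```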
|
|
Updates #179
PiperOrigin-RevId: 314563830
|
|
This makes it straightforward to create bind mounts internally in VFS2. Given a
bind mount root represented by vfs.VirtualDentry vd:
- Create a new mount with VFS.NewDisconnectedMount(vd.Mount().Filesystem(),
vd.Dentry()).
- Connect the resulting mount in the appropriate namespace with
VFS.ConnectMountAt().
Note that the resulting bind mount is non-recursive; recursive bind mounting
requires explicitly duplicating all children of the original mount, which is
best handled internally by VFS.
Updates #179
PiperOrigin-RevId: 313703963
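A toy model of the two steps, using hypothetical miniature types with simplified signatures rather than gVisor's vfs package. It also shows why the result is non-recursive: a mount attached below the bind root in the original mount is keyed on the original mount, so it is not visible through the new one.

```go
package main

import "fmt"

// Hypothetical miniature types; not gVisor's vfs package.
type Filesystem struct{ name string }

type Dentry struct{ name string }

type Mount struct {
	fs   *Filesystem
	root *Dentry
}

// VirtualDentry pairs a mount with a dentry, identifying a path position.
type VirtualDentry struct {
	mount  *Mount
	dentry *Dentry
}

type VFS struct {
	// mountpoints maps a (parent mount, dentry) pair to the mount covering it.
	mountpoints map[VirtualDentry]*Mount
}

// NewDisconnectedMount creates a mount of fs rooted at root, attached nowhere.
func (v *VFS) NewDisconnectedMount(fs *Filesystem, root *Dentry) *Mount {
	return &Mount{fs: fs, root: root}
}

// ConnectMountAt attaches mnt on top of target.
func (v *VFS) ConnectMountAt(mnt *Mount, target VirtualDentry) {
	v.mountpoints[target] = mnt
}

// bindMount creates a non-recursive bind mount of vd at target: the new mount
// shares vd's filesystem and dentry, but mounts below vd are not copied.
func (v *VFS) bindMount(vd, target VirtualDentry) *Mount {
	mnt := v.NewDisconnectedMount(vd.mount.fs, vd.dentry)
	v.ConnectMountAt(mnt, target)
	return mnt
}

func main() {
	v := &VFS{mountpoints: map[VirtualDentry]*Mount{}}
	rootDir, srv, mnt, data := &Dentry{"/"}, &Dentry{"srv"}, &Dentry{"mnt"}, &Dentry{"data"}
	rootMnt := v.NewDisconnectedMount(&Filesystem{"rootfs"}, rootDir)
	// A tmpfs is mounted on /srv/data in the original mount...
	v.ConnectMountAt(v.NewDisconnectedMount(&Filesystem{"tmpfs"}, &Dentry{"/"}), VirtualDentry{rootMnt, data})
	// ...then /srv is bind-mounted onto /mnt.
	bind := v.bindMount(VirtualDentry{rootMnt, srv}, VirtualDentry{rootMnt, mnt})
	// The tmpfs covers data only via rootMnt, not via the bind mount.
	_, covered := v.mountpoints[VirtualDentry{bind, data}]
	fmt.Println(bind.fs.name, bind.root.name, covered)
}
```

Running main prints `rootfs srv false`.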
|
|
This change:
- Drastically simplifies the synchronization model: filesystem structure is
both implementation-defined and implementation-synchronized.
- Allows implementations of vfs.DentryImpl to use implementation-specific
dentry types, reducing casts during path traversal.
- Doesn't require dentries representing non-directory files to waste space on a
map of children.
- Allows dentry revalidation and mount lookup to be correctly ordered (fixed
FIXME in fsimpl/gofer/filesystem.go).
- Removes the need to have two separate maps in gofer.dentry
(dentry.vfsd.children and dentry.negativeChildren) for positive and negative
lookups respectively.
//pkg/sentry/fsimpl/tmpfs/benchmark_test.go:
name                        old time/op  new time/op    delta
VFS2TmpfsStat/1-112          172ns ± 4%   165ns ± 3%   -4.08%  (p=0.002 n=9+9)
VFS2TmpfsStat/2-112          199ns ± 3%   195ns ±10%        ~  (p=0.132 n=8+9)
VFS2TmpfsStat/3-112          230ns ± 2%   216ns ± 2%   -6.15%  (p=0.000 n=8+8)
VFS2TmpfsStat/8-112          390ns ± 2%   358ns ± 4%   -8.33%  (p=0.000 n=9+8)
VFS2TmpfsStat/64-112        2.20µs ± 3%  2.01µs ± 3%   -8.48%  (p=0.000 n=10+8)
VFS2TmpfsStat/100-112       3.42µs ± 9%  3.08µs ± 2%   -9.82%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/1-112     278ns ± 1%   286ns ±15%        ~  (p=0.712 n=8+10)
VFS2TmpfsMountStat/2-112     311ns ± 4%   298ns ± 2%   -4.27%  (p=0.000 n=9+8)
VFS2TmpfsMountStat/3-112     339ns ± 3%   330ns ± 9%        ~  (p=0.070 n=8+9)
VFS2TmpfsMountStat/8-112     503ns ± 3%   466ns ± 3%   -7.38%  (p=0.000 n=8+8)
VFS2TmpfsMountStat/64-112   2.53µs ±16%  2.17µs ± 7%  -14.19%  (p=0.000 n=10+9)
VFS2TmpfsMountStat/100-112  3.60µs ± 4%  3.30µs ± 8%   -8.33%  (p=0.001 n=8+9)
Updates #1035
PiperOrigin-RevId: 307655892
|
|
PiperOrigin-RevId: 305592245
|
|
PiperOrigin-RevId: 304508083
|
|
Some extra fields were added to the Mount type to expose necessary data to the
proc filesystem.
PiperOrigin-RevId: 304053361
|
|
Analogous to Linux's mount.mnt_id. This ID is displayed in
/proc/[pid]/mountinfo.
PiperOrigin-RevId: 303185564
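A minimal sketch of per-mount ID allocation with an atomic counter. The global lastMountID and newMount are hypothetical. proc(5) notes that mountinfo mount IDs may be reused after umount(2), whereas this sketch never reuses them.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lastMountID is a hypothetical global counter. IDs start at 1 and, in this
// sketch, are never reused.
var lastMountID uint64

// Mount carries only an ID here; it plays the role of Linux's mount.mnt_id
// and would appear as the first field of /proc/[pid]/mountinfo.
type Mount struct {
	ID uint64
}

// newMount assigns the next ID atomically, so concurrent callers never
// receive the same value.
func newMount() *Mount {
	return &Mount{ID: atomic.AddUint64(&lastMountID, 1)}
}

func main() {
	var wg sync.WaitGroup
	ids := make([]uint64, 100)
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids[i] = newMount().ID
		}(i)
	}
	wg.Wait()
	seen := make(map[uint64]bool)
	for _, id := range ids {
		seen[id] = true
	}
	fmt.Println(len(seen))
}
```

main creates 100 mounts concurrently and prints 100, the number of distinct IDs handed out.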
|
|
Plumbs MS_NOEXEC and MS_RDONLY. Other flags are TODO.
Updates #1623 #1193
PiperOrigin-RevId: 300764669
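A sketch of translating mount(2) flags into per-mount options, handling only MS_RDONLY and MS_NOEXEC and returning the remaining bits as unhandled. The constants carry Linux's values for these flags; MountOptions and optionsFromFlags are hypothetical.

```go
package main

import "fmt"

// Linux's values for the mount(2) flags used below.
const (
	msRDONLY = 0x1
	msNOSUID = 0x2
	msNOEXEC = 0x8
)

// MountOptions is a hypothetical subset of per-mount flags.
type MountOptions struct {
	ReadOnly bool
	NoExec   bool
}

// optionsFromFlags translates the flags this sketch supports and returns the
// bits it left unhandled, mirroring "other flags are TODO".
func optionsFromFlags(flags uint64) (MountOptions, uint64) {
	var opts MountOptions
	if flags&msRDONLY != 0 {
		opts.ReadOnly = true
	}
	if flags&msNOEXEC != 0 {
		opts.NoExec = true
	}
	return opts, flags &^ (msRDONLY | msNOEXEC)
}

func main() {
	opts, rest := optionsFromFlags(msRDONLY | msNOEXEC | msNOSUID)
	fmt.Printf("%+v unhandled=%#x\n", opts, rest)
}
```

Running main prints `{ReadOnly:true NoExec:true} unhandled=0x2`, with MS_NOSUID left over.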
|
|
Analogous to Linux's kern_mount().
PiperOrigin-RevId: 297259580
|
|
This saves one pointer dereference per VFS access.
Updates #1623
PiperOrigin-RevId: 295216176
|
|
- Added fsbridge package with interface that can be used to open
and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup
Updates #1623
PiperOrigin-RevId: 295183950
|
|
Updates #1035
PiperOrigin-RevId: 293194631
|
|
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.
PiperOrigin-RevId: 291811289
|
|
Updates #1195
PiperOrigin-RevId: 287269106
|
|
PiperOrigin-RevId: 284892289
|
|
- Remove the Filesystem argument from DentryImpl.*Ref(); in general DentryImpls
that need the Filesystem for reference counting will probably also need it
for other interface methods that don't plumb Filesystem, so it's easier to
just store a pointer to the filesystem in the DentryImpl.
- Add a pointer to the VirtualFilesystem to Filesystem, which is needed by the
gofer client to disown dentries for cache eviction triggered by dentry
reference count changes.
- Rename FilesystemType.NewFilesystem to GetFilesystem; in some cases (e.g.
sysfs, cgroupfs) it's much cleaner for there to be only one Filesystem that
is used by all mounts, and in at least one case (devtmpfs) it's visibly
incorrect not to do so, so NewFilesystem doesn't always actually create and
return a *new* Filesystem.
- Require callers of FileDescription.Init() to increment Mount/Dentry
references. This is because the gofer client may, in the OpenAt() path, take
a reference on a dentry with 0 references, which is safe due to
synchronization outside the scope of this CL; in all other cases, it is safer
for its DentryImpl.IncRef() implementation to keep checking for increments
from 0 references.
- Add FileDescription.TryIncRef. This is used by the gofer client to take
references on "special file descriptions" (FDs for files such as pipes,
sockets, and devices), which use per-FD handles (fids) instead of
dentry-shared handles, for sync() and syncfs().
PiperOrigin-RevId: 282473364
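A minimal sketch of TryIncRef as a compare-and-swap loop that refuses to resurrect an object whose count has already reached zero. The FileDescription type here is a hypothetical stand-in holding only a counter.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// FileDescription is a hypothetical stand-in holding only a reference count.
type FileDescription struct {
	refs int64
}

// IncRef requires that the caller already holds a reference.
func (fd *FileDescription) IncRef() {
	if atomic.AddInt64(&fd.refs, 1) <= 1 {
		panic("IncRef on FileDescription with no references")
	}
}

// TryIncRef takes a reference only if fd is still live (refs > 0). It suits
// callers, such as a sync() walker, that find fd through a side table and
// therefore do not already own a reference.
func (fd *FileDescription) TryIncRef() bool {
	for {
		refs := atomic.LoadInt64(&fd.refs)
		if refs <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt64(&fd.refs, refs, refs+1) {
			return true
		}
		// Another goroutine changed refs between the load and the CAS; retry.
	}
}

// DecRef drops a reference and reports whether it was the last one.
func (fd *FileDescription) DecRef() bool {
	return atomic.AddInt64(&fd.refs, -1) == 0
}

func main() {
	fd := &FileDescription{refs: 1}
	fmt.Println(fd.TryIncRef()) // fd is live, so this succeeds.
	fd.DecRef()
	fd.DecRef()
	fmt.Println(fd.TryIncRef()) // The last reference is gone, so this fails.
}
```

Unlike IncRef, which may assume the caller holds a reference, TryIncRef must tolerate racing with the final DecRef, which is why it re-reads and retries instead of adding unconditionally.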
|
|
This is required to test filesystems with a non-trivial implementation of
FilesystemImpl.Release(). Propagation isn't handled yet, and umount isn't yet
plumbed out to VirtualFilesystem.UmountAt(), but otherwise the implementation
of umount is believed to be correct.
- Move entering mountTable.seq writer critical sections to callers of
mountTable.{insert,remove}Seqed. This is required since umount(2) must ensure
that no new references are taken on the candidate mount after checking that
it isn't busy, which is only possible by entering a vfs.mountTable.seq writer
critical section before the check and remaining in it until after
VFS.umountRecursiveLocked() is complete. (Linux does the same thing:
fs/namespace.c:do_umount() => lock_mount_hash(),
fs/pnode.c:propagate_mount_busy(), umount_tree(), unlock_mount_hash().)
- It's not possible for dentry deletion to umount while only holding
VFS.mountMu for reading, but it's also very unappealing to hold VFS.mountMu
exclusively around e.g. gofer unlink RPCs. Introduce dentry.mu to avoid these
problems. This means that VFS.mountMu is never acquired for reading, so
change it to a sync.Mutex.
PiperOrigin-RevId: 282444343
|
|
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in an
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
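A quick check of the arithmetic above, assuming cost grows linearly with path depth and fitting the line through the depth-1 and depth-100 runs. This reproduces ~280ns constant and ~215ns per component for VFS1, ~160ns constant for VFS2, and ~80ns per mount crossing. For VFS2 the full-range fit gives ~39ns per component, while the shallow increments (36ns and 35ns) are closer to the ~35ns quoted, so the per-component ratio lands between roughly 5.5x and 6x depending on which estimate is used.

```go
package main

import "fmt"

// ns/op figures copied from the VFS1TmpfsStat and VFS2MemfsStat runs above.
var (
	vfs1Stat       = map[int]float64{1: 497, 2: 676, 3: 904, 8: 1944, 64: 14067, 100: 21700}
	vfs2Stat       = map[int]float64{1: 197, 2: 233, 3: 268, 8: 477, 64: 2592, 100: 4045}
	vfs2MountStat1 = 317.0 // VFS2MemfsMountStat/1
)

// endpointFit assumes cost = constant + perComponent*depth and solves for both
// from the depth-1 and depth-100 runs.
func endpointFit(ns map[int]float64) (constant, perComponent float64) {
	perComponent = (ns[100] - ns[1]) / 99
	return ns[1] - perComponent, perComponent
}

func main() {
	c1, p1 := endpointFit(vfs1Stat)
	c2, p2 := endpointFit(vfs2Stat)
	fmt.Printf("VFS1: constant ~%.0f ns, per component ~%.0f ns\n", c1, p1)
	fmt.Printf("VFS2: constant ~%.0f ns, per component ~%.0f ns\n", c2, p2)
	// Shallow-depth increments for VFS2.
	fmt.Printf("VFS2 depth 1->2: %.0f ns, 2->3: %.0f ns\n",
		vfs2Stat[2]-vfs2Stat[1], vfs2Stat[3]-vfs2Stat[2])
	fmt.Printf("per-component ratio (endpoint fit): %.1fx\n", p1/p2)
	// MountStat/1 does about the same path work as Stat/2, plus one mount crossing.
	fmt.Printf("VFS2 mount crossing: ~%.0f ns\n", vfs2MountStat1-vfs2Stat[2])
}
```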
|