Age | Commit message (Collapse) | Author |
|
Updates #1663
PiperOrigin-RevId: 333539293
|
|
PiperOrigin-RevId: 330554450
|
|
This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and
vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted
soon anyway.
The following structs now use the new tool, with leak check enabled:
devpts:rootInode
fuse:inode
kernfs:Dentry
kernfs:dir
kernfs:readonlyDir
kernfs:StaticDirectory
proc:fdDirInode
proc:fdInfoDirInode
proc:subtasksInode
proc:taskInode
proc:tasksInode
vfs:FileDescription
vfs:MountNamespace
vfs:Filesystem
sys:dir
kernel:FSContext
kernel:ProcessGroup
kernel:Session
shm:Shm
mm:aioMappable
mm:SpecialMappable
transport:queue
And the following use the template, but because they currently are not leak
checked, a TODO is left instead of enabling leak check in this patch:
kernel:FDTable
tun:tunEndpoint
Updates #1486.
PiperOrigin-RevId: 328460377
|
|
Our "Preconditions:" blocks are very useful to determine the input invariants,
but they are bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).
I've reformatted all of the cases to fit in simple rules:
1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.
This format has been added to the style guide.
I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.
PiperOrigin-RevId: 327687465
|
|
context is passed to DecRef() and Release() which is
needed for SO_LINGER implementation.
PiperOrigin-RevId: 324672584
|
|
Also refactor HandleDeletion().
Updates #1479.
PiperOrigin-RevId: 317989000
|
|
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.
Fixes #2324.
PiperOrigin-RevId: 308949084
|
|
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.
Updates #1200.
PiperOrigin-RevId: 308828161
|
|
PiperOrigin-RevId: 308143529
|
|
This change:
- Drastically simplifies the synchronization model: filesystem structure is
both implementation-defined and implementation-synchronized.
- Allows implementations of vfs.DentryImpl to use implementation-specific
dentry types, reducing casts during path traversal.
- Doesn't require dentries representing non-directory files to waste space on a
map of children.
- Allows dentry revalidation and mount lookup to be correctly ordered (fixed
FIXME in fsimpl/gofer/filesystem.go).
- Removes the need to have two separate maps in gofer.dentry
(dentry.vfsd.children and dentry.negativeChildren) for positive and negative
lookups respectively.
//pkg/sentry/fsimpl/tmpfs/benchmark_test.go:
name old time/op new time/op delta
VFS2TmpfsStat/1-112 172ns ± 4% 165ns ± 3% -4.08% (p=0.002 n=9+9)
VFS2TmpfsStat/2-112 199ns ± 3% 195ns ±10% ~ (p=0.132 n=8+9)
VFS2TmpfsStat/3-112 230ns ± 2% 216ns ± 2% -6.15% (p=0.000 n=8+8)
VFS2TmpfsStat/8-112 390ns ± 2% 358ns ± 4% -8.33% (p=0.000 n=9+8)
VFS2TmpfsStat/64-112 2.20µs ± 3% 2.01µs ± 3% -8.48% (p=0.000 n=10+8)
VFS2TmpfsStat/100-112 3.42µs ± 9% 3.08µs ± 2% -9.82% (p=0.000 n=9+8)
VFS2TmpfsMountStat/1-112 278ns ± 1% 286ns ±15% ~ (p=0.712 n=8+10)
VFS2TmpfsMountStat/2-112 311ns ± 4% 298ns ± 2% -4.27% (p=0.000 n=9+8)
VFS2TmpfsMountStat/3-112 339ns ± 3% 330ns ± 9% ~ (p=0.070 n=8+9)
VFS2TmpfsMountStat/8-112 503ns ± 3% 466ns ± 3% -7.38% (p=0.000 n=8+8)
VFS2TmpfsMountStat/64-112 2.53µs ±16% 2.17µs ± 7% -14.19% (p=0.000 n=10+9)
VFS2TmpfsMountStat/100-112 3.60µs ± 4% 3.30µs ± 8% -8.33% (p=0.001 n=8+9)
Updates #1035
PiperOrigin-RevId: 307655892
|
|
As in VFS1, we only support the user.* namespace. Plumbing is added to tmpfs
and goferfs.
Note that because of the slightly different order of checks between VFS2 and
Linux, one of the xattr tests needs to be relaxed slightly.
Fixes #2363.
PiperOrigin-RevId: 305985121
|
|
PiperOrigin-RevId: 305592245
|
|
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.
These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.
PiperOrigin-RevId: 303434063
|
|
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.
Updates #1476.
PiperOrigin-RevId: 303258251
|
|
Only gofer filesystem was calling vfs.CheckSetStat for
vfs.FilesystemImpl.SetStatAt and vfs.FileDescriptionImpl.SetStat.
Updates #1193, #1672, #1197
PiperOrigin-RevId: 301226522
|
|
Note that the raw faccessat system call does not actually take a flags argument;
according to faccessat(2), the glibc wrapper implements the flags by using
fstatat(2). Remove the flag argument that we try to extract from vfs1, which
would just be a garbage value.
Updates #1965
Fixes #2101
PiperOrigin-RevId: 300796067
|
|
This saves one pointer dereference per VFS access.
Updates #1623
PiperOrigin-RevId: 295216176
|
|
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.
PiperOrigin-RevId: 291811289
|
|
- Add FileDescriptionOptions.UseDentryMetadata, which reduces the amount of
boilerplate needed for device FDs and the like between filesystems.
- Switch back to having FileDescription.Init() take references on the Mount and
Dentry; otherwise managing refcounts around failed calls to
OpenDeviceSpecialFile() / Device.Open() is tricky.
PiperOrigin-RevId: 287575574
|
|
- Make FilesystemImpl methods that operate on parent directories require
!rp.Done() (i.e. there is at least one path component to resolve) as
precondition and postcondition (in cases where they do not finish path
resolution due to mount boundary / absolute symlink), and require that they
do not need to follow the last path component (the file being created /
deleted) as a symlink. Check for these in VFS.
- Add FilesystemImpl.GetParentDentryAt(), which is required to obtain the old
parent directory for VFS.RenameAt(). (Passing the Dentry to be renamed
instead has the wrong semantics if the file named by the old path is a mount
point since the Dentry will be on the wrong Mount.)
- Update memfs to implement these methods correctly (?), including RenameAt.
- Change fspath.Parse() to allow empty paths (to simplify implementation of
AT_EMPTY_PATH).
- Change vfs.PathOperation to take a fspath.Path instead of a raw pathname;
non-test callers will need to fspath.Parse() pathnames themselves anyway in
order to detect absolute paths and select PathOperation.Start accordingly.
PiperOrigin-RevId: 286934941
|
|
PiperOrigin-RevId: 286281274
|
|
The former is needed for vfs.FileDescription to implement
memmap.MappingIdentity, and the latter is needed to implement getcwd(2).
PiperOrigin-RevId: 285051855
|
|
PiperOrigin-RevId: 284892289
|
|
- Remove the Filesystem argument from DentryImpl.*Ref(); in general DentryImpls
that need the Filesystem for reference counting will probably also need it
for other interface methods that don't plumb Filesystem, so it's easier to
just store a pointer to the filesystem in the DentryImpl.
- Add a pointer to the VirtualFilesystem to Filesystem, which is needed by the
gofer client to disown dentries for cache eviction triggered by dentry
reference count changes.
- Rename FilesystemType.NewFilesystem to GetFilesystem; in some cases (e.g.
sysfs, cgroupfs) it's much cleaner for there to be only one Filesystem that
is used by all mounts, and in at least one case (devtmpfs) it's visibly
incorrect not to do so, so NewFilesystem doesn't always actually create and
return a *new* Filesystem.
- Require callers of FileDescription.Init() to increment Mount/Dentry
references. This is because the gofer client may, in the OpenAt() path, take
a reference on a dentry with 0 references, which is safe due to
synchronization that is outside the scope of this CL, and it would be safer
to still have its implementation of DentryImpl.IncRef() check for an
increment for 0 references in other cases.
- Add FileDescription.TryIncRef. This is used by the gofer client to take
references on "special file descriptions" (FDs for files such as pipes,
sockets, and devices), which use per-FD handles (fids) instead of
dentry-shared handles, for sync() and syncfs().
PiperOrigin-RevId: 282473364
|
|
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
|