diff options
author | Jamie Liu <jamieliu@google.com> | 2019-07-18 15:09:14 -0700 |
---|---|---|
committer | gVisor bot <gvisor-bot@google.com> | 2019-07-18 15:10:29 -0700 |
commit | 163ab5e9bab4f14923433967656d20f169d0f904 (patch) | |
tree | 5e51b1573e48fe87fe0e277a32f13c78b0c2f058 /pkg/sentry/vfs/options.go | |
parent | 6f7e2bb388cb29a355dece8921671c0085f53ea9 (diff) |
Sentry virtual filesystem, v2
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
Diffstat (limited to 'pkg/sentry/vfs/options.go')
-rw-r--r-- | pkg/sentry/vfs/options.go | 123 |
1 files changed, 123 insertions, 0 deletions
diff --git a/pkg/sentry/vfs/options.go b/pkg/sentry/vfs/options.go new file mode 100644 index 000000000..187e5410c --- /dev/null +++ b/pkg/sentry/vfs/options.go @@ -0,0 +1,123 @@ +// Copyright 2019 The gVisor Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package vfs + +import ( + "gvisor.dev/gvisor/pkg/abi/linux" +) + +// GetDentryOptions contains options to VirtualFilesystem.GetDentryAt() and +// FilesystemImpl.GetDentryAt(). +type GetDentryOptions struct { + // If CheckSearchable is true, FilesystemImpl.GetDentryAt() must check that + // the returned Dentry is a directory for which creds has search + // permission. + CheckSearchable bool +} + +// MkdirOptions contains options to VirtualFilesystem.MkdirAt() and +// FilesystemImpl.MkdirAt(). +type MkdirOptions struct { + // Mode is the file mode bits for the created directory. + Mode uint16 +} + +// MknodOptions contains options to VirtualFilesystem.MknodAt() and +// FilesystemImpl.MknodAt(). +type MknodOptions struct { + // Mode is the file type and mode bits for the created file. + Mode uint16 + + // If Mode specifies a character or block device special file, DevMajor and + // DevMinor are the major and minor device numbers for the created device. + DevMajor uint32 + DevMinor uint32 +} + +// OpenOptions contains options to VirtualFilesystem.OpenAt() and +// FilesystemImpl.OpenAt(). +type OpenOptions struct { + // Flags contains access mode and flags as specified for open(2). + // + // FilesystemImpls is reponsible for implementing the following flags: + // O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, O_CREAT, O_DIRECT, O_DSYNC, + // O_EXCL, O_NOATIME, O_NOCTTY, O_NONBLOCK, O_PATH, O_SYNC, O_TMPFILE, and + // O_TRUNC. VFS is responsible for handling O_DIRECTORY, O_LARGEFILE, and + // O_NOFOLLOW. VFS users are responsible for handling O_CLOEXEC, since file + // descriptors are mostly outside the scope of VFS. + Flags uint32 + + // If FilesystemImpl.OpenAt() creates a file, Mode is the file mode for the + // created file. + Mode uint16 +} + +// ReadOptions contains options to FileDescription.PRead(), +// FileDescriptionImpl.PRead(), FileDescription.Read(), and +// FileDescriptionImpl.Read(). +type ReadOptions struct { + // Flags contains flags as specified for preadv2(2). + Flags uint32 +} + +// RenameOptions contains options to VirtualFilesystem.RenameAt() and +// FilesystemImpl.RenameAt(). +type RenameOptions struct { + // Flags contains flags as specified for renameat2(2). + Flags uint32 +} + +// SetStatOptions contains options to VirtualFilesystem.SetStatAt(), +// FilesystemImpl.SetStatAt(), FileDescription.SetStat(), and +// FileDescriptionImpl.SetStat(). +type SetStatOptions struct { + // Stat is the metadata that should be set. Only fields indicated by + // Stat.Mask should be set. + // + // If Stat specifies that a timestamp should be set, + // FilesystemImpl.SetStatAt() and FileDescriptionImpl.SetStat() must + // special-case StatxTimestamp.Nsec == UTIME_NOW as described by + // utimensat(2); however, they do not need to check for StatxTimestamp.Nsec + // == UTIME_OMIT (VFS users must unset the corresponding bit in Stat.Mask + // instead). + Stat linux.Statx +} + +// StatOptions contains options to VirtualFilesystem.StatAt(), +// FilesystemImpl.StatAt(), FileDescription.Stat(), and +// FileDescriptionImpl.Stat(). +type StatOptions struct { + // Mask is the set of fields in the returned Statx that the FilesystemImpl + // or FileDescriptionImpl should provide. Bits are as in linux.Statx.Mask. + // + // The FilesystemImpl or FileDescriptionImpl may return fields not + // requested in Mask, and may fail to return fields requested in Mask that + // are not supported by the underlying filesystem implementation, without + // returning an error. + Mask uint32 + + // Sync specifies the synchronization required, and is one of + // linux.AT_STATX_SYNC_AS_STAT (which is 0, and therefore the default), + // linux.AT_STATX_SYNC_FORCE_SYNC, or linux.AT_STATX_SYNC_DONT_SYNC. + Sync uint32 +} + +// WriteOptions contains options to FileDescription.PWrite(), +// FileDescriptionImpl.PWrite(), FileDescription.Write(), and +// FileDescriptionImpl.Write(). +type WriteOptions struct { + // Flags contains flags as specified for pwritev2(2). + Flags uint32 +} |