Age | Commit message (Collapse) | Author |
|
fsimpl is the keeper of all filesystem implementations in VFS2.
PiperOrigin-RevId: 262617869
|
|
Added benchmark tests which emulate memfs benchmarks.
Stat benchmarks
BenchmarkVFS2Ext4fsStat/1-12 10000000 145 ns/op
BenchmarkVFS2Ext4fsStat/2-12 10000000 170 ns/op
BenchmarkVFS2Ext4fsStat/3-12 10000000 202 ns/op
BenchmarkVFS2Ext4fsStat/8-12 3000000 374 ns/op
BenchmarkVFS2Ext4fsStat/64-12 500000 2159 ns/op
BenchmarkVFS2Ext4fsStat/100-12 300000 3459 ns/op
BenchmarkVFS1TmpfsStat/1-12 5000000 348 ns/op
BenchmarkVFS1TmpfsStat/2-12 3000000 487 ns/op
BenchmarkVFS1TmpfsStat/3-12 2000000 655 ns/op
BenchmarkVFS1TmpfsStat/8-12 1000000 1365 ns/op
BenchmarkVFS1TmpfsStat/64-12 200000 9565 ns/op
BenchmarkVFS1TmpfsStat/100-12 100000 15158 ns/op
BenchmarkVFS2MemfsStat/1-12 10000000 133 ns/op
BenchmarkVFS2MemfsStat/2-12 10000000 155 ns/op
BenchmarkVFS2MemfsStat/3-12 10000000 182 ns/op
BenchmarkVFS2MemfsStat/8-12 5000000 310 ns/op
BenchmarkVFS2MemfsStat/64-12 1000000 1659 ns/op
BenchmarkVFS2MemfsStat/100-12 500000 2787 ns/op
Mount Stat benchmarks
BenchmarkVFS2ExtfsMountStat/1-12 5000000 245 ns/op
BenchmarkVFS2ExtfsMountStat/2-12 5000000 266 ns/op
BenchmarkVFS2ExtfsMountStat/3-12 5000000 304 ns/op
BenchmarkVFS2ExtfsMountStat/8-12 3000000 456 ns/op
BenchmarkVFS2ExtfsMountStat/64-12 500000 2308 ns/op
BenchmarkVFS2ExtfsMountStat/100-12 300000 3482 ns/op
BenchmarkVFS1TmpfsMountStat/1-12 3000000 488 ns/op
BenchmarkVFS1TmpfsMountStat/2-12 2000000 658 ns/op
BenchmarkVFS1TmpfsMountStat/3-12 2000000 806 ns/op
BenchmarkVFS1TmpfsMountStat/8-12 1000000 1514 ns/op
BenchmarkVFS1TmpfsMountStat/64-12 100000 10037 ns/op
BenchmarkVFS1TmpfsMountStat/100-12 100000 15280 ns/op
BenchmarkVFS2MemfsMountStat/1-12 10000000 212 ns/op
BenchmarkVFS2MemfsMountStat/2-12 5000000 232 ns/op
BenchmarkVFS2MemfsMountStat/3-12 5000000 264 ns/op
BenchmarkVFS2MemfsMountStat/8-12 3000000 390 ns/op
BenchmarkVFS2MemfsMountStat/64-12 1000000 1813 ns/op
BenchmarkVFS2MemfsMountStat/100-12 500000 2812 ns/op
PiperOrigin-RevId: 262477158
|
|
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.
PiperOrigin-RevId: 262460640
|
|
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutines may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (e.g. an incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.
This CL defers resuming all protocol goroutines until the end of
restore.
PiperOrigin-RevId: 262409429
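
The deferral described above can be sketched as a queue of resume functions that runs only once loading is complete. This is a minimal illustration, not the actual CL: `restoreContext`, `deferResume`, `finish`, and the event names are all hypothetical, and the queued work runs synchronously here so the ordering is easy to see (real code would start a goroutine inside the queued function).

```go
package main

import "fmt"

// restoreContext is a hypothetical helper that collects work to run only
// after every object has been loaded.
type restoreContext struct {
	deferred []func()
}

// deferResume queues f instead of running it during load.
func (r *restoreContext) deferResume(f func()) {
	r.deferred = append(r.deferred, f)
}

// finish runs all queued functions; call it once loading is complete.
func (r *restoreContext) finish() {
	for _, f := range r.deferred {
		f()
	}
	r.deferred = nil
}

// restoreOrder simulates a restore and returns the order of events.
func restoreOrder() []string {
	var events []string
	rc := &restoreContext{}

	// Loading the endpoint only queues its protocol work.
	events = append(events, "load endpoint")
	rc.deferResume(func() { events = append(events, "resume protocol work") })

	// A resource the protocol work depends on is loaded later.
	events = append(events, "load route table")

	// Only now is it safe to let protocol work run.
	rc.finish()
	return events
}

func main() {
	fmt.Println(restoreOrder())
}
```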
|
|
PiperOrigin-RevId: 262402929
|
|
- Unexport Filesystem/Dentry/Inode.
- Support SEEK_CUR in directoryFD.Seek().
- Hold Filesystem.mu before touching directoryFD.off in
directoryFD.Seek().
- Remove deleted Dentries from their parent directory.childLists.
- Remove invalid FIXMEs.
PiperOrigin-RevId: 262400633
|
|
PiperOrigin-RevId: 262264674
|
|
PiperOrigin-RevId: 262249166
|
|
PiperOrigin-RevId: 262242410
|
|
PiperOrigin-RevId: 262226761
|
|
- This also gets rid of pipes for now because pipes do not have VFS2-specific
support yet.
- Added file path resolution logic.
- Fixes testing infrastructure.
- Does not include unit tests yet.
PiperOrigin-RevId: 262213950
|
|
If there is an offset, the file must support pread/pwrite. See
fs/splice.c:do_splice.
PiperOrigin-RevId: 261944932
|
|
syscall.EPOLLET is defined with different values on amd64 and
arm64 (-0x80000000 on amd64 and 0x80000000 on arm64), while unix.EPOLLET
is unified to 0x80000000 on both (golang/go#5328). ref #63
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Id97d075c4e79d86a2ea3227ffbef02d8b00ffbb8
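
To illustrate why the two values are a problem, the sketch below shows that they differ as integers but describe the same 32-bit event mask. The constants are copied from the description above rather than read from the syscall or unix packages, and `eventMask` is a hypothetical helper.

```go
package main

import "fmt"

// Assumed values taken from the commit description, not from any package.
const (
	signedEPOLLET   = -0x80000000 // the amd64 syscall.EPOLLET form
	unsignedEPOLLET = 0x80000000  // the unified form
)

// eventMask truncates either form to the 32-bit event mask.
func eventMask(v int64) uint32 {
	return uint32(v)
}

func main() {
	// As plain integers the two constants are different values.
	fmt.Println(signedEPOLLET == unsignedEPOLLET)
	// As 32-bit masks they have the same bit pattern.
	fmt.Printf("%#x %#x\n", eventMask(signedEPOLLET), eventMask(unsignedEPOLLET))
}
```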
|
|
PiperOrigin-RevId: 261413396
|
|
(Don't worry, this is mostly tests.)
Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enables tcsetpgrp().
Next steps are to actually turn terminal-generated control characters (e.g. ^C)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.
PiperOrigin-RevId: 261387276
|
|
PiperOrigin-RevId: 261373749
|
|
We can get the mount namespace from the CreateProcessArgs in all cases where we
need it. This also gets rid of kernel.Destroy method, since the only thing it
was doing was DecRefing the mounts.
Removing the need to call kernel.SetRootMountNamespace also allowed for some
more simplifications in the container fs setup code.
PiperOrigin-RevId: 261357060
|
|
PiperOrigin-RevId: 261203674
|
|
This is the source of many warnings like:
AtomicRefCount 0x7f5ff84e3500 owned by "fs.Inode" garbage collected with ref count of 1 (want 0)
PiperOrigin-RevId: 261197093
|
|
Export some readily-available fields for TCP_INFO and stub out the rest.
PiperOrigin-RevId: 261191548
|
|
Implements support for RTM_GETROUTE requests for netlink sockets.
Fixes #507
PiperOrigin-RevId: 261051045
|
|
This is initialized lazily on the first unimplemented
syscall. Without the sync.Once, this is racy.
PiperOrigin-RevId: 260971758
|
|
PiperOrigin-RevId: 260851452
|
|
It gets rid of holding the io.Reader offset as state (which is already held by
the vfs.FileDescriptor struct). It is also odd to use an io.Reader because we
use io.ReaderAt to interact with the device, so making an io.ReaderAt wrapper
makes more sense.
Most importantly, it gets rid of the complexity of extracting the file reader
from a regular file implementation and then using it. Now we can just use the
regular file implementation as a reader which is more intuitive.
PiperOrigin-RevId: 260846927
|
|
Also adds stress tests for block map reader and intensifies extent reader tests.
PiperOrigin-RevId: 260838177
|
|
PiperOrigin-RevId: 260783254
|
|
Adds feature to launch from an open host FD instead of a binary_path.
The FD should point to a valid executable and most likely be statically
compiled. If the executable is not statically compiled, the loader will
search along the interpreter paths, which must be able to be resolved in
the Sandbox's file system or start will fail.
PiperOrigin-RevId: 260756825
|
|
This provides the following benefits:
- We can now use pkg/fd package which does not take ownership
of the file descriptor. So it does not close the fd when garbage collected.
This reduces scope of errors from unexpected garbage collection of io.File.
- It enforces the offset parameter in every read call.
It does not affect the fd offset nor is it affected by it. Hence reducing
scope of error of using stale offsets when reading.
- We do not need to serialize the usage of any global file descriptor anymore.
So this drops the mutual exclusion requirement, reducing complexity and
contention.
PiperOrigin-RevId: 260635174
|
|
Allocate a larger memory buffer and combine multiple copies into one copy,
to reduce the number of copies from kernel memory to user memory.
Signed-off-by: Hang Su <darcy.sh@antfin.com>
|
|
PiperOrigin-RevId: 260629559
|
|
PiperOrigin-RevId: 260624470
|
|
This introduces two new types of Emitters:
1. MultiEmitter, which will forward events to other registered Emitters, and
2. RateLimitedEmitter, which will forward events to a wrapped Emitter, subject
to given rate limits.
The methods in the eventchannel package itself act like a multiEmitter, but are
not actually an Emitter. Now we have a DefaultEmitter, and the methods in
eventchannel simply forward calls to the DefaultEmitter.
The unimplemented syscall handler now uses a RateLimitedEmitter that wraps the
DefaultEmitter.
PiperOrigin-RevId: 260612770
|
|
PiperOrigin-RevId: 260220279
|
|
PiperOrigin-RevId: 260047477
|
|
PiperOrigin-RevId: 259865366
|
|
PiperOrigin-RevId: 259856442
|
|
PiperOrigin-RevId: 259835948
|
|
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.
PiperOrigin-RevId: 259807693
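
As an illustration of carrying a prefix length alongside an address, the sketch below uses the standard net/netip package. The `protocolAddress` struct and `parseProtocolAddress` helper are hypothetical and are not the netstack ProtocolAddress type mentioned above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// protocolAddress is a hypothetical pairing of an address with the length
// of its subnet prefix.
type protocolAddress struct {
	Addr      netip.Addr
	PrefixLen int
}

// parseProtocolAddress splits CIDR notation into the address, its prefix
// length, and the subnet the address belongs to.
func parseProtocolAddress(cidr string) (protocolAddress, netip.Prefix, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return protocolAddress{}, netip.Prefix{}, err
	}
	pa := protocolAddress{Addr: p.Addr(), PrefixLen: p.Bits()}
	return pa, p.Masked(), nil
}

func main() {
	pa, subnet, err := parseProtocolAddress("192.168.1.5/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(pa.Addr, pa.PrefixLen, subnet)
}
```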
|
|
Previously, all containers in a sandbox shared a single PID
namespace, so a container could see the processes of another
container in the same sandbox.
This patch uses a separate PID namespace for each container.
Signed-off-by: chris.zn <chris.zn@antfin.com>
|
|
PiperOrigin-RevId: 259666476
|
|
PiperOrigin-RevId: 259657917
|
|
PiperOrigin-RevId: 259628657
|
|
This keeps all container filesystems completely separate from each other
(including from the root container filesystem), and allows us to get rid of the
"__runsc_containers__" directory.
It also simplifies container startup/teardown as we don't have to muck around
in the root container's filesystem.
PiperOrigin-RevId: 259613346
|
|
PiperOrigin-RevId: 259427074
|
|
m.available.Signal() has to be called under m.mu.RLock, otherwise it can
race with machine.Get:
m.Get | m.Put
-------------------------------------
m.mu.Lock() |
Searching available vcpu|
| m.available.Signal()
m.available.Wait |
PiperOrigin-RevId: 259394051
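
The lost-wakeup race in the diagram above is the classic condition-variable hazard: a Signal that arrives between a waiter's check and its Wait is lost. A minimal sketch of the safe pattern follows. It is a simplification with hypothetical names (`pool`, `get`, `put`) and a plain sync.Mutex, whereas the commit signals under m.mu.RLock.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified stand-in for a vCPU pool.
type pool struct {
	mu        sync.Mutex
	available *sync.Cond
	free      int
}

func newPool(n int) *pool {
	p := &pool{free: n}
	p.available = sync.NewCond(&p.mu)
	return p
}

// get blocks until a slot is free. The check and the Wait happen under mu,
// and Wait releases mu atomically while it sleeps.
func (p *pool) get() {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.free == 0 {
		p.available.Wait()
	}
	p.free--
}

// put releases a slot. The state change and the Signal both happen while
// holding mu, so they cannot slip in between a getter's check and its Wait.
func (p *pool) put() {
	p.mu.Lock()
	p.free++
	p.available.Signal()
	p.mu.Unlock()
}

func main() {
	p := newPool(1)
	p.get() // take the only slot

	done := make(chan struct{})
	go func() {
		p.get() // blocks until put is called
		close(done)
	}()

	p.put()
	<-done
	fmt.Println("waiter woke up")
}
```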
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 258996346
|
|
PiperOrigin-RevId: 258859507
|
|
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
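
The inferences above can be reproduced approximately with a two-point linear fit over the /1 and /100 rows of the -6 results. This is one way to arrive at such figures, not necessarily how they were derived; `sample` and `linearCost` are hypothetical names, and a two-point estimate lands near, not exactly on, the rounded numbers quoted.

```go
package main

import "fmt"

// sample is one benchmark row: path component count and measured ns/op.
type sample struct {
	components int
	nsPerOp    float64
}

// linearCost estimates the per-component and constant cost from two
// samples, assuming cost grows linearly with the number of components.
func linearCost(lo, hi sample) (perComponent, constant float64) {
	perComponent = (hi.nsPerOp - lo.nsPerOp) / float64(hi.components-lo.components)
	constant = lo.nsPerOp - perComponent*float64(lo.components)
	return perComponent, constant
}

func main() {
	// Numbers copied from the benchmark results above.
	vfs1Per, vfs1Const := linearCost(sample{1, 497}, sample{100, 21700})
	vfs2Per, vfs2Const := linearCost(sample{1, 197}, sample{100, 4045})

	fmt.Printf("VFS1: %.0f ns/component, %.0f ns constant\n", vfs1Per, vfs1Const)
	fmt.Printf("VFS2: %.0f ns/component, %.0f ns constant\n", vfs2Per, vfs2Const)

	// Mount crossing: MemfsMountStat/1 minus MemfsStat/2.
	fmt.Printf("VFS2 mount crossing: %d ns\n", 317-233)
}
```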
|
|
|