Age | Commit message (Collapse) | Author |
|
PiperOrigin-RevId: 258859507
|
|
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
|
|
copyMu is required to read child.overlay.upper.
PiperOrigin-RevId: 258662209
|
|
PiperOrigin-RevId: 258657913
|
|
PiperOrigin-RevId: 258657776
|
|
PiperOrigin-RevId: 258645957
|
|
PiperOrigin-RevId: 258643966
|
|
PiperOrigin-RevId: 258635459
|
|
tcpdump creates these.
PiperOrigin-RevId: 258611829
|
|
PiperOrigin-RevId: 258607547
|
|
We were invalidating the wrong overlayEntry in rename and missing invalidation
in rename and remove if lower exists.
PiperOrigin-RevId: 258604685
|
|
PiperOrigin-RevId: 258479216
|
|
This proc file reports the stats of interfaces. We could use ifconfig
command to check the result.
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827
COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/
PiperOrigin-RevId: 258303936
|
|
Now we call FUTEX_WAKE with ^uintptr(0) of waiters, but in this case only one
waiter will be waked up. If we want to wake up all of them, the number of
waiters has to be set to math.MaxInt32.
PiperOrigin-RevId: 258285286
|
|
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.
PiperOrigin-RevId: 257903078
|
|
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.
PiperOrigin-RevId: 257859112
|
|
PiperOrigin-RevId: 257855479
|
|
These are filesystem-specific, and filesystems are allowed to return ENOTSUP if
they are not supported.
PiperOrigin-RevId: 257813477
|
|
|
|
Actual implementation to follow, but this will satisfy applications that
want it to just exist.
|
|
The image is of size 64Kb which supports 64 1k blocks
and 16 inodes. This is the smallest size mkfs.ext4 works with.
Added README.md documenting how this was created and included
all files on the device under assets.
PiperOrigin-RevId: 257712672
|
|
Renamed ext4 to ext since we are targeting ext(2/3/4).
Removed fs.go since we are targeting VFS2.
Added ext.go with filesystem struct.
PiperOrigin-RevId: 257689775
|
|
A userspace process (CPL=3) can access an i/o port if the bit corresponding to
the port is set to 0 in the I/O permission bitmap.
Configure the I/O permission bitmap address beyond the last valid byte in the
TSS so access to all i/o ports is blocked.
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Change-Id: I3df76980c3735491db768f7210e71703f86bb989
PiperOrigin-RevId: 257336518
|
|
PiperOrigin-RevId: 257314911
|
|
PiperOrigin-RevId: 257297820
|
|
PiperOrigin-RevId: 257293198
|
|
The error set in the loop in createAt was being masked
by other errors declared with ":=". This allowed an
ErrResolveViaReadlink error to escape, which can cause
a sentry panic.
Added test case which repros without the fix.
PiperOrigin-RevId: 257061767
|
|
PiperOrigin-RevId: 257037608
|
|
PiperOrigin-RevId: 257010414
|
|
PiperOrigin-RevId: 256494243
|
|
PiperOrigin-RevId: 256481284
|
|
PiperOrigin-RevId: 256453827
|
|
PiperOrigin-RevId: 256433283
|
|
PiperOrigin-RevId: 256319059
|
|
BounceToKernel will make vCPU quit from guest ring3 to guest ring0, but
vCPUWaiter is not cleared when we unlock the vCPU, when next time this vCPU
enter guest mode ring3, vCPU may enter guest mode with vCPUWaiter bit setted,
this will cause the following BounceToKernel to this vCPU hangs at
waitUntilNot.
Halt may workaroud this issue, because halt process will reset vCPU status into
vCPUUser, and notify all waiter for vCPU state change, but if there is no
exception or syscall in this period, BounceToKernel will hang at waitUntilNot.
PiperOrigin-RevId: 256299660
|
|
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire
package to itself and didn't serve much use (it was freely cast between types,
and served as more of an annoyance than providing any protection.)
Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup
operation, and 10-15 ns per concurrent lookup operation of savings.
This also fixes two tangential usage issues with the FDMap. Namely, non-atomic
use of NewFDFrom and associated calls to Remove (that are both racy and fail to
drop the reference on the underlying file.)
PiperOrigin-RevId: 256285890
|
|
Adds support level documentation for all syscalls. Removes the Undocumented
utility function to discourage usage while leaving SupportUndocumented as the
default support level for Syscall structs.
PiperOrigin-RevId: 256281927
|
|
PiperOrigin-RevId: 256234390
|
|
fileOpAt holds references on the Dirents passed as arguments to the callback,
and drops refs when finished, so we don't need to DecRef those Dirents
ourselves
However, all Dirents that we get from FindInode/FindLink must be DecRef'd.
This CL cleans up the ref-counting logic, and fixes some refcount issues in the
process.
PiperOrigin-RevId: 256220882
|
|
It feels like "reticulating splines" is missing from the list of meaningless
syslog messages.
Signed-off-by: Ahmet Alp Balkan <ahmetb@google.com>
|
|
Fix two leaks for connectionless Unix sockets:
* Double connect: Subsequent connects would leak a reference on the previously
connected endpoint.
* Close unconnected: Sockets which were not connected at the time of closure
would leak a reference on their receiver.
PiperOrigin-RevId: 256070451
|
|
This fixes the case when an app tries to create a file that already exists, and
is a symlink to itself. A test was added.
PiperOrigin-RevId: 256044811
|
|
PiperOrigin-RevId: 255711454
|
|
These are unfortunately unused and unmaintained. They can be brought back in
the future if need requires it.
PiperOrigin-RevId: 255697132
|
|
These syscalls require filesystem support that gVisor does not provide, and is
not planning to implement. Their absense should not trigger an event.
PiperOrigin-RevId: 255692871
|
|
PiperOrigin-RevId: 255687771
|
|
PiperOrigin-RevId: 255679453
|
|
Right now, if we can't create a stub process, we will see this error:
panic: unable to activate mm: resource temporarily unavailable
It would be better to know the root cause of this "resource temporarily
unavailable".
PiperOrigin-RevId: 255656831
|
|
PiperOrigin-RevId: 255644277
|
|
Readdir of /proc/x/task/ will get direntry entries
from tasks of specified taskgroup. Now the tasks
slice is unsorted, use sort.SearchInts search entry
from the slice may cause infinity loops.
The fix is sort the slice before search.
This issue could be easily reproduced via following
steps, revise Readdir in pkg/sentry/fs/proc/task.go,
force set taskInts into test slice
[]int{1, 11, 7, 5, 10, 6, 8, 3, 9, 2, 4},
then run docker image and run ls /proc/1/task, the
command will cause infinity loops.
|