Age | Commit message (Collapse) | Author |
|
PiperOrigin-RevId: 371015541
|
|
PiperOrigin-RevId: 369686285
|
|
While using remote-validation, the vast majority of time spent during
FS operations is re-walking the path to check for modifications and
then closing the file given that in most cases it has not been
modified externally.
This change introduces a new 9P message called MultiGetAttr which bulks
query attributes of several files in one shot. The returned attributes are
then used to update cached dentries before they are walked. File attributes
are updated for files that still exist. Dentries that have been deleted are
removed from the cache. And negative cache entries are removed if a new
file/directory was created externally. Similarly, synthetic dentries are
replaced if a file/directory is created externally.
The bulk update needs to be carefull not to follow symlinks, cross mount
points, because the gofer doesn't know how to resolve symlinks and where
mounts points are located. It also doesn't walk to the parent ("..") to
avoid deadlocks.
Here are the results:
Workload VFS1 VFS2 Change
bazel action 115s 70s 28.8s
Stat/100 11,043us 7,623us 974us
Updates #1638
PiperOrigin-RevId: 369325957
|
|
These host calls are needed for Verity fs to generate/verify hashes.
PiperOrigin-RevId: 364598180
|
|
PiperOrigin-RevId: 361689477
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
Syzkaller hosts contains many audit messages that runsc tries
to call the clock_nanosleep syscall.
PiperOrigin-RevId: 359331413
|
|
rt_sigaction may be called by Go runtime when trying to panic:
https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;drc=ed3e4afa12d655a0c5606bcf3dd4e1cdadcb1476;bpv=1;bpt=1;l=780?q=rt_sigaction&ss=go
Updates #5038
PiperOrigin-RevId: 357013186
|
|
Some versions of the Go runtime call getcpu(), so add it for compatibility. The
hostcpu package already uses getcpu() on arm64.
PiperOrigin-RevId: 355717757
|
|
abi package is to be used by the Sentry to implement the Linux ABI.
Code dealing with the host should use x/sys/unix.
PiperOrigin-RevId: 353272679
|
|
Previously fsgofer was skipping chown call if the uid and gid
were the same as the current user/group. However, when setgid
is set, the group may not be the same as the caller. Instead,
compare the actual uid/gid of the file after it has been
created and change ownership only if needed.
Updates #180
PiperOrigin-RevId: 353118733
|
|
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
- Skip chown call in case owner change is not needed
- Skip filepath.Clean() calls when joining paths
- Pass unix.Stat_t by value to reduce runtime.duffcopy calls.
This change allows for better inlining in localFile.walk().
Change Baseline Improvement
BenchmarkWalkOne-6 2912 ns/op 3082 ns/op 5.5%
BenchmarkCreate-6 15915 ns/op 19126 ns/op 16.8%
BenchmarkCreateDiffOwner-6 18795 ns/op 19741 ns/op 4.8%
PiperOrigin-RevId: 347667833
|
|
PiperOrigin-RevId: 333853498
|
|
https://go.googlesource.com/go/+/0941fc3 switches the Go runtime (on amd64)
from using arch_prctl(ARCH_SET_FS) to CLONE_SETTLS to set the TLS.
PiperOrigin-RevId: 333100550
|
|
Neither CLONE_PARENT_SETTID nor CLONE_CHILD_SETTID are used, so these arguments
will always be NULL.
PiperOrigin-RevId: 333085326
|
|
Go does not call arch_prctl(ARCH_GET_FS), nor am I sure it ever did. Drop the
filter.
PiperOrigin-RevId: 332470532
|
|
Gofer panics are suppressed by p9 server and an error
is returned to the caller, making it effectively the
same as returning EROFS.
PiperOrigin-RevId: 332282959
|
|
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.
The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.
- New conditional operations were added to pkg/seccomp to support operations
available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
that syscalls matching the conditionals result in the provided action not
simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
jumps are still not permitted.
Fixes #510
PiperOrigin-RevId: 331938697
|
|
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.
Benchmark results:
BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2: 68109 ns
VFS1: 72507 ns
Updates #1198
PiperOrigin-RevId: 329628461
|
|
Updates #3494
PiperOrigin-RevId: 327548511
|
|
Replace mknod call with mknodat equivalent to protect
against symlink attacks. Also added Mknod tests.
Remove goferfs reliance on gofer to check for file
existence before creating a synthetic entry.
Updates #2923
PiperOrigin-RevId: 327544516
|
|
PiperOrigin-RevId: 327300635
|
|
9P2000.L is silent as to how readdir RPCs interact with directory mutation. The
most performant option is for Treaddir with offset=0 to restart iteration,
avoiding needing to walk+open+clunk a new directory fid between invocations of
getdents64(2), and the VFS2 gofer client assumes this is the case. Make this
actually true for the runsc fsgofer.
Fixes #3344, #3345, #3355
PiperOrigin-RevId: 324090384
|
|
... when it is possible.
The guitar gVisorKernel*Workflow-s runs test with the local execution_method.
In this case, blaze runs test cases locally without sandboxes. This means
that all tests run in the same network namespace. We have a few tests which
use hard-coded network ports and they can fail if one of these port will be
used by someone else or by another test cases.
PiperOrigin-RevId: 323137254
|
|
Implement WalkGetAttr() to reuse the stat that is already
needed for Walk(). In addition, cache file QID, so it
doesn't need to stat the file to compute it.
open(2) time improved by 10%:
Baseline: 6780 ns
Change: 6083 ns
Also fixed file type which was not being set in all places.
PiperOrigin-RevId: 323102560
|
|
Open tries to reuse the control file to save syscalls and
file descriptors when opening a file. However, when the
control file was opened using O_PATH (e.g. no file permission
to open readonly), Open() would not check for it.
PiperOrigin-RevId: 322821729
|
|
This change fixes a few things:
- creating sockets using mknod(2) is supported via vfs2
- fsgofer can create regular files via mknod(2)
- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
Updates #2923
PiperOrigin-RevId: 320074267
|
|
The previous format skipped many important structs that
are pointers, especially for cgroups. Change to print
as json, removing parts of the spec that are not relevant.
Also removed debug message from gofer that can be very
noisy when directories are large.
PiperOrigin-RevId: 316713267
|
|
PiperOrigin-RevId: 313663382
|
|
PiperOrigin-RevId: 305344989
|
|
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.
PiperOrigin-RevId: 293899385
|
|
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g
stack to prevent register corruption. We need to allow this syscall until it is
removed from Go.
PiperOrigin-RevId: 293212935
|
|
PiperOrigin-RevId: 291745021
|
|
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.
Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.
Extended attributes will be implemented in future commits.
PiperOrigin-RevId: 290121300
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
PiperOrigin-RevId: 285012278
|
|
Refer to golang mallocgc(), each time of allocating an object > 32 KB,
a gc will be triggered.
When we do readdir, sentry always passes 65535, which leads to a malloc
of 65535 * sizeof(p9.Direnta) > 32 KB.
Considering we already use slice append, let's avoid defining the
capability for this slide.
Command for test:
Before this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m54.272s
user 0m2.010s
sys 0m1.740s
(CPU usage of Gofer: ~30 cores)
(host)$ perf top -p <pid-of-gofer>
42.57% runsc [.] runtime.gcDrain
23.41% runsc [.] runtime.(*lfstack).pop
9.74% runsc [.] runtime.greyobject
8.06% runsc [.] runtime.(*lfstack).push
4.33% runsc [.] runtime.scanobject
1.69% runsc [.] runtime.findObject
1.12% runsc [.] runtime.findrunnable
0.69% runsc [.] runtime.runqgrab
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 2m10.934s
user 0m0.280s
sys 0m4.260s
(CPU usage of Gofer: ~1 core)
After this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m22.465s
user 0m1.270s
sys 0m1.310s
(CPU usage of Gofer: ~1 core)
$ perf top -p <pid-of-gofer>
20.57% runsc [.] runtime.gcDrain
7.15% runsc [.] runtime.(*lfstack).pop
4.11% runsc [.] runtime.scanobject
3.78% runsc [.] runtime.greyobject
2.78% runsc [.] runtime.(*lfstack).push
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 0m13.338s
user 0m0.190s
sys 0m3.980s
(CPU usage of Gofer: ~0.8 core)
Fixes #898
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.
9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."
open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."
The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703
PiperOrigin-RevId: 278972683
|
|
newfstatat() syscall is not supported on arm64, so we resort
to use the fstatat() syscall.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
|
|
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
|
|
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.
A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.
Updates #235
Updates #1003
[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.
PiperOrigin-RevId: 275558502
|
|
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.
This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.
Updates #235
PiperOrigin-RevId: 274827872
|
|
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer
dereference). Before this, nil-pointer dereferences cause a syscall violation
instead of a panic.
PiperOrigin-RevId: 274028767
|
|
Updates #235
PiperOrigin-RevId: 271475319
|
|
PiperOrigin-RevId: 271235134
|
|
This removes the F_DUPFD_CLOEXEC support for the gofer, previously
required when depending on the STL net package.
|
|
|
|
|
|
Filter installation has been streamlined and functions renamed.
Documentation has been fixed to be standards compliant, and missing
documentation added. gofmt has also been applied to modified files.
|