Age | Commit message (Collapse) | Author |
|
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.
The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.
- New conditional operations were added to pkg/seccomp to support operations
available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
that syscalls matching the conditionals result in the provided action not
simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
jumps are still not permitted.
Fixes #510
PiperOrigin-RevId: 331938697
|
|
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.
Benchmark results:
BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2: 68109 ns
VFS1: 72507 ns
Updates #1198
PiperOrigin-RevId: 329628461
|
|
Updates #3494
PiperOrigin-RevId: 327548511
|
|
Replace mknod call with mknodat equivalent to protect
against symlink attacks. Also added Mknod tests.
Remove goferfs reliance on gofer to check for file
existence before creating a synthetic entry.
Updates #2923
PiperOrigin-RevId: 327544516
|
|
PiperOrigin-RevId: 327300635
|
|
9P2000.L is silent as to how readdir RPCs interact with directory mutation. The
most performant option is for Treaddir with offset=0 to restart iteration,
avoiding needing to walk+open+clunk a new directory fid between invocations of
getdents64(2), and the VFS2 gofer client assumes this is the case. Make this
actually true for the runsc fsgofer.
Fixes #3344, #3345, #3355
PiperOrigin-RevId: 324090384
|
|
... when it is possible.
The guitar gVisorKernel*Workflow-s runs test with the local execution_method.
In this case, blaze runs test cases locally without sandboxes. This means
that all tests run in the same network namespace. We have a few tests which
use hard-coded network ports and they can fail if one of these port will be
used by someone else or by another test cases.
PiperOrigin-RevId: 323137254
|
|
Implement WalkGetAttr() to reuse the stat that is already
needed for Walk(). In addition, cache file QID, so it
doesn't need to stat the file to compute it.
open(2) time improved by 10%:
Baseline: 6780 ns
Change: 6083 ns
Also fixed file type which was not being set in all places.
PiperOrigin-RevId: 323102560
|
|
Open tries to reuse the control file to save syscalls and
file descriptors when opening a file. However, when the
control file was opened using O_PATH (e.g. no file permission
to open readonly), Open() would not check for it.
PiperOrigin-RevId: 322821729
|
|
This change fixes a few things:
- creating sockets using mknod(2) is supported via vfs2
- fsgofer can create regular files via mknod(2)
- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
Updates #2923
PiperOrigin-RevId: 320074267
|
|
The previous format skipped many important structs that
are pointers, especially for cgroups. Change to print
as json, removing parts of the spec that are not relevant.
Also removed debug message from gofer that can be very
noisy when directories are large.
PiperOrigin-RevId: 316713267
|
|
PiperOrigin-RevId: 313663382
|
|
PiperOrigin-RevId: 305344989
|
|
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.
PiperOrigin-RevId: 293899385
|
|
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g
stack to prevent register corruption. We need to allow this syscall until it is
removed from Go.
PiperOrigin-RevId: 293212935
|
|
PiperOrigin-RevId: 291745021
|
|
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.
Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.
Extended attributes will be implemented in future commits.
PiperOrigin-RevId: 290121300
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
PiperOrigin-RevId: 285012278
|
|
Refer to golang mallocgc(), each time of allocating an object > 32 KB,
a gc will be triggered.
When we do readdir, sentry always passes 65535, which leads to a malloc
of 65535 * sizeof(p9.Direnta) > 32 KB.
Considering we already use slice append, let's avoid defining the
capability for this slide.
Command for test:
Before this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m54.272s
user 0m2.010s
sys 0m1.740s
(CPU usage of Gofer: ~30 cores)
(host)$ perf top -p <pid-of-gofer>
42.57% runsc [.] runtime.gcDrain
23.41% runsc [.] runtime.(*lfstack).pop
9.74% runsc [.] runtime.greyobject
8.06% runsc [.] runtime.(*lfstack).push
4.33% runsc [.] runtime.scanobject
1.69% runsc [.] runtime.findObject
1.12% runsc [.] runtime.findrunnable
0.69% runsc [.] runtime.runqgrab
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 2m10.934s
user 0m0.280s
sys 0m4.260s
(CPU usage of Gofer: ~1 core)
After this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m22.465s
user 0m1.270s
sys 0m1.310s
(CPU usage of Gofer: ~1 core)
$ perf top -p <pid-of-gofer>
20.57% runsc [.] runtime.gcDrain
7.15% runsc [.] runtime.(*lfstack).pop
4.11% runsc [.] runtime.scanobject
3.78% runsc [.] runtime.greyobject
2.78% runsc [.] runtime.(*lfstack).push
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 0m13.338s
user 0m0.190s
sys 0m3.980s
(CPU usage of Gofer: ~0.8 core)
Fixes #898
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.
9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."
open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."
The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703
PiperOrigin-RevId: 278972683
|
|
newfstatat() syscall is not supported on arm64, so we resort
to use the fstatat() syscall.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
|
|
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
|
|
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.
A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.
Updates #235
Updates #1003
[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.
PiperOrigin-RevId: 275558502
|
|
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.
This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.
Updates #235
PiperOrigin-RevId: 274827872
|
|
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer
dereference). Before this, nil-pointer dereferences cause a syscall violation
instead of a panic.
PiperOrigin-RevId: 274028767
|
|
Updates #235
PiperOrigin-RevId: 271475319
|
|
PiperOrigin-RevId: 271235134
|
|
This removes the F_DUPFD_CLOEXEC support for the gofer, previously
required when depending on the STL net package.
|
|
|
|
|
|
Filter installation has been streamlined and functions renamed.
Documentation has been fixed to be standards compliant, and missing
documentation added. gofmt has also been applied to modified files.
|
|
This commit allows the use of the `--fsgofer-host-uds-allowed` flag to
enable mounting sockets and add the appropriate seccomp filters.
|
|
PiperOrigin-RevId: 268845090
|
|
|
|
|
|
This commit further restricts the seccomp filters required for Gofer
access ot Unix Domain Sockets (UDS).
|
|
This commit adds support for detecting the socket file type, connecting
to a Unix Domain Socket, and providing bidirectional communication
(without file descriptor transfer support).
|
|
PiperOrigin-RevId: 265478578
|
|
PiperOrigin-RevId: 263189654
|
|
syscall.POLL is not supported on arm64, using syscall.PPOLL
to support both the x86 and arm64. refs #63
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I2c81a063d3ec4e7e6b38fe62f17a0924977f505e
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/543 from xiaobo55x:master ba598263fd3748d1addd48e4194080aa12085164
PiperOrigin-RevId: 260752049
|
|
Addresses obvious typos, in the documentation only.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
|
|
When we reopen file by path, we can't be sure that
we will open exactly the same file. The file can be
deleted and another one with the same name can be
created.
PiperOrigin-RevId: 254898594
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
This more directly matches what Linux does with unsupported
nodes.
PiperOrigin-RevId: 248780425
Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec
|
|
Closes #225
PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
|
|
Fixes #219
PiperOrigin-RevId: 246568639
Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
The caller must call Readdir() at least twice to detect
EOF. The old code was always restarting the directory
search and then skipping elements already seen, effectively
doubling the cost to read a directory. The code now
remembers the last offset and doesn't reposition the cursor
if next request comes at the same offset.
PiperOrigin-RevId: 244957816
Change-Id: If21a8dc68b76614adbcf4301439adfda40f2643f
|
|
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switchs to use fd.FD which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.
PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
|