Age | Commit message (Collapse) | Author |
|
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.
Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.
Extended attributes will be implemented in future commits.
PiperOrigin-RevId: 290121300
|
|
Updates #231
PiperOrigin-RevId: 289897881
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
PiperOrigin-RevId: 288779416
|
|
...enabling us to remove the "CreateNamedLoopbackNIC" variant of
CreateNIC and all the plumbing to connect it through to where the value
is read in FindRoute.
PiperOrigin-RevId: 288713093
|
|
It can take more than 10 seconds when running under --race.
PiperOrigin-RevId: 286296060
|
|
Remove introduced CPUNumMin config and hard-code it as 2.
|
|
* Add `--cpu-num-min` flag to control minimum CPUs
* Only lower CPU count
* Fix comments
|
|
When application is not cgroups-aware, it can spawn excessive threads
which often defaults to CPU number.
Introduce a opt-in flag that will set CPU number accordingly to CPU
quota (if available).
Fixes #1391
|
|
It would be preferrable to test iptables via syscall tests, but there are some
problems with that approach:
* We're limited to loopback-only, as syscall tests involve only a single
container. Other link interfaces (e.g. fdbased) should be tested.
* We'd have to shell out to call iptables anyways, as the iptables syscall
interface itself is too large and complex to work with alone.
* Running the Linux/native version of the syscall test will require root, which
is a pain to configure, is inherently unsafe, and could leave host iptables
misconfigured.
Using the go_test target allows there to be no new test runner.
PiperOrigin-RevId: 285274275
|
|
Fixes #1341
PiperOrigin-RevId: 285108973
|
|
runsc debug --ps list all processes with all threads. This option is added to
the debug command but not to the ps command, because it is going to be used for
debug purposes and we want to add any useful information without thinking about
backward compatibility.
This will help to investigate syzkaller issues.
PiperOrigin-RevId: 285013668
|
|
PiperOrigin-RevId: 285012278
|
|
PiperOrigin-RevId: 284320186
|
|
PiperOrigin-RevId: 284305935
|
|
This adds meaningful annotations to the trace generated by the runtime/trace
package.
PiperOrigin-RevId: 284290115
|
|
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.
PiperOrigin-RevId: 284263850
|
|
Changed annotation to follow the standard defined here:
https://github.com/opencontainers/image-spec/blob/master/annotations.md
PiperOrigin-RevId: 284254847
|
|
When the sandbox is destroyed, making URPC calls to destroy the
container will fail. The code was checking if the sandbox was
running before attempting to make the URPC call, but that is racy.
PiperOrigin-RevId: 284093764
|
|
There are two potential ways of sending a TOS byte with outgoing packets:
including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS
socket options (for IPV4 and IPV6 respectively). This change lets hostinet
support the latter.
Fixes #1188
PiperOrigin-RevId: 283550925
|
|
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6
|
|
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.
PiperOrigin-RevId: 282851425
|
|
PiperOrigin-RevId: 282669859
|
|
PiperOrigin-RevId: 282401165
|
|
PiperOrigin-RevId: 282382564
|
|
Refer to golang mallocgc(), each time of allocating an object > 32 KB,
a gc will be triggered.
When we do readdir, sentry always passes 65535, which leads to a malloc
of 65535 * sizeof(p9.Direnta) > 32 KB.
Considering we already use slice append, let's avoid defining the
capability for this slide.
Command for test:
Before this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m54.272s
user 0m2.010s
sys 0m1.740s
(CPU usage of Gofer: ~30 cores)
(host)$ perf top -p <pid-of-gofer>
42.57% runsc [.] runtime.gcDrain
23.41% runsc [.] runtime.(*lfstack).pop
9.74% runsc [.] runtime.greyobject
8.06% runsc [.] runtime.(*lfstack).push
4.33% runsc [.] runtime.scanobject
1.69% runsc [.] runtime.findObject
1.12% runsc [.] runtime.findrunnable
0.69% runsc [.] runtime.runqgrab
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 2m10.934s
user 0m0.280s
sys 0m4.260s
(CPU usage of Gofer: ~1 core)
After this change:
(container)$ time tree linux-5.3.1 > /dev/null
real 0m22.465s
user 0m1.270s
sys 0m1.310s
(CPU usage of Gofer: ~1 core)
$ perf top -p <pid-of-gofer>
20.57% runsc [.] runtime.gcDrain
7.15% runsc [.] runtime.(*lfstack).pop
4.11% runsc [.] runtime.scanobject
3.78% runsc [.] runtime.greyobject
2.78% runsc [.] runtime.(*lfstack).push
...
(host)$ mkdir test && cd test
(host)$ for i in `seq 1 65536`; do mkdir $i; done
(container)$ time ls test/ > /dev/null
real 0m13.338s
user 0m0.190s
sys 0m3.980s
(CPU usage of Gofer: ~0.8 core)
Fixes #898
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
The first use of time.Local (usually via time.Time.Date, et. al) performs
initialization of the local timezone, which involves open several tzdata files
from the host.
Since filter installation disallows open, we should explicitly force this
initialization rather than implicitly depending on the first logging (or other
time) call occurring before filter installation.
PiperOrigin-RevId: 282053121
|
|
This patch also include a minor change to replace syscall.Dup2
with syscall.Dup3 which was missed in a previous commit(ref a25a976).
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
|
|
This is required to implement O_TRUNC correctly on filesystems backed by
gofers.
9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags
bits, e.g. O_RDONLY, O_RDWR, O_WRONLY."
open(2): "The argument flags must include one of the following access modes:
O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation
flags and file status flags can be bitwise-or'd in flags."
The reference 9P2000.L implementation also appears to expect arbitrary flags,
not just access modes, in Tlopen.flags:
https://github.com/chaos/diod/blob/master/diod/ops.c#L703
PiperOrigin-RevId: 278972683
|
|
This fixes a number of issues with the repository build process:
* Fix the overall structure of the repository.
* Fix the debian package description.
* Fix the broken version number for packages.
* Update the digest algorithm used for signing the release.
I've validated that installation works from a separate staging bucket.
Updates #852
PiperOrigin-RevId: 278716914
|
|
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.
systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.
Fixes #1117
Updates #1119
PiperOrigin-RevId: 278405893
|
|
PiperOrigin-RevId: 278032567
|
|
The watchdog currently can find stuck tasks, but has no way to tell if the
sandbox is stuck before the application starts executing.
This CL adds a startup timeout and action to the watchdog. If Start() is not
called before the given timeout (if non-zero), then the watchdog will take the
action.
PiperOrigin-RevId: 277970577
|
|
Adds a systemd-cgroup flag option that prints an error letting the user know
that systemd cgroups are not supported and points them to the relevant issue.
Issue #193
PiperOrigin-RevId: 277837162
|
|
PiperOrigin-RevId: 277623766
|
|
Sandbox root dir was not being saved with the Container state,
so it would point to the wrong directory location when attempting
to lock the sandbox. This led to race conditions saving and
loading container state. Fixing it, led to multiple deadlocks.
I've moved the saving and locking logic to a separate struct and
moved the lock file inside the RootDir (instead of container
root dir), which allows the lock to be taken inside Destroy,
and removes the need to lock the sandbox.
PiperOrigin-RevId: 277599612
|
|
It is required to guarantee the same order of endpoints after save/restore.
PiperOrigin-RevId: 277598665
|
|
newfstatat() syscall is not supported on arm64, so we resort
to use the fstatat() syscall.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
|
|
Since the syscall.Stat_t.Nlink is defined as different types on
amd64 and arm64(uint64 and uint32 respectively), we need to cast
them to a unified uint64 type in gVisor code.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
|
|
container.startContainers() cannot be called twice in a test
(e.g. TestMultiContainerLoadSandbox) because the cleanup
function deletes the rootDir, together with information from
all other containers that may exist.
PiperOrigin-RevId: 276591806
|
|
fix a typo
|
|
remove a duplicated period
|
|
PiperOrigin-RevId: 276172466
|
|
Right now, we send each tcp packet separately, we call one system
call per-packet. This patch allows to generate multiple tcp packets
and send them by sendmmsg.
The arguable part of this CL is a way how to handle multiple headers.
This CL adds the next field to the Prepandable buffer.
Nginx test results:
Server Software: nginx/1.15.9
Server Hostname: 10.138.0.2
Server Port: 8080
Document Path: /10m.txt
Document Length: 10485760 bytes
w/o gso:
Concurrency Level: 5
Time taken for tests: 5.491 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 18.21 [#/sec] (mean)
Time per request: 274.525 [ms] (mean)
Time per request: 54.905 [ms] (mean, across all concurrent requests)
Transfer rate: 186508.03 [Kbytes/sec] received
sw-gso:
Concurrency Level: 5
Time taken for tests: 3.852 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 1048600200 bytes
HTML transferred: 1048576000 bytes
Requests per second: 25.96 [#/sec] (mean)
Time per request: 192.576 [ms] (mean)
Time per request: 38.515 [ms] (mean, across all concurrent requests)
Transfer rate: 265874.92 [Kbytes/sec] received
w/o gso:
$ ./tcp_benchmark --client --duration 15 --ideal
[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
software gso:
$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
PiperOrigin-RevId: 276112677
|
|
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.
Binding isn't supported yet.
PiperOrigin-RevId: 275909366
|
|
Obligatory https://xkcd.com/927
Fixes #626
|
|
This change fixes several issues with the fsgofer host UDS support. Notably, it
adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes
unsafe use of unet.Socket, which could cause a panic if Socket.FD is called
when err != nil, and calls to Socket.FD with nothing to prevent the garbage
collector from destroying and closing the socket.
A set of tests is added to exercise host UDS access. This required extracting
most of the syscall test runner into a library that can be used by custom
tests.
Updates #235
Updates #1003
[1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can
only reply to a client that binds first. We don't allow bind, so these are
unlikely to be used.
PiperOrigin-RevId: 275558502
|
|
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.
Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.
Closes #1006
It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.
PiperOrigin-RevId: 275115289
|
|
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW
file or connect a socket rather than creating a standard control file like
localFile.Walk.
This is unecessary and error-prone, as the attach point still has to go through
Open or Connect which will properly convert the control file to something
usable. As such, switch the logic to be equivalent to a simple Walk.
Updates #235
PiperOrigin-RevId: 274827872
|
|
PiperOrigin-RevId: 274675428
|