Age | Commit message (Collapse) | Author |
|
Set /proc/self/oom_score_adj based on oomScoreAdj specified in the OCI bundle.
When new containers are added to the sandbox oom_score_adj for the sandbox and
all other gofers are adjusted so that oom_score_adj is equal to the lowest
oom_score_adj of all containers in the sandbox.
Fixes #512
PiperOrigin-RevId: 261242725
|
|
PiperOrigin-RevId: 260824989
|
|
PiperOrigin-RevId: 256494243
|
|
Go was going to change the behavior of SysProcAttr.Ctty such that it must be an
FD in the *parent* FD table:
https://go-review.googlesource.com/c/go/+/178919/
However, after some debate, it was decided that this change was too
backwards-incompatible, and so it was reverted.
https://github.com/golang/go/issues/29458
The behavior going forward is unchanged: the Ctty FD must be an FD in the
*child* FD table.
PiperOrigin-RevId: 255228476
|
|
An upcoming change in Go 1.13 [1] changes the semantics of the SysProcAttr.Ctty
field. Prior to the change, the FD must be an FD in the child process's FD
table (aka "post-shuffle"). After the change, the FD must be an FD in the
current process's FD table (aka "pre-shuffle").
To be compatible with both versions this CL introduces a new boolean
"CttyFdIsPostShuffle" which indicates whether a pre- or post-shuffle FD should
be provided. We use build tags to chose the correct one.
1: https://go-review.googlesource.com/c/go/+/178919/
PiperOrigin-RevId: 255015303
|
|
New options are:
runsc debug --strace=off|all|function1,function2
runsc debug --log-level=warning|info|debug
runsc debug --log-packets=true|false
Updates #407
PiperOrigin-RevId: 254843128
|
|
This was from an old comment, which was superseded by the
existing comment which is correct.
PiperOrigin-RevId: 254434535
|
|
PiperOrigin-RevId: 254237530
|
|
PiperOrigin-RevId: 253882115
|
|
There were 3 string arguments that could be easily misplaced
and it makes it easier to add new arguments, especially for
Container that has dozens of callers.
PiperOrigin-RevId: 253872074
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
'--rootless' flag lets a non-root user execute 'runsc do'.
The drawback is that the sandbox and gofer processes will
run as root inside a user namespace that is mapped to the
caller's user, intead of nobody. And network is defaulted
to '--network=host' inside the root network namespace. On
the bright side, it's very convenient for testing:
runsc --rootless do ls
runsc --rootless do curl www.google.com
PiperOrigin-RevId: 252840970
|
|
It prints formatted to the log.
PiperOrigin-RevId: 252699551
|
|
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.
This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.
Updates #231
PiperOrigin-RevId: 251852825
|
|
clearStatus was added to allow detached execution to wait
on the exec'd process and retrieve its exit status. However,
it's not currently used. Both docker and gvisor-containerd-shim
wait on the "shim" process and retrieve the exit status from
there. We could change gvisor-containerd-shim to use waits, but
it will end up also consuming a process for the wait, which is
similar to having the shim process.
Closes #234
PiperOrigin-RevId: 251349490
|
|
Updates #220
PiperOrigin-RevId: 250532302
|
|
PiperOrigin-RevId: 248367340
Change-Id: Id792afcfff9c9d2cfd62cae21048316267b4a924
|
|
With this change, we will be able to run runsc do in a host network namespace.
PiperOrigin-RevId: 246436660
Change-Id: I8ea18b1053c88fe2feed74239b915fe7a151ce34
|
|
Sandbox always runsc with IP 192.168.10.2 and the peer
network adds 1 to the address (192.168.10.3). Sandbox
IP can be changed using --ip flag.
Here a few examples:
sudo runsc do curl www.google.com
sudo runsc do --ip=10.10.10.2 bash -c "echo 123 | netcat -l -p 8080"
PiperOrigin-RevId: 246421277
Change-Id: I7b3dce4af46a57300350dab41cb27e04e4b6e9da
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
|
|
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.
Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.
Updates #211
iperf output w/ 4MB buffer.
iperf3 -c 172.17.0.2 -t 100
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes
[ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes
[ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes
[ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes
[ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes
[ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes
[ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes
[ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes
[ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes
[ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes
[ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes
[ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes
[ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes
[ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes
^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender
[ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver
PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
|
|
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switchs to use fd.FD which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.
PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
|
|
It provides an easy way to run commands to quickly test gVisor.
By default it maps the host root as the container root with a
writable overlay on top (so the host root is not modified).
Example:
sudo runsc do ls -lh --color
sudo runsc do ~/src/test/my-test.sh
PiperOrigin-RevId: 243178711
Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9
|
|
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.
Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received
and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
PiperOrigin-RevId: 241072403
Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5
|
|
Properly handle propagation options for root and mounts. Now usage of
mount options shared, rshared, and noexec cause error to start. shared/
rshared breaks sandbox=>host isolation. slave however can be supported
because changes propagate from host to sandbox.
Root FS setup moved inside the gofer. Apart from simplifying the code,
it keeps all mounts inside the namespace. And they are torn down when
the namespace is destroyed (DestroyFS is no longer needed).
PiperOrigin-RevId: 239037661
Change-Id: I8b5ee4d50da33c042ea34fa68e56514ebe20e6e0
|
|
Example:
runsc debug --root=<dir> \
--profile-heap=/tmp/heap.prof \
--profile-cpu=/tmp/cpu.prod --profile-delay=30 \
<container ID>
PiperOrigin-RevId: 237848456
Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773
|
|
Nothing reads them and they can simply get stale.
Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD
PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
|
|
PiperOrigin-RevId: 231504064
Change-Id: I585b769aef04a3ad7e7936027958910a6eed9c8d
|
|
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
Change-Id: I032f834edae5c716fb2d3538285eec07aa11a902
PiperOrigin-RevId: 231318438
|
|
PiperOrigin-RevId: 230437407
Change-Id: Id9d8ceeb018aad2fe317407c78c6ee0f4b47aa2b
|
|
Removed "error" and "failed to" prefix that don't add value
from messages. Adjusted a few other messages. In particular,
when the container fail to start, the message returned is easier
for humans to read:
$ docker run --rm --runtime=runsc alpine foobar
docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory
Closes #77
PiperOrigin-RevId: 230022798
Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
|
|
In addition, it fixes a race condition in TestMultiContainerGoferStop.
There are two scripts copy the same set of files into the same directory
and sometime one of this command fails with EXIST.
PiperOrigin-RevId: 230011247
Change-Id: I9289f72e65dc407cdcd0e6cd632a509e01f43e9c
|
|
PiperOrigin-RevId: 229971902
Change-Id: Ief4fac731e839ef092175908de9375d725eaa3aa
|
|
In this case, new mounts are not created in the host mount namspaces, so
tearDownChroot isn't needed, because chroot will be destroyed with a
sandbox mount namespace.
In additional, pivot_root can't be called instead of chroot.
PiperOrigin-RevId: 229250871
Change-Id: I765bdb587d0b8287a6a8efda8747639d37c7e7b6
|
|
And we need to wait a gofer process before cgroup.Uninstall,
because it is running in the sandbox cgroups.
PiperOrigin-RevId: 228904020
Change-Id: Iaf8826d5b9626db32d4057a1c505a8d7daaeb8f9
|
|
The original code assumed that it was safe to join and not restore cgroup,
but Container.Run will not exit after calling start, making cgroup cleanup
fail because there were still processes inside the cgroup.
PiperOrigin-RevId: 228529199
Change-Id: I12a48d9adab4bbb02f20d71ec99598c336cbfe51
|
|
PiperOrigin-RevId: 227747566
Change-Id: Ide9df4ac1391adcd1c56e08d6570e0d149d85bc4
|
|
Make 'runsc create' join cgroup before creating sandbox process.
This removes the need to synchronize platform creation and ensure
that sandbox process is charged to the right cgroup from the start.
PiperOrigin-RevId: 227166451
Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
|
|
PiperOrigin-RevId: 224418900
Change-Id: I53cf4d7c1c70117875b6920f8fd3d58a3b1497e9
|
|
sandbox.Wait is racey, as the sandbox may have exited before it is called, or
even during.
We already had code to handle the case that the sandbox exits during the Wait
call, but we were not properly handling the case where the sandbox has exited
before the call.
The best we can do in such cases is return the sandbox exit code as the
application exit code.
PiperOrigin-RevId: 221702517
Change-Id: I290d0333cc094c7c1c3b4ce0f17f61a3e908d787
|
|
Before this change, a container starting up could race with
destroy (aka delete) and leave processes behind.
Now, whenever a container is created, Loader.processes gets
a new entry. Start now expects the entry to be there, and if
it's not it means that the container was deleted.
I've also fixed Loader.waitPID to search for the process using
the init process's PID namespace.
We could use a few more tests for signal and wait. I'll send
them in another cl.
PiperOrigin-RevId: 220224290
Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
|
|
Updated error messages so that it doesn't print full Go struct representations
when running a new container in a sandbox. For example, this occurs frequently
when commands are not found when doing a 'kubectl exec'.
PiperOrigin-RevId: 219729141
Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
|
|
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.
PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
|
|
Errors are shown as being ignored by assigning to the blank identifier.
PiperOrigin-RevId: 218103819
Change-Id: I7cc7b9d8ac503a03de5504ebdeb99ed30a531cf2
|
|
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
|
|
It's hard to resolve symlinks inside the sandbox because rootfs and mounts
may be read-only, forcing us to create mount points inside lower layer of an
overlay, **before** the volumes are mounted.
Since the destination must already be resolved outside the sandbox when creating
mounts, take this opportunity to rewrite the spec with paths resolved.
"runsc boot" will use the "resolved" spec to load mounts. In addition, symlink
traversals were disabled while mounting containers inside the sandbox.
It haven't been able to write a good test for it. So I'm relying on manual tests
for now.
PiperOrigin-RevId: 217749904
Change-Id: I7ac434d5befd230db1488446cda03300cc0751a9
|
|
Now containers run with "docker run -it" support control characters like ^C and
^Z.
This required refactoring our signal handling a bit. Signals delivered to the
"runsc boot" process are turned into loader.Signal calls with the appropriate
delivery mode. Previously they were always sent directly to PID 1.
PiperOrigin-RevId: 217566770
Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
|
|
PiperOrigin-RevId: 217433699
Change-Id: Icef08285728c23ee7dd650706aaf18da51c25dff
|
|
We treat handle the boot process stdio separately from the application stdio
(which gets passed via flags), but we were still sending both to same place. As
a result, some logs that are written directly to os.Stderr by the boot process
were ending up in the application logs.
This CL starts sendind boot process stdio to the null device (since we don't
have any better options). The boot process is already configured to send all
logs (and panics) to the log file, so we won't miss anything important.
PiperOrigin-RevId: 217173020
Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
|