Age | Commit message (Collapse) | Author |
|
|
|
This change adds a layer of abstraction around the internal Docker APIs,
and eliminates all direct dependencies on Dockerfiles in the infrastructure.
A subsequent change will automated the generation of local images (with
efficient caching). Note that this change drops the use of bazel container
rules, as that experiment does not seem to be viable.
PiperOrigin-RevId: 308095430
|
|
|
|
PiperOrigin-RevId: 307977689
|
|
|
|
PiperOrigin-RevId: 307941984
|
|
This change renames the tools/images directory to tools/vm for clarity, and
adds a functional vm_test. Sharding is also added to the same test, and some
documentation added around key flags & variables to describe how they work.
Subsequent changes will add vm_tests for other cases, such as the runtime tests.
PiperOrigin-RevId: 307492245
|
|
|
|
This is to make easier to find corresponding logs in
case test fails.
PiperOrigin-RevId: 307104283
|
|
|
|
Included:
- loader_test.go RunTest and TestStartSignal VFS2
- container_test.go TestAppExitStatus on VFS2
- experimental flag added to runsc to turn on VFS2
Note: shared mounts are not yet supported.
PiperOrigin-RevId: 307070753
|
|
|
|
PiperOrigin-RevId: 306908296
|
|
|
|
PiperOrigin-RevId: 306477639
|
|
|
|
noNewPrivileges is ignored if set to false since gVisor assumes that
PR_SET_NO_NEW_PRIVS is always enabled.
PiperOrigin-RevId: 305991947
|
|
|
|
The sentry doesn't allow execve, but it's a good defense
in-depth measure.
PiperOrigin-RevId: 305958737
|
|
|
|
PiperOrigin-RevId: 305807868
|
|
|
|
Closes #2393
PiperOrigin-RevId: 305793027
|
|
|
|
PiperOrigin-RevId: 305592245
|
|
Updates #2243
|
|
This required minor restructuring of how system call tables were saved
and restored, but it makes way more sense this way.
Updates #2243
|
|
|
|
Issue #2270
Issue #1765
PiperOrigin-RevId: 305385436
|
|
Move TODO to #238 so that proper synchronization of operations is handled
when we create the urpc client.
Issue #238
Fixes #512
PiperOrigin-RevId: 305383924
|
|
|
|
Starting with go1.13, we can specify ambient capabilities when we execute a new
process with os/exe.Cmd.
PiperOrigin-RevId: 305366706
|
|
|
|
PiperOrigin-RevId: 305344989
|
|
|
|
PiperOrigin-RevId: 304221302
|
|
Suppose I start a runsc container using kvm platform like this:
$ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash
The donating FD and the corresponding cmdline for runsc-sandbox is:
D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt"
D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:12.608224 44389 x:0] Donating FD 5: "|0"
D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:12.608234 44389 x:0] Donating FD 7: "|1"
D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm"
D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin"
D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout"
D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr"
D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe
[runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log=
--max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt
--debug-log-format=text --file-access=exclusive --overlay=false
--fsgofer-host-uds=false --network=sandbox --log-packets=false
--platform=kvm --strace=false --strace-syscalls=--strace-log-size=1024
--watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true
--num-network-channels=1 --rootless=false --alsologtostderr=false
--ref-leak-mode=disabled --gso=true --software-gso=true
--overlayfs-stale-read=false --shared-volume= --debug-log-fd=3
--panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc
--controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8
--device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true
--setup-root --cpu-num 32 --total-memory 4294967296 rootbash]
Note stdioFDs starts from 10 with kvm platform and stderr's FD is 12.
If I restore a container from the checkpoint image which is derived
by checkpointing the above rootbash container, but either omit the
platform switch or specify to use ptrace platform explicitely:
$ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash
the donating FD and corresponding cmdline for runsc-sandbox is:
D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt"
D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:15.258645 44452 x:0] Donating FD 5: "|0"
D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:15.258653 44452 x:0] Donating FD 7: "|1"
D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin"
D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout"
D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr"
D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe
[runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log=
--max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt
--debug-log-format=text --file-access=exclusive --overlay=false
--fsgofer-host-uds=false --network=sandbox --log-packets=false
--platform=ptrace --strace=false --strace-syscalls=
--strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1
--profile=false --net-raw=true --num-network-channels=1 --rootless=false
--alsologtostderr=false --ref-leak-mode=disabled --gso=true
--software-gso=true --overlayfs-stale-read=false --shared-volume=
--debug-log-fd=3 --panic-signal=15 boot
--bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4
--mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9
--stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory
4294967296 restored_rootbash]
Note this time, stdioFDs starts from 9 and stderr's FD is 11(so the
saved host.descritor.origFD which is 12 for stderr is no longer valid).
For the three host FD based files, The s.Dev and s.Ino derived from
fstat(fd) shall all be the same and since the two fields are used
as device.MultiDeviceKey, the host.inodeFileState.sattr.InodeId which is
the value of MultiDevice.Map(MultiDeviceKey), shall also all be the same.
Note that for MultiDevice m, m.cache records the mapping of key to value
and m.rcache records the mapping of value to key. If same value doesn't
map to the same key, it will panic on restore.
Now that stderr's origFD 12 is no longer valid(it happens to be
/memfd:runsc-memory in my test on restore), the s.Dev and s.Ino derived
from fstat(fd=12) in host.inodeFileState.afterLoad() will neither be
correct. But its InodeID is still the same as saved, MultiDevice.Load()
will complain about the same value(InodeID) being mapped to different
keys (different from stdin and stdout's) and panic with: "MultiDevice's
caches are inconsistent".
Solve this problem by making sure stdioFDs for root container's init
task are always the same on initial start and on restore time, no matter
what cmdline user has used: debug log specified or not, platform changed
or not etc. shall not affect the ability to restore.
Fixes #1844.
|
|
|
|
In the case of other signals (preemption), inject a normal bounce and
defer the signal until the vCPU has been returned from guest mode.
PiperOrigin-RevId: 303799678
|
|
|
|
Using the host-defined file owner matches VFS1. It is more correct to use the
host-defined mode, since the cached value may become out of date. However,
kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are
in-memory so retrieving mode should not fail. Therefore, if the host syscall
fails, we rely on a cached value instead.
Updates #1672.
PiperOrigin-RevId: 303220864
|
|
|
|
utimensat is used by hostfs for setting timestamps on imported fds. Previously,
this would crash the sandbox since utimensat was not allowed.
Correct the VFS2 version of hostfs to match the call in VFS1.
PiperOrigin-RevId: 301970121
|
|
|
|
PiperOrigin-RevId: 301949722
|
|
|
|
- When setting up the virtual filesystem, mount a host.filesystem to contain
all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
enabled.
PiperOrigin-RevId: 300922353
|
|
|
|
When the sandbox runs in attached more, e.g. runsc do, runsc run, the
sandbox lifetime is controlled by the parent process. This wasn't working
in all cases because PR_GET_PDEATHSIG doesn't propagate through execve
when the process changes uid/gid. So it was getting dropped when the
sandbox execve's to change to user nobody.
PiperOrigin-RevId: 300601247
|
|
|