Age | Commit message (Collapse) | Author |
|
Suppose I start a runsc container using kvm platform like this:
$ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash
The donating FD and the corresponding cmdline for runsc-sandbox is:
D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt"
D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:12.608224 44389 x:0] Donating FD 5: "|0"
D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:12.608234 44389 x:0] Donating FD 7: "|1"
D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm"
D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin"
D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout"
D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr"
D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe
[runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log=
--max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt
--debug-log-format=text --file-access=exclusive --overlay=false
--fsgofer-host-uds=false --network=sandbox --log-packets=false
--platform=kvm --strace=false --strace-syscalls=--strace-log-size=1024
--watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true
--num-network-channels=1 --rootless=false --alsologtostderr=false
--ref-leak-mode=disabled --gso=true --software-gso=true
--overlayfs-stale-read=false --shared-volume= --debug-log-fd=3
--panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc
--controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8
--device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true
--setup-root --cpu-num 32 --total-memory 4294967296 rootbash]
Note stdioFDs starts from 10 with kvm platform and stderr's FD is 12.
If I restore a container from the checkpoint image which is derived
by checkpointing the above rootbash container, but either omit the
platform switch or specify to use ptrace platform explicitely:
$ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash
the donating FD and corresponding cmdline for runsc-sandbox is:
D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt"
D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:15.258645 44452 x:0] Donating FD 5: "|0"
D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:15.258653 44452 x:0] Donating FD 7: "|1"
D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin"
D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout"
D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr"
D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe
[runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log=
--max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt
--debug-log-format=text --file-access=exclusive --overlay=false
--fsgofer-host-uds=false --network=sandbox --log-packets=false
--platform=ptrace --strace=false --strace-syscalls=
--strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1
--profile=false --net-raw=true --num-network-channels=1 --rootless=false
--alsologtostderr=false --ref-leak-mode=disabled --gso=true
--software-gso=true --overlayfs-stale-read=false --shared-volume=
--debug-log-fd=3 --panic-signal=15 boot
--bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4
--mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9
--stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory
4294967296 restored_rootbash]
Note this time, stdioFDs starts from 9 and stderr's FD is 11(so the
saved host.descritor.origFD which is 12 for stderr is no longer valid).
For the three host FD based files, The s.Dev and s.Ino derived from
fstat(fd) shall all be the same and since the two fields are used
as device.MultiDeviceKey, the host.inodeFileState.sattr.InodeId which is
the value of MultiDevice.Map(MultiDeviceKey), shall also all be the same.
Note that for MultiDevice m, m.cache records the mapping of key to value
and m.rcache records the mapping of value to key. If same value doesn't
map to the same key, it will panic on restore.
Now that stderr's origFD 12 is no longer valid(it happens to be
/memfd:runsc-memory in my test on restore), the s.Dev and s.Ino derived
from fstat(fd=12) in host.inodeFileState.afterLoad() will neither be
correct. But its InodeID is still the same as saved, MultiDevice.Load()
will complain about the same value(InodeID) being mapped to different
keys (different from stdin and stdout's) and panic with: "MultiDevice's
caches are inconsistent".
Solve this problem by making sure stdioFDs for root container's init
task are always the same on initial start and on restore time, no matter
what cmdline user has used: debug log specified or not, platform changed
or not etc. shall not affect the ability to restore.
Fixes #1844.
|
|
This flag is set on Rome CPUs, but it is not documented.
PiperOrigin-RevId: 303825532
|
|
PiperOrigin-RevId: 303805784
|
|
In the case of other signals (preemption), inject a normal bounce and
defer the signal until the vCPU has been returned from guest mode.
PiperOrigin-RevId: 303799678
|
|
PiperOrigin-RevId: 303773475
|
|
PiperOrigin-RevId: 303753027
|
|
/proc/[pid]/mount* omit mounts whose mount point is outside the chroot, which
is checked (indirectly) via __d_path().
PiperOrigin-RevId: 303434226
|
|
Both have analogues in Linux:
* struct file_system_type has a char *name field.
* struct super_block keeps a pointer to the file_system_type.
These fields are necessary to support the `filesystem type` field in
/proc/[pid]/mountinfo.
PiperOrigin-RevId: 303434063
|
|
Enables handling the Hop by Hop and Destination Options extension
headers, but options are not yet supported. All options will be
treated as unknown and their respective action will be followed.
Note, the stack does not yet support sending ICMPv6 error messages in
response to options that cannot be handled/parsed. That will come
in a later change (Issue #2211).
Tests:
- header_test.TestIPv6UnknownExtHdrOption
- header_test.TestIPv6OptionsExtHdrIterErr
- header_test.TestIPv6OptionsExtHdrIter
- ipv6_test.TestReceiveIPv6ExtHdrs
PiperOrigin-RevId: 303433085
|
|
BoundEndpointAt() is needed to support Unix sockets bound at a
file path, corresponding to BoundEndpoint() in VFS1.
Updates #1476.
PiperOrigin-RevId: 303258251
|
|
Using the host-defined file owner matches VFS1. It is more correct to use the
host-defined mode, since the cached value may become out of date. However,
kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are
in-memory so retrieving mode should not fail. Therefore, if the host syscall
fails, we rely on a cached value instead.
Updates #1672.
PiperOrigin-RevId: 303220864
|
|
PiperOrigin-RevId: 303212189
|
|
PiperOrigin-RevId: 303208407
|
|
Enables the reassembly of fragmented IPv6 packets and handling of the
Routing extension header with a Segments Left value of 0. Atomic
fragments are handled as described in RFC 6946 to not interfere with
"normal" fragment traffic. No specific routing header type is supported.
Note, the stack does not yet support sending ICMPv6 error messages in
response to IPv6 packets that cannot be handled/parsed. That will come
in a later change (Issue #2211).
Test:
- header_test.TestIPv6RoutingExtHdr
- header_test.TestIPv6FragmentExtHdr
- header_test.TestIPv6ExtHdrIterErr
- header_test.TestIPv6ExtHdrIter
- ipv6_test.TestReceiveIPv6ExtHdrs
- ipv6_test.TestReceiveIPv6Fragments
RELNOTES: n/a
PiperOrigin-RevId: 303189584
|
|
Analagous to Linux's mount.mnt_id. This ID is displayed in
/proc/[pid]/mountinfo.
PiperOrigin-RevId: 303185564
|
|
|
|
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
|
|
PiperOrigin-RevId: 303159175
|
|
PiperOrigin-RevId: 303158421
|
|
PiperOrigin-RevId: 303156734
|
|
PiperOrigin-RevId: 303147253
|
|
- Change receiver of endpoint lookup functions
- Remove unused struct fields and functions in test
- s/%v/%s/ for errors
- Capitalize NIC
https://github.com/golang/go/wiki/CodeReviewComments#initialisms
PiperOrigin-RevId: 303119580
|
|
PiperOrigin-RevId: 303105826
|
|
Updates #1035
PiperOrigin-RevId: 303021328
|
|
Tests were run assuming a runtime of "runsc" was present, and did not
have --net-raw enabled.
|
|
PiperOrigin-RevId: 303010530
|
|
This enables all relevant santizers (though most analyzers will not find
much, it will prevent instances from creeping in), and codifies existing
exceptions in tools/nogo.js to be fixed.
|
|
There is a canonical naming convention for Examples, which are checked
by analyzers. This must be fixed since adding exceptions for generated
code will be more challenging.
|
|
PiperOrigin-RevId: 302987344
|
|
It's possible to execute the command that checks user's
$HOME dir before the user is created. Move the code that
creates the user inside exec so it can be serialized.
PiperOrigin-RevId: 302986184
|
|
Pushing it down requires all implementation to check for
exec individualy which is not maintanable. Making it part
of GenericCheckPermissions add extra cost to everyone that
calls it. So it's better to keep is in
VirtualFilesystem.OpenAt.
Updates #1193
PiperOrigin-RevId: 302982993
|
|
The only test failing now requires socket which is not
available in VFS2 yet.
Updates #1198
PiperOrigin-RevId: 302976572
|
|
Makes less error prone to find file type.
Updates #1197
PiperOrigin-RevId: 302974244
|
|
When copybara migrates changes, it creates a new branch and then creates a
pull-requests which is based on this branch. In this case, travis-ci
triggers build twice for the branch and for the pull-request.
PiperOrigin-RevId: 302930634
|
|
- Fix definitions of Futex* wrappers.
- Correctly handle glibc syscall() (which returns -1 and sets errno instead of
returning the raw syscall return value).
- De-parameterize FutexWaitBitset, which was apparently intended to test with
deadlines of between 0 and 100000 nanoseconds after the Unix epoch, but was
broken due to the preceding two issues.
- Use wall time to measure the durations of tests that are expected to block
(and thus stop accumulating CPU time).
- Require 5s for all tests to improve robustness in the presence of sentry GC.
- Remove FutexContend and FutexContendDeadline; it's unclear what these are
supposed to measure, given that (1) FutexLock is unrealistically inefficient
and (2) the benchmark rewards slow scheduling (since this reduces
contention).
PiperOrigin-RevId: 302925246
|
|
PiperOrigin-RevId: 302924789
|
|
PiperOrigin-RevId: 302891559
|
|
This allows the link layer endpoints to consistenly hash a TCP
segment to a single underlying queue in case a link layer endpoint
does support multiple underlying queues.
Updates #231
PiperOrigin-RevId: 302760664
|
|
In cl/302130790, we started using a temp directory which is provided by bazel.
By default, a test process has enough permissions to open it, but there is not
any guarantee that it still will be able to do this after changing credentials.
PiperOrigin-RevId: 302702337
|
|
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.
Updates #2214
PiperOrigin-RevId: 302677662
|
|
Fixes #506
PiperOrigin-RevId: 302540404
|
|
PiperOrigin-RevId: 302539171
|
|
PiperOrigin-RevId: 302518924
|
|
PiperOrigin-RevId: 302506064
|
|
The posix_server works fine when run in locally or in docker but fails in the
kokoro GCP build environment. Linking libpthread statically fixes it.
PiperOrigin-RevId: 302139082
|
|
The root mount is not shared by default, but all other mounts are shared.
So if we create the /tmp mount, this means that we run tests on a shared mount
even if tests run without the --shared option.
PiperOrigin-RevId: 302130790
|
|
Updates #231
PiperOrigin-RevId: 302127697
|
|
PiperOrigin-RevId: 302110328
|
|
utimensat is used by hostfs for setting timestamps on imported fds. Previously,
this would crash the sandbox since utimensat was not allowed.
Correct the VFS2 version of hostfs to match the call in VFS1.
PiperOrigin-RevId: 301970121
|
|
PiperOrigin-RevId: 301949722
|