path: root/runsc
Age | Commit message | Author
2020-03-31 | checkpoint/restore: make sure the donated stdioFDs have the same value | Aaron Lu
Suppose I start a runsc container using the kvm platform like this:
$ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash
The donated FDs and the corresponding command line for runsc-sandbox are:
D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt"
D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:12.608224 44389 x:0] Donating FD 5: "|0"
D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:12.608234 44389 x:0] Donating FD 7: "|1"
D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm"
D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin"
D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout"
D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr"
D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=kvm --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true --setup-root --cpu-num 32 --total-memory 4294967296 rootbash]
Note that the stdio FDs start at 10 with the kvm platform, and stderr's FD is 12.
If I restore a container from the checkpoint image created by checkpointing the rootbash container above, but either omit the platform flag or explicitly specify the ptrace platform:
$ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash
the donated FDs and the corresponding command line for runsc-sandbox are:
D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt"
D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket"
D0313 17:50:15.258645 44452 x:0] Donating FD 5: "|0"
D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
D0313 17:50:15.258653 44452 x:0] Donating FD 7: "|1"
D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD"
D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin"
D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout"
D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr"
D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=ptrace --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9 --stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory 4294967296 restored_rootbash]
Note that this time the stdio FDs start at 9 and stderr's FD is 11, so the saved host.descriptor.origFD for stderr, which is 12, is no longer valid.
For the three host-FD-based files, the s.Dev and s.Ino values derived from fstat(fd) should all be the same. Since these two fields are used as the device.MultiDeviceKey, host.inodeFileState.sattr.InodeID, which is the value of MultiDevice.Map(MultiDeviceKey), should also be the same for all three. Note that for a MultiDevice m, m.cache records the mapping from key to value and m.rcache records the mapping from value to key. If the same value does not map to the same key, restore panics.
Because stderr's origFD 12 is no longer valid on restore (in my test it happened to be /memfd:runsc-memory), the s.Dev and s.Ino derived from fstat(fd=12) in host.inodeFileState.afterLoad() are not correct either. Its InodeID, however, is still the saved value, so MultiDevice.Load() finds the same value (InodeID) mapped to a key different from stdin's and stdout's, and panics with: "MultiDevice's caches are inconsistent".
Solve this problem by making sure the stdio FDs for the root container's init task are always the same at initial start and at restore time, regardless of the command line the user passed: whether a debug log is specified, whether the platform changed, and so on must not affect the ability to restore. Fixes #1844.
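The failure mode above comes from keeping two caches that must agree: key to value and value to key. The following is a simplified, hypothetical model of that invariant (the names `deviceKey`, `biMap` and `load` are illustrative, not gVisor's `device.MultiDevice` API, and the dev/ino numbers are made up). It shows why a stale stderr FD that fstat's to a different (dev, ino) pair, while carrying the same saved InodeID, is rejected.

```go
package main

import "fmt"

// deviceKey is a hypothetical stand-in for device.MultiDeviceKey: the
// (device, inode) pair that fstat(2) reports for a host file.
type deviceKey struct {
	dev uint64
	ino uint64
}

// biMap is a simplified model of the two caches: cache maps key -> value
// and rcache maps value -> key. Both directions must stay consistent.
type biMap struct {
	cache  map[deviceKey]uint64
	rcache map[uint64]deviceKey
}

func newBiMap() *biMap {
	return &biMap{cache: map[deviceKey]uint64{}, rcache: map[uint64]deviceKey{}}
}

// load records a saved (key, value) pair on restore. It reports false when
// either side is already bound to something else, which is the condition
// that leads to the "caches are inconsistent" panic.
func (m *biMap) load(key deviceKey, value uint64) bool {
	if v, ok := m.cache[key]; ok && v != value {
		return false
	}
	if k, ok := m.rcache[value]; ok && k != key {
		return false
	}
	m.cache[key] = value
	m.rcache[value] = key
	return true
}

func main() {
	m := newBiMap()
	tty := deviceKey{dev: 5, ino: 3}    // stdin and stdout: same tty after restore
	memfd := deviceKey{dev: 1, ino: 99} // stderr: stale FD 12 now refers to a memfd
	fmt.Println(m.load(tty, 7))         // true
	fmt.Println(m.load(tty, 7))         // true
	fmt.Println(m.load(memfd, 7))       // false: InodeID 7 already maps to tty
}
```

Keeping the donated stdio FD numbers stable means fstat on restore sees the same (dev, ino) for all three files, so every `load` call is consistent.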
2020-03-30 | kvm: handle exit reasons even under EINTR. | Adin Scannell
In the case of other signals (preemption), inject a normal bounce and defer the signal until the vCPU has been returned from guest mode. PiperOrigin-RevId: 303799678
2020-03-26 | Use host-defined file owner and mode, when possible, for imported fds. | Dean Deng
Using the host-defined file owner matches VFS1. It is more correct to use the host-defined mode, since the cached value may become out of date. However, kernfs.Inode.Mode() does not return an error, because the other filesystems built on kernfs are in-memory, so retrieving the mode should not fail. Therefore, if the host syscall fails, we rely on a cached value instead. Updates #1672. PiperOrigin-RevId: 303220864
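The "prefer the host, fall back to the cache" rule can be sketched as below. This is a hypothetical model, not the hostfs code: `hostInode`, `hostFD` and `cachedMode` are illustrative names, and the method returns no error to mirror the constraint that kernfs.Inode.Mode() cannot report one.

```go
package main

import (
	"fmt"
	"syscall"
)

// hostInode is a hypothetical model of an inode backed by an imported host
// FD; cachedMode is the mode recorded when the FD was imported.
type hostInode struct {
	hostFD     int
	cachedMode uint32
}

// mode prefers the host-defined mode, which cannot go stale, and falls back
// to the cached value only when fstat(2) fails. It returns no error because,
// like kernfs.Inode.Mode(), its callers have no way to handle one.
func (i *hostInode) mode() uint32 {
	var st syscall.Stat_t
	if err := syscall.Fstat(i.hostFD, &st); err != nil {
		return i.cachedMode
	}
	return st.Mode
}

func main() {
	fd, err := syscall.Open("/", syscall.O_RDONLY, 0)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer syscall.Close(fd)

	live := &hostInode{hostFD: fd}
	stale := &hostInode{hostFD: -1, cachedMode: 0o100644}
	fmt.Println(live.mode()&syscall.S_IFMT == syscall.S_IFDIR) // true: "/" is a directory
	fmt.Printf("%o\n", stale.mode())                           // 100644: fstat(-1) fails with EBADF
}
```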
2020-03-19 | Whitelist utimensat(2). | Dean Deng
utimensat is used by hostfs for setting timestamps on imported fds. Previously, this would crash the sandbox since utimensat was not allowed. Correct the VFS2 version of hostfs to match the call in VFS1. PiperOrigin-RevId: 301970121
2020-03-19 | Improve error message when pivot_root fails | Fabricio Voznika
PiperOrigin-RevId: 301949722
2020-03-14 | Plumb VFS2 imported fds into virtual filesystem. | Dean Deng
- When setting up the virtual filesystem, mount a host.filesystem to contain all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is enabled.
PiperOrigin-RevId: 300922353
2020-03-12 | Kill sandbox process when parent process terminates | Fabricio Voznika
When the sandbox runs in attached mode, e.g. runsc do or runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because the parent-death signal set with PR_SET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. As a result, it was dropped when the sandbox re-executed itself (execve) to switch to user nobody. PiperOrigin-RevId: 300601247
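In Go, the parent-death signal for a child can be requested through the Linux-only `syscall.SysProcAttr.Pdeathsig` field. The sketch below is hypothetical (`newAttachedCommand` and `sandboxDeathSignal` are illustrative names, and SIGKILL is an arbitrary choice); it is not the runsc fix. The point of the commit is that this setting is lost when the uid/gid changes, so it has to be requested again by whichever process performs the switch rather than assumed to be inherited.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sandboxDeathSignal is the signal the child should receive if its parent
// goes away; SIGKILL is an illustrative choice.
const sandboxDeathSignal = syscall.SIGKILL

// newAttachedCommand is a hypothetical helper that ties a child's lifetime
// to the current process. Pdeathsig (Linux-only) asks the kernel to signal
// the child when the thread that started it exits.
func newAttachedCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: sandboxDeathSignal}
	return cmd
}

func main() {
	cmd := newAttachedCommand("/bin/true")
	if err := cmd.Run(); err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Println("exit code:", cmd.ProcessState.ExitCode())
}
```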
2020-03-11 | runsc: Set asyncpreemptoff for the kvm platform | Andrei Vagin
Asynchronous goroutine preemption is a new feature in Go 1.14. When we switched to Go 1.14 (cl/297915917) in the bazel config, the kokoro syscall-kvm job started failing permanently. Let's temporarily set asyncpreemptoff for the kvm platform to unblock tests. PiperOrigin-RevId: 300372387
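Go 1.14 reads `asyncpreemptoff=1` from the `GODEBUG` environment variable at startup, so one way to apply it to a child Go process is to add it to that process's environment. The helper below is a hypothetical sketch (`withAsyncPreemptOff` is not a runsc function); it preserves any other `GODEBUG` settings already present.

```go
package main

import (
	"fmt"
	"strings"
)

// withAsyncPreemptOff returns a copy of env (in os.Environ() form) in which
// GODEBUG includes asyncpreemptoff=1, keeping any existing GODEBUG options.
func withAsyncPreemptOff(env []string) []string {
	const key = "GODEBUG="
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, key) {
			found = true
			if val := strings.TrimPrefix(kv, key); val == "" {
				kv = key + "asyncpreemptoff=1"
			} else {
				kv = key + val + ",asyncpreemptoff=1"
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, key+"asyncpreemptoff=1")
	}
	return out
}

func main() {
	fmt.Println(withAsyncPreemptOff([]string{"PATH=/usr/bin"}))
	fmt.Println(withAsyncPreemptOff([]string{"GODEBUG=gctrace=1"}))
}
```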
2020-03-05 | Merge pull request #1951 from moricho:moricho/add-profiler-option | gVisor bot
PiperOrigin-RevId: 299233818
2020-03-05 | tests: Don't print log messages on stdout | Andrei Vagin
A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 299174138
2020-03-04 | tests: Don't print log messages on stdout | Andrei Vagin
A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 298966577
2020-02-28 | Allow to specify a separate log for GO's runtime messages | Andrei Vagin
Go's runtime calls the write system call twice to print "panic:" and then "the reason of this panic", so there is a race window in which other threads can print something to the log, and we see something like this:
panic: log messages from another thread
The reason of the panic.
This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read; e.g., data races often show one side of the race, followed by a large "diagnosis" dump, and finally the other side of the race. PiperOrigin-RevId: 297887895
2020-02-27 | Log oom_score_adj value on error | Fabricio Voznika
Updates #1873 PiperOrigin-RevId: 297695241
2020-02-26 | add profile option | moricho
2020-02-25 | Port most syscalls to VFS2. | Jamie Liu
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2. mount and umount2 aren't ported out of temporary laziness. access and faccessat need additional FSImpl methods to implement properly, but are stubbed to prevent googletest from CHECK-failing. Other syscalls require additional plumbing. Updates #1623 PiperOrigin-RevId: 297188448
2020-02-25 | Add log during process wait in tests | Fabricio Voznika
TestMultiContainerKillAll timed out under --race. Without logging, we cannot tell if the process list is still increasing, but slowly, or is stuck. PiperOrigin-RevId: 297158834
2020-02-20 | Initial network namespace support. | gVisor bot
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets keep their current behavior. Until userspace is able to create devices, the default loopback device can be used for testing. /proc/net and /sys/net are still connected to the root network stack, which is the current behavior. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-19 | Add statefile command to runsc. | Adin Scannell
PiperOrigin-RevId: 296105337
2020-02-14 | Synchronize signalling with S/R | gVisor bot
This fixes a data race between sending an external signal to a ThreadGroup and the kernel saving state for S/R. PiperOrigin-RevId: 295244281
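A common way to close a race like this is a reader/writer lock: signal senders share the read side, and saving state takes the write side, so no external signal can mutate a ThreadGroup mid-save. The sketch below is a hypothetical model of that pattern (`kernel`, `srMu`, `pending` are illustrative names); the commit message does not say which mechanism gVisor actually uses.

```go
package main

import (
	"fmt"
	"sync"
)

// kernel is a hypothetical model. External signal senders hold srMu for
// reading, so they may run concurrently with each other; saveState holds it
// for writing, so it never observes a half-delivered signal.
type kernel struct {
	srMu    sync.RWMutex
	mu      sync.Mutex // serializes senders among themselves
	pending []int      // signals queued on one ThreadGroup, for illustration
}

func (k *kernel) sendExternalSignal(sig int) {
	k.srMu.RLock()
	defer k.srMu.RUnlock()
	k.mu.Lock()
	k.pending = append(k.pending, sig)
	k.mu.Unlock()
}

// saveState returns a snapshot taken while no external signal is in flight.
func (k *kernel) saveState() []int {
	k.srMu.Lock()
	defer k.srMu.Unlock()
	snapshot := make([]int, len(k.pending))
	copy(snapshot, k.pending)
	return snapshot
}

func main() {
	k := &kernel{}
	done := make(chan struct{})
	for sig := 1; sig <= 3; sig++ {
		go func(s int) {
			k.sendExternalSignal(s)
			done <- struct{}{}
		}(sig)
	}
	for i := 0; i < 3; i++ {
		<-done
	}
	fmt.Println(len(k.saveState())) // 3
}
```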
2020-02-14 | Plumb VFS2 inside the Sentry | gVisor bot
- Added fsbridge package with an interface that can be used to open and read from VFS1 and VFS2 files.
- Converted ELF loader to use fsbridge
- Added VFS2 types to FSContext
- Added vfs.MountNamespace to ThreadGroup
Updates #1623 PiperOrigin-RevId: 295183950
2020-02-11 | Disallow duplicate NIC names. | gVisor bot
PiperOrigin-RevId: 294500858
2020-02-10 | Clean-up comments in runsc/BUILD and CONTRIBUTING.md. | Adin Scannell
PiperOrigin-RevId: 294300437
2020-02-10 | Add flag package to limit visibility. | Adin Scannell
PiperOrigin-RevId: 294297004
2020-02-07 | Support listxattr and removexattr syscalls. | Dean Deng
Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-06 | Fix TestPauseResume in container test failed with connection refused. | Ting-Yu Wang
Sometimes we get this error under TSAN:
"""
error getting process data from container: connecting to control server at PID XXXX: connection refused
"""
The theory is that the top "sleep 20" was too short under TSAN and the container had already exited, so we got connection refused. This commit changes the test so that the container signals it is running by repeatedly touching a file for the duration of the test. PiperOrigin-RevId: 293710957
2020-02-06 | runsc/container_test: hide host /etc in test containers | Andrei Vagin
The host /etc can contain config files which affect tests. For example, bash reads /etc/passwd, and if it is too big, a test can time out. PiperOrigin-RevId: 293670637
2020-02-05 | Add notes to relevant tests. | Adin Scannell
These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-02-04 | Merge pull request #1683 from kevinGC:ipt-udp-matchers | gVisor bot
PiperOrigin-RevId: 293243342
2020-02-04 | Increase container_test size. | Kevin Krakauer
container_test was flaking because a small percentage of runs timed out. Tested this fix with --runs_per_test=100. PiperOrigin-RevId: 293240102
2020-02-04 | Allow mlock in fsgofer system call filters | Fabricio Voznika
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 293212935
2020-02-03 | Reduce run time for //test/syscalls:socket_inet_loopback_test_runsc_ptrace. | Ting-Yu Wang
* Tests are picked for a shard differently. It now picks one test from each block, instead of picking the whole block. This spreads tests of the same kind across different shards.
* Reduce the number of connect() calls in TCPListenClose.
PiperOrigin-RevId: 293019281
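The change in shard selection can be illustrated by contrasting contiguous blocks with a striped pick. The functions below are hypothetical (the runner's actual code is not shown here), and the test names are made up: "a" and "b" stand for two kinds of tests that sit next to each other in the list.

```go
package main

import "fmt"

// blockShard models picking a whole block: the list is cut into numShards
// contiguous blocks and the shard gets one of them.
func blockShard(tests []string, shard, numShards int) []string {
	per := (len(tests) + numShards - 1) / numShards // ceil(len / numShards)
	start := shard * per
	if start > len(tests) {
		start = len(tests)
	}
	end := start + per
	if end > len(tests) {
		end = len(tests)
	}
	return tests[start:end]
}

// stripedShard models picking one test from each block of numShards
// consecutive tests, so neighbouring tests of the same kind land on
// different shards.
func stripedShard(tests []string, shard, numShards int) []string {
	var out []string
	for i := shard; i < len(tests); i += numShards {
		out = append(out, tests[i])
	}
	return out
}

func main() {
	tests := []string{"a0", "a1", "a2", "b0", "b1", "b2"}
	for s := 0; s < 2; s++ {
		fmt.Println(s, blockShard(tests, s, 2), stripedShard(tests, s, 2))
	}
	// 0 [a0 a1 a2] [a0 a2 b1]
	// 1 [b0 b1 b2] [a1 b0 b2]
}
```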
2020-02-03 | Tag version_test as noguitar. | Brad Burlage
PiperOrigin-RevId: 292974323
2020-02-03 | Allow mlock in system call filters | Michael Pratt
Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 292967478
2020-01-28 | Add vfs.FileDescription to FD table | Fabricio Voznika
FD table now holds both VFS1 and VFS2 types and uses the correct one based on what's set. Parts of this CL are just initial changes (e.g. sys_read.go, runsc/main.go) to serve as a template for the remaining changes. Updates #1487 Updates #1623 PiperOrigin-RevId: 292023223
2020-01-27 | Cleanup glog and add real caller information. | Adin Scannell
In general, we've learned that logging must be avoided at all costs in the hot path. It's unlikely that the optimizations here were significant in any case, since the buffer would certainly escape. This also adds a test to ensure that the caller identification works as expected, and so that logging can be benchmarked.
Original:
BenchmarkGoogleLogging-6 1222255 949 ns/op
With this change:
BenchmarkGoogleLogging-6 517323 2346 ns/op
Fixes #184 PiperOrigin-RevId: 291815420
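"Real caller information" in Go is usually obtained with `runtime.Caller`, skipping the logging helper's own frames. The sketch below is hypothetical (`callerPrefix` and `logf` are not gVisor's glog functions) and only shows the frame-skipping arithmetic: `runtime.Caller(0)` is the function calling `runtime.Caller`, so each wrapper layer adds one to the skip count.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// callerPrefix returns "file:line" of a caller. skip == 0 is the direct
// caller of callerPrefix, 1 is that function's caller, and so on.
func callerPrefix(skip int) string {
	_, file, line, ok := runtime.Caller(skip + 1)
	if !ok {
		return "???:0"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// logf is a hypothetical glog-style helper that tags each message with the
// location of the code that called logf, not a line inside logf itself.
func logf(format string, args ...interface{}) string {
	return callerPrefix(1) + "] " + fmt.Sprintf(format, args...)
}

func main() {
	fmt.Println(logf("starting %s", "sandbox"))
}
```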
2020-01-27 | Update package locations. | Adin Scannell
Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27 | Fix licenses. | Adin Scannell
The preferred Copyright holder is "The gVisor Authors". PiperOrigin-RevId: 291786657
2020-01-27 | Standardize on tools directory. | Adin Scannell
PiperOrigin-RevId: 291745021
2020-01-16 | Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces. | Dean Deng
There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Rather than using the existing xattrwalk and xattrcreate messages, define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
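The role of the new size argument can be sketched as follows. This is a hypothetical in-memory model (`xattrStore` and its `getxattr` method are illustrative, not the InodeOperations signature): the value is a string that carries its own length, and ERANGE is returned as soon as it would not fit in the caller's buffer, so an oversized value is never passed back.

```go
package main

import (
	"fmt"
	"syscall"
)

// xattrStore is a hypothetical inode that keeps extended attributes as
// strings, so each value carries its own length.
type xattrStore struct {
	attrs map[string]string
}

// getxattr returns the value of name if it fits in a caller buffer of size
// bytes; a value that is too large yields ERANGE instead. Unlike getxattr(2),
// this sketch does not treat size == 0 as a query for the value's length.
func (s *xattrStore) getxattr(name string, size uint64) (string, error) {
	v, ok := s.attrs[name]
	if !ok {
		return "", syscall.ENODATA
	}
	if uint64(len(v)) > size {
		return "", syscall.ERANGE
	}
	return v, nil
}

func main() {
	s := &xattrStore{attrs: map[string]string{"user.note": "hello"}}
	if v, err := s.getxattr("user.note", 16); err == nil {
		fmt.Println(v)
	}
	if _, err := s.getxattr("user.note", 2); err != nil {
		fmt.Println("error:", err)
	}
}
```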
2020-01-15 | Bump SO_SNDBUF for fdbased endpoint used by runsc. | Bhasker Hariharan
Updates #231 PiperOrigin-RevId: 289897881
2020-01-09 | New sync package. | Ian Gudger
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
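"Aliases to sync types" in Go means type aliases (`type X = Y`), which make the wrapper's types identical to the standard library's rather than new named types. A minimal sketch, assuming the aliases would live in their own package; here they sit in package main (with the standard package imported as `stdsync`) so the example runs. `incrementAll` is a hypothetical helper showing that an aliased value can be passed where `*sync.Mutex` is expected.

```go
package main

import (
	"fmt"
	stdsync "sync"
)

// Type aliases, not new types: Mutex and stdsync.Mutex are the same type.
type Mutex = stdsync.Mutex
type RWMutex = stdsync.RWMutex
type WaitGroup = stdsync.WaitGroup

// incrementAll runs n goroutines that each increment *counter under mu. It
// takes the standard library type, yet callers can pass the alias.
func incrementAll(n int, mu *stdsync.Mutex, counter *int) {
	var wg WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			*counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
}

func main() {
	var mu Mutex
	n := 0
	incrementAll(100, &mu, &n)
	fmt.Println(n) // 100
}
```

Swapping in a different implementation later (for example a deadlock-detecting mutex) then only requires changing the alias targets in one place.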
2020-01-08 | Combine various Create*NIC methods into CreateNICWithOptions. | Bert Muthalaly
PiperOrigin-RevId: 288779416
2020-01-08 | Add NIC.isLoopback() | Bert Muthalaly
...enabling us to remove the "CreateNamedLoopbackNIC" variant of CreateNIC and all the plumbing to connect it through to where the value is read in FindRoute. PiperOrigin-RevId: 288713093
2019-12-18 | Increase waitForProcessList timeout | Fabricio Voznika
It can take more than 10 seconds when running under --race. PiperOrigin-RevId: 286296060
2019-12-17 | Leave minimum CPU number as a constant | Aleksandr Razumov
Remove the newly introduced CPUNumMin config and hard-code the minimum as 2.
2019-12-17 | Add minimum CPU number and only lower CPUs on --cpu-num-from-quota | Aleksandr Razumov
* Add `--cpu-num-min` flag to control minimum CPUs
* Only lower CPU count
* Fix comments
2019-12-15 | Set CPU number to CPU quota | Aleksandr Razumov
When an application is not cgroups-aware, it can spawn an excessive number of threads, because the thread count often defaults to the number of CPUs. Introduce an opt-in flag that sets the CPU number according to the CPU quota (if available). Fixes #1391
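Taken together, the three CPU-quota commits above describe a small rule: derive a CPU count from the quota, never go below a fixed minimum of 2, and only ever lower the current count. The function below is a hypothetical sketch of that rule (`cpusFromQuota` is not a runsc function). Rounding the quota up to whole CPUs and treating a non-positive quota as "unlimited" (cgroup v1 reports -1 in cpu.cfs_quota_us) are assumptions of this sketch.

```go
package main

import "fmt"

// minCPUs mirrors the hard-coded minimum of 2.
const minCPUs = 2

// cpusFromQuota returns the CPU count to expose, given a CFS quota and
// period in microseconds and the current count. Assumptions: quota <= 0
// means unlimited, and the quota is rounded up to whole CPUs.
func cpusFromQuota(quotaUs, periodUs int64, current int) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return current // no usable quota: leave the count alone
	}
	n := int((quotaUs + periodUs - 1) / periodUs) // ceil(quota / period)
	if n < minCPUs {
		n = minCPUs
	}
	if n < current {
		return n // only ever lower the count
	}
	return current
}

func main() {
	fmt.Println(cpusFromQuota(250000, 100000, 32)) // 3
	fmt.Println(cpusFromQuota(50000, 100000, 32))  // 2: clamped to the minimum
	fmt.Println(cpusFromQuota(800000, 100000, 4))  // 4: never raised
}
```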
2019-12-12 | Add iptables testing framework. | Kevin Krakauer
It would be preferable to test iptables via syscall tests, but there are some problems with that approach:
* We're limited to loopback-only, as syscall tests involve only a single container. Other link interfaces (e.g. fdbased) should be tested.
* We'd have to shell out to call iptables anyway, as the iptables syscall interface itself is too large and complex to work with alone.
* Running the Linux/native version of the syscall test would require root, which is a pain to configure, is inherently unsafe, and could leave host iptables misconfigured.
Using the go_test target means no new test runner is needed. PiperOrigin-RevId: 285274275
2019-12-11 | Enable IPv6 in runsc | Bhasker Hariharan
Fixes #1341 PiperOrigin-RevId: 285108973
2019-12-11 | runsc/debug: add an option to list all processes | Andrei Vagin
runsc debug --ps lists all processes with all threads. This option is added to the debug command but not to the ps command, because it is going to be used for debug purposes and we want to add any useful information without thinking about backward compatibility. This will help to investigate syzkaller issues. PiperOrigin-RevId: 285013668