gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2020-04-16	Preserve log FD after execve	Fabricio Voznika
	PiperOrigin-RevId: 306908296
2020-04-14	Merge pull request #2212 from aaronlu:dup_stdioFDs	gVisor bot
	PiperOrigin-RevId: 306477639
2020-04-10	Add logging message for noNewPrivileges OCI option.	Ian Lewis
	noNewPrivileges is ignored if set to false since gVisor assumes that PR_SET_NO_NEW_PRIVS is always enabled. PiperOrigin-RevId: 305991947
2020-04-10	Use O_CLOEXEC when dup'ing FDs	Fabricio Voznika
	The sentry doesn't allow execve, but it's a good defense in-depth measure. PiperOrigin-RevId: 305958737
2020-04-09	Merge pull request #2253 from amscanne:nogo	gVisor bot
	PiperOrigin-RevId: 305807868
2020-04-09	Don't unconditionally set --panic-signal	Fabricio Voznika
	Closes #2393 PiperOrigin-RevId: 305793027
2020-04-08	Clean up TODOs	Fabricio Voznika
	PiperOrigin-RevId: 305592245
2020-04-08	Fix all printf formatting errors.	Adin Scannell
	Updates #2243
2020-04-08	Fix all copy locks violations.	Adin Scannell
	This required minor restructuring of how system call tables were saved and restored, but it makes way more sense this way. Updates #2243
2020-04-07	Add friendlier messages for frequently encountered errors.	Ian Lewis
	Issue #2270 Issue #1765 PiperOrigin-RevId: 305385436
2020-04-07	Update TODO to #238	Ian Lewis
	Move TODO to #238 so that proper synchronization of operations is handled when we create the urpc client. Issue #238 Fixes #512 PiperOrigin-RevId: 305383924
2020-04-07	Don't map the 0 uid into a sandbox user namespace	Andrei Vagin
	Starting with go1.13, we can specify ambient capabilities when we execute a new process with os/exe.Cmd. PiperOrigin-RevId: 305366706
2020-04-07	Remove TODOs for local gofer extended attributes.	Dean Deng
	PiperOrigin-RevId: 305344989
2020-04-01	Automated rollback of changelist 303799678	Adin Scannell
	PiperOrigin-RevId: 304221302
2020-03-31	checkpoint/restore: make sure the donated stdioFDs have the same value	Aaron Lu
	Suppose I start a runsc container using kvm platform like this: $ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash The donating FD and the corresponding cmdline for runsc-sandbox is: D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt" D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket" D0313 17:50:12.608224 44389 x:0] Donating FD 5: "\|0" D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json" D0313 17:50:12.608234 44389 x:0] Donating FD 7: "\|1" D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD" D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm" D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin" D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout" D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr" D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=kvm --strace=false --strace-syscalls=--strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true --setup-root --cpu-num 32 --total-memory 4294967296 rootbash] Note stdioFDs starts from 10 with kvm platform and stderr's FD is 12. If I restore a container from the checkpoint image which is derived by checkpointing the above rootbash container, but either omit the platform switch or specify to use ptrace platform explicitely: $ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash the donating FD and corresponding cmdline for runsc-sandbox is: D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt" D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket" D0313 17:50:15.258645 44452 x:0] Donating FD 5: "\|0" D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json" D0313 17:50:15.258653 44452 x:0] Donating FD 7: "\|1" D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD" D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin" D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout" D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr" D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=ptrace --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9 --stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory 4294967296 restored_rootbash] Note this time, stdioFDs starts from 9 and stderr's FD is 11(so the saved host.descritor.origFD which is 12 for stderr is no longer valid). For the three host FD based files, The s.Dev and s.Ino derived from fstat(fd) shall all be the same and since the two fields are used as device.MultiDeviceKey, the host.inodeFileState.sattr.InodeId which is the value of MultiDevice.Map(MultiDeviceKey), shall also all be the same. Note that for MultiDevice m, m.cache records the mapping of key to value and m.rcache records the mapping of value to key. If same value doesn't map to the same key, it will panic on restore. Now that stderr's origFD 12 is no longer valid(it happens to be /memfd:runsc-memory in my test on restore), the s.Dev and s.Ino derived from fstat(fd=12) in host.inodeFileState.afterLoad() will neither be correct. But its InodeID is still the same as saved, MultiDevice.Load() will complain about the same value(InodeID) being mapped to different keys (different from stdin and stdout's) and panic with: "MultiDevice's caches are inconsistent". Solve this problem by making sure stdioFDs for root container's init task are always the same on initial start and on restore time, no matter what cmdline user has used: debug log specified or not, platform changed or not etc. shall not affect the ability to restore. Fixes #1844.
2020-03-30	kvm: handle exit reasons even under EINTR.	Adin Scannell
	In the case of other signals (preemption), inject a normal bounce and defer the signal until the vCPU has been returned from guest mode. PiperOrigin-RevId: 303799678
2020-03-26	Use host-defined file owner and mode, when possible, for imported fds.	Dean Deng
	Using the host-defined file owner matches VFS1. It is more correct to use the host-defined mode, since the cached value may become out of date. However, kernfs.Inode.Mode() does not return an error--other filesystems on kernfs are in-memory so retrieving mode should not fail. Therefore, if the host syscall fails, we rely on a cached value instead. Updates #1672. PiperOrigin-RevId: 303220864
2020-03-19	Whitelist utimensat(2).	Dean Deng
	utimensat is used by hostfs for setting timestamps on imported fds. Previously, this would crash the sandbox since utimensat was not allowed. Correct the VFS2 version of hostfs to match the call in VFS1. PiperOrigin-RevId: 301970121
2020-03-19	Improve error message when pivot_root fails	Fabricio Voznika
	PiperOrigin-RevId: 301949722
2020-03-14	Plumb VFS2 imported fds into virtual filesystem.	Dean Deng
	- When setting up the virtual filesystem, mount a host.filesystem to contain all files that need to be imported. - Make read/preadv syscalls to the host in cases where preadv2 may not be supported yet (likewise for writing). - Make save/restore functions in kernel/kernel.go return early if vfs2 is enabled. PiperOrigin-RevId: 300922353
2020-03-12	Kill sandbox process when parent process terminates	Fabricio Voznika
	When the sandbox runs in attached more, e.g. runsc do, runsc run, the sandbox lifetime is controlled by the parent process. This wasn't working in all cases because PR_GET_PDEATHSIG doesn't propagate through execve when the process changes uid/gid. So it was getting dropped when the sandbox execve's to change to user nobody. PiperOrigin-RevId: 300601247
2020-03-11	runsc: Set asyncpreemptoff for the kvm platform	Andrei Vagin
	The asynchronous goroutine preemption is a new feature of Go 1.14. When we switched to go 1.14 (cl/297915917) in the bazel config, the kokoro syscall-kvm job started permanently failing. Lets temporary set asyncpreemptoff for the kvm platform to unblock tests. PiperOrigin-RevId: 300372387
2020-03-05	Merge pull request #1951 from moricho:moricho/add-profiler-option	gVisor bot
	PiperOrigin-RevId: 299233818
2020-03-05	tests: Don't print log messages on stdout	Andrei Vagin
	A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 299174138
2020-03-04	tests: Don't print log messages on stdout	Andrei Vagin
	A parser of test results doesn't expect to see any extra messages. PiperOrigin-RevId: 298966577
2020-02-28	Allow to specify a separate log for GO's runtime messages	Andrei Vagin
	GO's runtime calls the write system call twice to print "panic:" and "the reason of this panic", so here is a race window when other threads can print something to the log and we will see something like this: panic: log messages from another thread The reason of the panic. This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read. e.g., data races often have one side of the race, followed by a large "diagnosis" dump, finally followed by the other side of the race. PiperOrigin-RevId: 297887895
2020-02-27	Log oom_score_adj value on error	Fabricio Voznika
	Updates #1873 PiperOrigin-RevId: 297695241
2020-02-26	add profile option	moricho

2020-02-25	Port most syscalls to VFS2.	Jamie Liu
	pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2. mount and umount2 aren't ported out of temporary laziness. access and faccessat need additional FSImpl methods to implement properly, but are stubbed to prevent googletest from CHECK-failing. Other syscalls require additional plumbing. Updates #1623 PiperOrigin-RevId: 297188448
2020-02-25	Add log during process wait in tests	Fabricio Voznika
	TestMultiContainerKillAll timed out under --race. Without logging, we cannot tell if the process list is still increasing, but slowly, or is stuck. PiperOrigin-RevId: 297158834
2020-02-20	Initial network namespace support.	gVisor bot
	TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-19	Add statefile command to runsc.	Adin Scannell
	PiperOrigin-RevId: 296105337
2020-02-14	Synchronize signalling with S/R	gVisor bot
	This is to fix a data race between sending an external signal to a ThreadGroup and kernel saving state for S/R. PiperOrigin-RevId: 295244281
2020-02-14	Plumb VFS2 inside the Sentry	gVisor bot
	- Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950
2020-02-11	Disallow duplicate NIC names.	gVisor bot
	PiperOrigin-RevId: 294500858
2020-02-10	Clean-up comments in runsc/BUILD and CONTRIBUTING.md.	Adin Scannell
	PiperOrigin-RevId: 294300437
2020-02-10	Add flag package to limit visibility.	Adin Scannell
	PiperOrigin-RevId: 294297004
2020-02-07	Support listxattr and removexattr syscalls.	Dean Deng
	Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-06	Fix TestPauseResume in container test failed with connection refused.	Ting-Yu Wang
	Sometimes we get this error under TSAN: """ error getting process data from container: connecting to control server at PID XXXX: connection refused """ The theory is that the top "sleep 20" was too short for TSAN, and the container already exited, so we get connected refused. This commit changes the test to let container signaling it's running by touching a file repeatedly forever during the test. PiperOrigin-RevId: 293710957
2020-02-06	runsc/container_test: hide host /etc in test containers	Andrei Vagin
	The host /etc can contain config files which affect tests. For example, bash reads /etc/passwd and if it is too big a test can fail by timeout. PiperOrigin-RevId: 293670637
2020-02-05	Add notes to relevant tests.	Adin Scannell
	These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-02-04	Merge pull request #1683 from kevinGC:ipt-udp-matchers	gVisor bot
	PiperOrigin-RevId: 293243342
2020-02-04	Increase container_test size.	Kevin Krakauer
	container_test was flaking because a small percentage of runs timed out. Tested this fix with --runs_per_test=100. PiperOrigin-RevId: 293240102
2020-02-04	Allow mlock in fsgofer system call filters	Fabricio Voznika
	Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 293212935
2020-02-03	Reduce run time for //test/syscalls:socket_inet_loopback_test_runsc_ptrace.	Ting-Yu Wang
	* Tests are picked for a shard differently. It now picks one test from each block, instead of picking the whole block. This makes the same kind of tests spreads across different shards. * Reduce the number of connect() calls in TCPListenClose. PiperOrigin-RevId: 293019281
2020-02-03	Tag version_test as noguitar.	Brad Burlage
	PiperOrigin-RevId: 292974323
2020-02-03	Allow mlock in system call filters	Michael Pratt
	Go 1.14 has a workaround for a Linux 5.2-5.4 bug which requires mlock'ing the g stack to prevent register corruption. We need to allow this syscall until it is removed from Go. PiperOrigin-RevId: 292967478
2020-01-28	Add vfs.FileDescription to FD table	Fabricio Voznika
	FD table now holds both VFS1 and VFS2 types and uses the correct one based on what's set. Parts of this CL are just initial changes (e.g. sys_read.go, runsc/main.go) to serve as a template for the remaining changes. Updates #1487 Updates #1623 PiperOrigin-RevId: 292023223
2020-01-27	Cleanup glog and add real caller information.	Adin Scannell
	In general, we've learned that logging must be avoided at all costs in the hot path. It's unlikely that the optimizations here were significant in any case, since buffer would certainly escape. This also adds a test to ensure that the caller identification works as expected, and so that logging can be benchmarked. Original: BenchmarkGoogleLogging-6 1222255 949 ns/op With this change: BenchmarkGoogleLogging-6 517323 2346 ns/op Fixes #184 PiperOrigin-RevId: 291815420
2020-01-27	Update package locations.	Adin Scannell
	Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289