gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-05-13	Enable overlayfs_stale_read by default for runsc.	Jamie Liu
	Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:
	- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."
	- fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.
	- Experimentally on Linux 5.2:
```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv) {
  if (argc < 2) {
    errx(1, "syntax: %s [FILE]", argv[0]);
  }
  const int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    err(1, "open(%s)", argv[1]);
  }
  const size_t page_size = sysconf(_SC_PAGE_SIZE);
  void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
  if (page == MAP_FAILED) {
    err(1, "mmap");
  }
  for (;;) {
    write(1, page, strnlen(page, page_size));
    if (getc(stdin) == EOF) {
      break;
    }
  }
  return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```
	Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.
	New test:
	- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
	- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
	PiperOrigin-RevId: 311361267
2020-05-11	Automated rollback of changelist 310417191	Bhasker Hariharan
	PiperOrigin-RevId: 310963404
2020-05-07	Automated rollback of changelist 309339316	Bhasker Hariharan
	PiperOrigin-RevId: 310417191
2020-04-30	Enable FIFO QDisc by default in runsc.	Bhasker Hariharan
	Updates #231 PiperOrigin-RevId: 309339316
2020-04-30	FIFO QDisc implementation	Bhasker Hariharan
	Updates #231 PiperOrigin-RevId: 309323808
2020-04-27	runsc: extend do network cleanup	Michael Pratt
	Previously we unconditionally failed to clean up the networking files (hostname, resolv.conf, hosts), and failed to clean up the netns, etc. on partial setup failure. We can drop the iptables commands from cleanup, as the routes automatically go away when the device is deleted. Those commands were failing previously. Forward signals to the container, allowing it to exit normally when a signal is received, and then for runsc to run the cleanup. This doesn't cover cleanup when runsc is signalled before the container starts, but it covers the most common case.
	Fixes #2539
	Fixes #2540
2020-04-17	Add test name to boot and gofer log files	Fabricio Voznika
	This makes it easier to find the corresponding logs when a test fails. PiperOrigin-RevId: 307104283
2020-04-17	Get /bin/true to run on VFS2	Zach Koopmans
	Included:
	- loader_test.go RunTest and TestStartSignal VFS2
	- container_test.go TestAppExitStatus on VFS2
	- experimental flag added to runsc to turn on VFS2
	Note: shared mounts are not yet supported.
	PiperOrigin-RevId: 307070753
2020-04-16	Preserve log FD after execve	Fabricio Voznika
	PiperOrigin-RevId: 306908296
2020-04-10	Use O_CLOEXEC when dup'ing FDs	Fabricio Voznika
	The sentry doesn't allow execve, but it's a good defense-in-depth measure. PiperOrigin-RevId: 305958737
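	As an illustration of the idea (not the runsc code), the sketch below duplicates a file descriptor with the close-on-exec flag already set, so the copy cannot leak across an execve. It assumes Linux and uses only the standard `syscall` package; the helper names `dupCloexec`, `isCloexec`, and `demo` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// dupCloexec duplicates fd and atomically sets FD_CLOEXEC on the copy
// by using fcntl(F_DUPFD_CLOEXEC). Linux-only sketch.
func dupCloexec(fd int) (int, error) {
	newfd, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_DUPFD_CLOEXEC, 0)
	if errno != 0 {
		return -1, errno
	}
	return int(newfd), nil
}

// isCloexec reports whether FD_CLOEXEC is set on fd.
func isCloexec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// demo opens /dev/null, duplicates it, and reports whether the
// duplicate carries the close-on-exec flag.
func demo() (bool, error) {
	f, err := os.Open("/dev/null")
	if err != nil {
		return false, err
	}
	defer f.Close()

	dup, err := dupCloexec(int(f.Fd()))
	if err != nil {
		return false, err
	}
	defer syscall.Close(dup)

	return isCloexec(dup)
}

func main() {
	ok, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("duplicate has FD_CLOEXEC:", ok)
}
```

	Setting the flag in the same call as the duplication avoids a window in which another thread could exec while the new descriptor is still inheritable.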
2020-04-08	Fix all copy locks violations.	Adin Scannell
	This required minor restructuring of how system call tables were saved and restored, but it makes way more sense this way. Updates #2243
2020-02-28	Allow specifying a separate log for Go's runtime messages	Andrei Vagin
	Go's runtime calls the write system call twice to print "panic:" and the reason for the panic, so there is a race window in which other threads can print something to the log, and we will see something like this:
	panic: log messages from another thread
	The reason of the panic.
	This confuses the syzkaller blacklist and dedup detection. It also makes the logs generally difficult to read. For example, data races often have one side of the race, followed by a large "diagnosis" dump, finally followed by the other side of the race.
	PiperOrigin-RevId: 297887895
2020-02-19	Add statefile command to runsc.	Adin Scannell
	PiperOrigin-RevId: 296105337
2020-02-10	Add flag package to limit visibility.	Adin Scannell
	PiperOrigin-RevId: 294297004
2020-01-27	Cleanup glog and add real caller information.	Adin Scannell
	In general, we've learned that logging must be avoided at all costs in the hot path. It's unlikely that the optimizations here were significant in any case, since buffer would certainly escape. This also adds a test to ensure that the caller identification works as expected, and so that logging can be benchmarked.
	Original:
	BenchmarkGoogleLogging-6 1222255 949 ns/op
	With this change:
	BenchmarkGoogleLogging-6 517323 2346 ns/op
	Fixes #184
	PiperOrigin-RevId: 291815420
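	To illustrate what "real caller information" means, here is a minimal sketch of looking up the call site with the standard `runtime.Caller` function. It is not the gVisor log package; `callerInfo` and `infof` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// callerInfo returns "file:line" for the function that is skip frames
// above the direct caller of callerInfo (skip=0 is the direct caller).
func callerInfo(skip int) string {
	// For runtime.Caller, skip=0 refers to callerInfo itself, so add 1.
	_, file, line, ok := runtime.Caller(skip + 1)
	if !ok {
		return "???:0"
	}
	return fmt.Sprintf("%s:%d", filepath.Base(file), line)
}

// infof is a hypothetical logging helper that prefixes the formatted
// message with the location of the code that called infof.
func infof(format string, args ...interface{}) string {
	// skip=1 steps over infof to reach its caller.
	return callerInfo(1) + "] " + fmt.Sprintf(format, args...)
}

func main() {
	fmt.Println(infof("hello %d", 42))
}
```

	The stack lookup is part of why the benchmark above gets slower once caller information is included: the extra work happens on every log call.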
2019-12-17	Leave minimum CPU number as a constant	Aleksandr Razumov
	Remove introduced CPUNumMin config and hard-code it as 2.
2019-12-17	Add minimum CPU number and only lower CPUs on --cpu-num-from-quota	Aleksandr Razumov
	* Add `--cpu-num-min` flag to control minimum CPUs
	* Only lower CPU count
	* Fix comments
2019-12-15	Set CPU number to CPU quota	Aleksandr Razumov
	When an application is not cgroups-aware, it can spawn an excessive number of threads, since thread counts often default to the number of CPUs. Introduce an opt-in flag that sets the CPU number according to the CPU quota (if available). Fixes #1391
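	The combined effect of this commit and the two follow-up commits listed above it (a minimum of 2, and only ever lowering the count) can be sketched as follows. This is an illustration, not the runsc code: `cpusFromQuota` is a hypothetical name, and rounding the quota up to a whole CPU is an assumption.

```go
package main

import "fmt"

// minCPUs mirrors the hard-coded minimum of 2 mentioned in the log.
const minCPUs = 2

// cpusFromQuota derives a CPU count from a CFS quota and period, both
// in microseconds. A non-positive quota or period means "no limit".
func cpusFromQuota(hostCPUs int, quotaUs, periodUs int64) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return hostCPUs
	}
	// Round up so that a quota of 2.5 CPUs yields 3 (assumption).
	n := int((quotaUs + periodUs - 1) / periodUs)
	if n < minCPUs {
		n = minCPUs
	}
	// Only lower the count; never report more CPUs than the host has.
	if n > hostCPUs {
		n = hostCPUs
	}
	return n
}

func main() {
	fmt.Println(cpusFromQuota(8, 250000, 100000))  // quota of 2.5 CPUs
	fmt.Println(cpusFromQuota(8, 50000, 100000))   // quota of 0.5 CPUs
	fmt.Println(cpusFromQuota(4, 1600000, 100000)) // quota above host count
	fmt.Println(cpusFromQuota(8, -1, 100000))      // no quota set
}
```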
2019-11-22	Force timezone initialization before filter installation	Michael Pratt
	The first use of time.Local (usually via time.Time.Date, et al.) performs initialization of the local timezone, which involves opening several tzdata files from the host. Since filter installation disallows open, we should explicitly force this initialization rather than implicitly depending on the first logging (or other time) call occurring before filter installation. PiperOrigin-RevId: 282053121
2019-10-31	Add systemd-cgroup flag option.	Ian Lewis
	Adds a systemd-cgroup flag option that prints an error letting the user know that systemd cgroups are not supported and points them to the relevant issue. Issue #193 PiperOrigin-RevId: 277837162
2019-10-22	netstack/tcp: software segmentation offload	Andrei Vagin
	Right now, we send each TCP packet separately, making one system call per packet. This patch allows generating multiple TCP packets and sending them with a single sendmmsg call. The arguable part of this CL is how to handle multiple headers. This CL adds the next field to the Prependable buffer.
	Nginx test results:
	Server Software: nginx/1.15.9
	Server Hostname: 10.138.0.2
	Server Port: 8080
	Document Path: /10m.txt
	Document Length: 10485760 bytes
	w/o gso:
	Concurrency Level: 5
	Time taken for tests: 5.491 seconds
	Complete requests: 100
	Failed requests: 0
	Total transferred: 1048600200 bytes
	HTML transferred: 1048576000 bytes
	Requests per second: 18.21 [#/sec] (mean)
	Time per request: 274.525 [ms] (mean)
	Time per request: 54.905 [ms] (mean, across all concurrent requests)
	Transfer rate: 186508.03 [Kbytes/sec] received
	sw-gso:
	Concurrency Level: 5
	Time taken for tests: 3.852 seconds
	Complete requests: 100
	Failed requests: 0
	Total transferred: 1048600200 bytes
	HTML transferred: 1048576000 bytes
	Requests per second: 25.96 [#/sec] (mean)
	Time per request: 192.576 [ms] (mean)
	Time per request: 38.515 [ms] (mean, across all concurrent requests)
	Transfer rate: 265874.92 [Kbytes/sec] received
	w/o gso:
	$ ./tcp_benchmark --client --duration 15 --ideal
	[SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec
	software gso:
	$ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso
	[SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec
	PiperOrigin-RevId: 276112677
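	The core of software segmentation is cutting one large payload into segment-sized pieces that can then be handed to the host in a single batch. The sketch below shows only that splitting step, under stated assumptions: `segment` is a hypothetical helper, header construction and the sendmmsg call are omitted, and the 1460-byte segment size assumes a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers.

```go
package main

import "fmt"

// segment splits payload into consecutive chunks of at most mss bytes.
// The returned slices alias payload's backing array; nothing is copied.
func segment(payload []byte, mss int) [][]byte {
	if mss <= 0 {
		return nil
	}
	var segs [][]byte
	for len(payload) > 0 {
		n := mss
		if len(payload) < n {
			n = len(payload)
		}
		segs = append(segs, payload[:n])
		payload = payload[n:]
	}
	return segs
}

func main() {
	payload := make([]byte, 1<<16) // a 64 KiB buffer
	segs := segment(payload, 1460)
	fmt.Println("segments:", len(segs))
	fmt.Println("last segment bytes:", len(segs[len(segs)-1]))
}
```

	With these numbers a 64 KiB buffer becomes 45 segments, which would otherwise cost 45 separate system calls when sent one packet at a time.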
2019-10-16	Fix problem with open FD when copy up is triggered in overlayfs	Fabricio Voznika
	Linux kernels before 4.19 don't implement the feature that updates open FDs after a file is opened for write (and is copied to the upper layer). Already open FDs will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. A flag was added to force read-only files to be reopened when the same file is opened for write. This is only needed when using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE, which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-09-25	Merge pull request #765 from trailofbits:uds_support	gVisor bot
	PiperOrigin-RevId: 271235134
2019-09-24	Refactor command line options and remove the allowed terminology for uds	Robert Tonic

2019-09-24	Merge pull request #812 from lubinszARM:pr_dup3_arm	gVisor bot
	PiperOrigin-RevId: 270957224
2019-09-19	Place the host UDS mounting behind --fsgofer-host-uds-allowed.	Robert Tonic
	This commit allows the use of the `--fsgofer-host-uds-allowed` flag to enable mounting sockets and add the appropriate seccomp filters.
2019-09-16	Bring back to life features lost in recent refactor	Fabricio Voznika
	- Sandbox logs are generated when running tests
	- Kokoro uploads the sandbox logs
	- Supports multiple parallel runs
	- Revive script to install locally built runsc with docker
	PiperOrigin-RevId: 269337274
2019-09-05	Change syscall.Dup2 to syscall.Dup3	Bin Lu
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-09-03	Impose order on test scripts.	Adin Scannell
	The simple test script has gotten out of control. Shard this script into different pieces and attempt to impose order on overall test structure. This change helps lay some of the foundations for future improvements.
	* The runsc/test directories are moved into just test/.
	* The runsc/test/testutil package is split into logical pieces.
	* The scripts/ directory contains new top-level targets.
	* Each test is now responsible for building targets it requires.
	* The install functionality is moved into `runsc` itself for simplicity.
	* The existing kokoro run_tests.sh file now just calls all (can be split).
	After this change is merged, I will create multiple distinct workflows for Kokoro, one for each of the scripts currently targeted by `run_tests.sh`, which should dramatically reduce the time-to-run for the Kokoro tests and provide a better foundation for further improvements to the infrastructure.
	PiperOrigin-RevId: 267081397
2019-08-29	Merge pull request #655 from praveensastry:feature/runsc-ref-chk-leak	gVisor bot
	PiperOrigin-RevId: 266226714
2019-08-22	Log message sent before logging is setup	Fabricio Voznika
	Moved log message to after the log options have been read and log setup. PiperOrigin-RevId: 264964171
2019-08-22	Add log prefix for better clarity	praveensastry

2019-08-13	tests: print stack traces if test failed by timeout	Andrei Vagin
	PiperOrigin-RevId: 263184083
2019-08-09	Fix the Stringer for leak mode	praveensastry

2019-08-06	Remove traces option for ref leak mode	praveensastry

2019-08-06	Add option to configure reference leak checking	praveensastry

2019-07-26	runsc: propagate the alsologtostderr to sub-commands	Andrei Vagin
	PiperOrigin-RevId: 260239119
2019-07-03	Avoid importing platforms from many source files	Andrei Vagin
	PiperOrigin-RevId: 256494243
2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-12	Allow 'runsc do' to run without root	Fabricio Voznika
	The '--rootless' flag lets a non-root user execute 'runsc do'. The drawback is that the sandbox and gofer processes will run as root inside a user namespace that is mapped to the caller's user, instead of nobody. Also, the network defaults to '--network=host' inside the root network namespace. On the bright side, it's very convenient for testing:
	runsc --rootless do ls
	runsc --rootless do curl www.google.com
	PiperOrigin-RevId: 252840970
2019-06-10	Add introspection for Linux/AMD64 syscalls	Ian Lewis
	Adds simple introspection for syscall compatibility information to Linux/AMD64. Syscalls registered in the syscall table now have associated metadata like name, support level, notes, and URLs to relevant issues. Syscall information can be exported as a table, JSON, or CSV using the new 'runsc help syscalls' command. Users can use this info to debug and get info on the compatibility of the version of runsc they are running or to generate documentation. PiperOrigin-RevId: 252558304
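	The kind of per-syscall metadata described here, and its export as JSON, might look roughly like the sketch below. The `SyscallInfo` type, its field names, and the sample entry are hypothetical and are not taken from the gVisor syscall table or from the output of 'runsc help syscalls'.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SyscallInfo is a hypothetical record of compatibility metadata for
// one syscall: its number, name, support level, notes, and issue URLs.
type SyscallInfo struct {
	Number  int      `json:"number"`
	Name    string   `json:"name"`
	Support string   `json:"support"`
	Note    string   `json:"note,omitempty"`
	URLs    []string `json:"urls,omitempty"`
}

// toJSON serializes a table of syscall records as a JSON array.
func toJSON(table []SyscallInfo) (string, error) {
	b, err := json.Marshal(table)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Illustrative entry only; the support level is made up.
	table := []SyscallInfo{
		{Number: 0, Name: "read", Support: "full"},
	}
	out, err := toJSON(table)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

	Keeping the metadata in one structured table is what makes it cheap to emit the same information as a table, JSON, or CSV.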
2019-06-06	Add alsologtostderr option	Fabricio Voznika
	When set, sends log messages to the error log:
	sudo ./runsc --logtostderr do ls
	I0531 17:59:58.105064 144564 x:0] ***************************
	I0531 17:59:58.105087 144564 x:0] Args: [runsc --logtostderr do ls]
	I0531 17:59:58.105112 144564 x:0] PID: 144564
	I0531 17:59:58.105125 144564 x:0] UID: 0, GID: 0
	[...]
	PiperOrigin-RevId: 251964377
2019-06-06	Send error message to docker/kubectl exec on failure	Fabricio Voznika
	Containerd uses the last error message sent to the log as the failure cause it prints for create/exec. This required a few changes in the logging logic for runsc:
	- cmd.Errorf/Fatalf: now write a message with 'error' level to the containerd log, in addition to stderr and debug logs, like before.
	- log.Infof/Warningf/Fatalf: are not sent to the containerd log anymore. They are mostly used for debugging and are not useful to containerd. In most cases, --debug-log is enabled, and this avoids log messages being duplicated.
	- stderr is not used as the default log destination anymore. Some commands assume stdio is for the container/process running inside the sandbox, and it's better to never use it for logging. By default, logs are suppressed now.
	PiperOrigin-RevId: 251881815
2019-06-06	Add multi-fd support to fdbased endpoint.	Bhasker Hariharan
	This allows an fdbased endpoint to have multiple underlying fd's from which packets can be read and dispatched/written to. This should allow for higher throughput as well as better scalability of the network stack as number of connections increases. Updates #231 PiperOrigin-RevId: 251852825
2019-04-29	Change copyright notice to "The gVisor Authors"	Michael Pratt
	Based on the guidelines at https://opensource.google.com/docs/releasing/authors/.
	1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
	2. Manual fixup of "Google Inc" references.
	3. Add AUTHORS file. Authors may request to be added to this file.
	4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
	Fixes #209
	PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-26	Make raw sockets a toggleable feature disabled by default.	Kevin Krakauer
	PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-11	Add 'runsc do' command	Fabricio Voznika
	It provides an easy way to run commands to quickly test gVisor. By default it maps the host root as the container root with a writable overlay on top (so the host root is not modified).
	Example:
	sudo runsc do ls -lh --color
	sudo runsc do ~/src/test/my-test.sh
	PiperOrigin-RevId: 243178711 Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9
2019-04-01	Add release hook and version flag	Adin Scannell
	PiperOrigin-RevId: 241421671 Change-Id: Ic0cebfe3efd458dc42c49f7f812c13318705199a
2019-03-29	gvisor/runsc: enable generic segmentation offload (GSO)	Andrei Vagin
	The Linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU, which is usually 1500.
	Here are numbers for the nginx-1m test:
	runsc: 579330.01 [Kbytes/sec] received
	runsc-gso: 1794121.66 [Kbytes/sec] received
	runc: 2122139.06 [Kbytes/sec] received
	and for tcp_benchmark:
	$ tcp_benchmark --duration 15 --ideal
	[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
	$ tcp_benchmark --client --duration 15 --ideal
	[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
	$ tcp_benchmark --client --duration 15 --ideal --gso 65536
	[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
	PiperOrigin-RevId: 241072403 Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5
2019-03-11	Add profiling commands to runsc	Fabricio Voznika
	Example:
	runsc debug --root=<dir> \
	    --profile-heap=/tmp/heap.prof \
	    --profile-cpu=/tmp/cpu.prod --profile-delay=30 \
	    <container ID>
	PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773