path: root/runsc/boot
Age | Commit message | Author
2020-05-13 | Merge release-20200422.0-297-gd846077 (automated) | gVisor bot
2020-05-13 | Enable overlayfs_stale_read by default for runsc. | Jamie Liu
Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:

- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."

- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.

- Experimentally on Linux 5.2:

```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv) {
  if (argc < 2) {
    errx(1, "syntax: %s [FILE]", argv[0]);
  }
  const int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    err(1, "open(%s)", argv[1]);
  }
  const size_t page_size = sysconf(_SC_PAGE_SIZE);
  void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
  if (page == MAP_FAILED) {
    err(1, "mmap");
  }
  for (;;) {
    write(1, page, strnlen(page, page_size));
    if (getc(stdin) == EOF) {
      break;
    }
  }
  return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+  Stopped                 ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```

Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.

New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab

PiperOrigin-RevId: 311361267
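The commit message says the fix is to replace existing memory mappings with mappings of the new FD. As a rough user-space illustration of that idea (not gVisor's Go implementation), the C sketch below maps one file, then calls mmap(2) with MAP_FIXED to swap a mapping of a second file in at the same address. The helpers make_temp_file and replace_mapping are hypothetical names used only for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temporary file holding `text`
 * and return an FD for it, or -1 on error. */
static int make_temp_file(const char *text) {
  char path[] = "/tmp/remap-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) {
    return -1;
  }
  unlink(path); /* The file stays alive until fd is closed. */
  size_t len = strlen(text);
  if (write(fd, text, len) != (ssize_t)len) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Hypothetical helper: replace the existing mapping at `addr` with a shared,
 * read-only mapping of `new_fd`. MAP_FIXED discards the old pages at that
 * address in the same call, so the address itself does not change. */
static void *replace_mapping(void *addr, size_t len, int new_fd) {
  assert(addr != NULL && len > 0);
  return mmap(addr, len, PROT_READ, MAP_SHARED | MAP_FIXED, new_fd, 0);
}
```

Pointers into the old mapping remain valid after the swap but now read the new file's contents. The sketch assumes both files are non-empty so that the first page is backed by file data.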
2020-05-13 | Merge release-20200422.0-296-g18cb3d2 (automated) | gVisor bot
2020-05-13 | Use VFS2 mount names | Fabricio Voznika
Updates #1487
PiperOrigin-RevId: 311356385
2020-05-13 | Merge release-20200422.0-294-g305f786 (automated) | gVisor bot
2020-05-12 | Adjust a few log messages | Fabricio Voznika
PiperOrigin-RevId: 311234146
2020-05-11 | Merge release-20200422.0-62-gc52195d (automated) | gVisor bot
2020-05-10 | Stop avoiding preadv2 and pwritev2, and add them to the filters. | Nicolas Lacasse
Some code paths needed these syscalls anyway, so they should be included in the filters. Given that we depend on these syscalls in some cases, there's no real reason to avoid them any more.

PiperOrigin-RevId: 310829126
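For reference, preadv2(2) and pwritev2(2) are positional scatter/gather I/O calls that also take a per-call flags argument; with flags set to 0 they behave like preadv(2) and pwritev(2). The minimal C sketch below shows only the calls themselves, not gVisor's seccomp filter rules. It assumes Linux 4.6 or later and glibc 2.26 or later for the wrappers, and open_scratch_file, write_at and read_at are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: an unlinked scratch file opened read/write. */
static int open_scratch_file(void) {
  char path[] = "/tmp/preadv2-XXXXXX";
  int fd = mkstemp(path);
  if (fd >= 0) {
    unlink(path);
  }
  return fd;
}

/* Write `len` bytes at `offset` using a single-element iovec.
 * flags == 0 gives the same behavior as pwritev(2). */
static ssize_t write_at(int fd, const void *buf, size_t len, off_t offset) {
  assert(offset >= 0);
  struct iovec iov = {.iov_base = (void *)buf, .iov_len = len};
  return pwritev2(fd, &iov, 1, offset, 0);
}

/* Read `len` bytes at `offset`; flags == 0 behaves like preadv(2). */
static ssize_t read_at(int fd, void *buf, size_t len, off_t offset) {
  assert(offset >= 0);
  struct iovec iov = {.iov_base = buf, .iov_len = len};
  return preadv2(fd, &iov, 1, offset, 0);
}
```

Because an explicit offset is passed, neither call moves the file offset; passing an offset of -1 instead makes both calls use and update the current file position.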
2020-05-07 | Merge release-20200422.0-52-g9115f26 (automated) | gVisor bot
2020-05-07 | Allocate device numbers for VFS2 filesystems. | Jamie Liu
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
2020-05-07 | Merge release-20200422.0-51-g1f4087e (automated) | gVisor bot
2020-05-07 | Merge release-20200422.0-45-g16da7e7 (automated) | gVisor bot
2020-05-07 | Update privateunixsocket TODOs. | Dean Deng
Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well.

Fixes #1200.
PiperOrigin-RevId: 310386474
2020-05-06 | Merge release-20200422.0-37-g279f1eb (automated) | gVisor bot
2020-05-06 | Fix runsc syscall documentation generation. | Adin Scannell
We can register any number of tables with any number of architectures, and need not limit the definitions to the architecture in question. This allows runsc to generate documentation for all architectures simultaneously.

Similarly, this simplifies the VFSv2 patching process.

PiperOrigin-RevId: 310224827
2020-05-04 | Merge release-20200422.0-19-g0a307d0 (automated) | gVisor bot
2020-05-04 | Mount VFS2 filesystem using root credentials | Fabricio Voznika
PiperOrigin-RevId: 309787938
2020-05-04 | Merge release-20200422.0-16-gcbc5bef (automated) | gVisor bot
2020-05-04 | Add TTY support on VFS2 to runsc | Fabricio Voznika
Updates #1623, #1487
PiperOrigin-RevId: 309777922
2020-04-30 | Merge release-20200422.0-7-gae15d90 (automated) | gVisor bot
2020-04-30 | FIFO QDisc implementation | Bhasker Hariharan
Updates #231
PiperOrigin-RevId: 309323808
2020-04-29 | Merge release-20200413.0-22-gd5c34ba (automated) | gVisor bot
2020-04-26 | refactor and add test for bindmount | moricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-25 | add bind/rbind options for mount | moricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-25 | fix behavior of `getMountNameAndOptions` when options include either bind or rbind | moricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
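On Linux, bind and rbind are conventionally translated into mount(2) flags: bind becomes MS_BIND, and rbind becomes MS_BIND | MS_REC so that submounts are bound as well. The C sketch below illustrates only that translation. mount_flags_from_options is a hypothetical helper, not runsc's getMountNameAndOptions, and it deliberately ignores every option except bind, rbind and ro.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: translate a comma-separated option list such as
 * "rbind,ro" into mount(2) flags and report whether a bind mount was
 * requested. Options other than bind, rbind and ro are ignored, and input
 * longer than 255 characters is truncated in this sketch. */
static unsigned long mount_flags_from_options(const char *options,
                                              bool *is_bind) {
  assert(options != NULL && is_bind != NULL);
  char buf[256];
  snprintf(buf, sizeof buf, "%s", options); /* strtok_r needs a mutable copy */
  unsigned long flags = 0;
  *is_bind = false;
  char *save = NULL;
  for (char *opt = strtok_r(buf, ",", &save); opt != NULL;
       opt = strtok_r(NULL, ",", &save)) {
    if (strcmp(opt, "bind") == 0) {
      flags |= MS_BIND;
      *is_bind = true;
    } else if (strcmp(opt, "rbind") == 0) {
      flags |= MS_BIND | MS_REC; /* recursive: include submounts */
      *is_bind = true;
    } else if (strcmp(opt, "ro") == 0) {
      flags |= MS_RDONLY;
    }
  }
  return flags;
}
```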
2020-04-25 | Merge release-20200323.0-248-g15a822a (automated) | gVisor bot
2020-04-24 | VFS2: Get HelloWorld image tests to pass with VFS2 | Zach Koopmans
This change includes:
- Modifications to loader_test.go to get TestCreateMountNamespace to pass with VFS2.
- Changes necessary to get TestHelloWorld in image tests to pass with VFS2. This means runsc can run the hello-world container with Docker on VFS2.

Note: Containers that use sockets will not run with these changes. See "//test/image/...". Any tests here with sockets currently fail (which is all of them but HelloWorld).

PiperOrigin-RevId: 308363072
2020-04-24 | Merge release-20200323.0-236-g632b104 (automated) | gVisor bot
2020-04-24 | Plumb context.Context into kernfs.Inode.Open(). | Dean Deng
PiperOrigin-RevId: 308304793
2020-04-24 | Merge release-20200323.0-234-g1b88c63 (automated) | gVisor bot
2020-04-24 | Move hostfs mount to Kernel struct. | Dean Deng
This is needed to set up host fds passed through a Unix socket. Note that the host package depends on kernel, so we cannot set up the hostfs mount directly in Kernel.Init as we do for sockfs and pipefs.

Also, adjust sockfs to make its setup look more like hostfs's and pipefs's.

PiperOrigin-RevId: 308274053
2020-04-23 | Merge release-20200323.0-225-g5042ea7 (automated) | gVisor bot
2020-04-23 | Add vfs.MkdirOptions.ForSyntheticMountpoint. | Jamie Liu
PiperOrigin-RevId: 308143529
2020-04-23 | Simplify Docker test infrastructure. | Adin Scannell
This change adds a layer of abstraction around the internal Docker APIs, and eliminates all direct dependencies on Dockerfiles in the infrastructure.

A subsequent change will automate the generation of local images (with efficient caching).

Note that this change drops the use of bazel container rules, as that experiment does not seem to be viable.

PiperOrigin-RevId: 308095430
2020-04-23 | Merge release-20200323.0-216-ge69a871 (automated) | gVisor bot
2020-04-22 | Move user home detection to its own library. | Nicolas Lacasse
PiperOrigin-RevId: 307977689
2020-04-17 | Merge release-20200323.0-177-g12bde95 (automated) | gVisor bot
2020-04-17 | Get /bin/true to run on VFS2 | Zach Koopmans
Included:
- loader_test.go RunTest and TestStartSignal VFS2
- container_test.go TestAppExitStatus on VFS2
- experimental flag added to runsc to turn on VFS2

Note: shared mounts are not yet supported.

PiperOrigin-RevId: 307070753
2020-04-14 | Merge release-20200323.0-152-gac9b32c (automated) | gVisor bot
2020-04-14 | Merge pull request #2212 from aaronlu:dup_stdioFDs | gVisor bot
PiperOrigin-RevId: 306477639
2020-04-10 | Merge release-20200323.0-128-g96f9142 (automated) | gVisor bot
2020-04-10 | Use O_CLOEXEC when dup'ing FDs | Fabricio Voznika
The sentry doesn't allow execve, but it's a good defense-in-depth measure.

PiperOrigin-RevId: 305958737
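The difference this commit relies on: dup2(2) always clears FD_CLOEXEC on the new descriptor, whereas dup3(2) with O_CLOEXEC sets it atomically as part of the duplication. The C sketch below contrasts the two. dup_cloexec and has_cloexec are hypothetical helpers, and the sketch does not claim to mirror the exact call the runsc change makes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: duplicate old_fd onto new_fd with FD_CLOEXEC set
 * atomically, so the copy cannot leak across execve(2). dup2(2) would
 * instead leave FD_CLOEXEC cleared on new_fd. */
static int dup_cloexec(int old_fd, int new_fd) {
  assert(old_fd != new_fd); /* dup3(2) fails with EINVAL in this case */
  return dup3(old_fd, new_fd, O_CLOEXEC);
}

/* Hypothetical helper: report whether fd has close-on-exec set. */
static bool has_cloexec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```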
2020-04-10 | Merge release-20200323.0-119-g7812661 (automated) | gVisor bot
2020-04-08 | Fix all copy locks violations. | Adin Scannell
This required minor restructuring of how system call tables were saved and restored, but it makes way more sense this way.

Updates #2243
2020-04-08 | Merge release-20200323.0-89-g56054fc (automated) | gVisor bot
2020-04-07 | Add friendlier messages for frequently encountered errors. | Ian Lewis
Issue #2270
Issue #1765
PiperOrigin-RevId: 305385436
2020-03-31 | checkpoint/restore: make sure the donated stdioFDs have the same value | Aaron Lu
Suppose I start a runsc container using the kvm platform like this:

    $ sudo runsc --debug=true --debug-log=1.txt --platform=kvm run rootbash

The donated FDs and the corresponding cmdline for runsc-sandbox are:

    D0313 17:50:12.608203 44389 x:0] Donating FD 3: "1.txt"
    D0313 17:50:12.608214 44389 x:0] Donating FD 4: "control_server_socket"
    D0313 17:50:12.608224 44389 x:0] Donating FD 5: "|0"
    D0313 17:50:12.608229 44389 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
    D0313 17:50:12.608234 44389 x:0] Donating FD 7: "|1"
    D0313 17:50:12.608238 44389 x:0] Donating FD 8: "sandbox IO FD"
    D0313 17:50:12.608242 44389 x:0] Donating FD 9: "/dev/kvm"
    D0313 17:50:12.608246 44389 x:0] Donating FD 10: "/dev/stdin"
    D0313 17:50:12.608249 44389 x:0] Donating FD 11: "/dev/stdout"
    D0313 17:50:12.608253 44389 x:0] Donating FD 12: "/dev/stderr"
    D0313 17:50:12.608257 44389 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=kvm --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --device-fd=9 --stdio-fds=10 --stdio-fds=11 --stdio-fds=12 --pidns=true --setup-root --cpu-num 32 --total-memory 4294967296 rootbash]

Note that with the kvm platform, stdioFDs start at 10 and stderr's FD is 12.

If I restore a container from the checkpoint image taken from the above rootbash container, but either omit the platform switch or explicitly specify the ptrace platform:

    $ sudo runsc --debug=true --debug-log=1.txt restore --image-path=some_path restored_rootbash

the donated FDs and the corresponding cmdline for runsc-sandbox are:

    D0313 17:50:15.258632 44452 x:0] Donating FD 3: "1.txt"
    D0313 17:50:15.258640 44452 x:0] Donating FD 4: "control_server_socket"
    D0313 17:50:15.258645 44452 x:0] Donating FD 5: "|0"
    D0313 17:50:15.258648 44452 x:0] Donating FD 6: "/home/ziqian.lzq/bundle/bash/runsc/config.json"
    D0313 17:50:15.258653 44452 x:0] Donating FD 7: "|1"
    D0313 17:50:15.258657 44452 x:0] Donating FD 8: "sandbox IO FD"
    D0313 17:50:15.258661 44452 x:0] Donating FD 9: "/dev/stdin"
    D0313 17:50:15.258675 44452 x:0] Donating FD 10: "/dev/stdout"
    D0313 17:50:15.258680 44452 x:0] Donating FD 11: "/dev/stderr"
    D0313 17:50:15.258684 44452 x:0] Starting sandbox: /proc/self/exe [runsc-sandbox --root=/run/containerd/runsc/default --debug=true --log= --max-threads=256 --reclaim-period=5 --log-format=text --debug-log=1.txt --debug-log-format=text --file-access=exclusive --overlay=false --fsgofer-host-uds=false --network=sandbox --log-packets=false --platform=ptrace --strace=false --strace-syscalls= --strace-log-size=1024 --watchdog-action=Panic --panic-signal=-1 --profile=false --net-raw=true --num-network-channels=1 --rootless=false --alsologtostderr=false --ref-leak-mode=disabled --gso=true --software-gso=true --overlayfs-stale-read=false --shared-volume= --debug-log-fd=3 --panic-signal=15 boot --bundle=/home/ziqian.lzq/bundle/bash/runsc --controller-fd=4 --mounts-fd=5 --spec-fd=6 --start-sync-fd=7 --io-fds=8 --stdio-fds=9 --stdio-fds=10 --stdio-fds=11 --setup-root --cpu-num 32 --total-memory 4294967296 restored_rootbash]

Note that this time stdioFDs start at 9 and stderr's FD is 11, so the saved host.descriptor.origFD, which is 12 for stderr, is no longer valid.

For the three host-FD-based files, the s.Dev and s.Ino derived from fstat(fd) should all be the same. Since those two fields are used as the device.MultiDeviceKey, host.inodeFileState.sattr.InodeID, which is the value of MultiDevice.Map(MultiDeviceKey), should also be the same for all three. Note that for a MultiDevice m, m.cache records the mapping from key to value and m.rcache records the mapping from value to key. If the same value does not map back to the same key, restore panics.

Now that stderr's origFD 12 is no longer valid (in my test it happens to be /memfd:runsc-memory on restore), the s.Dev and s.Ino derived from fstat(fd=12) in host.inodeFileState.afterLoad() are not correct either. But its InodeID is still the saved one, so MultiDevice.Load() finds the same value (InodeID) mapped to a different key (different from stdin's and stdout's) and panics with: "MultiDevice's caches are inconsistent".

Solve this problem by making sure the stdioFDs for the root container's init task are always the same on initial start and on restore, no matter which cmdline the user used: whether a debug log is specified, whether the platform changed, and so on must not affect the ability to restore.

Fixes #1844.
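The restore failure above comes down to an invariant on MultiDevice: its key-to-value and value-to-key caches must agree. The C sketch below is a simplified, hypothetical model (struct mdkey, struct multidevice, md_load and MD_CAP are not gVisor names) that flattens both caches into one list of (key, value) pairs. It rejects a load in which the value is already bound to a different key, the situation the commit message says makes MultiDevice.Load() panic; rejecting the converse case (same key, different value) is an extra assumption made for symmetry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for device.MultiDeviceKey: a host (dev, ino) pair. */
struct mdkey {
  uint64_t dev;
  uint64_t ino;
};

#define MD_CAP 16

/* Forward (key -> value) and reverse (value -> key) caches flattened into
 * parallel arrays; a real implementation would use two hash maps. */
struct multidevice {
  struct mdkey keys[MD_CAP];
  uint64_t values[MD_CAP];
  size_t len;
};

static bool key_eq(struct mdkey a, struct mdkey b) {
  return a.dev == b.dev && a.ino == b.ino;
}

/* Record key <-> value while restoring. Returns false where the caches
 * would become inconsistent, i.e. where the Go code would panic. */
static bool md_load(struct multidevice *m, struct mdkey key, uint64_t value) {
  for (size_t i = 0; i < m->len; i++) {
    bool same_key = key_eq(m->keys[i], key);
    bool same_value = m->values[i] == value;
    if (same_key && same_value) {
      return true; /* already recorded, e.g. stdout after stdin */
    }
    if (same_key || same_value) {
      return false; /* one side is bound to something else */
    }
  }
  assert(m->len < MD_CAP);
  m->keys[m->len] = key;
  m->values[m->len] = value;
  m->len++;
  return true;
}
```

Loading stdin's key and InodeID, then the identical pair for stdout, succeeds; loading stderr's saved InodeID together with the (dev, ino) of an unrelated host file then fails, which corresponds to the "MultiDevice's caches are inconsistent" panic.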
2020-03-26 | Merge release-20200219.0-251-g137f361 (automated) | gVisor bot
2020-03-26 | Use host-defined file owner and mode, when possible, for imported fds. | Dean Deng
Using the host-defined file owner matches VFS1. It is more correct to use the host-defined mode, since the cached value may become out of date. However, kernfs.Inode.Mode() does not return an error: other filesystems on kernfs are in-memory, so retrieving the mode should not fail. Therefore, if the host syscall fails, we rely on a cached value instead.

Updates #1672.
PiperOrigin-RevId: 303220864
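The fallback described above can be sketched in C: query the host with fstat(2) on every call, and return the last known mode only when that query fails. struct host_inode and host_inode_mode are hypothetical names, and refreshing the cache on every successful call is an assumption of this sketch rather than something the commit states.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state for an FD imported from the host. */
struct host_inode {
  int fd;
  mode_t cached_mode; /* last mode successfully read from the host */
};

/* Return the host-defined mode when fstat(2) succeeds. On failure, fall
 * back to the cached value, since (like kernfs.Inode.Mode()) this function
 * has no way to report an error to its caller. */
static mode_t host_inode_mode(struct host_inode *in) {
  assert(in != NULL);
  struct stat st;
  if (fstat(in->fd, &st) == 0) {
    in->cached_mode = st.st_mode; /* assumption: keep the cache fresh */
    return st.st_mode;
  }
  return in->cached_mode;
}
```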
2020-03-20 | Merge release-20200219.0-211-g248e46f (automated) | gVisor bot