summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot/loader.go
AgeCommit message (Collapse)Author
2019-08-02Stops container if gofer is killedFabricio Voznika
Each gofer now has a goroutine that polls on the FDs used to communicate with the sandbox. The respective gofer is destroyed if any of the FDs is closed. Closes #601 PiperOrigin-RevId: 261383725
2019-08-02Remove kernel.mounts.Nicolas Lacasse
We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060
2019-07-26Merge pull request #452 from zhangningdlut:chris_test_pidnsgVisor bot
PiperOrigin-RevId: 260220279
2019-07-24Use different pidns among different containerschris.zn
The different containers in a sandbox used only one pid namespace before. This results in that a container can see the processes in another container in the same sandbox. This patch use different pid namespace for different containers. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23Give each container a distinct MountNamespace.Nicolas Lacasse
This keeps all container filesystem completely separate from eachother (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-08Merge pull request #375 from jmgao:mastergVisor bot
PiperOrigin-RevId: 257041876
2019-07-03Avoid importing platforms from many source filesAndrei Vagin
PiperOrigin-RevId: 256494243
2019-07-02Remove map from fd_map, change to fd_table.Adin Scannell
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire package to itself and didn't serve much use (it was freely cast between types, and served as more of an annoyance than providing any protection.) Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup operation, and 10-15 ns per concurrent lookup operation of savings. This also fixes two tangential usage issues with the FDMap. Namely, non-atomic use of NewFDFrom and associated calls to Remove (that are both racy and fail to drop the reference on the underlying file.) PiperOrigin-RevId: 256285890
2019-06-28Add finalizer on AtomicRefCount to check for leaks.Ian Gudger
PiperOrigin-RevId: 255711454
2019-06-14Enable Receive Buffer Auto-Tuning for runsc.Bhasker Hariharan
Updates #230 PiperOrigin-RevId: 253225078
2019-06-13Plumb context through more layers of filesytem.Ian Gudger
All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-13Set the HOME environment variable (fixes #293)Ian Lewis
runsc will now set the HOME environment variable as required by POSIX. The user's home directory is retrieved from the /etc/passwd file located on the container's file system during boot. PiperOrigin-RevId: 253120627
2019-06-13Fix use of "2 ^ 30".Josh Gao
2 ^ 30 is 28, not 1073741824.
2019-06-12gvisor/runsc: apply seccomp filters before parsing a state fileAndrei Vagin
PiperOrigin-RevId: 252869983
2019-06-11Add support to mount pod shared tmpfs mountsFabricio Voznika
Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037
2019-06-07Move //pkg/sentry/memutil to //pkg/memutil.Jamie Liu
PiperOrigin-RevId: 252124156
2019-06-06Implement reclaim-driven MemoryFile eviction.Jamie Liu
PiperOrigin-RevId: 251950660
2019-06-03Refactor container FS setupFabricio Voznika
No change in functionaly. Added containerMounter object to keep state while the mounts are processed. This will help upcoming changes to share mounts per-pod. PiperOrigin-RevId: 251350096
2019-06-03Remove 'clearStatus' option from container.Wait*PID()Fabricio Voznika
clearStatus was added to allow detached execution to wait on the exec'd process and retrieve its exit status. However, it's not currently used. Both docker and gvisor-containerd-shim wait on the "shim" process and retrieve the exit status from there. We could change gvisor-containerd-shim to use waits, but it will end up also consuming a process for the wait, which is similar to having the shim process. Closes #234 PiperOrigin-RevId: 251349490
2019-06-03Remove spurious periodMichael Pratt
PiperOrigin-RevId: 251288885
2019-05-15Cleanup around urpc file payload handlingFabricio Voznika
urpc always closes all files once the RPC function returns. PiperOrigin-RevId: 248406857 Change-Id: I400a8562452ec75c8e4bddc2154948567d572950
2019-04-30Implement async MemoryFile eviction, and use it in CachingInodeOperations.Jamie Liu
This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-26Make raw sockets a toggleable feature disabled by default.Kevin Krakauer
PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-17Use FD limit and file size limit from hostFabricio Voznika
FD limit and file size limit is read from the host, instead of using hard-coded defaults, given that they effect the sandbox process. Also limit the direct cache to use no more than half if the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-03-14Decouple filemem from platform and move it to pgalloc.MemoryFile.Jamie Liu
This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4
2019-03-12Make HandleLocal apply to all non-loopback interfaces.Ian Gudger
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify the implementations. This has the benefit of making HandleLocal apply even when the fdbased link endpoint isn't in use. In addition, move looping logic to route creation so that it doesn't need to be run for each packet. This should improve performance. PiperOrigin-RevId: 238099480 Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
2019-03-11Add profiling commands to runscFabricio Voznika
Example: runsc debug --root=<dir> \ --profile-heap=/tmp/heap.prof \ --profile-cpu=/tmp/cpu.prod --profile-delay=30 \ <container ID> PiperOrigin-RevId: 237848456 Change-Id: Icff3f20c1b157a84d0922599eaea327320dad773
2019-02-22Rename ping endpoints to icmp endpoints.Kevin Krakauer
PiperOrigin-RevId: 235248572 Change-Id: I5b0538b6feb365a98712c2a2d56d856fe80a8a09
2019-02-14Don't allow writing or reading to TTY unless process group is in foreground.Nicolas Lacasse
If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9
2019-01-31runsc: check whether a container is deleted or not before setupContainerFSAndrei Vagin
PiperOrigin-RevId: 231811387 Change-Id: Ib143fb9a4d0fa1f105d1a3a3bd533dfc44e792af
2019-01-18Scrub runsc error messagesFabricio Voznika
Removed "error" and "failed to" prefix that don't add value from messages. Adjusted a few other messages. In particular, when the container fail to start, the message returned is easier for humans to read: $ docker run --rm --runtime=runsc alpine foobar docker: Error response from daemon: OCI runtime start failed: <path> did not terminate sucessfully: starting container: starting root container [foobar]: starting sandbox: searching for executable "foobar", cwd: "/", $PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin": no such file or directory Closes #77 PiperOrigin-RevId: 230022798 Change-Id: I83339017c70dae09e4f9f8e0ea2e554c4d5d5cd1
2019-01-02Automated rollback of changelist 225089593Michael Pratt
PiperOrigin-RevId: 227595007 Change-Id: If14cc5aab869c5fd7a4ebd95929c887ab690e94c
2018-12-28Simplify synchronization between runsc and sandbox processFabricio Voznika
Make 'runsc create' join cgroup before creating sandbox process. This removes the need to synchronize platform creation and ensure that sandbox process is charged to the right cgroup from the start. PiperOrigin-RevId: 227166451 Change-Id: Ieb4b18e6ca0daf7b331dc897699ca419bc5ee3a2
2018-12-11Add "trace signal" optionMichael Pratt
This option is effectively equivalent to -panic-signal, except that the sandbox does not die after logging the traceback. PiperOrigin-RevId: 225089593 Change-Id: Ifb1c411210110b6104613f404334bd02175e484e
2018-11-13Internal change.Nicolas Lacasse
PiperOrigin-RevId: 221299066 Change-Id: I8ae352458f9976c329c6946b1efa843a3de0eaa4
2018-11-09Close donated files if containerManager.Start() failsFabricio Voznika
PiperOrigin-RevId: 220869535 Change-Id: I9917e5daf02499f7aab6e2aa4051c54ff4461b9a
2018-11-05Fix race between start and destroyFabricio Voznika
Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-05Log when external signal is receivedFabricio Voznika
PiperOrigin-RevId: 220204591 Change-Id: I21a9c6f5c12a376d18da5d10c1871837c4f49ad2
2018-11-01Make error messages a bit more user friendly.Ian Lewis
Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-10-29Removes outdated TODO.Kevin Krakauer
PiperOrigin-RevId: 219151173 Change-Id: I73014ea648ae485692ea0d44860c87f4365055cb
2018-10-19Use correct company name in copyright headerIan Gudger
PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-17runsc: Support job control signals for the root container.Nicolas Lacasse
Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17runsc: Add --pid flag to runsc kill.Kevin Krakauer
--pid allows specific processes to be signalled rather than the container root process or all processes in the container. containerd needs to SIGKILL exec'd processes that timeout and check whether processes are still alive. PiperOrigin-RevId: 217547636 Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-11Add bare bones unsupported syscall loggingFabricio Voznika
This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-10Removes irrelevant TODO.Kevin Krakauer
PiperOrigin-RevId: 216616873 Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a
2018-10-10Support for older Linux kernels without getrandomJonathan Giannuzzi
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578 PiperOrigin-RevId: 216591484
2018-10-10Add sandbox to cgroupFabricio Voznika
Sandbox creation uses the limits and reservations configured in the OCI spec and set cgroup options accordinly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, sandbox and gofer processes will join the cgroup but it will not modify the cgroup with spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019