summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot/loader.go
AgeCommit message (Collapse)Author
2018-11-13Internal change.Nicolas Lacasse
PiperOrigin-RevId: 221299066 Change-Id: I8ae352458f9976c329c6946b1efa843a3de0eaa4
2018-11-09Close donated files if containerManager.Start() failsFabricio Voznika
PiperOrigin-RevId: 220869535 Change-Id: I9917e5daf02499f7aab6e2aa4051c54ff4461b9a
2018-11-05Fix race between start and destroyFabricio Voznika
Before this change, a container starting up could race with destroy (aka delete) and leave processes behind. Now, whenever a container is created, Loader.processes gets a new entry. Start now expects the entry to be there, and if it's not it means that the container was deleted. I've also fixed Loader.waitPID to search for the process using the init process's PID namespace. We could use a few more tests for signal and wait. I'll send them in another cl. PiperOrigin-RevId: 220224290 Change-Id: I15146079f69904dc07d43c3b66cc343a2dab4cc4
2018-11-05Log when external signal is receivedFabricio Voznika
PiperOrigin-RevId: 220204591 Change-Id: I21a9c6f5c12a376d18da5d10c1871837c4f49ad2
2018-11-01Make error messages a bit more user friendly.Ian Lewis
Updated error messages so that it doesn't print full Go struct representations when running a new container in a sandbox. For example, this occurs frequently when commands are not found when doing a 'kubectl exec'. PiperOrigin-RevId: 219729141 Change-Id: Ic3a7bc84cd7b2167f495d48a1da241d621d3ca09
2018-10-29Removes outdated TODO.Kevin Krakauer
PiperOrigin-RevId: 219151173 Change-Id: I73014ea648ae485692ea0d44860c87f4365055cb
2018-10-19Use correct company name in copyright headerIan Gudger
PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-17runsc: Support job control signals for the root container.Nicolas Lacasse
Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17runsc: Add --pid flag to runsc kill.Kevin Krakauer
--pid allows specific processes to be signalled rather than the container root process or all processes in the container. containerd needs to SIGKILL exec'd processes that timeout and check whether processes are still alive. PiperOrigin-RevId: 217547636 Change-Id: I2058ebb548b51c8eb748f5884fb88bad0b532e45
2018-10-11Add bare bones unsupported syscall loggingFabricio Voznika
This change introduces a new flags to create/run called --user-log. Logs to this files are visible to users and are meant to help debugging problems with their images and containers. For now only unsupported syscalls are sent to this log, and only minimum support was added. We can build more infrastructure around it as needed. PiperOrigin-RevId: 216735977 Change-Id: I54427ca194604991c407d49943ab3680470de2d0
2018-10-10Removes irrelevant TODO.Kevin Krakauer
PiperOrigin-RevId: 216616873 Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a
2018-10-10Support for older Linux kernels without getrandomJonathan Giannuzzi
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578 PiperOrigin-RevId: 216591484
2018-10-10Add sandbox to cgroupFabricio Voznika
Sandbox creation uses the limits and reservations configured in the OCI spec and set cgroup options accordinly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, sandbox and gofer processes will join the cgroup but it will not modify the cgroup with spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-09Add new netstack metrics to the sentryIan Gudger
PiperOrigin-RevId: 216431260 Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
2018-10-08Job control signals must be sent to all processes in the FG process group.Nicolas Lacasse
We were previously only sending to the originator of the process group. Integration test was changed to test this behavior. It fails without the corresponding code change. PiperOrigin-RevId: 216297263 Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
2018-10-08Uncapitalize errorMichael Pratt
PiperOrigin-RevId: 216281263 Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
2018-10-03runsc: Pass root container's stdio via FD.Nicolas Lacasse
We were previously using the sandbox process's stdio as the root container's stdio. This makes it difficult/impossible to distinguish output application output from sandbox output, such as panics, which are always written to stderr. Also close the console socket when we are done with it. PiperOrigin-RevId: 215585180 Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-01runsc: Support job control signals in "exec -it".Nicolas Lacasse
Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01Make multi-container the default mode for runscFabricio Voznika
And remove multicontainer option. PiperOrigin-RevId: 215236981 Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-09-28Make runsc kill and delete more conformant to the "spec"Fabricio Voznika
PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-27Merge Loader.containerRootTGs and execProcess into a single mapFabricio Voznika
It's easier to manage a single map with processes that we're interested to track. This will make the next change to clean up the map on destroy easier. PiperOrigin-RevId: 214894210 Change-Id: I099247323a0487cd0767120df47ba786fac0926d
2018-09-27Implement 'runsc kill --all'Fabricio Voznika
In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27Refactor 'runsc boot' to take container ID as argumentFabricio Voznika
This makes the flow slightly simpler (no need to call Loader.SetRootContainer). And this is required change to tag tasks with container ID inside the Sentry. PiperOrigin-RevId: 214795210 Change-Id: I6ff4af12e73bb07157f7058bb15fd5bb88760884
2018-09-20runsc: allow `runsc wait` on a container for multiple times.Lantao Liu
PiperOrigin-RevId: 213908919 Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-19runsc: Fix stdin/stdout/stderr in multi-container mode.Kevin Krakauer
The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19Add docker command line args support for --cpuset-cpus and --cpusLingfu
`docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json (runtime spec file). When nginx worker_connections is configured as auto, the worker is generated according to the number of CPUs. If the cgroup is already set on the host, but it is not displayed correctly in the sandbox, performance may be degraded. This patch can get cpus info from spec file and apply to sentry on bootup, so the /proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands rely on `/sys/devices/system/cpu/online` are also affected by this patch. e.g. --cpuset-cpus=2,3 -> cpu number:2 --cpuset-cpus=4-7 -> cpu number:4 --cpus=2.8 -> cpu number:3 --cpus=0.5 -> cpu number:1 Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316 PiperOrigin-RevId: 213685199
2018-09-18Automated rollback of changelist 213307171Kevin Krakauer
PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-17Remove memory usage static initFabricio Voznika
panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17runsc: Enable waiting on exited processes.Kevin Krakauer
This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17runsc: Fix stdin/out/err in multi-container mode.Kevin Krakauer
Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-13runsc: Support container signal/wait.Lantao Liu
This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-11platform: Pass device fd into platform constructor.Nicolas Lacasse
We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-07Only start signal forwarding after init process is createdFabricio Voznika
PiperOrigin-RevId: 212028121 Change-Id: If9c2c62f3be103e2bb556b8d154c169888e34369
2018-09-07Remove '--file-access=direct' optionFabricio Voznika
It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07Use root abstract socket namespace for execFabricio Voznika
PiperOrigin-RevId: 211999211 Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-05runsc: Support runsc kill multi-container.Kevin Krakauer
Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05runsc: Promote getExecutablePathInternal to getExecutablePath.Nicolas Lacasse
Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-08-28Add argument checks to seccompFabricio Voznika
This is required to increase protection when running in GKE. PiperOrigin-RevId: 210635123 Change-Id: Iaaa8be49e73f7a3a90805313885e75894416f0b5
2018-08-27Add command-line parameter to trigger panic on signalFabricio Voznika
This is to troubleshoot problems with a hung process that is not responding to 'runsc debug --stack' command. PiperOrigin-RevId: 210483513 Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27Put fsgofer inside chrootFabricio Voznika
Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-24runsc: Terminal support for "docker exec -ti".Nicolas Lacasse
This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24runsc: Allow runsc to properly search the PATH for executable name.Kevin Krakauer
Previously, runsc improperly attempted to find an executable in the container's PATH. We now search the PATH via the container's fsgofer rather than the host FS, eliminating the confusing differences between paths on the host and within a container. PiperOrigin-RevId: 210159488 Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2
2018-08-15runsc fsgofer: Support dynamic serving of filesystems.Kevin Krakauer
When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-08Enable SACK in runscFabricio Voznika
SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b
2018-08-01Move stack clock to options structIan Gudger
PiperOrigin-RevId: 207039273 Change-Id: Ib8f55a6dc302052ab4a10ccd70b07f0d73b373df
2018-07-18Moved restore code out of create and made to be called after create.Justine Olshan
Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-13runsc: Fix map access race in boot.Loader.waitContainer.Nicolas Lacasse
PiperOrigin-RevId: 204522004 Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3
2018-07-12runsc: Don't close the control server in a defer.Nicolas Lacasse
Closing the control server will block until all open requests have completed. If a control server method panics, we end up stuck because the defer'd Destroy function will never return. PiperOrigin-RevId: 204354676 Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-07-03Fix runsc VDSO mappingMichael Pratt
80bdf8a4068de3ac4a73b6b61a0cdcfe3e3571af accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
2018-06-29Sets the restore environment for restoring a container.Justine Olshan
Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c