summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot
AgeCommit message (Collapse)Author
2018-09-17runsc: Fix stdin/out/err in multi-container mode.Kevin Krakauer
Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-13runsc: Support container signal/wait.Lantao Liu
This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-12runsc: Add exec flag that specifies where to save the sandbox-internal pid.Kevin Krakauer
This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12Remove getdents from filtersMichael Pratt
It was only used by whitelistfs, which was removed in bc81f3fe4a042a15343d2eab44da32d818ac1ade. PiperOrigin-RevId: 212666374 Change-Id: Ia35e6dc9d68c1a3b015d5b5f71ea3e68e46c5bed
2018-09-11Rollback of changelist 212483372Michael Pratt
PiperOrigin-RevId: 212557844 Change-Id: I414de848e75d57ecee2c05e851d05b607db4aa57
2018-09-11platform: Pass device fd into platform constructor.Nicolas Lacasse
We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-11Allow fstatat back in syscall filtersFabricio Voznika
PiperOrigin-RevId: 212483372 Change-Id: If95f32a8e41126cf3dc8bd6c8b2fb0fcfefedc6d
2018-09-07runsc: Support multi-container exec.Nicolas Lacasse
We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07Only start signal forwarding after init process is createdFabricio Voznika
PiperOrigin-RevId: 212028121 Change-Id: If9c2c62f3be103e2bb556b8d154c169888e34369
2018-09-07Remove '--file-access=direct' optionFabricio Voznika
It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07Use root abstract socket namespace for execFabricio Voznika
PiperOrigin-RevId: 211999211 Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-07runsc: Run sandbox process inside minimal chroot.Nicolas Lacasse
We construct a dir with the executable bind-mounted at /exe, and proc mounted at /proc. Runsc now executes the sandbox process inside this chroot, thus limiting access to the host filesystem. The mounts and chroot dir are removed when the sandbox is destroyed. Because this requires bind-mounts, we can only do the chroot if we have CAP_SYS_ADMIN. PiperOrigin-RevId: 211994001 Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-05runsc: Support runsc kill multi-container.Kevin Krakauer
Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05Enabled bind mounts in sub-containersFabricio Voznika
With multi-gofers, bind mounts in sub-containers should just work. Removed restrictions and added test. There are also a few cleanups along the way, e.g. retry unmounting in case cleanup races with gofer teardown. PiperOrigin-RevId: 211699569 Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05runsc: Promote getExecutablePathInternal to getExecutablePath.Nicolas Lacasse
Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-04runsc: Run sandbox as user nobody.Nicolas Lacasse
When starting a sandbox without direct file or network access, we create an empty user namespace and run the sandbox in there. However, the root user in that namespace is still mapped to the root user in the parent namespace. This CL maps the "nobody" user from the parent namespace into the child namespace, and runs the sandbox process as user "nobody" inside the new namespace. PiperOrigin-RevId: 211572223 Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6
2018-09-04runsc: Pass log and config files to sandbox process by FD.Nicolas Lacasse
This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. The specutils.ReadSpecFromFile method was fixed to always seek to the beginning of the file before reading. This allows Files from the same FD to be read multiple times, as we do in the boot command when the apply-caps flag is set. Tested with --network=host. PiperOrigin-RevId: 211570647 Change-Id: I685be0a290aa7f70731ebdce82ebc0ebcc9d475c
2018-09-04Remove epoll_wait from filtersMichael Pratt
Go 1.11 replaced it with epoll_pwait. PiperOrigin-RevId: 211510006 Change-Id: I48a6cae95ed3d57a4633895358ad05ad8bf2f633
2018-08-31Automated rollback of changelist 210995199Fabricio Voznika
PiperOrigin-RevId: 211116429 Change-Id: I446d149c822177dc9fc3c64ce5e455f7f029aa82
2018-08-30runsc: Pass log and config files to sandbox process by FD.Nicolas Lacasse
This is a prereq for running the sandbox process as user "nobody", when it may not have permissions to open these files. Instead, we must open then before starting the sandbox process, and pass them by FD. PiperOrigin-RevId: 210995199 Change-Id: I715875a9553290b4a49394a8fcd93be78b1933dd
2018-08-28Add argument checks to seccompFabricio Voznika
This is required to increase protection when running in GKE. PiperOrigin-RevId: 210635123 Change-Id: Iaaa8be49e73f7a3a90805313885e75894416f0b5
2018-08-28Drop support for Go 1.10Michael Pratt
PiperOrigin-RevId: 210589588 Change-Id: Iba898bc3eb8f13e17c668ceea6dc820fc8180a70
2018-08-27Add command-line parameter to trigger panic on signalFabricio Voznika
This is to troubleshoot problems with a hung process that is not responding to 'runsc debug --stack' command. PiperOrigin-RevId: 210483513 Change-Id: I4377b210b4e51bc8a281ad34fd94f3df13d9187d
2018-08-27Put fsgofer inside chrootFabricio Voznika
Now each container gets its own dedicated gofer that is chroot'd to the rootfs path. This is done to add an extra layer of security in case the gofer gets compromised. PiperOrigin-RevId: 210396476 Change-Id: Iba21360a59dfe90875d61000db103f8609157ca0
2018-08-24runsc: Terminal support for "docker exec -ti".Nicolas Lacasse
This CL adds terminal support for "docker exec". We previously only supported consoles for the container process, but not exec processes. The SYS_IOCTL syscall was added to the default seccomp filter list, but only for ioctls that get/set winsize and termios structs. We need to allow these ioctl for all containers because it's possible to run "exec -ti" on a container that was started without an attached console, after the filters have been installed. Note that control-character signals are still not properly supported. Tested with: $ docker run --runtime=runsc -it alpine In another terminial: $ docker exec -it <containerid> /bin/sh PiperOrigin-RevId: 210185456 Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
2018-08-24runsc: Allow runsc to properly search the PATH for executable name.Kevin Krakauer
Previously, runsc improperly attempted to find an executable in the container's PATH. We now search the PATH via the container's fsgofer rather than the host FS, eliminating the confusing differences between paths on the host and within a container. PiperOrigin-RevId: 210159488 Change-Id: I228174dbebc4c5356599036d6efaa59f28ff28d2
2018-08-23Clean up syscall filtersFabricio Voznika
Removed syscalls that are only used by whitelistfs which has its own set of filters. PiperOrigin-RevId: 209967259 Change-Id: Idb2e1b9d0201043d7cd25d96894f354729dbd089
2018-08-15runsc fsgofer: Support dynamic serving of filesystems.Kevin Krakauer
When multiple containers run inside a sentry, each container has its own root filesystem and set of mounts. Containers are also added after sentry boot rather than all configured and known at boot time. The fsgofer needs to be able to serve the root filesystem of each container. Thus, it must be possible to add filesystems after the fsgofer has already started. This change: * Creates a URPC endpoint within the gofer process that listens for requests to serve new content. * Enables the sentry, when starting a new container, to add the new container's filesystem. * Mounts those new filesystems at separate roots within the sentry. PiperOrigin-RevId: 208903248 Change-Id: Ifa91ec9c8caf5f2f0a9eead83c4a57090ce92068
2018-08-15runsc: Fix instances of file access "proxy".Nicolas Lacasse
This file access type is actually called "proxy-shared", but I forgot to update all locations. PiperOrigin-RevId: 208832491 Change-Id: I7848bc4ec2478f86cf2de1dcd1bfb5264c6276de
2018-08-14runsc: Change cache policy for root fs and volume mounts.Nicolas Lacasse
Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-08Basic support for ip link/addr and ifconfigFabricio Voznika
Closes #94 PiperOrigin-RevId: 207997580 Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
2018-08-08Resend packets back to netstack if destined to itselfFabricio Voznika
Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar | nc -l -p 8080 & echo foo | nc 192.168.0.2 8080 PiperOrigin-RevId: 207995083 Change-Id: I17adc2a04df48bfea711011a5df206326a1fb8ef
2018-08-08Enable SACK in runscFabricio Voznika
SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b
2018-08-01Move stack clock to options structIan Gudger
PiperOrigin-RevId: 207039273 Change-Id: Ib8f55a6dc302052ab4a10ccd70b07f0d73b373df
2018-07-31Drop dup2 filterMichael Pratt
It is unused. PiperOrigin-RevId: 206798328 Change-Id: I2d7d27c0e4a0ef51264b900f14f1b3fdad17f2c4
2018-07-18Moved restore code out of create and made to be called after create.Justine Olshan
Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-13runsc: Fix map access race in boot.Loader.waitContainer.Nicolas Lacasse
PiperOrigin-RevId: 204522004 Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3
2018-07-12runsc: Don't close the control server in a defer.Nicolas Lacasse
Closing the control server will block until all open requests have completed. If a control server method panics, we end up stuck because the defer'd Destroy function will never return. PiperOrigin-RevId: 204354676 Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-07-11Automated rollback of changelist 203157739Bhasker Hariharan
PiperOrigin-RevId: 204196916 Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16
2018-07-03Fix runsc VDSO mappingMichael Pratt
80bdf8a4068de3ac4a73b6b61a0cdcfe3e3571af accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
2018-07-03Skip overlay on root when its readonlyFabricio Voznika
PiperOrigin-RevId: 203161098 Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03Resend packets back to netstack if destined to itselfFabricio Voznika
Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar | nc -l -p 8080 & echo foo | nc 192.168.0.2 8080 PiperOrigin-RevId: 203157739 Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4
2018-07-03runsc: Mount "mandatory" mounts right after mounting the root.Nicolas Lacasse
The /proc and /sys mounts are "mandatory" in the sense that they should be mounted in the sandbox even when they are not included in the spec. Runsc treats /tmp similarly, because it is faster to use the internal tmpfs implementation instead of proxying to the host. However, the spec may contain submounts of these mandatory mounts (particularly for /tmp). In those cases, we must mount our mandatory mounts before the submount, otherwise the submount will be masked. Since the mandatory mounts are all top-level directories, we can mount them right after the root. PiperOrigin-RevId: 203145635 Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf
2018-07-02runsc/boot/filter: permit SYS_TIME for raceDmitry Vyukov
glibc's malloc also uses SYS_TIME. Permit it. #0 0x0000000000de6267 in time () #1 0x0000000000db19d8 in get_nprocs () #2 0x0000000000d8a31a in arena_get2.part () #3 0x0000000000d8ab4a in malloc () #4 0x0000000000d3c6b5 in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 140737488355328ull, 0ul, __sanitizer::SizeClassMap<3ul, 4ul, 8ul, 17ul, 64ul, 14ul>, 20ul, __sanitizer::TwoLevelByteMap<32768ull, 4096ull, __sanitizer::NoOpMapUnmapCallback>, __sanitizer::NoOpMapUnmapCallback> >*, unsigned long) () #5 0x0000000000d4cd70 in __tsan_go_start () #6 0x00000000004617a3 in racecall () #7 0x00000000010f4ea0 in runtime.findfunctab () #8 0x000000000043f193 in runtime.racegostart () Signed-off-by: Dmitry Vyukov <dvyukov@google.com> [mpratt@google.com: updated comments and commit message] Signed-off-by: Michael Pratt <mpratt@google.com> Change-Id: Ibe2d0dc3035bf5052d5fb802cfaa37c5e0e7a09a PiperOrigin-RevId: 203042627
2018-07-02Make default limits the same as with runcFabricio Voznika
Closes #2 PiperOrigin-RevId: 202997196 Change-Id: I0c9f6f5a8a1abe1ae427bca5f590bdf9f82a6675
2018-06-29Sets the restore environment for restoring a container.Justine Olshan
Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-28runsc: Add the "wait" subcommand.Kevin Krakauer
Users can now call "runsc wait <container id>" to wait on a particular process inside the container. -pid can also be used to wait on a specific PID. Manually tested the wait subcommand for a single waiter and multiple waiters (simultaneously 2 processes waiting on the container and 2 processes waiting on a PID within the container). PiperOrigin-RevId: 202548978 Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28Error out if spec is invalidFabricio Voznika
Closes #66 PiperOrigin-RevId: 202496258 Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-28Add option to configure watchdog actionFabricio Voznika
PiperOrigin-RevId: 202494747 Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f
2018-06-26runsc: set gofer umask to 0.Lantao Liu
PiperOrigin-RevId: 202185642 Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399