summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot
AgeCommit message (Collapse)Author
2018-08-14runsc: Change cache policy for root fs and volume mounts.Nicolas Lacasse
Previously, gofer filesystems were configured with the default "fscache" policy, which caches filesystem metadata and contents aggressively. While this setting is best for performance, it means that changes from inside the sandbox may not be immediately propagated outside the sandbox, and vice-versa. This CL changes volumes and the root fs configuration to use a new "remote-revalidate" cache policy which tries to retain as much caching as possible while still making fs changes visible across the sandbox boundary. This cache policy is enabled by default for the root filesystem. The default value for the "--file-access" flag is still "proxy", but the behavior is changed to use the new cache policy. A new value for the "--file-access" flag is added, called "proxy-exclusive", which turns on the previous aggressive caching behavior. As the name implies, this flag should be used when the sandbox has "exclusive" access to the filesystem. All volume mounts are configured to use the new cache policy, since it is safest and most likely to be correct. There is not currently a way to change this behavior, but it's possible to add such a mechanism in the future. The configurability is a smaller issue for volumes, since most of the expensive application fs operations (walking + stating files) will likely served by the root fs. PiperOrigin-RevId: 208735037 Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
2018-08-08Basic support for ip link/addr and ifconfigFabricio Voznika
Closes #94 PiperOrigin-RevId: 207997580 Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
2018-08-08Resend packets back to netstack if destined to itselfFabricio Voznika
Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar | nc -l -p 8080 & echo foo | nc 192.168.0.2 8080 PiperOrigin-RevId: 207995083 Change-Id: I17adc2a04df48bfea711011a5df206326a1fb8ef
2018-08-08Enable SACK in runscFabricio Voznika
SACK is disabled by default and needs to be manually enabled. It not only improves performance, but also fixes hangs downloading files from certain websites. PiperOrigin-RevId: 207906742 Change-Id: I4fb7277b67bfdf83ac8195f1b9c38265a0d51e8b
2018-08-01Move stack clock to options structIan Gudger
PiperOrigin-RevId: 207039273 Change-Id: Ib8f55a6dc302052ab4a10ccd70b07f0d73b373df
2018-07-31Drop dup2 filterMichael Pratt
It is unused. PiperOrigin-RevId: 206798328 Change-Id: I2d7d27c0e4a0ef51264b900f14f1b3fdad17f2c4
2018-07-18Moved restore code out of create and made to be called after create.Justine Olshan
Docker expects containers to be created before they are restored. However, gVisor restoring requires specificactions regarding the kernel and the file system. These actions were originally in booting the sandbox. Now setting up the file system is deferred until a call to a call to runsc start. In the restore case, the kernel is destroyed and a new kernel is created in the same process, as we need the same process for Docker. These changes required careful execution of concurrent processes which required the use of a channel. Full docker integration still needs the ability to restore into the same container. PiperOrigin-RevId: 205161441 Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
2018-07-13runsc: Fix map access race in boot.Loader.waitContainer.Nicolas Lacasse
PiperOrigin-RevId: 204522004 Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3
2018-07-12runsc: Don't close the control server in a defer.Nicolas Lacasse
Closing the control server will block until all open requests have completed. If a control server method panics, we end up stuck because the defer'd Destroy function will never return. PiperOrigin-RevId: 204354676 Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
2018-07-11Automated rollback of changelist 203157739Bhasker Hariharan
PiperOrigin-RevId: 204196916 Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16
2018-07-03Fix runsc VDSO mappingMichael Pratt
80bdf8a4068de3ac4a73b6b61a0cdcfe3e3571af accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
2018-07-03Skip overlay on root when its readonlyFabricio Voznika
PiperOrigin-RevId: 203161098 Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
2018-07-03Resend packets back to netstack if destined to itselfFabricio Voznika
Add option to redirect packet back to netstack if it's destined to itself. This fixes the problem where connecting to the local NIC address would not work, e.g.: echo bar | nc -l -p 8080 & echo foo | nc 192.168.0.2 8080 PiperOrigin-RevId: 203157739 Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4
2018-07-03runsc: Mount "mandatory" mounts right after mounting the root.Nicolas Lacasse
The /proc and /sys mounts are "mandatory" in the sense that they should be mounted in the sandbox even when they are not included in the spec. Runsc treats /tmp similarly, because it is faster to use the internal tmpfs implementation instead of proxying to the host. However, the spec may contain submounts of these mandatory mounts (particularly for /tmp). In those cases, we must mount our mandatory mounts before the submount, otherwise the submount will be masked. Since the mandatory mounts are all top-level directories, we can mount them right after the root. PiperOrigin-RevId: 203145635 Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf
2018-07-02runsc/boot/filter: permit SYS_TIME for raceDmitry Vyukov
glibc's malloc also uses SYS_TIME. Permit it. #0 0x0000000000de6267 in time () #1 0x0000000000db19d8 in get_nprocs () #2 0x0000000000d8a31a in arena_get2.part () #3 0x0000000000d8ab4a in malloc () #4 0x0000000000d3c6b5 in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 140737488355328ull, 0ul, __sanitizer::SizeClassMap<3ul, 4ul, 8ul, 17ul, 64ul, 14ul>, 20ul, __sanitizer::TwoLevelByteMap<32768ull, 4096ull, __sanitizer::NoOpMapUnmapCallback>, __sanitizer::NoOpMapUnmapCallback> >*, unsigned long) () #5 0x0000000000d4cd70 in __tsan_go_start () #6 0x00000000004617a3 in racecall () #7 0x00000000010f4ea0 in runtime.findfunctab () #8 0x000000000043f193 in runtime.racegostart () Signed-off-by: Dmitry Vyukov <dvyukov@google.com> [mpratt@google.com: updated comments and commit message] Signed-off-by: Michael Pratt <mpratt@google.com> Change-Id: Ibe2d0dc3035bf5052d5fb802cfaa37c5e0e7a09a PiperOrigin-RevId: 203042627
2018-07-02Make default limits the same as with runcFabricio Voznika
Closes #2 PiperOrigin-RevId: 202997196 Change-Id: I0c9f6f5a8a1abe1ae427bca5f590bdf9f82a6675
2018-06-29Sets the restore environment for restoring a container.Justine Olshan
Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-28runsc: Add the "wait" subcommand.Kevin Krakauer
Users can now call "runsc wait <container id>" to wait on a particular process inside the container. -pid can also be used to wait on a specific PID. Manually tested the wait subcommand for a single waiter and multiple waiters (simultaneously 2 processes waiting on the container and 2 processes waiting on a PID within the container). PiperOrigin-RevId: 202548978 Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
2018-06-28Error out if spec is invalidFabricio Voznika
Closes #66 PiperOrigin-RevId: 202496258 Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
2018-06-28Add option to configure watchdog actionFabricio Voznika
PiperOrigin-RevId: 202494747 Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f
2018-06-26runsc: set gofer umask to 0.Lantao Liu
PiperOrigin-RevId: 202185642 Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399
2018-06-25runsc: add a `multi-container` flag to enable multi-container support.Lantao Liu
PiperOrigin-RevId: 201995800 Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502
2018-06-25Fix lint errorsFabricio Voznika
PiperOrigin-RevId: 201978212 Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067
2018-06-22runsc: Enable waiting on individual containers within a sandbox.Kevin Krakauer
PiperOrigin-RevId: 201742160 Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31
2018-06-21Forward SIGUSR2 to the sandbox tooFabricio Voznika
SIGUSR2 was being masked out to be used as a way to dump sentry stacks. This could cause compatibility problems in cases anyone uses SIGUSR2 to communicate with the container init process. PiperOrigin-RevId: 201575374 Change-Id: I312246e828f38ad059139bb45b8addc2ed055d74
2018-06-21Added functionality to create a RestoreEnvironment.Justine Olshan
Before a container can be restored, the mounts must be configured. The root and submounts and their key information is compiled into a RestoreEnvironment. Future code will be added to set this created environment before restoring a container. Tests to ensure the correct environment were added. PiperOrigin-RevId: 201544637 Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087
2018-06-21runsc: Default umask should be 0.Nicolas Lacasse
PiperOrigin-RevId: 201539050 Change-Id: I36cbf270fa5ad25de507ecb919e4005eda6aa16d
2018-06-20Add 'runsc debug' commandFabricio Voznika
It prints sandbox stacks to the log to help debug stuckness. I expect that many more options will be added in the future. PiperOrigin-RevId: 201405931 Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e
2018-06-19runsc: Enable container creation within existing sandboxes.Kevin Krakauer
Containers are created as processes in the sandbox. Of the many things that don't work yet, the biggest issue is that the fsgofer is launched with its root as the sandbox's root directory. Thus, when a container is started and wants to read anything (including the init binary of the container), the gofer tries to serve from sandbox's root (which basically just has pause), not the container's. PiperOrigin-RevId: 201294560 Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1
2018-06-19runsc: Whitelist lstat, as it is now used in specutils.Kevin Krakauer
When running multi-container, child containers are added after the filters have been installed. Thus, lstat must be in the set of allowed syscalls. PiperOrigin-RevId: 201269550 Change-Id: I03f2e6675a53d462ed12a0f651c10049b76d4c52
2018-06-19Added a resume command to unpause a paused container.Justine Olshan
Resume checks the status of the container and unpauses the kernel if its status is paused. Otherwise nothing happens. Tests were added to ensure that the process is in the correct state after various commands. PiperOrigin-RevId: 201251234 Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19
2018-06-18Modified boot.go to allow for restores.Justine Olshan
A file descriptor was added as a flag to boot so a state file can restore a container that was checkpointed. PiperOrigin-RevId: 201068699 Change-Id: I18e96069488ffa3add468861397f3877725544aa
2018-06-18runsc: support "rw" mount option.Lantao Liu
PiperOrigin-RevId: 201018483 Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2
2018-06-15Added code for a pause command for a container process.Justine Olshan
Like runc, the pause command will pause the processes of the given container. It will set that container's status to "paused." A resume command will be be added to unpause and continue running the process. PiperOrigin-RevId: 200789624 Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea
2018-06-15runsc: support /dev bind mount which does not conflict with default /dev mount.Lantao Liu
PiperOrigin-RevId: 200768923 Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65
2018-06-15Set kernel.applicationCores to the number of processor on the hostFabricio Voznika
The right number to use is the number of processors assigned to the cgroup. But until we make the sandbox join the respective cgroup, just use the number of processors on the host. Closes #65, closes #66 PiperOrigin-RevId: 200725483 Change-Id: I34a566b1a872e26c66f56fa6e3100f42aaf802b1
2018-06-14Add nanosleep filter for Go 1.11 supportMichael Pratt
golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update the filters accordingly. PiperOrigin-RevId: 200574612 Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7
2018-06-13Fix failure to mount volume that sandbox process has no accessFabricio Voznika
Boot loader tries to stat mount to determine whether it's a file or not. This may file if the sandbox process doesn't have access to the file. Instead, add overlay on top of file, which is better anyway since we don't want to propagate changes to the host. PiperOrigin-RevId: 200411261 Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
2018-06-12runsc: do not include sub target if it is not started with '/'.Lantao Liu
PiperOrigin-RevId: 200274828 Change-Id: I956703217df08d8650a881479b7ade8f9f119912
2018-06-12Runsc checkpoint works.Brielle Broder
This is the first iteration of checkpoint that actually saves to a file. Tests for checkpoint are included. Ran into an issue when private unix sockets are enabled. An error message was added for this case and the mutex state was set. PiperOrigin-RevId: 200269470 Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
2018-06-12runsc: enable terminals in the sandbox.Kevin Krakauer
runsc now mounts the devpts filesystem, so you get a real terminal using ssh+sshd. PiperOrigin-RevId: 200244830 Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
2018-06-12Enable debug logging in testsFabricio Voznika
Unit tests call runsc directly now, so all command line arguments are valid. On the other hand, enabling debug in the test binary doesn't affect runsc. It needs to be set in the config. PiperOrigin-RevId: 200237706 Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
2018-06-08Drop capabilities not needed by GoferFabricio Voznika
PiperOrigin-RevId: 199808391 Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
2018-06-06Added a function to the controller to checkpoint a container.Googler
Functionality for checkpoint is not complete, more to come. PiperOrigin-RevId: 199500803 Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
2018-06-04Create destination mount dir if it doesn't existFabricio Voznika
PiperOrigin-RevId: 199175296 Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
2018-06-01Add SyscallRules that supports argument filteringZhengyu He
PiperOrigin-RevId: 198919043 Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf
2018-05-24Configure sandbox as superuserFabricio Voznika
Container user might not have enough priviledge to walk directories and mount filesystems. Instead, create superuser to perform these steps of the configuration. PiperOrigin-RevId: 197953667 Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
2018-05-17Implement sysv shm.Rahat Mahmood
PiperOrigin-RevId: 197058289 Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
2018-05-17Push signal-delivery and wait into the sandbox.Nicolas Lacasse
This is another step towards multi-container support. Previously, we delivered signals directly to the sandbox process (which then forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a container by waiting on the sandbox process itself. This approach will not work when there are multiple containers inside the sandbox, and we need to signal/wait on individual containers. This CL adds two new messages, ContainerSignal and ContainerWait. These messages include the id of the container to signal/wait. The controller inside the sandbox receives these messages and signals/waits on the appropriate process inside the sandbox. The container id is plumbed into the sandbox, but it currently is not used. We still end up signaling/waiting on PID 1 in all cases. Once we actually have multiple containers inside the sandbox, we will need to keep some sort of map of container id -> pid (or possibly pid namespace), and signal/kill the appropriate process for the container. PiperOrigin-RevId: 197028366 Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
2018-05-15Refactor the Sandbox package into Sandbox + Container.Nicolas Lacasse
This is a necessary prerequisite for supporting multiple containers in a single sandbox. All the commands (in cmd package) now call operations on Containers (container package). When a Container first starts, it will create a Sandbox with the same ID. The Sandbox class is now simpler, as it only knows how to create boot/gofer processes, and how to forward commands into the running boot process. There are TODOs sprinkled around for additional support for multiple containers. Most notably, we need to detect when a container is intended to run in an existing sandbox (by reading the metadata), and then have some way to signal to the sandbox to start a new container. Other urpc calls into the sandbox need to pass the container ID, so the sandbox can run the operation on the given container. These are only half-plummed through right now. PiperOrigin-RevId: 196688269 Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0