summaryrefslogtreecommitdiffhomepage
path: root/runsc
AgeCommit message (Collapse)Author
2018-09-26runsc: fix pid file race condition in exec detach mode.Lantao Liu
PiperOrigin-RevId: 214700295 Change-Id: I73d8490572eebe5da584af91914650d1953aeb91
2018-09-24runsc: All non-root bind mounts should be shared.Nicolas Lacasse
This CL changes the semantics of the "--file-access" flag so that it only affects the root filesystem. The default remains "exclusive" which is the common use case, as neither Docker nor K8s supports sharing the root. Keeping the root fs as "exclusive" means that the fs-intensive work done during application startup will mostly be cacheable, and thus faster. Non-root bind mounts will always be shared. This CL also removes some redundant FSAccessType validations. We validate this flag in main(), so we can assume it is valid afterwards. PiperOrigin-RevId: 214359936 Change-Id: I7e75d7bf52dbd7fa834d0aacd4034868314f3b51
2018-09-21Run gofmt -s on everythingIan Gudger
PiperOrigin-RevId: 214040901 Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21The "action" in container.Signal should be "signal".Nicolas Lacasse
PiperOrigin-RevId: 214038776 Change-Id: I4ad212540ec4ef4fb5ab5fdcb7f0865c4f746895
2018-09-21runsc: Synchronize container metadata changes with a file lock.Nicolas Lacasse
Each container has associated metadata (particularly the container status) that is manipulated by various runsc commands. This metadata is stored in a file identified by the container id. Different runsc processes may manipulate the same container metadata, and each will read/write to the metadata file. This CL adds a file lock per container which must be held when reading the container metadata file, and when modifying and writing the container metadata. PiperOrigin-RevId: 214019179 Change-Id: Ice4390ad233bc7f216c9a9a6cf05fb456c9ec0ad
2018-09-20Set Sandbox.Chroot so it gets cleaned up upon destructionFabricio Voznika
I've made several attempts to create a test, but the lack of permission from the test user makes it nearly impossible to test anything useful. PiperOrigin-RevId: 213922174 Change-Id: I5b502ca70cb7a6645f8836f028fb203354b4c625
2018-09-20runsc: allow `runsc wait` on a container for multiple times.Lantao Liu
PiperOrigin-RevId: 213908919 Change-Id: I74eff99a5360bb03511b946f4cb5658bb5fc40c7
2018-09-20Wait for all async fs operations to complete before returning from Destroy.Nicolas Lacasse
Destroy flushes dirent references, which triggers many async close operations. We must wait for those to finish before returning from Destroy, otherwise we may kill the gofer, causing a cascade of failing RPCs and leading to an inconsistent FS state. PiperOrigin-RevId: 213884637 Change-Id: Id054b47fc0f97adc5e596d747c08d3b97a1d1f71
2018-09-20runsc: Fix a bug that `runsc wait` doesn't work after container exits.Lantao Liu
PiperOrigin-RevId: 213849165 Change-Id: I5120b2f568850c0c42a08e8706e7f8653ef1bd94
2018-09-19runsc: Fix stdin/stdout/stderr in multi-container mode.Kevin Krakauer
The issue with the previous change was that the stdin/stdout/stderr passed to the sentry were dup'd by host.ImportFile. This left a dangling FD that by never closing caused containerd to timeout waiting on container stop. PiperOrigin-RevId: 213753032 Change-Id: Ia5e4c0565c42c8610d3b59f65599a5643b0901e4
2018-09-19Add container.Destroy urpc method.Nicolas Lacasse
This method will: 1. Stop the container process if it is still running. 2. Unmount all sanadbox-internal mounts for the container. 3. Delete the contaner root directory inside the sandbox. Destroy is idempotent, and safe to call concurrantly. This fixes a bug where after stopping a container, we cannot unmount the container root directory on the host. This bug occured because the sandbox dirent cache was holding a dirent with a host fd corresponding to a file inside the container root on the host. The dirent cache did not know that the container had exited, and kept the FD open, preventing us from unmounting on the host. Now that we unmount (and flush) all container mounts inside the sandbox, any host FDs donated by the gofer will be closed, and we can unmount the container root on the host. PiperOrigin-RevId: 213737693 Change-Id: I28c0ff4cd19a08014cdd72fec5154497e92aacc9
2018-09-19runsc: Mark container_test flaky.Kevin Krakauer
PiperOrigin-RevId: 213732520 Change-Id: Ife292987ec8b1de4c2e7e3b7d4452b00c1582e91
2018-09-19Fix sandbox and gofer capabilitiesFabricio Voznika
Capabilities.Set() adds capabilities, but doesn't remove existing ones that might have been loaded. Fixed the code and added tests. PiperOrigin-RevId: 213726369 Change-Id: Id7fa6fce53abf26c29b13b9157bb4c6616986fba
2018-09-19runsc: Don't create __runsc_containers__ unless we are in multi-container mode.Nicolas Lacasse
PiperOrigin-RevId: 213715511 Change-Id: I3e41b583c6138edbdeba036dfb9df4864134fc12
2018-09-19Add docker command line args support for --cpuset-cpus and --cpusLingfu
`docker run --cpuset-cpus=/--cpus=` will generate cpu resource info in config.json (runtime spec file). When nginx worker_connections is configured as auto, the worker is generated according to the number of CPUs. If the cgroup is already set on the host, but it is not displayed correctly in the sandbox, performance may be degraded. This patch can get cpus info from spec file and apply to sentry on bootup, so the /proc/cpuinfo can show the correct cpu numbers. `lscpu` and other commands rely on `/sys/devices/system/cpu/online` are also affected by this patch. e.g. --cpuset-cpus=2,3 -> cpu number:2 --cpuset-cpus=4-7 -> cpu number:4 --cpus=2.8 -> cpu number:3 --cpus=0.5 -> cpu number:1 Change-Id: Ideb22e125758d4322a12be7c51795f8018e3d316 PiperOrigin-RevId: 213685199
2018-09-18Added state machine checks for Container.StatusFabricio Voznika
For my own sanitity when thinking about possible transitions and state. PiperOrigin-RevId: 213559482 Change-Id: I25588c86cf6098be4eda01f4e7321c102ceef33c
2018-09-18Handle children processes better in testsFabricio Voznika
Reap children more systematically in container tests. Previously, container_test was taking ~5 mins to run because constainer.Destroy() would timeout waiting for the sandbox process to exit. Now the test running in less than a minute. Also made the contract around Container and Sandbox destroy clearer. PiperOrigin-RevId: 213527471 Change-Id: Icca84ee1212bbdcb62bdfc9cc7b71b12c6d1688d
2018-09-18Automated rollback of changelist 213307171Kevin Krakauer
PiperOrigin-RevId: 213504354 Change-Id: Iadd42f0ca4b7e7a9eae780bee9900c7233fb4f3f
2018-09-17Remove memory usage static initFabricio Voznika
panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17Rename container in testFabricio Voznika
's' used to stand for sandbox, before container exited. PiperOrigin-RevId: 213390641 Change-Id: I7bda94a50398c46721baa92227e32a7a1d817412
2018-09-17runsc: Enable waiting on exited processes.Kevin Krakauer
This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17runsc: Fix stdin/out/err in multi-container mode.Kevin Krakauer
Stdin/out/err weren't being sent to the sentry. PiperOrigin-RevId: 213307171 Change-Id: Ie4b634a58b1b69aa934ce8597e5cc7a47a2bcda2
2018-09-13runsc: Support container signal/wait.Lantao Liu
This CL: 1) Fix `runsc wait`, it now also works after the container exits; 2) Generate correct container state in Load; 2) Make sure `Destory` cleanup everything before successfully return. PiperOrigin-RevId: 212900107 Change-Id: Ie129cbb9d74f8151a18364f1fc0b2603eac4109a
2018-09-12runsc: Add exec flag that specifies where to save the sandbox-internal pid.Kevin Krakauer
This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12Remove getdents from filtersMichael Pratt
It was only used by whitelistfs, which was removed in bc81f3fe4a042a15343d2eab44da32d818ac1ade. PiperOrigin-RevId: 212666374 Change-Id: Ia35e6dc9d68c1a3b015d5b5f71ea3e68e46c5bed
2018-09-11Rollback of changelist 212483372Michael Pratt
PiperOrigin-RevId: 212557844 Change-Id: I414de848e75d57ecee2c05e851d05b607db4aa57
2018-09-11platform: Pass device fd into platform constructor.Nicolas Lacasse
We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-11Allow fstatat back in syscall filtersFabricio Voznika
PiperOrigin-RevId: 212483372 Change-Id: If95f32a8e41126cf3dc8bd6c8b2fb0fcfefedc6d
2018-09-10runsc: Chmod all mounted files to 777 inside chroot.Nicolas Lacasse
Inside the chroot, we run as user nobody, so all mounted files and directories must be accessible to all users. PiperOrigin-RevId: 212284805 Change-Id: I705e0dbbf15e01e04e0c7f378a99daffe6866807
2018-09-07Automated rollback of changelist 212059579Nicolas Lacasse
PiperOrigin-RevId: 212069131 Change-Id: I01476f957bbf29d4ee5a3c11d59d4f863ba9f2df
2018-09-07Automated rollback of changelist 211992321Nicolas Lacasse
PiperOrigin-RevId: 212066419 Change-Id: Icded56e7e117bfd9b644e6541bddcd110460a9b8
2018-09-07runsc: Support multi-container exec.Nicolas Lacasse
We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07Disable test until we figure out what's brokenFabricio Voznika
PiperOrigin-RevId: 212059579 Change-Id: I052c2192d3483d7bd0fd2232ef2023a12da66446
2018-09-07Add additional sanity checks for walk.Adin Scannell
PiperOrigin-RevId: 212058684 Change-Id: I319709b9ffcfccb3231bac98df345d2a20eca24b
2018-09-07Only start signal forwarding after init process is createdFabricio Voznika
PiperOrigin-RevId: 212028121 Change-Id: If9c2c62f3be103e2bb556b8d154c169888e34369
2018-09-07Remove '--file-access=direct' optionFabricio Voznika
It was used before gofer was implemented and it's not supported anymore. BREAKING CHANGE: proxy-shared and proxy-exclusive options are now: shared and exclusive. PiperOrigin-RevId: 212017643 Change-Id: If029d4073fe60583e5ca25f98abb2953de0d78fd
2018-09-07Use root abstract socket namespace for execFabricio Voznika
PiperOrigin-RevId: 211999211 Change-Id: I5968dd1a8313d3e49bb6e6614e130107495de41d
2018-09-07runsc: Run sandbox process inside minimal chroot.Nicolas Lacasse
We construct a dir with the executable bind-mounted at /exe, and proc mounted at /proc. Runsc now executes the sandbox process inside this chroot, thus limiting access to the host filesystem. The mounts and chroot dir are removed when the sandbox is destroyed. Because this requires bind-mounts, we can only do the chroot if we have CAP_SYS_ADMIN. PiperOrigin-RevId: 211994001 Change-Id: Ia71c515e26085e0b69b833e71691830148bc70d1
2018-09-07runsc: Dup debug log file to stderr, so sentry panics don't get lost.Nicolas Lacasse
Docker and containerd do not expose runsc's stderr, so tracking down sentry panics can be painful. If we have a debug log file, we should send panics (and all stderr data) to the log file. PiperOrigin-RevId: 211992321 Change-Id: I5f0d2f45f35c110a38dab86bafc695aaba42f7a3
2018-09-06runsc: do not delete in paused state.Lantao Liu
PiperOrigin-RevId: 211835570 Change-Id: Ied7933732cad5bc60b762e9c964986cb49a8d9b9
2018-09-06Enable network for multi-containerFabricio Voznika
PiperOrigin-RevId: 211834411 Change-Id: I52311a6c5407f984e5069359d9444027084e4d2a
2018-09-06runsc testing: Move TestMultiContainerSignal to multi_container_test.Kevin Krakauer
PiperOrigin-RevId: 211831396 Change-Id: Id67f182cb43dccb696180ec967f5b96176f252e0
2018-09-05runsc: Support runsc kill multi-container.Kevin Krakauer
Now, we can kill individual containers rather than the entire sandbox. PiperOrigin-RevId: 211748106 Change-Id: Ic97e91db33d53782f838338c4a6d0aab7a313ead
2018-09-05Use container's capabilities in execFabricio Voznika
When no capabilities are specified in exec, use the container's capabilities to match runc's behavior. PiperOrigin-RevId: 211735186 Change-Id: Icd372ed64410c81144eae94f432dffc9fe3a86ce
2018-09-05Enabled bind mounts in sub-containersFabricio Voznika
With multi-gofers, bind mounts in sub-containers should just work. Removed restrictions and added test. There are also a few cleanups along the way, e.g. retry unmounting in case cleanup races with gofer teardown. PiperOrigin-RevId: 211699569 Change-Id: Ic0a69c29d7c31cd7e038909cc686c6ac98703374
2018-09-05Running container should have a valid sandboxFabricio Voznika
PiperOrigin-RevId: 211693868 Change-Id: Iea340dd78bf26ae6409c310b63c17cc611c2055f
2018-09-05Add MADVISE to fsgofer seccomp profileFabricio Voznika
PiperOrigin-RevId: 211686037 Change-Id: I0e776ca760b65ba100e495f471b6e811dbd6590a
2018-09-05Move multi-container test to a single fileFabricio Voznika
PiperOrigin-RevId: 211685288 Change-Id: I7872f2a83fcaaa54f385e6e567af6e72320c5aa0
2018-09-05runsc: Promote getExecutablePathInternal to getExecutablePath.Nicolas Lacasse
Remove GetExecutablePath (the non-internal version). This makes path handling more consistent between exec, root, and child containers. The new getExecutablePath now uses MountNamespace.FindInode, which is more robust than Walking the Dirent tree ourselves. This also removes the last use of lstat(2) in the sentry, so that can be removed from the filters. PiperOrigin-RevId: 211683110 Change-Id: Ic8ec960fc1c267aa7d310b8efe6e900c88a9207a
2018-09-04runsc: Run sandbox as user nobody.Nicolas Lacasse
When starting a sandbox without direct file or network access, we create an empty user namespace and run the sandbox in there. However, the root user in that namespace is still mapped to the root user in the parent namespace. This CL maps the "nobody" user from the parent namespace into the child namespace, and runs the sandbox process as user "nobody" inside the new namespace. PiperOrigin-RevId: 211572223 Change-Id: I1b1f9b1a86c0b4e7e5ca7bc93be7d4887678bab6