summaryrefslogtreecommitdiffhomepage
path: root/runsc/boot/fs.go
AgeCommit message (Collapse)Author
2021-06-09Remove --overlayfs-stale-read flagFabricio Voznika
It defaults to true and setting it to false can cause filesytem corruption. PiperOrigin-RevId: 378518663
2021-05-04Make Mount.Type optional for bind mountsFabricio Voznika
According to the OCI spec Mount.Type is an optional field and it defaults to "bind" when any of "bind" or "rbind" is included in Mount.Options. Also fix the shim to remove bind/rbind from options when mount is converted from bind to tmpfs inside the Sentry. Fixes #2330 Fixes #3274 PiperOrigin-RevId: 371996891
2021-04-02Implement cgroupfs.Rahat Mahmood
A skeleton implementation of cgroupfs. It supports trivial cpu and memory controllers with no support for hierarchies. PiperOrigin-RevId: 366561126
2021-03-30Fix panic when overriding /dev files with VFS2Fabricio Voznika
VFS1 skips over mounts that overrides files in /dev because the list of files is hardcoded. This is not needed for VFS2 and a recent change lifted this restriction. However, parts of the code were still skipping /dev mounts even in VFS2, causing the loader to panic when it ran short of FDs to connect to the gofer. PiperOrigin-RevId: 365858436
2021-03-23Add --file-access-mounts flagFabricio Voznika
--file-access-mounts flag is similar to --file-access, but controls non-root mounts that were previously mounted in shared mode only. This gives more flexibility to control how mounts are shared within a container. PiperOrigin-RevId: 364669882
2021-03-18Skip /dev submount hack on VFS2.Jamie Liu
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.: ``` "mounts": [ ... { "destination": "/dev", "type": "tmpfs", "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs", "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ] }, ... { "destination": "/dev/shm", "type": "tmpfs", "source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm", "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=67108864" ] }, ... ``` (This is mostly consistent with how Linux is usually configured, except that /dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer implements OCI-runtime-spec-undocumented behavior to create /dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently switches /dev to devtmpfs. In VFS1, this is necessary to get device files like /dev/null at all, since VFS1 doesn't support real device special files, only what is hardcoded in devfs. VFS2 does support device special files, but using devtmpfs is the easiest way to get pre-created files in /dev.) runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1, this appears to be to avoid introducing a submount overlay for /dev, and is mostly fine since the typical mode for the /dev/shm mount is ~consistent with the mode of the /dev/shm directory provided by devfs (modulo the sticky bit). In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs' /dev/shm mode is correct for the mount point but not the mount. So turn off this behavior for VFS2. After this change: ``` $ docker run --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwt 2 root root 40 Mar 18 00:16 . drwxr-xr-x 5 root root 360 Mar 18 00:16 .. $ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwx 1 root root 0 Mar 18 00:16 . dr-xr-xr-x 1 root root 0 Mar 18 00:16 .. $ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm total 0 drwxrwxrwt 2 root root 40 Mar 18 00:16 . drwxr-xr-x 5 root root 320 Mar 18 00:16 .. ``` Fixes #5687 PiperOrigin-RevId: 363699385
2021-03-06[op] Replace syscall package usage with golang.org/x/sys/unix in runsc/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in some places because the following don't seem to have an equivalent in unix package: - syscall.SysProcIDMap - syscall.Credential Updates #214 PiperOrigin-RevId: 361381490
2020-11-17Add support for TTY in multi-containerFabricio Voznika
Fixes #2714 PiperOrigin-RevId: 342950412
2020-10-27Merge pull request #4420 from workato:dev-optionsgVisor bot
PiperOrigin-RevId: 339363816
2020-10-26Allow overriding mount options for /dev and /dev/ptsKonstantin Baranov
This is useful to optionally set /dev ro,noexec. Treat /dev and /dev/pts the same as /proc and /sys. Make sure the Type is right though. Many config.json snippets on the Internet suggest /dev is tmpfs, not devtmpfs.
2020-09-08Honor readonly flag for root mountFabricio Voznika
Updates #1487 PiperOrigin-RevId: 330580699
2020-09-04Simplify FD handling for container start/execFabricio Voznika
VFS1 and VFS2 host FDs have different dupping behavior, making error prone to code for both. Change the contract so that FDs are released as they are used, so the caller can simple defer a block that closes all remaining files. This also addresses handling of partial failures. With this fix, more VFS2 tests can be enabled. Updates #1487 PiperOrigin-RevId: 330112266
2020-09-01Refactor tty codebase to use master-replica terminology.Ayush Ranjan
Updates #2972 PiperOrigin-RevId: 329584905
2020-08-19Move boot.Config to its own packageFabricio Voznika
Updates #3494 PiperOrigin-RevId: 327548511
2020-08-03Plumbing context.Context to DecRef() and Release().Nayana Bidari
context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-08Add shared mount hints to VFS2Fabricio Voznika
Container restart test is disabled for VFS2 for now. Updates #1487 PiperOrigin-RevId: 320296401
2020-06-08Combine executable lookup codeFabricio Voznika
Run vs. exec, VFS1 vs. VFS2 were executable lookup were slightly different from each other. Combine them all into the same logic. PiperOrigin-RevId: 315426443
2020-06-01More runsc changes for VFS2Fabricio Voznika
- Add /tmp handling - Apply mount options - Enable more container_test tests - Forward signals to child process when test respaws process to run as root inside namespace. Updates #1487 PiperOrigin-RevId: 314263281
2020-05-29Refactor the ResolveExecutablePath logic.Nicolas Lacasse
PiperOrigin-RevId: 313871804
2020-05-28Automated rollback of changelist 309082540Fabricio Voznika
PiperOrigin-RevId: 313636920
2020-05-13Fix runsc association of gofers and FDs on VFS2.Jamie Liu
Updates #1487 PiperOrigin-RevId: 311443628
2020-05-13Use VFS2 mount namesFabricio Voznika
Updates #1487 PiperOrigin-RevId: 311356385
2020-04-26refactor and add test for bindmountmoricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-25add bind/rbind options for mountmoricho
Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-25fix behavior of `getMountNameAndOptions` when options include either bind or ↵moricho
rbind Signed-off-by: moricho <ikeda.morito@gmail.com>
2020-04-17Get /bin/true to run on VFS2Zach Koopmans
Included: - loader_test.go RunTest and TestStartSignal VFS2 - container_test.go TestAppExitStatus on VFS2 - experimental flag added to runsc to turn on VFS2 Note: shared mounts are not yet supported. PiperOrigin-RevId: 307070753
2020-04-07Add friendlier messages for frequently encountered errors.Ian Lewis
Issue #2270 Issue #1765 PiperOrigin-RevId: 305385436
2020-01-27Update package locations.Adin Scannell
Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2019-12-06Make annotations OCI compliantFabricio Voznika
Changed annotation to follow the standard defined here: https://github.com/opencontainers/image-spec/blob/master/annotations.md PiperOrigin-RevId: 284254847
2019-11-25Use mount hints to determine FileAccessTypeFabricio Voznika
PiperOrigin-RevId: 282401165
2019-10-16Fix problem with open FD when copy up is triggered in overlayfsFabricio Voznika
Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopenned when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run on tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-08Ignore mount options that are not supported in shared mountsFabricio Voznika
Options that do not change mount behavior inside the Sentry are irrelevant and should not be used when looking for possible incompatibilities between master and slave mounts. PiperOrigin-RevId: 273593486
2019-08-27Mount volumes as super userFabricio Voznika
This used to be the case, but regressed after a recent change. Also made a few fixes around it and clean up the code a bit. Closes #720 PiperOrigin-RevId: 265717496
2019-08-02Stops container if gofer is killedFabricio Voznika
Each gofer now has a goroutine that polls on the FDs used to communicate with the sandbox. The respective gofer is destroyed if any of the FDs is closed. Closes #601 PiperOrigin-RevId: 261383725
2019-08-02Remove kernel.mounts.Nicolas Lacasse
We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060
2019-07-26Merge pull request #452 from zhangningdlut:chris_test_pidnsgVisor bot
PiperOrigin-RevId: 260220279
2019-07-25Automated rollback of changelist 255679453Fabricio Voznika
PiperOrigin-RevId: 260047477
2019-07-24Use different pidns among different containerschris.zn
The different containers in a sandbox used only one pid namespace before. This results in that a container can see the processes in another container in the same sandbox. This patch use different pid namespace for different containers. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23Give each container a distinct MountNamespace.Nicolas Lacasse
This keeps all container filesystem completely separate from eachother (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-12Take a reference on the already-mounted inode before re-mounting it.Nicolas Lacasse
PiperOrigin-RevId: 257855777
2019-07-02Remove map from fd_map, change to fd_table.Adin Scannell
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire package to itself and didn't serve much use (it was freely cast between types, and served as more of an annoyance than providing any protection.) Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup operation, and 10-15 ns per concurrent lookup operation of savings. This also fixes two tangential usage issues with the FDMap. Namely, non-atomic use of NewFDFrom and associated calls to Remove (that are both racy and fail to drop the reference on the underlying file.) PiperOrigin-RevId: 256285890
2019-06-28Automated rollback of changelist 255263686Nicolas Lacasse
PiperOrigin-RevId: 255679453
2019-06-27Cache directory entries in the overlayMichael Pratt
Currently, the overlay dirCache is only used for a single logical use of getdents. i.e., it is discard when the FD is closed or seeked back to the beginning. But the initial work of getting the directory contents can be quite expensive (particularly sorting large directories), so we should keep it as long as possible. This is very similar to the readdirCache in fs/gofer. Since the upper filesystem does not have to allow caching readdir entries, the new CacheReaddir MountSourceOperations method controls this behavior. This caching should be trivially movable to all Inodes if desired, though that adds an additional copy step for non-overlay Inodes. (Overlay Inodes already do the extra copy). PiperOrigin-RevId: 255477592
2019-06-26Preserve permissions when checking lowerFabricio Voznika
The code was wrongly assuming that only read access was required from the lower overlay when checking for permissions. This allowed non-writable files to be writable in the overlay. Fixes #316 PiperOrigin-RevId: 255263686
2019-06-13Plumb context through more layers of filesytem.Ian Gudger
All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-11Add support to mount pod shared tmpfs mountsFabricio Voznika
Parse annotations containing 'gvisor.dev/spec/mount' that gives hints about how mounts are shared between containers inside a pod. This information can be used to better inform how to mount these volumes inside gVisor. For example, a volume that is shared between containers inside a pod can be bind mounted inside the sandbox, instead of being two independent mounts. For now, this information is used to allow the same tmpfs mounts to be shared between containers which wasn't possible before. PiperOrigin-RevId: 252704037
2019-06-03Refactor container FS setupFabricio Voznika
No change in functionaly. Added containerMounter object to keep state while the mounts are processed. This will help upcoming changes to share mounts per-pod. PiperOrigin-RevId: 251350096
2019-05-23Set sticky bit to /tmpFabricio Voznika
This is generally done for '/tmp' to prevent accidental deletion of files. More details here: http://man7.org/linux/man-pages/man1/chmod.1.html#RESTRICTED_DELETION_FLAG_OR_STICKY_BIT PiperOrigin-RevId: 249633207 Change-Id: I444a5b406fdef664f5677b2f20f374972613a02b
2019-05-23Initial support for bind mountsFabricio Voznika
Separate MountSource from Mount. This is needed to allow mounts to be shared by multiple containers within the same pod. PiperOrigin-RevId: 249617810 Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468