gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-09-23	Create the cgroupfs mount point in sysfs.	Rahat Mahmood
	Create the /sys/fs/cgroup directory when cgroups are available. This creates the empty directory to serve as the mountpoint, actually mounting cgroups is left to the launcher/userspace. This is consistent with Linux behaviour. Without this mountpoint, getdents(2) on /sys/fs indicates an empty directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The launcher can't create the mountpoint directory since sysfs doesn't support mkdir. PiperOrigin-RevId: 398596698
2021-09-21	[lisa] Implement lisafs protocol methods in VFS2 gofer client and fsgofer.	Ayush Ranjan
	Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs instead of p9 when lisafs is enabled. Implements the handlers for those methods in fsgofer. Fixes #5465 PiperOrigin-RevId: 398080310
2021-09-20	[lisa] Plumb lisafs through runsc.	Ayush Ranjan
	lisafs is only supported in VFS2. Added a runsc flag which enables lisafs. When the flag is enabled, the gofer process and the client communicate using lisafs protocol instead of 9P. Added a filesystem option in fsimpl/gofer which indicates if lisafs is being used. That will be used to gate lisafs on the gofer client. Note that this change does not make the gofer client use lisafs just yet. Updates #5465 PiperOrigin-RevId: 397917844
2021-09-07	Stub some memory control files.	Rahat Mahmood
	PiperOrigin-RevId: 395338926
2021-09-01	Propagate vfs.MkdirOptions.ForSyntheticMountpoint to overlay copy-up.	Jamie Liu
	PiperOrigin-RevId: 394296687
2021-08-30	Use specialFileFD handles in specialFileFD.Stat().	Jamie Liu
	PiperOrigin-RevId: 393831108
2021-08-27	Fix lock order violations: mm.mappingMu > Task.mu.	Nicolas Lacasse
	Document this ordering in mm/mm.go. PiperOrigin-RevId: 393413203
2021-08-20	Allow gofer.specialFileFDs to be mmapped with a host FD.	Jamie Liu
	PiperOrigin-RevId: 392102898
2021-08-19	Cache verity dentries	Chong Cai
	Add an LRU cache to cache verity dentries when ref count drop to 0. This way we don't need to hash and verify the previous opened files or directories each time. PiperOrigin-RevId: 391880157
2021-08-19	Use MM-mapped I/O instead of buffered copies in gofer.specialFileFD.	Jamie Liu
	The rationale given for using buffered copies is still valid, but it's unclear whether holding MM locks or allocating buffers is better in practice, and the former is at least consistent with gofer.regularFileFD (and VFS1), making performance easier to reason about. PiperOrigin-RevId: 391877913
2021-08-13	[syserror] Remove pkg syserror.	Zach Koopmans
	Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12	[syserror] Convert remaining syserror definitions to linuxerr.	Zach Koopmans
	Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-08-11	Do not clear merkle files when creating dentry	Chong Cai
	The dentry for each file/directory can be created/destroyed multiple times during sandbox lifetime. We should not clear the Merkle file each time a dentry is created. PiperOrigin-RevId: 390277107
2021-08-11	Popluate verity directory children names	Chong Cai
	We were relying on children adding its name to parent's dentry to populate parent's children list. However, this may not work since the parent dentry could be destroyed if its reference count drops to zero. In that case, a new dentry will be created when enabling the parent and it does not contain the children names info. Therefore we need to populate the child names list again to avoid missing children in the directory. PiperOrigin-RevId: 390270227
2021-08-11	Initial cgroupfs support for subcontainers	Rahat Mahmood
	Allow creation and management of subcontainers through cgroupfs directory syscalls. Also add a mechanism to specify a default root container to start new jobs in. This implements the filesystem support for subcontainers, but doesn't implement hierarchical resource accounting or task migration. PiperOrigin-RevId: 390254870
2021-08-01	Merge pull request #6350 from sudo-sturbia:cgroupfs	gVisor bot
	PiperOrigin-RevId: 388129112
2021-07-28	Add Uid/Gid/Groups fields to VFS2 /proc/[pid]/status.	Jamie Liu
	For comparison: ``` $ docker run --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat Umask: 0022 State: R (running) Tgid: 1 Ngid: 0 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: NStgid: 1 NSpid: 1 NSpgid: 1 NSsid: 1 VmPeak: 2660 kB VmSize: 2660 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 528 kB VmRSS: 528 kB ... $ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat State: R (running) Tgid: 1 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 4 Groups: VmSize: 10708 kB VmRSS: 3124 kB VmData: 316 kB ... ``` Fixes #6374 PiperOrigin-RevId: 387465655
2021-07-28	Lock gofer.dentry.dataMu before SetAttr RPC modifying file size.	Jamie Liu
	PiperOrigin-RevId: 387427887
2021-07-23	Clean up logic for when a VFS2 gofer regular file close causes a flushf.	Jamie Liu
	PiperOrigin-RevId: 386577891
2021-07-22	VFS2: remove ext code	Kevin Krakauer
	We opted to move forward with FUSE instead. PiperOrigin-RevId: 386344258
2021-07-22	Delete unnecessary function.	Zyad A. Ali
	Since cgroupfs.dir embedes cgroupfs.implStatFS, and dir.StatFS and implStatFS.StatFS are identical, dir.StatFS is not needed.
2021-07-21	Use atomics when checking for parent setgid in VFS2 tmpfs file creation.	Jamie Liu
	Reported-by: syzbot+59550b48e06cc0d3b638@syzkaller.appspotmail.com PiperOrigin-RevId: 386075453
2021-07-15	Fix refcount increments in gofer.filesystem.Sync.	Fabricio Voznika
	fs.renameMu is released and reacquired in `dentry.destroyLocked()` allowing a dentry to be in `fs.syncableDentries` with a negative reference count. Fixes #5263 PiperOrigin-RevId: 385054337
2021-07-13	Do not require O_PATH flag to enable verity	Chong Cai
	Remove the hack in gVisor vfs that allows verity to bypass the O_PATH check, since ioctl is not allowed on fds opened with O_PATH in linux. Verity still opens the lowerFD with O_PATH to open it as a symlink, but the API no longer expects O_PATH to open a fd to be verity enabled. Now only O_FOLLOW should be specified when opening and enabling verity features. PiperOrigin-RevId: 384567833
2021-07-12	Fix deadlock in procfs	Fabricio Voznika
	Kernfs provides an internal mechanism to defer calls to `DecRef()` because on the last reference `Filesystem.mu` must be held and most places that need to call `DecRef()` are inside the lock. The same can be true for filesystems that extend kernfs. procfs needs to look up files and `DecRef()` them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs files, it can deadlock trying to decrement if it's the last reference. This change extends the mechanism to external callers to defer DecRefs to `vfs.FileDescription` and `vfs.VirtualDentries`. PiperOrigin-RevId: 384361647
2021-07-12	Fix stdios ownership	Fabricio Voznika
	Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12	[syserror] Update syserror to linuxerr for more errors.	Zach Koopmans
	Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-08	devpts: Notify of echo'd input queue bytes only after locks have been released.	Etienne Perot
	PiperOrigin-RevId: 383684320
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01	[syserror] Update several syserror errors to linuxerr equivalents.	Zach Koopmans
	Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30	[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.	Zach Koopmans
	Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29	Sort children map before hash	Chong Cai
	The unordered map may generate different hash due to its order. The children map needs to be sorted each time before hashing to avoid false verification failure due to the map. Store the sorted children map in verity dentry to avoid sorting it each time verification happens. Also serialize the whole VerityDescriptor struct to hash now that the map is removed from it. PiperOrigin-RevId: 382201560
2021-06-29	[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVAL	Zach Koopmans
	Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-28	Allow VFS2 gofer client to mmap from sentry page cache when forced.	Jamie Liu
	PiperOrigin-RevId: 381982257
2021-06-24	Internal change.	Jamie Liu
	PiperOrigin-RevId: 381375705
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-17	Move tcpip.Clock impl to Timekeeper	Tamir Duberstein
	...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed. PiperOrigin-RevId: 380039167
2021-06-10	Minor VFS2 xattr changes.	Jamie Liu
	- Allow the gofer client to use most xattr namespaces. As documented by the updated comment, this is consistent with e.g. Linux's FUSE client, and allows gofers to provide extended attributes from FUSE filesystems. - Make tmpfs' listxattr omit xattrs in the "trusted" namespace for non-privileged users. PiperOrigin-RevId: 378778854
2021-06-10	Fix lock ordering issue when enumerating cgroup tasks.	Rahat Mahmood
	The control files enumerating tasks and threads residing in cgroupfs incorrectly locks cgroupfs.filesystem.tasksMu before kernel.TaskSet.mu. The contents of these control files are inherently racy anyways, so use a snapshot of the tasks in the cgroup and drop tasksMu before resolving pids/tids (which acquires TaskSet.mu). PiperOrigin-RevId: 378767060
2021-06-10	Add /proc/sys/vm/max_map_count	Fabricio Voznika
	Set it to int32 max because gVisor doesn't have a limit. Fixes #2337 PiperOrigin-RevId: 378722230
2021-06-07	Implement RENAME_NOREPLACE for all VFS2 filesystem implementations.	Jamie Liu
	PiperOrigin-RevId: 377966969
2021-06-03	Initialize metrics at init	Tamir Duberstein
	Avoids a race condition at kernel initialization. Updates #6057. PiperOrigin-RevId: 377357723
2021-05-26	Add verity getdents tests	Chong Cai
	PiperOrigin-RevId: 376001603
2021-05-25	Initialize Kernel.Timekeeper before network NS	Tamir Duberstein
	PiperOrigin-RevId: 375843579
2021-05-25	Use specific fmt verbs (avoid %v)	Tamir Duberstein
	Remove useless conversions. Avoid unhandled errors. PiperOrigin-RevId: 375834275
2021-05-25	setgid directories for VFS1 tmpfs, overlayfs, and goferfs	Kevin Krakauer
	PiperOrigin-RevId: 375780659
2021-05-18	Delete /cloud/gvisor/sandbox/sentry/gofer/opened_write_execute_file metric	Nayana Bidari
	This metric is replaced by /cloud/gvisor/sandbox/sentry/suspicious_operations metric with field value opened_write_execute_file. PiperOrigin-RevId: 374509823
2021-05-14	Add new metric for suspicious operations.	Nayana Bidari
	The new metric contains fields and will replace the below existing metric: - opened_write_execute_file PiperOrigin-RevId: 373884604
2021-05-14	Resolve remaining O_PATH TODOs.	Dean Deng
	O_PATH is now implemented in vfs2. Fixes #2782. PiperOrigin-RevId: 373861410
2021-05-14	Don't read forwarding from netstack in sentry	Ghanan Gowripalan
	https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables: ip_forward - BOOLEAN 0 - disabled (default) not 0 - enabled Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers) /proc/sys/net/ipv4/ip_forward only does work when its value is changed and always returns the last written value. The last written value may not reflect the current state of the netstack (e.g. when `ip_forward` was written a value of "1" then disable forwarding on an interface) so there is no need for sentry to probe netstack to get the current forwarding state of interfaces. ``` ~$ cat /proc/sys/net/ipv4/ip_forward 0 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo sysctl -w net.ipv4.conf.wlp1s0.forwarding=0 net.ipv4.conf.wlp1s0.forwarding = 0 ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a \| grep ipv4 \| grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo bash -c "echo 0 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a \| grep ipv4 \| grep forward sysctl: unable to open directory "/proc/sys/fs/binfmt_misc/" net.ipv4.conf.all.forwarding = 0 net.ipv4.conf.default.forwarding = 0 net.ipv4.conf.eno1.forwarding = 0 net.ipv4.conf.lo.forwarding = 0 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 0 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 0 ``` In the above example we can see that writing "1" to /proc/sys/net/ipv4/ip_forward configures the stack to be a router (all interfaces are configured to enable forwarding). However, if we manually update an interace (`wlp1s0`) to not forward packets, /proc/sys/net/ipv4/ip_forward continues to return the last written value of "1", even though not all interfaces will forward packets. Also note that writing the same value twice has no effect; work is performed iff the value changes. This change also removes the 'unset' state from sentry's ip forwarding data structures as an 'unset' ip forwarding value is the same as leaving forwarding disabled as the stack is always brought up with forwarding initially disabled; disabling forwarding on a newly created stack is a no-op. PiperOrigin-RevId: 373853106