summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/fsimpl
AgeCommit message (Collapse)Author
2021-10-26Simplify vfs.NewDisconnectedMount signature and callpoints.Ayush Ranjan
vfs.NewDisconnectedMount has no error paths. Its much prettier without the error return value. Also simplify MountDisconnected which would immediately drop the refs taken by NewDisconnectedMount. Instead make it directly call newMount. PiperOrigin-RevId: 405767966
2021-10-26Obtain ref on root dentry in mqfs.GetFilesystem.Ayush Ranjan
As documented in FilesystemType.GetFilesystem, a reference should be taken on the returned dentry and filesystem by GetFilesystem implementation. mqfs did not do that. Additionally cleanup and clarify ref counting of dentry, filesystem and mount in mqfs. Reported-by: syzbot+a2c54bfb6e1525228e5f@syzkaller.appspotmail.com Reported-by: syzbot+ccd305cdab11cfebbfff@syzkaller.appspotmail.com PiperOrigin-RevId: 405700565
2021-10-26Ensure statfs::f_namelen is set by VFS2 gofer statfs/fstatfs.Jamie Liu
VFS1 discards the value of f_namelen returned by the filesystem and returns NAME_MAX unconditionally instead, so it doesn't run into this. Also set f_frsize for completeness. PiperOrigin-RevId: 405579707
2021-10-21Merge pull request #6345 from sudo-sturbia:mq/syscallsgVisor bot
PiperOrigin-RevId: 404901660
2021-10-20Report correct error when restore failsFabricio Voznika
When file corruption is detected, report vfs.ErrCorruption to distinguish corruption error from other restore errors. Updates #1035 PiperOrigin-RevId: 404588445
2021-10-19Do not return non-nil *lisafs.Inode to doCreateAt on error.Ayush Ranjan
lisafs.ClientFile.MkdirAt is allowed to return a non-nil Inode and a non-nil error on an RPC error. The caller must not use the returned (invalid) Inode on error. But a code path in the gofer client does end up using it. More specifically, when the Mkdir RPC fails and we end up creating a synthetic dentry for a mountpoint, we end up returning the (invalid) non-nil Inode to filesystem.doCreateAt implementation which thinks that a remote file was created. But that non-nil Inode is actually invalid because the RPC failed. Things go downhill from there. Update client to not use childDirInode if RPC failed. PiperOrigin-RevId: 404396573
2021-10-19Stub cpuset cgroup control files.Rahat Mahmood
PiperOrigin-RevId: 404382475
2021-10-18Report ramdiskfs usage correctlyFabricio Voznika
Updates #1035 PiperOrigin-RevId: 404072231
2021-10-18Support distinction for RWMutex and read-only locks.Adin Scannell
Fixes #6590 PiperOrigin-RevId: 404007524
2021-10-14Report total memory based on limit or hostFabricio Voznika
gVisor was previously reporting the lower of cgroup limit or 2GB as total memory. This may cause applications to make bad decisions based on amount of memory available to them when more than 2GB is required. This change makes the lower of cgroup limit or the host total memory to be reported inside the sandbox. This also is more inline with docker which always reports host total memory. Note that reporting cgroup limit is strictly better than host total memory when there is a limit set. Fixes #5608 PiperOrigin-RevId: 403241608
2021-10-08Remove redundant slice copy in lisafs gofer client.Ayush Ranjan
listXattr() was doing redundant work. Remove it. PiperOrigin-RevId: 401871315
2021-10-08Disallow "trusted" namespace xattr in VFS2 gofer client.Ayush Ranjan
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer process when the gofer filesystem is the lower layer of an overlay. The overlay filesystem aggressively queries for "trusted.overlay.opaque" which in practice is never found in the lower layer gofer. But leads to a lot of wasted work. A consequence is that mutable gofer upper layer is not supported anymore but that is still consistent with VFS1. We can revisit when need arises. PiperOrigin-RevId: 401860585
2021-09-28Use one mutex for both Registry and RegistryImpl.Zyad A. Ali
Updates #136
2021-09-28Implement Registry.FindOrCreate.Zyad A. Ali
FindOrCreate implements the behaviour of mq_open(2). Updates #136
2021-09-28Return FDs in RegistryImpl functions and use Views.Zyad A. Ali
Update RegistryImpl functions to return file descriptions, instead of queues, and use Views in queue inodes. Updates #136
2021-09-28Define mq.View and use it for mqfs.queueFD.Zyad A. Ali
View makes it easier to handle O_RDONLY, O_WRONLY, and ORDWR options in mq_open(2). Updates #136
2021-09-28Move filesystem creation from GetFilesystem to RegistryImpl.Zyad A. Ali
Move root dentry and filesystem creation from GetFilesystem to NewRegistryImpl, create IPCNamespace.InitPosixQueues to create a new mqueue filesystem for each ipc namespace, and update GetFilesystem to retreive fs and root dentry from IPCNamespace and return them. Updates #136
2021-09-27Add procfs files for SysV message queues.Zyad A. Ali
2021-09-23Create the cgroupfs mount point in sysfs.Rahat Mahmood
Create the /sys/fs/cgroup directory when cgroups are available. This creates the empty directory to serve as the mountpoint, actually mounting cgroups is left to the launcher/userspace. This is consistent with Linux behaviour. Without this mountpoint, getdents(2) on /sys/fs indicates an empty directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The launcher can't create the mountpoint directory since sysfs doesn't support mkdir. PiperOrigin-RevId: 398596698
2021-09-21[lisa] Implement lisafs protocol methods in VFS2 gofer client and fsgofer.Ayush Ranjan
Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs instead of p9 when lisafs is enabled. Implements the handlers for those methods in fsgofer. Fixes #5465 PiperOrigin-RevId: 398080310
2021-09-20[lisa] Plumb lisafs through runsc.Ayush Ranjan
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs. When the flag is enabled, the gofer process and the client communicate using lisafs protocol instead of 9P. Added a filesystem option in fsimpl/gofer which indicates if lisafs is being used. That will be used to gate lisafs on the gofer client. Note that this change does not make the gofer client use lisafs just yet. Updates #5465 PiperOrigin-RevId: 397917844
2021-09-17Create mq.Registry and mqfs.RegistryImpl.Zyad A. Ali
Define a POSIX message queue Registry and RegistryImpl in mq package, implement RegistryImpl in mqfs, and add a Registry object to IPCNamespace initialized at filesystem creation. Updates #136
2021-09-15Implement queueInode and queueFD in mqfs.Zyad A. Ali
Implement inode and file description representing a POSIX message queue, and other utilities needed to implement file operations. Updates #136
2021-09-15Implement vfs.FilesystemType and kernfs.Filesystem for mqfs.Zyad A. Ali
Updates #136
2021-09-15Implement mqfs.rootInode.Zyad A. Ali
rootInode represents the root inode for mqueue filesystem (/dev/mqueue). Updates #136
2021-09-07Stub some memory control files.Rahat Mahmood
PiperOrigin-RevId: 395338926
2021-09-01Propagate vfs.MkdirOptions.ForSyntheticMountpoint to overlay copy-up.Jamie Liu
PiperOrigin-RevId: 394296687
2021-08-30Use specialFileFD handles in specialFileFD.Stat().Jamie Liu
PiperOrigin-RevId: 393831108
2021-08-27Fix lock order violations: mm.mappingMu > Task.mu.Nicolas Lacasse
Document this ordering in mm/mm.go. PiperOrigin-RevId: 393413203
2021-08-20Allow gofer.specialFileFDs to be mmapped with a host FD.Jamie Liu
PiperOrigin-RevId: 392102898
2021-08-19Cache verity dentriesChong Cai
Add an LRU cache to cache verity dentries when ref count drop to 0. This way we don't need to hash and verify the previous opened files or directories each time. PiperOrigin-RevId: 391880157
2021-08-19Use MM-mapped I/O instead of buffered copies in gofer.specialFileFD.Jamie Liu
The rationale given for using buffered copies is still valid, but it's unclear whether holding MM locks or allocating buffers is better in practice, and the former is at least consistent with gofer.regularFileFD (and VFS1), making performance easier to reason about. PiperOrigin-RevId: 391877913
2021-08-13[syserror] Remove pkg syserror.Zach Koopmans
Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12[syserror] Convert remaining syserror definitions to linuxerr.Zach Koopmans
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-08-11Do not clear merkle files when creating dentryChong Cai
The dentry for each file/directory can be created/destroyed multiple times during sandbox lifetime. We should not clear the Merkle file each time a dentry is created. PiperOrigin-RevId: 390277107
2021-08-11Popluate verity directory children namesChong Cai
We were relying on children adding its name to parent's dentry to populate parent's children list. However, this may not work since the parent dentry could be destroyed if its reference count drops to zero. In that case, a new dentry will be created when enabling the parent and it does not contain the children names info. Therefore we need to populate the child names list again to avoid missing children in the directory. PiperOrigin-RevId: 390270227
2021-08-11Initial cgroupfs support for subcontainersRahat Mahmood
Allow creation and management of subcontainers through cgroupfs directory syscalls. Also add a mechanism to specify a default root container to start new jobs in. This implements the filesystem support for subcontainers, but doesn't implement hierarchical resource accounting or task migration. PiperOrigin-RevId: 390254870
2021-08-01Merge pull request #6350 from sudo-sturbia:cgroupfsgVisor bot
PiperOrigin-RevId: 388129112
2021-07-28Add Uid/Gid/Groups fields to VFS2 /proc/[pid]/status.Jamie Liu
For comparison: ``` $ docker run --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat Umask: 0022 State: R (running) Tgid: 1 Ngid: 0 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: NStgid: 1 NSpid: 1 NSpgid: 1 NSsid: 1 VmPeak: 2660 kB VmSize: 2660 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 528 kB VmRSS: 528 kB ... $ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat State: R (running) Tgid: 1 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 4 Groups: VmSize: 10708 kB VmRSS: 3124 kB VmData: 316 kB ... ``` Fixes #6374 PiperOrigin-RevId: 387465655
2021-07-28Lock gofer.dentry.dataMu before SetAttr RPC modifying file size.Jamie Liu
PiperOrigin-RevId: 387427887
2021-07-23Clean up logic for when a VFS2 gofer regular file close causes a flushf.Jamie Liu
PiperOrigin-RevId: 386577891
2021-07-22VFS2: remove ext codeKevin Krakauer
We opted to move forward with FUSE instead. PiperOrigin-RevId: 386344258
2021-07-22Delete unnecessary function.Zyad A. Ali
Since cgroupfs.dir embedes cgroupfs.implStatFS, and dir.StatFS and implStatFS.StatFS are identical, dir.StatFS is not needed.
2021-07-21Use atomics when checking for parent setgid in VFS2 tmpfs file creation.Jamie Liu
Reported-by: syzbot+59550b48e06cc0d3b638@syzkaller.appspotmail.com PiperOrigin-RevId: 386075453
2021-07-15Fix refcount increments in gofer.filesystem.Sync.Fabricio Voznika
fs.renameMu is released and reacquired in `dentry.destroyLocked()` allowing a dentry to be in `fs.syncableDentries` with a negative reference count. Fixes #5263 PiperOrigin-RevId: 385054337
2021-07-13Do not require O_PATH flag to enable verityChong Cai
Remove the hack in gVisor vfs that allows verity to bypass the O_PATH check, since ioctl is not allowed on fds opened with O_PATH in linux. Verity still opens the lowerFD with O_PATH to open it as a symlink, but the API no longer expects O_PATH to open a fd to be verity enabled. Now only O_FOLLOW should be specified when opening and enabling verity features. PiperOrigin-RevId: 384567833
2021-07-12Fix deadlock in procfsFabricio Voznika
Kernfs provides an internal mechanism to defer calls to `DecRef()` because on the last reference `Filesystem.mu` must be held and most places that need to call `DecRef()` are inside the lock. The same can be true for filesystems that extend kernfs. procfs needs to look up files and `DecRef()` them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs files, it can deadlock trying to decrement if it's the last reference. This change extends the mechanism to external callers to defer DecRefs to `vfs.FileDescription` and `vfs.VirtualDentries`. PiperOrigin-RevId: 384361647
2021-07-12Fix stdios ownershipFabricio Voznika
Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-08devpts: Notify of echo'd input queue bytes only after locks have been released.Etienne Perot
PiperOrigin-RevId: 383684320