Age | Commit message (Collapse) | Author |
|
vfs.NewDisconnectedMount has no error paths. Its much prettier without the
error return value.
Also simplify MountDisconnected which would immediately drop the refs taken by
NewDisconnectedMount. Instead make it directly call newMount.
PiperOrigin-RevId: 405767966
|
|
As documented in FilesystemType.GetFilesystem, a reference should be taken on
the returned dentry and filesystem by GetFilesystem implementation. mqfs did
not do that.
Additionally cleanup and clarify ref counting of dentry, filesystem and mount
in mqfs.
Reported-by: syzbot+a2c54bfb6e1525228e5f@syzkaller.appspotmail.com
Reported-by: syzbot+ccd305cdab11cfebbfff@syzkaller.appspotmail.com
PiperOrigin-RevId: 405700565
|
|
VFS1 discards the value of f_namelen returned by the filesystem and returns
NAME_MAX unconditionally instead, so it doesn't run into this. Also set
f_frsize for completeness.
PiperOrigin-RevId: 405579707
|
|
PiperOrigin-RevId: 404901660
|
|
When file corruption is detected, report vfs.ErrCorruption to
distinguish corruption error from other restore errors.
Updates #1035
PiperOrigin-RevId: 404588445
|
|
lisafs.ClientFile.MkdirAt is allowed to return a non-nil Inode and a non-nil
error on an RPC error. The caller must not use the returned (invalid) Inode on
error. But a code path in the gofer client does end up using it.
More specifically, when the Mkdir RPC fails and we end up creating a synthetic
dentry for a mountpoint, we end up returning the (invalid) non-nil Inode to
filesystem.doCreateAt implementation which thinks that a remote file was
created. But that non-nil Inode is actually invalid because the RPC failed.
Things go downhill from there.
Update client to not use childDirInode if RPC failed.
PiperOrigin-RevId: 404396573
|
|
PiperOrigin-RevId: 404382475
|
|
Updates #1035
PiperOrigin-RevId: 404072231
|
|
Fixes #6590
PiperOrigin-RevId: 404007524
|
|
gVisor was previously reporting the lower of cgroup limit or 2GB as total
memory. This may cause applications to make bad decisions based on amount
of memory available to them when more than 2GB is required.
This change makes the lower of cgroup limit or the host total memory to be
reported inside the sandbox. This also is more inline with docker which always
reports host total memory. Note that reporting cgroup limit is strictly better
than host total memory when there is a limit set.
Fixes #5608
PiperOrigin-RevId: 403241608
|
|
listXattr() was doing redundant work. Remove it.
PiperOrigin-RevId: 401871315
|
|
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer
process when the gofer filesystem is the lower layer of an overlay.
The overlay filesystem aggressively queries for "trusted.overlay.opaque" which
in practice is never found in the lower layer gofer. But leads to a lot of
wasted work.
A consequence is that mutable gofer upper layer is not supported anymore but
that is still consistent with VFS1. We can revisit when need arises.
PiperOrigin-RevId: 401860585
|
|
Updates #136
|
|
FindOrCreate implements the behaviour of mq_open(2).
Updates #136
|
|
Update RegistryImpl functions to return file descriptions, instead of
queues, and use Views in queue inodes.
Updates #136
|
|
View makes it easier to handle O_RDONLY, O_WRONLY, and ORDWR options in
mq_open(2).
Updates #136
|
|
Move root dentry and filesystem creation from GetFilesystem to
NewRegistryImpl, create IPCNamespace.InitPosixQueues to create a new
mqueue filesystem for each ipc namespace, and update GetFilesystem to
retreive fs and root dentry from IPCNamespace and return them.
Updates #136
|
|
|
|
Create the /sys/fs/cgroup directory when cgroups are available. This
creates the empty directory to serve as the mountpoint, actually
mounting cgroups is left to the launcher/userspace. This is consistent
with Linux behaviour.
Without this mountpoint, getdents(2) on /sys/fs indicates an empty
directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The
launcher can't create the mountpoint directory since sysfs doesn't
support mkdir.
PiperOrigin-RevId: 398596698
|
|
Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs
instead of p9 when lisafs is enabled.
Implements the handlers for those methods in fsgofer.
Fixes #5465
PiperOrigin-RevId: 398080310
|
|
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs.
When the flag is enabled, the gofer process and the client communicate using
lisafs protocol instead of 9P.
Added a filesystem option in fsimpl/gofer which indicates if lisafs is being
used. That will be used to gate lisafs on the gofer client.
Note that this change does not make the gofer client use lisafs just yet.
Updates #5465
PiperOrigin-RevId: 397917844
|
|
Define a POSIX message queue Registry and RegistryImpl in mq package,
implement RegistryImpl in mqfs, and add a Registry object to
IPCNamespace initialized at filesystem creation.
Updates #136
|
|
Implement inode and file description representing a POSIX message queue,
and other utilities needed to implement file operations.
Updates #136
|
|
Updates #136
|
|
rootInode represents the root inode for mqueue filesystem (/dev/mqueue).
Updates #136
|
|
PiperOrigin-RevId: 395338926
|
|
PiperOrigin-RevId: 394296687
|
|
PiperOrigin-RevId: 393831108
|
|
Document this ordering in mm/mm.go.
PiperOrigin-RevId: 393413203
|
|
PiperOrigin-RevId: 392102898
|
|
Add an LRU cache to cache verity dentries when ref count drop to 0. This
way we don't need to hash and verify the previous opened files or
directories each time.
PiperOrigin-RevId: 391880157
|
|
The rationale given for using buffered copies is still valid, but it's unclear
whether holding MM locks or allocating buffers is better in practice, and the
former is at least consistent with gofer.regularFileFD (and VFS1), making
performance easier to reason about.
PiperOrigin-RevId: 391877913
|
|
Removes package syserror and moves still relevant code to either linuxerr
or to syserr (to be later removed).
Internal errors are converted from random types to *errors.Error types used
in linuxerr. Internal errors are in linuxerr/internal.go.
PiperOrigin-RevId: 390724202
|
|
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr.
PiperOrigin-RevId: 390471763
|
|
The dentry for each file/directory can be created/destroyed multiple
times during sandbox lifetime. We should not clear the Merkle file each
time a dentry is created.
PiperOrigin-RevId: 390277107
|
|
We were relying on children adding its name to parent's dentry to
populate parent's children list. However, this may not work since the
parent dentry could be destroyed if its reference count drops to zero.
In that case, a new dentry will be created when enabling the parent and
it does not contain the children names info. Therefore we need to
populate the child names list again to avoid missing children in the
directory.
PiperOrigin-RevId: 390270227
|
|
Allow creation and management of subcontainers through cgroupfs
directory syscalls. Also add a mechanism to specify a default root
container to start new jobs in.
This implements the filesystem support for subcontainers, but doesn't
implement hierarchical resource accounting or task migration.
PiperOrigin-RevId: 390254870
|
|
PiperOrigin-RevId: 388129112
|
|
For comparison:
```
$ docker run --rm -it ubuntu:focal bash -c 'cat /proc/self/status'
Name: cat
Umask: 0022
State: R (running)
Tgid: 1
Ngid: 0
Pid: 1
PPid: 0
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
NStgid: 1
NSpid: 1
NSpgid: 1
NSsid: 1
VmPeak: 2660 kB
VmSize: 2660 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 528 kB
VmRSS: 528 kB
...
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal bash -c 'cat /proc/self/status'
Name: cat
State: R (running)
Tgid: 1
Pid: 1
PPid: 0
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 4
Groups:
VmSize: 10708 kB
VmRSS: 3124 kB
VmData: 316 kB
...
```
Fixes #6374
PiperOrigin-RevId: 387465655
|
|
PiperOrigin-RevId: 387427887
|
|
PiperOrigin-RevId: 386577891
|
|
We opted to move forward with FUSE instead.
PiperOrigin-RevId: 386344258
|
|
Since cgroupfs.dir embedes cgroupfs.implStatFS, and dir.StatFS and
implStatFS.StatFS are identical, dir.StatFS is not needed.
|
|
Reported-by: syzbot+59550b48e06cc0d3b638@syzkaller.appspotmail.com
PiperOrigin-RevId: 386075453
|
|
fs.renameMu is released and reacquired in `dentry.destroyLocked()` allowing
a dentry to be in `fs.syncableDentries` with a negative reference count.
Fixes #5263
PiperOrigin-RevId: 385054337
|
|
Remove the hack in gVisor vfs that allows verity to bypass the O_PATH
check, since ioctl is not allowed on fds opened with O_PATH in linux.
Verity still opens the lowerFD with O_PATH to open it as a symlink, but
the API no longer expects O_PATH to open a fd to be verity enabled.
Now only O_FOLLOW should be specified when opening and enabling verity
features.
PiperOrigin-RevId: 384567833
|
|
Kernfs provides an internal mechanism to defer calls to `DecRef()` because
on the last reference `Filesystem.mu` must be held and most places that
need to call `DecRef()` are inside the lock. The same can be true for
filesystems that extend kernfs. procfs needs to look up files and `DecRef()`
them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs
files, it can deadlock trying to decrement if it's the last reference.
This change extends the mechanism to external callers to defer DecRefs
to `vfs.FileDescription` and `vfs.VirtualDentries`.
PiperOrigin-RevId: 384361647
|
|
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.
1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. stdios owner owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
|
|
Update the following from syserror to the linuxerr equivalent:
EEXIST
EFAULT
ENOTDIR
ENOTTY
EOPNOTSUPP
ERANGE
ESRCH
PiperOrigin-RevId: 384329869
|
|
PiperOrigin-RevId: 383684320
|