Age | Commit message (Collapse) | Author |
|
Remove three syserror entries duplicated in linuxerr. Because of the
linuxerr.Equals method, this is a mere change of return values from
syserror to linuxerr definitions.
Done with only these three errnos as CLs removing all grow to a significantly
large size.
PiperOrigin-RevId: 382173835
|
|
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors.
This will facilitate removal of syserror definitions in a followup, and
finding needed conversions from unix.Errno to linuxerr.
PiperOrigin-RevId: 380909667
|
|
PiperOrigin-RevId: 377966969
|
|
PiperOrigin-RevId: 375780659
|
|
Split usermem package to help remove syserror dependency in go_marshal.
New hostarch package contains code not dependent on syserror.
PiperOrigin-RevId: 365651233
|
|
PiperOrigin-RevId: 363276495
|
|
PiperOrigin-RevId: 362406813
|
|
This validates that struct fields if annotated with "// checklocks:mu" where
"mu" is a mutex field in the same struct then access to the field is only
done with "mu" locked.
All types that are guarded by a mutex must be annotated with
// +checklocks:<mutex field name>
For more details please refer to README.md.
PiperOrigin-RevId: 360729328
|
|
Before this CL, VFS2's overlayfs uses a single private device number and an
autoincrementing generated inode number for directories; this is consistent
with Linux's overlayfs in the non-samefs non-xino case. However, this breaks
some applications more consistently than on Linux due to more aggressive
caching of Linux overlayfs dentries.
Switch from using mapped device numbers + the topmost layer's inode number for
just non-copied-up non-directory files, to doing so for all files. This still
allows directory dev/ino numbers to change across copy-up, but otherwise keeps
them consistent.
Fixes #5545:
```
$ docker run --runtime=runsc-vfs2-overlay --rm ubuntu:focal bash -c "mkdir -p 1/2/3/4/5/6/7/8 && rm -rf 1 && echo done"
done
```
PiperOrigin-RevId: 359350716
|
|
PiperOrigin-RevId: 357106080
|
|
Our implementation of vfs.CheckDeleteSticky was not consistent with Linux,
specifically not consistent with fs/linux.h:check_sticky().
One of the biggest differences was that the vfs implementation did not
allow the owner of the sticky directory to delete files inside it that belonged
to other users.
This change makes our implementation consistent with Linux.
Also adds an integration test to check for this. This bug is also present in
VFS1.
Updates #3027
PiperOrigin-RevId: 355557425
|
|
PiperOrigin-RevId: 352904728
|
|
Return EEXIST when overwritting a file as long as the caller has exec
permission on the parent directory, even if the caller doesn't have
write permission.
Also reordered the mount write check, which happens before permission
is checked.
Closes #5164
PiperOrigin-RevId: 351868123
|
|
Fixes #4991
PiperOrigin-RevId: 345800333
|
|
These options allow overriding the signal that gets sent to the process when
I/O operations are available on the file descriptor, rather than the default
`SIGIO` signal. Doing so also populates `siginfo` to contain extra information
about which file descriptor caused the event (`si_fd`) and what events happened
on it (`si_band`). The logic around which FD is populated within `si_fd`
matches Linux's, which means it has some weird edge cases where that value may
not actually refer to a file descriptor that is still valid.
This CL also ports extra S/R logic regarding async handler in VFS2.
Without this, async I/O handlers aren't properly re-registered after S/R.
PiperOrigin-RevId: 345436598
|
|
PiperOrigin-RevId: 342161204
|
|
This lets us avoid treating a value of 0 as one reference. All references
using the refsvfs2 template must call InitRefs() before the reference is
incremented/decremented, or else a panic will occur. Therefore, it should be
pretty easy to identify missing InitRef calls during testing.
Updates #1486.
PiperOrigin-RevId: 341411151
|
|
This is consistent with what Linux does. This was causing a PHP runtime test
failure. Fixed it for VFS2.
PiperOrigin-RevId: 341155209
|
|
We can reuse information about whether a whiteout exists on a given file path
from stepLocked when creating a file at that path. This helps save an Unlink
call to the upper filesystem if the whiteout does NOT exist (common case).
Plumbs this information from lookupLocked() -> getChildLocked() -> stepLocked().
This also helped save a Lookup in RenameAt().
Fixes #1199
PiperOrigin-RevId: 341105351
|
|
Also refactor the template and CheckedObject interface to make this cleaner.
Updates #1486.
PiperOrigin-RevId: 339577120
|
|
Updates #1199
PiperOrigin-RevId: 339528827
|
|
In VFS1's overlayfs, files use the device and inode number of the lower layer
inode if one exists, and the upper layer inode otherwise. The former behavior
is inefficient (requiring lower layer lookups even if the file exists and is
otherwise wholly determined by the upper layer), and somewhat dangerous if the
lower layer is also observable (since both the overlay and lower layer file
will have the same device and inode numbers and thus appear to be the same
file, despite being behaviorally different). VFS2 overlayfs imitates Linux
overlayfs (in its default configuration) instead; it always uses the inode
number from the originating layer, but synthesizes a unique device number for
directories and another device number for non-directory files that have not
been copied-up.
As it turns out, the latter is insufficient (in VFS2, and possibly Linux as
well), because a given layer may include files with different device numbers.
If two distinct files on such a layer have device number X and Y respectively,
but share inode number Z, then the overlay will map both files to some private
device number X' and inode number Z, potentially confusing applications. Fix
this by assigning synthetic device numbers based on the lower layer's device
number, rather than the lower layer's vfs.Filesystem.
PiperOrigin-RevId: 339300341
|
|
Updates #1486.
PiperOrigin-RevId: 338832085
|
|
Inode number consistency checks are now skipped in save/restore tests for
reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename.
They pass in VFS1 due to the bug described in new test case
SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs.
Fixes #1663
PiperOrigin-RevId: 338776148
|
|
- Check the sticky bit in overlay.filesystem.UnlinkAt(). Fixes
StickyTest.StickyBitPermDenied.
- When configuring a VFS2 overlay in runsc, copy the lower layer's root
owner/group/mode to the upper layer's root (as in the VFS1 equivalent,
boot.addOverlay()). This makes the overlay root owned by UID/GID 65534 with
mode 0755 rather than owned by UID/GID 0 with mode 01777. Fixes
CreateTest.CreateFailsOnUnpermittedDir, which assumes that the test cannot
create files in /.
- MknodTest.UnimplementedTypesReturnError assumes that the creation of device
special files is not supported. However, while the VFS2 gofer client still
doesn't support device special files, VFS2 tmpfs does, and in the overlay
test dimension mknod() targets a tmpfs upper layer. The test initially has
all capabilities, including CAP_MKNOD, so its creation of these files
succeeds. Constrain these tests to VFS1.
- Rename overlay.nonDirectoryFD to overlay.regularFileFD and only use it for
regular files, using the original FD for pipes and device special files. This
is more consistent with Linux (which gets the original inode_operations, and
therefore file_operations, for these file types from ovl_fill_inode() =>
init_special_inode()) and fixes remaining mknod and pipe tests.
- Read/write 1KB at a time in PipeTest.Streaming, rather than 4 bytes. This
isn't strictly necessary, but it makes the test less obnoxiously slow on
ptrace.
Fixes #4407
PiperOrigin-RevId: 337971042
|
|
Singleton filesystem like devpts and devtmpfs have a single filesystem shared
among all mounts, so they acquire a "self-reference" when initialized that
must be released when the entire virtual filesystem is released at sandbox
exit.
PiperOrigin-RevId: 336828852
|
|
Fixes #1479, #317.
PiperOrigin-RevId: 334258052
|
|
Updates #1663
PiperOrigin-RevId: 333539293
|
|
Updates #1199
PiperOrigin-RevId: 332539197
|
|
This change includes overlay, special regular gofer files, and hostfs.
Fixes #3589.
PiperOrigin-RevId: 332330860
|
|
This is very similar to copy-up-coherent mmap in the VFS1 overlay, with the
minor wrinkle that there is no fs.InodeOperations.Mappable().
Updates #1199
PiperOrigin-RevId: 331206314
|
|
overlay/filesystem.go:lookupLocked() did not DecRef the VD on some error paths
when it would not end up saving or using the VD.
PiperOrigin-RevId: 330589742
|
|
PiperOrigin-RevId: 330554450
|
|
PiperOrigin-RevId: 329825497
|
|
Updates #1199
PiperOrigin-RevId: 329802274
|
|
PiperOrigin-RevId: 328583461
|
|
This is needed to support the overlay opaque attribute.
PiperOrigin-RevId: 328552985
|
|
PiperOrigin-RevId: 328410065
|
|
Some VFS operations (those which operate on FDs) get their credentials via the
context instead of via an explicit creds param. For these cases, we must pass
the overlay credentials on the context.
PiperOrigin-RevId: 327881259
|
|
Our "Preconditions:" blocks are very useful to determine the input invariants,
but they are bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).
I've reformatted all of the cases to fit in simple rules:
1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.
This format has been added to the style guide.
I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.
PiperOrigin-RevId: 327687465
|
|
Fixes #3243, #3521
PiperOrigin-RevId: 327308890
|
|
Earlier we were using NLink to decide if /tmp is empty or not. However, NLink
at best tells us about the number of subdirectories (via the ".." entries).
NLink = n + 2 for n subdirectories. But it does not tell us if the directory is
empty. There still might be non-directory files. We could also not rely on
NLink because host overlayfs always returned 1.
VFS1 uses Readdir to decide if the directory is empty. Used a similar approach.
We now use IterDirents to decide if the "/tmp" directory is empty.
Fixes #3369
PiperOrigin-RevId: 325554234
|
|
context is passed to DecRef() and Release() which is
needed for SO_LINGER implementation.
PiperOrigin-RevId: 324672584
|
|
- Check write permission on truncate(2). Unlike ftruncate(2),
truncate(2) fails if the user does not have write permissions
on the file.
- For gofers under InteropModeShared, check file type before
making a truncate request. We should fail early and avoid
making an rpc when possible. Furthermore, depending on the
remote host's failure may give us unexpected behavior--if the
host converts the truncate request to an ftruncate syscall on
an open fd, we will get EINVAL instead of EISDIR.
Updates #2923.
PiperOrigin-RevId: 322913569
|
|
Fix typos.
PiperOrigin-RevId: 322913282
|
|
Because there is no inode structure stored in the sandbox, inotify watches
must be held on the dentry. This would be an issue in the presence of hard
links, where multiple dentries would need to share the same set of watches,
but in VFS2, we do not support the internal creation of hard links on gofer
fs. As a result, we make the assumption that every dentry corresponds to a
unique inode.
Furthermore, dentries can be cached and then evicted, even if the underlying
file has not be deleted. We must prevent this from occurring if there are any
watches that would be lost. Note that if the dentry was deleted or invalidated
(d.vfsd.IsDead()), we should still destroy it along with its watches.
Additionally, when a dentry’s last watch is removed, we cache it if it also
has zero references. This way, the dentry can eventually be evicted from
memory if it is no longer needed. This is accomplished with a new dentry
method, OnZeroWatches(), which is called by Inotify.RmWatch and
Inotify.Release. Note that it must be called after all inotify locks are
released to avoid violating lock order. Stress tests are added to make sure
that inotify operations don't deadlock with gofer.OnZeroWatches.
Updates #1479.
PiperOrigin-RevId: 317958034
|
|
Updates #1035, #1199
PiperOrigin-RevId: 317028108
|
|
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to
take {start,length,whence}, so the correct offset can be
calculated in the implementations.
- Create PosixLocker interface to make it possible to share
the same locking code from different implementations.
Closes #1480
PiperOrigin-RevId: 316910286
|
|
Major differences from existing overlay filesystems:
- Linux allows lower layers in an overlay to require revalidation, but not the
upper layer. VFS1 allows the upper layer in an overlay to require
revalidation, but not the lower layer. VFS2 does not allow any layers to
require revalidation. (Now that vfs.MkdirOptions.ForSyntheticMountpoint
exists, no uses of overlay in VFS1 are believed to require upper layer
revalidation; in particular, the requirement that the upper layer support the
creation of "trusted." extended attributes for whiteouts effectively required
the upper filesystem to be tmpfs in most cases.)
- Like VFS1, but unlike Linux, VFS2 overlay does not attempt to make mutations
of the upper layer atomic using a working directory and features like
RENAME_WHITEOUT. (This may change in the future, since not having a working
directory makes error recovery for some operations, e.g. rmdir, particularly
painful.)
- Like Linux, but unlike VFS1, VFS2 represents whiteouts using character
devices with rdev == 0; the equivalent of the whiteout attribute on
directories is xattr trusted.overlay.opaque = "y"; and there is no equivalent
to the whiteout attribute on non-directories since non-directories are never
merged with lower layers.
- Device and inode numbers work as follows:
- In Linux, modulo the xino feature and a special case for when all layers
are the same filesystem:
- Directories use the overlay filesystem's device number and an
ephemeral inode number assigned by the overlay.
- Non-directories that have been copied up use the device and inode
number assigned by the upper filesystem.
- Non-directories that have not been copied up use a per-(overlay,
layer)-pair device number and the inode number assigned by the lower
filesystem.
- In VFS1, device and inode numbers always come from the lower layer unless
"whited out"; this has the adverse effect of requiring interaction with
the lower filesystem even for non-directory files that exist on the upper
layer.
- In VFS2, device and inode numbers are assigned as in Linux, except that
xino and the samefs special case are not supported.
- Like Linux, but unlike VFS1, VFS2 does not attempt to maintain memory mapping
coherence across copy-up. (This may have to change in the future, as users
may be dependent on this property.)
- Like Linux, but unlike VFS1, VFS2 uses the overlayfs mounter's credentials
when interacting with the overlay's layers, rather than the caller's.
- Like Linux, but unlike VFS1, VFS2 permits multiple lower layers in an
overlay.
- Like Linux, but unlike VFS1, VFS2's overlay filesystem is
application-mountable.
Updates #1199
PiperOrigin-RevId: 316019067
|