Age | Commit message (Collapse) | Author |
|
|
|
Linux 4.18 and later make reads and writes coherent between pre-copy-up and
post-copy-up FDs representing the same file on an overlay filesystem. However,
memory mappings remain incoherent:
- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file
residing on a lower layer is opened for read-only and then memory mapped with
MAP_SHARED, then subsequent changes to the file are not reflected in the
memory mapping."
- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any
management of coherence in the overlay.
- Experimentally on Linux 5.2:
```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2) {
errx(1, "syntax: %s [FILE]", argv[0]);
}
const int fd = open(argv[1], O_RDONLY);
if (fd < 0) {
err(1, "open(%s)", argv[1]);
}
const size_t page_size = sysconf(_SC_PAGE_SIZE);
void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (page == MAP_FAILED) {
err(1, "mmap");
}
for (;;) {
write(1, page, strnlen(page, page_size));
if (getc(stdin) == EOF) {
break;
}
}
return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```
Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only
necessary pre-4.18, replacing existing memory mappings (in both sentry and
application address spaces) with mappings of the new FD is required regardless
of kernel version, and this latter behavior is common to both VFS1 and VFS2.
Re-document accordingly, and change the runsc flag to enabled by default.
New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
PiperOrigin-RevId: 311361267
|
|
|
|
Fixes #2651.
PiperOrigin-RevId: 311193661
|
|
|
|
Synthetic sockets do not have the race condition issue in VFS2, and we will
get rid of privateunixsocket as well.
Fixes #1200.
PiperOrigin-RevId: 310386474
|
|
|
|
p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in
particular, using it for file ownership causes capabilities to be ineffective
since file capabilities require that the file's KUID and KGID are mapped into
the capability holder's user namespace [1], and auth.NoID is not mapped into
any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the
unlikely case that an application actually using the overflow KUID/KGID
attempts an operation that is consequently permitted by client permission
checks, the remote operation will still fail with EPERM.
Since this changes the VFS2 gofer client to no longer ignore the invalid IDs
entirely, this CL both permits and requires that we change synthetic mount point
creation to use root credentials.
[1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions.
PiperOrigin-RevId: 309856455
|
|
|
|
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.
Updates #1200.
PiperOrigin-RevId: 308828161
|
|
|
|
Sentry metrics with nanoseconds units are labeled as such, and non-cumulative
sentry metrics are supported.
PiperOrigin-RevId: 307621080
|
|
|
|
The comments in the ticket indicate that this behavior
is fine and that the ticket should be closed, so we shouldn't
need pointers to the ticket.
PiperOrigin-RevId: 306266071
|
|
|
|
The sentry doesn't allow execve, but it's a good defense
in-depth measure.
PiperOrigin-RevId: 305958737
|
|
|
|
PiperOrigin-RevId: 305588941
|
|
|
|
Determine system time from within the sentry rather than relying on the remote
filesystem to prevent inconsistencies.
Resolve related TODOs; the time discrepancies in question don't exist anymore.
PiperOrigin-RevId: 305557099
|
|
|
|
|
|
PiperOrigin-RevId: 294295852
|
|
|
|
Note that these are only implemented for tmpfs, and other impls will still
return EOPNOTSUPP.
PiperOrigin-RevId: 293899385
|
|
|
|
Internal pipes are supported similarly to how internal UDS is done.
It is also controlled by the same flag.
Fixes #1102
PiperOrigin-RevId: 293150045
|
|
|
|
Because the abi will depend on the core types for marshalling (usermem,
context, safemem, safecopy), these need to be flattened from the sentry
directory. These packages contain no sentry-specific details.
PiperOrigin-RevId: 291811289
|
|
PiperOrigin-RevId: 291745021
|
|
|
|
There was a very bare get/setxattr in the InodeOperations interface. Add
context.Context to both, size to getxattr, and flags to setxattr.
Note that extended attributes are passed around as strings in this
implementation, so size is automatically encoded into the value. Size is
added in getxattr so that implementations can return ERANGE if a value is larger
than can fit in the user-allocated buffer. This prevents us from unnecessarily
passing around an arbitrarily large xattr when the user buffer is actually too
small.
Don't use the existing xattrwalk and xattrcreate messages and define our
own, mainly for the sake of simplicity.
Extended attributes will be implemented in future commits.
PiperOrigin-RevId: 290121300
|
|
|
|
* Rename syncutil to sync.
* Add aliases to sync types.
* Replace existing usage of standard library sync package.
This will make it easier to swap out synchronization primitives. For example,
this will allow us to use primitives from github.com/sasha-s/go-deadlock to
check for lock ordering violations.
Updates #1472
PiperOrigin-RevId: 289033387
|
|
|
|
After the finalizer optimize in 76039f895995c3fe0deef5958f843868685ecc38
commit, clientFile needs to closed before finalizer release it.
The clientFile is not closed if it is created via
gofer.(*inodeOperations).Bind, this will cause fd leak which is hold
by gofer process.
Fixes #1396
Signed-off-by: Yong He <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
|
|
PiperOrigin-RevId: 284606233
|
|
|
|
PiperOrigin-RevId: 282382564
|
|
|
|
Note that the Sentry still calls Truncate() on the file before calling Open.
A new p9 version check was added to ensure that the p9 server can handle the
the OpenTruncate flag. If not, then the flag is stripped before sending.
PiperOrigin-RevId: 281609112
|
|
This patch also include a minor change to replace syscall.Dup2
with syscall.Dup3 which was missed in a previous commit(ref a25a976).
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
|
|
PiperOrigin-RevId: 275139066
|
|
|
|
|
|
Linux kernel before 4.19 doesn't implement a feature that updates
open FD after a file is open for write (and is copied to the upper
layer). Already open FD will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.
Flag was added to force readonly files to be reopenned when the
same file is open for write. This is only needed if using kernels
prior to 4.19.
Closes #1006
It's difficult to really test this because we never run on tests
on older kernels. I'm adding a test in GKE which uses kernels
with the overlayfs problem for 1.14 and lower.
PiperOrigin-RevId: 275115289
|
|
When any of these flags are set, all writes will trigger a subsequent fsync
call. This behavior already existed for "write-through" mounts.
O_DIRECT is treated as an alias for O_SYNC. Better support coming soon.
PiperOrigin-RevId: 275114392
|
|
|
|
The gofer's CachingInodeOperations implementation contains an optimization for
the common open-read-close pattern when we have a host FD. In this case, the
host kernel will update the timestamp for us to a reasonably close time, so we
don't need an extra RPC to the gofer.
However, when the app explicitly sets the timestamps (via futimes or similar)
then we actually DO need to update the timestamps, because the host kernel
won't do it for us.
To fix this, a new boolean `forceSetTimestamps` was added to
CachineInodeOperations.SetMaskedAttributes. It is only set by
gofer.InodeOperations.SetTimestamps.
PiperOrigin-RevId: 272048146
|