summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2020-05-13Enable overlayfs_stale_read by default for runsc.Jamie Liu
Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent: - Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping." - fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay. - Experimentally on Linux 5.2: ``` $ cat mmap_cat_page.c #include <err.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <unistd.h> int main(int argc, char **argv) { if (argc < 2) { errx(1, "syntax: %s [FILE]", argv[0]); } const int fd = open(argv[1], O_RDONLY); if (fd < 0) { err(1, "open(%s)", argv[1]); } const size_t page_size = sysconf(_SC_PAGE_SIZE); void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0); if (page == MAP_FAILED) { err(1, "mmap"); } for (;;) { write(1, page, strnlen(page, page_size)); if (getc(stdin) == EOF) { break; } } return 0; } $ gcc -O2 -o mmap_cat_page mmap_cat_page.c $ mkdir lowerdir upperdir workdir overlaydir $ echo old > lowerdir/file $ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir $ ./mmap_cat_page overlaydir/file old ^Z [1]+ Stopped ./mmap_cat_page overlaydir/file $ echo new > overlaydir/file $ cat overlaydir/file new $ fg ./mmap_cat_page overlaydir/file old ``` Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default. New test: - Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b - After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab PiperOrigin-RevId: 311361267
2020-05-12Merge pull request #2678 from nybidari:iptablesgVisor bot
PiperOrigin-RevId: 311203776
2020-05-12Don't allow rename across different gofer or tmpfs mounts.Nicolas Lacasse
Fixes #2651. PiperOrigin-RevId: 311193661
2020-05-12Merge pull request #2671 from kevinGC:skip-outputgVisor bot
PiperOrigin-RevId: 311181084
2020-05-12Don't call kernel.Task.Block() from netstack.SocketOperations.Write().Jamie Liu
kernel.Task.Block() requires that the caller is running on the task goroutine. netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call kernel.Task.Block() even if it's not running on the task goroutine. Stop doing that. PiperOrigin-RevId: 311178335
2020-05-12iptables: support gid match for owner matching.Nayana Bidari
- Added support for matching gid owner and invert flag for uid and gid. $ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT $ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT $ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP - Added tests for uid, gid and invert flags.
2020-05-12Merge pull request #2664 from lubinszARM:pr_sigfpgVisor bot
PiperOrigin-RevId: 311153824
2020-05-11Internal change.Jamie Liu
PiperOrigin-RevId: 311046755
2020-05-11iptables: check for truly unconditional rulesKevin Krakauer
We weren't properly checking whether the inserted default rule was unconditional.
2020-05-11Add fpsimd support in sigreturn on Arm64Bin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-11Add fsimpl/gofer.InternalFilesystemOptions.OpenSocketsByConnecting.Jamie Liu
PiperOrigin-RevId: 311014995
2020-05-10Stop avoiding preadv2 and pwritev2, and add them to the filters.Nicolas Lacasse
Some code paths needed these syscalls anyways, so they should be included in the filters. Given that we depend on these syscalls in some cases, there's no real reason to avoid them any more. PiperOrigin-RevId: 310829126
2020-05-08iptables - filter packets using outgoing interface.gVisor bot
Enables commands with -o (--out-interface) for iptables rules. $ iptables -A OUTPUT -o eth0 -j ACCEPT PiperOrigin-RevId: 310642286
2020-05-08Pass flags to fsimpl/host.inode.open().Jamie Liu
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]") effective, and it prevents imported pipes/sockets/character devices from being opened with O_NONBLOCK unconditionally (because the underlying host FD was set to non-blocking in ImportFD()). PiperOrigin-RevId: 310596062
2020-05-07Fix ARM64 build.Adin Scannell
The common syscall definitions mean that ARM64-exclusive files need stubs in the ARM64 build. PiperOrigin-RevId: 310446698
2020-05-07Allocate device numbers for VFS2 filesystems.Jamie Liu
Updates #1197, #1198, #1672 PiperOrigin-RevId: 310432006
2020-05-07Move pkg/sentry/vfs/{eventfd,timerfd} to new packages in pkg/sentry/fsimpl.Nicolas Lacasse
They don't depend on anything in VFS2, so they should be their own packages. PiperOrigin-RevId: 310416807
2020-05-07Port signalfd to vfs2.Nicolas Lacasse
PiperOrigin-RevId: 310404113
2020-05-07Update privateunixsocket TODOs.Dean Deng
Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474
2020-05-07Remove outdated TODO for VFS2 AccessAt.Dean Deng
Fixes #1965. PiperOrigin-RevId: 310380433
2020-05-06Merge pull request #2570 from lubinszARM:pr_cleangVisor bot
PiperOrigin-RevId: 310259686
2020-05-06Remove vfs.FileDescriptionOptions.InvalidWrite.Jamie Liu
Compare: https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431 PiperOrigin-RevId: 310246908
2020-05-06Fix runsc syscall documentation generation.Adin Scannell
We can register any number of tables with any number of architectures, and need not limit the definitions to the architecture in question. This allows runsc to generate documentation for all architectures simultaneously. Similarly, this simplifies the VFSv2 patching process. PiperOrigin-RevId: 310224827
2020-05-06Add maximum memory limit.Nicolas Lacasse
PiperOrigin-RevId: 310179277
2020-05-05Internal change.gVisor bot
PiperOrigin-RevId: 310057834
2020-05-05Update vfs2 socket TODOs.Dean Deng
Three updates: - Mark all vfs2 socket syscalls as supported. - Use the same dev number and ino number generator for all types of sockets, unlike in VFS1. - Do not use host fd for hostinet metadata. Fixes #1476, #1478, #1484, 1485, #2017. PiperOrigin-RevId: 309994579
2020-05-05Update comments for synthetic gofer files in vfs2.Dean Deng
PiperOrigin-RevId: 309966538
2020-05-05Return correct name for imported host filesFabricio Voznika
Implement PrependPath() in host.filesystem to correctly format name for host files. Updates #1672 PiperOrigin-RevId: 309959135
2020-05-05Translate p9.NoUID/GID to OverflowUID/GID.Jamie Liu
p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in particular, using it for file ownership causes capabilities to be ineffective since file capabilities require that the file's KUID and KGID are mapped into the capability holder's user namespace [1], and auth.NoID is not mapped into any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the unlikely case that an application actually using the overflow KUID/KGID attempts an operation that is consequently permitted by client permission checks, the remote operation will still fail with EPERM. Since this changes the VFS2 gofer client to no longer ignore the invalid IDs entirely, this CL both permits and requires that we change synthetic mount point creation to use root credentials. [1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions. PiperOrigin-RevId: 309856455
2020-05-04Port eventfd to VFS2.Nicolas Lacasse
And move sys_timerfd.go to just timerfd.go for consistency. Updates #1475. PiperOrigin-RevId: 309835029
2020-05-04Remove kernfs.Filesystem cast from GenericDirectoryFDFabricio Voznika
This allows for kerfs.Filesystem to be overridden by different implementations. Updates #1672 PiperOrigin-RevId: 309809321
2020-05-04Merge pull request #2275 from nybidari:iptablesgVisor bot
PiperOrigin-RevId: 309783486
2020-05-04Add TTY support on VFS2 to runscFabricio Voznika
Updates #1623, #1487 PiperOrigin-RevId: 309777922
2020-05-04Fix flaky monotonic time.Adin Scannell
This change ensures that even platforms with some TSC issues (e.g. KVM), can get reliable monotonic time by applied a lower bound on each read. PiperOrigin-RevId: 309773801
2020-05-01Support for connection tracking of TCP packets.Nayana Bidari
Connection tracking is used to track packets in prerouting and output hooks of iptables. The NAT rules modify the tuples in connections. The connection tracking code modifies the packets by looking at the modified tuples.
2020-05-01Automated rollback of changelist 308674219Kevin Krakauer
PiperOrigin-RevId: 309491861
2020-05-01Port netstack, hostinet, and netlink sockets to VFS2.Dean Deng
All three follow the same pattern: 1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can be shared with VFS2. 2. Create a FileDescriptionImpl with the corresponding socket operations, rewriting the few that cannot be shared with VFS1. 3. Set up a VFS2 socket provider that creates a socket by setting up a dentry in the global Kernel.socketMount and connecting it with a new FileDescription. This mostly completes the work for porting sockets to VFS2, and many syscall tests can be enabled as a result. There are several networking-related syscall tests that are still not passing: 1. net gofer tests 2. socketpair gofer tests 2. sendfile tests (splice is not implemented in VFS2 yet) Updates #1478, #1484, #1485 PiperOrigin-RevId: 309457331
2020-04-30Add gofer.InternalFilesystemOptions.LeakConnection.Jamie Liu
PiperOrigin-RevId: 309317605
2020-04-30Implement waiter.Waitable methods on VFS2 host inodes.Nicolas Lacasse
This fixes bash in Ubuntu. Updates #1672. PiperOrigin-RevId: 309298252
2020-04-30Fix proc net bugs in VFS2.Dean Deng
The /proc/net/udp header was missing, and /proc/sys/net was set up as /proc/sys/net/net. Discovered while trying to run networking tests for VFS2. PiperOrigin-RevId: 309243758
2020-04-29Add read/write timeouts for VFS2 socket files.Dean Deng
Updates #1476 PiperOrigin-RevId: 309098590
2020-04-29iptables: don't pollute logsKevin Krakauer
The netfilter package uses logs to make debugging the (de)serialization of structs easier. This generates a lot of (usually irrelevant) logs. Logging is now hidden behind a debug flag. PiperOrigin-RevId: 309087115
2020-04-28Fix Unix socket permissions.Dean Deng
Enforce write permission checks in BoundEndpointAt, which corresponds to the permission checks in Linux (net/unix/af_unix.c:unix_find_other). Also, create bound socket files with the correct permissions in VFS2. Fixes #2324. PiperOrigin-RevId: 308949084
2020-04-28Deduplicate unix socket Release() method.Dean Deng
PiperOrigin-RevId: 308932254
2020-04-28Support pipes and sockets in VFS2 gofer fs.Dean Deng
Named pipes and sockets can be represented in two ways in gofer fs: 1. As a file on the remote filesystem. In this case, all file operations are passed through 9p. 2. As a synthetic file that is internal to the sandbox. In this case, the dentry stores an endpoint or VFSPipe for sockets and pipes respectively, which replaces interactions with the remote fs through the gofer. In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p, and if it fails, fall back to the synthetic version. Updates #1200. PiperOrigin-RevId: 308828161
2020-04-28code clean in arch moduleBin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-04-27Import host sockets.Dean Deng
The FileDescription implementation for hostfs sockets uses the standard Unix socket implementation (unix.SocketVFS2), but is also tied to a hostfs dentry. Updates #1672, #1476 PiperOrigin-RevId: 308716426
2020-04-27Dump stack for stuck start and stuck watchdogFabricio Voznika
The meaning for skipDump was reversed, but not all callers were updated. Change the meaning once again to forceDump, so that the period between stack dump is respected from all callers. PiperOrigin-RevId: 308674373
2020-04-27Automated rollback of changelist 308163542gVisor bot
PiperOrigin-RevId: 308674219
2020-04-27Don't leak vfs.MountNamespace reference if kernel.TaskSet.NewTask fails.Jamie Liu
PiperOrigin-RevId: 308617610