Age | Commit message (Collapse) | Author |
|
On amd64, it uses 'HLT' to leave the guest.
Unlike amd64, arm64 can only uses mmio_exit/psci to leave the guest.
So, I designed the HYPERCALL_VMEXIT to be compatible with amd64/arm64.
To keep it simple, I used the address of exception table as the
MMIO base address, so that I can trigger a MMIO-EXIT by forcibly writing this space.
Then, in host user space, I can calculate this address to find out
which hypercall.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 311657502
|
|
Closes #2612.
PiperOrigin-RevId: 311548074
|
|
This change adds support for TCP_SYNCNT and TCP_WINDOW_CLAMP options
in GetSockOpt/SetSockOpt. This change does not really change any
behaviour in Netstack and only stores/returns the stored value.
Actual honoring of these options will be added as required.
Fixes #2626, #2625
PiperOrigin-RevId: 311453777
|
|
Closes #1197
PiperOrigin-RevId: 311438223
|
|
Linux 4.18 and later make reads and writes coherent between pre-copy-up and
post-copy-up FDs representing the same file on an overlay filesystem. However,
memory mappings remain incoherent:
- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file
residing on a lower layer is opened for read-only and then memory mapped with
MAP_SHARED, then subsequent changes to the file are not reflected in the
memory mapping."
- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any
management of coherence in the overlay.
- Experimentally on Linux 5.2:
```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2) {
errx(1, "syntax: %s [FILE]", argv[0]);
}
const int fd = open(argv[1], O_RDONLY);
if (fd < 0) {
err(1, "open(%s)", argv[1]);
}
const size_t page_size = sysconf(_SC_PAGE_SIZE);
void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (page == MAP_FAILED) {
err(1, "mmap");
}
for (;;) {
write(1, page, strnlen(page, page_size));
if (getc(stdin) == EOF) {
break;
}
}
return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```
Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only
necessary pre-4.18, replacing existing memory mappings (in both sentry and
application address spaces) with mappings of the new FD is required regardless
of kernel version, and this latter behavior is common to both VFS1 and VFS2.
Re-document accordingly, and change the runsc flag to enabled by default.
New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
PiperOrigin-RevId: 311361267
|
|
PiperOrigin-RevId: 311203776
|
|
Fixes #2651.
PiperOrigin-RevId: 311193661
|
|
PiperOrigin-RevId: 311181084
|
|
kernel.Task.Block() requires that the caller is running on the task goroutine.
netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call
kernel.Task.Block() even if it's not running on the task goroutine. Stop doing
that.
PiperOrigin-RevId: 311178335
|
|
- Added support for matching gid owner and invert flag for uid
and gid.
$ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP
- Added tests for uid, gid and invert flags.
|
|
PiperOrigin-RevId: 311153824
|
|
PiperOrigin-RevId: 311046755
|
|
We weren't properly checking whether the inserted default rule was
unconditional.
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 311014995
|
|
Some code paths needed these syscalls anyways, so they should be included in
the filters. Given that we depend on these syscalls in some cases, there's no
real reason to avoid them any more.
PiperOrigin-RevId: 310829126
|
|
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT
PiperOrigin-RevId: 310642286
|
|
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]")
effective, and it prevents imported pipes/sockets/character devices from being
opened with O_NONBLOCK unconditionally (because the underlying host FD was set
to non-blocking in ImportFD()).
PiperOrigin-RevId: 310596062
|
|
The common syscall definitions mean that ARM64-exclusive files need stubs in
the ARM64 build.
PiperOrigin-RevId: 310446698
|
|
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
|
|
They don't depend on anything in VFS2, so they should be their own packages.
PiperOrigin-RevId: 310416807
|
|
PiperOrigin-RevId: 310404113
|
|
Synthetic sockets do not have the race condition issue in VFS2, and we will
get rid of privateunixsocket as well.
Fixes #1200.
PiperOrigin-RevId: 310386474
|
|
Fixes #1965.
PiperOrigin-RevId: 310380433
|
|
PiperOrigin-RevId: 310259686
|
|
Compare:
https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431
PiperOrigin-RevId: 310246908
|
|
We can register any number of tables with any number of architectures, and
need not limit the definitions to the architecture in question. This allows
runsc to generate documentation for all architectures simultaneously.
Similarly, this simplifies the VFSv2 patching process.
PiperOrigin-RevId: 310224827
|
|
PiperOrigin-RevId: 310179277
|
|
PiperOrigin-RevId: 310057834
|
|
Three updates:
- Mark all vfs2 socket syscalls as supported.
- Use the same dev number and ino number generator for all types of sockets,
unlike in VFS1.
- Do not use host fd for hostinet metadata.
Fixes #1476, #1478, #1484, 1485, #2017.
PiperOrigin-RevId: 309994579
|
|
PiperOrigin-RevId: 309966538
|
|
Implement PrependPath() in host.filesystem to correctly format
name for host files.
Updates #1672
PiperOrigin-RevId: 309959135
|
|
p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in
particular, using it for file ownership causes capabilities to be ineffective
since file capabilities require that the file's KUID and KGID are mapped into
the capability holder's user namespace [1], and auth.NoID is not mapped into
any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the
unlikely case that an application actually using the overflow KUID/KGID
attempts an operation that is consequently permitted by client permission
checks, the remote operation will still fail with EPERM.
Since this changes the VFS2 gofer client to no longer ignore the invalid IDs
entirely, this CL both permits and requires that we change synthetic mount point
creation to use root credentials.
[1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions.
PiperOrigin-RevId: 309856455
|
|
And move sys_timerfd.go to just timerfd.go for consistency.
Updates #1475.
PiperOrigin-RevId: 309835029
|
|
This allows for kerfs.Filesystem to be overridden by
different implementations.
Updates #1672
PiperOrigin-RevId: 309809321
|
|
PiperOrigin-RevId: 309783486
|
|
Updates #1623, #1487
PiperOrigin-RevId: 309777922
|
|
This change ensures that even platforms with some TSC issues (e.g. KVM),
can get reliable monotonic time by applied a lower bound on each read.
PiperOrigin-RevId: 309773801
|
|
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
|
|
PiperOrigin-RevId: 309491861
|
|
All three follow the same pattern:
1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can
be shared with VFS2.
2. Create a FileDescriptionImpl with the corresponding socket operations,
rewriting the few that cannot be shared with VFS1.
3. Set up a VFS2 socket provider that creates a socket by setting up a dentry
in the global Kernel.socketMount and connecting it with a new
FileDescription.
This mostly completes the work for porting sockets to VFS2, and many syscall
tests can be enabled as a result.
There are several networking-related syscall tests that are still not passing:
1. net gofer tests
2. socketpair gofer tests
2. sendfile tests (splice is not implemented in VFS2 yet)
Updates #1478, #1484, #1485
PiperOrigin-RevId: 309457331
|
|
PiperOrigin-RevId: 309317605
|
|
This fixes bash in Ubuntu.
Updates #1672.
PiperOrigin-RevId: 309298252
|
|
The /proc/net/udp header was missing, and /proc/sys/net was set up as
/proc/sys/net/net. Discovered while trying to run networking tests for VFS2.
PiperOrigin-RevId: 309243758
|
|
Updates #1476
PiperOrigin-RevId: 309098590
|
|
The netfilter package uses logs to make debugging the (de)serialization of
structs easier. This generates a lot of (usually irrelevant) logs. Logging is
now hidden behind a debug flag.
PiperOrigin-RevId: 309087115
|
|
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.
Fixes #2324.
PiperOrigin-RevId: 308949084
|
|
PiperOrigin-RevId: 308932254
|
|
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.
Updates #1200.
PiperOrigin-RevId: 308828161
|