Age | Commit message (Collapse) | Author |
|
Linux 4.18 and later make reads and writes coherent between pre-copy-up and
post-copy-up FDs representing the same file on an overlay filesystem. However,
memory mappings remain incoherent:
- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file
residing on a lower layer is opened for read-only and then memory mapped with
MAP_SHARED, then subsequent changes to the file are not reflected in the
memory mapping."
- fs/overlay/file.c:ovl_mmap() passes through to the underlying FD without any
management of coherence in the overlay.
- Experimentally on Linux 5.2:
```
$ cat mmap_cat_page.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2) {
errx(1, "syntax: %s [FILE]", argv[0]);
}
const int fd = open(argv[1], O_RDONLY);
if (fd < 0) {
err(1, "open(%s)", argv[1]);
}
const size_t page_size = sysconf(_SC_PAGE_SIZE);
void* page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (page == MAP_FAILED) {
err(1, "mmap");
}
for (;;) {
write(1, page, strnlen(page, page_size));
if (getc(stdin) == EOF) {
break;
}
}
return 0;
}
$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
$ mkdir lowerdir upperdir workdir overlaydir
$ echo old > lowerdir/file
$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
$ ./mmap_cat_page overlaydir/file
old
^Z
[1]+ Stopped ./mmap_cat_page overlaydir/file
$ echo new > overlaydir/file
$ cat overlaydir/file
new
$ fg
./mmap_cat_page overlaydir/file
old
```
Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only
necessary pre-4.18, replacing existing memory mappings (in both sentry and
application address spaces) with mappings of the new FD is required regardless
of kernel version, and this latter behavior is common to both VFS1 and VFS2.
Re-document accordingly, and change the runsc flag to enabled by default.
New test:
- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
PiperOrigin-RevId: 311361267
|
|
PiperOrigin-RevId: 311285868
|
|
PiperOrigin-RevId: 311203776
|
|
Fixes #2651.
PiperOrigin-RevId: 311193661
|
|
PiperOrigin-RevId: 311181084
|
|
kernel.Task.Block() requires that the caller is running on the task goroutine.
netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call
kernel.Task.Block() even if it's not running on the task goroutine. Stop doing
that.
PiperOrigin-RevId: 311178335
|
|
- Added support for matching gid owner and invert flag for uid
and gid.
$ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP
- Added tests for uid, gid and invert flags.
|
|
PiperOrigin-RevId: 311153824
|
|
PiperOrigin-RevId: 311046755
|
|
We weren't properly checking whether the inserted default rule was
unconditional.
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 311014995
|
|
PiperOrigin-RevId: 310963404
|
|
view.ToVectorisedView() now just returns an empty vectorised
view if the view is of zero length. Earlier it would return
a VectorisedView of zero length but with 1 empty view. This
has been a source of bugs as lower layers don't expect
zero length views in VectorisedViews.
VectorisedView.AppendView() now is a no-op if the view being
appended is of zero length.
Fixes #2658
PiperOrigin-RevId: 310942269
|
|
Some code paths needed these syscalls anyways, so they should be included in
the filters. Given that we depend on these syscalls in some cases, there's no
real reason to avoid them any more.
PiperOrigin-RevId: 310829126
|
|
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT
PiperOrigin-RevId: 310642286
|
|
This has two effects: It makes flags passed to open("/proc/[pid]/fd/[hostfd]")
effective, and it prevents imported pipes/sockets/character devices from being
opened with O_NONBLOCK unconditionally (because the underlying host FD was set
to non-blocking in ImportFD()).
PiperOrigin-RevId: 310596062
|
|
This fixed the corresponding packetimpact test.
PiperOrigin-RevId: 310593470
|
|
The common syscall definitions mean that ARM64-exclusive files need stubs in
the ARM64 build.
PiperOrigin-RevId: 310446698
|
|
Only the last test was running before since the goroutines won't be executed
until after this loop. I added t.Log(test.name) and this is was the result:
TestListenNoAcceptNonUnicastV4/SourceUnspecified: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestUnspecified: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestOtherMulticast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceBroadcast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestOurMulticast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/DestBroadcast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceOtherMulticast: DestOtherMulticast
TestListenNoAcceptNonUnicastV4/SourceOurMulticast: DestOtherMulticast
https://github.com/golang/go/wiki/TableDrivenTests#parallel-testing
PiperOrigin-RevId: 310440629
|
|
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
|
|
PiperOrigin-RevId: 310417191
|
|
They don't depend on anything in VFS2, so they should be their own packages.
PiperOrigin-RevId: 310416807
|
|
PiperOrigin-RevId: 310404113
|
|
Every call to sender.NextSeg does not need to iterate from the
front of the writeList as in a given recovery episode we can cache
the last nextSeg returned. There cannot be a lower sequenced segment
that matches the next call to NextSeg as otherwise we would have
returned that instead in the previous call.
This fixes the issue of excessive CPU usage w/ large send buffers
where we spend a lot of time iterating from the front of the list on
every NextSeg invocation.
Further the following other bugs were also fixed:
* Iteration of segments never sent in NextSeg() when looking for segments for
retransmission that match step1/3/4 of the NextSeg algorithm
* Correctly setting rescueRxt only if the rescue segment was actually sent.
* Correctly initializing rescueRxt/highRxt when entering SACK recovery.
* Correctly re-arming the timer only on retransmissions when SACK is in use
and not for every segment being sent as it was being done before.
* Copy over xmitTime and xmitCount on segment clone.
* Move writeNext along when skipping over SACKED segments. This is required
to prevent spurious retransmissions where we end up retransmitting data
that was never lost.
PiperOrigin-RevId: 310387671
|
|
Synthetic sockets do not have the race condition issue in VFS2, and we will
get rid of privateunixsocket as well.
Fixes #1200.
PiperOrigin-RevId: 310386474
|
|
PiperOrigin-RevId: 310380911
|
|
Fixes #1965.
PiperOrigin-RevId: 310380433
|
|
Based on ipv6's TestReceiveIPv6Fragments.
|
|
PiperOrigin-RevId: 310259686
|
|
Compare:
https://elixir.bootlin.com/linux/v5.6/source/fs/timerfd.c#L431
PiperOrigin-RevId: 310246908
|
|
Do not assume that networks need any DHCPv6 configurations. Instead,
notify the NDP dispatcher in response to the first NDP RA's DHCPv6
flags, even if the flags indicate no DHCPv6 configurations are
available.
PiperOrigin-RevId: 310245068
|
|
We can register any number of tables with any number of architectures, and
need not limit the definitions to the architecture in question. This allows
runsc to generate documentation for all architectures simultaneously.
Similarly, this simplifies the VFSv2 patching process.
PiperOrigin-RevId: 310224827
|
|
We need to check vv.Size() instead of len(tcp), as tcp will always be 20 bytes
long.
PiperOrigin-RevId: 310218351
|
|
PiperOrigin-RevId: 310179277
|
|
PiperOrigin-RevId: 310057834
|
|
As per RFC 1122 4.2.2.17, when the remote advertizes zero receive window,
the sender needs to probe for the window-size to become non-zero starting
from the next retransmission interval. The TCP connection needs to be kept
open as long as the remote is acknowledging the zero window probes.
We reuse the retransmission timers to support this.
Fixes #1644
PiperOrigin-RevId: 310021575
|
|
Three updates:
- Mark all vfs2 socket syscalls as supported.
- Use the same dev number and ino number generator for all types of sockets,
unlike in VFS1.
- Do not use host fd for hostinet metadata.
Fixes #1476, #1478, #1484, 1485, #2017.
PiperOrigin-RevId: 309994579
|
|
PiperOrigin-RevId: 309966538
|
|
Implement PrependPath() in host.filesystem to correctly format
name for host files.
Updates #1672
PiperOrigin-RevId: 309959135
|
|
p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in
particular, using it for file ownership causes capabilities to be ineffective
since file capabilities require that the file's KUID and KGID are mapped into
the capability holder's user namespace [1], and auth.NoID is not mapped into
any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the
unlikely case that an application actually using the overflow KUID/KGID
attempts an operation that is consequently permitted by client permission
checks, the remote operation will still fail with EPERM.
Since this changes the VFS2 gofer client to no longer ignore the invalid IDs
entirely, this CL both permits and requires that we change synthetic mount point
creation to use root credentials.
[1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions.
PiperOrigin-RevId: 309856455
|
|
And move sys_timerfd.go to just timerfd.go for consistency.
Updates #1475.
PiperOrigin-RevId: 309835029
|
|
This allows for kerfs.Filesystem to be overridden by
different implementations.
Updates #1672
PiperOrigin-RevId: 309809321
|
|
PiperOrigin-RevId: 309783486
|
|
Updates #1623, #1487
PiperOrigin-RevId: 309777922
|
|
This change ensures that even platforms with some TSC issues (e.g. KVM),
can get reliable monotonic time by applied a lower bound on each read.
PiperOrigin-RevId: 309773801
|
|
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
|
|
If the NIC already has a generated SLAAC address, regenerate a new SLAAC
address until one is generated that does not conflict with the NIC's
existing addresses, up to a maximum of 10 attempts.
This applies to both stable and temporary SLAAC addresses.
Test: stack_test.TestMixedSLAACAddrConflictRegen
PiperOrigin-RevId: 309495628
|
|
PiperOrigin-RevId: 309491861
|
|
All three follow the same pattern:
1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can
be shared with VFS2.
2. Create a FileDescriptionImpl with the corresponding socket operations,
rewriting the few that cannot be shared with VFS1.
3. Set up a VFS2 socket provider that creates a socket by setting up a dentry
in the global Kernel.socketMount and connecting it with a new
FileDescription.
This mostly completes the work for porting sockets to VFS2, and many syscall
tests can be enabled as a result.
There are several networking-related syscall tests that are still not passing:
1. net gofer tests
2. socketpair gofer tests
2. sendfile tests (splice is not implemented in VFS2 yet)
Updates #1478, #1484, #1485
PiperOrigin-RevId: 309457331
|