Age | Commit message | Author |
|
PiperOrigin-RevId: 402468096
|
|
Tools (e.g. cAdvisor) watch for changes inside /sys/fs/cgroup to detect
when containers are created and deleted. With gVisor, container cgroups were
not created because the containers are not visible to the host.
This change enables the creation of [empty] subcontainer cgroups that can
be used by tools to detect creation/deletion of subcontainers. This change
required a new annotation to be added so that the shim can communicate the
pod cgroup path to runsc, so pod and container cgroups can be identified.
Fixes #6500
PiperOrigin-RevId: 402392291
|
|
We already have integration tests (`make iptables-tests`) that test
the REDIRECT target, but unit tests are a lot faster and easier
to run than the integration tests.
PiperOrigin-RevId: 402365412
|
|
Updates #1584, #3556.
PiperOrigin-RevId: 402354066
|
|
PiperOrigin-RevId: 402323053
|
|
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go
will not clobber floating point registers before/after calling them
respectively. Due to regabig in Go 1.17, this is no longer the case; regabig
(among other things) maintains a zeroed XMM15 during ABIInternal execution,
including by zeroing it after ABI0-to-ABIInternal transitions. In
ring0.sysenter/exception, this happens in
ring0.kernelSyscall/kernelException.abi0 respectively; in
ring0.CPU.SwitchToUser, this happens after returning from
ring0.sysret/iret.abi0. Delete these functions and do floating point save/load
in assembly.
While arm64 doesn't appear to be immediately affected (so this CL permits us to
resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be
incorrect for the same fundamental reason (Go code can't sanely assume what
registers the Go compiler will or won't use) and should be fixed eventually.
PiperOrigin-RevId: 401895658
|
|
listXattr() was doing redundant work. Remove it.
PiperOrigin-RevId: 401871315
|
|
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer
process when the gofer filesystem is the lower layer of an overlay.
The overlay filesystem aggressively queries for "trusted.overlay.opaque", which
in practice is never found in the lower layer gofer, so these queries only lead
to a lot of wasted work.
A consequence is that a mutable gofer upper layer is no longer supported, but
that is still consistent with VFS1. We can revisit when the need arises.
PiperOrigin-RevId: 401860585
|
|
The same create/write/read pattern is copied around several places. It's easier
to understand in a package with names and comments, and we can reuse the smart
blocking code in package rawfile.
PiperOrigin-RevId: 401647108
|
|
- Implements RFC 3522 (Eifel detection algorithm) to detect if the connection
entered loss recovery unnecessarily.
- Added a new metric to count the total number of spurious loss recoveries.
- Added tests to verify the new metric.
PiperOrigin-RevId: 401637359
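
The core check from RFC 3522 can be sketched as below. This is a minimal illustration, not the netstack implementation: the function names are hypothetical, and it assumes the sender recorded the timestamp value of its first retransmission (RetransmitTS in the RFC) and is looking at the Timestamp Echo Reply of the first acceptable ACK that follows.

```go
package main

import "fmt"

// tsBefore reports whether TCP timestamp a is older than b, using
// wraparound-safe 32-bit arithmetic. (Hypothetical helper.)
func tsBefore(a, b uint32) bool {
	return int32(a-b) < 0
}

// spuriousRecovery applies the Eifel detection rule: if the first acceptable
// ACK after a retransmission echoes a timestamp older than the one carried by
// the retransmission, the ACK was triggered by the original transmission, so
// loss recovery was entered unnecessarily. (Hypothetical name.)
func spuriousRecovery(tsEcr, retransmitTS uint32) bool {
	return tsBefore(tsEcr, retransmitTS)
}

func main() {
	// The ACK echoes a timestamp older than the retransmission.
	fmt.Println(spuriousRecovery(100, 200))
	// The ACK echoes the retransmission's own timestamp.
	fmt.Println(spuriousRecovery(200, 200))
}
```

A counter for spurious recoveries, like the metric this change adds, would be incremented whenever such a check returns true.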
|
|
PiperOrigin-RevId: 401624134
|
|
PiperOrigin-RevId: 401620449
|
|
TestRACKWithWindowFull was sending an ACK for the last packet to avoid TLP, but
sometimes the ACK is delayed and the sender retransmits the packet before
receiving the ACK.
The test now always expects the retransmitted packet and then sends a DSACK to
avoid entering recovery.
Before: http://sponge2/6473db18-137a-4afb-9d60-c3eafd236ea9
After: http://sponge2/6a0f744c-7ea3-40fa-8f76-68503bf142ca
PiperOrigin-RevId: 401606848
|
|
Rather than boiling down to an integer eagerly, do it as late as possible.
PiperOrigin-RevId: 401599308
|
|
...all connections should be tracked by ConnTrack, so create a no-op
connection entry on the first hook into IPTables (Prerouting or
Output) and let NAT targets modify the connection entry if they
need to, instead of letting each NAT target create its own connection
entry.
This also prepares for "twice-NAT" where a packet may have both DNAT and
SNAT performed on it (which requires the ability to update ConnTrack
entries).
Updates #5696.
PiperOrigin-RevId: 401360377
|
|
PiperOrigin-RevId: 401296116
|
|
PiperOrigin-RevId: 401152818
|
|
PiperOrigin-RevId: 401088040
|
|
Before checking if there is space in the accept queue, the listener
should verify that the cookie is valid. If it is not, instead of
silently dropping the packet, reply with an RST.
Fixes #6683
PiperOrigin-RevId: 400807346
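
The ordering described above can be sketched as a small decision function. All names here are hypothetical, and what happens when the accept queue is full (a silent drop in this sketch) is an assumption, since the message only specifies the behaviour for an invalid cookie.

```go
package main

import "fmt"

// action is a hypothetical result type for handling an ACK that is
// expected to carry a SYN cookie.
type action string

const (
	deliverToAcceptQueue action = "deliver"
	replyWithRST         action = "rst"
	dropQueueFull        action = "drop" // assumption: not stated in the commit message
)

// handleCookieACK validates the cookie before looking at the accept queue,
// so an invalid cookie always gets an RST, even when the queue is full.
func handleCookieACK(cookieValid, acceptQueueFull bool) action {
	if !cookieValid {
		return replyWithRST
	}
	if acceptQueueFull {
		return dropQueueFull
	}
	return deliverToAcceptQueue
}

func main() {
	fmt.Println(handleCookieACK(false, true))
	fmt.Println(handleCookieACK(true, false))
}
```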
|
|
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
|
|
We should avoid taking the write lock to avoid contention when looking
for a packet's tracked connection.
No need to reap timed out connections when looking for connections
as the reaper (which runs periodically) will handle that.
PiperOrigin-RevId: 400322514
|
|
Move the hook specific logic to the IPTables hook functions.
This lets us avoid having to perform checks on the hook to determine
what action to take.
Later changes will drop the need for handlePacket's return value,
reducing the value of this function that all hooks call into.
PiperOrigin-RevId: 400298023
|
|
...as the packet's direction gives us the information that tcbHook is
used to derive.
PiperOrigin-RevId: 400280102
|
|
...to catch lock-related bugs in nogo tests.
Updates #6566.
PiperOrigin-RevId: 400265818
|
|
PiperOrigin-RevId: 400258924
|
|
...and have `CheckOutputPackets`, `CheckPostroutingPackets` call their
equivalent methods that operate on a single packet buffer directly.
This is so that the `Check{Output, Postrouting}Packets` methods may
leverage any hook-specific work that `Check{Output, Postrouting}`
may perform.
Note: Later changes will add hook-specific logic to the
`Check{Output, Postrouting}` methods.
PiperOrigin-RevId: 400255651
|
|
...to save a call to `ConnTrack.connFor` when callers already have a
reference to the ConnTrack entry.
PiperOrigin-RevId: 400244955
|
|
This removes the need for pendingMu and pending, since they are redundant
with acceptMu and pendingAccepted.
Fixes #6671.
PiperOrigin-RevId: 400162391
|
|
For multithreaded processes, it is hard to read logs without knowing task pids.
Also print a decimal return code for syscalls. A hex return code is
useful for system calls that return addresses. For other syscalls, the decimal
form is more readable.
PiperOrigin-RevId: 400035449
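
A log line in this spirit might be formatted as follows. The helper name and the exact layout are assumptions for illustration; this is not the strace output format used by the sentry.

```go
package main

import "fmt"

// formatSyscallExit is a hypothetical helper that prefixes the task id and
// prints the return value in both decimal and hex.
func formatSyscallExit(tid int, name string, ret uintptr) string {
	return fmt.Sprintf("[%d] %s = %d (%#x)", tid, name, ret, ret)
}

func main() {
	// A byte count reads naturally in decimal.
	fmt.Println(formatSyscallExit(42, "read", 4096))
	// An address reads naturally in hex.
	fmt.Println(formatSyscallExit(43, "mmap", 0x7f0000001000))
}
```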
|
|
PiperOrigin-RevId: 399765414
|
|
Rename cap -> capacity to avoid collision with the builtin.
PiperOrigin-RevId: 399753630
|
|
This is redundant with listenContext.pendingEndpoints.
PiperOrigin-RevId: 399722472
|
|
PiperOrigin-RevId: 399560357
|
|
This function has only one caller.
Remove segment reference count manipulation since it is only used
synchronously.
PiperOrigin-RevId: 399525343
|
|
* Does not accept a port range (Issue #5772).
* Does not support checking for tuple conflicts (Issue #5773).
PiperOrigin-RevId: 399524088
|
|
PiperOrigin-RevId: 399295737
|
|
PiperOrigin-RevId: 399276940
|
|
PacketData should not be modified and should be treated as read-only because it
represents packet payload. The old DeleteFront method allowed callers to modify
the underlying buffer, which should not be allowed.
Added a way to consume from the PacketData instead of deleting from it.
Updated call sites to use that instead.
Reported-by: syzbot+faee5cb350f769a52d1b@syzkaller.appspotmail.com
PiperOrigin-RevId: 399268473
|
|
There's no need for synthetic keys here.
PiperOrigin-RevId: 399263134
|
|
|
|
Task.netns can be accessed atomically, so Task.mu isn't needed to access it.
PiperOrigin-RevId: 398773947
|
|
PiperOrigin-RevId: 398763161
|
|
The p9 client does the same. This allows applications to read/write >= 2MB of
data. This enables the read write benchmarks to work with lisafs.
Updates #5466
PiperOrigin-RevId: 398659947
|
|
This avoids unnecessary lock-ordering dependencies on task.mu.
|
|
Create the /sys/fs/cgroup directory when cgroups are available. This
creates the empty directory to serve as the mountpoint; actually
mounting cgroups is left to the launcher/userspace. This is consistent
with Linux behaviour.
Without this mountpoint, getdents(2) on /sys/fs indicates an empty
directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The
launcher can't create the mountpoint directory since sysfs doesn't
support mkdir.
PiperOrigin-RevId: 398596698
|
|
PiperOrigin-RevId: 398572735
|
|
...instead of an address.
This allows a later change to more precisely select an address
based on the NAT type (source vs. destination NAT).
PiperOrigin-RevId: 398559901
|
|
PiperOrigin-RevId: 398559780
|
|
An ICMP endpoint's write path can use the datagram-based endpoint.
Updates #6565.
Test: Datagram-based generic socket + ICMP/ping syscall tests.
PiperOrigin-RevId: 398539844
|
|
...to make it clear what arguments are needed per hook.
PiperOrigin-RevId: 398538776
|