Age | Commit message (Collapse) | Author |
|
Now that we use x/sys/unix beyond https://golang.org/cl/313690 we always use
accept4 in place of accept.
PiperOrigin-RevId: 404265340
|
|
This prevents reaping connections unnecessarily early. This change both moves
the state update to the beginning of handlePacket and fixes a bug where
un-finalized connections could become un-reapable.
Fixes #6748
PiperOrigin-RevId: 404141012
|
|
- We should be using a monotonic clock
- This will make future testing easier
Updates #6748.
PiperOrigin-RevId: 404072318
|
|
Updates #1035
PiperOrigin-RevId: 404072231
|
|
Updates #1035
PiperOrigin-RevId: 404043283
|
|
The in-progress Go 1.18's testing.corpusEntry changed definition slightly in
https://golang.org/cl/354632. Update our definition to the new version.
PiperOrigin-RevId: 404040853
|
|
PiperOrigin-RevId: 404025736
|
|
Updates #1035
PiperOrigin-RevId: 404017795
|
|
Fixes #6590
PiperOrigin-RevId: 404007524
|
|
https://github.com/dominikh/go-tools/issues/924 has been fixed.
PiperOrigin-RevId: 403485831
|
|
PiperOrigin-RevId: 403479257
|
|
Implement WriteRawPacket for pipe by calling `DeliverNetworkPacket`
on the other end with empty values for the route and protocol number,
and relies on the `NetworkDispatcher` to decapsulate the link layer
header from the raw packet itself.
PiperOrigin-RevId: 403461448
|
|
tcpip.Error does not implement error and thus cannot be used with %w.
This was flagged by nogo.
PiperOrigin-RevId: 403458480
|
|
gVisor was previously reporting the lower of cgroup limit or 2GB as total
memory. This may cause applications to make bad decisions based on amount
of memory available to them when more than 2GB is required.
This change makes the lower of cgroup limit or the host total memory to be
reported inside the sandbox. This also is more inline with docker which always
reports host total memory. Note that reporting cgroup limit is strictly better
than host total memory when there is a limit set.
Fixes #5608
PiperOrigin-RevId: 403241608
|
|
PiperOrigin-RevId: 403241314
|
|
PiperOrigin-RevId: 403214414
|
|
PiperOrigin-RevId: 402995191
|
|
Use route/protocol from packetbuffer.
Sharedmem implementation should use the EgressRoute/NetworkProtocolNumber
embedded in the packetbuffer rather than what is passed as parameters to
Write(Raw)Packet(s).
PiperOrigin-RevId: 402934171
|
|
These can be used by applications to manipulate iptables rules without enabling
arbitrary reads from and writes to the underlying packet socket.
PiperOrigin-RevId: 402924733
|
|
Before cl/402392291 and cl/402614820, it worked without any problem.
In this case, we just ignore a cgroup configuration. We do the same thing,
when we don't have permissions to create new cgroups on cgroupV1.
PiperOrigin-RevId: 402913129
|
|
...since direction can only hold one of two possible values.
PiperOrigin-RevId: 402855698
|
|
This CL allows both SNAT and DNAT targets to be performed on the same
packet.
Fixes #5696.
PiperOrigin-RevId: 402714738
|
|
PiperOrigin-RevId: 402705397
|
|
Fixes #6725
PiperOrigin-RevId: 402683244
|
|
This change also refactors the conntrack packet handling code
to not perform the actual rewriting of the packet while holding
the lock.
This change prepares for a followup CL that adds support for twice-NAT.
Updates #5696.
PiperOrigin-RevId: 402671685
|
|
We don't want the read to block and want to test that epoll_wait returns only
when there is data available in rfd to be read.
PiperOrigin-RevId: 402631091
|
|
- Don't attempt to create directory is controller is not
present in the system
- Ensure that all files being written exist in cgroupfs
- Attempt to delete directories during Uninstall even if
other deletions have failed
Fixes #6446
PiperOrigin-RevId: 402614820
|
|
Prior to cl/318010298, //pkg/state couldn't handle pointers to struct fields,
which meant that it couldn't handle intrusive linked lists, which meant that it
couldn't handle waiter.Queue, which meant that it couldn't handle epoll. As a
result, VFS1 unregisters all epoll waiters before saving and re-registers them
after loading, and waitable VFS1 file implementations tag their waiter.Queues
state:"nosave" (causing them to be skipped by the save/restore machinery) or
state:"zerovalue" (causing them to only be checked for zero-value-equality on
save).
VFS2 required cl/318010298 to support save/restore (due to the Impl inheritance
pattern used by vfs.FileDescription, vfs.Dentry, etc.); correspondingly, VFS2
epoll assumes that waiter.Queues *will be* saved and loaded correctly, and VFS2
file implementations do not tag waiter.Queues.
Some waiter.Queues, e.g. pipe.Pipe.Queue and kernel.Task.signalQueue, are used
by both VFS1 and VFS2 (the latter via signalfd); as a result of the above,
tagging these Queues state:"nosave" or state:"zerovalue" breaks VFS2 epoll.
Remove VFS1 epoll unregistration before saving (bringing it in line with VFS2),
and remove these tags from all waiter.Queues.
Also clean up after the epoll test added by cl/402323053, which implied this
issue (by instantiating DisableSave in the new test) without reporting it.
PiperOrigin-RevId: 402596216
|
|
PiperOrigin-RevId: 402468096
|
|
Tools (e.g. cAdvisor) watches for changes inside /sys/fs/cgroup to detect
when containers are created and deleted. With gVisor, container cgroups were
not created because the containers are not visible to the host.
This change enables the creation of [empty] subcontainer cgroups that can
be used by tools to detect creation/deletion of subcontainers. This change
required a new annotation to be added so that the shim can communicate the
pod cgroup path to runsc, so pod and container cgroups can be identified,
Fixes #6500
PiperOrigin-RevId: 402392291
|
|
We already have integration tests `make iptables-tests` that tests
the REDIRECT target, but unit tests are a lot faster and easier
to run than the integration test.
PiperOrigin-RevId: 402365412
|
|
Updates #1584, #3556.
PiperOrigin-RevId: 402354066
|
|
PiperOrigin-RevId: 402323053
|
|
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go
will not clobber floating point registers before/after calling them
respectively. Due to regabig in Go 1.17, this is no longer the case; regabig
(among other things) maintains a zeroed XMM15 during ABIInternal execution,
including by zeroing it after ABI0-to-ABIInternal transitions. In
ring0.sysenter/exception, this happens in
ring0.kernelSyscall/kernelException.abi0 respectively; in
ring0.CPU.SwitchToUser, this happens after returning from
ring0.sysret/iret.abi0. Delete these functions and do floating point save/load
in assembly.
While arm64 doesn't appear to be immediately affected (so this CL permits us to
resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be
incorrect for the same fundamental reason (Go code can't sanely assume what
registers the Go compiler will or won't use) and should be fixed eventually.
PiperOrigin-RevId: 401895658
|
|
listXattr() was doing redundant work. Remove it.
PiperOrigin-RevId: 401871315
|
|
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer
process when the gofer filesystem is the lower layer of an overlay.
The overlay filesystem aggressively queries for "trusted.overlay.opaque" which
in practice is never found in the lower layer gofer. But leads to a lot of
wasted work.
A consequence is that mutable gofer upper layer is not supported anymore but
that is still consistent with VFS1. We can revisit when need arises.
PiperOrigin-RevId: 401860585
|
|
The same create/write/read pattern is copied around several places. It's easier
to understand in a package with names and comments, and we can reuse the smart
blocking code in package rawfile.
PiperOrigin-RevId: 401647108
|
|
- Implements RFC 3522 (Eifel detection algorithm) to detect if the connection
entered loss recovery unnecessarily.
- Added a new metric to count the total number of spurious loss recoveries.
- Added tests to verify the new metric.
PiperOrigin-RevId: 401637359
|
|
A libpcap change broke tcpdump support in gvisor. As a result tcpdump
w/ libpcap> 1.10 fails with gvisor.
Updates #6664, #6699
PiperOrigin-RevId: 401633061
|
|
PiperOrigin-RevId: 401624134
|
|
PiperOrigin-RevId: 401620449
|
|
TestRACKWithWindowFull was sending ACK for the last packet to avoid TLP. But,
sometimes the ACK is delayed and the sender sends the re-transmitted packet
before receiving ACK.
The test is now modified to expect the re-transmitted packet always and then
send a DSACK to avoid entering recovery.
Before: http://sponge2/6473db18-137a-4afb-9d60-c3eafd236ea9
After: http://sponge2/6a0f744c-7ea3-40fa-8f76-68503bf142ca
PiperOrigin-RevId: 401606848
|
|
Rather than boiling down to an integer eagerly, do it as late as possible.
PiperOrigin-RevId: 401599308
|
|
Go1.18 is changing the signature of testing.MainStart. To ensure compatibility,
we wrap MainStart and different implementations for versions before/after
Go1.18.
PiperOrigin-RevId: 401362668
|
|
...all connections should be tracked by ConnTrack, so create a no-op
connection entry on the first hook into IPTables (Prerouting or
Output) and let NAT targets modify the connection entry if they
need to instead of letting the NAT target create their own connection
entry.
This also prepares for "twice-NAT" where a packet may have both DNAT and
SNAT performed on it (which requires the ability to update ConnTrack
entries).
Updates #5696.
PiperOrigin-RevId: 401360377
|
|
PiperOrigin-RevId: 401296116
|
|
PiperOrigin-RevId: 401278851
|
|
Linux prevents setting `IP_MULTICAST_IF` to a device when
`SO_BINDTODEVICE` is set to another device. gVsior allows this.
PiperOrigin-RevId: 401267471
|
|
gVisor is a strong host, preventing packets from being sent from a
device using the another device's address as the source. Linux is a weak
host which allows this.
Updates #6686.
PiperOrigin-RevId: 401260128
|
|
PiperOrigin-RevId: 401251635
|