path: root/pkg
Age | Commit message | Author
2021-10-11 | Support DNAT target | Ghanan Gowripalan
PiperOrigin-RevId: 402468096
2021-10-11 | Create subcontainer cgroups for compatibility | Fabricio Voznika
Tools (e.g. cAdvisor) watch for changes inside /sys/fs/cgroup to detect when containers are created and deleted. With gVisor, container cgroups were not created because the containers are not visible to the host. This change enables the creation of [empty] subcontainer cgroups that can be used by tools to detect creation/deletion of subcontainers. This change required a new annotation to be added so that the shim can communicate the pod cgroup path to runsc, so pod and container cgroups can be identified. Fixes #6500 PiperOrigin-RevId: 402392291
2021-10-11 | Add unit test for Redirect target | Ghanan Gowripalan
We already have integration tests (`make iptables-tests`) that test the REDIRECT target, but unit tests are a lot faster and easier to run than the integration tests. PiperOrigin-RevId: 402365412
2021-10-11 | Support IP_PKTINFO and IPV6_RECVPKTINFO on raw sockets | Ghanan Gowripalan
Updates #1584, #3556. PiperOrigin-RevId: 402354066
2021-10-11 | Merge pull request #6428 from dillanzhou:fix_epoll_vfs2 | gVisor bot
PiperOrigin-RevId: 402323053
2021-10-08 | Remove ring0 floating point save/load functions on amd64. | Jamie Liu
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go will not clobber floating point registers before/after calling them respectively. Due to regabig in Go 1.17, this is no longer the case; regabig (among other things) maintains a zeroed XMM15 during ABIInternal execution, including by zeroing it after ABI0-to-ABIInternal transitions. In ring0.sysenter/exception, this happens in ring0.kernelSyscall/kernelException.abi0 respectively; in ring0.CPU.SwitchToUser, this happens after returning from ring0.sysret/iret.abi0. Delete these functions and do floating point save/load in assembly. While arm64 doesn't appear to be immediately affected (so this CL permits us to resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be incorrect for the same fundamental reason (Go code can't sanely assume what registers the Go compiler will or won't use) and should be fixed eventually. PiperOrigin-RevId: 401895658
2021-10-08 | Remove redundant slice copy in lisafs gofer client. | Ayush Ranjan
listXattr() was doing redundant work. Remove it. PiperOrigin-RevId: 401871315
2021-10-08 | Disallow "trusted" namespace xattr in VFS2 gofer client. | Ayush Ranjan
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer process when the gofer filesystem is the lower layer of an overlay. The overlay filesystem aggressively queries for "trusted.overlay.opaque", which in practice is never found in the lower layer gofer but leads to a lot of wasted work. A consequence is that a mutable gofer upper layer is not supported anymore, but that is still consistent with VFS1. We can revisit when the need arises. PiperOrigin-RevId: 401860585
2021-10-07 | add convenient wrapper for eventfd | Kevin Krakauer
The same create/write/read pattern is copied around several places. It's easier to understand in a package with names and comments, and we can reuse the smart blocking code in package rawfile. PiperOrigin-RevId: 401647108
2021-10-07 | Add a new metric to detect the number of spurious loss recoveries. | Nayana Bidari
- Implements RFC 3522 (Eifel detection algorithm) to detect if the connection entered loss recovery unnecessarily.
- Added a new metric to count the total number of spurious loss recoveries.
- Added tests to verify the new metric.
PiperOrigin-RevId: 401637359
2021-10-07 | tests: use a proper path to the kvm device | Andrei Vagin
PiperOrigin-RevId: 401624134
2021-10-07 | Track UDP packets performing REDIRECT NAT | Ghanan Gowripalan
PiperOrigin-RevId: 401620449
2021-10-07 | Modify the TCP test to receive re-transmitted packet before sending ACK. | Nayana Bidari
TestRACKWithWindowFull was sending ACK for the last packet to avoid TLP. But, sometimes the ACK is delayed and the sender sends the re-transmitted packet before receiving ACK. The test is now modified to expect the re-transmitted packet always and then send a DSACK to avoid entering recovery. Before: http://sponge2/6473db18-137a-4afb-9d60-c3eafd236ea9 After: http://sponge2/6a0f744c-7ea3-40fa-8f76-68503bf142ca PiperOrigin-RevId: 401606848
2021-10-07 | Store timestamps as time.Time | Tamir Duberstein
Rather than boiling down to an integer eagerly, do it as late as possible. PiperOrigin-RevId: 401599308
2021-10-06 | Create null entry connection on first IPTables hook | Ghanan Gowripalan
...all connections should be tracked by ConnTrack, so create a no-op connection entry on the first hook into IPTables (Prerouting or Output) and let NAT targets modify the connection entry if they need to instead of letting the NAT target create their own connection entry. This also prepares for "twice-NAT" where a packet may have both DNAT and SNAT performed on it (which requires the ability to update ConnTrack entries). Updates #5696. PiperOrigin-RevId: 401360377
2021-10-06 | Add global lisafs kernel flag. | Ayush Ranjan
PiperOrigin-RevId: 401296116
2021-10-05 | Merge pull request #6687 from zchee:atomicbitops-nosplit | gVisor bot
PiperOrigin-RevId: 401152818
2021-10-05 | Add server implementation for sharedmem endpoints. | Bhasker Hariharan
PiperOrigin-RevId: 401088040
2021-10-04 | Reply to invalid ACKs even when accept queue is full | Arthur Sfez
Before checking if there is space in the accept queue, the listener should verify that the cookie is valid. If it is not, instead of silently dropping the packet, reply with an RST. Fixes #6683 PiperOrigin-RevId: 400807346
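The ordering described in this commit can be sketched as a small decision function. This is only an illustration: the names `handleACK`, `sendRST`, `dropSegment`, and `deliverEndpoint` are hypothetical and do not come from the gVisor sources, and the behaviour for a valid cookie with a full queue (dropping the segment) is an assumption, since the commit message only specifies the invalid-cookie case.

```go
package main

import "fmt"

// action is a hypothetical result type for this sketch.
type action int

const (
	sendRST         action = iota // reply with an RST
	dropSegment                   // silently drop (assumed behaviour when the queue is full)
	deliverEndpoint               // hand the connection to the accept queue
)

// handleACK validates the cookie first and only then looks at the
// accept queue, so an invalid ACK is answered with an RST even when
// the queue is full.
func handleACK(cookieValid, queueFull bool) action {
	if !cookieValid {
		return sendRST
	}
	if queueFull {
		return dropSegment
	}
	return deliverEndpoint
}

func main() {
	// An invalid cookie with a full queue still yields an RST.
	fmt.Println(handleACK(false, true) == sendRST)
}
```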
2021-10-04 | No split to assembly and noasm functions on atomicbitops package | Koichi Shiraishi
Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>
2021-10-01 | Read lock when getting connections | Ghanan Gowripalan
We should avoid taking the write lock to avoid contention when looking for a packet's tracked connection. No need to reap timed out connections when looking for connections as the reaper (which runs periodically) will handle that. PiperOrigin-RevId: 400322514
2021-10-01 | Drop ConnTrack.handlePacket | Ghanan Gowripalan
Move the hook specific logic to the IPTables hook functions. This lets us avoid having to perform checks on the hook to determine what action to take. Later changes will drop the need for handlePacket's return value, reducing the value of this function that all hooks call into. PiperOrigin-RevId: 400298023
2021-10-01 | Drop conn.tcbHook | Ghanan Gowripalan
...as the packet's direction gives us the information that tcbHook is used to derive. PiperOrigin-RevId: 400280102
2021-10-01 | Annotate checklocks on mutex protected fields | Ghanan Gowripalan
...to catch lock-related bugs in nogo tests. Updates #6566. PiperOrigin-RevId: 400265818
2021-10-01 | Merge pull request #6551 from sudo-sturbia:msgqueue/procfs | gVisor bot
PiperOrigin-RevId: 400258924
2021-10-01 | Drop IPTables.checkPackets | Ghanan Gowripalan
...and have `CheckOutputPackets`, `CheckPostroutingPackets` call their equivalent methods that operate on a single packet buffer directly. This is so that the `Check{Output, Postrouting}Packets` methods may leverage any hook-specific work that `Check{Output, Postrouting}` may perform. Note: Later changes will add hook-specific logic to the `Check{Output, Postrouting}` methods. PiperOrigin-RevId: 400255651
2021-10-01 | Let connection handle tracked packets | Ghanan Gowripalan
...to save a call to `ConnTrack.connFor` when callers already have a reference to the ConnTrack entry. PiperOrigin-RevId: 400244955
2021-10-01 | Move pendingEndpoints to acceptQueue | Tamir Duberstein
This obsoletes the need for the pendingMu and pending, since they are redundant with acceptMu and pendingAccepted. Fixes #6671. PiperOrigin-RevId: 400162391
2021-09-30 | kernel: print PID in addition to TID in task log messages | Andrei Vagin
For multithreaded processes, it is hard to read logs without knowing task PIDs. Also print a decimal return code for syscalls. Hex return codes are useful for system calls that return addresses. For other syscalls, the decimal form is more readable. PiperOrigin-RevId: 400035449
2021-09-29 | Avoid comparisons to zero value of acceptQueue | Tamir Duberstein
PiperOrigin-RevId: 399765414
2021-09-29 | Rename accepted -> acceptQueue | Tamir Duberstein
Rename cap -> capacity to avoid collision with the builtin. PiperOrigin-RevId: 399753630
2021-09-29 | Remove syncRcvdCount | Tamir Duberstein
This is redundant with listenContext.pendingEndpoints PiperOrigin-RevId: 399722472
2021-09-28 | Move `safecopy.ReplaceSignalHandler` into `sighandling` package. | Etienne Perot
PiperOrigin-RevId: 399560357
2021-09-28 | Inline handleSynSegment | Tamir Duberstein
This function has only one caller. Remove segment reference count manipulation since it is only used synchronously. PiperOrigin-RevId: 399525343
2021-09-28 | Support naive Masquerade NAT target | Ghanan Gowripalan
* Does not accept a port range (Issue #5772).
* Does not support checking for tuple conflicts (Issue #5773).
PiperOrigin-RevId: 399524088
2021-09-27 | Move `sighandling` package out of `sentry`. | Etienne Perot
PiperOrigin-RevId: 399295737
2021-09-27 | Implement S/R for Stats | Tamir Duberstein
PiperOrigin-RevId: 399276940
2021-09-27 | Prevent PacketData from being modified. | Ayush Ranjan
PacketData should not be modified and should be treated as read-only because it represents packet payload. The old DeleteFront method allowed callers to modify the underlying buffer, which should not be allowed. Added a way to consume from the PacketData instead of deleting from it. Updated call points to use that instead. Reported-by: syzbot+faee5cb350f769a52d1b@syzkaller.appspotmail.com PiperOrigin-RevId: 399268473
2021-09-27 | Store pending endpoints in a set | Tamir Duberstein
There's no need for synthetic keys here. PiperOrigin-RevId: 399263134
2021-09-27 | Add procfs files for SysV message queues. | Zyad A. Ali
2021-09-24 | Update the comment for Task.netns | Andrei Vagin
Task.netns can be accessed atomically, so Task.mu isn't needed to access it. PiperOrigin-RevId: 398773947
2021-09-24 | Merge pull request #6647 from avagin:task-netns | gVisor bot
PiperOrigin-RevId: 398763161
2021-09-23 | Allow lisafs client to send more data than MaxMessageSize using chunks. | Ayush Ranjan
The p9 client does the same. This allows applications to read/write >= 2MB of data. This enables the read write benchmarks to work with lisafs. Updates #5466 PiperOrigin-RevId: 398659947
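The chunking idea can be sketched as follows. This is not the lisafs client code: `writeChunked`, the `send` callback, and the tiny `maxMessageSize` constant are hypothetical and chosen only to make the splitting visible.

```go
package main

import "fmt"

// maxMessageSize is a hypothetical per-message payload limit, kept
// tiny here so the chunking is easy to see.
const maxMessageSize = 4

// writeChunked splits buf into pieces of at most maxMessageSize bytes
// and passes each piece to send with its file offset. It stops early
// on an error or a short write and returns the number of bytes sent.
func writeChunked(buf []byte, off uint64, send func(p []byte, off uint64) (int, error)) (int, error) {
	total := 0
	for len(buf) > 0 {
		n := len(buf)
		if n > maxMessageSize {
			n = maxMessageSize
		}
		written, err := send(buf[:n], off)
		total += written
		if err != nil {
			return total, err
		}
		if written < n {
			// Short write: report what was sent so far.
			return total, nil
		}
		buf = buf[written:]
		off += uint64(written)
	}
	return total, nil
}

func main() {
	var sizes []int
	n, err := writeChunked([]byte("0123456789"), 0, func(p []byte, off uint64) (int, error) {
		sizes = append(sizes, len(p))
		return len(p), nil
	})
	fmt.Println(n, err, sizes)
}
```

The same loop shape applies to reads: request at most one message's worth of data per call and advance the offset by the amount actually returned.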
2021-09-23 | kernel: allow accessing Task.netns without taking Task.mu | Andrei Vagin
This avoids unnecessary lock-ordering dependencies on task.mu.
2021-09-23 | Create the cgroupfs mount point in sysfs. | Rahat Mahmood
Create the /sys/fs/cgroup directory when cgroups are available. This creates the empty directory to serve as the mountpoint; actually mounting cgroups is left to the launcher/userspace. This is consistent with Linux behaviour. Without this mountpoint, getdents(2) on /sys/fs indicates an empty directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The launcher can't create the mountpoint directory since sysfs doesn't support mkdir. PiperOrigin-RevId: 398596698
2021-09-23 | Merge pull request #6573 from avagin:kvm-seccomp-mmap | gVisor bot
PiperOrigin-RevId: 398572735
2021-09-23 | Pass AddressableEndpoint to IPTables | Ghanan Gowripalan
...instead of an address. This allows a later change to more precisely select an address based on the NAT type (source vs. destination NAT). PiperOrigin-RevId: 398559901
2021-09-23 | Implement S/R for TransportEndpointStats | Tamir Duberstein
PiperOrigin-RevId: 398559780
2021-09-23 | Compose ICMP endpoint with datagram-based endpoint | Ghanan Gowripalan
An ICMP endpoint's write path can use the datagram-based endpoint. Updates #6565. Test: Datagram-based generic socket + ICMP/ping syscall tests. PiperOrigin-RevId: 398539844
2021-09-23 | Introduce method per iptables hook | Ghanan Gowripalan
...to make it clear what arguments are needed per hook. PiperOrigin-RevId: 398538776