Age | Commit message (Collapse) | Author |
|
...so a NAT-ed connection's socket can handle ICMP errors.
Updates #5916.
PiperOrigin-RevId: 406270868
|
|
kernel/time.Timer allocation is expensive and not sync.Poolable (since
time.Timer only supports notification through a channel, requiring a goroutine
to receive from the channel, and sync.Pool doesn't invoke any kind of cleanup
on discarded items in the pool so it would leak timer goroutines). Using the
existing Task.blockingTimer for nanosleep(), and applicable cases in
clock_nanosleep(), at least avoids Timer allocation in common cases.
PiperOrigin-RevId: 406248394
|
|
PiperOrigin-RevId: 406027220
|
|
Reported-by: syzbot+39d434b96cf7c29a66ad@syzkaller.appspotmail.com
Reported-by: syzbot+7c38bce6353d91facca3@syzkaller.appspotmail.com
PiperOrigin-RevId: 406024052
|
|
When transmitting packets we only need to notify if the peer is not
already processing packets. sharedData region is used to enable/disable
notifications and the peer will disable notifications when its actively
processing packets and enable notifications just before it goes to
sleep waiting on packets. This allows more efficient transmit as the
sharedmem endpoint does not need to notify on eventFD and incur an
expensive host systemcall when the peer is already awake.
PiperOrigin-RevId: 406018843
|
|
Connection tracking is agnostic to whether the packet is inbound or outbound. It
cares who initiated the connection. The naming can get confusing as conntrack
can track connections originating from any host.
Part of resolving #6736.
PiperOrigin-RevId: 405997540
|
|
...so a NAT-ed connection's socket can handle ICMP errors.
Updates #5916.
PiperOrigin-RevId: 405970089
|
|
Previously, we recorded a single aggregated count. These per-protocol counts
can help us debug field issues when frames are dropped for this reason.
PiperOrigin-RevId: 405913911
|
|
vfs.NewDisconnectedMount has no error paths. Its much prettier without the
error return value.
Also simplify MountDisconnected which would immediately drop the refs taken by
NewDisconnectedMount. Instead make it directly call newMount.
PiperOrigin-RevId: 405767966
|
|
A header can't be smaller than header.ICMPv4MinimumSize.
Reported-by: syzbot+57b68b14b4f6a58bf985@syzkaller.appspotmail.com
PiperOrigin-RevId: 405748438
|
|
Right now, each vdso call triggers vmexit. VDSO and VVAR pages are
mapped with VM_IO and get_user_pages fails for such vma-s. KVM was not
able to handle this case up to the v4.8 kernel. This problem was fixed by
add6a0cd1c5ba ("KVM: MMU: try to fix up page faults before giving up").
For some unknown reasons, it still doesn't work in case of nested
virtualization.
Before:
BenchmarkKernelVDSO-6 252519 4598 ns/op
After:
BenchmarkKernelVDSO-6 34431957 34.91 ns/op
PiperOrigin-RevId: 405715941
|
|
As documented in FilesystemType.GetFilesystem, a reference should be taken on
the returned dentry and filesystem by GetFilesystem implementation. mqfs did
not do that.
Additionally cleanup and clarify ref counting of dentry, filesystem and mount
in mqfs.
Reported-by: syzbot+a2c54bfb6e1525228e5f@syzkaller.appspotmail.com
Reported-by: syzbot+ccd305cdab11cfebbfff@syzkaller.appspotmail.com
PiperOrigin-RevId: 405700565
|
|
PiperOrigin-RevId: 405698863
|
|
eventfd.Notify() uses unix.Write which will eventually
call unix.Syscall which will yield the current go processor
resulting in the Go scheduler parking the current goroutine
till the syscall returns.
But in most cases where Notify() is called there is no reason
to yield as the caller probably wants to continue doing something
right afterwards. Like in the case of the sharedmem endpoint
which may still have more packets to write.
PiperOrigin-RevId: 405693801
|
|
VFS1 discards the value of f_namelen returned by the filesystem and returns
NAME_MAX unconditionally instead, so it doesn't run into this. Also set
f_frsize for completeness.
PiperOrigin-RevId: 405579707
|
|
As caught by syzkaller, we were leaking non-permission bits while passing the
user generated mode. DynamicBytesFile panics in this case.
Reported-by: syzbot+5abe52d47d56a5a98c89@syzkaller.appspotmail.com
PiperOrigin-RevId: 405481392
|
|
"cri.runtimeoptions.v1" moved to "runtimeoptions.v1" and containerd
configuration format version 2 is required.
Updates #6449
PiperOrigin-RevId: 405474653
|
|
|
|
|
|
PiperOrigin-RevId: 404901660
|
|
Updates #6441,#6317
PiperOrigin-RevId: 404872327
|
|
When file corruption is detected, report vfs.ErrCorruption to
distinguish corruption error from other restore errors.
Updates #1035
PiperOrigin-RevId: 404588445
|
|
..including ICMP headers before delivering them to the
TransportDispatcher.
Updates #3810.
PiperOrigin-RevId: 404404002
|
|
PiperOrigin-RevId: 404400399
|
|
lisafs.ClientFile.MkdirAt is allowed to return a non-nil Inode and a non-nil
error on an RPC error. The caller must not use the returned (invalid) Inode on
error. But a code path in the gofer client does end up using it.
More specifically, when the Mkdir RPC fails and we end up creating a synthetic
dentry for a mountpoint, we end up returning the (invalid) non-nil Inode to
filesystem.doCreateAt implementation which thinks that a remote file was
created. But that non-nil Inode is actually invalid because the RPC failed.
Things go downhill from there.
Update client to not use childDirInode if RPC failed.
PiperOrigin-RevId: 404396573
|
|
Reaping an expired tuple removes it from its bucket so we need to grab
the succeeding tuple in the bucket before reaping the expired tuple.
Before this change, only the first expired tuple in a bucket was reaped
per reaper run on the bucket. This change just allows more connections
to be reaped.
PiperOrigin-RevId: 404392925
|
|
PiperOrigin-RevId: 404382475
|
|
This prevents reaping connections unnecessarily early. This change both moves
the state update to the beginning of handlePacket and fixes a bug where
un-finalized connections could become un-reapable.
Fixes #6748
PiperOrigin-RevId: 404141012
|
|
- We should be using a monotonic clock
- This will make future testing easier
Updates #6748.
PiperOrigin-RevId: 404072318
|
|
Updates #1035
PiperOrigin-RevId: 404072231
|
|
Fixes #6590
PiperOrigin-RevId: 404007524
|
|
PiperOrigin-RevId: 403479257
|
|
Implement WriteRawPacket for pipe by calling `DeliverNetworkPacket`
on the other end with empty values for the route and protocol number,
and relies on the `NetworkDispatcher` to decapsulate the link layer
header from the raw packet itself.
PiperOrigin-RevId: 403461448
|
|
gVisor was previously reporting the lower of cgroup limit or 2GB as total
memory. This may cause applications to make bad decisions based on amount
of memory available to them when more than 2GB is required.
This change makes the lower of cgroup limit or the host total memory to be
reported inside the sandbox. This also is more inline with docker which always
reports host total memory. Note that reporting cgroup limit is strictly better
than host total memory when there is a limit set.
Fixes #5608
PiperOrigin-RevId: 403241608
|
|
PiperOrigin-RevId: 403214414
|
|
Use route/protocol from packetbuffer.
Sharedmem implementation should use the EgressRoute/NetworkProtocolNumber
embedded in the packetbuffer rather than what is passed as parameters to
Write(Raw)Packet(s).
PiperOrigin-RevId: 402934171
|
|
These can be used by applications to manipulate iptables rules without enabling
arbitrary reads from and writes to the underlying packet socket.
PiperOrigin-RevId: 402924733
|
|
...since direction can only hold one of two possible values.
PiperOrigin-RevId: 402855698
|
|
This CL allows both SNAT and DNAT targets to be performed on the same
packet.
Fixes #5696.
PiperOrigin-RevId: 402714738
|
|
Fixes #6725
PiperOrigin-RevId: 402683244
|
|
This change also refactors the conntrack packet handling code
to not perform the actual rewriting of the packet while holding
the lock.
This change prepares for a followup CL that adds support for twice-NAT.
Updates #5696.
PiperOrigin-RevId: 402671685
|
|
- Don't attempt to create directory is controller is not
present in the system
- Ensure that all files being written exist in cgroupfs
- Attempt to delete directories during Uninstall even if
other deletions have failed
Fixes #6446
PiperOrigin-RevId: 402614820
|
|
Prior to cl/318010298, //pkg/state couldn't handle pointers to struct fields,
which meant that it couldn't handle intrusive linked lists, which meant that it
couldn't handle waiter.Queue, which meant that it couldn't handle epoll. As a
result, VFS1 unregisters all epoll waiters before saving and re-registers them
after loading, and waitable VFS1 file implementations tag their waiter.Queues
state:"nosave" (causing them to be skipped by the save/restore machinery) or
state:"zerovalue" (causing them to only be checked for zero-value-equality on
save).
VFS2 required cl/318010298 to support save/restore (due to the Impl inheritance
pattern used by vfs.FileDescription, vfs.Dentry, etc.); correspondingly, VFS2
epoll assumes that waiter.Queues *will be* saved and loaded correctly, and VFS2
file implementations do not tag waiter.Queues.
Some waiter.Queues, e.g. pipe.Pipe.Queue and kernel.Task.signalQueue, are used
by both VFS1 and VFS2 (the latter via signalfd); as a result of the above,
tagging these Queues state:"nosave" or state:"zerovalue" breaks VFS2 epoll.
Remove VFS1 epoll unregistration before saving (bringing it in line with VFS2),
and remove these tags from all waiter.Queues.
Also clean up after the epoll test added by cl/402323053, which implied this
issue (by instantiating DisableSave in the new test) without reporting it.
PiperOrigin-RevId: 402596216
|
|
PiperOrigin-RevId: 402468096
|
|
Tools (e.g. cAdvisor) watches for changes inside /sys/fs/cgroup to detect
when containers are created and deleted. With gVisor, container cgroups were
not created because the containers are not visible to the host.
This change enables the creation of [empty] subcontainer cgroups that can
be used by tools to detect creation/deletion of subcontainers. This change
required a new annotation to be added so that the shim can communicate the
pod cgroup path to runsc, so pod and container cgroups can be identified,
Fixes #6500
PiperOrigin-RevId: 402392291
|
|
We already have integration tests `make iptables-tests` that tests
the REDIRECT target, but unit tests are a lot faster and easier
to run than the integration test.
PiperOrigin-RevId: 402365412
|
|
Updates #1584, #3556.
PiperOrigin-RevId: 402354066
|
|
PiperOrigin-RevId: 402323053
|
|
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go
will not clobber floating point registers before/after calling them
respectively. Due to regabig in Go 1.17, this is no longer the case; regabig
(among other things) maintains a zeroed XMM15 during ABIInternal execution,
including by zeroing it after ABI0-to-ABIInternal transitions. In
ring0.sysenter/exception, this happens in
ring0.kernelSyscall/kernelException.abi0 respectively; in
ring0.CPU.SwitchToUser, this happens after returning from
ring0.sysret/iret.abi0. Delete these functions and do floating point save/load
in assembly.
While arm64 doesn't appear to be immediately affected (so this CL permits us to
resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be
incorrect for the same fundamental reason (Go code can't sanely assume what
registers the Go compiler will or won't use) and should be fixed eventually.
PiperOrigin-RevId: 401895658
|
|
listXattr() was doing redundant work. Remove it.
PiperOrigin-RevId: 401871315
|