Age | Commit message (Collapse) | Author |
|
PiperOrigin-RevId: 356762859
|
|
The limits for snd/rcv buffers for unix domain socket is controlled by the
following sysctls on linux
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
Today in gVisor we do not expose these sysctls but we do support setting the
equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor
can be used without netstack, with hostinet or even without any networking stack
at all. Which means ideally these sysctls need to live as globals in gVisor.
But rather than make this a big change for now we hardcode the limits in the
AF_UNIX implementation itself (which in itself is better than where we were
before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial
limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use
today.
Updates #5132
PiperOrigin-RevId: 356665498
|
|
PiperOrigin-RevId: 356645022
|
|
Utilities written to be common across IPv4/IPv6 are not planned to be
available for public use.
https://golang.org/doc/go1.4#internalpackages
PiperOrigin-RevId: 356554862
|
|
... as per RFC 7048. The Failed state is an internal state that is not
specified by any RFC; replacing it with the Unreachable state enables us to
expose this state while keeping our terminology consistent with RFC 4861 and
RFC 7048.
Unreachable state replaces all internal references for Failed state. However
unlike the Failed state, change events are dispatched when moving into
Unreachable state. This gives developers insight into whether a neighbor entry
failed address resolution or whether it was explicitly removed.
The Failed state will be removed entirely once all references to it are
removed. This is done to avoid a Fuchsia roll failure.
Updates #4667
PiperOrigin-RevId: 356554104
|
|
PiperOrigin-RevId: 356536548
|
|
Reported-by: syzbot+9ffc71246fe72c73fc25@syzkaller.appspotmail.com
PiperOrigin-RevId: 356536113
|
|
PiperOrigin-RevId: 356450303
|
|
IPv4 forwarding and reassembly needs support for option processing
and regular processing also needs options to be processed before
being passed to the transport layer. This patch extends option processing
to those cases and provides additional testing. A small change to the ICMP
error generation API code was required to allow it to know when a packet was
being forwarded or not.
Updates #4586
PiperOrigin-RevId: 356446681
|
|
The thing the lock protects will never be accessed concurrently.
PiperOrigin-RevId: 356423331
|
|
We previously return EINVAL when connecting to port 0, however this is not the
observed behavior on Linux. One of the observable effects after connecting to
port 0 on Linux is that getpeername() will fail with ENOTCONN.
PiperOrigin-RevId: 356413451
|
|
Reported-by: syzbot+d54bc27a15aefe52c330@syzkaller.appspotmail.com
PiperOrigin-RevId: 356406975
|
|
...as long as the network protocol supports duplicate address detection.
This CL provides the facilities for a netstack integrator to perform
DAD.
DHCP recommends that clients effectively perform DAD before accepting an
offer. As per RFC 2131 section 4.4.1 pg 38,
The client SHOULD perform a check on the suggested address to ensure
that the address is not already in use. For example, if the client
is on a network that supports ARP, the client may issue an ARP request
for the suggested request.
The implementation of ARP-based IPv4 DAD effectively operates the same
as IPv6's NDP DAD - using ARP requests and responses in place of
NDP neighbour solicitations and advertisements, respectively.
DAD performed by calls to (*Stack).CheckDuplicateAddress don't interfere
with DAD performed when a new IPv6 address is added. This is so that
integrator requests to check for duplicate addresses aren't unexpectedly
aborted when addresses are removed.
A network package internal package provides protocol agnostic DAD state
management that specific protocols that provide DAD can use.
Fixes #4550.
Tests:
- internal/ip_test.*
- integration_test.TestDAD
- arp_test.TestDADARPRequestPacket
- ipv6.TestCheckDuplicateAddress
PiperOrigin-RevId: 356405593
|
|
This makes it easier to implement dynamically sized types in go-marshal. You
really only need to implement MarshalBytes, UnmarshalBytes and SizeBytes to
implement the entire interface.
By using the `dynamic` tag, the autogenerator will generate the rest of the
methods for us.
This change also simplifies how KernelIPTGetEntries implements Marshallable
using the newly added utility.
PiperOrigin-RevId: 356397114
|
|
Fixes a bug in our getsockopt(2) implementation which was incorrectly using
binary.Size() instead of Marshallable.SizeBytes().
PiperOrigin-RevId: 356396551
|
|
Detect packet loss using reorder window and re-transmit them after the reorder
timer expires.
PiperOrigin-RevId: 356321786
|
|
It was replaced by NUD/neighborCache.
Fixes #4658.
PiperOrigin-RevId: 356085221
|
|
Before this change, packets were delivered asynchronously to the remote
end of a pipe. This was to avoid a deadlock during link resolution where
the stack would attempt to double-lock a mutex (see removed comments in
the parent commit for details).
As of https://github.com/google/gvisor/commit/4943347137, we do not hold
locks while sending link resolution probes so the deadlock will no
longer occur.
PiperOrigin-RevId: 356066224
|
|
Previously when sending NDP DAD or RS messages, we would hold a shared
lock which lead to deadlocks (due to synchronous packet loooping
(e.g. pipe and loopback link endpoints)) and lock contention.
Writing packets may be an expensive operation which could prevent other
goroutines from doing meaningful work if a shared lock is held while
writing packets.
This change upates the NDP DAD/RS timers to not hold shared locks while
sending packets.
PiperOrigin-RevId: 356053146
|
|
The network endpoints only look for other network endpoints of the
same kind. Since the network protocols keeps track of all endpoints,
go through the protocol to find an endpoint with an address instead
of the stack.
PiperOrigin-RevId: 356051498
|
|
Previously when sending probe messages, we would hold a shared lock
which lead to deadlocks (due to synchronous packet loooping (e.g. pipe
and loopback link endpoints)) and lock contention.
Writing packets may be an expensive operation which could prevent other
goroutines from doing meaningful work if a shared lock is held while
writing packets.
This change upates the NUD timers to not hold shared locks while
sending packets.
PiperOrigin-RevId: 356048697
|
|
Also while I'm here, update neighbor cahce/entry tests to use the
stack's RNG instead of creating a neigbor cache/entry specific one.
PiperOrigin-RevId: 356040581
|
|
The NIC structure is not to be used outside of the stack package
directly.
PiperOrigin-RevId: 356036737
|
|
Network endpoints that wish to check addresses on another NIC-local
network endpoint may now do so through the NetworkInterface.
This fixes a lock ordering issue between NIC removal and link
resolution. Before this change:
NIC Removal takes the stack lock, neighbor cache lock then neighbor
entries' locks.
When performing IPv4 link resolution, we take the entry lock then ARP
would try check IPv4 local addresses through the stack which tries to
obtain the stack's lock.
Now that ARP can check IPv4 addreses through the NIC, we avoid the lock
ordering issue, while also removing the need for stack to lookup the
NIC.
PiperOrigin-RevId: 356034245
|
|
After IPTables checks a batch of packets, we can write packets that are
not dropped or locally destined as a batch instead of individually.
This previously caused a bug since WritePacket* functions expect to take
ownership of passed PacketBuffer{List}. WritePackets assumed the list of
PacketBuffers will not be invalidated when calling WritePacket for each
PacketBuffer in the list, but this is not true. WritePacket may add the
passed PacketBuffer into a different list which would modify the
PacketBuffer in such a way that it no longer points to the next
PacketBuffer to write.
Example: Given a PB list of
PB_a -> PB_b -> PB_c
WritePackets may be iterating over the list and calling WritePacket for
each PB. When WritePacket takes PB_a, it may add it to a new list which
would update pointers such that PB_a no longer points to PB_b.
Test: integration_test.TestIPTableWritePackets
PiperOrigin-RevId: 355969560
|
|
Panic seen at some code path like control.ExecAsync where
ctx does not have a Task.
Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com
PiperOrigin-RevId: 355960596
|
|
According to vfs.FilesystemImpl.RenameAt documentation:
- If the last path component in rp is "." or "..", and opts.Flags contains
RENAME_NOREPLACE, RenameAt returns EEXIST.
- If the last path component in rp is "." or "..", and opts.Flags does not
contain RENAME_NOREPLACE, RenameAt returns EBUSY.
Reported-by: syzbot+6189786e64fe13fe43f8@syzkaller.appspotmail.com
PiperOrigin-RevId: 355959266
|
|
Make it clear that failing to parse a looped back is not a packet
sending error but a malformed received packet error.
FindNetworkEndpoint returns nil when no network endpoint is found
instead of an error.
PiperOrigin-RevId: 355954946
|
|
PiperOrigin-RevId: 355751801
|
|
Some versions of the Go runtime call getcpu(), so add it for compatibility. The
hostcpu package already uses getcpu() on arm64.
PiperOrigin-RevId: 355717757
|
|
PiperOrigin-RevId: 355675900
|
|
Our implementation of vfs.CheckDeleteSticky was not consistent with Linux,
specifically not consistent with fs/linux.h:check_sticky().
One of the biggest differences was that the vfs implementation did not
allow the owner of the sticky directory to delete files inside it that belonged
to other users.
This change makes our implementation consistent with Linux.
Also adds an integration test to check for this. This bug is also present in
VFS1.
Updates #3027
PiperOrigin-RevId: 355557425
|
|
- Adds a function to enable RACK in tests.
- RACK update functions are guarded behind the flag tcpRecovery.
PiperOrigin-RevId: 355435973
|
|
Implement basic lazy save and restore for FPSIMD registers, which only
restore FPSIMD state on el0_fpsimd_acc and save FPSIMD state in switch().
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
In order to improve the performance and stability, I reorg 2 modules slightly.
arch: no red zone on Arm64.
ring0: use stp instead of movd, and set RSV_REG_APP=R19.
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
This was missed in cl/351911375; pipe.VFSPipeFD.SpliceFromNonPipe already calls
Notify.
PiperOrigin-RevId: 355246655
|
|
Rename HandleNDupAcks() to HandleLossDetected() as it will enter this when
is detected after:
- reorder window expires and TLP (in case of RACK)
- dupAckCount >= 3
PiperOrigin-RevId: 355237858
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
|
|
This allows the package to serve as a general purpose ring0 support package, as
opposed to being bound to specific sentry platforms.
Updates #5039
PiperOrigin-RevId: 355220044
|
|
Reported-by: syzbot+db8d83f93b84fcb84374@syzkaller.appspotmail.com
PiperOrigin-RevId: 355213994
|
|
Netstack today will send dupACK's with no rate limit for incoming out of
window segments. This can result in ACK loops for example if a TCP socket
connects to itself (actually permitted by TCP). Where the ACK sent in
response to packets being out of order itself gets considered as an out
of window segment resulting in another ACK being generated.
PiperOrigin-RevId: 355206877
|
|
* Make split safe.
* Enable looking up next valid address.
* Support mappings with !accessType.Any(), distinct from unmap.
These changes allow for the use of pagetables in low-level OS packages, such
as ring0, and allow for the use of pagetables for more generic address space
reservation (by writing entries with no access specified).
Updates #5039
PiperOrigin-RevId: 355109016
|
|
...to remove the need for the transport layer to deduce the type of
error it received.
Rename HandleControlPacket to HandleError as HandleControlPacket only
handles errors.
tcpip.SockError now holds a tcpip.SockErrorCause interface that
different errors can implement.
PiperOrigin-RevId: 354994306
|
|
This change flips gvisor to use Neighbor unreachability detection by
default to populate the neighbor table as defined by RFC 4861 section 7.
Although RFC 4861 is targeted at IPv6, the same algorithm is used for
link resolution on IPv4 networks using ARP.
Integrators may still use the legacy link address cache by setting
stack.Options.UseLinkAddrCache to true; stack.Options.UseNeighborCache
is now unused and will be removed.
A later change will remove linkAddrCache and associated code.
Updates #4658.
PiperOrigin-RevId: 354850531
|
|
PiperOrigin-RevId: 354827491
|
|
...in IPv6 ICMP tests.
A channel link endpoint's channel is closed when the link endpoint is
closed.
When the stack tries to send packets through a NIC with a closed channel
endpoint, a panic will occur when attempting to write to a closed
channel (https://golang.org/ref/spec#Close). To make sure the stack does
not try to send packets through a NIC, we remove it.
PiperOrigin-RevId: 354822085
|
|
This stores each protocol's neighbor state separately.
This change also removes the need for each neighbor entry to keep
track of their own link address resolver now that all the entries
in a cache will use the same resolver.
PiperOrigin-RevId: 354818155
|
|
The network endpoint should not need to have logic to handle different
kinds of neighbor tables. Network endpoints can let the NIC know about
differnt neighbor discovery messages and let the NIC decide which table
to update.
This allows us to remove the LinkAddressCache interface.
PiperOrigin-RevId: 354812584
|
|
PiperOrigin-RevId: 354746864
|
|
This removes the need to provide the link address request with the NIC
the request is being performed on since the NetworkEndpoints already
have a reference to the NIC.
PiperOrigin-RevId: 354721940
|