Age | Commit message (Collapse) | Author |
|
Control messages should be released on Read (which ignores the control message)
or zero-byte Send. Otherwise, open fds sent through the control messages will
be leaked.
PiperOrigin-RevId: 337110774
|
|
Adds support for the IPv6-compatible redirect target. Redirection is a limited
form of DNAT, where the destination is always the localhost.
Updates #3549.
PiperOrigin-RevId: 334698344
|
|
PiperOrigin-RevId: 334652998
|
|
PiperOrigin-RevId: 334531794
|
|
As per relevant IP RFCS (see code comments), broadcast (for IPv4) and
multicast addresses are not allowed. Currently checks for these are
done at the transport layer, but since it is explicitly forbidden at
the IP layers, check for them there.
This change also removes the UDP.InvalidSourceAddress stat since there
is no longer a need for it.
Test: ip_test.TestSourceAddressValidation
PiperOrigin-RevId: 334490971
|
|
Like matchers, targets should use a module-like register/lookup system. This
replaces the brittle switch statements we had before.
The only behavior change is supporing IPT_GET_REVISION_TARGET. This makes it
much easier to add IPv6 redirect in the next change.
Updates #3549.
PiperOrigin-RevId: 334469418
|
|
PiperOrigin-RevId: 334428344
|
|
PiperOrigin-RevId: 334263322
|
|
Updates #1663
PiperOrigin-RevId: 333539293
|
|
VFS2 socket record is not removed from the system-wide
socket table when the socket is released, which will lead
to a memory leak. This patch fixes this issue.
Fixes: #3874
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
|
PiperOrigin-RevId: 332760843
|
|
PiperOrigin-RevId: 332486383
|
|
`ip6tables -t filter` is now usable. NAT support will come in a future CL.
#3549
PiperOrigin-RevId: 332381801
|
|
SO_LINGER is a socket level option and should be stored on all endpoints even
though it is used to linger only for TCP endpoints.
PiperOrigin-RevId: 332369252
|
|
This change includes overlay, special regular gofer files, and hostfs.
Fixes #3589.
PiperOrigin-RevId: 332330860
|
|
This is required to make tcpdump work. tcpdump falls back to not using things
like PACKET_RX_RING if setsockopt returns ENOPROTOOPT. This used to be the case
before https://github.com/google/gvisor/commit/6f8fb7e0db2790ff1f5ba835780c03fe245e437f.
Fixes #3981
PiperOrigin-RevId: 332326517
|
|
PiperOrigin-RevId: 332097286
|
|
PiperOrigin-RevId: 331256608
|
|
The existing implementation for TransportProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.
This change introduces marker interfaces for transport protocol options
that may be set or queried which transport protocol option types
implement to ensure that invalid types are caught at compile time.
Different interfaces are used to allow the compiler to enforce read-only
or set-only socket options.
RELNOTES: n/a
PiperOrigin-RevId: 330559811
|
|
Accept on gVisor will return an error if a socket in the accept queue was closed
before Accept() was called. Linux will return the new fd even if the returned
socket is already closed by the peer say due to a RST being sent by the peer.
This seems to be intentional in linux more details on the github issue.
Fixes #3780
PiperOrigin-RevId: 329828404
|
|
PiperOrigin-RevId: 329526153
|
|
PiperOrigin-RevId: 329036994
|
|
The existing implementation for {G,S}etSockOpt take arguments of an
empty interface type which all types (implicitly) implement; any
type may be passed to the functions.
This change introduces marker interfaces for socket options that may be
set or queried which socket option types implement to ensure that invalid
types are caught at compile time. Different interfaces are used to allow
the compiler to enforce read-only or set-only socket options.
Fixes #3714.
RELNOTES: n/a
PiperOrigin-RevId: 328832161
|
|
In an upcoming CL, socket option types are made to implement a marker
interface with pointer receivers. Since this results in calling methods
of an interface with a pointer, we incur an allocation when attempting
to get an Endpoint's last error with the current implementation.
When calling the method of an interface, the compiler is unable to
determine what the interface implementation does with the pointer
(since calling a method on an interface uses virtual dispatch at runtime
so the compiler does not know what the interface method will do) so it
allocates on the heap to be safe incase an implementation continues to
hold the pointer after the functioon returns (the reference escapes the
scope of the object).
In the example below, the compiler does not know what b.foo does with
the reference to a it allocates a on the heap as the reference to a may
escape the scope of a.
```
var a int
var b someInterface
b.foo(&a)
```
This change removes the opportunity for that allocation.
RELNOTES: n/a
PiperOrigin-RevId: 328796559
|
|
More implementation+testing to follow.
#3549.
PiperOrigin-RevId: 328770160
|
|
This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and
vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted
soon anyway.
The following structs now use the new tool, with leak check enabled:
devpts:rootInode
fuse:inode
kernfs:Dentry
kernfs:dir
kernfs:readonlyDir
kernfs:StaticDirectory
proc:fdDirInode
proc:fdInfoDirInode
proc:subtasksInode
proc:taskInode
proc:tasksInode
vfs:FileDescription
vfs:MountNamespace
vfs:Filesystem
sys:dir
kernel:FSContext
kernel:ProcessGroup
kernel:Session
shm:Shm
mm:aioMappable
mm:SpecialMappable
transport:queue
And the following use the template, but because they currently are not leak
checked, a TODO is left instead of enabling leak check in this patch:
kernel:FDTable
tun:tunEndpoint
Updates #1486.
PiperOrigin-RevId: 328460377
|
|
iptables sockopts were kludged into an unnecessary check, this properly
relegates them to the {get,set}SockOptIP functions.
PiperOrigin-RevId: 328395135
|
|
When SO_LINGER option is enabled, the close will not return until all the
queued messages are sent and acknowledged for the socket or linger timeout is
reached. If the option is not set, close will return immediately. This option
is mainly supported for connection oriented protocols such as TCP.
PiperOrigin-RevId: 328350576
|
|
We still deviate a bit from linux in how long we will actually wait in
FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely
obvious as to why it's done that way. For now I think we can ignore that and
fix it if it really is an issue.
PiperOrigin-RevId: 328324922
|
|
PiperOrigin-RevId: 327686558
|
|
A later change will introduce the equivalent IPv6 logic.
#3549
PiperOrigin-RevId: 327499064
|
|
- Merges aleksej-paschenko's with HEAD
- Adds vfs2 support for ip_forward
|
|
The abstract socket namespace no longer holds any references on sockets.
Instead, TryIncRef() is used when a socket is being retrieved in
BoundEndpoint(). Abstract sockets are now responsible for removing themselves
from the namespace they are in, when they are destroyed.
Updates #1486.
PiperOrigin-RevId: 327064173
|
|
Formerly, when a packet is constructed or parsed, all headers are set by the
client code. This almost always involved prepending to pk.Header buffer or
trimming pk.Data portion. This is known to prone to bugs, due to the complexity
and number of the invariants assumed across netstack to maintain.
In the new PacketHeader API, client will call Push()/Consume() method to
construct/parse an outgoing/incoming packet. All invariants, such as slicing
and trimming, are maintained by the API itself.
NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no
longer valid.
PacketBuffer now assumes the packet is a concatenation of following portions:
* LinkHeader
* NetworkHeader
* TransportHeader
* Data
Any of them could be empty, or zero-length.
PiperOrigin-RevId: 326507688
|
|
This is purely moving code, no changes. netfilter.go is cluttered and targets.go
is a good place for this.
#3549
PiperOrigin-RevId: 325879965
|
|
/proc/sys/net/ipv4/tcp_recovery is used to enable RACK loss
recovery in TCP.
PiperOrigin-RevId: 325157807
|
|
context is passed to DecRef() and Release() which is
needed for SO_LINGER implementation.
PiperOrigin-RevId: 324672584
|
|
Envoy (#170) uses this to get the original destination of redirected
packets.
|
|
PiperOrigin-RevId: 323715260
|
|
PiperOrigin-RevId: 322954792
|
|
PiperOrigin-RevId: 322937495
|
|
Socket option values are now required to implement marshal.Marshallable.
Co-authored-by: Rahat Mahmood <rahat@google.com>
PiperOrigin-RevId: 322831612
|
|
For iptables users, Check() is a hot path called for every packet one or more
times. Let's avoid a bunch of map lookups.
PiperOrigin-RevId: 322678699
|
|
Updates #173
PiperOrigin-RevId: 322665518
|
|
Updates #173
PiperOrigin-RevId: 321690756
|
|
gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks
tcpdump as it tries to interpret the packets incorrectly.
Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which
fails with an EINVAL since we don't implement it. For now change it to return
EOPNOTSUPP to indicate that we don't support the query rather than return
EINVAL.
NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities
and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type
field while NIC capabilities are more like the device features which can be
queried using SIOCETHTOOL but not modified and NIC Flags are fields that can
be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc.
Updates #2746
PiperOrigin-RevId: 321436525
|
|
When we failed to create the new socket after adding the fd to
fdnotifier, we should remove the fd from fdnotifier, because we
are going to close the fd directly.
Fixes: #3241
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
|
Updates #2746
PiperOrigin-RevId: 320757963
|
|
RFC-1122 (and others) specify that UDP should not receive
datagrams that have a source address that is a multicast address.
Packets should never be received FROM a multicast address.
See also, RFC 768: 'User Datagram Protocol'
J. Postel, ISI, 28 August 1980
A UDP datagram received with an invalid IP source address
(e.g., a broadcast or multicast address) must be discarded
by UDP or by the IP layer (see rfc 1122 Section 3.2.1.3).
This CL does not address TCP or broadcast which is more complicated.
Also adds a test for both ipv6 and ipv4 UDP.
Fixes #3154
PiperOrigin-RevId: 320547674
|
|
Updates #2746
Fixes #3158
PiperOrigin-RevId: 320497190
|