Age | Commit message (Collapse) | Author |
|
TCP_KEEPCNT is used to set the maximum keepalive probes to be
sent before dropping the connection.
WANT_LGTM=jchacon
PiperOrigin-RevId: 315758094
|
|
In case of SOCK_SEQPACKET, it has to be ignored.
In case of SOCK_STREAM, EISCONN or EOPNOTSUPP has to be returned.
PiperOrigin-RevId: 315755972
|
|
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.
Updates #1480
PiperOrigin-RevId: 315604825
|
|
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.
This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.
PiperOrigin-RevId: 315179302
|
|
For TCP sockets gVisor incorrectly returns EAGAIN when no ephemeral ports are
available to bind during a connect. Linux returns EADDRNOTAVAIL. This change
fixes gVisor to return the correct code and adds a test for the same.
This change also fixes a minor bug for ping sockets where connect() would fail
with EINVAL unless the socket was bound first.
Also added tests for testing UDP Port exhaustion and Ping socket port
exhaustion.
PiperOrigin-RevId: 314988525
|
|
IPTables.connections contains a sync.RWMutex. Copying it will trigger copylocks
analysis. Tested by manually enabling nogo tests.
sync.RWMutex is added to IPTables for the additional race condition discovered.
PiperOrigin-RevId: 314817019
|
|
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.
With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.
Updates #2404.
PiperOrigin-RevId: 314610879
|
|
PiperOrigin-RevId: 314450191
|
|
|
|
On native Linux, calling recv/read right after send/write sometimes returns
EWOULDBLOCK, if the data has not made it to the receiving socket (even though
the endpoints are on the same host). Poll before reading to avoid this.
Making this change also uncovered a hostinet bug (gvisor.dev/issue/2726),
which is noted in this CL.
PiperOrigin-RevId: 312320587
|
|
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
in one place.
* Drop "user-space kernel" and use "application kernel". The term "user-space
kernel" is confusing when some platform implementation do not run in
user-space (instead running in guest ring zero).
* Clear up the relationship between the Platform page in the user guide and the
Platform page in the architecture guide, and ensure they are cross-linked.
* Restore the call-to-action quick start link in the main page, and drop the
GitHub link (which also appears in the top-right).
* Improve image formatting by centering all doc and blog images, and move the
image captions to the alt text.
PiperOrigin-RevId: 311845158
|
|
This change adds support for TCP_SYNCNT and TCP_WINDOW_CLAMP options
in GetSockOpt/SetSockOpt. This change does not really change any
behaviour in Netstack and only stores/returns the stored value.
Actual honoring of these options will be added as required.
Fixes #2626, #2625
PiperOrigin-RevId: 311453777
|
|
PiperOrigin-RevId: 311203776
|
|
PiperOrigin-RevId: 311181084
|
|
kernel.Task.Block() requires that the caller is running on the task goroutine.
netstack.SocketOperations.Write() uses kernel.TaskFromContext() to call
kernel.Task.Block() even if it's not running on the task goroutine. Stop doing
that.
PiperOrigin-RevId: 311178335
|
|
- Added support for matching gid owner and invert flag for uid
and gid.
$ iptables -A OUTPUT -p tcp -m owner --gid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --uid-owner root -j ACCEPT
$ iptables -A OUTPUT -p tcp -m owner ! --gid-owner root -j DROP
- Added tests for uid, gid and invert flags.
|
|
We weren't properly checking whether the inserted default rule was
unconditional.
|
|
Enables commands with -o (--out-interface) for iptables rules.
$ iptables -A OUTPUT -o eth0 -j ACCEPT
PiperOrigin-RevId: 310642286
|
|
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
|
|
Three updates:
- Mark all vfs2 socket syscalls as supported.
- Use the same dev number and ino number generator for all types of sockets,
unlike in VFS1.
- Do not use host fd for hostinet metadata.
Fixes #1476, #1478, #1484, 1485, #2017.
PiperOrigin-RevId: 309994579
|
|
Connection tracking is used to track packets in prerouting and
output hooks of iptables. The NAT rules modify the tuples in
connections. The connection tracking code modifies the packets by
looking at the modified tuples.
|
|
PiperOrigin-RevId: 309491861
|
|
All three follow the same pattern:
1. Refactor VFS1 sockets into socketOpsCommon, so that most of the methods can
be shared with VFS2.
2. Create a FileDescriptionImpl with the corresponding socket operations,
rewriting the few that cannot be shared with VFS1.
3. Set up a VFS2 socket provider that creates a socket by setting up a dentry
in the global Kernel.socketMount and connecting it with a new
FileDescription.
This mostly completes the work for porting sockets to VFS2, and many syscall
tests can be enabled as a result.
There are several networking-related syscall tests that are still not passing:
1. net gofer tests
2. socketpair gofer tests
2. sendfile tests (splice is not implemented in VFS2 yet)
Updates #1478, #1484, #1485
PiperOrigin-RevId: 309457331
|
|
The netfilter package uses logs to make debugging the (de)serialization of
structs easier. This generates a lot of (usually irrelevant) logs. Logging is
now hidden behind a debug flag.
PiperOrigin-RevId: 309087115
|
|
Enforce write permission checks in BoundEndpointAt, which corresponds to the
permission checks in Linux (net/unix/af_unix.c:unix_find_other).
Also, create bound socket files with the correct permissions in VFS2.
Fixes #2324.
PiperOrigin-RevId: 308949084
|
|
PiperOrigin-RevId: 308932254
|
|
Named pipes and sockets can be represented in two ways in gofer fs:
1. As a file on the remote filesystem. In this case, all file operations are
passed through 9p.
2. As a synthetic file that is internal to the sandbox. In this case, the
dentry stores an endpoint or VFSPipe for sockets and pipes respectively,
which replaces interactions with the remote fs through the gofer.
In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p,
and if it fails, fall back to the synthetic version.
Updates #1200.
PiperOrigin-RevId: 308828161
|
|
The FileDescription implementation for hostfs sockets uses the standard Unix
socket implementation (unix.SocketVFS2), but is also tied to a hostfs dentry.
Updates #1672, #1476
PiperOrigin-RevId: 308716426
|
|
PiperOrigin-RevId: 308674219
|
|
Fixes #1477.
PiperOrigin-RevId: 308317511
|
|
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
PiperOrigin-RevId: 308163542
|
|
Sentry metrics with nanoseconds units are labeled as such, and non-cumulative
sentry metrics are supported.
PiperOrigin-RevId: 307621080
|
|
PiperOrigin-RevId: 307598974
|
|
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
|
|
This previously changed in 305699233, but this behaviour turned out to
be load bearing.
PiperOrigin-RevId: 307033802
|
|
This should fix panic at aio callback.
PiperOrigin-RevId: 305798549
|
|
PiperOrigin-RevId: 305699233
|
|
We already have network namespace for netstack.
PiperOrigin-RevId: 305341954
|
|
Updates #1476, #1478, #1484, #1485.
PiperOrigin-RevId: 304845354
|
|
This change involves several steps:
- Refactor the VFS1 unix socket implementation to share methods between VFS1
and VFS2 where possible. Re-implement the rest.
- Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in
FileDescriptionDefaultImpl.
- Add functions to create and initialize a new Dentry/Inode and FileDescription
for a Unix socket file.
Updates #1476
PiperOrigin-RevId: 304689796
|
|
PiperOrigin-RevId: 304508083
|
|
Refactor the existing socket interface to share methods between VFS1 and VFS2.
The method signatures do not contain anything filesystem-related, so they don't
need to be re-defined for VFS2.
Updates #1476, #1478, #1484, #1485.
PiperOrigin-RevId: 304184545
|
|
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
|
|
PiperOrigin-RevId: 302891559
|
|
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.
Updates #2214
PiperOrigin-RevId: 302677662
|
|
Fixes #506
PiperOrigin-RevId: 302540404
|
|
PiperOrigin-RevId: 302539171
|
|
Also get rid of the readViewHasData as it's not required anymore.
Updates #231, #357
PiperOrigin-RevId: 301837227
|
|
workMu is removed and e.mu is now a mutex that supports TryLock. The packet
processing path tries to lock the mutex and if its locked it will just queue the
packet and move on. The endpoint.UnlockUser() will process any backlog of
packets before unlocking the socket.
This simplifies the locking inside tcp endpoints a lot. Further the
endpoint.LockUser() implements spinning as long as the lock is not held by
another syscall goroutine. This ensures low latency as not spinning leads to the
task thread being put to sleep if the lock is held by the packet dispatch
path. This is suboptimal as the lower layer rarely holds the lock for long so
implementing spinning here helps.
If the lock is held by another task goroutine then we just proceed to call
LockUser() and the task could be put to sleep.
The protocol goroutines themselves just call e.mu.Lock() and block if the
lock is currently not available.
Updates #231, #357
PiperOrigin-RevId: 301808349
|
|
PiperOrigin-RevId: 301197007
|