Age | Commit message (Collapse) | Author |
|
* Remove stack.Route from incoming packet path.
There is no need to pass around a stack.Route during the incoming path
of a packet. Instead, pass around the packet's link/network layer
information in the packet buffer since all layers may need this
information.
* Support address bound and outgoing packet NIC in routes.
When forwarding is enabled, the source address of a packet may be bound
to a different interface than the outgoing interface. This change
updates stack.Route to hold both NICs so that one can be used to write
packets while the other is used to check if the route's bound address
is valid. Note, we need to hold the address's interface so we can check
if the address is a spoofed address.
* Introduce the concept of a local route.
Local routes are routes where the packet never needs to leave the stack;
the destination is stack-local. We can now route between interfaces
within a stack if the packet never needs to leave the stack, even when
forwarding is disabled.
* Always obtain a route from the stack before sending a packet.
If a packet needs to be sent in response to an incoming packet, a route
must be obtained from the stack to ensure the stack is configured to
send packets to the packet's source from the packet's destination.
* Enable spoofing if a stack may send packets from unowned addresses.
This change required changes to some netgophers since previously,
promiscuous mode was enough to let the netstack respond to all
incoming packets regardless of the packet's destination address. Now
that a stack.Route is not held for each incoming packet, finding a route
may fail with local addresses we don't own but accepted packets for
while in promiscuous mode. Since we also want to be able to send from
any address (in response the received promiscuous mode packets), we need
to enable spoofing.
* Skip transport layer checksum checks for locally generated packets.
If a packet is locally generated, the stack can safely assume that no
errors were introduced while being locally routed since the packet is
never sent out the wire.
Some bugs fixed:
- transport layer checksum was never calculated after NAT.
- handleLocal didn't handle routing across interfaces.
- stack didn't support forwarding across interfaces.
- always consult the routing table before creating an endpoint.
Updates #4688
Fixes #3906
PiperOrigin-RevId: 340943442
|
|
This was occasionally causing tests to get stuck due to races with the save
process, during which the same mutex is acquired.
PiperOrigin-RevId: 340789616
|
|
The file size can now also be verified. Also, since we are zero-padding
the last block of the data, we cannot differentiate the cases between
zero-padded block from the blocks that are ends with zeroes. With the
size included this can be addressed, as those cases would have different
file size.
PiperOrigin-RevId: 340695510
|
|
PiperOrigin-RevId: 340536306
|
|
The waits-for relationship between an epoll instance and an inotify fd should
be restored.
This fixes flaky inotify vfs2 tests.
PiperOrigin-RevId: 340531367
|
|
The default pipe size already matched linux, and is unchanged.
Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We
were plumbing usermem.PageSize everywhere, so this is no functional change.
PiperOrigin-RevId: 340497006
|
|
PiperOrigin-RevId: 340484823
|
|
Without releasing the mutex, operations on the endpoint following a
nonblocking connect will not make progress until connect is complete.
PiperOrigin-RevId: 340467654
|
|
Use an sErr injection to trigger sigbus when we receive EFAULT from the
run ioctl.
After applying this patch, mmap_test_runsc_kvm will be passed on
Arm64.
Signed-off-by: Bin Lu <bin.lu@arm.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/4542 from lubinszARM:pr_kvm_mmap_1 f81bd42466d1d60a581e5fb34de18b78878c68c1
PiperOrigin-RevId: 340461239
|
|
PiperOrigin-RevId: 340389884
|
|
Don't return the filename, since it can already be determined by the caller.
This was causing a panic in RenameAt, which relied on the name to be nonempty
even if the error was EEXIST.
Reported-by: syzbot+e9f117d000301e42361f@syzkaller.appspotmail.com
PiperOrigin-RevId: 340381946
|
|
Send NUD probes in another gorountine to free the thread of execution for
finishing the state transition. This is necessary to avoid deadlock where
sending and processing probes are done in the same call stack, such as loopback
and integration tests.
Fixes #4701
PiperOrigin-RevId: 340362481
|
|
PiperOrigin-RevId: 340361998
|
|
PiperOrigin-RevId: 340275942
|
|
PiperOrigin-RevId: 340274194
|
|
Fixes: #509
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
PiperOrigin-RevId: 340149214
|
|
In the docker container, the ipv6 loopback address is not set,
and connect("::1") has to return ENEADDRNOTAVAIL in this case.
Without this fix, it returns EHOSTUNREACH.
PiperOrigin-RevId: 340002915
|
|
Read-only directories (e.g. under /sys, /proc) should return EPERM for rename.
PiperOrigin-RevId: 339979022
|
|
The non-errno error was causing panics before.
PiperOrigin-RevId: 339969348
|
|
PiperOrigin-RevId: 339945377
|
|
kernel.copyContext{t} cannot be used outside of t's task goroutine, for three
reasons:
- t.CopyScratchBuffer() is task-goroutine-local.
- Calling t.MemoryManager() without running on t's task goroutine or locking
t.mu violates t.MemoryManager()'s preconditions.
- kernel.copyContext passes t as context.Context to MM IO methods, which is
illegal outside of t's task goroutine (cf. kernel.Task.Value()).
Fix this by splitting AsCopyContext() into CopyContext() (which takes an
explicit context.Context and is usable outside of the task goroutine) and
OwnCopyContext() (which uses t as context.Context, but is only usable by t's
task goroutine).
PiperOrigin-RevId: 339933809
|
|
PiperOrigin-RevId: 339921446
|
|
PiperOrigin-RevId: 339750876
|
|
Fixes #4613.
PiperOrigin-RevId: 339746784
|
|
TCP endpoint unconditionly binds to v4 even when the stack only supports v6.
PiperOrigin-RevId: 339739392
|
|
PiperOrigin-RevId: 339721152
|
|
Refactor TCP handshake code so that when connect is initiated, the initial SYN
is sent before creating a goroutine to handle the rest of the handshake (which
blocks). Similarly, the initial SYN-ACK is sent inline when SYN is received
during accept.
Some additional cleanup is done as well.
Eventually we would like to complete connections in the dispatcher without
requiring a wakeup to complete the handshake. This refactor makes that easier.
Updates #231
PiperOrigin-RevId: 339675182
|
|
Updates #1486.
PiperOrigin-RevId: 339581879
|
|
Also refactor the template and CheckedObject interface to make this cleaner.
Updates #1486.
PiperOrigin-RevId: 339577120
|
|
PiperOrigin-RevId: 339540747
|
|
Updates #1199
PiperOrigin-RevId: 339528827
|
|
Use the stack clock instead. Change NeighborEntry.UpdatedAt to
UpdatedAtNanos.
PiperOrigin-RevId: 339520566
|
|
PiperOrigin-RevId: 339505487
|
|
PiperOrigin-RevId: 339404936
|
|
Signed-off-by: Min Le <lemin.lm@antgroup.com>
|
|
IPv4 options extend the size of the IP header and have a basic known
format. The framework can process that format without needing to know
about every possible option. We can add more code to handle additional
option types as we need them. Bad options or mangled option entries
can result in ICMP Parameter Problem packets. The first types we
support are the Timestamp option and the Record Route option, included
in this change.
The options are processed at several points in the packet flow within
the Network stack, with slightly different requirements. The framework
includes a mechanism to control this at each point. Support has been
added for such points which are only present in upcoming CLs such as
during packet forwarding and fragmentation.
With this change, 'ping -R' and 'ping -T' work against gVisor and Fuchsia.
$ ping -R 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.990 ms
NOP
RR: 192.168.1.1
192.168.1.2
192.168.1.1
$ ping -T tsprespec 192.168.1.2 192.168.1.1 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(124) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.20 ms
TS: 192.168.1.2 71486821 absolute
192.168.1.1 746
Unit tests included for generic options, Timestamp options
and Record Route options.
PiperOrigin-RevId: 339379076
|
|
PiperOrigin-RevId: 339377254
|
|
This change wakes up any waiters when we receive an ICMP port unreachable
control packet on an UDP socket as well as sets waiter.EventErr in
the result returned by Readiness() when e.lastError is not nil.
The latter is required where an epoll()/poll() is done after the error
is already handled since we will never notify again in such cases.
PiperOrigin-RevId: 339370469
|
|
This PR implements /proc/[pid]/mem for `pkg/sentry/fs` (refer to #2716) and `pkg/sentry/fsimpl`.
@majek
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/4060 from lnsp:proc-pid-mem 2caf9021254646f441be618a9bb5528610e44d43
PiperOrigin-RevId: 339369629
|
|
...instead of passing its fields piecemeal.
PiperOrigin-RevId: 339345899
|
|
In VFS1's overlayfs, files use the device and inode number of the lower layer
inode if one exists, and the upper layer inode otherwise. The former behavior
is inefficient (requiring lower layer lookups even if the file exists and is
otherwise wholly determined by the upper layer), and somewhat dangerous if the
lower layer is also observable (since both the overlay and lower layer file
will have the same device and inode numbers and thus appear to be the same
file, despite being behaviorally different). VFS2 overlayfs imitates Linux
overlayfs (in its default configuration) instead; it always uses the inode
number from the originating layer, but synthesizes a unique device number for
directories and another device number for non-directory files that have not
been copied-up.
As it turns out, the latter is insufficient (in VFS2, and possibly Linux as
well), because a given layer may include files with different device numbers.
If two distinct files on such a layer have device number X and Y respectively,
but share inode number Z, then the overlay will map both files to some private
device number X' and inode number Z, potentially confusing applications. Fix
this by assigning synthetic device numbers based on the lower layer's device
number, rather than the lower layer's vfs.Filesystem.
PiperOrigin-RevId: 339300341
|
|
Updates #3921
PiperOrigin-RevId: 339195417
|
|
PiperOrigin-RevId: 339166854
|
|
Also change verity test to use a context with an active task. This is
required to delete/rename the file in the underlying file system.
PiperOrigin-RevId: 339146445
|
|
Much like the VFS2 gofer client, kernfs too now caches dentries. The size of the
LRU cache is configurable via mount options.
Have adopted the same reference semantics from gofer client dentry.
Only sysfs and procfs use this LRU cache. The rest of the kernfs users (devpts,
fusefs, host, pipefs, sockfs) still use the no cache approach.
PiperOrigin-RevId: 339139835
|
|
Control messages collected when peeking into a socket were being leaked.
PiperOrigin-RevId: 339114961
|
|
PiperOrigin-RevId: 338847417
|
|
Updates #1486.
PiperOrigin-RevId: 338832085
|
|
Fixes #4427, #4428
PiperOrigin-RevId: 338805047
|