Age | Commit message (Collapse) | Author |
|
The limits for snd/rcv buffers for unix domain socket is controlled by the
following sysctls on linux
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
Today in gVisor we do not expose these sysctls but we do support setting the
equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor
can be used without netstack, with hostinet or even without any networking stack
at all. Which means ideally these sysctls need to live as globals in gVisor.
But rather than make this a big change for now we hardcode the limits in the
AF_UNIX implementation itself (which in itself is better than where we were
before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial
limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use
today.
Updates #5132
PiperOrigin-RevId: 356665498
|
|
Reported-by: syzbot+9ffc71246fe72c73fc25@syzkaller.appspotmail.com
PiperOrigin-RevId: 356536113
|
|
Individual test cases must not rely on being executed in a clean environment.
PiperOrigin-RevId: 354730126
|
|
PiperOrigin-RevId: 354615220
|
|
Individual test cases must not rely on being executed in a clean environment.
PiperOrigin-RevId: 354604389
|
|
PiperOrigin-RevId: 354367665
|
|
The test was execve itself into `/bin/true`, so the test was
not actually executing.
PiperOrigin-RevId: 353676855
|
|
Fixes #5113.
PiperOrigin-RevId: 353313374
|
|
On ARM64, when ptrace stops on a system call, it uses the x7 register to
indicate whether the stop has been signalled from syscall entry or syscall
exit. This means that we can't get a value of this register and we can't change
it. More details are in the comment for tracehook_report_syscall in
arch/arm64/kernel/ptrace.c.
This happens only if we stop on a system call, so let's queue a signal, resume
a stub thread and catch it on a signal handling.
Fixes: #5238
PiperOrigin-RevId: 352668695
|
|
io.Writer.Write requires err to be non-nil if n < len(v).
We could allow this but it will be irreversible if users depend on this
behavior.
Ported the test that discovered this.
PiperOrigin-RevId: 352065946
|
|
IPv4 was always supported but UDP never supported joining/leaving IPv6
multicast groups via socket options.
Add: IPPROTO_IPV6, IPV6_JOIN_GROUP/IPV6_ADD_MEMBERSHIP
Remove: IPPROTO_IPV6, IPV6_LEAVE_GROUP/IPV6_DROP_MEMBERSHIP
Test: integration_test.TestUDPAddRemoveMembershipSocketOption
PiperOrigin-RevId: 350396072
|
|
PiperOrigin-RevId: 349458984
|
|
open() has to return ENXIO in this case.
O_PATH isn't supported by vfs1.
PiperOrigin-RevId: 348820478
|
|
PiperOrigin-RevId: 348720223
|
|
Fixes #5004
PiperOrigin-RevId: 346643745
|
|
PiperOrigin-RevId: 345976554
|
|
These options allow overriding the signal that gets sent to the process when
I/O operations are available on the file descriptor, rather than the default
`SIGIO` signal. Doing so also populates `siginfo` to contain extra information
about which file descriptor caused the event (`si_fd`) and what events happened
on it (`si_band`). The logic around which FD is populated within `si_fd`
matches Linux's, which means it has some weird edge cases where that value may
not actually refer to a file descriptor that is still valid.
This CL also ports extra S/R logic regarding async handler in VFS2.
Without this, async I/O handlers aren't properly re-registered after S/R.
PiperOrigin-RevId: 345436598
|
|
For now, I only added a halt test case for Arm64.
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
This test fails because it must include additional UIDs. Omit
the bazel sandbox to ensure that it can function correctly.
PiperOrigin-RevId: 343927190
|
|
Store all the socket level options in a struct and call {Get/Set}SockOpt on
this struct. This will avoid implementing socket level options on all
endpoints. This CL contains implementing one socket level option for tcp and
udp endpoints.
PiperOrigin-RevId: 342203981
|
|
Writes to pipes of size < PIPE_BUF are guaranteed to be atomic, so writes
larger than that will return EAGAIN if the pipe has capacity < PIPE_BUF.
Writes to eventfds will return EAGAIN if the write would cause the eventfd
value to go over the max.
In both such cases, calling Ready() on the FD will return true (because it is
possible to write), but specific kinds of writes will in fact return EAGAIN.
This CL fixes an infinite loop in splice and sendfile (VFS1 and VFS2) by
forcing skipping the readiness check for the outfile in send, splice, and tee.
PiperOrigin-RevId: 341102260
|
|
In the docker container, the ipv6 loopback address is not set,
and connect("::1") has to return ENEADDRNOTAVAIL in this case.
Without this fix, it returns EHOSTUNREACH.
PiperOrigin-RevId: 340002915
|
|
kernel.copyContext{t} cannot be used outside of t's task goroutine, for three
reasons:
- t.CopyScratchBuffer() is task-goroutine-local.
- Calling t.MemoryManager() without running on t's task goroutine or locking
t.mu violates t.MemoryManager()'s preconditions.
- kernel.copyContext passes t as context.Context to MM IO methods, which is
illegal outside of t's task goroutine (cf. kernel.Task.Value()).
Fix this by splitting AsCopyContext() into CopyContext() (which takes an
explicit context.Context and is usable outside of the task goroutine) and
OwnCopyContext() (which uses t as context.Context, but is only usable by t's
task goroutine).
PiperOrigin-RevId: 339933809
|
|
PiperOrigin-RevId: 338805321
|
|
Inode number consistency checks are now skipped in save/restore tests for
reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename.
They pass in VFS1 due to the bug described in new test case
SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs.
Fixes #1663
PiperOrigin-RevId: 338776148
|
|
ualarm(2) is obsolete. Move IntervalTimer into a test util, where it can be
used by flock tests.
These tests were flaky with TSAN, probably because it slowed the tests down
enough that the alarm was expiring before flock() was called. Use an interval
timer so that even if we miss the first alarm (or more), flock() is still
guaranteed to be interrupted.
PiperOrigin-RevId: 337578751
|
|
Updates #267
PiperOrigin-RevId: 335713923
|
|
- When the KCOV_ENABLE_TRACE ioctl is called with the trace kind KCOV_TRACE_PC,
the kcov mode should be set to KCOV_*MODE*_TRACE_PC.
- When the owning task of kcov exits, the memory mapping should not be cleared
so it can be used by other tasks.
- Add more tests (also tested on native Linux kcov).
PiperOrigin-RevId: 335202585
|
|
PiperOrigin-RevId: 334419854
|
|
Calls to recv sometimes fail with EAGAIN, so call select beforehand.
PiperOrigin-RevId: 332943156
|
|
This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()).
We don't do the same thing on VFS1 for reasons documented by the updated
comment.
PiperOrigin-RevId: 332514849
|
|
When a broadcast packet is received by the stack, the packet should be
delivered to each endpoint that may be interested in the packet. This
includes all any address and specified broadcast address listeners.
Test: integration_test.TestReuseAddrAndBroadcast
PiperOrigin-RevId: 332060652
|
|
These mostly guard linux-only headers; check for linux instead.
PiperOrigin-RevId: 329362762
|
|
An earlier change considered the loopback bound to all addresses in an
assigned subnet. This should have only be done for IPv4 to maintain
compatability with Linux:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
^C
--- 2001:db8::2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms
$ sudo ip addr add 2001:db8::1/64 dev lo
$ ping 2001:db8::1
PING 2001:db8::1(2001:db8::1) 56 data bytes
64 bytes from 2001:db8::1: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 2001:db8::1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 2001:db8::1: icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from 2001:db8::1: icmp_seq=4 ttl=64 time=0.071 ms
^C
--- 2001:db8::1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.055/0.068/0.074/0.007 ms
$ ping 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
From 2001:db8::1 icmp_seq=1 Destination unreachable: No route
From 2001:db8::1 icmp_seq=2 Destination unreachable: No route
From 2001:db8::1 icmp_seq=3 Destination unreachable: No route
From 2001:db8::1 icmp_seq=4 Destination unreachable: No route
^C
--- 2001:db8::2 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3070ms
```
Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 329011566
|
|
In Linux, a kernel configuration is set that compiles the kernel with a
custom function that is called at the beginning of every basic block, which
updates the memory-mapped coverage information. The Go coverage tool does not
allow us to inject arbitrary instructions into basic blocks, but it does
provide data that we can convert to a kcov-like format and transfer them to
userspace through a memory mapping.
Note that this is not a strict implementation of kcov, which is especially
tricky to do because we do not have the same coverage tools available in Go
that that are available for the actual Linux kernel. In Linux, a kernel
configuration is set that compiles the kernel with a custom function that is
called at the beginning of every basic block to write program counters to the
kcov memory mapping. In Go, however, coverage tools only give us a count of
basic blocks as they are executed. Every time we return to userspace, we
collect the coverage information and write out PCs for each block that was
executed, providing userspace with the illusion that the kcov data is always
up to date. For convenience, we also generate a unique synthetic PC for each
block instead of using actual PCs. Finally, we do not provide thread-specific
coverage data (each kcov instance only contains PCs executed by the thread
owning it); instead, we will supply data for any file specified by --
instrumentation_filter.
Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo
compilation to fail.
PiperOrigin-RevId: 328426526
|
|
When a loopback interface is configurd with an address and associated
subnet, the loopback should treat all addresses in that subnet as an
address it owns.
This is mimicking linux behaviour as seen below:
```
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
^C
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
^C
--- 192.0.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
$ sudo ip addr add 192.0.2.1/24 dev lo
$ ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 192.0.2.1/24 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
$ ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms
^C
--- 192.0.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms
$ ping 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms
^C
--- 192.0.2.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3049ms
rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms
```
Test: integration_test.TestLoopbackAcceptAllInSubnet
PiperOrigin-RevId: 328188546
|
|
It frequently times out under GoTSAN.
PiperOrigin-RevId: 327894343
|
|
Accept 128 + SIGNAL as well as SIGNAL as valid
returns for fork/exec tests.
Also, make changes so that test compiles in opensource. Test
had compile errors on latest Ubuntu 16.04 image with updated bazel to
3.4.0 (as well as base 2.0) used for Kokoro tests.
PiperOrigin-RevId: 327510310
|
|
Tests that we have the correct initial (empty) state for ip6tables.
#3549
PiperOrigin-RevId: 327477657
|
|
Skip InvalidOffset and InvalidLength for Linux as the test is invalid for
later Kernel versions.
Add UnsupportedFile test as this check is in all kernel versions.
PiperOrigin-RevId: 327248035
|
|
PiperOrigin-RevId: 322859907
|
|
Updates #2746
PiperOrigin-RevId: 320757963
|
|
PiperOrigin-RevId: 319283715
|
|
Bring udp_socket_test into complianc by:
- Eliminating IsRunningOnGvisor() invocations.
- Wrapping sockets in RAII FileDescriptor objects.
- Creating a Bind() method so that the first bind happens on port 0.
PiperOrigin-RevId: 318909396
|
|
Also make some fixes to vfs1's F_SETOWN. The fcntl test now entirely passes
on vfs2.
Fixes #2920.
PiperOrigin-RevId: 318669529
|
|
IPv6 raw sockets never include the IPv6 header.
PiperOrigin-RevId: 318582989
|
|
Because there is no inode structure stored in the sandbox, inotify watches
must be held on the dentry. This would be an issue in the presence of hard
links, where multiple dentries would need to share the same set of watches,
but in VFS2, we do not support the internal creation of hard links on gofer
fs. As a result, we make the assumption that every dentry corresponds to a
unique inode.
Furthermore, dentries can be cached and then evicted, even if the underlying
file has not be deleted. We must prevent this from occurring if there are any
watches that would be lost. Note that if the dentry was deleted or invalidated
(d.vfsd.IsDead()), we should still destroy it along with its watches.
Additionally, when a dentry’s last watch is removed, we cache it if it also
has zero references. This way, the dentry can eventually be evicted from
memory if it is no longer needed. This is accomplished with a new dentry
method, OnZeroWatches(), which is called by Inotify.RmWatch and
Inotify.Release. Note that it must be called after all inotify locks are
released to avoid violating lock order. Stress tests are added to make sure
that inotify operations don't deadlock with gofer.OnZeroWatches.
Updates #1479.
PiperOrigin-RevId: 317958034
|
|
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to
take {start,length,whence}, so the correct offset can be
calculated in the implementations.
- Create PosixLocker interface to make it possible to share
the same locking code from different implementations.
Closes #1480
PiperOrigin-RevId: 316910286
|
|
Also fix test bugs uncovered now that they aren't silently skipped on
VFS2.
Updates #1487.
PiperOrigin-RevId: 316415807
|
|
PiperOrigin-RevId: 315979564
|