Age | Commit message | Author |
|
A deadlock may occur if a write lock on a RWMutex is blocked between
nested read lock attempts as the inner read lock attempt will be
blocked in this scenario.
Example (T1 and T2 are different goroutines):
T1: obtain read-lock
T2: attempt write-lock (blocks)
T1: attempt inner/nested read-lock (blocks)
Here we can see that T1 and T2 are deadlocked.
Tests: Existing tests pass.
PiperOrigin-RevId: 298426678
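The scenario above follows from the documented behavior of Go's sync.RWMutex:
once a writer is waiting, new RLock calls block, so recursive read-locking is
not safe. The commit message does not say how the code was restructured; the
following is only a minimal sketch of one common way to avoid nested read locks,
using a hypothetical registry type and a "Locked" helper that expects the caller
to already hold the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical type used only for illustration; it is not
// taken from the gVisor source.
type registry struct {
	mu    sync.RWMutex
	items map[string]int
}

// getLocked reads one entry. The caller must already hold r.mu
// (for reading or writing), so no lock is taken here.
func (r *registry) getLocked(key string) (int, bool) {
	v, ok := r.items[key]
	return v, ok
}

// Get takes the read lock exactly once and delegates to getLocked.
func (r *registry) Get(key string) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.getLocked(key)
}

// Sum holds the read lock once for the whole loop. It calls getLocked
// rather than Get, so it never attempts an inner RLock that could block
// behind a writer queued in Set.
func (r *registry) Sum(keys ...string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	total := 0
	for _, k := range keys {
		if v, ok := r.getLocked(k); ok {
			total += v
		}
	}
	return total
}

// Set takes the write lock.
func (r *registry) Set(key string, v int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[key] = v
}

func main() {
	r := &registry{items: map[string]int{}}
	r.Set("a", 1)
	r.Set("b", 2)
	fmt.Println(r.Sum("a", "b", "missing")) // prints 3
}
```

Had Sum called Get inside the loop, a concurrent Set arriving between the outer
and inner RLock would reproduce the T1/T2 deadlock described in the message.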
|
|
PiperOrigin-RevId: 298405064
|
|
DATA RACE in netstack.(*SocketOperations).fetchReadView
Write at 0x00c001dca138 by goroutine 1001:
gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).fetchReadView()
pkg/sentry/socket/netstack/netstack.go:418 +0x85
gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).coalescingRead()
pkg/sentry/socket/netstack/netstack.go:2309 +0x67
gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).nonBlockingRead()
pkg/sentry/socket/netstack/netstack.go:2378 +0x183d
Previous read at 0x00c001dca138 by goroutine 1111:
gvisor.dev/gvisor/pkg/sentry/socket/netstack.(*SocketOperations).Ioctl()
pkg/sentry/socket/netstack/netstack.go:2666 +0x533
gvisor.dev/gvisor/pkg/sentry/syscalls/linux.Ioctl()
Reported-by: syzbot+d4c3885fcc346f08deb6@syzkaller.appspotmail.com
PiperOrigin-RevId: 298387377
|
|
PiperOrigin-RevId: 298380654
|
|
PiperOrigin-RevId: 297982488
|
|
This is needed for syzkaller to properly classify issues.
Right now, all watchdog issues are duped to one with the
subject "panic: Sentry detected stuck task(s). See stack
trace and message above for more details".
PiperOrigin-RevId: 297975363
|
|
There is no cpuid instruction on arm64, so we need to define it
just to avoid a compile-time error.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
A follow-up change will convert the networking code to use this standard
pipe implementation.
PiperOrigin-RevId: 297903206
|
|
/dev/net/tun does not currently work with hostinet. This has caused some
programs to start failing because they think the feature exists.
PiperOrigin-RevId: 297876196
|
|
We changed syscalls to allow dup3 for ARM64.
Updates #1198
PiperOrigin-RevId: 297870816
|
|
Call stack.Close on stacks when we are done with them in tcp_test. This avoids
leaking resources and reduces the test's flakiness when race/gotsan is enabled.
It also provides test coverage for the race also fixed in this change, which
can be reliably triggered with the stack.Close change (and without the other
changes) when race/gotsan is enabled.
The race was possible when calling Abort (via stack.Close) on an endpoint
processing a SYN segment as part of a passive connect.
Updates #1564
PiperOrigin-RevId: 297685432
|
|
PiperOrigin-RevId: 297674924
|
|
PiperOrigin-RevId: 297638665
|
|
PiperOrigin-RevId: 297494373
|
|
PiperOrigin-RevId: 297492004
|
|
Analogous to Linux's kern_mount().
PiperOrigin-RevId: 297259580
|
|
MayDelete must lock the directory also, otherwise concurrent renames may
race. Note that this also changes the methods to be aligned with the actual
Remove and RemoveDirectory methods to minimize confusion when reading the
code. (It was hard to see that resolution was correct.)
PiperOrigin-RevId: 297258304
|
|
PiperOrigin-RevId: 297230721
|
|
PiperOrigin-RevId: 297220008
|
|
Tests:
- header_test.TestIsV6LinkLocalMulticastAddress
- header_test.TestScopeForIPv6Address
- stack_test.TestIPv6SourceAddressSelectionScopeAndSameAddress
PiperOrigin-RevId: 297215576
|
|
PiperOrigin-RevId: 297192390
|
|
PiperOrigin-RevId: 297191168
|
|
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2.
mount and umount2 aren't ported out of temporary laziness. access and faccessat
need additional FSImpl methods to implement properly, but are stubbed to
prevent googletest from CHECK-failing. Other syscalls require additional
plumbing.
Updates #1623
PiperOrigin-RevId: 297188448
|
|
PiperOrigin-RevId: 297175316
|
|
Fixes #1049
PiperOrigin-RevId: 297175164
|
|
Each mount holds a reference on a root Dirent, but the mount itself may
live beyond its own reference. This means that a call to Root() can come
after the associated reference has been dropped.
Instead of introducing a separate layer of references for mount objects,
we simply change the Root() method to use TryIncRef() and allow it to return
nil if the mount is already gone. This requires updating a small number of
callers and minimizes the change (since VFSv2 will replace this code shortly).
PiperOrigin-RevId: 297174230
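The TryIncRef() approach described above can be sketched as follows. This is an
illustration under stated assumptions, not the actual gVisor code: the refCount,
dirent, and mount types here are hypothetical stand-ins, and only the method
names TryIncRef and Root come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCount is a hypothetical reference counter for illustration.
type refCount struct {
	refs int64
}

// TryIncRef takes a reference only if the count is still positive.
// It returns false once the last reference has been dropped.
func (r *refCount) TryIncRef() bool {
	for {
		n := atomic.LoadInt64(&r.refs)
		if n <= 0 {
			return false
		}
		if atomic.CompareAndSwapInt64(&r.refs, n, n+1) {
			return true
		}
	}
}

// DecRef drops one reference.
func (r *refCount) DecRef() {
	atomic.AddInt64(&r.refs, -1)
}

// dirent and mount are hypothetical stand-ins for the real types.
type dirent struct {
	refCount
	name string
}

type mount struct {
	root *dirent
}

// Root returns the root dirent with an extra reference held, or nil if
// the mount's own reference has already been dropped.
func (m *mount) Root() *dirent {
	if !m.root.TryIncRef() {
		return nil
	}
	return m.root
}

func main() {
	d := &dirent{refCount: refCount{refs: 1}, name: "/"}
	m := &mount{root: d}

	r := m.Root()
	fmt.Println(r != nil) // prints true: the mount is still alive
	r.DecRef()            // caller releases the reference it obtained

	d.DecRef()                   // the mount's own reference is dropped
	fmt.Println(m.Root() == nil) // prints true: callers must handle nil
}
```

Callers then check for nil instead of assuming that Root() always succeeds,
which is the "small number of callers" update the message refers to.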
|
|
TestCurrentConnectedIncrement fails consistently under gotsan because the sleep
before checking metrics is exactly the same as the TIME-WAIT duration. Under
gotsan things can be slow enough that the increment test is done before the
protocol goroutine runs after the TIME-WAIT timer expires and does its cleanup.
Increasing the sleep from 1s to 1.2s makes the test pass consistently.
PiperOrigin-RevId: 297160181
|
|
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.
Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.
PiperOrigin-RevId: 296922548
|
|
PiperOrigin-RevId: 296526279
|
|
Tests: stack_test.TestAttachToLinkEndpointImmediately
PiperOrigin-RevId: 296474068
|
|
Test: stack_test.TestRouterSolicitation
PiperOrigin-RevId: 296454766
|
|
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets
will have the same behavior as they do now.
Until userspace is able to create devices, the default loopback device can
be used for testing.
/proc/net and /sys/net will still be connected to the root network stack; this
is the same as the current behavior.
Issue #1833
PiperOrigin-RevId: 296309389
|
|
- Disabled NICs will have their associated NDP state cleared.
- Disabled NICs will not accept incoming packets.
- Writes through a Route with a disabled NIC will return an invalid
endpoint state error.
- stack.Stack.FindRoute will not return a route with a disabled NIC.
- NIC's Running flag will report the NIC's enabled status.
Tests:
- stack_test.TestDisableUnknownNIC
- stack_test.TestDisabledNICsNICInfoAndCheckNIC
- stack_test.TestRoutesWithDisabledNIC
- stack_test.TestRouteWritePacketWithDisabledNIC
- stack_test.TestStopStartSolicitingRouters
- stack_test.TestCleanupNDPState
- stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable
- stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable
PiperOrigin-RevId: 296298588
|
|
Users of the API only care about whether the copy in/out succeeds in
their entirety, which is already signalled by the returned error.
PiperOrigin-RevId: 296297843
|
|
Example:
epoll_ctl(0x3 anon_inode:[eventpoll], EPOLL_CTL_ADD, 0x6 anon_inode:[eventfd], 0x7efe2fd92a80 {events=EPOLLIN|EPOLLOUT data=0x10203040506070a}) = 0x0 (4.411µs)
epoll_wait(0x3 anon_inode:[eventpoll], 0x7efe2fd92b50 {{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}}, 0x3, 0xffffffff) = 0x3 (29.891µs)
PiperOrigin-RevId: 296258146
|
|
This was inadvertently dropped by cl/295811743.
PiperOrigin-RevId: 296254482
|
|
tmpfs.fileDescription now implements ConfigureMMap, and tmpfs.regularFile
implements memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs.
PiperOrigin-RevId: 296234557
|
|
This patch defines the structures and
adds the implementations for fpsimd initialization.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
Consistent with QEMU, getUserRegisters() should be an arch-specific
function. So, it should be called in dieArchSetup().
With this patch and the pagetable/pcid patch, the kvm modules on Arm64 can be
built successfully.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 296088213
|
|
Added the ability to get/set the IP_RECVTCLASS socket option on UDP endpoints.
If enabled, the traffic class from the incoming network header is passed as
ancillary data in the ControlMessages.
Also added Get/SetSockOptBool to decrease the overhead of getting/setting simple
options. (This was absorbed in a CL that will land before this one.)
Test:
* Added a unit test to udp_test.go that tests getting/setting as well as
verifying that we receive the expected TOS from an incoming packet.
* Added a syscall test for verifying getting/setting.
* Removed the test skip for an existing syscall test to enable end-to-end testing.
PiperOrigin-RevId: 295840218
|
|
Package syncevent is intended to subsume ~all uses of channels in the sentry
(including //pkg/waiter), as well as //pkg/sleep.
Compared to channels:
- Delivery of events to a syncevent.Receiver allows *synchronous* execution of
an arbitrary callback, whereas delivery of events to a channel requires a
goroutine to receive from that channel, resulting in substantial scheduling
overhead. (This is also part of the motivation for the waiter package.)
- syncevent.Waiter can wait on multiple event sources without the high O(N)
overhead of select. (This is the same motivation as for the sleep package.)
Compared to the waiter package:
- syncevent.Waiters are intended to be persistent (i.e. per-kernel.Task), and
syncevent.Broadcaster (analogous to waiter.Queue) is a hash table rather than
a linked list, such that blocking is (usually) allocation-free.
- syncevent.Source (analogous to waiter.Waitable) does not include an equivalent
to waiter.Waitable.Readiness(), since this is inappropriate for transient
events (see e.g. //pkg/sentry/kernel/time.ClockEventSource).
Compared to the sleep package:
- syncevent events are represented by bits in a bitmask rather than discrete
sleep.Waker objects, reducing overhead and making it feasible to broadcast
events to multiple syncevent.Receivers.
- syncevent.Receiver invokes an arbitrary callback, which is required by the
sentry's epoll implementation. (syncevent.Waiter, which is analogous to
sleep.Sleeper, pairs a syncevent.Receiver with a callback that wakes a
waiting goroutine; the implementation of this aspect is nearly identical to
that of sleep.Sleeper, except that it represents *runtime.g as unsafe.Pointer
rather than uintptr.)
- syncevent.Waiter.Wait (analogous to sleep.Sleeper.Fetch(block=true)) does not
automatically un-assert returned events. This is useful in cases where the
path for handling an event is not the same as the path that observes it, such
as for application signals (a la Linux's TIF_SIGPENDING).
- Unlike sleep.Sleeper, which Fetches Wakers in the order that they were
Asserted, the event bitmasks used by syncevent.Receiver have no way of
preserving event arrival order. (This is similar to select, which goes out of
its way to randomize event ordering.)
The disadvantage of the syncevent package is that, since events are represented
by bits in a uint64 bitmask, each syncevent.Receiver can "only" multiplex
between 64 distinct events; this does not affect any known use case.
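To make the bitmask representation concrete, here is a minimal sketch of how
events can be asserted, observed, and acknowledged as bits in a uint64. It is
not the syncevent API: the Set type, the event constants, and the receiver
type with its notify/pendingSet/ack methods are all hypothetical names chosen
for illustration, and the callback and goroutine-wakeup machinery is omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Set is a bitmask of up to 64 distinct events (hypothetical name).
type Set uint64

// Hypothetical event bits.
const (
	EvReadable Set = 1 << iota
	EvWritable
	EvSignal
)

// receiver is a hypothetical holder of pending events.
type receiver struct {
	pending uint64
}

// notify asserts the given events. It returns true only if at least one
// of them was not already pending, so redundant notifications are cheap.
func (r *receiver) notify(es Set) bool {
	for {
		old := atomic.LoadUint64(&r.pending)
		if old|uint64(es) == old {
			return false
		}
		if atomic.CompareAndSwapUint64(&r.pending, old, old|uint64(es)) {
			return true
		}
	}
}

// pendingSet returns the pending events without clearing them, so the
// path that observes an event need not be the path that handles it.
func (r *receiver) pendingSet() Set {
	return Set(atomic.LoadUint64(&r.pending))
}

// ack un-asserts the given events.
func (r *receiver) ack(es Set) {
	for {
		old := atomic.LoadUint64(&r.pending)
		if atomic.CompareAndSwapUint64(&r.pending, old, old&^uint64(es)) {
			return
		}
	}
}

func main() {
	var r receiver
	fmt.Println(r.notify(EvReadable | EvSignal)) // prints true
	fmt.Println(r.notify(EvReadable))            // prints false (redundant)
	fmt.Printf("%03b\n", r.pendingSet())         // prints 101
	r.ack(EvReadable)
	fmt.Printf("%03b\n", r.pendingSet()) // prints 100
}
```

Because pending events are merged into one word, the sketch also shows why
arrival order cannot be recovered and why a receiver is limited to 64 events.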
Benchmarks:
BenchmarkBroadcasterSubscribeUnsubscribe
BenchmarkBroadcasterSubscribeUnsubscribe-12 45133884 26.3 ns/op
BenchmarkMapSubscribeUnsubscribe
BenchmarkMapSubscribeUnsubscribe-12 28504662 41.8 ns/op
BenchmarkQueueSubscribeUnsubscribe
BenchmarkQueueSubscribeUnsubscribe-12 22747668 45.6 ns/op
BenchmarkBroadcasterSubscribeUnsubscribeBatch
BenchmarkBroadcasterSubscribeUnsubscribeBatch-12 31609177 37.8 ns/op
BenchmarkMapSubscribeUnsubscribeBatch
BenchmarkMapSubscribeUnsubscribeBatch-12 17563906 62.1 ns/op
BenchmarkQueueSubscribeUnsubscribeBatch
BenchmarkQueueSubscribeUnsubscribeBatch-12 26248838 46.6 ns/op
BenchmarkBroadcasterBroadcastRedundant
BenchmarkBroadcasterBroadcastRedundant/0
BenchmarkBroadcasterBroadcastRedundant/0-12 100907563 11.8 ns/op
BenchmarkBroadcasterBroadcastRedundant/1
BenchmarkBroadcasterBroadcastRedundant/1-12 85103068 13.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/4
BenchmarkBroadcasterBroadcastRedundant/4-12 52716502 22.3 ns/op
BenchmarkBroadcasterBroadcastRedundant/16
BenchmarkBroadcasterBroadcastRedundant/16-12 20278165 58.7 ns/op
BenchmarkBroadcasterBroadcastRedundant/64
BenchmarkBroadcasterBroadcastRedundant/64-12 5905428 205 ns/op
BenchmarkMapBroadcastRedundant
BenchmarkMapBroadcastRedundant/0
BenchmarkMapBroadcastRedundant/0-12 87532734 13.5 ns/op
BenchmarkMapBroadcastRedundant/1
BenchmarkMapBroadcastRedundant/1-12 28488411 36.3 ns/op
BenchmarkMapBroadcastRedundant/4
BenchmarkMapBroadcastRedundant/4-12 19628920 60.9 ns/op
BenchmarkMapBroadcastRedundant/16
BenchmarkMapBroadcastRedundant/16-12 6026980 192 ns/op
BenchmarkMapBroadcastRedundant/64
BenchmarkMapBroadcastRedundant/64-12 1640858 754 ns/op
BenchmarkQueueBroadcastRedundant
BenchmarkQueueBroadcastRedundant/0
BenchmarkQueueBroadcastRedundant/0-12 96904807 12.0 ns/op
BenchmarkQueueBroadcastRedundant/1
BenchmarkQueueBroadcastRedundant/1-12 73521873 16.3 ns/op
BenchmarkQueueBroadcastRedundant/4
BenchmarkQueueBroadcastRedundant/4-12 39209468 31.2 ns/op
BenchmarkQueueBroadcastRedundant/16
BenchmarkQueueBroadcastRedundant/16-12 10810058 105 ns/op
BenchmarkQueueBroadcastRedundant/64
BenchmarkQueueBroadcastRedundant/64-12 2998046 376 ns/op
BenchmarkBroadcasterBroadcastAck
BenchmarkBroadcasterBroadcastAck/1
BenchmarkBroadcasterBroadcastAck/1-12 44472397 26.4 ns/op
BenchmarkBroadcasterBroadcastAck/4
BenchmarkBroadcasterBroadcastAck/4-12 17653509 69.7 ns/op
BenchmarkBroadcasterBroadcastAck/16
BenchmarkBroadcasterBroadcastAck/16-12 4082617 260 ns/op
BenchmarkBroadcasterBroadcastAck/64
BenchmarkBroadcasterBroadcastAck/64-12 1220534 1027 ns/op
BenchmarkMapBroadcastAck
BenchmarkMapBroadcastAck/1
BenchmarkMapBroadcastAck/1-12 26760705 44.2 ns/op
BenchmarkMapBroadcastAck/4
BenchmarkMapBroadcastAck/4-12 11495636 100 ns/op
BenchmarkMapBroadcastAck/16
BenchmarkMapBroadcastAck/16-12 2937590 343 ns/op
BenchmarkMapBroadcastAck/64
BenchmarkMapBroadcastAck/64-12 861037 1344 ns/op
BenchmarkQueueBroadcastAck
BenchmarkQueueBroadcastAck/1
BenchmarkQueueBroadcastAck/1-12 19832679 55.0 ns/op
BenchmarkQueueBroadcastAck/4
BenchmarkQueueBroadcastAck/4-12 5618214 189 ns/op
BenchmarkQueueBroadcastAck/16
BenchmarkQueueBroadcastAck/16-12 1569980 713 ns/op
BenchmarkQueueBroadcastAck/64
BenchmarkQueueBroadcastAck/64-12 437672 2814 ns/op
BenchmarkWaiterNotifyRedundant
BenchmarkWaiterNotifyRedundant-12 650823090 1.96 ns/op
BenchmarkSleeperNotifyRedundant
BenchmarkSleeperNotifyRedundant-12 619871544 1.61 ns/op
BenchmarkChannelNotifyRedundant
BenchmarkChannelNotifyRedundant-12 298903778 3.67 ns/op
BenchmarkWaiterNotifyWaitAck
BenchmarkWaiterNotifyWaitAck-12 68358360 17.8 ns/op
BenchmarkSleeperNotifyWaitAck
BenchmarkSleeperNotifyWaitAck-12 25044883 41.2 ns/op
BenchmarkChannelNotifyWaitAck
BenchmarkChannelNotifyWaitAck-12 29572404 40.2 ns/op
BenchmarkSleeperMultiNotifyWaitAck
BenchmarkSleeperMultiNotifyWaitAck-12 16122969 73.8 ns/op
BenchmarkWaiterTempNotifyWaitAck
BenchmarkWaiterTempNotifyWaitAck-12 46111489 25.8 ns/op
BenchmarkSleeperTempNotifyWaitAck
BenchmarkSleeperTempNotifyWaitAck-12 15541882 73.6 ns/op
BenchmarkWaiterNotifyWaitMultiAck
BenchmarkWaiterNotifyWaitMultiAck-12 65878500 18.2 ns/op
BenchmarkSleeperNotifyWaitMultiAck
BenchmarkSleeperNotifyWaitMultiAck-12 28798623 41.5 ns/op
BenchmarkChannelNotifyWaitMultiAck
BenchmarkChannelNotifyWaitMultiAck-12 11308468 101 ns/op
BenchmarkWaiterNotifyAsyncWaitAck
BenchmarkWaiterNotifyAsyncWaitAck-12 2475387 492 ns/op
BenchmarkSleeperNotifyAsyncWaitAck
BenchmarkSleeperNotifyAsyncWaitAck-12 2184507 518 ns/op
BenchmarkChannelNotifyAsyncWaitAck
BenchmarkChannelNotifyAsyncWaitAck-12 2120365 562 ns/op
BenchmarkWaiterNotifyAsyncWaitMultiAck
BenchmarkWaiterNotifyAsyncWaitMultiAck-12 2351247 494 ns/op
BenchmarkSleeperNotifyAsyncWaitMultiAck
BenchmarkSleeperNotifyAsyncWaitMultiAck-12 2205799 522 ns/op
BenchmarkChannelNotifyAsyncWaitMultiAck
BenchmarkChannelNotifyAsyncWaitMultiAck-12 1238079 928 ns/op
Updates #1074
PiperOrigin-RevId: 295834087
|
|
perf shows that ExtendedStateSize consumes more than 20% of cpu:
23.61% 23.61% [.] pkg/cpuid/cpuid.HostID
PiperOrigin-RevId: 295813263
|
|
- Redocument memory ordering from "no ordering" to "acquire-release". (No
functional change: both LOCK WHATEVER on x86, and LDAXR/STLXR loops on ARM64,
already have this property.)
- Remove IncUnlessZeroInt32 and DecUnlessOneInt32, which were only faster than
the equivalent loops using sync/atomic before the Go compiler inlined
non-unsafe.Pointer atomics many releases ago.
PiperOrigin-RevId: 295811743
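For reference, the "equivalent loops using sync/atomic" mentioned above could
look roughly like the following. The semantics are inferred from the names of
the removed functions, so treat this as a sketch under that assumption; the
lowercase function names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// incUnlessZero increments *addr unless it is zero, and reports whether
// the increment happened. (Semantics assumed from the function name.)
func incUnlessZero(addr *int32) bool {
	for {
		v := atomic.LoadInt32(addr)
		if v == 0 {
			return false
		}
		if atomic.CompareAndSwapInt32(addr, v, v+1) {
			return true
		}
	}
}

// decUnlessOne decrements *addr unless it is one, and reports whether
// the decrement happened. (Semantics assumed from the function name.)
func decUnlessOne(addr *int32) bool {
	for {
		v := atomic.LoadInt32(addr)
		if v == 1 {
			return false
		}
		if atomic.CompareAndSwapInt32(addr, v, v-1) {
			return true
		}
	}
}

func main() {
	refs := int32(1)

	ok := incUnlessZero(&refs)
	fmt.Println(ok, refs) // prints true 2

	ok = decUnlessOne(&refs)
	fmt.Println(ok, refs) // prints true 1

	ok = decUnlessOne(&refs)
	fmt.Println(ok, refs) // prints false 1
}
```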
|
|
PiperOrigin-RevId: 295785052
|
|
PiperOrigin-RevId: 295770717
|
|
glibc defines struct epoll_event in such a way that epoll_event.data.fd exists.
However, the kernel's definition of struct epoll_event makes epoll_event.data
an opaque uint64, so naming half of it "fd" just introduces confusion. Remove
the Fd field, and make Data a [2]int32 to compensate.
Also add required padding to linux.EpollEvent on ARM64.
PiperOrigin-RevId: 295250424
|
|
This is to fix a data race between sending an external signal to
a ThreadGroup and kernel saving state for S/R.
PiperOrigin-RevId: 295244281
|