Age | Commit message (Collapse) | Author |
|
These options allow overriding the signal that gets sent to the process when
I/O operations are available on the file descriptor, rather than the default
`SIGIO` signal. Doing so also populates `siginfo` to contain extra information
about which file descriptor caused the event (`si_fd`) and what events happened
on it (`si_band`). The logic around which FD is populated within `si_fd`
matches Linux's, which means it has some weird edge cases where that value may
not actually refer to a file descriptor that is still valid.
This CL also ports extra S/R logic regarding async handler in VFS2.
Without this, async I/O handlers aren't properly re-registered after S/R.
PiperOrigin-RevId: 345436598
|
|
Previous experience has shown that these types of wrappers tends to create two
kinds of problems: hidden allocations (e.g. each call to
FileReadWriteSeeker.Read/Write allocates a usermem.BytesIO on the heap) and
hidden lock ordering problems (e.g. VFS1 splice deadlocks). Since this is only
needed by fsimpl/verity, move it there.
PiperOrigin-RevId: 345377830
|
|
`slice := *(*[]unsafe.Pointer)(...)` makes a copy of the slice header, which
then escapes because of the conditional `atomic.StorePointer(&f.slice, &slice)`
from table expansion. This occurs even when the table doesn't expand, and when
it can't (e.g. `close()` => `f.setAll(nil)`). Fix this by avoiding the copy
until after table expansion.
Before this CL:
```
TEXT pkg/sentry/kernel/kernel.(*FDTable).setAll(SB) pkg/sentry/kernel/fd_table_unsafe.go
fd_table_unsafe.go:119 0x7f00005f50e0 64488b0c25f8ffffff MOVQ FS:0xfffffff8, CX
fd_table_unsafe.go:119 0x7f00005f50e9 483b6110 CMPQ 0x10(CX), SP
fd_table_unsafe.go:119 0x7f00005f50ed 0f864d040000 JBE 0x7f00005f5540
fd_table_unsafe.go:119 0x7f00005f50f3 4883c480 ADDQ $-0x80, SP
fd_table_unsafe.go:119 0x7f00005f50f7 48896c2478 MOVQ BP, 0x78(SP)
fd_table_unsafe.go:119 0x7f00005f50fc 488d6c2478 LEAQ 0x78(SP), BP
fd_table_unsafe.go:120 0x7f00005f5101 488b8424a8000000 MOVQ 0xa8(SP), AX
fd_table_unsafe.go:120 0x7f00005f5109 4885c0 TESTQ AX, AX
fd_table_unsafe.go:120 0x7f00005f510c 7411 JE 0x7f00005f511f
fd_table_unsafe.go:120 0x7f00005f510e 488b8c24b0000000 MOVQ 0xb0(SP), CX
fd_table_unsafe.go:120 0x7f00005f5116 4885c9 TESTQ CX, CX
fd_table_unsafe.go:120 0x7f00005f5119 0f8500040000 JNE 0x7f00005f551f
fd_table_unsafe.go:124 0x7f00005f511f 488d05da115700 LEAQ 0x5711da(IP), AX
fd_table_unsafe.go:124 0x7f00005f5126 48890424 MOVQ AX, 0(SP)
fd_table_unsafe.go:124 0x7f00005f512a e8d19fa1ff CALL runtime.newobject(SB)
fd_table_unsafe.go:124 0x7f00005f512f 488b7c2408 MOVQ 0x8(SP), DI
fd_table_unsafe.go:124 0x7f00005f5134 488b842488000000 MOVQ 0x88(SP), AX
fd_table_unsafe.go:124 0x7f00005f513c 488b4820 MOVQ 0x20(AX), CX
fd_table_unsafe.go:124 0x7f00005f5140 488b5108 MOVQ 0x8(CX), DX
fd_table_unsafe.go:124 0x7f00005f5144 488b19 MOVQ 0(CX), BX
fd_table_unsafe.go:124 0x7f00005f5147 488b4910 MOVQ 0x10(CX), CX
fd_table_unsafe.go:124 0x7f00005f514b 48895708 MOVQ DX, 0x8(DI)
fd_table_unsafe.go:124 0x7f00005f514f 48894f10 MOVQ CX, 0x10(DI)
fd_table_unsafe.go:124 0x7f00005f5153 833df6e1120100 CMPL $0x0, runtime.writeBarrier(SB)
fd_table_unsafe.go:124 0x7f00005f515a 660f1f440000 NOPW 0(AX)(AX*1)
fd_table_unsafe.go:124 0x7f00005f5160 0f8589030000 JNE 0x7f00005f54ef
fd_table_unsafe.go:124 0x7f00005f5166 48891f MOVQ BX, 0(DI)
fd_table_unsafe.go:124 0x7f00005f5169 48897c2470 MOVQ DI, 0x70(SP)
fd_table_unsafe.go:127 0x7f00005f516e 8bb424a0000000 MOVL 0xa0(SP), SI
fd_table_unsafe.go:127 0x7f00005f5175 39d6 CMPL DX, SI
fd_table_unsafe.go:127 0x7f00005f5177 0f8c5f030000 JL 0x7f00005f54dc
...
```
After this CL:
```
TEXT pkg/sentry/kernel/kernel.(*FDTable).setAll(SB) pkg/sentry/kernel/fd_table_unsafe.go
fd_table_unsafe.go:119 0x7f00005f50e0 64488b0c25f8ffffff MOVQ FS:0xfffffff8, CX
fd_table_unsafe.go:119 0x7f00005f50e9 488d4424e8 LEAQ -0x18(SP), AX
fd_table_unsafe.go:119 0x7f00005f50ee 483b4110 CMPQ 0x10(CX), AX
fd_table_unsafe.go:119 0x7f00005f50f2 0f868e040000 JBE 0x7f00005f5586
fd_table_unsafe.go:119 0x7f00005f50f8 4881ec98000000 SUBQ $0x98, SP
fd_table_unsafe.go:119 0x7f00005f50ff 4889ac2490000000 MOVQ BP, 0x90(SP)
fd_table_unsafe.go:119 0x7f00005f5107 488dac2490000000 LEAQ 0x90(SP), BP
fd_table_unsafe.go:120 0x7f00005f510f 488b9424c0000000 MOVQ 0xc0(SP), DX
fd_table_unsafe.go:120 0x7f00005f5117 660f1f840000000000 NOPW 0(AX)(AX*1)
fd_table_unsafe.go:120 0x7f00005f5120 4885d2 TESTQ DX, DX
fd_table_unsafe.go:120 0x7f00005f5123 0f8406040000 JE 0x7f00005f552f
fd_table_unsafe.go:120 0x7f00005f5129 488b9c24c8000000 MOVQ 0xc8(SP), BX
fd_table_unsafe.go:120 0x7f00005f5131 4885db TESTQ BX, BX
fd_table_unsafe.go:120 0x7f00005f5134 0f852b040000 JNE 0x7f00005f5565
fd_table_unsafe.go:124 0x7f00005f513a 488bb424a0000000 MOVQ 0xa0(SP), SI
fd_table_unsafe.go:124 0x7f00005f5142 488b7e20 MOVQ 0x20(SI), DI
fd_table_unsafe.go:127 0x7f00005f5146 4c8b4708 MOVQ 0x8(DI), R8
fd_table_unsafe.go:127 0x7f00005f514a 448b8c24b8000000 MOVL 0xb8(SP), R9
fd_table_unsafe.go:127 0x7f00005f5152 4539c1 CMPL R8, R9
fd_table_unsafe.go:127 0x7f00005f5155 0f8d4a020000 JGE 0x7f00005f53a5
...
```
PiperOrigin-RevId: 345363242
|
|
This change lets us split the v4 stats from the v6 stats, which will be
useful when adding stats for each network endpoint.
PiperOrigin-RevId: 345322615
|
|
Moved AddressAndFamily() and ConvertAddress() to socket package from netstack.
This helps because these utilities are used by sibling netstack packages.
Such sibling dependencies can later cause circular dependencies. Common utils
shared between siblings should be moved up to the parent.
PiperOrigin-RevId: 345275571
|
|
Refactor some utilities and rename some others for clarity.
PiperOrigin-RevId: 345247836
|
|
PiperOrigin-RevId: 345178956
|
|
PiperOrigin-RevId: 344958513
|
|
Ports the following options:
- TCP_NODELAY
- TCP_CORK
- TCP_QUICKACK
Also deletes the {Get/Set}SockOptBool interface methods from all implementations
PiperOrigin-RevId: 344378824
|
|
We will use SocketOptions for all kinds of options, not just SOL_SOCKET options
because (1) it is consistent with Linux which defines all option variables on
the top level socket struct, (2) avoid code complexity. Appropriate checks
have been added for matching option level to the endpoint type.
Ported the following options to this new utility:
- IP_MULTICAST_LOOP
- IP_RECVTOS
- IPV6_RECVTCLASS
- IP_PKTINFO
- IP_HDRINCL
- IPV6_V6ONLY
Changes in behavior (these are consistent with what Linux does AFAICT):
- Now IP_MULTICAST_LOOP can be set for TCP (earlier it was a noop) but does not
affect the endpoint itself.
- We can now getsockopt IP_HDRINCL (earlier we would get an error).
- Now we return ErrUnknownProtocolOption if SOL_IP or SOL_IPV6 options are used
on unix sockets.
- Now we return ErrUnknownProtocolOption if SOL_IPV6 options are used on non
AF_INET6 endpoints.
This change additionally makes the following modifications:
- Add State() uint32 to commonEndpoint because both tcpip.Endpoint and
transport.Endpoint interfaces have it. It proves to be quite useful.
- Gets rid of SocketOptionsHandler.IsListening(). It was an anomaly as it was
not a handler. It is now implemented on netstack itself.
- Gets rid of tcp.endpoint.EndpointInfo and directly embeds
stack.TransportEndpointInfo. There was an unnecessary level of embedding
which served no purpose.
- Removes some checks dual_stack_test.go that used the errors from
GetSockOptBool(tcpip.V6OnlyOption) to confirm some state. This is not
consistent with the new design and also seemed to be testing the
implementation instead of behavior.
PiperOrigin-RevId: 344354051
|
|
The bug has been fixed.
PiperOrigin-RevId: 344088206
|
|
PiperOrigin-RevId: 343959348
|
|
1. Add getD/getDentry methods to avoid long casting line in each test
2. Factor all calls to vfs.OpenAt/UnlinkAt/RenameAt on lower filesystem
to their own method (for both lower file and lower Merkle file) so
the tests are more readable
3. Add descriptive test names for delete/remove tests
PiperOrigin-RevId: 343540202
|
|
PiperOrigin-RevId: 343398191
|
|
This also makes the formatting nicer; the caller will add ":\n" to the end of
the message.
PiperOrigin-RevId: 343397099
|
|
We would like to track locks ordering to detect ordering violations. Detecting
violations is much simpler if mutexes must be unlocked by the same goroutine
that locked them.
Thus, as a first step to tracking lock ordering, add this lock/unlock
requirement to gVisor's sync.Mutex. This is more strict than the Go standard
library's sync.Mutex, but initial testing indicates only a single lock that is
used across goroutines. The new sync.CrossGoroutineMutex relaxes the
requirement (but will not provide lock order checking).
Due to the additional overhead, enforcement is only enabled with the
"checklocks" build tag. Build with this tag using:
bazel build --define=gotags=checklocks ...
From my spot-checking, this has no changed inlining properties when disabled.
Updates #4804
PiperOrigin-RevId: 343370200
|
|
If a kernfs user does not cache dentries, then cacheLocked will destroy the
dentry. The current DecRef implementation will be racy in this case as the
following can happen:
- Goroutine 1 calls DecRef and decreases ref count from 1 to 0.
- Goroutine 2 acquires d.fs.mu for reading and calls IncRef and increasing the
ref count from 0 to 1.
- Goroutine 2 releases d.fs.mu and calls DecRef again decreasing ref count from
1 to 0.
- Goroutine 1 now acquires d.fs.mu and calls cacheLocked which destroys the
dentry.
- Goroutine 2 now acquires d.fs.mu and calls cacheLocked to find that the dentry
is already destroyed!
Earlier we would panic in this case, we could instead just return instead of
adding complexity to handle this race. This is similar to what the gofer client
does.
We do not want to lock d.fs.mu in the case that the filesystem caches dentries
(common case as procfs and sysfs do this) to prevent congestion due to lock
contention.
PiperOrigin-RevId: 343229496
|
|
PiperOrigin-RevId: 343217712
|
|
PiperOrigin-RevId: 343196927
|
|
This changes also introduces:
- `SocketOptionsHandler` interface which can be implemented by endpoints to
handle endpoint specific behavior on SetSockOpt. This is analogous to what
Linux does.
- `DefaultSocketOptionsHandler` which is a default implementation of the above.
This is embedded in all endpoints so that we don't have to uselessly
implement empty functions. Endpoints with specific behavior can override the
embedded method by manually defining its own implementation.
PiperOrigin-RevId: 343158301
|
|
PiperOrigin-RevId: 343146856
|
|
PiperOrigin-RevId: 343130667
|
|
PiperOrigin-RevId: 343123278
|
|
This change also makes the following fixes:
- Make SocketOptions use atomic operations instead of having to acquire/drop
locks upon each get/set option.
- Make documentation more consistent.
- Remove tcpip.SocketOptions from socketOpsCommon because it already exists
in transport.Endpoint.
- Refactors get/set socket options tests to be easily extendable.
PiperOrigin-RevId: 343103780
|
|
PiperOrigin-RevId: 343000335
|
|
PiperOrigin-RevId: 342992936
|
|
If we don't hold a reference, the dentry can be destroyed by another thread.
Reported-by: syzbot+f2132e50060c41f6d41f@syzkaller.appspotmail.com
PiperOrigin-RevId: 342951940
|
|
Also add the lock order for verity fs, and add a lock to protect dentry
hash.
PiperOrigin-RevId: 342946537
|
|
PiperOrigin-RevId: 342943430
|
|
Fixes the behaviour of SO_ERROR for tcp sockets where in linux it returns
sk->sk_err and if sk->sk_err is 0 then it returns sk->sk_soft_err. In gVisor TCP
we endpoint.HardError is the equivalent of sk->sk_err and endpoint.LastError
holds soft errors. This change brings this into alignment with Linux such that
both hard/soft errors are cleared when retrieved using getsockopt(.. SO_ERROR)
is called on a socket.
Fixes #3812
PiperOrigin-RevId: 342868552
|
|
Optimize and bug fix all fpsimd related code.
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
I added 2 unified processing functions for all exceptions of el/el0
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
As part of this, change Task.interrupted() to not drain Task.interruptChan, and
do so explicitly using new function Task.unsetInterrupted() instead.
PiperOrigin-RevId: 342768365
|
|
Closes #4746
PiperOrigin-RevId: 342747165
|
|
- Make AddressableEndpoint optional for NetworkEndpoint.
Not all NetworkEndpoints need to support addressing (e.g. ARP), so
AddressableEndpoint should only be implemented for protocols that
support addressing such as IPv4 and IPv6.
With this change, tcpip.ErrNotSupported will be returned by the stack
when attempting to modify addresses on a network endpoint that does
not support addressing.
Now that packets are fully handled at the network layer, and (with this
change) addresses are optional for network endpoints, we no longer need
the workaround for ARP where a fake ARP address was added to each NIC
that performs ARP so that packets would be delivered to the ARP layer.
PiperOrigin-RevId: 342722547
|
|
PiperOrigin-RevId: 342373580
|
|
Checks in Task.block() and Task.Value() are conditional on race detection being
enabled, since these functions are relatively hot. Checks in Task.SleepStart()
and Task.UninterruptibleSleepStart() are enabled unconditionally, since these
functions are not thought to lie on any critical paths, and misuse of these
functions is required for b/168241471 to manifest.
PiperOrigin-RevId: 342342175
|
|
This is actually just b/168751672 again; cl/332394146 was incorrectly reverted
by cl/341411151. Document the reference holder to reduce the likelihood that
this happens again.
Also document a few other bugs observed in the process.
PiperOrigin-RevId: 342339144
|
|
PiperOrigin-RevId: 342221309
|
|
PiperOrigin-RevId: 342214859
|
|
Store all the socket level options in a struct and call {Get/Set}SockOpt on
this struct. This will avoid implementing socket level options on all
endpoints. This CL contains implementing one socket level option for tcp and
udp endpoints.
PiperOrigin-RevId: 342203981
|
|
kernel.Task can only be used as context.Context by that Task's task goroutine.
This is violated in at least two places:
- In any case where one thread accesses the /proc/[tid] of any other thread,
passing the kernel.Task for [tid] as the context.Context is incorrect.
- Task.rebuildTraceContext() may be called by Kernel.RebuildTraceContexts()
outside the scope of any task goroutine.
Fix these (as well as a data race on Task.traceContext discovered during the
course of finding the latter).
PiperOrigin-RevId: 342174404
|
|
children names map can be used to verify whether a child is expected
during walking, so that we can detect unexpected modifications that
deleted/renamed both the target file and the corresponding merkle tree
file.
PiperOrigin-RevId: 342170715
|
|
This reduces confusion with context.Context (which is also relevant to
kernel.Tasks) and is consistent with existing function kernel.LoadTaskImage().
PiperOrigin-RevId: 342167298
|
|
PiperOrigin-RevId: 342161204
|
|
feature
Signed-off-by: Robin Luk <lubin.lu@alibaba-inc.com>
|
|
PiperOrigin-RevId: 341982672
|
|
This will help to debug:
https://syzkaller.appspot.com/bug?id=0d717bd7028dceeb4b38f09aab2841c398b41d81
PiperOrigin-RevId: 341458715
|
|
PiperOrigin-RevId: 341445910
|
|
This lets us avoid treating a value of 0 as one reference. All references
using the refsvfs2 template must call InitRefs() before the reference is
incremented/decremented, or else a panic will occur. Therefore, it should be
pretty easy to identify missing InitRef calls during testing.
Updates #1486.
PiperOrigin-RevId: 341411151
|