Age | Commit message (Collapse) | Author |
|
PiperOrigin-RevId: 387885663
|
|
Apply bitmap in fd_table to record open file fd. It can
accelerate the speed of allocating or removing fd from
fdtable.
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
|
|
`slice := *(*[]unsafe.Pointer)(...)` makes a copy of the slice header, which
then escapes because of the conditional `atomic.StorePointer(&f.slice, &slice)`
from table expansion. This occurs even when the table doesn't expand, and when
it can't (e.g. `close()` => `f.setAll(nil)`). Fix this by avoiding the copy
until after table expansion.
Before this CL:
```
TEXT pkg/sentry/kernel/kernel.(*FDTable).setAll(SB) pkg/sentry/kernel/fd_table_unsafe.go
fd_table_unsafe.go:119 0x7f00005f50e0 64488b0c25f8ffffff MOVQ FS:0xfffffff8, CX
fd_table_unsafe.go:119 0x7f00005f50e9 483b6110 CMPQ 0x10(CX), SP
fd_table_unsafe.go:119 0x7f00005f50ed 0f864d040000 JBE 0x7f00005f5540
fd_table_unsafe.go:119 0x7f00005f50f3 4883c480 ADDQ $-0x80, SP
fd_table_unsafe.go:119 0x7f00005f50f7 48896c2478 MOVQ BP, 0x78(SP)
fd_table_unsafe.go:119 0x7f00005f50fc 488d6c2478 LEAQ 0x78(SP), BP
fd_table_unsafe.go:120 0x7f00005f5101 488b8424a8000000 MOVQ 0xa8(SP), AX
fd_table_unsafe.go:120 0x7f00005f5109 4885c0 TESTQ AX, AX
fd_table_unsafe.go:120 0x7f00005f510c 7411 JE 0x7f00005f511f
fd_table_unsafe.go:120 0x7f00005f510e 488b8c24b0000000 MOVQ 0xb0(SP), CX
fd_table_unsafe.go:120 0x7f00005f5116 4885c9 TESTQ CX, CX
fd_table_unsafe.go:120 0x7f00005f5119 0f8500040000 JNE 0x7f00005f551f
fd_table_unsafe.go:124 0x7f00005f511f 488d05da115700 LEAQ 0x5711da(IP), AX
fd_table_unsafe.go:124 0x7f00005f5126 48890424 MOVQ AX, 0(SP)
fd_table_unsafe.go:124 0x7f00005f512a e8d19fa1ff CALL runtime.newobject(SB)
fd_table_unsafe.go:124 0x7f00005f512f 488b7c2408 MOVQ 0x8(SP), DI
fd_table_unsafe.go:124 0x7f00005f5134 488b842488000000 MOVQ 0x88(SP), AX
fd_table_unsafe.go:124 0x7f00005f513c 488b4820 MOVQ 0x20(AX), CX
fd_table_unsafe.go:124 0x7f00005f5140 488b5108 MOVQ 0x8(CX), DX
fd_table_unsafe.go:124 0x7f00005f5144 488b19 MOVQ 0(CX), BX
fd_table_unsafe.go:124 0x7f00005f5147 488b4910 MOVQ 0x10(CX), CX
fd_table_unsafe.go:124 0x7f00005f514b 48895708 MOVQ DX, 0x8(DI)
fd_table_unsafe.go:124 0x7f00005f514f 48894f10 MOVQ CX, 0x10(DI)
fd_table_unsafe.go:124 0x7f00005f5153 833df6e1120100 CMPL $0x0, runtime.writeBarrier(SB)
fd_table_unsafe.go:124 0x7f00005f515a 660f1f440000 NOPW 0(AX)(AX*1)
fd_table_unsafe.go:124 0x7f00005f5160 0f8589030000 JNE 0x7f00005f54ef
fd_table_unsafe.go:124 0x7f00005f5166 48891f MOVQ BX, 0(DI)
fd_table_unsafe.go:124 0x7f00005f5169 48897c2470 MOVQ DI, 0x70(SP)
fd_table_unsafe.go:127 0x7f00005f516e 8bb424a0000000 MOVL 0xa0(SP), SI
fd_table_unsafe.go:127 0x7f00005f5175 39d6 CMPL DX, SI
fd_table_unsafe.go:127 0x7f00005f5177 0f8c5f030000 JL 0x7f00005f54dc
...
```
After this CL:
```
TEXT pkg/sentry/kernel/kernel.(*FDTable).setAll(SB) pkg/sentry/kernel/fd_table_unsafe.go
fd_table_unsafe.go:119 0x7f00005f50e0 64488b0c25f8ffffff MOVQ FS:0xfffffff8, CX
fd_table_unsafe.go:119 0x7f00005f50e9 488d4424e8 LEAQ -0x18(SP), AX
fd_table_unsafe.go:119 0x7f00005f50ee 483b4110 CMPQ 0x10(CX), AX
fd_table_unsafe.go:119 0x7f00005f50f2 0f868e040000 JBE 0x7f00005f5586
fd_table_unsafe.go:119 0x7f00005f50f8 4881ec98000000 SUBQ $0x98, SP
fd_table_unsafe.go:119 0x7f00005f50ff 4889ac2490000000 MOVQ BP, 0x90(SP)
fd_table_unsafe.go:119 0x7f00005f5107 488dac2490000000 LEAQ 0x90(SP), BP
fd_table_unsafe.go:120 0x7f00005f510f 488b9424c0000000 MOVQ 0xc0(SP), DX
fd_table_unsafe.go:120 0x7f00005f5117 660f1f840000000000 NOPW 0(AX)(AX*1)
fd_table_unsafe.go:120 0x7f00005f5120 4885d2 TESTQ DX, DX
fd_table_unsafe.go:120 0x7f00005f5123 0f8406040000 JE 0x7f00005f552f
fd_table_unsafe.go:120 0x7f00005f5129 488b9c24c8000000 MOVQ 0xc8(SP), BX
fd_table_unsafe.go:120 0x7f00005f5131 4885db TESTQ BX, BX
fd_table_unsafe.go:120 0x7f00005f5134 0f852b040000 JNE 0x7f00005f5565
fd_table_unsafe.go:124 0x7f00005f513a 488bb424a0000000 MOVQ 0xa0(SP), SI
fd_table_unsafe.go:124 0x7f00005f5142 488b7e20 MOVQ 0x20(SI), DI
fd_table_unsafe.go:127 0x7f00005f5146 4c8b4708 MOVQ 0x8(DI), R8
fd_table_unsafe.go:127 0x7f00005f514a 448b8c24b8000000 MOVL 0xb8(SP), R9
fd_table_unsafe.go:127 0x7f00005f5152 4539c1 CMPL R8, R9
fd_table_unsafe.go:127 0x7f00005f5155 0f8d4a020000 JGE 0x7f00005f53a5
...
```
PiperOrigin-RevId: 345363242
|
|
This lets us avoid treating a value of 0 as one reference. All references
using the refsvfs2 template must call InitRefs() before the reference is
incremented/decremented, or else a panic will occur. Therefore, it should be
pretty easy to identify missing InitRef calls during testing.
Updates #1486.
PiperOrigin-RevId: 341411151
|
|
Our current reference leak checker uses finalizers to verify whether an object
has reached zero references before it is garbage collected. There are multiple
problems with this mechanism, so a rewrite is in order.
With finalizers, there is no way to guarantee that a finalizer will run before
the program exits. When an unreachable object with a finalizer is garbage
collected, its finalizer will be added to a queue and run asynchronously. The
best we can do is run garbage collection upon sandbox exit to make sure that
all finalizers are enqueued.
Furthermore, if there is a chain of finalized objects, e.g. A points to B
points to C, garbage collection needs to run multiple times before all of the
finalizers are enqueued. The first GC run will register the finalizer for A but
not free it. It takes another GC run to free A, at which point B's finalizer
can be registered. As a result, we need to run GC as many times as the length
of the longest such chain to have a somewhat reliable leak checker.
Finally, a cyclical chain of structs pointing to one another will never be
garbage collected if a finalizer is set. This is a well-known issue with Go
finalizers (https://github.com/golang/go/issues/7358). Using leak checking on
filesystem objects that produce cycles will not work and even result in memory
leaks.
The new leak checker stores reference counted objects in a global map when
leak check is enabled and removes them once they are destroyed. At sandbox
exit, any remaining objects in the map are considered as leaked. This provides
a deterministic way of detecting leaks without relying on the complexities of
finalizers and garbage collection.
This approach has several benefits over the former, including:
- Always detects leaks of objects that should be destroyed very close to
sandbox exit. The old checker very rarely detected these leaks, because it
relied on garbage collection to be run in a short window of time.
- Panics if we forgot to enable leak check on a ref-counted object (we will try
to remove it from the map when it is destroyed, but it will never have been
added).
- Can store extra logging information in the map values without adding to the
size of the ref count struct itself. With the size of just an int64, the ref
count object remains compact, meaning frequent operations like IncRef/DecRef
are more cache-efficient.
- Can aggregate leak results in a single report after the sandbox exits.
Instead of having warnings littered in the log, which were
non-deterministically triggered by garbage collection, we can print all
warning messages at once. Note that this could also be a limitation--the
sandbox must exit properly for leaks to be detected.
Some basic benchmarking indicates that this change does not significantly
affect performance when leak checking is enabled, which is understandable
since registering/unregistering is only done once for each filesystem object.
Updates #1486.
PiperOrigin-RevId: 338685972
|
|
This is needed for SO_LINGER, where close() is blocked for linger timeout and
we are holding the FDTable lock for the entire timeout which will not allow
us to create/delete other fds. We have to release the locks and then drop the
fds.
PiperOrigin-RevId: 331844185
|
|
In Linux, FDSize is fs/proc/array.c:task_state() => struct fdtable::max_fds,
which is set to the underlying array's length in fs/file.c:alloc_fdtable().
Follow-up changes:
- Remove FDTable.GetRefs() and FDTable.GetRefsVFS2(), which are unused.
- Reset FDTable.used to 0 during restore, since the subsequent calls to
FDTable.setAll() increment it again, causing its value to be doubled. (After
this CL, FDTable.used is only used to avoid reallocation in FDTable.GetFDs(),
so this fix is not very visible.)
PiperOrigin-RevId: 331588190
|
|
PiperOrigin-RevId: 329572337
|
|
This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and
vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted
soon anyway.
The following structs now use the new tool, with leak check enabled:
devpts:rootInode
fuse:inode
kernfs:Dentry
kernfs:dir
kernfs:readonlyDir
kernfs:StaticDirectory
proc:fdDirInode
proc:fdInfoDirInode
proc:subtasksInode
proc:taskInode
proc:tasksInode
vfs:FileDescription
vfs:MountNamespace
vfs:Filesystem
sys:dir
kernel:FSContext
kernel:ProcessGroup
kernel:Session
shm:Shm
mm:aioMappable
mm:SpecialMappable
transport:queue
And the following use the template, but because they currently are not leak
checked, a TODO is left instead of enabling leak check in this patch:
kernel:FDTable
tun:tunEndpoint
Updates #1486.
PiperOrigin-RevId: 328460377
|
|
FD table now holds both VFS1 and VFS2 types and uses the correct
one based on what's set.
Parts of this CL are just initial changes (e.g. sys_read.go,
runsc/main.go) to serve as a template for the remaining changes.
Updates #1487
Updates #1623
PiperOrigin-RevId: 292023223
|
|
The preferred Copyright holder is "The gVisor Authors".
PiperOrigin-RevId: 291786657
|
|
This renames FDMap to FDTable and drops the kernel.FD type, which had an entire
package to itself and didn't serve much use (it was freely cast between types,
and served as more of an annoyance than providing any protection.)
Based on BenchmarkFDLookupAndDecRef-12, we can expect 5-10 ns per lookup
operation, and 10-15 ns per concurrent lookup operation of savings.
This also fixes two tangential usage issues with the FDMap. Namely, non-atomic
use of NewFDFrom and associated calls to Remove (that are both racy and fail to
drop the reference on the underlying file.)
PiperOrigin-RevId: 256285890
|