Age | Commit message (Collapse) | Author |
|
|
|
Right now, each vdso call triggers vmexit. VDSO and VVAR pages are
mapped with VM_IO and get_user_pages fails for such vma-s. KVM was not
able to handle this case up to the v4.8 kernel. This problem was fixed by
add6a0cd1c5ba ("KVM: MMU: try to fix up page faults before giving up").
For some unknown reasons, it still doesn't work in case of nested
virtualization.
Before:
BenchmarkKernelVDSO-6 252519 4598 ns/op
After:
BenchmarkKernelVDSO-6 34431957 34.91 ns/op
PiperOrigin-RevId: 405715941
|
|
PiperOrigin-RevId: 404400399
|
|
|
|
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go
will not clobber floating point registers before/after calling them
respectively. Due to regabig in Go 1.17, this is no longer the case; regabig
(among other things) maintains a zeroed XMM15 during ABIInternal execution,
including by zeroing it after ABI0-to-ABIInternal transitions. In
ring0.sysenter/exception, this happens in
ring0.kernelSyscall/kernelException.abi0 respectively; in
ring0.CPU.SwitchToUser, this happens after returning from
ring0.sysret/iret.abi0. Delete these functions and do floating point save/load
in assembly.
While arm64 doesn't appear to be immediately affected (so this CL permits us to
resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be
incorrect for the same fundamental reason (Go code can't sanely assume what
registers the Go compiler will or won't use) and should be fixed eventually.
PiperOrigin-RevId: 401895658
|
|
|
|
PiperOrigin-RevId: 401624134
|
|
|
|
PiperOrigin-RevId: 399560357
|
|
|
|
PiperOrigin-RevId: 398572735
|
|
Signed-off-by: Andrei Vagin <avagin@google.com>
|
|
We install seccomp rules so that the SIGSYS signal is generated for
each mmap system call. Then our signal handler executes the real mmap
syscall and if a new regions is created, it maps it to the guest.
Signed-off-by: Andrei Vagin <avagin@google.com>
|
|
|
|
|
|
|
|
PiperOrigin-RevId: 392554743
|
|
Right now, the first slot starts with an address of a memory region and its size is faultBlockSize,
but the second slot starts with (physicalStart + faultBlockSize) & faultBlockMask.
It means they will overlap if a start address of a memory region are not aligned to faultBlockSize.
The kernel doesn't allow to add overlapped regions, but we ignore the EEXIST error.
Signed-off-by: Andrei Vagin <avagin@google.com>
|
|
|
|
Right now, it contains the code:
origState := atomic.LoadUint32(&c.state)
atomicbitops.AndUint32(&c.state, ^vCPUUser)
The problem here is that vCPU.bounce that is called from another thread can add
vCPUWaiter when origState has been read but vCPUUser isn't cleared yet. In this
case, vCPU.unlock doesn't notify other threads about changes and c.bounce will
be stuck in the futex_wait call.
PiperOrigin-RevId: 389697411
|
|
|
|
This CL introduces a 'checklinkname' analyzer, which provides rudimentary
type-checking that verifies that function signatures on the local and remote
sides of //go:linkname directives match expected values.
If the Go standard library changes the definitions of any of these function,
checklinkname will flag the change as a finding, providing an error informing
the gVisor team to adapt to the upstream changes. This allows us to eliminate
the majority of gVisor's forward-looking negative build tags, as we can catch
mismatches in testing [1].
The remaining forward-looking negative build tags are covering shared struct
definitions, which I hope to add to checklinkname in a future CL.
[1] Of course, semantics/requirements can change without the signature
changing, so we still must be careful, but this covers the common case.
PiperOrigin-RevId: 387873847
|
|
|
|
Make hasSlot scan allocated slot, rather than the whole slice.
It is supposed to store physicalStart in usedSlot.
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
|
|
|
|
|
|
PiperOrigin-RevId: 385919423
|
|
|
|
PiperOrigin-RevId: 385894869
|
|
|
|
PiperOrigin-RevId: 384305599
|
|
|
|
Go 1.17 adds a new register-based calling convention. While transparent for
most applications, the KVM platform needs special work in a few cases.
First of all, we need the actual address of some assembly functions, rather
than the address of a wrapper. See http://gvisor.dev/pr/5832 for complete
discussion of this.
More relevant to this CL is that ABI0-to-ABIInternal wrappers (i.e., calls from
assembly to Go) access the G via FS_BASE. The KVM quite fast-and-loose about
the Go environment, often calling into (nosplit) Go functions with
uninitialized FS_BASE.
That will no longer work in Go 1.17, so this CL changes the platform to
consistently restore FS_BASE before calling into Go code.
This CL does not affect arm64 code. Go 1.17 does not support the register-based
calling convention for arm64 (it will come in 1.18), but arm64 also does not
use a non-standard register like FS_BASE for TLS, so it may not require any
changes.
PiperOrigin-RevId: 384234305
|
|
|
|
- LockOSThread() around prctl(PR_SET_NO_NEW_PRIVS) => seccomp(). go:nosplit
"mostly" prevents async preemption, but IIUC preemption is still permitted
during function prologues:
funcpctab "".seccomp [valfunc=pctopcdata]
0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT|ABIInternal, $72-32
0 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT|ABIInternal, $72-32
0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) SUBQ $72, SP
4 00004 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) MOVQ BP, 64(SP)
9 00009 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) LEAQ 64(SP), BP
e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $0, gclocals·ba30782f8935b28ed1adaec603e72627(SB)
e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $1, gclocals·663f8c6bfa83aa777198789ce63d9ab4(SB)
e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $2, "".seccomp.stkobj(SB)
e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) PCDATA $0, $-2
e -2 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) MOVQ "".ptr+88(SP), AX
(-1 is objabi.PCDATA_UnsafePointSafe and -2 is objabi.PCDATA_UnsafePointUnsafe,
from Go's cmd/internal/objabi.)
- Handle non-errno failures from seccomp() with SECCOMP_FILTER_FLAG_TSYNC.
PiperOrigin-RevId: 383757580
|
|
|
|
PiperOrigin-RevId: 380904249
|
|
When sentry is running in guest ring0, the goroutine stack is changing
and it will not be the stack when bluepill() is called.
If PMU interrupt hits when the CPU is in host ring 0, the perf handler
will try to get the stack of the kernel and the userspace(sentry). It
can travel back to sighandler() and try to continue to the stack of
the goroutine with the outdated frame pointer if sentry has been running
in the guest. The perf handler can't record correct addresses
from the outdated and wrong frames. Those addresses are often
irresolvable, and even if it is resolvable accidentally, it would
be misleading names.
To fix the problem, we just set the frame pointer(%RBP) to zero and
disconnect the link when the zeroed frame pointer is saved in the
frame in bluepillHandler().
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
|
|
|
|
UpperHalf is shared with all address spaces.
PiperOrigin-RevId: 379790539
|
|
|
|
PiperOrigin-RevId: 379337677
|
|
|
|
Fixes #214
PiperOrigin-RevId: 378680466
|
|
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
|
|
on Arm64 platform, we can use TLBI with 'IS' instead of IPI operation.
According to my understanding, the logic in invalidate() is much like
an IPI operation.
On Arm64, we can simply perform vmalle1is invalidation here, not
use IPI.
Reference:
https://github.com/torvalds/linux/blob/v5.12/arch/arm64/kvm/mmu.c#L81
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
|
|
|
|
PiperOrigin-RevId: 372608247
|
|
This patch is to solve problem that vCPU timer mess up when
adding vCPU dynamically on ARM64, for detailed information
please refer to:
https://github.com/google/gvisor/issues/5739
There is no influence on x86 and here are main changes for
ARM64:
1. create maxVCPUs number of vCPU in machine initialization
2. we want to sync gvisor vCPU number with host CPU number,
so use smaller number between runtime.NumCPU and
KVM_CAP_MAX_VCPUS to be maxVCPUS
3. put unused vCPUs into architecture-specific map initialvCPUs
4. When machine need to bind a new vCPU with tid, rather
than creating new one, it would pick a vCPU from map initalvCPUs
5. change the setSystemTime function. When vCPU number increasing,
the time cost for function setTSC(use syscall to set cntvoff) is
liner growth from around 300 ns to 100000 ns, and this leads to
the function setSystemTimeLegacy can not get correct offset
value.
6. initializing StdioFDs and goferFD before a platform to avoid
StdioFDs confects with vCPU fds
Signed-off-by: howard zhang <howard.zhang@arm.com>
|