path: root/pkg/sentry/platform
Age | Commit message | Author
2021-09-28 | Move `safecopy.ReplaceSignalHandler` into `sighandling` package. | Etienne Perot
PiperOrigin-RevId: 399560357
2021-09-23 | Merge pull request #6573 from avagin:kvm-seccomp-mmap | gVisor bot
PiperOrigin-RevId: 398572735
2021-09-22 | kvm: check that safecopy is handled correctly in the guest ring0 | Andrei Vagin
Signed-off-by: Andrei Vagin <avagin@google.com>
2021-09-22 | kvm: trap mmap syscalls to map new regions to the guest | Andrei Vagin
We install seccomp rules so that the SIGSYS signal is generated for each mmap system call. Our signal handler then executes the real mmap syscall and, if a new region is created, maps it to the guest. Signed-off-by: Andrei Vagin <avagin@google.com>
2021-09-22 | kvm/arm: calculate virtual-to-physical mappings only once | Andrei Vagin
2021-09-22 | kvm: fix tests on arm64 | AV
2021-08-23 | Merge pull request #6491 from avagin:kvm-mem-slot-overlap | gVisor bot
PiperOrigin-RevId: 392554743
2021-08-21 | platform/kvm: set physical slots without overlapping | Andrei Vagin
Right now, the first slot starts at the address of a memory region and its size is faultBlockSize, but the second slot starts at (physicalStart + faultBlockSize) & faultBlockMask. This means the two slots overlap if the start address of a memory region is not aligned to faultBlockSize. The kernel doesn't allow adding overlapping regions, but we ignore the EEXIST error. Signed-off-by: Andrei Vagin <avagin@google.com>
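The overlap described above can be avoided by ending every slot on a faultBlockSize boundary. The following is a minimal sketch of that idea, not the code from the commit: the value of faultBlockSize, the slot type, and the splitRegion helper are all assumptions made for illustration.

```go
package main

import "fmt"

// Assumed values for illustration; the real constants in the KVM platform
// may differ.
const (
	faultBlockSize = uintptr(2 << 20)
	faultBlockMask = ^(faultBlockSize - 1)
)

// slot is a hypothetical description of one physical slot.
type slot struct{ start, length uintptr }

// splitRegion cuts [start, start+length) into slots that each end on a
// faultBlockSize boundary (or at the end of the region). Because every slot
// begins exactly where the previous one ended, no two slots overlap even
// when start itself is unaligned.
func splitRegion(start, length uintptr) []slot {
	var slots []slot
	end := start + length
	for start < end {
		next := (start + faultBlockSize) & faultBlockMask
		if next > end {
			next = end
		}
		slots = append(slots, slot{start, next - start})
		start = next
	}
	return slots
}

func main() {
	// An unaligned start address, the case that used to produce overlaps.
	for _, s := range splitRegion(0x201000, 3*faultBlockSize) {
		fmt.Printf("start=%#x length=%#x\n", s.start, s.length)
	}
}
```

Only the first and last slots can be shorter than faultBlockSize; every slot in between is a full, aligned block.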
2021-08-09 | platform/kvm: fix a race condition in vCPU.unlock() | Andrei Vagin
Right now, it contains the code:
    origState := atomic.LoadUint32(&c.state)
    atomicbitops.AndUint32(&c.state, ^vCPUUser)
The problem here is that vCPU.bounce, which is called from another thread, can add vCPUWaiter after origState has been read but before vCPUUser is cleared. In this case, vCPU.unlock doesn't notify other threads about the change and c.bounce will be stuck in the futex_wait call.
PiperOrigin-RevId: 389697411
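One way to close a window like this is to make the read and the clear a single atomic step, so the value that unlock inspects is exactly the value that was replaced. The sketch below uses a compare-and-swap loop from sync/atomic; the bit values and the clearUser helper are assumptions for illustration and are not taken from the commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical state bits, named after the ones in the commit message.
const (
	vCPUUser   uint32 = 1 << 0
	vCPUWaiter uint32 = 1 << 2
)

// clearUser atomically clears vCPUUser and returns the state that was
// replaced. If another thread sets vCPUWaiter between the load and the
// swap, the swap fails and the loop retries, so the returned value always
// includes any waiter bit that was present when vCPUUser was cleared.
func clearUser(state *uint32) uint32 {
	for {
		old := atomic.LoadUint32(state)
		if atomic.CompareAndSwapUint32(state, old, old&^vCPUUser) {
			return old
		}
	}
}

func main() {
	state := vCPUUser | vCPUWaiter
	old := clearUser(&state)
	if old&vCPUWaiter != 0 {
		// This is the point where a real unlock would wake the waiter.
		fmt.Println("waiter present, notify")
	}
	fmt.Printf("new state: %#x\n", state)
}
```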
2021-07-30 | checklinkname: rudimentary type-checking of linkname directives | Michael Pratt
This CL introduces a 'checklinkname' analyzer, which provides rudimentary type-checking that verifies that function signatures on the local and remote sides of //go:linkname directives match expected values. If the Go standard library changes the definitions of any of these functions, checklinkname will flag the change as a finding, providing an error informing the gVisor team to adapt to the upstream changes. This allows us to eliminate the majority of gVisor's forward-looking negative build tags, as we can catch mismatches in testing [1]. The remaining forward-looking negative build tags are covering shared struct definitions, which I hope to add to checklinkname in a future CL. [1] Of course, semantics/requirements can change without the signature changing, so we still must be careful, but this covers the common case. PiperOrigin-RevId: 387873847
2021-07-28 | tuning hasSlot function and fix store wrong value in usedSlots | Howard Zhang
Make hasSlot scan only the allocated slots, rather than the whole slice. usedSlots is supposed to store physicalStart. Signed-off-by: Howard Zhang <howard.zhang@arm.com>
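A rough sketch of the scanning change follows. The machine struct here is a simplified stand-in with assumed field names (usedSlots, nextSlot); the real type has more fields and may read its slot counter atomically.

```go
package main

import "fmt"

// machine is a simplified, hypothetical stand-in for the KVM machine type.
type machine struct {
	usedSlots []uintptr // physicalStart of each allocated slot
	nextSlot  uint32    // number of slots allocated so far
}

// hasSlot reports whether physicalStart is already recorded. It walks only
// the first nextSlot entries instead of the whole preallocated slice.
func (m *machine) hasSlot(physicalStart uintptr) bool {
	for i := uint32(0); i < m.nextSlot; i++ {
		if m.usedSlots[i] == physicalStart {
			return true
		}
	}
	return false
}

func main() {
	m := &machine{usedSlots: make([]uintptr, 512)}
	m.usedSlots[m.nextSlot] = 0x400000
	m.nextSlot++
	fmt.Println(m.hasSlot(0x400000), m.hasSlot(0))
}
```

Bounding the loop by nextSlot also keeps the zero-valued tail of the slice from matching a lookup for address 0.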
2021-07-20 | Merge pull request #6220 from laijs:disconnect-fp | gVisor bot
PiperOrigin-RevId: 385919423
2021-07-20 | Add go:build directives as required by Go 1.17's gofmt. | Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-12 | Mark all functions that are called from a forked child with go:norace | Andrei Vagin
PiperOrigin-RevId: 384305599
2021-07-12 | Go 1.17 support for the KVM platform | Michael Pratt
Go 1.17 adds a new register-based calling convention. While transparent for most applications, the KVM platform needs special work in a few cases. First of all, we need the actual address of some assembly functions, rather than the address of a wrapper. See http://gvisor.dev/pr/5832 for complete discussion of this. More relevant to this CL is that ABI0-to-ABIInternal wrappers (i.e., calls from assembly to Go) access the G via FS_BASE. The KVM platform is quite fast-and-loose about the Go environment, often calling into (nosplit) Go functions with uninitialized FS_BASE. That will no longer work in Go 1.17, so this CL changes the platform to consistently restore FS_BASE before calling into Go code. This CL does not affect arm64 code. Go 1.17 does not support the register-based calling convention for arm64 (it will come in 1.18), but arm64 also does not use a non-standard register like FS_BASE for TLS, so it may not require any changes. PiperOrigin-RevId: 384234305
2021-07-08 | Fix some //pkg/seccomp bugs. | Jamie Liu
- LockOSThread() around prctl(PR_SET_NO_NEW_PRIVS) => seccomp(). go:nosplit "mostly" prevents async preemption, but IIUC preemption is still permitted during function prologues:

  funcpctab "".seccomp [valfunc=pctopcdata]
    0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT|ABIInternal, $72-32
    0    00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT|ABIInternal, $72-32
    0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) SUBQ $72, SP
    4    00004 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) MOVQ BP, 64(SP)
    9    00009 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) LEAQ 64(SP), BP
    e    00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $0, gclocals·ba30782f8935b28ed1adaec603e72627(SB)
    e    00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $1, gclocals·663f8c6bfa83aa777198789ce63d9ab4(SB)
    e    00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $2, "".seccomp.stkobj(SB)
    e    00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) PCDATA $0, $-2
    e -2 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) MOVQ "".ptr+88(SP), AX

  (-1 is objabi.PCDATA_UnsafePointSafe and -2 is objabi.PCDATA_UnsafePointUnsafe, from Go's cmd/internal/objabi.)
- Handle non-errno failures from seccomp() with SECCOMP_FILTER_FLAG_TSYNC.
PiperOrigin-RevId: 383757580
2021-06-22 | Merge pull request #5051 from lubinszARM:pr_escapes_1 | gVisor bot
PiperOrigin-RevId: 380904249
2021-06-22 | Disconnect call-chain between sighandler() and bluepill(). | Lai Jiangshan
When the sentry is running in guest ring0, the goroutine stack keeps changing, so it is no longer the stack that was in use when bluepill() was called. If a PMU interrupt hits while the CPU is in host ring 0, the perf handler will try to walk the stack of the kernel and of userspace (the sentry). It can travel back to sighandler() and try to continue into the goroutine stack with the outdated frame pointer if the sentry has been running in the guest. The perf handler can't record correct addresses from the outdated and wrong frames. Those addresses are often irresolvable, and even if one happens to be resolvable, it yields a misleading name. To fix the problem, we set the frame pointer (%RBP) to zero, which disconnects the link once the zeroed frame pointer is saved in the frame in bluepillHandler(). Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
2021-06-16 | kvm: mark UpperHalf PTE-s as global | Andrei Vagin
UpperHalf is shared with all address spaces. PiperOrigin-RevId: 379790539
2021-06-14 | Fix typo | Michael Pratt
PiperOrigin-RevId: 379337677
2021-06-10 | [op] Move SignalInfo to abi/linux package. | Ayush Ranjan
Fixes #214 PiperOrigin-RevId: 378680466
2021-06-01 | Fix errors for noescape cases | Robin Luk
Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
2021-05-24 | arm64 kvm: use TLBI with "Inner Shareable" instead of IPI operation | Robin Luk
On the Arm64 platform, we can use TLBI with 'IS' instead of an IPI operation. According to my understanding, the logic in invalidate() is much like an IPI operation. On Arm64, we can simply perform a vmalle1is invalidation here rather than using an IPI. Reference: https://github.com/torvalds/linux/blob/v5.12/arch/arm64/kvm/mmu.c#L81 Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
2021-05-07 | Merge pull request #5758 from zhlhahaha:2125 | gVisor bot
PiperOrigin-RevId: 372608247
2021-05-07 | Init all vCPU when initializing machine on ARM64 | howard zhang
This patch solves the problem that vCPU timers get messed up when vCPUs are added dynamically on ARM64. For detailed information, please refer to: https://github.com/google/gvisor/issues/5739
There is no influence on x86. The main changes for ARM64 are:
1. Create maxVCPUs vCPUs during machine initialization.
2. We want to sync the gVisor vCPU count with the host CPU count, so use the smaller of runtime.NumCPU and KVM_CAP_MAX_VCPUS as maxVCPUs.
3. Put unused vCPUs into the architecture-specific map initialvCPUs.
4. When the machine needs to bind a new vCPU to a tid, rather than creating a new one, it picks a vCPU from the map initialvCPUs.
5. Change the setSystemTime function. As the number of vCPUs increases, the time cost of setTSC (which uses a syscall to set cntvoff) grows linearly from around 300 ns to 100000 ns, and this prevents setSystemTimeLegacy from getting the correct offset value.
6. Initialize StdioFDs and goferFD before the platform, to avoid StdioFDs conflicting with vCPU fds.
Signed-off-by: howard zhang <howard.zhang@arm.com>
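Items 1 to 4 above amount to a pre-created pool that is drained on demand. The sketch below illustrates that shape under stated assumptions: the vCPU and machine types are simplified stand-ins, newMachine and get are hypothetical helpers, and the two limits are passed in as plain integers (in the real platform they would come from runtime.NumCPU and the KVM_CAP_MAX_VCPUS capability).

```go
package main

import (
	"fmt"
	"sync"
)

// vCPU and machine are simplified, hypothetical stand-ins.
type vCPU struct{ id int }

type machine struct {
	mu           sync.Mutex
	initialvCPUs map[int]*vCPU    // pre-created, not yet bound to a thread
	vCPUsByTID   map[uint64]*vCPU // bound vCPUs, keyed by thread ID
}

// newMachine creates min(hostCPUs, kvmMaxVCPUs) vCPUs up front.
func newMachine(hostCPUs, kvmMaxVCPUs int) *machine {
	maxVCPUs := hostCPUs
	if kvmMaxVCPUs < maxVCPUs {
		maxVCPUs = kvmMaxVCPUs
	}
	m := &machine{
		initialvCPUs: make(map[int]*vCPU, maxVCPUs),
		vCPUsByTID:   make(map[uint64]*vCPU),
	}
	for id := 0; id < maxVCPUs; id++ {
		m.initialvCPUs[id] = &vCPU{id: id}
	}
	return m
}

// get returns the vCPU bound to tid, binding one from the initial pool if
// the thread has none yet. It reports false when the pool is exhausted.
func (m *machine) get(tid uint64) (*vCPU, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.vCPUsByTID[tid]; ok {
		return c, true
	}
	for id, c := range m.initialvCPUs {
		delete(m.initialvCPUs, id)
		m.vCPUsByTID[tid] = c
		return c, true
	}
	return nil, false
}

func main() {
	m := newMachine(8, 4)
	_, ok := m.get(100)
	fmt.Println(ok, len(m.initialvCPUs))
}
```

Because no vCPU is created after initialization, nothing in this scheme has to adjust timers for a late-arriving vCPU.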
2021-04-30 | kvm: prefault a root table page before switching into a user address space | Andrei Vagin
The root table physical page has to be mapped so that iret or sysret does not fault after switching into a user address space. sysret and iret are in the upper half, which is global, so the page tables of lower levels are already mapped. Fixes #5742 PiperOrigin-RevId: 371458644
2021-04-21 | Merge pull request #5737 from dqminh:tsc-scaling | gVisor bot
PiperOrigin-RevId: 369758655
2021-04-21 | Fallback to legacy system time logic when host does not have TSC_CONTROL | Daniel Dao
If the host doesn't have the TSC scaling feature, then scaling down the TSC to the lowest value will fail, and we will fall back to the legacy logic anyway, but we leave an ugly log message in the host's kernel log:
    kernel: user requested TSC rate below hardware speed
Instead, check for KVM_CAP_TSC_CONTROL when initializing KVM, and fall back to the legacy logic early if the host's CPU doesn't support it. Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-04-14 | Use assembly stub to take the address of assembly functions | Michael Pratt
Go 1.17 is adding a new register-based calling convention [1] ("ABIInternal"), which is used when calling between Go functions. Assembly functions are still written using the old ABI ("ABI0"). That is, they still accept arguments on the stack, and pass arguments to other functions on the stack. The call rules look approximately like this:
1. Direct call from Go function to Go function: compiler emits direct ABIInternal call.
2. Indirect call from Go function to Go function: compiler emits indirect ABIInternal call.
3. Direct call from Go function to assembly function: compiler emits direct ABI0 call.
4. Indirect call from Go function to assembly function: compiler emits indirect ABIInternal call to ABI conversion wrapper function.
5. Direct or indirect call from assembly function to assembly function: assembly/linker emits call to original ABI0 function.
6. Direct or indirect call from assembly function to Go function: assembly/linker emits ABI0 call to ABI conversion wrapper function.
Case 4 is the interesting one here. Since the compiler can't know the ABI of an indirect call, all indirect calls are made with ABIInternal. In order to support indirect ABI0 assembly function calls, a wrapper is generated that translates ABIInternal arguments to ABI0 arguments, calls the target function, and then converts results back. When the address of an ABI0 function is taken from Go code, it evaluates to the address of this wrapper function rather than the target function so that later indirect calls will work as expected. This is normally fine, but gVisor does more than just call some of the assembly functions we take the address of: either noting the start and end address for future reference from a signal handler (safecopy), or copying the function text to a new mapping (platforms). Both of these fail with wrappers enabled (currently, this is Go tip with GOEXPERIMENT=regabiwrappers) because these operations end up operating on the wrapper instead of the target function.
We work around this issue by taking advantage of case 5: references to assembly symbols from other assembly functions resolve directly to the desired target symbol. Thus, rather than using reflect to get the address of a Go reference to the functions, we create assembly stubs that return the address of the function. This approach works just as well on current versions of Go, so the change can be made immediately and doesn't require any build tags.
[1] https://go.googlesource.com/go/+/refs/heads/master/src/cmd/compile/abi-internal.md
PiperOrigin-RevId: 368505655
2021-04-09 | Merge pull request #5767 from avagin:mxcsr | gVisor bot
PiperOrigin-RevId: 367730917
2021-04-08 | Merge pull request #5736 from lubinszARM:pr_bblu_tlb_asid | gVisor bot
PiperOrigin-RevId: 367523491
2021-04-01 | platform/kvm/x86: restore mxcsr when switching from guest to sentry | Andrei Vagin
The Go runtime sets mxcsr once and never changes it. Reported-by: syzbot+ec55cea6e57ec083b7a6@syzkaller.appspotmail.com Fixes: #5754
2021-03-29 | [syserror] Split usermem package | Zach Koopmans
Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-29 | Merge pull request #5728 from zhlhahaha:2091 | gVisor bot
PiperOrigin-RevId: 365613394
2021-03-29 | [perf] Reduce contention in ptrace.threadPool.lookupOrCreate(). | Ayush Ranjan
lookupOrCreate is called from subprocess.switchToApp() and subprocess.syscall(). lookupOrCreate() looks for a thread already created for the current TID. If a thread exists (common case), it returns immediately. Otherwise it creates a new one. This change switches to using a sync.RWMutex. The initial thread existence lookup is now done only with the read lock. So multiple successful lookups can occur concurrently. Only when a new thread is created will it acquire the lock for writing and update the map (which is not the common case). Discovered in mutex profiles from the various ptrace benchmarks. Example: https://gvisor.dev/profile/gvisor-buildkite/fd14bfad-b30f-44dc-859b-80ebac50beb4/843827db-da50-4dc9-a2ea-ecf734dde2d5/tmp/profile/ptrace/BenchmarkFio/operation.write/blockSize.4K/filesystem.tmpfs/benchmarks/fio/mutex.pprof/flamegraph PiperOrigin-RevId: 365612094
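The read-then-write locking scheme described above can be sketched as follows. This is an illustration rather than the code from the change: the thread and threadPool types are simplified, the newThread callback is a hypothetical parameter, and the second lookup under the write lock is a general-purpose safeguard that the real code may not need.

```go
package main

import (
	"fmt"
	"sync"
)

// thread and threadPool are simplified, hypothetical stand-ins.
type thread struct{ tid int32 }

type threadPool struct {
	mu      sync.RWMutex
	threads map[int32]*thread
}

// lookupOrCreate returns the thread for tid. The common case, where the
// thread already exists, takes only the read lock, so many lookups can
// proceed concurrently. The write lock is taken only to insert.
func (p *threadPool) lookupOrCreate(tid int32, newThread func() *thread) *thread {
	p.mu.RLock()
	t, ok := p.threads[tid]
	p.mu.RUnlock()
	if ok {
		return t
	}

	p.mu.Lock()
	defer p.mu.Unlock()
	// Check again: another goroutine may have inserted the entry between
	// releasing the read lock and acquiring the write lock.
	if t, ok := p.threads[tid]; ok {
		return t
	}
	t = newThread()
	p.threads[tid] = t
	return t
}

func main() {
	pool := &threadPool{threads: make(map[int32]*thread)}
	create := func() *thread { return &thread{tid: 7} }
	a := pool.lookupOrCreate(7, create)
	b := pool.lookupOrCreate(7, create)
	fmt.Println(a == b)
}
```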
2021-03-26 | arm64 ring0: don't use inner-shareable to invalidate tlb | Robin Luk
It is enough to invalidate the TLB of the local vCPU in switch(). TLBI with inner-shareable will invalidate the TLB on other vCPUs. Arm64 hardware supports at least 256 pcids, so I think it's ok to set the length of the pcid pool to 128. Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
2021-03-25 | Fix comments error | Howard Zhang
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2021-03-23 | Merge pull request #5677 from avagin:kvm-mmio | gVisor bot
PiperOrigin-RevId: 364728696
2021-03-23 | Move the code that manages floating-point state to a separate package | Andrei Vagin
This change is inspired by Adin's cl/355256448. PiperOrigin-RevId: 364695931
2021-03-16 | kvm: prefault a floating point state before restoring it | Andrei Vagin
If physical pages of a memory region are not mapped yet, the kernel will trigger KVM_EXIT_MMIO and we will map physical pages in bluepillHandler(). An instruction that triggered a fault will not be re-executed; it will be emulated in the kernel, but the kernel can't emulate complex instructions like xsave and xrstor. We can touch the memory with simple instructions to work around this problem.
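Touching memory with simple instructions can be pictured as one plain load per page. The sketch below shows only that general idea in Go; the pageSize value and the prefault helper are assumptions, and the actual change may touch the state differently (for example from assembly).

```go
package main

import "fmt"

// pageSize is an assumed value for illustration.
const pageSize = 4096

// prefault reads one byte per pageSize stride of buf, plus the final byte
// in case the buffer is not page-aligned and spills onto one more page.
// Each read is an ordinary load, which is the kind of simple access that
// can be emulated if it faults. The sum is returned so the loads are not
// optimized away.
func prefault(buf []byte) (sum byte) {
	for i := 0; i < len(buf); i += pageSize {
		sum += buf[i]
	}
	if n := len(buf); n > 0 {
		sum += buf[n-1]
	}
	return sum
}

func main() {
	state := make([]byte, 3*pageSize)
	_ = prefault(state)
	fmt.Println("touched", len(state)/pageSize, "pages")
}
```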
2021-03-03 | [op] Replace syscall package usage with golang.org/x/sys/unix in pkg/. | Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places:
- pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys.
- syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t.
Updates #214
PiperOrigin-RevId: 360701387
2021-02-18 | Bump build constraints to Go 1.18 | Michael Pratt
These are bumped to allow early testing of Go 1.17. Use will be audited closer to the 1.17 release. PiperOrigin-RevId: 358278615
2021-02-10 | Merge pull request #5267 from lubinszARM:pr_usr_lazy_fp | gVisor bot
PiperOrigin-RevId: 356762859
2021-02-04 | Move getcpu() to core filter list | Michael Pratt
Some versions of the Go runtime call getcpu(), so add it for compatibility. The hostcpu package already uses getcpu() on arm64. PiperOrigin-RevId: 355717757
2021-02-03 | arm64 kvm: implement basic lazy save and restore for FPSIMD registers | Robin Luk
Implement basic lazy save and restore for FPSIMD registers, which only restore FPSIMD state on el0_fpsimd_acc and save FPSIMD state in switch(). Signed-off-by: Robin Luk <lubin.lu@antgroup.com>
2021-02-02 | Move ring0 package. | Adin Scannell
This allows the package to serve as a general purpose ring0 support package, as opposed to being bound to specific sentry platforms. Updates #5039 PiperOrigin-RevId: 355220044
2021-02-02 | Minor page tables improvements. | Adin Scannell
* Make split safe.
* Enable looking up next valid address.
* Support mappings with !accessType.Any(), distinct from unmap.
These changes allow for the use of pagetables in low-level OS packages, such as ring0, and allow for the use of pagetables for more generic address space reservation (by writing entries with no access specified).
Updates #5039
PiperOrigin-RevId: 355109016
2021-01-19 | platform/ptrace: workaround a kernel ptrace issue on ARM64 | Andrei Vagin
On ARM64, when ptrace stops on a system call, it uses the x7 register to indicate whether the stop has been signalled from syscall entry or syscall exit. This means that we can't read the value of this register and we can't change it. More details are in the comment for tracehook_report_syscall in arch/arm64/kernel/ptrace.c. This happens only if we stop on a system call, so let's queue a signal, resume a stub thread and catch it during signal handling. Fixes: #5238 PiperOrigin-RevId: 352668695
2021-01-13 | Merge pull request #4792 from lubinszARM:pr_kvm_test | gVisor bot
PiperOrigin-RevId: 351638451
2021-01-12 | Fix simple mistakes identified by goreportcard. | Adin Scannell
These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971