summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/platform
AgeCommit message (Collapse)Author
2020-10-06Implement membarrier(2) commands other than *_SYNC_CORE.Jamie Liu
Updates #267 PiperOrigin-RevId: 335713923
2020-10-05Merge pull request #4079 from lemin9538:arm64_fixgVisor bot
PiperOrigin-RevId: 335532690
2020-10-02kvm/x86: handle a case when interrupts are enabled in the kernel spaceAndrei Vagin
Before we thought that interrupts are always disabled in the kernel space, but here is a case when goruntime switches on a goroutine which has been saved in the host mode. On restore, the popf instruction is used to restore flags and this means that all flags what the goroutine has in the host mode will be restored in the kernel mode. And in the host mode, interrupts are always enabled. The long story short, we can't use the IF flag for determine whether a tasks is running in user or kernel mode. This patch reworks the code so that in userspace, the first bit of the IOPL flag will be always set. This doesn't give any new privilidges for a task because CPL in userspace is always 3. But then we can use this flag to distinguish user and kernel modes. The IOPL flag is never set in the kernel and host modes. Reported-by: syzbot+5036b325a8eb15c030cf@syzkaller.appspotmail.com Reported-by: syzbot+034d580e89ad67b8dc75@syzkaller.appspotmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-09-30Merge pull request #2256 from laijs:kptigVisor bot
PiperOrigin-RevId: 334674481
2020-09-25make sure use the kernel space after change ASIDMin Le
after the SWITCH_TO_APP_PAGETABLE, the ASID is changed to the application ASID, but there are still some instruction before ERET, since these instruction is not use the kernel address space, it may use the application's TLB, which will cause fault, this patch can make sure that after SWITCH_TO_APP_PAGETABLE sentry is still use kernel address space which is mapped as Global. Signed-off-by: Min Le <lemin.lm@antgroup.com>
2020-09-16Merge pull request #3893 from lubinszARM:pr_n1_03gVisor bot
PiperOrigin-RevId: 332069743
2020-09-15Add support for OCI seccomp filters in the sandbox.Ian Lewis
OCI configuration includes support for specifying seccomp filters. In runc, these filter configurations are converted into seccomp BPF programs and loaded into the kernel via libseccomp. runsc needs to be a static binary so, for runsc, we cannot rely on a C library and need to implement the functionality in Go. The generator added here implements basic support for taking OCI seccomp configuration and converting it into a seccomp BPF program with the same behavior as a program generated by libseccomp. - New conditional operations were added to pkg/seccomp to support operations available in OCI. - AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect that syscalls matching the conditionals result in the provided action not simply SCMP_RET_ALLOW. - BuildProgram in pkg/seccomp no longer panics if provided an empty list of rules. It now builds a program with the architecture sanity check only. - ProgramBuilder now allows adding labels that are unused. However, backwards jumps are still not permitted. Fixes #510 PiperOrigin-RevId: 331938697
2020-09-11arm64 mm: asid and tlb supportBin Lu
Some optimizations in this pr: 1, Move ASID from TTBR0 to TTBR1 2, tlb_flush_all Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-10arm64:place an SB sequence following an ERET instructionBin Lu
Some CPUs(eg: ampere-emag) can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-09Don't sched_setaffinity in ptrace platform.Jamie Liu
PiperOrigin-RevId: 330777900
2020-08-26Merge pull request #3742 from lubinszARM:pr_n1_1gVisor bot
PiperOrigin-RevId: 328639254
2020-08-26Support stdlib analyzers with nogo.Adin Scannell
This immediately revealed an escape analysis violation (!), where the sync.Map was being used in a context that escapes were not allowed. This is a relatively minor fix and is included. PiperOrigin-RevId: 328611237
2020-08-24Device major number greater than 2 digits in /proc/self/maps on arm64 N1 machineBin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-08-24Bump build constraints to 1.17Michael Pratt
This enables pre-release testing with 1.16. The intention is to replace these with a nogo check before the next release. PiperOrigin-RevId: 328193911
2020-08-20Consistent precondition formattingMichael Pratt
Our "Preconditions:" blocks are very useful to determine the input invariants, but they are bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit in simple rules: 1. Cases with a single condition are placed on a single line. 2. Cases with multiple conditions are placed in a bulleted list. This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style. PiperOrigin-RevId: 327687465
2020-08-12Running hello-world on Thunderx2 with kvmBin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-08-07Add context.FullStateChanged()Andrei Vagin
It indicates that the Sentry has changed the state of the thread and next calls of PullFullState() has to do nothing. PiperOrigin-RevId: 325567415
2020-08-07Merge pull request #3069 from lubinszARM:pr_serr_injection2gVisor bot
PiperOrigin-RevId: 325546308
2020-08-06amd64: implement KPTI for gvisorLai Jiangshan
Actually, gvisor has KPTI (Kernel PageTable Isolation) between gr0 and gr3. But the upper half of the userCR3 contains the whole sentry kernel which makes the kernel vulnerable to gr3 APP through CPU bugs. This patch implement full KPTI functionality for gvisor. It doesn't map the whole kernel in the upper. It maps only the text section of the binary and the entry area required by the ISA. The entry area contains the global idt, the percpu gdt/tss etc. The entry area packs all these together which is less than 350k for 512 vCPUs. The text section is normally nonsensitive. It is possible to map only the entry functions (interrupt handler etc.) only. But it requires some hacks. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: introduce kernelEntryLai Jiangshan
kernelEntry is split from CPU that contains minimal CPU-specific arch state that can be mapped at the upper of the address space. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: don't check vcpu in bluepill()Lai Jiangshan
m.Get() has guaranteed that if any OS thread TID is in guest, m.vCPUs[TID] points to the vCPU in which the OS thread TID is running. So if m.Get() returns with the corrent context in guest, the vCPU of it must be the same as what Get() returns. So bluepill() doesn't need to check if the vCPU is matched or not. The check need to access to %gs register which will not points to vCPU later when KPTI for gvisor is enabled. We can still fetch the vCPU pointer from %gs later (when %gs points to kernelEntry), but it needs the ENTRY_CPU_SELF which is generated by ring0/offset_amd64.go. So we just simply remove the check. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: less code and data in the upper halfLai Jiangshan
Call jumpToKernel() in sysret()/iret() so that there is less code and data in the upper half, and, especially, current goroutine's stack and user regs will not be accessed from the upper half (also with the help from previous patches which make less code in userCR3 context). jumpToUser() will not be needed, because current goroutine's stack and return value in the stack is lower half address. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to kernelCR3 when just return to gr0Lai Jiangshan
KernelCR3 takes effect as early as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to userCR3 just before return to gr3Lai Jiangshan
UserCR3 takes effect as late as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-03Add callbacks to support lazy loading/restoring thread statesAndrei Vagin
PiperOrigin-RevId: 324748508
2020-07-31Merge pull request #3300 from lubinszARM:pr_fpsimd_usrgVisor bot
PiperOrigin-RevId: 324309862
2020-07-30Merge pull request #3448 from lubinszARM:pr_tls_testsgVisor bot
PiperOrigin-RevId: 324127810
2020-07-30Merge pull request #3028 from lubinszARM:pr_kvm_hello1gVisor bot
PiperOrigin-RevId: 324125938
2020-07-30add usr-tls test cases for Arm64Bin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-30supporting sError injection step 2 on Arm64Bin Lu
I disabled DAIF(DEBUG, sError, IRQ, FIQ) in guest kernel mode, and enabled them in guest user mode. So, I can make sure all DAIF-s come from guest user mode, and then the case 'TestBounceStress' can passed on Arm64. Test steps: 1, cd pkg/sentry/platform/kvm 2, bazel test kvm_test --strip=never --test_output=streamed Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-29load/store user fpsimd on Arm64Bin Lu
full context switch: add fpsimd load/store support to container application. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-27Merge pull request #3201 from lubinszARM:pr_sys64_2gVisor bot
PiperOrigin-RevId: 323456118
2020-07-27Merge pull request #3299 from lubinszARM:pr_asidgVisor bot
PiperOrigin-RevId: 323455097
2020-07-27Move platform.File in memmapAndrei Vagin
The subsequent systrap changes will need to import memmap from the platform package. PiperOrigin-RevId: 323409486
2020-07-26updated the functions to distinguish IA/DA for Arm64Bin Lu
We need to correctly distinguish instruction_abort/data_abort for mem_abort@Arm64. So, EC/WNR/FSC in esr_el1 should be checked. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-26allow guest user applications read CNTVCT_EL0/CNTFRQ_EL0Bin Lu
At present, when doing syscall_kvm test, we need to enable the function of ESR_ELx_SYS64_ISS_SYS_CNTVCT/ESR_ELx_SYS64_ISS_SYS_CNTFRQ to successfully pass the test. I set CNTKCTL_EL1.EL0VCTEN==1/CNTKCTL_EL1.EL0PCTEN==1, so that the related cases can passed. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-23kvm-tls-2:add the preservation of user-TLS in the Arm64 kvm platformlubinszARM
This patch load/save TLS for the container application. Related issue: full context-switch supporting for Arm64 #1238 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/2761 from lubinszARM:pr_tls_2 cb5dbca1c9c3f378002406da7a58887f9b5032b3 PiperOrigin-RevId: 322887044
2020-07-20add asid support to Arm64Bin Lu
Support the operation of asid, so that I can optimize tlb performance by combining with nG. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-13Merge pull request #3200 from lubinszARM:pr_kvm_ut_1gVisor bot
PiperOrigin-RevId: 321060717
2020-07-10Split the kvm ut test cases to correspond to different platformsBin Lu
Split the kvm ut test cases to pass unit-tests on Arm64. I will add the tls and full-context test cases for Arm64 later. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-03allow guest user applications read ctr_el0 on Arm64Bin Lu
At present, when doing syscall_kvm test, we need to enable the function of ESR_ELx_SYS64_ISS_SYS_CTR_READ to successfully pass the test. I set SCTLR_EL1.UCT==1, so that the related cases can passed. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-06-16support sError injection in kvm module on Arm64Bin Lu
There are 3 types of asynchronous exceptions on Arm64: sError, IRQ, FIQ. In this case, we use the sError injection method in bluepillHandler to force the guest to quit. So that the test case of "TestBounce" can be passed on Arm64. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-06-10Merge pull request #2711 from lubinszARM:pr_mmiogVisor bot
PiperOrigin-RevId: 315812219
2020-06-09minor change in kvm module for Arm64Bin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-06-05Add +checkescape annotations to kvm/ring0.Adin Scannell
This analysis also catches a potential bug, which is a split on mapPhysical. This would have led to potential guest-exit during Mapping (although this would have been handled by the now-unecessary retryInGuest loop). PiperOrigin-RevId: 315025106
2020-06-01Merge pull request #2689 from lubinszARM:pr_prot_nonegVisor bot
PiperOrigin-RevId: 314186752
2020-05-29Update Go version build tagsMichael Pratt
None of the dependencies have changed in 1.15. It may be possible to simplify some of the wrappers in rawfile following 1.13, but that can come in a later change. PiperOrigin-RevId: 313863264
2020-05-17adding the VM-Exit method for Arm64Bin Lu
On amd64, it uses 'HLT' to leave the guest. Unlike amd64, arm64 can only uses mmio_exit/psci to leave the guest. So, I designed the HYPERCALL_VMEXIT to be compatible with amd64/arm64. To keep it simple, I used the address of exception table as the MMIO base address, so that I can trigger a MMIO-EXIT by forcibly writing this space. Then, in host user space, I can calculate this address to find out which hypercall. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-13PROT_NONE should be specially treated in the step of mapPhysicalBin Lu
It's a workaround to treat PROT_NONE as RDONLY temporarily. TODO(gvisor.dev/issue/2686): PROT_NONE should be specially treated. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-05-13adding the methods to get/set TLS for Arm64 kvm platformBin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>