Age | Commit message (Collapse) | Author |
|
PiperOrigin-RevId: 336976081
|
|
PiperOrigin-RevId: 336970511
|
|
PiperOrigin-RevId: 336962937
|
|
The required states may simply not be observed by the thread running bounce, so
track guest and user generations to ensure that at least one of the desired
state transitions happens.
Fixes #3532
PiperOrigin-RevId: 336908216
|
|
PiperOrigin-RevId: 336719900
|
|
Signed-off-by: Min Le <lemin.lm@antgroup.com>
|
|
PiperOrigin-RevId: 336366624
|
|
PiperOrigin-RevId: 336362818
|
|
the correct value needed is 0xbbff440c0400 but the const
defined is 0x000000000000ffc0 due to the operator error
in _MT_EL1_INIT, both kernel and user space memory
attribute should be Normal memory not DEVICE_nGnRE
Signed-off-by: Min Le <lemin.lm@antgroup.com>
|
|
PiperOrigin-RevId: 335930035
|
|
By using TSC scaling as a hack, we can trick the kernel into setting an offset
of exactly zero. Huzzah!
PiperOrigin-RevId: 335922019
|
|
Updates #267
PiperOrigin-RevId: 335713923
|
|
PiperOrigin-RevId: 335532690
|
|
Before we thought that interrupts are always disabled in the kernel
space, but here is a case when goruntime switches on a goroutine which
has been saved in the host mode. On restore, the popf instruction is
used to restore flags and this means that all flags what the goroutine
has in the host mode will be restored in the kernel mode. And in the
host mode, interrupts are always enabled.
The long story short, we can't use the IF flag for determine whether a
tasks is running in user or kernel mode.
This patch reworks the code so that in userspace, the first bit of the
IOPL flag will be always set. This doesn't give any new privilidges for
a task because CPL in userspace is always 3. But then we can use this
flag to distinguish user and kernel modes. The IOPL flag is never set in
the kernel and host modes.
Reported-by: syzbot+5036b325a8eb15c030cf@syzkaller.appspotmail.com
Reported-by: syzbot+034d580e89ad67b8dc75@syzkaller.appspotmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
Related with issue #3019, #4056.
When running hello-world with gvisor-kvm, there is panic when exits:
"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x3c0 pc=0x7c3f18]
goroutine 284 [running]:
... ...
gvisor.dev/gvisor/pkg/sentry/platform/kvm.(*machine).dropPageTables(0x4000166840, 0x400032a040)
pkg/sentry/platform/kvm/machine_arm64.go:111 +0x88 fp=0x4000479e00 sp=0x4000479da0 pc=0x7c3f18
"
Also make dropPageTables() arch independent.
|
|
PiperOrigin-RevId: 334674481
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
Currently there is a problem with the preservation of usr-tls, which leads
to the contamination of sentry tls.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
after the SWITCH_TO_APP_PAGETABLE, the ASID is changed
to the application ASID, but there are still some
instruction before ERET, since these instruction is
not use the kernel address space, it may use the application's
TLB, which will cause fault, this patch can make sure that
after SWITCH_TO_APP_PAGETABLE sentry is still use kernel
address space which is mapped as Global.
Signed-off-by: Min Le <lemin.lm@antgroup.com>
|
|
some application such as openjdk will excute
DC CVAU at el0, if SCTLR_UCI is not set, it
will trap to EL1 which will cause panic.
Signed-off-by: Min Le <lemin.lm@antgroup.com>
|
|
PiperOrigin-RevId: 332069743
|
|
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.
The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.
- New conditional operations were added to pkg/seccomp to support operations
available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
that syscalls matching the conditionals result in the provided action not
simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
jumps are still not permitted.
Fixes #510
PiperOrigin-RevId: 331938697
|
|
Some optimizations in this pr:
1, Move ASID from TTBR0 to TTBR1
2, tlb_flush_all
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
Some CPUs(eg: ampere-emag) can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 330777900
|
|
PiperOrigin-RevId: 328639254
|
|
This immediately revealed an escape analysis violation (!), where
the sync.Map was being used in a context that escapes were not
allowed. This is a relatively minor fix and is included.
PiperOrigin-RevId: 328611237
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
This enables pre-release testing with 1.16. The intention is to replace these
with a nogo check before the next release.
PiperOrigin-RevId: 328193911
|
|
Our "Preconditions:" blocks are very useful to determine the input invariants,
but they are bit inconsistent throughout the codebase, which makes them harder
to read (particularly cases with 5+ conditions in a single paragraph).
I've reformatted all of the cases to fit in simple rules:
1. Cases with a single condition are placed on a single line.
2. Cases with multiple conditions are placed in a bulleted list.
This format has been added to the style guide.
I've also mentioned "Postconditions:", though those are much less frequently
used, and all uses already match this style.
PiperOrigin-RevId: 327687465
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
It indicates that the Sentry has changed the state of the thread and
next calls of PullFullState() has to do nothing.
PiperOrigin-RevId: 325567415
|
|
PiperOrigin-RevId: 325546308
|
|
Actually, gvisor has KPTI (Kernel PageTable Isolation) between
gr0 and gr3. But the upper half of the userCR3 contains the
whole sentry kernel which makes the kernel vulnerable to
gr3 APP through CPU bugs.
This patch implement full KPTI functionality for gvisor. It doesn't
map the whole kernel in the upper. It maps only the text section
of the binary and the entry area required by the ISA. The entry area
contains the global idt, the percpu gdt/tss etc. The entry area
packs all these together which is less than 350k for 512 vCPUs.
The text section is normally nonsensitive. It is possible to
map only the entry functions (interrupt handler etc.) only.
But it requires some hacks.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
kernelEntry is split from CPU that contains minimal CPU-specific
arch state that can be mapped at the upper of the address space.
It is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
m.Get() has guaranteed that if any OS thread TID is in guest,
m.vCPUs[TID] points to the vCPU in which the OS thread TID is running.
So if m.Get() returns with the corrent context in guest,
the vCPU of it must be the same as what Get() returns.
So bluepill() doesn't need to check if the vCPU is matched or not.
The check need to access to %gs register which will not points
to vCPU later when KPTI for gvisor is enabled. We can still
fetch the vCPU pointer from %gs later (when %gs points to kernelEntry),
but it needs the ENTRY_CPU_SELF which is generated by
ring0/offset_amd64.go. So we just simply remove the check.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
Call jumpToKernel() in sysret()/iret() so that there is less
code and data in the upper half, and, especially, current
goroutine's stack and user regs will not be accessed from the
upper half (also with the help from previous patches which make
less code in userCR3 context).
jumpToUser() will not be needed, because current goroutine's stack
and return value in the stack is lower half address.
It is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
KernelCR3 takes effect as early as possible so that less code
is in the userCR3 environment. It is prepared for the next
patches that make less code and data in the upper half,
which is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
UserCR3 takes effect as late as possible so that less code
is in the userCR3 environment. It is prepared for the next
patches that make less code and data in the upper half,
which is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
PiperOrigin-RevId: 324748508
|
|
PiperOrigin-RevId: 324309862
|
|
PiperOrigin-RevId: 324127810
|
|
PiperOrigin-RevId: 324125938
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
I disabled DAIF(DEBUG, sError, IRQ, FIQ) in guest kernel mode,
and enabled them in guest user mode.
So, I can make sure all DAIF-s come from guest user mode,
and then the case 'TestBounceStress' can passed on Arm64.
Test steps:
1, cd pkg/sentry/platform/kvm
2, bazel test kvm_test --strip=never --test_output=streamed
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
full context switch: add fpsimd load/store support to container
application.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 323456118
|
|
PiperOrigin-RevId: 323455097
|
|
The subsequent systrap changes will need to import memmap from
the platform package.
PiperOrigin-RevId: 323409486
|