Age | Commit message (Collapse) | Author |
|
Actually, gvisor has KPTI (Kernel PageTable Isolation) between
gr0 and gr3. But the upper half of the userCR3 contains the
whole sentry kernel which makes the kernel vulnerable to
gr3 APP through CPU bugs.
This patch implement full KPTI functionality for gvisor. It doesn't
map the whole kernel in the upper. It maps only the text section
of the binary and the entry area required by the ISA. The entry area
contains the global idt, the percpu gdt/tss etc. The entry area
packs all these together which is less than 350k for 512 vCPUs.
The text section is normally nonsensitive. It is possible to
map only the entry functions (interrupt handler etc.) only.
But it requires some hacks.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
kernelEntry is split from CPU that contains minimal CPU-specific
arch state that can be mapped at the upper of the address space.
It is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
m.Get() has guaranteed that if any OS thread TID is in guest,
m.vCPUs[TID] points to the vCPU in which the OS thread TID is running.
So if m.Get() returns with the corrent context in guest,
the vCPU of it must be the same as what Get() returns.
So bluepill() doesn't need to check if the vCPU is matched or not.
The check need to access to %gs register which will not points
to vCPU later when KPTI for gvisor is enabled. We can still
fetch the vCPU pointer from %gs later (when %gs points to kernelEntry),
but it needs the ENTRY_CPU_SELF which is generated by
ring0/offset_amd64.go. So we just simply remove the check.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
Call jumpToKernel() in sysret()/iret() so that there is less
code and data in the upper half, and, especially, current
goroutine's stack and user regs will not be accessed from the
upper half (also with the help from previous patches which make
less code in userCR3 context).
jumpToUser() will not be needed, because current goroutine's stack
and return value in the stack is lower half address.
It is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
KernelCR3 takes effect as early as possible so that less code
is in the userCR3 environment. It is prepared for the next
patches that make less code and data in the upper half,
which is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
UserCR3 takes effect as late as possible so that less code
is in the userCR3 environment. It is prepared for the next
patches that make less code and data in the upper half,
which is prepared for KPTI for gvisor.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
|
|
PiperOrigin-RevId: 324309862
|
|
PiperOrigin-RevId: 324279280
|
|
PiperOrigin-RevId: 324259991
|
|
Envoy (#170) uses this to get the original destination of redirected
packets.
|
|
Move to setstat.go and add a FileDescription wrapper method.
PiperOrigin-RevId: 324165277
|
|
PiperOrigin-RevId: 324127810
|
|
PiperOrigin-RevId: 324125938
|
|
PiperOrigin-RevId: 324100220
|
|
Return on success should be 0, not size of the struct copied out.
PiperOrigin-RevId: 324029193
|
|
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
This change allows the sentry to send FUSE_INIT request and process
the reply. It adds the corresponding structs, employs the fuse
device to send and read the message, and stores the results of negotiation
in corresponding places (inside connection struct).
It adds a CallAsync() function to the FUSE connection interface:
- like Call(), but it's for requests that do not expect immediate response (init, release, interrupt etc.)
- will block if the connection hasn't initialized, which is the same for Call()
|
|
Compare Linux's fs/eventpoll.c:do_epoll_ctl(). I don't know where EPOLLRDHUP
came from.
PiperOrigin-RevId: 323874419
|
|
full context switch: add fpsimd load/store support to container
application.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 323715260
|
|
PiperOrigin-RevId: 323491461
|
|
PiperOrigin-RevId: 323456118
|
|
PiperOrigin-RevId: 323455097
|
|
PiperOrigin-RevId: 323443142
|
|
This PR adds the following:
- [x] Marshall-able structs for fuse headers
- [x] Data structures needed in /dev/fuse to communicate with the daemon server
- [x] Implementation of the device interface
- [x] Go unit tests
This change adds the `/dev/fuse` implementation. `Connection` controls the
communication between the server and the sentry. The FUSE server uses
the `FileDescription` interface to interact with the Sentry. The Sentry
implmenetation of fusefs, uses `Connection` and the Connection interface
to interact with the Server. All communication messages are in the form
of `go_marshal` backed structs defined in the ABI package.
This change also adds some go unit tests that test (pretty basically)
the interfaces and should be used as an example of an end to end FUSE
operation.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/3083 from ridwanmsharif:ridwanmsharif/fuse-device-impl 69aa2ce970004938fe9f918168dfe57636ab856e
PiperOrigin-RevId: 323428180
|
|
The subsequent systrap changes will need to import memmap from
the platform package.
PiperOrigin-RevId: 323409486
|
|
We need to correctly distinguish instruction_abort/data_abort for
mem_abort@Arm64.
So, EC/WNR/FSC in esr_el1 should be checked.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
At present, when doing syscall_kvm test, we need to
enable the function of ESR_ELx_SYS64_ISS_SYS_CNTVCT/ESR_ELx_SYS64_ISS_SYS_CNTFRQ to
successfully pass the test.
I set CNTKCTL_EL1.EL0VCTEN==1/CNTKCTL_EL1.EL0PCTEN==1, so that the related cases can passed.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 322954792
|
|
PiperOrigin-RevId: 322937495
|
|
- Check write permission on truncate(2). Unlike ftruncate(2),
truncate(2) fails if the user does not have write permissions
on the file.
- For gofers under InteropModeShared, check file type before
making a truncate request. We should fail early and avoid
making an rpc when possible. Furthermore, depending on the
remote host's failure may give us unexpected behavior--if the
host converts the truncate request to an ftruncate syscall on
an open fd, we will get EINVAL instead of EISDIR.
Updates #2923.
PiperOrigin-RevId: 322913569
|
|
Fix typos.
PiperOrigin-RevId: 322913282
|
|
Changes the API of tcpip.Clock to also provide a method for scheduling and
rescheduling work after a specified duration. This change also implements the
AfterFunc method for existing implementations of tcpip.Clock.
This is the groundwork required to mock time within tests. All references to
CancellableTimer has been replaced with the tcpip.Job interface, allowing for
custom implementations of scheduling work.
This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or
use tcpip.CancellableTimer. Migration plan:
1. Add AfterFunc(d, f) to tcpip.Clock
2. Replace references of tcpip.CancellableTimer with tcpip.Job
3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel
4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule
5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob.
PiperOrigin-RevId: 322906897
|
|
PiperOrigin-RevId: 322904430
|
|
PiperOrigin-RevId: 322890087
|
|
Like task_work in Linux, this allows us to register callbacks to be executed
before returning to userspace. This is needed for kcov support, which requires
coverage information to be up-to-date whenever we are in user mode. We will
provide coverage data through the kcov interface to enable coverage-directed
fuzzing in syzkaller.
One difference from Linux is that task work cannot queue work before the
transition to userspace that it precedes; queued work will be picked up before
the next transition.
PiperOrigin-RevId: 322889984
|
|
This patch load/save TLS for the container application.
Related issue: full context-switch supporting for Arm64 #1238
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/2761 from lubinszARM:pr_tls_2 cb5dbca1c9c3f378002406da7a58887f9b5032b3
PiperOrigin-RevId: 322887044
|
|
|
|
Allow FUSE filesystems to be mounted using libfuse.
The appropriate flags and mount options are parsed and
understood by fusefs.
|
|
Socket option values are now required to implement marshal.Marshallable.
Co-authored-by: Rahat Mahmood <rahat@google.com>
PiperOrigin-RevId: 322831612
|
|
And do some refactoring of the wait logic in sendfile/splice/tee.
Updates #1035 #2923
PiperOrigin-RevId: 322815521
|
|
We were getting the file attributes before locking the metadataMu which was
causing stale updates to the file attributes.
Fixes OpenTest_AppendConcurrentWrite.
Updates #2923
PiperOrigin-RevId: 322804438
|
|
For iptables users, Check() is a hot path called for every packet one or more
times. Let's avoid a bunch of map lookups.
PiperOrigin-RevId: 322678699
|
|
Updates #2923
PiperOrigin-RevId: 322671489
|
|
Helps in fixing open syscall tests: AppendConcurrentWrite and AppendOnly.
We also now update the file size for seekable special files (regular files)
which we were not doing earlier.
Updates #2923
PiperOrigin-RevId: 322670843
|
|
Updates #173
PiperOrigin-RevId: 322665518
|
|
Support the operation of asid, so that I can optimize tlb performance
by combining with nG.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
Updates #173
PiperOrigin-RevId: 321690756
|
|
PiperOrigin-RevId: 321496734
|
|
Now it calls pkt.Data.ToView() when writing the packet. This may require
copying when the packet is large, which puts the worse case in an even worse
situation.
This sent out in a separate preparation change as it requires syscall filter
changes. This change will be followed by the change for the adoption of the new
PacketHeader API.
PiperOrigin-RevId: 321447003
|