summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2020-08-06amd64: implement KPTI for gvisorLai Jiangshan
Actually, gvisor has KPTI (Kernel PageTable Isolation) between gr0 and gr3. But the upper half of the userCR3 contains the whole sentry kernel which makes the kernel vulnerable to gr3 APP through CPU bugs. This patch implement full KPTI functionality for gvisor. It doesn't map the whole kernel in the upper. It maps only the text section of the binary and the entry area required by the ISA. The entry area contains the global idt, the percpu gdt/tss etc. The entry area packs all these together which is less than 350k for 512 vCPUs. The text section is normally nonsensitive. It is possible to map only the entry functions (interrupt handler etc.) only. But it requires some hacks. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: introduce kernelEntryLai Jiangshan
kernelEntry is split from CPU that contains minimal CPU-specific arch state that can be mapped at the upper of the address space. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: don't check vcpu in bluepill()Lai Jiangshan
m.Get() has guaranteed that if any OS thread TID is in guest, m.vCPUs[TID] points to the vCPU in which the OS thread TID is running. So if m.Get() returns with the corrent context in guest, the vCPU of it must be the same as what Get() returns. So bluepill() doesn't need to check if the vCPU is matched or not. The check need to access to %gs register which will not points to vCPU later when KPTI for gvisor is enabled. We can still fetch the vCPU pointer from %gs later (when %gs points to kernelEntry), but it needs the ENTRY_CPU_SELF which is generated by ring0/offset_amd64.go. So we just simply remove the check. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: less code and data in the upper halfLai Jiangshan
Call jumpToKernel() in sysret()/iret() so that there is less code and data in the upper half, and, especially, current goroutine's stack and user regs will not be accessed from the upper half (also with the help from previous patches which make less code in userCR3 context). jumpToUser() will not be needed, because current goroutine's stack and return value in the stack is lower half address. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to kernelCR3 when just return to gr0Lai Jiangshan
KernelCR3 takes effect as early as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to userCR3 just before return to gr3Lai Jiangshan
UserCR3 takes effect as late as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-07-31Merge pull request #3300 from lubinszARM:pr_fpsimd_usrgVisor bot
PiperOrigin-RevId: 324309862
2020-07-31Merge pull request #3348 from kevinGC:so-orig-dstgVisor bot
PiperOrigin-RevId: 324279280
2020-07-31Internal change.gVisor bot
PiperOrigin-RevId: 324259991
2020-07-31iptables: support SO_ORIGINAL_DSTKevin Krakauer
Envoy (#170) uses this to get the original destination of redirected packets.
2020-07-31Clean up vfs2 fallocate.Dean Deng
Move to setstat.go and add a FileDescription wrapper method. PiperOrigin-RevId: 324165277
2020-07-30Merge pull request #3448 from lubinszARM:pr_tls_testsgVisor bot
PiperOrigin-RevId: 324127810
2020-07-30Merge pull request #3028 from lubinszARM:pr_kvm_hello1gVisor bot
PiperOrigin-RevId: 324125938
2020-07-30Merge pull request #3179 from jinmouil:fuse_initgVisor bot
PiperOrigin-RevId: 324100220
2020-07-30Fix SETOWN_EX return value.Dean Deng
Return on success should be 0, not size of the struct copied out. PiperOrigin-RevId: 324029193
2020-07-30add usr-tls test cases for Arm64Bin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-29Add FUSE_INITJinmou Li
This change allows the sentry to send FUSE_INIT request and process the reply. It adds the corresponding structs, employs the fuse device to send and read the message, and stores the results of negotiation in corresponding places (inside connection struct). It adds a CallAsync() function to the FUSE connection interface: - like Call(), but it's for requests that do not expect immediate response (init, release, interrupt etc.) - will block if the connection hasn't initialized, which is the same for Call()
2020-07-29Force registration for EPOLLHUP, not EPOLLRDHUP, in vfs2's epoll.Jamie Liu
Compare Linux's fs/eventpoll.c:do_epoll_ctl(). I don't know where EPOLLRDHUP came from. PiperOrigin-RevId: 323874419
2020-07-29load/store user fpsimd on Arm64Bin Lu
full context switch: add fpsimd load/store support to container application. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-28Redirect TODO to GitHub issuesFabricio Voznika
PiperOrigin-RevId: 323715260
2020-07-27Fix strace for epoll event arrays.Jamie Liu
PiperOrigin-RevId: 323491461
2020-07-27Merge pull request #3201 from lubinszARM:pr_sys64_2gVisor bot
PiperOrigin-RevId: 323456118
2020-07-27Merge pull request #3299 from lubinszARM:pr_asidgVisor bot
PiperOrigin-RevId: 323455097
2020-07-27Merge pull request #2958 from lubinszARM:pr_vfs2_1gVisor bot
PiperOrigin-RevId: 323443142
2020-07-27Add device implementation for /dev/fuseRidwan Sharif
This PR adds the following: - [x] Marshall-able structs for fuse headers - [x] Data structures needed in /dev/fuse to communicate with the daemon server - [x] Implementation of the device interface - [x] Go unit tests This change adds the `/dev/fuse` implementation. `Connection` controls the communication between the server and the sentry. The FUSE server uses the `FileDescription` interface to interact with the Sentry. The Sentry implmenetation of fusefs, uses `Connection` and the Connection interface to interact with the Server. All communication messages are in the form of `go_marshal` backed structs defined in the ABI package. This change also adds some go unit tests that test (pretty basically) the interfaces and should be used as an example of an end to end FUSE operation. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/3083 from ridwanmsharif:ridwanmsharif/fuse-device-impl 69aa2ce970004938fe9f918168dfe57636ab856e PiperOrigin-RevId: 323428180
2020-07-27Move platform.File in memmapAndrei Vagin
The subsequent systrap changes will need to import memmap from the platform package. PiperOrigin-RevId: 323409486
2020-07-26updated the functions to distinguish IA/DA for Arm64Bin Lu
We need to correctly distinguish instruction_abort/data_abort for mem_abort@Arm64. So, EC/WNR/FSC in esr_el1 should be checked. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-26allow guest user applications read CNTVCT_EL0/CNTFRQ_EL0Bin Lu
At present, when doing syscall_kvm test, we need to enable the function of ESR_ELx_SYS64_ISS_SYS_CNTVCT/ESR_ELx_SYS64_ISS_SYS_CNTFRQ to successfully pass the test. I set CNTKCTL_EL1.EL0VCTEN==1/CNTKCTL_EL1.EL0PCTEN==1, so that the related cases can passed. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-24Enable automated marshalling for netstack.Ayush Ranjan
PiperOrigin-RevId: 322954792
2020-07-23Merge pull request #3142 from tanjianfeng:fix-3141gVisor bot
PiperOrigin-RevId: 322937495
2020-07-23Add permission checks to vfs2 truncate.Dean Deng
- Check write permission on truncate(2). Unlike ftruncate(2), truncate(2) fails if the user does not have write permissions on the file. - For gofers under InteropModeShared, check file type before making a truncate request. We should fail early and avoid making an rpc when possible. Furthermore, depending on the remote host's failure may give us unexpected behavior--if the host converts the truncate request to an ftruncate syscall on an open fd, we will get EINVAL instead of EISDIR. Updates #2923. PiperOrigin-RevId: 322913569
2020-07-23FileDescription is hard to spell.Dean Deng
Fix typos. PiperOrigin-RevId: 322913282
2020-07-23Add AfterFunc to tcpip.ClockSam Balana
Changes the API of tcpip.Clock to also provide a method for scheduling and rescheduling work after a specified duration. This change also implements the AfterFunc method for existing implementations of tcpip.Clock. This is the groundwork required to mock time within tests. All references to CancellableTimer has been replaced with the tcpip.Job interface, allowing for custom implementations of scheduling work. This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or use tcpip.CancellableTimer. Migration plan: 1. Add AfterFunc(d, f) to tcpip.Clock 2. Replace references of tcpip.CancellableTimer with tcpip.Job 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob. PiperOrigin-RevId: 322906897
2020-07-23Implement get/set_robust_list.Nicolas Lacasse
PiperOrigin-RevId: 322904430
2020-07-23Merge pull request #3024 from ridwanmsharif:ridwanmsharif/fuse-stub-implgVisor bot
PiperOrigin-RevId: 322890087
2020-07-23Add task work mechanism.Dean Deng
Like task_work in Linux, this allows us to register callbacks to be executed before returning to userspace. This is needed for kcov support, which requires coverage information to be up-to-date whenever we are in user mode. We will provide coverage data through the kcov interface to enable coverage-directed fuzzing in syzkaller. One difference from Linux is that task work cannot queue work before the transition to userspace that it precedes; queued work will be picked up before the next transition. PiperOrigin-RevId: 322889984
2020-07-23kvm-tls-2:add the preservation of user-TLS in the Arm64 kvm platformlubinszARM
This patch load/save TLS for the container application. Related issue: full context-switch supporting for Arm64 #1238 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/2761 from lubinszARM:pr_tls_2 cb5dbca1c9c3f378002406da7a58887f9b5032b3 PiperOrigin-RevId: 322887044
2020-07-23Use mode supplied by the mount optionsRidwan Sharif
2020-07-23Added stub FUSE filesystemRidwan Sharif
Allow FUSE filesystems to be mounted using libfuse. The appropriate flags and mount options are parsed and understood by fusefs.
2020-07-23Marshallable socket opitons.Ayush Ranjan
Socket option values are now required to implement marshal.Marshallable. Co-authored-by: Rahat Mahmood <rahat@google.com> PiperOrigin-RevId: 322831612
2020-07-23Port sendfile to vfs2.Nicolas Lacasse
And do some refactoring of the wait logic in sendfile/splice/tee. Updates #1035 #2923 PiperOrigin-RevId: 322815521
2020-07-23[vfs2][gofer] Fix update attributes race condition.Ayush Ranjan
We were getting the file attributes before locking the metadataMu which was causing stale updates to the file attributes. Fixes OpenTest_AppendConcurrentWrite. Updates #2923 PiperOrigin-RevId: 322804438
2020-07-22iptables: replace maps with arraysKevin Krakauer
For iptables users, Check() is a hot path called for every packet one or more times. Let's avoid a bunch of map lookups. PiperOrigin-RevId: 322678699
2020-07-22[vfs2][tmpfs] Implement O_APPENDAyush Ranjan
Updates #2923 PiperOrigin-RevId: 322671489
2020-07-22Add O_APPEND support in vfs2 gofer.Ayush Ranjan
Helps in fixing open syscall tests: AppendConcurrentWrite and AppendOnly. We also now update the file size for seekable special files (regular files) which we were not doing earlier. Updates #2923 PiperOrigin-RevId: 322670843
2020-07-22Support for receiving outbound packets in AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 322665518
2020-07-20add asid support to Arm64Bin Lu
Support the operation of asid, so that I can optimize tlb performance by combining with nG. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-16Add support to return protocol in recvmsg for AF_PACKET.Bhasker Hariharan
Updates #173 PiperOrigin-RevId: 321690756
2020-07-15Merge pull request #3236 from craig08:fuse-kernfs-inode-stat-add-ctxgVisor bot
PiperOrigin-RevId: 321496734
2020-07-15fdbased: Vectorized write for packet; relax writev syscall filter.Ting-Yu Wang
Now it calls pkt.Data.ToView() when writing the packet. This may require copying when the packet is large, which puts the worse case in an even worse situation. This sent out in a separate preparation change as it requires syscall filter changes. This change will be followed by the change for the adoption of the new PacketHeader API. PiperOrigin-RevId: 321447003