summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/platform
AgeCommit message (Collapse)Author
2018-10-10Add seccomp filter configuration to ptrace stubs.Adin Scannell
This is a defense-in-depth measure. If the sentry is compromised, this prevents system call injection to the stubs. There is some complexity with respect to ptrace and seccomp interactions, so this protection is not really available for kernel versions < 4.8; this is detected dynamically. Note that this also solves the vsyscall emulation issue by adding in appropriate trapping for those system calls. It does mean that a compromised sentry could theoretically inject these into the stub (ignoring the trap and resume, thereby allowing execution), but they are harmless. PiperOrigin-RevId: 216647581 Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-09-18Provide better message when memfd_create fails with ENOSYSFabricio Voznika
Updates #100 PiperOrigin-RevId: 213414821 Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37
2018-09-14Avoid reuse of pending SignalInfo objectsnewmanwang
runApp.execute -> Task.SendSignal -> sendSignalLocked -> sendSignalTimerLocked -> pendingSignals.enqueue assumes that it owns the arch.SignalInfo returned from platform.Context.Switch. On the other hand, ptrace.context.Switch assumes that it owns the returned SignalInfo and can safely reuse it on the next call to Switch. The KVM platform always returns a unique SignalInfo. This becomes a problem when the returned signal is not immediately delivered, allowing a future signal in Switch to change the previous pending SignalInfo. This is noticeable in #38 when external SIGINTs are delivered from the PTY slave FD. Note that the ptrace stubs are in the same process group as the sentry, so they are eligible to receive the PTY signals. This should probably change, but is not the only possible cause of this bug. Updates #38 Original change by newmanwang <wcs1011@gmail.com>, updated by Michael Pratt <mpratt@google.com>. Change-Id: I5383840272309df70a29f67b25e8221f933622cd PiperOrigin-RevId: 213071072
2018-09-13platform/kvm: Get max vcpu number dynamically by ioctlChenggang
The old kernel version, such as 4.4, only support 255 vcpus. While gvisor is ran on these kernels, it could panic because the vcpu id and vcpu number beyond max_vcpus. Use ioctl(vmfd, _KVM_CHECK_EXTENSION, _KVM_CAP_MAX_VCPUS) to get max vcpus number dynamically. Change-Id: I50dd859a11b1c2cea854a8e27d4bf11a411aa45c PiperOrigin-RevId: 212929704
2018-09-11platform: Pass device fd into platform constructor.Nicolas Lacasse
We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-10Map committed chunks concurrently in FileMem.LoadFrom.Jamie Liu
PiperOrigin-RevId: 212345401 Change-Id: Iac626ee87ba312df88ab1019ade6ecd62c04c75c
2018-08-28Bump to Go 1.11Michael Pratt
The procid offset is unchanged. PiperOrigin-RevId: 210551969 Change-Id: I33ba1ce56c2f5631b712417d870aa65ef24e6022
2018-08-22Add separate Recycle method for allocator.Adin Scannell
This improves debugging for pagetable-related issues. PiperOrigin-RevId: 209827795 Change-Id: I4cfa11664b0b52f26f6bc90a14c5bb106f01e038
2018-08-08Protect PCIDs with a mutex.Adin Scannell
Because the Drop method may be called across vCPUs, it is necessary to protect the PCID database with a mutex to prevent concurrent modification. The PCID is assigned prior to entersyscall, so it's safe to block. PiperOrigin-RevId: 207992864 Change-Id: I8b36d55106981f51e30dcf03e12886330bb79d67
2018-08-06Fix a bug in PCIDs.AssignShiruRen
Store the new assigned pcid in p.cache[pt]. Signed-off-by: ShiruRen <renshiru2000@gmail.com> Change-Id: I4aee4e06559e429fb5e90cb9fe28b36139e3b4b6 PiperOrigin-RevId: 207563833
2018-08-02Automated rollback of changelist 207037226Zhaozhong Ni
PiperOrigin-RevId: 207125440 Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49
2018-08-01Automated rollback of changelist 207007153Michael Pratt
PiperOrigin-RevId: 207037226 Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428
2018-08-01stateify: convert all packages to use explicit mode.Zhaozhong Ni
PiperOrigin-RevId: 207007153 Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9
2018-07-27stateify: support explicit annotation mode; convert refs and stack packages.Zhaozhong Ni
We have been unnecessarily creating too many savable types implicitly. PiperOrigin-RevId: 206334201 Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8
2018-07-23Add KVM and overlay dimensions to container_testFabricio Voznika
PiperOrigin-RevId: 205714667 Change-Id: I317a2ca98ac3bdad97c4790fcc61b004757d99ef
2018-07-17Merge FileMem.usage in IncRefMichael Pratt
Per the doc, usage must be kept maximally merged. Beyond that, it is simply a good idea to keep fragmentation in usage to a minimum. The glibc malloc allocator allocates one page at a time, potentially causing lots of fragmentation. However, those pages are likely to have the same number of references, often making it possible to merge ranges. PiperOrigin-RevId: 204960339 Change-Id: I03a050cf771c29a4f05b36eaf75b1a09c9465e14
2018-07-16Add CPUID faulting for ptrace and KVM.Adin Scannell
PiperOrigin-RevId: 204858314 Change-Id: I8252bf8de3232a7a27af51076139b585e73276d4
2018-07-16Start allocation and reclaim scans only where they may find a matchMichael Pratt
If usageSet is heavily fragmented, findUnallocatedRange and findReclaimable can spend excessive cycles linearly scanning the set for unallocated/free pages. Improve common cases by beginning the scan only at the first page that could possibly contain an unallocated/free page. This metadata only guarantees that there is no lower unallocated/free page, but a scan may still be required (especially for multi-page allocations). That said, this heuristic can still provide significant performance improvements for certain applications. PiperOrigin-RevId: 204841833 Change-Id: Ic41ad33bf9537ecd673a6f5852ab353bf63ea1e6
2018-07-11Add MemoryManager.Pin.Jamie Liu
PiperOrigin-RevId: 204162313 Change-Id: Ib0593dde88ac33e222c12d0dca6733ef1f1035dc
2018-06-26Change SIGCHLD to SIGKILL in ptrace stubs.Adin Scannell
If the child stubs are killed by any unmaskable signal (e.g. SIGKILL), then the parent process will similarly be killed, resulting in the death of all other stubs. The effect of this is that if the OOM killer selects and kills a stub, the effect is the same as though the OOM killer selected and killed the sentry. PiperOrigin-RevId: 202219984 Change-Id: I0b638ce7e59e0a0f4d5cde12a7d05242673049d7
2018-06-19Make KVM more scalable by removing CPU cap.Adin Scannell
Instead, CPUs will be created dynamically. We also allow a relatively efficient mechanism for stealing and notifying when a vCPU becomes available via unlock. Since the number of vCPUs is no longer fixed at machine creation time, we make the dirtySet packing more efficient. This has the pleasant side effect of cutting out the unsafe address space code. PiperOrigin-RevId: 201266691 Change-Id: I275c73525a4f38e3714b9ac0fd88731c26adfe66
2018-06-15Use notify explicitly on unlock path.Adin Scannell
There are circumstances under which the redpill call will not generate the appropriate action and notification. Replace this call with an explicit notification, which is guaranteed to transition as well as perform the futex wake. PiperOrigin-RevId: 200726934 Change-Id: Ie19e008a6007692dd7335a31a8b59f0af6e54aaa
2018-06-13Deflake kvm_test.Adin Scannell
PiperOrigin-RevId: 200439846 Change-Id: I9970fe0716cb02f0f41b754891d55db7e0729f56
2018-06-13Log filemem state when panicing due to invalid refcount.Jamie Liu
PiperOrigin-RevId: 200408305 Change-Id: I676ee49ec77697105723577928c7f82088cd378e
2018-06-11Minor ring0 interface cleanup.Adin Scannell
- Remove unused methods. - Provide declaration for asm function. PiperOrigin-RevId: 200146850 Change-Id: Ic455c96ffe0d2e78ef15f824eb65d7de705b054a
2018-06-11Make page tables split-safe.Adin Scannell
In order to minimize the likelihood of exit during page table modifications, make the full set of page table functions split-safe. This is not strictly necessary (and you may still incur splits due to allocations from the allocator pool) but should make retries a very rare occurance. PiperOrigin-RevId: 200146688 Change-Id: I8fa36aa16b807beda2f0b057be60038258e8d597
2018-06-11Handle all exception vectors.Adin Scannell
PiperOrigin-RevId: 200144655 Change-Id: I5a753c74b75007b7714d6fe34aa0d2e845dc5c41
2018-06-08Fix kernel flags handling and add missing vectors.Adin Scannell
PiperOrigin-RevId: 199877174 Change-Id: I9d19ea301608c2b989df0a6123abb1e779427853
2018-06-06Ensure guest-mode for page table modifications.Adin Scannell
Because of the KVM shadow page table implementation, modifications made to guest page tables from host mode may not be syncronized correctly, resulting in undefined behavior. This is a KVM bug: page table pages should also be tracked for host modifications and resynced appropriately (e.g. the guest could "DMA" into a page table page in theory). However, since we can't rely on this being fixed everywhere, workaround the issue by forcing page table modifications to be in guest mode. This will generally be the case anyways, but now if an exit occurs during modifications, we will re-enter and perform the modifications again. PiperOrigin-RevId: 199587895 Change-Id: I83c20b4cf2a9f9fa56f59f34939601dd34538fb0
2018-06-06Split PCID implementation from page tables.Adin Scannell
Instead of associating a single PCID with each set of page tables (which will reach the maximum quickly), allow a dynamic pool for each vCPU. This is the same way that Linux operates. We also split management of PCIDs out of the page tables themselves for simplicity. PiperOrigin-RevId: 199585631 Change-Id: I42f3486ada3cb2a26f623c65ac279b473ae63201
2018-06-06Add allocator abstraction for page tables.Adin Scannell
In order to prevent possible garbage collection and reuse of page table pages prior to invalidation, introduce a former allocator abstraction that can ensure entries are held during a single traversal. This also cleans up the abstraction and splits it out of the machine itself. PiperOrigin-RevId: 199581636 Change-Id: I2257d5d7ffd9c36f9b7ecd42f769261baeaf115c
2018-06-01Move page tables lock into the address space.Adin Scannell
This is necessary to prevent races with invalidation. It is currently possible that page tables are garbage collected while paging caches refer to them. We must ensure that pages are held until caches can be invalidated. This is not achieved by this goal alone, but moving locking to outside the page tables themselves is a requisite. PiperOrigin-RevId: 198920784 Change-Id: I66fffecd49cb14aa2e676a84a68cabfc0c8b3e9a
2018-05-30Restore FS on resume.Adin Scannell
Previously, the vCPU FS was always correct because it relied on the reset coming out of the switch. When that doesn't occur, for example, using bluepill directly, the FS value can be incorrect leading to strange corruption. This change is necessary for a subsequent change that enforces guest mode for page table modifications, and it may reduce test flakiness. (The problematic path may occur in tests, but does not occur in the actual platform.) PiperOrigin-RevId: 198648137 Change-Id: I513910a973dd8666c9a1d18cf78990964d6a644d
2018-05-30Change ring0 & page tables arguments to structs.Adin Scannell
This is a refactor of ring0 and ring0/pagetables that changes from individual arguments to opts structures. This should involve no functional changes, but sets the stage for subsequent changes. PiperOrigin-RevId: 198627556 Change-Id: Id4460340f6a73f0c793cd879324398139cd58ae9
2018-05-21Dramatically improve handling of KVM vCPU pool.Adin Scannell
Especially in situations with small numbers of vCPUs, the existing system resulted in excessive thrashing. Now, execution contexts co-ordinate as smoothly as they can to share a small number of cores. PiperOrigin-RevId: 197483323 Change-Id: I0afc0c5363ea9386994355baf3904bf5fe08c56c
2018-05-15Fix KVM EFAULT handling.Adin Scannell
PiperOrigin-RevId: 196781718 Change-Id: I889766eed871929cdc247c6b9aa634398adea9c9
2018-05-15Simplify KVM invalidation logic.Adin Scannell
PiperOrigin-RevId: 196780209 Change-Id: I89f39eec914ce54a7c6c4f28e1b6d5ff5a7dd38d
2018-05-15Simplify KVM state handling.Adin Scannell
This also removes the dependency on tmutex. PiperOrigin-RevId: 196764317 Change-Id: I523fb67454318e1a2ca9da3a08e63bfa3c1eeed3
2018-05-14Disable INVPCID check; it's not used.Adin Scannell
PiperOrigin-RevId: 196615029 Change-Id: Idfa383a9aee6a9397167a4231ce99d0b0e5b9912
2018-05-14Make KVM system call first check.Adin Scannell
PiperOrigin-RevId: 196613447 Change-Id: Ib76902896798f072c3031b0c5cf7b433718928b7
2018-05-14Simplify KVM host map handling.Adin Scannell
PiperOrigin-RevId: 196611084 Change-Id: I6afa6b01e1dcd2aa9776dfc0f910874cc6b8d72c
2018-05-14Ignore spurious KVM emulation failures.Adin Scannell
PiperOrigin-RevId: 196609789 Change-Id: Ie261eea3b7fa05b6c348ca93e229de26cbd4dc7d
2018-05-11Remove error return from AddressSpace.Release()Michael Pratt
PiperOrigin-RevId: 196291289 Change-Id: Ie3487be029850b0b410b82416750853a6c4a2b00
2018-05-07Fix misspellingsIan Gudger
PiperOrigin-RevId: 195742598 Change-Id: Ibd4a8e4394e268c87700b6d1e50b4b37dfce5182
2018-05-01Set LMA in EFERMichael Pratt
As of Linux 4.15 (f29810335965ac1f7bcb501ee2af5f039f792416 KVM/x86: Check input paging mode when cs.l is set), KVM validates that LMA is set along with LME. PiperOrigin-RevId: 195047401 Change-Id: I8b43d8f758a85b1f58ccbd747dcacd4056ef3f66
2018-04-28Check in gVisor.Googler
PiperOrigin-RevId: 194583126 Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463