summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/kernel
AgeCommit message (Collapse)Author
2019-03-25Call memmap.Mappable.Translate with more conservative usermem.AccessType.Jamie Liu
MM.insertPMAsLocked() passes vma.maxPerms to memmap.Mappable.Translate (although it unsets AccessType.Write if the vma is private). This somewhat simplifies handling of pmas, since it means only COW-break needs to replace existing pmas. However, it also means that a MAP_SHARED mapping of a file opened O_RDWR dirties the file, regardless of the mapping's permissions and whether or not the mapping is ever actually written to with I/O that ignores permissions (e.g. ptrace(PTRACE_POKEDATA)). To fix this: - Change the pma-getting path to request only the permissions that are required for the calling access. - Change memmap.Mappable.Translate to take requested permissions, and return allowed permissions. This preserves the existing behavior in the common cases where the memmap.Mappable isn't fsutil.CachingInodeOperations and doesn't care if the translated platform.File pages are written to. - Change the MM.getPMAsLocked path to support permission upgrading of pmas outside of copy-on-write. PiperOrigin-RevId: 240196979 Change-Id: Ie0147c62c1fbc409467a6fa16269a413f3d7d571
2019-03-25epoll: use ilist:generic_list instead of ilist:ilistAndrei Vagin
ilist:generic_list works faster than ilist:ilist. Here is a beanchmark test to measure performance of epoll_wait, when readyList isn't empty. It shows about 30% better performance with these changes. Benchmark Time(ns) CPU(ns) Iterations Before: BM_EpollAllEvents 46725 46899 14286 After: BM_EpollAllEvents 33167 33300 18919 PiperOrigin-RevId: 240185278 Change-Id: I3e33f9b214db13ab840b91613400525de5b58d18
2019-03-22Implement PTRACE_SEIZE, PTRACE_INTERRUPT, and PTRACE_LISTEN.Jamie Liu
PiperOrigin-RevId: 239803092 Change-Id: I42d612ed6a889e011e8474538958c6de90c6fcab
2019-03-20gvisor: don't allocate a new credential object on forkAndrei Vagin
A credential object is immutable, so we don't need to copy it for a new task. PiperOrigin-RevId: 239519266 Change-Id: I0632f641fdea9554779ac25d84bee4231d0d18f2
2019-03-18Remove racy access to shm fields.Rahat Mahmood
PiperOrigin-RevId: 239016776 Change-Id: Ia7af4258e7c69b16a4630a6f3278aa8e6b627746
2019-03-14Decouple filemem from platform and move it to pgalloc.MemoryFile.Jamie Liu
This is in preparation for improved page cache reclaim, which requires greater integration between the page cache and page allocator. PiperOrigin-RevId: 238444706 Change-Id: Id24141b3678d96c7d7dc24baddd9be555bffafe4
2019-03-05Priority-inheritance futex implementationFabricio Voznika
It is Implemented without the priority inheritance part given that gVisor defers scheduling decisions to Go runtime and doesn't have control over it. PiperOrigin-RevId: 236989545 Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
2019-03-01Add semctl(GETPID) syscallFabricio Voznika
Also added unimplemented notification for semctl(2) commands. PiperOrigin-RevId: 236340672 Change-Id: I0795e3bd2e6d41d7936fabb731884df426a42478
2019-02-20Make some ptrace commands x86-onlyHaibo Xu
Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9751f859332d433ca772d6b9733f5a5a64398ec7 PiperOrigin-RevId: 234877624
2019-02-19Set rax to syscall number on SECCOMP_RET_TRAP.Jamie Liu
PiperOrigin-RevId: 234690475 Change-Id: I1cbfb5aecd4697a4a26ec8524354aa8656cc3ba1
2019-02-19Fix clone(CLONE_NEWUSER).Jamie Liu
- Use new user namespace for namespace creation checks. - Ensure userns is never nil since it's used by other namespaces. PiperOrigin-RevId: 234673175 Change-Id: I4b9d9d1e63ce4e24362089793961a996f7540cd9
2019-02-14Don't allow writing or reading to TTY unless process group is in foreground.Nicolas Lacasse
If a background process tries to read from a TTY, linux sends it a SIGTTIN unless the signal is blocked or ignored, or the process group is an orphan, in which case the syscall returns EIO. See drivers/tty/n_tty.c:n_tty_read()=>job_control(). If a background process tries to write a TTY, set the termios, or set the foreground process group, linux then sends a SIGTTOU. If the signal is ignored or blocked, linux allows the write. If the process group is an orphan, the syscall returns EIO. See drivers/tty/tty_io.c:tty_check_change(). PiperOrigin-RevId: 234044367 Change-Id: I009461352ac4f3f11c5d42c43ac36bb0caa580f9
2019-02-07Implement /proc/net/unix.Rahat Mahmood
PiperOrigin-RevId: 232948478 Change-Id: Ib830121e5e79afaf5d38d17aeef5a1ef97913d23
2019-02-07Implement semctl(2) SETALL and GETALLFabricio Voznika
PiperOrigin-RevId: 232914984 Change-Id: Id2643d7ad8e986ca9be76d860788a71db2674cda
2019-01-31Move package sync to third_partyMichael Pratt
PiperOrigin-RevId: 231889261 Change-Id: I482f1df055bcedf4edb9fe3fe9b8e9c80085f1a0
2019-01-31Remove license commentsMichael Pratt
Nothing reads them and they can simply get stale. Generated with: $ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD PiperOrigin-RevId: 231818945 Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
2019-01-14Remove fs.Handle, ramfs.Entry, and all the DeprecatedFileOperations.Nicolas Lacasse
More helper structs have been added to the fsutil package to make it easier to implement fs.InodeOperations and fs.FileOperations. PiperOrigin-RevId: 229305982 Change-Id: Ib6f8d3862f4216745116857913dbfa351530223b
2019-01-09Minor memevent fixes.Jamie Liu
- Call MemoryEvents.done.Add(1) outside of MemoryEvents.run() so that if MemoryEvents.Stop() => MemoryEvents.done.Wait() is called before the goroutine starts running, it still waits for the goroutine to stop. - Use defer to call MemoryEvents.done.Done() in MemoryEvents.run() so that it's called even if the goroutine panics. PiperOrigin-RevId: 228623307 Change-Id: I1b0459e7999606c1a1a271b16092b1ca87005015
2019-01-08Improve loader related error messages returned to users.Brian Geffon
PiperOrigin-RevId: 228382827 Change-Id: Ica1d30e0df826bdd77f180a5092b2b735ea5c804
2019-01-08Grant no initial capabilities to non-root UIDs.Jamie Liu
See modified comment in auth.NewUserCredentials(); compare to the behavior of setresuid(2) as implemented by //pkg/sentry/kernel/task_identity.go:kernel.Task.setKUIDsUncheckedLocked(). PiperOrigin-RevId: 228381765 Change-Id: I45238777c8f63fcf41b99fce3969caaf682fe408
2018-12-26Add EventChannel messages for uncaught signals.Brian Geffon
PiperOrigin-RevId: 226936778 Change-Id: I2a6dda157c55d39d81e1b543ab11a58a0bfe5c05
2018-12-18Add BPFAction type with StringerFabricio Voznika
PiperOrigin-RevId: 226018694 Change-Id: I98965e26fe565f37e98e5df5f997363ab273c91b
2018-12-14Move fdnotifier package to reduce internal confusion.Adin Scannell
PiperOrigin-RevId: 225632398 Change-Id: I909e7e2925aa369adc28e844c284d9a6108e85ce
2018-12-12Filesystems shouldn't be saving references to Platform.Rahat Mahmood
Platform objects are not savable, storing references to them in filesystem datastructures would cause save to fail if someone actually passed in a Platform. Current implementations work because everywhere a Platform is expected, we currently pass in a Kernel object which embeds Platform and thus satisfies the interface. Eliminate this indirection and save pointers to Kernel directly. PiperOrigin-RevId: 225288336 Change-Id: Ica399ff43f425e15bc150a0d7102196c3d54a2ab
2018-12-12Fix a data race on Shm.key.Rahat Mahmood
PiperOrigin-RevId: 225240907 Change-Id: Ie568ce3cd643f3e4a0eaa0444f4ed589dcf6031f
2018-12-12Pass information about map writableness to filesystems.Rahat Mahmood
This is necessary to implement file seals for memfds. PiperOrigin-RevId: 225239394 Change-Id: Ib3f1ab31385afc4b24e96cd81a05ef1bebbcbb70
2018-12-10Add type safety to shm ids and keys.Rahat Mahmood
PiperOrigin-RevId: 224864380 Change-Id: I49542279ad56bf15ba462d3de1ef2b157b31830a
2018-12-10Validate FS_BASE in Task.CloneMichael Pratt
arch_prctl already verified that the new FS_BASE was canonical, but Task.Clone did not. Centralize these checks in the arch packages. Failure to validate could cause an error in PTRACE_SET_REGS when we try to switch to the app. PiperOrigin-RevId: 224862398 Change-Id: Iefe63b3f9aa6c4810326b8936e501be3ec407f14
2018-12-06Add counters for memory events.Rahat Mahmood
Also ensure an event is emitted at startup. PiperOrigin-RevId: 224372065 Change-Id: I5f642b6d6b13c6468ee8f794effe285fcbbf29cf
2018-12-04Max link traversals should be for an entire path.Brian Geffon
The number of symbolic links that are allowed to be followed are for a full path and not just a chain of symbolic links. PiperOrigin-RevId: 224047321 Change-Id: I5e3c4caf66a93c17eeddcc7f046d1e8bb9434a40
2018-11-20Use RET_KILL_PROCESS if available in kernelFabricio Voznika
RET_KILL_THREAD doesn't work well for Go because it will kill only the offending thread and leave the process hanging. RET_TRAP can be masked out and it's not guaranteed to kill the process. RET_KILL_PROCESS is available since 4.14. For older kernel, continue to use RET_TRAP as this is the best option (likely to kill process, easy to debug). PiperOrigin-RevId: 222357867 Change-Id: Icc1d7d731274b16c2125b7a1ba4f7883fbdb2cbd
2018-11-20Fix recursive read lock taken on TaskSetFabricio Voznika
SyncSyscallFiltersToThreadGroup and Task.TheadID() both acquired TaskSet RWLock in R mode and could deadlock if a writer comes in between. PiperOrigin-RevId: 222313551 Change-Id: I4221057d8d46fec544cbfa55765c9a284fe7ebfa
2018-11-20Update futex to use usermem abstractions.Adin Scannell
This eliminates the indirection that existed in task_futex. PiperOrigin-RevId: 221832498 Change-Id: Ifb4c926d493913aa6694e193deae91616a29f042
2018-11-15Advertise vsyscall support via /proc/<pid>/maps.Rahat Mahmood
Also update test utilities for probing vsyscall support and add a metric to see if vsyscalls are actually used in sandboxes. PiperOrigin-RevId: 221698834 Change-Id: I57870ecc33ea8c864bd7437833f21aa1e8117477
2018-11-08Create stubs for syscalls upto Linux 4.4.Rahat Mahmood
Create syscall stubs for missing syscalls upto Linux 4.4 and advertise a kernel version of 4.4. PiperOrigin-RevId: 220667680 Change-Id: Idbdccde538faabf16debc22f492dd053a8af0ba7
2018-11-01Prevent premature destruction of shm segments.Rahat Mahmood
Shm segments can be marked for lazy destruction via shmctl(IPC_RMID), which destroys a segment once it is no longer attached to any processes. We were unconditionally decrementing the segment refcount on shmctl(IPC_RMID) which allowed a user to force a segment to be destroyed by repeatedly calling shmctl(IPC_RMID), with outstanding memory maps to the segment. This is problematic because the memory released by a segment destroyed this way can be reused by a different process while remaining accessible by the process with outstanding maps to the segment. PiperOrigin-RevId: 219713660 Change-Id: I443ab838322b4fb418ed87b2722c3413ead21845
2018-10-23Fix panic on creation of zero-len shm segments.Rahat Mahmood
Attempting to create a zero-len shm segment causes a panic since we try to allocate a zero-len filemem region. The existing code had a guard to disallow this, but the check didn't encode the fact that requesting a private segment implies a segment creation regardless of whether IPC_CREAT is explicitly specified. PiperOrigin-RevId: 218405743 Change-Id: I30aef1232b2125ebba50333a73352c2f907977da
2018-10-23Track paths and provide a rename hook.Adin Scannell
This change also adds extensive testing to the p9 package via mocks. The sanity checks and type checks are moved from the gofer into the core package, where they can be more easily validated. PiperOrigin-RevId: 218296768 Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
2018-10-20Add more unimplemented syscall eventsFabricio Voznika
Added events for *ctl syscalls that may have multiple different commands. For runsc, each syscall event is only logged once. For *ctl syscalls, use the cmd as identifier, not only the syscall number. PiperOrigin-RevId: 218015941 Change-Id: Ie3c19131ae36124861e9b492a7dbe1765d9e5e59
2018-10-19Use correct company name in copyright headerIan Gudger
PiperOrigin-RevId: 217951017 Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
2018-10-17Check thread group CPU timers in the CPU clock ticker.Jamie Liu
This reduces the number of goroutines and runtime timers when ITIMER_VIRTUAL or ITIMER_PROF are enabled, or when RLIMIT_CPU is set. This also ensures that thread group CPU timers only advance if running tasks are observed at the time the CPU clock advances, mostly eliminating the possibility that a CPU timer expiration observes no running tasks and falls back to the group leader. PiperOrigin-RevId: 217603396 Change-Id: Ia24ce934d5574334857d9afb5ad8ca0b6a6e65f4
2018-10-17runsc: Support job control signals for the root container.Nicolas Lacasse
Now containers run with "docker run -it" support control characters like ^C and ^Z. This required refactoring our signal handling a bit. Signals delivered to the "runsc boot" process are turned into loader.Signal calls with the appropriate delivery mode. Previously they were always sent directly to PID 1. PiperOrigin-RevId: 217566770 Change-Id: I5b7220d9a0f2b591a56335479454a200c6de8732
2018-10-17Fix PTRACE_GETREGSET write sizeMichael Pratt
The existing logic is backwards and writes iov_len == 0 for a full write. PiperOrigin-RevId: 217560377 Change-Id: I5a39c31bf0ba9063a8495993bfef58dc8ab7c5fa
2018-10-17Move Unix transport out of netstackIan Gudger
PiperOrigin-RevId: 217557656 Change-Id: I63d27635b1a6c12877279995d2d9847b6a19da9b
2018-10-15Merge host.endpoint into host.ConnectedEndpointIan Gudger
host.endpoint contained duplicated logic from the sockerpair implementation and host.ConnectedEndpoint. Remove host.endpoint in favor of a host.ConnectedEndpoint wrapped in a socketpair end. PiperOrigin-RevId: 217240096 Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
2018-10-10When creating a new process group, add it to the session.Nicolas Lacasse
PiperOrigin-RevId: 216554791 Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
2018-10-08Implement shared futexes.Jamie Liu
- Shared futex objects on shared mappings are represented by Mappable + offset, analogous to Linux's use of inode + offset. Add type futex.Key, and change the futex.Manager bucket API to use futex.Keys instead of addresses. - Extend the futex.Checker interface to be able to return Keys for memory mappings. It returns Keys rather than just mappings because whether the address or the target of the mapping is used in the Key depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this matters because using mapping target for a futex on a MAP_PRIVATE mapping causes it to stop working across COW-breaking. - futex.Manager.WaitComplete depends on atomic updates to futex.Waiter.addr to determine when it has locked the right bucket, which is much less straightforward for struct futex.Waiter.key. Switch to an atomically-accessed futex.Waiter.bucket pointer. - futex.Manager.Wake now needs to take a futex.Checker to resolve addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit path to perform a shared futex wakeup (Linux: kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid, FUTEX_WAKE, ...)). This is a problem because futexChecker is in the syscalls/linux package. Move it to kernel. PiperOrigin-RevId: 216207039 Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
2018-10-03Fix panic if FIOASYNC callback is registered and triggered without targetIan Gudger
PiperOrigin-RevId: 215674589 Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03Add S/R support for FIOASYNCIan Gudger
PiperOrigin-RevId: 215655197 Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
2018-10-01runsc: Support job control signals in "exec -it".Nicolas Lacasse
Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and send signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process. This change makes "runsc exec" handle all signals, and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process. One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash. To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set/get via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader. Note that "runsc exec" now handles signals properly, but "runs run" does not. That will come in a later CL, as this one is complex enough already. Example: root@:/usr/local/apache2# sleep 100 ^C root@:/usr/local/apache2# sleep 100 ^Z [1]+ Stopped sleep 100 root@:/usr/local/apache2# fg sleep 100 ^C root@:/usr/local/apache2# PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39