summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/kernel
AgeCommit message (Collapse)Author
2021-07-12Fix deadlock in procfsFabricio Voznika
Kernfs provides an internal mechanism to defer calls to `DecRef()` because on the last reference `Filesystem.mu` must be held and most places that need to call `DecRef()` are inside the lock. The same can be true for filesystems that extend kernfs. procfs needs to look up files and `DecRef()` them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs files, it can deadlock trying to decrement if it's the last reference. This change extends the mechanism to external callers to defer DecRefs to `vfs.FileDescription` and `vfs.VirtualDentries`. PiperOrigin-RevId: 384361647
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-09Drop unnecessary checklocksignore.Adin Scannell
PiperOrigin-RevId: 383940663
2021-07-08Replace kernel.ExitStatus with linux.WaitStatus.Jamie Liu
PiperOrigin-RevId: 383705129
2021-07-01Mix checklocks and atomic analyzers.Adin Scannell
This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01[syserror] Update several syserror errors to linuxerr equivalents.Zach Koopmans
Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.Zach Koopmans
Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVALZach Koopmans
Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-24CreateProcessGroup has to check whether a target process stil exists or notAndrei Vagin
A caller of CreateProcessGroup looks up a thread group without locks, so the target process can exit before CreateProcessGroup will be called. Reported-by: syzbot+6abb7c34663dacbd55a8@syzkaller.appspotmail.com PiperOrigin-RevId: 381351069
2021-06-23Fix PR_SET_PTRACER applicability to non-leader threads.Jamie Liu
Compare if (!thread_group_leader(tracee)) tracee = rcu_dereference(tracee->group_leader); in security/yama/yama_lsm.c:ptracer_exception_found(). PiperOrigin-RevId: 381074242
2021-06-22[syserror] Add conversions to linuxerr with temporary Equals method.Zach Koopmans
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-17Move tcpip.Clock impl to TimekeeperTamir Duberstein
...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed. PiperOrigin-RevId: 380039167
2021-06-16[syserror] Refactor linuxerr and error package.Zach Koopmans
Move Error struct to pkg/errors package for use in multiple places. Move linuxerr static definitions under pkg/errors/linuxerr. Add a lookup list for quick lookup of *errors.Error by errno. This is useful when converting syserror errors and unix.Errno/syscall.Errrno values to *errors.Error. Update benchmarks routines to include conversions. The below benchmarks show *errors.Error usage to be comparable to using unix.Errno. BenchmarkAssignUnix BenchmarkAssignUnix-32 787875022 1.284 ns/op BenchmarkAssignLinuxerr BenchmarkAssignLinuxerr-32 1000000000 1.209 ns/op BenchmarkAssignSyserror BenchmarkAssignSyserror-32 759269229 1.429 ns/op BenchmarkCompareUnix BenchmarkCompareUnix-32 1000000000 1.310 ns/op BenchmarkCompareLinuxerr BenchmarkCompareLinuxerr-32 1000000000 1.241 ns/op BenchmarkCompareSyserror BenchmarkCompareSyserror-32 147196165 8.248 ns/op BenchmarkSwitchUnix BenchmarkSwitchUnix-32 373233556 3.664 ns/op BenchmarkSwitchLinuxerr BenchmarkSwitchLinuxerr-32 476323929 3.294 ns/op BenchmarkSwitchSyserror BenchmarkSwitchSyserror-32 39293408 29.62 ns/op BenchmarkReturnUnix BenchmarkReturnUnix-32 1000000000 0.5042 ns/op BenchmarkReturnLinuxerr BenchmarkReturnLinuxerr-32 1000000000 0.8152 ns/op BenchmarkConvertUnixLinuxerr BenchmarkConvertUnixLinuxerr-32 739948875 1.547 ns/op BenchmarkConvertUnixLinuxerrZero BenchmarkConvertUnixLinuxerrZero-32 977733974 1.489 ns/op PiperOrigin-RevId: 379806801
2021-06-14Cleanup iptables bug TODOsKevin Krakauer
There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-13Remove usermem dependency from marshalIan Lewis
Both marshal and usermem are depended on by many packages and a dependency on marshal can often create circular dependencies. marshal should consider adding internal dependencies carefully moving forward. Fixes #6160 PiperOrigin-RevId: 379199882
2021-06-10Report task exit in /proc/[pid]/{stat,status} before task goroutine exit.Jamie Liu
Between when runExitNotify.execute() returns nil (indicating that the task goroutine should exit) and when Task.run() advances Task.gosched.State to TaskGoroutineNonexistent (indicating that the task goroutine is exiting), there is a race window in which the Task is waitable (since TaskSet.mu is unlocked and Task.exitParentNotified is true) but will be reported by /proc/[pid]/status as running. Close the window by checking Task.exitState before task goroutine exit. PiperOrigin-RevId: 378711484
2021-06-10[op] Move SignalInfo to abi/linux package.Ayush Ranjan
Fixes #214 PiperOrigin-RevId: 378680466
2021-06-10Merge pull request #6103 from sudo-sturbia:semaphore-errgVisor bot
PiperOrigin-RevId: 378607458
2021-06-10[op] Move SignalStack to abi/linux package.Ayush Ranjan
Updates #214 PiperOrigin-RevId: 378594929
2021-06-09[op] Move SignalAct to abi/linux package.Ayush Ranjan
There were also other duplicate definitions of the same struct that I have now removed. Updates #214 PiperOrigin-RevId: 378579954
2021-06-09Change TODO bug to a more specific issueKevin Krakauer
This lets us close a tracking bug that's too widely-scoped to be reasonably finished. PiperOrigin-RevId: 378563203
2021-06-07cgroupfs: don't add a task in the root cgroup if it is already there.Andrei Vagin
PiperOrigin-RevId: 377975013
2021-06-03Implement stringer for ExitStatusTamir Duberstein
PiperOrigin-RevId: 377370807
2021-06-01Move sync generics to their own packagesTamir Duberstein
The presence of multiple packages in a single directory sometimes confuses `go mod`, producing output like: go: downloading gvisor.dev/gvisor v0.0.0-20210601174640-77dc0f5bc94d $GOMODCACHE/gvisor.dev/gvisor@v0.0.0-20210601174640-77dc0f5bc94d/pkg/linewriter/linewriter.go:21:2: found packages sync (aliases.go) and seqatomic (generic_atomicptr_unsafe.go) in $GOMODCACHE/gvisor.dev/gvisor@v0.0.0-20210601174640-77dc0f5bc94d/pkg/sync imports.go:67:2: found packages tcp (accept.go) and rcv (rcv_test.go) in $GOMODCACHE/gvisor.dev/gvisor@v0.0.0-20210601174640-77dc0f5bc94d/pkg/tcpip/transport/tcp PiperOrigin-RevId: 376956213
2021-05-31Update comments on ambient caps to point to bugIan Lewis
PiperOrigin-RevId: 376747671
2021-05-31Use syserror.ENOSPC for system-wide semaphore limits.Zyad A. Ali
semget(2) man page specifies that ENOSPC should be used if "the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded."
2021-05-27nanosleep has to store the finish time in the restart blockAndrei Vagin
nanosleep has to count time that a thread spent in the stopped state. PiperOrigin-RevId: 376258641
2021-05-25Initialize Kernel.Timekeeper before network NSTamir Duberstein
PiperOrigin-RevId: 375843579
2021-05-25Use specific fmt verbs (avoid %v)Tamir Duberstein
Remove useless conversions. Avoid unhandled errors. PiperOrigin-RevId: 375834275
2021-05-25Merge pull request #6064 from sudo-sturbia:misspellinggVisor bot
PiperOrigin-RevId: 375789776
2021-05-25Use opaque types to represent timeTamir Duberstein
Introduce tcpip.MonotonicTime; replace int64 in tcpip.Clock method returns with time.Time and MonotonicTime to improve type safety and ensure that monotonic clock readings are never compared to wall clock readings. PiperOrigin-RevId: 375775907
2021-05-24Fix misspellings.Zyad A. Ali
2021-05-20Send SIGPIPE for closed pipes.Ian Lewis
Fixes #5974 Updates #161 PiperOrigin-RevId: 375024740
2021-05-20Merge pull request #6037 from sudo-sturbia:docgVisor bot
PiperOrigin-RevId: 375007632
2021-05-20Fix cgroupfs mount racing with unmount.Rahat Mahmood
Previously, mount could discover a hierarchy being destroyed concurrently, which resulted in mount attempting to take a ref on an already destroyed cgroupfs. Reported-by: syzbot+062c0a67798a200f23ee@syzkaller.appspotmail.com PiperOrigin-RevId: 374959054
2021-05-20Format precondition to match style guide.Zyad A. Ali
2021-05-14Resolve remaining O_PATH TODOs.Dean Deng
O_PATH is now implemented in vfs2. Fixes #2782. PiperOrigin-RevId: 373861410
2021-05-14Fix cgroup hierarchy registration.Rahat Mahmood
Previously, registration was racy because we were publishing hierarchies in the registry without fully initializing the underlying filesystem. This led to concurrent mount(2)s discovering the partially intialized filesystems and dropping the final refs on them which cause them to be freed prematurely. Reported-by: syzbot+13f54e77bdf59f0171f0@syzkaller.appspotmail.com Reported-by: syzbot+2c7f0a9127ac6a84f17e@syzkaller.appspotmail.com PiperOrigin-RevId: 373824552
2021-04-26Remove metrics: fallback, vsyscallCount and partialResultNayana Bidari
The newly added Weirdness metric with fields should be used instead of them. Simple query for weirdness metric: http://shortn/_DGNk0z2Up6 PiperOrigin-RevId: 370578132
2021-04-22Add weirdness sentry metric.Nayana Bidari
Weirdness metric contains fields to track the number of clock fallback, partial result and vsyscalls. This metric will avoid the overhead of having three different metrics (fallbackMetric, partialResultMetric, vsyscallCount). PiperOrigin-RevId: 369970218
2021-04-12Don't grab TaskSet mu recursively when reading task state.Rahat Mahmood
Reported-by: syzbot+a6ef0f95a2c9e7da26f3@syzkaller.appspotmail.com Reported-by: syzbot+2eaf8a9f115edec468fe@syzkaller.appspotmail.com PiperOrigin-RevId: 368093861
2021-04-02Implement cgroupfs.Rahat Mahmood
A skeleton implementation of cgroupfs. It supports trivial cpu and memory controllers with no support for hierarchies. PiperOrigin-RevId: 366561126
2021-03-29[syserror] Split usermem packageZach Koopmans
Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-25Lock TaskSet mutex for writing in ptraceClone().Jamie Liu
This is necessary since ptraceClone() mutates tracer.ptraceTracees. PiperOrigin-RevId: 365152396
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-23Move the code that manages floating-point state to a separate packageAndrei Vagin
This change is inspired by Adin's cl/355256448. PiperOrigin-RevId: 364695931
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-25Implement SEM_STAT_ANY cmd of semctl.Jing Chen
PiperOrigin-RevId: 359591577
2021-02-24Add YAMA security module restrictions on ptrace(2).Dean Deng
Restrict ptrace(2) according to the default configurations of the YAMA security module (mode 1), which is a common default among various Linux distributions. The new access checks only permit the tracer to proceed if one of the following conditions is met: a) The tracer is already attached to the tracee. b) The target is a descendant of the tracer. c) The target has explicitly given permission to the tracer through the PR_SET_PTRACER prctl. d) The tracer has CAP_SYS_PTRACE. See security/yama/yama_lsm.c for more details. Note that these checks are added to CanTrace, which is checked for PTRACE_ATTACH as well as some other operations, e.g., checking a process' memory layout through /proc/[pid]/mem. Since this patch adds restrictions to ptrace, it may break compatibility for applications run by non-root users that, for instance, rely on being able to trace processes that are not descended from the tracer (e.g., `gdb -p`). YAMA restrictions can be turned off by setting /proc/sys/kernel/yama/ptrace_scope to 0, or exceptions can be made on a per-process basis with the PR_SET_PTRACER prctl. Reported-by: syzbot+622822d8bca08c99e8c8@syzkaller.appspotmail.com PiperOrigin-RevId: 359237723
2021-02-19Don't hold baseEndpoint.mu while calling EventUpdate().Nicolas Lacasse
This removes a three-lock deadlock between fdnotifier.notifier.mu, epoll.EventPoll.listsMu, and baseEndpoint.mu. A lock order comment was added to epoll/epoll.go. Also fix unsafe access of baseEndpoint.connected/receiver. PiperOrigin-RevId: 358515191