Diffstat (limited to 'g3doc/proposals')
-rw-r--r-- g3doc/proposals/BUILD 16
-rw-r--r-- g3doc/proposals/gsoc-2021-ideas.md 146
-rw-r--r-- g3doc/proposals/runtime_dedicate_os_thread.md 188
3 files changed, 0 insertions, 350 deletions
diff --git a/g3doc/proposals/BUILD b/g3doc/proposals/BUILD
deleted file mode 100644
index 710283142..000000000
--- a/g3doc/proposals/BUILD
+++ /dev/null
@@ -1,16 +0,0 @@
-load("//website:defs.bzl", "doc")
-
-package(
- default_visibility = ["//website:__pkg__"],
- licenses = ["notice"],
-)
-
-doc(
- name = "gsoc_2021",
- src = "gsoc-2021-ideas.md",
- category = "Project",
- include_in_menu = False,
- permalink = "/community/gsoc_2021/",
- subcategory = "Community",
- weight = "99",
-)
diff --git a/g3doc/proposals/gsoc-2021-ideas.md b/g3doc/proposals/gsoc-2021-ideas.md
deleted file mode 100644
index ecaf0dfe1..000000000
--- a/g3doc/proposals/gsoc-2021-ideas.md
+++ /dev/null
@@ -1,146 +0,0 @@
-# Project Ideas for Google Summer of Code 2021
-
-This is a collection of project ideas for
-[Google Summer of Code 2021][gsoc-2021-site]. These projects are intended to be
-relatively self-contained and should be good starting projects for new
-contributors to gVisor. We expect individual contributors to be able to make
-reasonable progress on these projects over the course of several weeks.
-Familiarity with Golang and knowledge about systems programming in Linux will be
-helpful.
-
-If you're interested in contributing to gVisor through Google Summer of Code
-2021, but would like to propose your own idea for a project, please see our
-[roadmap](../roadmap.md) for areas of development, and get in touch through our
-[mailing list][gvisor-mailing-list] or [chat][gvisor-chat]!
-
-## Implement the `setns` syscall
-
-Estimated complexity: *easy*
-
-This project involves implementing the [`setns`][man-setns] syscall. gVisor
-currently supports manipulation of namespaces through the `clone` and `unshare`
-syscalls. These two syscalls essentially implement the requisite logic for
-`setns`, but there is currently no way to obtain a file descriptor referring to
-a namespace in gVisor. As described in the `setns` man page, the two typical
-ways of obtaining such a file descriptor in Linux are by opening a file in
-`/proc/[pid]/ns`, or through the `pidfd_open` syscall.
-
-For gVisor, we recommend implementing the `/proc/[pid]/ns` mechanism first,
-which would involve implementing a trivial namespace file type in procfs.
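As an illustration only, the following Go sketch models the `/proc/[pid]/ns` approach with hypothetical types (`namespace`, `nsFile`, `task`, `setns`) that are not gVisor's actual procfs or kernel APIs. It captures two behaviors described in the `setns` man page: an `nstype` of 0 accepts a descriptor for any namespace type, while a nonzero `nstype` must match the namespace the descriptor refers to (otherwise the call fails with `EINVAL`). It also reproduces the `type:[inode]` link target that Linux reports for these files. The `CLONE_NEW*` values are the Linux flag values.

```go
package main

import (
	"errors"
	"fmt"
)

// Linux clone(2) flags identifying namespace types.
const (
	cloneNewNS   = 0x00020000
	cloneNewUTS  = 0x04000000
	cloneNewIPC  = 0x08000000
	cloneNewUser = 0x10000000
	cloneNewPID  = 0x20000000
	cloneNewNet  = 0x40000000
)

// errInvalid stands in for EINVAL.
var errInvalid = errors.New("EINVAL")

// namespace is a hypothetical stand-in for a sentry namespace object.
type namespace struct {
	name  string // e.g. "net", as it appears under /proc/[pid]/ns
	flag  int    // the matching CLONE_NEW* flag
	inode uint64 // inode number reported through procfs
}

// nsFile is a hypothetical procfs file that holds a reference to a namespace.
type nsFile struct {
	ns *namespace
}

// Readlink mirrors the "type:[inode]" target of /proc/[pid]/ns symlinks.
func (f *nsFile) Readlink() string {
	return fmt.Sprintf("%s:[%d]", f.ns.name, f.ns.inode)
}

// task is a hypothetical thread holding its current namespaces, keyed by flag.
type task struct {
	namespaces map[int]*namespace
}

// setns validates nstype against the namespace behind f and installs it.
// nstype == 0 allows any namespace type, as in Linux.
func (t *task) setns(f *nsFile, nstype int) error {
	if nstype != 0 && nstype != f.ns.flag {
		return errInvalid
	}
	t.namespaces[f.ns.flag] = f.ns
	return nil
}

func main() {
	netA := &namespace{name: "net", flag: cloneNewNet, inode: 4026531993}
	netB := &namespace{name: "net", flag: cloneNewNet, inode: 4026532200}
	t := &task{namespaces: map[int]*namespace{cloneNewNet: netA}}

	f := &nsFile{ns: netB}
	fmt.Println(f.Readlink())                      // net:[4026532200]
	fmt.Println(t.setns(f, cloneNewUTS))           // EINVAL: type mismatch
	fmt.Println(t.setns(f, 0))                     // <nil>: any type accepted
	fmt.Println(t.namespaces[cloneNewNet] == netB) // true
}
```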
-
-## Implement `fanotify`
-
-Estimated complexity: *medium*
-
-Implement [`fanotify`][man-fanotify] in gVisor, which is a filesystem event
-notification mechanism. gVisor currently supports `inotify`, which is a similar
-mechanism with slightly different capabilities, but which should serve as a good
-reference.
-
-The `fanotify` interface adds two new syscalls:
-
-- `fanotify_init` creates a new notification group, which is a collection of
- filesystem objects watched by the kernel. The group is represented by a file
- descriptor returned by this syscall. Events on the watched objects can be
- retrieved by reading from this file descriptor.
-
-- `fanotify_mark` adds a filesystem object to a watch group, or modifies the
- parameters of an existing watch.
-
-Unlike `inotify`, `fanotify` can set watches on filesystems and mount points,
-which will require some additional data tracking on the corresponding filesystem
-objects within the sentry.
-
-A well-designed implementation should reuse the notifications from `inotify` for
-files and directories (this is also how Linux implements these mechanisms), and
-should implement the necessary tracking and notifications for filesystems and
-mount points.
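A rough Go sketch of the mark-matching logic, assuming hypothetical `group` and `object` types in place of the sentry's real filesystem structures: an event is queued when the file's inode, its mount, or its filesystem carries a mark whose mask includes the event. The mask values match Linux's `FAN_ACCESS`, `FAN_MODIFY` and `FAN_OPEN`; permission events, event file descriptors and the reuse of `inotify` notifications are left out.

```go
package main

import "fmt"

// Event mask bits; values match Linux FAN_ACCESS, FAN_MODIFY and FAN_OPEN.
const (
	fanAccess = 0x01
	fanModify = 0x02
	fanOpen   = 0x20
)

// Hypothetical identifiers for the three kinds of watchable objects.
type (
	inodeID uint64
	mountID uint64
	fsID    uint64
)

// object describes where a file lives; the sentry would derive this from
// its own dentry, mount and filesystem structures.
type object struct {
	inode inodeID
	mount mountID
	fs    fsID
}

// group is a hypothetical notification group: per-object mark masks plus a
// queue of pending events that reads on the group's fd would drain.
type group struct {
	inodeMarks map[inodeID]uint32
	mountMarks map[mountID]uint32
	fsMarks    map[fsID]uint32
	events     []string
}

func newGroup() *group {
	return &group{
		inodeMarks: map[inodeID]uint32{},
		mountMarks: map[mountID]uint32{},
		fsMarks:    map[fsID]uint32{},
	}
}

// notify queues an event if the file, its mount, or its filesystem carries
// a mark whose mask includes the event.
func (g *group) notify(o object, mask uint32, name string) {
	interest := g.inodeMarks[o.inode] | g.mountMarks[o.mount] | g.fsMarks[o.fs]
	if interest&mask != 0 {
		g.events = append(g.events, fmt.Sprintf("%s mask=%#x", name, mask))
	}
}

func main() {
	g := newGroup()
	g.mountMarks[1] = fanModify // comparable to fanotify_mark with FAN_MARK_MOUNT

	a := object{inode: 10, mount: 1, fs: 100}
	b := object{inode: 11, mount: 2, fs: 100}

	g.notify(a, fanModify, "a") // queued: mount 1 is marked
	g.notify(b, fanModify, "b") // dropped: nothing covering b is marked
	g.notify(a, fanOpen, "a")   // dropped: the mount mark lacks FAN_OPEN
	fmt.Println(g.events)       // [a mask=0x2]
}
```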
-
-## Implement `io_uring`
-
-Estimated complexity: *hard*
-
-`io_uring` is the latest asynchronous I/O API in Linux. This project will
-involve implementing the system interfaces required to support `io_uring` in
-gVisor. A successful implementation should match Linux in how its performance
-and scalability compare to those of the synchronous I/O syscalls.
-
-The core of the `io_uring` interface is deceptively simple, involving only three
-new syscalls:
-
-- `io_uring_setup(2)` creates a new `io_uring` instance represented by a file
- descriptor, including a set of request submission and completion queues
- backed by shared memory ring buffers.
-
-- `io_uring_register(2)` optionally binds kernel resources such as files and
- memory buffers to handles, which can then be passed to `io_uring`
- operations. Pre-registering resources in this way moves the cost of looking
- up and validating these resources to registration time rather than paying
- the cost during the operation.
-
-- `io_uring_enter(2)` is the syscall used to submit queued operations and wait
- for completions. This is the most complex part of the mechanism, requiring
- the kernel to process queued requests from the submission queue, dispatching
- the appropriate I/O operation based on the request arguments and blocking
- for the requested number of operations to be completed before returning.
-
-An `io_uring` request is effectively an opcode specifying the I/O operation to
-perform, and corresponding arguments. The opcodes and arguments closely relate
-to the corresponding synchronous I/O syscall. In addition, there are some
-`io_uring`-specific arguments that specify things like how to process requests,
-how to interpret the arguments and communicate the status of the ring buffers.
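To make the ring-buffer interface concrete, here is a simplified, single-threaded Go model (Go 1.18+ for generics) of a submission queue and completion queue, assuming hypothetical `sqe`, `cqe`, `ring` and `uring` types. The real kernel structures (`struct io_uring_sqe`, `struct io_uring_cqe`) have many more fields, live in memory shared with userspace, and need memory barriers around head/tail updates; this sketch only shows free-running head/tail indices, the power-of-two mask, and how `io_uring_enter` turns each request into a completion tagged with the caller's `user_data`. Only `opNop` corresponds to a real opcode (`IORING_OP_NOP`); `opRead` is a placeholder.

```go
package main

import "fmt"

// opNop mirrors IORING_OP_NOP (0); opRead is a hypothetical read-style opcode.
const (
	opNop uint8 = iota
	opRead
)

// Simplified submission and completion queue entries.
type sqe struct {
	opcode   uint8
	fd       int32
	length   uint32
	userData uint64 // opaque cookie copied into the completion
}

type cqe struct {
	userData uint64
	res      int32 // result, or a negated errno
}

// ring is a single-producer/single-consumer ring whose size is a power of
// two; head and tail run freely and are masked into the entries slice.
type ring[T any] struct {
	entries    []T
	head, tail uint32
	mask       uint32
}

func newRing[T any](size uint32) *ring[T] {
	return &ring[T]{entries: make([]T, size), mask: size - 1}
}

func (r *ring[T]) push(v T) bool {
	if r.tail-r.head == uint32(len(r.entries)) {
		return false // full
	}
	r.entries[r.tail&r.mask] = v
	r.tail++
	return true
}

func (r *ring[T]) pop() (T, bool) {
	var zero T
	if r.head == r.tail {
		return zero, false // empty
	}
	v := r.entries[r.head&r.mask]
	r.head++
	return v, true
}

// uring pairs the two queues, as io_uring_setup does. The completion queue
// is sized larger (Linux defaults to twice the SQ size); overflow is ignored.
type uring struct {
	sq *ring[sqe]
	cq *ring[cqe]
}

// enter drains the submission queue and posts one completion per request.
func (u *uring) enter() int {
	n := 0
	for {
		e, ok := u.sq.pop()
		if !ok {
			return n
		}
		res := int32(0)
		switch e.opcode {
		case opNop:
			// Nothing to do; res stays 0.
		case opRead:
			res = int32(e.length) // pretend the whole buffer was filled
		default:
			res = -22 // -EINVAL for unsupported opcodes
		}
		u.cq.push(cqe{userData: e.userData, res: res})
		n++
	}
}

func main() {
	u := &uring{sq: newRing[sqe](8), cq: newRing[cqe](16)}
	u.sq.push(sqe{opcode: opNop, userData: 1})
	u.sq.push(sqe{opcode: opRead, fd: 3, length: 512, userData: 2})
	fmt.Println(u.enter()) // 2
	for c, ok := u.cq.pop(); ok; c, ok = u.cq.pop() {
		fmt.Println(c.userData, c.res) // "1 0", then "2 512"
	}
}
```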
-
-For a detailed description of the `io_uring` interface, see the
-[design doc][io-uring-doc] by the `io_uring` authors.
-
-Due to the complexity of the full `io_uring` mechanism and the numerous
-supported operations, it should be implemented in two stages:
-
-In the first stage, a simplified version of the `io_uring_setup` and
-`io_uring_enter` syscalls should be implemented, which will only support a
-minimal set of arguments and just one or two simple opcodes. This simplified
-implementation can be used to figure out how to integrate `io_uring` with
-gVisor's virtual filesystem and memory management subsystems, as well as
-benchmark the implementation to ensure it has the desired performance
-characteristics. The goal in this stage should be to implement the smallest
-subset of features required to perform a basic operation through `io_uring`.
-
-In the second stage, support can be added for all the I/O operations supported
-by Linux, as well as advanced `io_uring` features such as fixed files and
-buffers (via `io_uring_register`), polled I/O and kernel-side request polling.
-
-A single contributor can expect to make reasonable progress on the first stage
-within the scope of Google Summer of Code. The second stage, while not
-necessarily difficult, is likely to be very time-consuming. However, it also
-lends itself well to parallel development by multiple contributors.
-
-## Implement message queues
-
-Estimated complexity: *hard*
-
-Linux provides two alternative message queue mechanisms:
-[System V message queues][man-sysvmq] and [POSIX message queues][man-posixmq].
-gVisor currently doesn't implement either.
-
-Both mechanisms add multiple syscalls for managing and using the message queues;
-see the relevant man pages above for a full description.
-
-The cores of both mechanisms are very similar, so it may be possible to back
-both with a common implementation in gVisor, even though Linux has two
-distinct implementations.
-
-An individual contributor can reasonably implement a minimal version of one of
-these two mechanisms within the scope of Google Summer of Code. The System V
-queue may be slightly easier to implement, as gVisor already implements System V
-semaphores and shared memory regions, so the code for managing IPC objects and
-the registry already exist.
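One way the common implementation suggested above could look, sketched in Go with a hypothetical `queue` type: messages are stored in arrival order with a single integer key, which the System V receive path treats as the message type and the POSIX receive path treats as the priority. The selection rules follow `msgrcv(2)` (without `MSG_EXCEPT`) and `mq_receive(3)`; size limits, blocking, permissions and the IPC registry are omitted.

```go
package main

import "fmt"

// message is the shared representation: System V uses key as the message
// type (always > 0), POSIX uses it as the priority.
type message struct {
	key  int64
	data string
}

// queue is a hypothetical common backing store, kept in arrival order.
type queue struct {
	msgs []message
}

func (q *queue) send(key int64, data string) {
	q.msgs = append(q.msgs, message{key, data})
}

// take removes and returns the message at index i.
func (q *queue) take(i int) message {
	m := q.msgs[i]
	q.msgs = append(q.msgs[:i], q.msgs[i+1:]...)
	return m
}

// sysvReceive follows msgrcv(2) msgtyp selection: 0 takes the first message,
// > 0 the first message of that type, and < 0 the first message with the
// lowest type that is <= |msgtyp|.
func (q *queue) sysvReceive(msgtyp int64) (message, bool) {
	best := -1
	for i, m := range q.msgs {
		switch {
		case msgtyp == 0:
			return q.take(i), true
		case msgtyp > 0 && m.key == msgtyp:
			return q.take(i), true
		case msgtyp < 0 && m.key <= -msgtyp && (best < 0 || m.key < q.msgs[best].key):
			best = i
		}
	}
	if best < 0 {
		return message{}, false
	}
	return q.take(best), true
}

// posixReceive follows mq_receive(3): the oldest message of highest priority.
func (q *queue) posixReceive() (message, bool) {
	best := -1
	for i, m := range q.msgs {
		if best < 0 || m.key > q.msgs[best].key {
			best = i
		}
	}
	if best < 0 {
		return message{}, false
	}
	return q.take(best), true
}

func main() {
	q := &queue{}
	q.send(3, "a")
	q.send(1, "b")
	q.send(3, "c")
	m, _ := q.sysvReceive(-2)
	fmt.Println(m.data) // b: lowest type <= 2
	m, _ = q.posixReceive()
	fmt.Println(m.data) // a: oldest message of the highest priority
}
```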
-
-[gsoc-2021-site]: https://summerofcode.withgoogle.com
-[gvisor-chat]: https://gitter.im/gvisor/community
-[gvisor-mailing-list]: https://groups.google.com/g/gvisor-dev
-[io-uring-doc]: https://kernel.dk/io_uring.pdf
-[man-fanotify]: https://man7.org/linux/man-pages/man7/fanotify.7.html
-[man-sysvmq]: https://man7.org/linux/man-pages/man7/sysvipc.7.html
-[man-posixmq]: https://man7.org/linux/man-pages/man7/mq_overview.7.html
-[man-setns]: https://man7.org/linux/man-pages/man2/setns.2.html
diff --git a/g3doc/proposals/runtime_dedicate_os_thread.md b/g3doc/proposals/runtime_dedicate_os_thread.md
deleted file mode 100644
index dc70055b0..000000000
--- a/g3doc/proposals/runtime_dedicate_os_thread.md
+++ /dev/null
@@ -1,188 +0,0 @@
-# `runtime.DedicateOSThread`
-
-Status as of 2020-09-18: Deprioritized; initial studies in #2180 suggest that
-this may be difficult to support in the Go runtime due to issues with GC.
-
-## Summary
-
-Allow goroutines to bind to kernel threads in a way that allows their scheduling
-to be kernel-managed rather than runtime-managed.
-
-## Objectives
-
-* Reduce Go runtime overhead in the gVisor sentry (#2184).
-
-* Minimize intrusiveness of changes to the Go runtime.
-
-## Background
-
-In Go, execution contexts are referred to as goroutines, which the runtime calls
-Gs. The Go runtime maintains a variably-sized pool of threads (called Ms by the
-runtime) on which Gs are executed, as well as a pool of "virtual processors"
-(called Ps by the runtime) of size equal to `runtime.GOMAXPROCS()`. Usually,
-each M requires a P in order to execute Gs, limiting the number of concurrently
-executing goroutines to `runtime.GOMAXPROCS()`.
-
-The `runtime.LockOSThread` function temporarily locks the invoking goroutine to
-its current thread. It is primarily useful for interacting with OS or non-Go
-library facilities that are per-thread. It does not reduce interactions with the
-Go runtime scheduler: locked Ms relinquish their P when they become blocked, and
-only continue execution after another M "chooses" their locked G to run and
-donates their P to the locked M instead.
-
-## Problems
-
-### Context Switch Overhead
-
-Most goroutines in the gVisor sentry are task goroutines, which back application
-threads. Task goroutines spend large amounts of time blocked on syscalls that
-execute untrusted application code. When invoking said syscall (which varies by
-gVisor platform), the task goroutine may interact with the Go runtime in one of
-three ways:
-
-* It can invoke the syscall without informing the runtime. In this case, the
- task goroutine will continue to hold its P during the syscall, limiting the
- number of application threads that can run concurrently to
- `runtime.GOMAXPROCS()`. This is problematic because the Go runtime scheduler
- is known to scale poorly with `GOMAXPROCS`; see #1942 and
- https://github.com/golang/go/issues/28808. It also means that preemption of
- application threads must be driven by sentry or runtime code, which is
- strictly slower than kernel-driven preemption (since the sentry must invoke
- another syscall to preempt the application thread).
-
-* It can call `runtime.entersyscallblock` before invoking the syscall, and
- `runtime.exitsyscall` after the syscall returns. In this case, the task
- goroutine will release its P while the syscall is executing. This allows the
- number of threads concurrently executing application code to exceed
- `GOMAXPROCS`. However, this incurs additional latency on syscall entry (to
- hand off the released P to another M, often requiring a `futex(FUTEX_WAKE)`
- syscall) and on syscall exit (to acquire a new P). It also drastically
- increases the number of threads that concurrently interact with the runtime
- scheduler, which is also problematic for performance (both in terms of CPU
- utilization and in terms of context switch latency); see #205.
-
-* It can call `runtime.entersyscall` before invoking the syscall, and
- `runtime.exitsyscall` after the syscall returns. In this case, the task
- goroutine "lazily releases" its P, allowing the runtime's "sysmon" thread to
- steal it on behalf of another M after a 20us delay. This mitigates the
- context switch latency problem when there are few task goroutines and the
- interval between switches to application code (i.e. the interval between
- application syscalls, page faults, or signal delivery) is short. (Cynically,
- this means that it's most effective in microbenchmarks.) However, the delay
- before a P is stolen can also be problematic for performance when there are
- both many task goroutines switching to application code (lazily releasing
- their Ps) *and* many task goroutines switching to sentry code (contending
- for Ps), which is likely in larger heterogeneous workloads.
-
-### Blocking Overhead
-
-Task goroutines block on behalf of application syscalls like `futex` and
-`epoll_wait` by receiving from a Go channel. (Future work may convert task
-goroutine blocking to use the `syncevent` package to avoid overhead associated
-with channels and `select`, but this does not change how blocking interacts with
-the Go runtime scheduler.)
-
-If `runtime.LockOSThread()` is not in effect when a task goroutine blocks, then
-when the task goroutine is unblocked (by e.g. an application `FUTEX_WAKE`,
-signal delivery, or a timeout) by sending to the blocked channel,
-`runtime.ready` migrates the unblocked G to the unblocking P. In most cases,
-this implies that every application thread block/unblock cycle results in a
-migration of the thread between Ps, and therefore Ms, and therefore cores,
-resulting in reduced application performance due to loss of CPU caches.
-Furthermore, in most cases, the unblocking P cannot immediately switch to the
-unblocked G (instead resuming execution of its current application thread after
-completing the application's `futex(FUTEX_WAKE)`, `tgkill`, etc. syscall), often
-requiring that another P steal the unblocked G before it can resume execution.
-
-If `runtime.LockOSThread()` is in effect when a task goroutine blocks, then the
-G will remain locked to its M, avoiding the core migration described above;
-however, wakeup latency is significantly increased since, as described in
-"Background", the G still needs to be selected by the scheduler before it can
-run, and the M that selects the G then needs to transfer its P to the locked M,
-incurring an additional `FUTEX_WAKE` syscall and round of kernel scheduling.
-
-## Proposal
-
-We propose to add a function, tentatively called `DedicateOSThread`, to the Go
-`runtime` package, documented as follows:
-
-```go
-// DedicateOSThread wires the calling goroutine to its current operating system
-// thread, and exempts it from counting against GOMAXPROCS. The calling
-// goroutine will always execute in that thread, and no other goroutine will
-// execute in it, until the calling goroutine has made as many calls to
-// UndedicateOSThread as to DedicateOSThread. If the calling goroutine exits
-// without unlocking the thread, the thread will be terminated.
-//
-// DedicateOSThread should only be used by long-lived goroutines that usually
-// block due to blocking system calls, rather than interaction with other
-// goroutines.
-func DedicateOSThread()
-```
-
-Mechanically, `DedicateOSThread` implies `LockOSThread` (i.e. it locks the
-invoking G to an M), but additionally locks the invoking M to a P. Ps locked by
-`DedicateOSThread` are not counted against `GOMAXPROCS`; that is, the actual
-number of Ps in the system (`len(runtime.allp)`) is `GOMAXPROCS` plus the number
-of bound Ps (plus some slack to avoid frequent changes to `runtime.allp`).
-Corollaries:
-
-* If `runtime.ready` observes that a readied G is locked to an M locked to a P,
- it immediately wakes the locked M without migrating the G to the readying P
- or waiting for a future call to `runtime.schedule` to select the readied G
- in `runtime.findrunnable`.
-
-* `runtime.stoplockedm` and `runtime.reentersyscall` skip the release of
- locked Ps; the latter also skips sysmon wakeup. `runtime.stoplockedm` and
- `runtime.exitsyscall` skip re-acquisition of Ps if one is locked.
-
-* sysmon does not attempt to preempt Gs that are locked to Ps, avoiding
- fruitless overhead from `tgkill` syscalls and signal delivery.
-
-* `runtime.findrunnable`'s work stealing skips locked Ps (suggesting that
- unlocked Ps be tracked in a separate array). `runtime.findrunnable` on
- locked Ps skips the global run queue, work stealing, and possibly netpoll.
-
-* New goroutines created by goroutines with locked Ps are enqueued on the
- global run queue rather than the invoking P's local run queue.
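The corollaries above can be illustrated with a toy Go model of two scheduling decisions, using hypothetical `g`, `m`, `p` and `sched` types that are far simpler than the runtime's own structures: `ready` hands a G whose M owns a dedicated P straight to that M instead of queueing it on the readying P, and `newproc` sends goroutines created on a dedicated P to the global run queue. It sketches the proposed behavior only and does not correspond to existing runtime code.

```go
package main

import "fmt"

// Toy scheduler types (hypothetical).
type p struct {
	id        int
	dedicated bool // locked to an M by DedicateOSThread; not counted in GOMAXPROCS
	runq      []string
}

type m struct {
	lockedP *p
	wakeups int // times this M was woken directly by ready
}

type g struct {
	name    string
	lockedM *m // set by LockOSThread or DedicateOSThread
}

type sched struct {
	globalRunq []string
}

// ready makes gp runnable on behalf of the readying P. A G locked to an M that
// owns a dedicated P is handed straight to that M; any other G joins the
// readying P's local run queue, which is what runtime.ready does today.
func (s *sched) ready(gp *g, readying *p) {
	if gp.lockedM != nil && gp.lockedM.lockedP != nil {
		gp.lockedM.wakeups++ // wake the M; no migration, no findrunnable
		return
	}
	readying.runq = append(readying.runq, gp.name)
}

// newproc places a new G; creators running on dedicated Ps use the global
// run queue so their local queue never needs to be stolen from.
func (s *sched) newproc(name string, creator *p) {
	if creator.dedicated {
		s.globalRunq = append(s.globalRunq, name)
		return
	}
	creator.runq = append(creator.runq, name)
}

func main() {
	s := &sched{}
	shared := &p{id: 0}
	dedicated := &p{id: 1, dedicated: true}
	taskM := &m{lockedP: dedicated}
	task := &g{name: "task", lockedM: taskM}

	s.ready(task, shared)               // woken directly, no migration
	s.ready(&g{name: "worker"}, shared) // ordinary G: local run queue
	s.newproc("child", dedicated)       // created on a dedicated P: global queue

	fmt.Println(taskM.wakeups, shared.runq, s.globalRunq) // 1 [worker] [child]
}
```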
-
-While gVisor's use case does not strictly require that the association is
-reversible (with `runtime.UndedicateOSThread`), such a feature is required to
-allow reuse of locked Ms, which is likely to be critical for performance.
-
-## Alternatives Considered
-
-* Make the runtime scale well with `GOMAXPROCS`. While we are also
- concurrently investigating this problem, this would not address the issues
- of increased preemption cost or blocking overhead.
-
-* Make the runtime scale well with the number of Ms. It is unclear whether
- this is actually feasible, and it would not address blocking overhead.
-
-* Make P-locking part of `LockOSThread`'s behavior. This would likely
- introduce performance regressions in existing uses of `LockOSThread` that do
- not fit this usage pattern. In particular, since `DedicateOSThread`
- transitions the invoker's P from "counted against `GOMAXPROCS`" to "not
- counted against `GOMAXPROCS`", it may need to wake another M to run a new P
- (that is counted against `GOMAXPROCS`), and the converse applies to
- `UndedicateOSThread`.
-
-* Rewrite the gVisor sentry in a language that does not force userspace
- scheduling. This is a last resort due to the amount of code involved.
-
-## Related Issues
-
-The proposed functionality is directly analogous to `spawn_blocking` in Rust
-async runtimes
-[`async_std`](https://docs.rs/async-std/1.8.0/async_std/task/fn.spawn_blocking.html)
-and [`tokio`](https://docs.rs/tokio/0.3.5/tokio/task/fn.spawn_blocking.html).
-
-Outside of gVisor:
-
-* https://github.com/golang/go/issues/21827#issuecomment-595152452 describes a
- use case for this feature in go-delve, where the goroutine that would use
- this feature spends much of its time blocked in `ptrace` syscalls.
-
-* This feature may improve performance in the use case described in
- https://github.com/golang/go/issues/18237, given the prominence of
- `syscall.Syscall` in the profile given in that bug report.