Diffstat (limited to 'g3doc')
-rw-r--r-- | g3doc/README.md | 12
-rw-r--r-- | g3doc/architecture_guide/README.md | 4
-rw-r--r-- | g3doc/architecture_guide/performance.md | 64
-rw-r--r-- | g3doc/architecture_guide/platforms.md | 20
-rw-r--r-- | g3doc/architecture_guide/resources.md | 9
-rw-r--r-- | g3doc/architecture_guide/security.md | 120
-rw-r--r-- | g3doc/community.md | 4
-rw-r--r-- | g3doc/roadmap.md | 43
-rw-r--r-- | g3doc/user_guide/FAQ.md | 28
-rw-r--r-- | g3doc/user_guide/checkpoint_restore.md | 26
-rw-r--r-- | g3doc/user_guide/compatibility.md | 98
-rw-r--r-- | g3doc/user_guide/debugging.md | 50
-rw-r--r-- | g3doc/user_guide/filesystem.md | 31
-rw-r--r-- | g3doc/user_guide/install.md | 24
-rw-r--r-- | g3doc/user_guide/networking.md | 12
-rw-r--r-- | g3doc/user_guide/platforms.md | 23
-rw-r--r-- | g3doc/user_guide/quick_start/docker.md | 11
-rw-r--r-- | g3doc/user_guide/quick_start/kubernetes.md | 12
-rw-r--r-- | g3doc/user_guide/quick_start/oci.md | 3
-rw-r--r-- | g3doc/user_guide/tutorials/cni.md | 28
-rw-r--r-- | g3doc/user_guide/tutorials/docker.md | 8
-rw-r--r-- | g3doc/user_guide/tutorials/kubernetes.md | 198
22 files changed, 381 insertions, 447 deletions
diff --git a/g3doc/README.md b/g3doc/README.md index 7c582ba79..7999f5d47 100644 --- a/g3doc/README.md +++ b/g3doc/README.md @@ -16,12 +16,12 @@ providing new tools and ideas for the container security landscape. gVisor can be used with Docker, Kubernetes, or directly using `runsc`. Use the links below to see detailed instructions for each of them: -* [Docker](./user_guide/quick_start/docker/): The quickest and easiest way to - get started. -* [Kubernetes](./user_guide/quick_start/kubernetes/): Isolate Pods in your K8s - cluster with gVisor. -* [OCI Quick Start](./user_guide/quick_start/oci/): Expert mode. Customize - gVisor for your environment. +* [Docker](./user_guide/quick_start/docker/): The quickest and easiest way to + get started. +* [Kubernetes](./user_guide/quick_start/kubernetes/): Isolate Pods in your K8s + cluster with gVisor. +* [OCI Quick Start](./user_guide/quick_start/oci/): Expert mode. Customize + gVisor for your environment. [linux]: https://en.wikipedia.org/wiki/Linux_kernel_interfaces [oci]: https://www.opencontainers.org diff --git a/g3doc/architecture_guide/README.md b/g3doc/architecture_guide/README.md index ce4c4ae69..1364a5358 100644 --- a/g3doc/architecture_guide/README.md +++ b/g3doc/architecture_guide/README.md @@ -3,8 +3,8 @@ gVisor provides a virtualized environment in order to sandbox untrusted containers. The system interfaces normally implemented by the host kernel are moved into a distinct, per-sandbox user space kernel in order to minimize the -risk of an exploit. gVisor does not introduce large fixed overheads however, -and still retains a process-like model with respect to resource utilization. +risk of an exploit. gVisor does not introduce large fixed overheads however, and +still retains a process-like model with respect to resource utilization. ## How is this different? diff --git a/g3doc/architecture_guide/performance.md b/g3doc/architecture_guide/performance.md index 2f83c0d20..3862d78ee 100644 --- a/g3doc/architecture_guide/performance.md +++ b/g3doc/architecture_guide/performance.md @@ -67,7 +67,9 @@ accesses. Page faults and other Operating System (OS) mechanisms are translated through the Sentry, but once mappings are installed and available to the application, there is no additional overhead. -{% include graph.html id="sysbench-memory" url="/performance/sysbench-memory.csv" title="perf.py sysbench.memory --runtime=runc --runtime=runsc" %} +{% include graph.html id="sysbench-memory" +url="/performance/sysbench-memory.csv" title="perf.py sysbench.memory +--runtime=runc --runtime=runsc" %} The above figure demonstrates the memory transfer rate as measured by `sysbench`. @@ -83,7 +85,8 @@ For many use cases, fixed memory overheads are a primary concern. This may be because sandboxed containers handle a low volume of requests, and it is therefore important to achieve high densities for efficiency. -{% include graph.html id="density" url="/performance/density.csv" title="perf.py density --runtime=runc --runtime=runsc" log="true" y_min="100000" %} +{% include graph.html id="density" url="/performance/density.csv" title="perf.py +density --runtime=runc --runtime=runsc" log="true" y_min="100000" %} The above figure demonstrates these costs based on three sample applications. This test is the result of running many instances of a container (50, or 5 in @@ -106,7 +109,8 @@ gVisor does not perform emulation or otherwise interfere with the raw execution of CPU instructions by the application. 
Therefore, there is no runtime cost imposed for CPU operations. -{% include graph.html id="sysbench-cpu" url="/performance/sysbench-cpu.csv" title="perf.py sysbench.cpu --runtime=runc --runtime=runsc" %} +{% include graph.html id="sysbench-cpu" url="/performance/sysbench-cpu.csv" +title="perf.py sysbench.cpu --runtime=runc --runtime=runsc" %} The above figure demonstrates the `sysbench` measurement of CPU events per second. Events per second is based on a CPU-bound loop that calculates all prime @@ -117,7 +121,8 @@ This has important consequences for classes of workloads that are often CPU-bound, such as data processing or machine learning. In these cases, `runsc` will similarly impose minimal runtime overhead. -{% include graph.html id="tensorflow" url="/performance/tensorflow.csv" title="perf.py tensorflow --runtime=runc --runtime=runsc" %} +{% include graph.html id="tensorflow" url="/performance/tensorflow.csv" +title="perf.py tensorflow --runtime=runc --runtime=runsc" %} For example, the above figure shows a sample TensorFlow workload, the [convolutional neural network example][cnn]. The time indicated includes the @@ -125,13 +130,16 @@ full start-up and run time for the workload, which trains a model. ## System calls -Some **structural costs** of gVisor are heavily influenced by the [platform -choice](../platforms/), which implements system call interception. Today, gVisor -supports a variety of platforms. These platforms present distinct performance, -compatibility and security trade-offs. For example, the KVM platform has low -overhead system call interception but runs poorly with nested virtualization. +Some **structural costs** of gVisor are heavily influenced by the +[platform choice](../platforms/), which implements system call interception. +Today, gVisor supports a variety of platforms. These platforms present distinct +performance, compatibility and security trade-offs. For example, the KVM +platform has low overhead system call interception but runs poorly with nested +virtualization. -{% include graph.html id="syscall" url="/performance/syscall.csv" title="perf.py syscall --runtime=runc --runtime=runsc-ptrace --runtime=runsc-kvm" y_min="100" log="true" %} +{% include graph.html id="syscall" url="/performance/syscall.csv" title="perf.py +syscall --runtime=runc --runtime=runsc-ptrace --runtime=runsc-kvm" y_min="100" +log="true" %} The above figure demonstrates the time required for a raw system call on various platforms. The test is implemented by a custom binary which performs a large @@ -142,7 +150,8 @@ tend to be high-performance data stores and static network services. In general, the impact of system call interception will be lower the more work an application does. -{% include graph.html id="redis" url="/performance/redis.csv" title="perf.py redis --runtime=runc --runtime=runsc" %} +{% include graph.html id="redis" url="/performance/redis.csv" title="perf.py +redis --runtime=runc --runtime=runsc" %} For example, `redis` is an application that performs relatively little work in userspace: in general it reads from a connected socket, reads or modifies some @@ -162,7 +171,8 @@ For many use cases, the ability to spin-up containers quickly and efficiently is important. A sandbox may be short-lived and perform minimal user work (e.g. a function invocation). 
-{% include graph.html id="startup" url="/performance/startup.csv" title="perf.py startup --runtime=runc --runtime=runsc" %} +{% include graph.html id="startup" url="/performance/startup.csv" title="perf.py +startup --runtime=runc --runtime=runsc" %} The above figure indicates how total time required to start a container through [Docker][docker]. This benchmark uses three different applications. First, an @@ -173,26 +183,29 @@ similarly loads a number of modules and binds an HTTP server. > Note: most of the time overhead above is associated Docker itself. This is > evident with the empty `runc` benchmark. To avoid these costs with `runsc`, -> you may also consider using `runsc do` mode or invoking the [OCI -> runtime](../../user_guide/quick_start/oci/) directly. +> you may also consider using `runsc do` mode or invoking the +> [OCI runtime](../../user_guide/quick_start/oci/) directly. ## Network -Networking is mostly bound by **implementation costs**, and gVisor's network stack -is improving quickly. +Networking is mostly bound by **implementation costs**, and gVisor's network +stack is improving quickly. While typically not an important metric in practice for common sandbox use cases, nevertheless `iperf` is a common microbenchmark used to measure raw throughput. -{% include graph.html id="iperf" url="/performance/iperf.csv" title="perf.py iperf --runtime=runc --runtime=runsc" %} +{% include graph.html id="iperf" url="/performance/iperf.csv" title="perf.py +iperf --runtime=runc --runtime=runsc" %} The above figure shows the result of an `iperf` test between two instances. For the upload case, the specified runtime is used for the `iperf` client, and in the download case, the specified runtime is the server. A native runtime is always used for the other endpoint in the test. -{% include graph.html id="applications" metric="requests_per_second" url="/performance/applications.csv" title="perf.py http.(node|ruby) --connections=25 --runtime=runc --runtime=runsc" %} +{% include graph.html id="applications" metric="requests_per_second" +url="/performance/applications.csv" title="perf.py http.(node|ruby) +--connections=25 --runtime=runc --runtime=runsc" %} The above figure shows the result of simple `node` and `ruby` web services that render a template upon receiving a request. Because these synthetic benchmarks @@ -213,20 +226,26 @@ through the [Gofer](../) as a result of our [security model](../security/), but in most cases are dominated by **implementation costs**, due to an internal [Virtual File System][vfs] (VFS) implementation that needs improvement. -{% include graph.html id="fio-bw" url="/performance/fio.csv" title="perf.py fio --engine=sync --runtime=runc --runtime=runsc" log="true" %} +{% include graph.html id="fio-bw" url="/performance/fio.csv" title="perf.py fio +--engine=sync --runtime=runc --runtime=runsc" log="true" %} The above figures demonstrate the results of `fio` for reads and writes to and from the disk. In this case, the disk quickly becomes the bottleneck and dominates other costs. -{% include graph.html id="fio-tmpfs-bw" url="/performance/fio-tmpfs.csv" title="perf.py fio --engine=sync --runtime=runc --tmpfs=True --runtime=runsc" log="true" %} +{% include graph.html id="fio-tmpfs-bw" url="/performance/fio-tmpfs.csv" +title="perf.py fio --engine=sync --runtime=runc --tmpfs=True --runtime=runsc" +log="true" %} The above figure shows the raw I/O performance of using a `tmpfs` mount which is sandbox-internal in the case of `runsc`. 
Generally these operations are similarly bound to the cost of copying around data in-memory, and we don't see the cost of VFS operations. -{% include graph.html id="httpd100k" metric="transfer_rate" url="/performance/httpd100k.csv" title="perf.py http.httpd --connections=1 --connections=5 --connections=10 --connections=25 --runtime=runc --runtime=runsc" %} +{% include graph.html id="httpd100k" metric="transfer_rate" +url="/performance/httpd100k.csv" title="perf.py http.httpd --connections=1 +--connections=5 --connections=10 --connections=25 --runtime=runc +--runtime=runsc" %} The high costs of VFS operations can manifest in benchmarks that execute many such operations in the hot path for serving requests, for example. The above @@ -239,7 +258,8 @@ internal serialization points (since all requests are reading the same file). Note that some of some of network stack performance issues also impact this benchmark. -{% include graph.html id="ffmpeg" url="/performance/ffmpeg.csv" title="perf.py media.ffmpeg --runtime=runc --runtime=runsc" %} +{% include graph.html id="ffmpeg" url="/performance/ffmpeg.csv" title="perf.py +media.ffmpeg --runtime=runc --runtime=runsc" %} For benchmarks that are bound by raw disk I/O and a mix of compute, file system operations are less of an issue. The above figure shows the total time required diff --git a/g3doc/architecture_guide/platforms.md b/g3doc/architecture_guide/platforms.md index 1f79971d1..6e63da8ce 100644 --- a/g3doc/architecture_guide/platforms.md +++ b/g3doc/architecture_guide/platforms.md @@ -6,12 +6,12 @@ be run. Each sandbox has its own isolated instance of: -* The **Sentry**, A user-space kernel that runs the container and intercepts - and responds to system calls made by the application. +* The **Sentry**, A user-space kernel that runs the container and intercepts + and responds to system calls made by the application. Each container running in the sandbox has its own isolated instance of: -* A **Gofer** which provides file system access to the container. +* A **Gofer** which provides file system access to the container. ![gVisor architecture diagram](Sentry-Gofer.png "gVisor architecture diagram") @@ -20,9 +20,9 @@ Each container running in the sandbox has its own isolated instance of: The entrypoint to running a sandboxed container is the `runsc` executable. `runsc` implements the [Open Container Initiative (OCI)][oci] runtime specification. This means that OCI compatible _filesystem bundles_ can be run by -`runsc`. Filesystem bundles are comprised of a `config.json` file containing -container configuration, and a root filesystem for the container. Please see -the [OCI runtime spec][runtime-spec] for more information on filesystem bundles. +`runsc`. Filesystem bundles are comprised of a `config.json` file containing +container configuration, and a root filesystem for the container. Please see the +[OCI runtime spec][runtime-spec] for more information on filesystem bundles. `runsc` implements multiple commands that perform various functions such as starting, stopping, listing, and querying the status of containers. @@ -31,8 +31,8 @@ starting, stopping, listing, and querying the status of containers. The Sentry is the largest component of gVisor. It can be thought of as a userspace OS kernel. The Sentry implements all the kernel functionality needed by the untrusted application. It implements all of the supported system calls, -signal delivery, memory management and page faulting logic, the threading -model, and more. 
+signal delivery, memory management and page faulting logic, the threading model, +and more. When the untrusted application makes a system call, the currently used platform redirects the call to the Sentry, which will do the necessary work to service @@ -43,8 +43,8 @@ application to directly control the system calls it makes. The Sentry aims to present an equivalent environment to (upstream) Linux v4.4. -File system operations that extend beyond the sandbox (not internal /proc -files, pipes, etc) are sent to the Gofer, described below. +File system operations that extend beyond the sandbox (not internal /proc files, +pipes, etc) are sent to the Gofer, described below. ## Platforms diff --git a/g3doc/architecture_guide/resources.md b/g3doc/architecture_guide/resources.md index 4580bf9f4..894f995ae 100644 --- a/g3doc/architecture_guide/resources.md +++ b/g3doc/architecture_guide/resources.md @@ -40,8 +40,8 @@ Sentry via [SCM_RIGHTS][scmrights][^1]. These files may be read from and written to through standard system calls, and also mapped into the associated application's address space. This allows the same host memory to be shared across multiple sandboxes, although this mechanism -does not preclude the use of side-channels (see the [security -model](../security/)). +does not preclude the use of side-channels (see the +[security model](../security/)). Note that some file systems exist only within the context of the sandbox. For example, in many cases a `tmpfs` mount will be available at `/tmp` or @@ -138,5 +138,6 @@ to purge internal caches. [scmrights]: http://man7.org/linux/man-pages/man7/unix.7.html [madvise]: http://man7.org/linux/man-pages/man2/madvise.2.html [exec]: https://docs.docker.com/engine/reference/commandline/exec/ - -[^1]: Unless host networking is enabled, the Sentry is not able to create or open host file descriptors itself, it can only receive them in this way from the Gofer. +[^1]: Unless host networking is enabled, the Sentry is not able to create or + open host file descriptors itself, it can only receive them in this way + from the Gofer. diff --git a/g3doc/architecture_guide/security.md b/g3doc/architecture_guide/security.md index afafe5c05..f78586291 100644 --- a/g3doc/architecture_guide/security.md +++ b/g3doc/architecture_guide/security.md @@ -27,14 +27,14 @@ namespaces. Although the System API is exposed to applications by design, bugs and race conditions within the kernel or hypervisor may occasionally be exploitable via -the API. This is common in part due to the fact that most kernels and hypervisors -are written in [C][clang], which is well-suited to interfacing with hardware but -often prone to security issues. In order to exploit these issues, a typical attack -might involve some combination of the following: +the API. This is common in part due to the fact that most kernels and +hypervisors are written in [C][clang], which is well-suited to interfacing with +hardware but often prone to security issues. In order to exploit these issues, a +typical attack might involve some combination of the following: -1. Opening or creating some combination of files, sockets or other descriptors. -1. Passing crafted, malicious arguments, structures or packets. -1. Racing with multiple threads in order to hit specific code paths. +1. Opening or creating some combination of files, sockets or other descriptors. +1. Passing crafted, malicious arguments, structures or packets. +1. Racing with multiple threads in order to hit specific code paths. 
For example, for the [Dirty Cow][dirtycow] privilege escalation bug, an application would open a specific file in `/proc` or use a specific `ptrace` @@ -74,8 +74,8 @@ hyperthread. The above categories in no way represent an exhaustive list of exploits, as we focus only on running untrusted code from within the operating system or -hypervisor. We do not consider other ways that a more generic adversary -may interact with a system, such as inserting a portable storage device with a +hypervisor. We do not consider other ways that a more generic adversary may +interact with a system, such as inserting a portable storage device with a malicious filesystem image, using a combination of crafted keyboard or touch inputs, or saturating a network device with ill-formed packets. @@ -100,30 +100,30 @@ The first principle is similar to the security basis for a Virtual Machine (VM). With a VM, an application’s interactions with the host are replaced by interactions with a guest operating system and a set of virtualized hardware devices. These hardware devices are then implemented via the host System API by -a Virtual Machine Monitor (VMM). The Sentry similarly prevents direct interactions -by providing its own implementation of the System API that the application -must interact with. Applications are not able to to directly craft specific -arguments or flags for the host System API, or interact directly with host -primitives. +a Virtual Machine Monitor (VMM). The Sentry similarly prevents direct +interactions by providing its own implementation of the System API that the +application must interact with. Applications are not able to to directly craft +specific arguments or flags for the host System API, or interact directly with +host primitives. For both the Sentry and a VMM, it’s worth noting that while direct interactions are not possible, indirect interactions are still possible. For example, a read on a host-backed file in the Sentry may ultimately result in a host read system -call (made by the Sentry, not by passing through arguments from the application), -similar to how a read on a block device in a VM may result in the VMM issuing -a corresponding host read system call from a backing file. - -An important distinction from a VM is that the Sentry implements a System API based -directly on host System API primitives instead of relying on virtualized hardware -and a guest operating system. This selects a distinct set of trade-offs, largely -in the performance, efficiency and compatibility domains. Since transitions in -and out of the sandbox are relatively expensive, a guest operating system will -typically take ownership of resources. For example, in the above case, the -guest operating system may read the block device data in a local page cache, -to avoid subsequent reads. This may lead to better performance but lower -efficiency, since memory may be wasted or duplicated. The Sentry opts instead -to defer to the host for many operations during runtime, for improved efficiency -but lower performance in some use cases. +call (made by the Sentry, not by passing through arguments from the +application), similar to how a read on a block device in a VM may result in the +VMM issuing a corresponding host read system call from a backing file. + +An important distinction from a VM is that the Sentry implements a System API +based directly on host System API primitives instead of relying on virtualized +hardware and a guest operating system. 
This selects a distinct set of +trade-offs, largely in the performance, efficiency and compatibility domains. +Since transitions in and out of the sandbox are relatively expensive, a guest +operating system will typically take ownership of resources. For example, in the +above case, the guest operating system may read the block device data in a local +page cache, to avoid subsequent reads. This may lead to better performance but +lower efficiency, since memory may be wasted or duplicated. The Sentry opts +instead to defer to the host for many operations during runtime, for improved +efficiency but lower performance in some use cases. ### What can a sandbox do? @@ -140,15 +140,15 @@ filesystem attributes) and not underlying host system resources. While the sandbox virtualizes many operations for the application, we limit the sandbox's own interactions with the host to the following high-level operations: -1. Communicate with a Gofer process via a connected socket. The sandbox may - receive new file descriptors from the Gofer process, corresponding to opened - files. These files can then be read from and written to by the sandbox. -1. Make a minimal set of host system calls. The calls do not include the - creation of new sockets (unless host networking mode is enabled) or opening - files. The calls include duplication and closing of file descriptors, - synchronization, timers and signal management. -1. Read and write packets to a virtual ethernet device. This is not required if - host networking is enabled (or networking is disabled). +1. Communicate with a Gofer process via a connected socket. The sandbox may + receive new file descriptors from the Gofer process, corresponding to opened + files. These files can then be read from and written to by the sandbox. +1. Make a minimal set of host system calls. The calls do not include the + creation of new sockets (unless host networking mode is enabled) or opening + files. The calls include duplication and closing of file descriptors, + synchronization, timers and signal management. +1. Read and write packets to a virtual ethernet device. This is not required if + host networking is enabled (or networking is disabled). ### System ABI, Side Channels and Other Vectors @@ -173,32 +173,32 @@ less likely to exploit or override these controls through other means. For gVisor development, there are several engineering principles that are employed in order to ensure that the system meets its design goals. -1. No system call is passed through directly to the host. Every supported call - has an independent implementation in the Sentry, that is unlikely to suffer - from identical vulnerabilities that may appear in the host. This has the - consequence that all kernel features used by applications require an - implementation within the Sentry. -1. Only common, universal functionality is implemented. Some filesystems, - network devices or modules may expose specialized functionality to user - space applications via mechanisms such as extended attributes, raw sockets - or ioctls. Since the Sentry is responsible for implementing the full system - call surface, we do not implement or pass through these specialized APIs. -1. The host surface exposed to the Sentry is minimized. While the system call - surface is not trivial, it is explicitly enumerated and controlled. The - Sentry is not permitted to open new files, create new sockets or do many - other interesting things on the host. +1. No system call is passed through directly to the host. 
Every supported call + has an independent implementation in the Sentry, that is unlikely to suffer + from identical vulnerabilities that may appear in the host. This has the + consequence that all kernel features used by applications require an + implementation within the Sentry. +1. Only common, universal functionality is implemented. Some filesystems, + network devices or modules may expose specialized functionality to user + space applications via mechanisms such as extended attributes, raw sockets + or ioctls. Since the Sentry is responsible for implementing the full system + call surface, we do not implement or pass through these specialized APIs. +1. The host surface exposed to the Sentry is minimized. While the system call + surface is not trivial, it is explicitly enumerated and controlled. The + Sentry is not permitted to open new files, create new sockets or do many + other interesting things on the host. Additionally, we have practical restrictions that are imposed on the project to minimize the risk of Sentry exploitability. For example: -1. Unsafe code is carefully controlled. All unsafe code is isolated in files - that end with "unsafe.go", in order to facilitate validation and auditing. - No file without the unsafe suffix may import the unsafe package. -1. No CGo is allowed. The Sentry must be a pure Go binary. -1. External imports are not generally allowed within the core packages. Only - limited external imports are used within the setup code. The code available - inside the Sentry is carefully controlled, to ensure that the above rules - are effective. +1. Unsafe code is carefully controlled. All unsafe code is isolated in files + that end with "unsafe.go", in order to facilitate validation and auditing. + No file without the unsafe suffix may import the unsafe package. +1. No CGo is allowed. The Sentry must be a pure Go binary. +1. External imports are not generally allowed within the core packages. Only + limited external imports are used within the setup code. The code available + inside the Sentry is carefully controlled, to ensure that the above rules + are effective. Finally, we recognize that security is a process, and that vigilance is critical. Beyond our security disclosure process, the Sentry is fuzzed diff --git a/g3doc/community.md b/g3doc/community.md index b3dc2d2cf..76f4d87c3 100644 --- a/g3doc/community.md +++ b/g3doc/community.md @@ -9,8 +9,8 @@ community forums as well as technical participation. The project maintains two mailing lists: -* [gvisor-users][gvisor-users] for accouncements and general discussion. -* [gvisor-dev][gvisor-dev] for development and contribution. +* [gvisor-users][gvisor-users] for accouncements and general discussion. +* [gvisor-dev][gvisor-dev] for development and contribution. We also have a [chat room hosted on Gitter][gitter-chat]. diff --git a/g3doc/roadmap.md b/g3doc/roadmap.md index 86bb11c3b..06ea25a8b 100644 --- a/g3doc/roadmap.md +++ b/g3doc/roadmap.md @@ -10,27 +10,28 @@ feature work. Most gVisor work is focused on four areas. -* [Performance][performance]: overall sandbox performance, including platform - performance, is a critical area for investment. This includes: network - performance (throughput and latency), file system performance (metadata and - data I/O), application switch and fault costs, etc. The goal of gVisor is to - provide sandboxing without a material performance or efficiency impact on all - but the most performance-sensitive applications. 
- -* [Compatibility][compatibility]: supporting a wide range of applications - requires supporting a large system API, including special system files (e.g. - proc, sys, dev, etc.). The goal of gVisor is to support the broad set of - applications that depend on a generic Linux API, rather than a specific kernel - version. - -* [Infrastructure & tooling][infrastructure]: the above goals require aggressive - testing and coverage, and well-established processes. This includes adding - appropriate system call coverage, end-to-end suites and runtime tests. - -* [Integration][integration]: Container infrastructure is evolving rapidly and - becoming more complex, and gVisor must continuously implement relevant and - popular features to ensure that integration points remain robust and - feature-complete while preserving security guarantees. +* [Performance][performance]: overall sandbox performance, including platform + performance, is a critical area for investment. This includes: network + performance (throughput and latency), file system performance (metadata and + data I/O), application switch and fault costs, etc. The goal of gVisor is to + provide sandboxing without a material performance or efficiency impact on + all but the most performance-sensitive applications. + +* [Compatibility][compatibility]: supporting a wide range of applications + requires supporting a large system API, including special system files (e.g. + proc, sys, dev, etc.). The goal of gVisor is to support the broad set of + applications that depend on a generic Linux API, rather than a specific + kernel version. + +* [Infrastructure & tooling][infrastructure]: the above goals require + aggressive testing and coverage, and well-established processes. This + includes adding appropriate system call coverage, end-to-end suites and + runtime tests. + +* [Integration][integration]: Container infrastructure is evolving rapidly and + becoming more complex, and gVisor must continuously implement relevant and + popular features to ensure that integration points remain robust and + feature-complete while preserving security guarantees. ## Releases diff --git a/g3doc/user_guide/FAQ.md b/g3doc/user_guide/FAQ.md index 9eb9f4501..89df65e99 100644 --- a/g3doc/user_guide/FAQ.md +++ b/g3doc/user_guide/FAQ.md @@ -63,7 +63,8 @@ not realize a new file was copied to a given directory. To invalidate the cache and force a refresh, create a file under the directory in question and list the contents again. -As a workaround, shared root filesystem can be enabled. See [Filesystem][filesystem]. +As a workaround, shared root filesystem can be enabled. See +[Filesystem][filesystem]. This bug is tracked in [bug #4](https://gvisor.dev/issue/4). @@ -82,13 +83,16 @@ sudo chmod 0755 /usr/local/bin/runsc ### I'm getting an error like `mount submount "/etc/hostname": creating mount with source ".../hostname": input/output error: unknown.` {#memlock} -There is a bug in Linux kernel versions 5.1 to 5.3.15, 5.4.2, and 5.5. Upgrade to a newer kernel or add the following to `/lib/systemd/system/containerd.service` as a workaround. +There is a bug in Linux kernel versions 5.1 to 5.3.15, 5.4.2, and 5.5. Upgrade +to a newer kernel or add the following to +`/lib/systemd/system/containerd.service` as a workaround. ``` LimitMEMLOCK=infinity ``` -And run `systemctl daemon-reload && systemctl restart containerd` to restart containerd. +And run `systemctl daemon-reload && systemctl restart containerd` to restart +containerd. 
See [issue #1765](https://gvisor.dev/issue/1765) for more details. @@ -97,18 +101,18 @@ See [issue #1765](https://gvisor.dev/issue/1765) for more details. This is normally indicated by errors like `bad address 'container-name'` when trying to communicate to another container in the same network. -Docker user defined bridge uses an embedded DNS server bound to the loopback +Docker user defined bridge uses an embedded DNS server bound to the loopback interface on address 127.0.0.10. This requires access to the host network in -order to communicate to the DNS server. runsc network is isolated from the -host and cannot access the DNS server on the host network without breaking the +order to communicate to the DNS server. runsc network is isolated from the host +and cannot access the DNS server on the host network without breaking the sandbox isolation. There are a few different workarounds you can try: -* Use default bridge network with `--link` to connect containers. Default - bridge doesn't use embedded DNS. -* Use [`--network=host`][host-net] option in runsc, however beware that it will - use the host network stack and is less secure. -* Use IPs instead of container names. -* Use [Kubernetes][k8s]. Container name lookup works fine in Kubernetes. +* Use default bridge network with `--link` to connect containers. Default + bridge doesn't use embedded DNS. +* Use [`--network=host`][host-net] option in runsc, however beware that it + will use the host network stack and is less secure. +* Use IPs instead of container names. +* Use [Kubernetes][k8s]. Container name lookup works fine in Kubernetes. [security-model]: /docs/architecture_guide/security/ [host-net]: /docs/user_guide/networking/#network-passthrough diff --git a/g3doc/user_guide/checkpoint_restore.md b/g3doc/user_guide/checkpoint_restore.md index b0aa308f3..0ab0911b0 100644 --- a/g3doc/user_guide/checkpoint_restore.md +++ b/g3doc/user_guide/checkpoint_restore.md @@ -83,19 +83,19 @@ docker start --checkpoint --checkpoint-dir=<directory> <container> ### Issues Preventing Compatibility with Docker -- **[Moby #37360][leave-running]:** Docker version 18.03.0-ce and earlier hangs - when checkpointing and does not create the checkpoint. To successfully use - this feature, install a custom version of docker-ce from the moby repository. - This issue is caused by an improper implementation of the `--leave-running` - flag. This issue is fixed in newer releases. -- **Docker does not support restoration into new containers:** Docker currently - expects the container which created the checkpoint to be the same container - used to restore which is not possible in runsc. When Docker supports container - migration and therefore restoration into new containers, this will be the - flow. -- **[Moby #37344][checkpoint-dir]:** Docker does not currently support the - `--checkpoint-dir` flag but this will be required when restoring from a - checkpoint made in another container. +- **[Moby #37360][leave-running]:** Docker version 18.03.0-ce and earlier + hangs when checkpointing and does not create the checkpoint. To successfully + use this feature, install a custom version of docker-ce from the moby + repository. This issue is caused by an improper implementation of the + `--leave-running` flag. This issue is fixed in newer releases. +- **Docker does not support restoration into new containers:** Docker + currently expects the container which created the checkpoint to be the same + container used to restore which is not possible in runsc. 
When Docker + supports container migration and therefore restoration into new containers, + this will be the flow. +- **[Moby #37344][checkpoint-dir]:** Docker does not currently support the + `--checkpoint-dir` flag but this will be required when restoring from a + checkpoint made in another container. [leave-running]: https://github.com/moby/moby/pull/37360 [checkpoint-dir]: https://github.com/moby/moby/issues/37344 diff --git a/g3doc/user_guide/compatibility.md b/g3doc/user_guide/compatibility.md index 30c787e75..9d3e3680f 100644 --- a/g3doc/user_guide/compatibility.md +++ b/g3doc/user_guide/compatibility.md @@ -5,12 +5,12 @@ gVisor implements a large portion of the Linux surface and while we strive to make it broadly compatible, there are (and always will be) unimplemented features and bugs. The only real way to know if it will work is to try. If you -find a container that doesn’t work and there is no known issue, please [file a -bug][bug] indicating the full command you used to run the image. You can view -open issues related to compatibility [here][issues]. +find a container that doesn’t work and there is no known issue, please +[file a bug][bug] indicating the full command you used to run the image. You can +view open issues related to compatibility [here][issues]. -If you're able to provide the [debug logs](../debugging/), the -problem likely to be fixed much faster. +If you're able to provide the [debug logs](../debugging/), the problem likely to +be fixed much faster. ## What works? @@ -40,50 +40,54 @@ The following applications/images have been tested: Most common utilities work. Note that: -* Some tools, such as `tcpdump` and old versions of `ping`, require explicitly - enabling raw sockets via the unsafe `--net-raw` runsc flag. -* Different Docker images can behave differently. For example, Alpine Linux and - Ubuntu have different `ip` binaries. +* Some tools, such as `tcpdump` and old versions of `ping`, require explicitly + enabling raw sockets via the unsafe `--net-raw` runsc flag. +* Different Docker images can behave differently. For example, Alpine Linux + and Ubuntu have different `ip` binaries. - Specific tools include: + Specific tools include: -| Tool | Status | -| --- | --- | -| apt-get | Working | -| bundle | Working | -| cat | Working | -| curl | Working | -| dd | Working | -| df | Working | -| dig | Working | -| drill | Working | -| env | Working | -| find | Working | -| gdb | Working | -| gosu | Working | -| grep | Working (unless stdin is a pipe and stdout is /dev/null) | -| ifconfig | Works partially, like ip. Full support [in progress](https://gvisor.dev/issue/578) | -| ip | Some subcommands work (e.g. addr, route). Full support [in progress](https://gvisor.dev/issue/578) | -| less | Working | -| ls | Working | -| lsof | Working | -| mount | Works in readonly mode. gVisor doesn't currently support creating new mounts at runtime | -| nc | Working | -| nmap | Not working | -| netstat | [In progress](https://gvisor.dev/issue/2112) | -| nslookup | Working | -| ping | Working | -| ps | Working | -| route | Working | -| ss | [In progress](https://gvisor.dev/issue/2114) | -| sshd | Partially working. 
Job control [in progress](https://gvisor.dev/issue/154) | -| strace | Working | -| tar | Working | -| tcpdump | [In progress](https://gvisor.dev/issue/173) | -| top | Working | -| uptime | Working | -| vim | Working | -| wget | Working | +<!-- mdformat off(don't wrap the table) --> + +| Tool | Status | +|:--------:|:-----------------------------------------:| +| apt-get | Working. | +| bundle | Working. | +| cat | Working. | +| curl | Working. | +| dd | Working. | +| df | Working. | +| dig | Working. | +| drill | Working. | +| env | Working. | +| find | Working. | +| gdb | Working. | +| gosu | Working. | +| grep | Working (unless stdin is a pipe and stdout is /dev/null). | +| ifconfig | Works partially, like ip. Full support [in progress](https://gvisor.dev/issue/578). | +| ip | Some subcommands work (e.g. addr, route). Full support [in progress](https://gvisor.dev/issue/578). | +| less | Working. | +| ls | Working. | +| lsof | Working. | +| mount | Works in readonly mode. gVisor doesn't currently support creating new mounts at runtime. | +| nc | Working. | +| nmap | Not working. | +| netstat | [In progress](https://gvisor.dev/issue/2112). | +| nslookup | Working. | +| ping | Working. | +| ps | Working. | +| route | Working. | +| ss | [In progress](https://gvisor.dev/issue/2114). | +| sshd | Partially working. Job control [in progress](https://gvisor.dev/issue/154). | +| strace | Working. | +| tar | Working. | +| tcpdump | [In progress](https://gvisor.dev/issue/173). | +| top | Working. | +| uptime | Working. | +| vim | Working. | +| wget | Working. | + +<!-- mdformat on --> [bug]: https://github.com/google/gvisor/issues/new?title=Compatibility%20Issue: [issues]: https://github.com/google/gvisor/issues?q=is%3Aissue+is%3Aopen+label%3A%22area%3A+compatibility%22 diff --git a/g3doc/user_guide/debugging.md b/g3doc/user_guide/debugging.md index 38e26db76..0525fd5c0 100644 --- a/g3doc/user_guide/debugging.md +++ b/g3doc/user_guide/debugging.md @@ -21,8 +21,8 @@ To enable debug and system call logging, add the `runtimeArgs` below to your ``` > Note: the last `/` in `--debug-log` is needed to interpret it as a directory. -> Then each `runsc` command executed will create a separate log file. -> Otherwise, log messages from all commands will be appended to the same file. +> Then each `runsc` command executed will create a separate log file. Otherwise, +> log messages from all commands will be appended to the same file. You may also want to pass `--log-packets` to troubleshoot network problems. Then restart the Docker daemon: @@ -32,17 +32,17 @@ sudo systemctl restart docker ``` Run your container again, and inspect the files under `/tmp/runsc`. The log file -ending with `.boot` will contain the strace logs from your application, which can -be useful for identifying missing or broken system calls in gVisor. If you are -having problems starting the container, the log file ending with `.create` may -have the reason for the failure. +ending with `.boot` will contain the strace logs from your application, which +can be useful for identifying missing or broken system calls in gVisor. If you +are having problems starting the container, the log file ending with `.create` +may have the reason for the failure. ## Stack traces The command `runsc debug --stacks` collects stack traces while the sandbox is running which can be useful to troubleshoot issues or just to learn more about -gVisor. It connects to the sandbox process, collects a stack dump, and writes -it to the console. For example: +gVisor. 
It connects to the sandbox process, collects a stack dump, and writes it +to the console. For example: ```bash docker run --runtime=runsc --rm -d alpine sh -c "while true; do echo running; sleep 1; done" @@ -52,14 +52,14 @@ sudo runsc --root /var/run/docker/runtime-runsc/moby debug --stacks 63254c6ab3a6 ``` > Note: `--root` variable is provided by docker and is normally set to -> `/var/run/docker/runtime-[runtime-name]/moby`. If in doubt, `--root` is logged to -> `runsc` logs. +> `/var/run/docker/runtime-[runtime-name]/moby`. If in doubt, `--root` is logged +> to `runsc` logs. ## Debugger -You can debug gVisor like any other Golang program. If you're running with Docker, -you'll need to find the sandbox PID and attach the debugger as root. Here is an -example: +You can debug gVisor like any other Golang program. If you're running with +Docker, you'll need to find the sandbox PID and attach the debugger as root. +Here is an example: ```bash # Get a runsc with debug symbols (download nightly or build with symbols). @@ -81,9 +81,9 @@ continue ## Profiling -`runsc` integrates with Go profiling tools and gives you easy commands to profile -CPU and heap usage. First you need to enable `--profile` in the command line options -before starting the container: +`runsc` integrates with Go profiling tools and gives you easy commands to +profile CPU and heap usage. First you need to enable `--profile` in the command +line options before starting the container: ```json { @@ -101,13 +101,13 @@ before starting the container: > Note: Enabling profiling loosens the seccomp protection added to the sandbox, > and should not be run in production under normal circumstances. -Then restart docker to refresh the runtime options. While the container is running, -execute `runsc debug` to collect profile information and save to a file. Here are -the options available: +Then restart docker to refresh the runtime options. While the container is +running, execute `runsc debug` to collect profile information and save to a +file. Here are the options available: -* **--profile-heap:** Generates heap profile to the speficied file. -* **--profile-cpu:** Enables CPU profiler, waits for `--duration` seconds - and generates CPU profile to the speficied file. +* **--profile-heap:** Generates heap profile to the speficied file. +* **--profile-cpu:** Enables CPU profiler, waits for `--duration` seconds and + generates CPU profile to the speficied file. For example: @@ -119,9 +119,9 @@ sudo runsc --root /var/run/docker/runtime-runsc-prof/moby debug --profile-heap=/ sudo runsc --root /var/run/docker/runtime-runsc-prof/moby debug --profile-cpu=/tmp/cpu.prof --duration=30s 63254c6ab3a6989623fa1fb53616951eed31ac605a2637bb9ddba5d8d404b35b ``` -The resulting files can be opened using `go tool pprof` or [pprof][]. The examples -below create image file (`.svg`) with the heap profile and writes the top -functions using CPU to the console: +The resulting files can be opened using `go tool pprof` or [pprof][]. The +examples below create image file (`.svg`) with the heap profile and writes the +top functions using CPU to the console: ```bash go tool pprof -svg /usr/local/bin/runsc /tmp/heap.prof diff --git a/g3doc/user_guide/filesystem.md b/g3doc/user_guide/filesystem.md index 50a1c0020..6c69f42a1 100644 --- a/g3doc/user_guide/filesystem.md +++ b/g3doc/user_guide/filesystem.md @@ -4,19 +4,19 @@ gVisor accesses the filesystem through a file proxy, called the Gofer. The gofer runs as a separate process, that is isolated from the sandbox. 
Gofer instances -communicate with their respective sentry using the 9P protocol. For a more detailed -explanation see [Overview > Gofer](../../architecture_guide/#gofer). +communicate with their respective sentry using the 9P protocol. For a more +detailed explanation see [Overview > Gofer](../../architecture_guide/#gofer). ## Sandbox overlay -To isolate the host filesystem from the sandbox, you can set a writable tmpfs overlay -on top of the entire filesystem. All modifications are made to the overlay, keeping -the host filesystem unmodified. +To isolate the host filesystem from the sandbox, you can set a writable tmpfs +overlay on top of the entire filesystem. All modifications are made to the +overlay, keeping the host filesystem unmodified. > Note: All created and modified files are stored in memory inside the sandbox. -To use the tmpfs overlay, add the following `runtimeArgs` to your Docker configuration -(`/etc/docker/daemon.json`) and restart the Docker daemon: +To use the tmpfs overlay, add the following `runtimeArgs` to your Docker +configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: ```json { @@ -33,17 +33,18 @@ To use the tmpfs overlay, add the following `runtimeArgs` to your Docker configu ## Shared root filesystem -The root filesystem is where the image is extracted and is not generally modified -from outside the sandbox. This allows for some optimizations, like skipping checks -to determine if a directory has changed since the last time it was cached, thus -missing updates that may have happened. If you need to `docker cp` files inside the -root filesystem, you may want to enable shared mode. Just be aware that file system -access will be slower due to the extra checks that are required. +The root filesystem is where the image is extracted and is not generally +modified from outside the sandbox. This allows for some optimizations, like +skipping checks to determine if a directory has changed since the last time it +was cached, thus missing updates that may have happened. If you need to `docker +cp` files inside the root filesystem, you may want to enable shared mode. Just +be aware that file system access will be slower due to the extra checks that are +required. > Note: External mounts are always shared. -To use set the root filesystem shared, add the following `runtimeArgs` to your Docker -configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: +To use set the root filesystem shared, add the following `runtimeArgs` to your +Docker configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: ```json { diff --git a/g3doc/user_guide/install.md b/g3doc/user_guide/install.md index a4cb926f5..0de2b9932 100644 --- a/g3doc/user_guide/install.md +++ b/g3doc/user_guide/install.md @@ -20,11 +20,11 @@ to the preferred installation mechanism: manual or from an `apt` repository. Binaries are available for every commit on the `master` branch, and are available at the following URL: - `https://storage.googleapis.com/gvisor/releases/master/latest/runsc` +`https://storage.googleapis.com/gvisor/releases/master/latest/runsc` Checksums for the release binary are at: - `https://storage.googleapis.com/gvisor/releases/master/latest/runsc.sha512` +`https://storage.googleapis.com/gvisor/releases/master/latest/runsc.sha512` For `apt` installation, use the `master` as the `${DIST}` below. @@ -33,15 +33,15 @@ For `apt` installation, use the `master` as the `${DIST}` below. 
Nightly releases are built most nights from the master branch, and are available at the following URL: - `https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc` +`https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc` Checksums for the release binary are at: - `https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc.sha512` +`https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc.sha512` Specific nightly releases can be found at: - `https://storage.googleapis.com/gvisor/releases/nightly/${yyyy-mm-dd}/runsc` +`https://storage.googleapis.com/gvisor/releases/nightly/${yyyy-mm-dd}/runsc` Note that a release may not be available for every day. @@ -51,7 +51,7 @@ For `apt` installation, use the `nightly` as the `${DIST}` below. The latest official release is available at the following URL: - `https://storage.googleapis.com/gvisor/releases/release/latest` +`https://storage.googleapis.com/gvisor/releases/release/latest` For `apt` installation, use the `release` as the `${DIST}` below. @@ -59,7 +59,7 @@ For `apt` installation, use the `release` as the `${DIST}` below. A given release release is available at the following URL: - `https://storage.googleapis.com/gvisor/releases/release/${yyyymmdd}` +`https://storage.googleapis.com/gvisor/releases/release/${yyyymmdd}` See the [releases][releases] page for information about specific releases. @@ -72,7 +72,7 @@ use the date of the release, e.g. `${yyyymmdd}`, as the `${DIST}` below. A given point release is available at the following URL: - `https://storage.googleapis.com/gvisor/releases/release/${yyyymmdd}.${rc}` +`https://storage.googleapis.com/gvisor/releases/release/${yyyymmdd}.${rc}` Note that `apt` installation of a specific point release is not supported. @@ -100,10 +100,10 @@ curl -fsSL https://gvisor.dev/archive.key | sudo apt-key add - Based on the release type, you will need to substitute `${DIST}` below, using one of: -* `master`: For HEAD. -* `nightly`: For nightly releases. -* `release`: For the latest release. -* `${yyyymmdd}`: For a specific releases (see above). +* `master`: For HEAD. +* `nightly`: For nightly releases. +* `release`: For the latest release. +* `${yyyymmdd}`: For a specific releases (see above). The repository for the release you wish to install should be added: diff --git a/g3doc/user_guide/networking.md b/g3doc/user_guide/networking.md index 348b66bfd..4aa394c91 100644 --- a/g3doc/user_guide/networking.md +++ b/g3doc/user_guide/networking.md @@ -19,8 +19,8 @@ docker run --rm --runtime=runsc alpine ip addr ## Network passthrough For high-performance networking applications, you may choose to disable the user -space network stack and instead use the host network stack, including the loopback. -Note that this mode decreases the isolation to the host. +space network stack and instead use the host network stack, including the +loopback. Note that this mode decreases the isolation to the host. Add the following `runtimeArgs` to your Docker configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: @@ -40,9 +40,8 @@ Add the following `runtimeArgs` to your Docker configuration ## Disabling external networking -To completely isolate the host and network from the sandbox, external -networking can be disabled. The sandbox will still contain a loopback provided -by netstack. +To completely isolate the host and network from the sandbox, external networking +can be disabled. The sandbox will still contain a loopback provided by netstack. 
Add the following `runtimeArgs` to your Docker configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: @@ -67,7 +66,8 @@ Offload (GSO) to run with a kernel that is newer than 3.17. Add the `--gso=false` flag to your Docker runtime configuration (`/etc/docker/daemon.json`) and restart the Docker daemon: -> Note: Network performance, especially for large payloads, will be greatly reduced. +> Note: Network performance, especially for large payloads, will be greatly +> reduced. ```json { diff --git a/g3doc/user_guide/platforms.md b/g3doc/user_guide/platforms.md index f13092016..eefb6b222 100644 --- a/g3doc/user_guide/platforms.md +++ b/g3doc/user_guide/platforms.md @@ -8,15 +8,15 @@ platform. ## What is a Platform? gVisor requires a *platform* to implement interception of syscalls, basic -context switching, and memory mapping functionality. These are described in -more depth in the [Platform Design](../../architecture_guide/platforms/). +context switching, and memory mapping functionality. These are described in more +depth in the [Platform Design](../../architecture_guide/platforms/). ## Selecting a Platform The platform is selected by the `--platform` command line flag passed to `runsc`. By default, the ptrace platform is selected. To select a different -platform, modify your Docker configuration (`/etc/docker/daemon.json`) to -pass this argument: +platform, modify your Docker configuration (`/etc/docker/daemon.json`) to pass +this argument: ```json { @@ -57,10 +57,11 @@ If you are using a virtual machine you will need to make sure that nested virtualization is configured. Here are links to documents on how to set up nested virtualization in several popular environments: -* Google Cloud: [Enabling Nested Virtualization for VM Instances][nested-gcp] -* Microsoft Azure: [How to enable nested virtualization in an Azure VM][nested-azure] -* VirtualBox: [Nested Virtualization][nested-virtualbox] -* KVM: [Nested Guests][nested-kvm] +* Google Cloud: [Enabling Nested Virtualization for VM Instances][nested-gcp] +* Microsoft Azure: + [How to enable nested virtualization in an Azure VM][nested-azure] +* VirtualBox: [Nested Virtualization][nested-virtualbox] +* KVM: [Nested Guests][nested-kvm] ***Note: nested virtualization will have poor performance and is historically a cause of security issues (e.g. @@ -70,9 +71,9 @@ recommended for production.*** ### Configuring Docker Per above, you will need to configure Docker to use `runsc` with the KVM -platform. You will remember from the Docker Quick Start that you configured -Docker to use `runsc` as the runtime. Docker allows you to add multiple -runtimes to the Docker configuration. +platform. You will remember from the Docker Quick Start that you configured +Docker to use `runsc` as the runtime. Docker allows you to add multiple runtimes +to the Docker configuration. Add a new entry for the KVM platform entry to your Docker configuration (`/etc/docker/daemon.json`) in order to provide the `--platform=kvm` runtime diff --git a/g3doc/user_guide/quick_start/docker.md b/g3doc/user_guide/quick_start/docker.md index 7dfc3d4b7..5228db4c0 100644 --- a/g3doc/user_guide/quick_start/docker.md +++ b/g3doc/user_guide/quick_start/docker.md @@ -13,10 +13,10 @@ the next section and proceed straight to running a container. ## Configuring Docker -First you will need to configure Docker to use `runsc` by adding a runtime -entry to your Docker configuration (`/etc/docker/daemon.json`). You may have to -create this file if it does not exist. 
diff --git a/g3doc/user_guide/quick_start/docker.md b/g3doc/user_guide/quick_start/docker.md
index 7dfc3d4b7..5228db4c0 100644
--- a/g3doc/user_guide/quick_start/docker.md
+++ b/g3doc/user_guide/quick_start/docker.md
@@ -13,10 +13,10 @@ the next section and proceed straight to running a container.

## Configuring Docker

-First you will need to configure Docker to use `runsc` by adding a runtime
-entry to your Docker configuration (`/etc/docker/daemon.json`). You may have to
-create this file if it does not exist. Also, some Docker versions also require
-you to [specify the `storage-driver` field][storage-driver].
+First you will need to configure Docker to use `runsc` by adding a runtime entry
+to your Docker configuration (`/etc/docker/daemon.json`). You may have to create
+this file if it does not exist. Also, some Docker versions require you to
+[specify the `storage-driver` field][storage-driver].

In the end, the file should look something like:

@@ -51,7 +51,8 @@ You can also run a terminal to explore the container.

docker run --runtime=runsc --rm -it ubuntu /bin/bash
```

-Many docker options are compatible with gVisor, try them out. Here is an example:
+Many Docker options are compatible with gVisor; try them out. Here is an
+example:

```bash
docker run --runtime=runsc --rm --link backend:database -v ~/bin:/tools:ro -p 8080:80 --cpus=0.5 -it busybox telnet towel.blinkenlights.nl

diff --git a/g3doc/user_guide/quick_start/kubernetes.md b/g3doc/user_guide/quick_start/kubernetes.md
index 237b3c17f..b1f67252e 100644
--- a/g3doc/user_guide/quick_start/kubernetes.md
+++ b/g3doc/user_guide/quick_start/kubernetes.md
@@ -5,18 +5,18 @@ with Kubernetes.

## Using Minikube

-gVisor can run sandboxed containers in a Kubernetes cluster with Minikube.
-After the gVisor addon is enabled, pods with
-`io.kubernetes.cri.untrusted-workload` set to true will execute with `runsc`.
-Follow [these instructions][minikube] to enable gVisor addon.
+gVisor can run sandboxed containers in a Kubernetes cluster with Minikube. After
+the gVisor addon is enabled, pods with `io.kubernetes.cri.untrusted-workload`
+set to true will execute with `runsc`. Follow [these instructions][minikube] to
+enable the gVisor addon.

## Using Containerd

You can also set up Kubernetes nodes to run pods in gVisor using the
[containerd][containerd] CRI runtime and the `gvisor-containerd-shim`. You can
use either the `io.kubernetes.cri.untrusted-workload` annotation or
-[RuntimeClass][runtimeclass] to run Pods with `runsc`. You can find
-instructions [here][gvisor-containerd-shim].
+[RuntimeClass][runtimeclass] to run Pods with `runsc`. You can find instructions
+[here][gvisor-containerd-shim].

## Using GKE Sandbox

diff --git a/g3doc/user_guide/quick_start/oci.md b/g3doc/user_guide/quick_start/oci.md
index 271ed24ce..57bcc4f63 100644
--- a/g3doc/user_guide/quick_start/oci.md
+++ b/g3doc/user_guide/quick_start/oci.md
@@ -38,7 +38,8 @@ Finally run the container.

sudo runsc run hello
```

-Next try [using CNI to set up networking](../../../tutorials/cni/) or [running gVisor using Docker](../docker/).
+Next try [using CNI to set up networking](../../../tutorials/cni/) or
+[running gVisor using Docker](../docker/).

[oci]: https://opencontainers.org/
[install]: /docs/user_guide/install

diff --git a/g3doc/user_guide/tutorials/cni.md b/g3doc/user_guide/tutorials/cni.md
index 6546f2737..ad6c9fa59 100644
--- a/g3doc/user_guide/tutorials/cni.md
+++ b/g3doc/user_guide/tutorials/cni.md
@@ -1,12 +1,13 @@
# Using CNI

This tutorial will show you how to set up networking for a gVisor sandbox using
-the [Container Networking Interface (CNI)](https://github.com/containernetworking/cni).
+the
+[Container Networking Interface (CNI)](https://github.com/containernetworking/cni).

## Install CNI Plugins

-First you will need to install the CNI plugins. CNI plugins are used to set up
-a network namespace that `runsc` can use with the sandbox.
+First you will need to install the CNI plugins. CNI plugins are used to set up a
+network namespace that `runsc` can use with the sandbox.

Start by creating the directories for CNI plugin binaries:

@@ -74,8 +75,8 @@ EOF'

## Create a Network Namespace

For each gVisor sandbox you will create a network namespace and configure it
-using CNI. First, create a random network namespace name and then create
-the namespace.
+using CNI. First, create a random network namespace name and then create the
+namespace.

The network namespace path will then be `/var/run/netns/${CNI_CONTAINERID}`.

@@ -113,8 +114,8 @@ Now that our network namespace is created and configured, we can create the

OCI bundle for our container. As part of the bundle's `config.json` we will
specify that the container use the network namespace that we created.

-The container will run a simple python webserver that we will be able to
-connect to via the IP address assigned to it via the bridge CNI plugin.
+The container will run a simple Python web server that we will be able to
+connect to using the IP address assigned to it by the bridge CNI plugin.

Create the bundle and root filesystem directories:

@@ -127,13 +128,12 @@ sudo mkdir -p rootfs/var/www/html
sudo sh -c 'echo "Hello World!" > rootfs/var/www/html/index.html'
```

Next create the `config.json` specifying the network namespace.

```
sudo /usr/local/bin/runsc spec
sudo sed -i 's;"sh";"python", "-m", "http.server";' config.json
sudo sed -i "s;\"cwd\": \"/\";\"cwd\": \"/var/www/html\";" config.json
sudo sed -i "s;\"type\": \"network\";\"type\": \"network\",\n\t\t\t\t\"path\": \"/var/run/netns/${CNI_CONTAINERID}\";" config.json
```

## Run the Container

diff --git a/g3doc/user_guide/tutorials/docker.md b/g3doc/user_guide/tutorials/docker.md
index 514af8489..c0a3db506 100644
--- a/g3doc/user_guide/tutorials/docker.md
+++ b/g3doc/user_guide/tutorials/docker.md
@@ -5,13 +5,13 @@ This page shows you how to deploy a sample [WordPress][wordpress] site using

### Before you begin

-[Follow these instructions][docker-install] to install runsc with Docker.
-This document assumes that the runtime name chosen is `runsc`.
+[Follow these instructions][docker-install] to install runsc with Docker. This
+document assumes that the runtime name chosen is `runsc`.

### Running WordPress

-Now, let's deploy a WordPress site using Docker. WordPress site requires
-two containers: web server in the frontend, MySQL database in the backend.
+Now, let's deploy a WordPress site using Docker. A WordPress site requires two
+containers: a web server in the frontend and a MySQL database in the backend.

First, let's define a few environment variables that are shared between both
containers:
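The variable definitions themselves lie beyond the end of this hunk. Purely as
an illustration, with names and values that are hypothetical rather than taken
from the tutorial, such shared settings are typically plain shell variables that
both `docker run` commands reference:

```bash
# Hypothetical example only: variable names and values are not from the tutorial.
export MYSQL_PASSWORD='change-me'   # password shared by both containers
export MYSQL_DB='wordpress'         # database name shared by both containers

# The backend container runs under runsc and reads the shared variables.
docker run --runtime=runsc --name mysql -d \
    -e MYSQL_ROOT_PASSWORD="${MYSQL_PASSWORD}" \
    -e MYSQL_DATABASE="${MYSQL_DB}" \
    mysql:5.6
```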
diff --git a/g3doc/user_guide/tutorials/kubernetes.md b/g3doc/user_guide/tutorials/kubernetes.md
index a686c1982..d2a94b1b7 100644
--- a/g3doc/user_guide/tutorials/kubernetes.md
+++ b/g3doc/user_guide/tutorials/kubernetes.md
@@ -7,9 +7,9 @@ This page shows you how to deploy a sample [WordPress][wordpress] site using

Take the following steps to enable the Kubernetes Engine API:

-1. Visit the [Kubernetes Engine page][project-selector] in the Google Cloud
-   Platform Console.
-1. Create or select a project.
+1.  Visit the [Kubernetes Engine page][project-selector] in the Google Cloud
+    Platform Console.
+1.  Create or select a project.

### Creating a node pool with gVisor enabled

@@ -43,8 +43,8 @@ kubectl get runtimeclasses

Now, let's deploy a WordPress site using GKE Sandbox. A WordPress site requires
two pods: a web server in the frontend and a MySQL database in the backend. Both
-applications use PersistentVolumes to store the site data data.
-In addition, they use secret store to share MySQL password between them.
+applications use PersistentVolumes to store the site data. In addition, they
+use a secret store to share the MySQL password between them.

First, let's download the deployment configuration files to add the runtime
class annotation to them:

@@ -57,150 +57,50 @@ curl -LO https://k8s.io/examples/application/wordpress/mysql-deployment.yaml

Add a **spec.template.spec.runtimeClassName** set to **gvisor** to both files,
as shown below:

**wordpress-deployment.yaml:**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      runtimeClassName: gvisor # ADD THIS LINE
      containers:
      - image: wordpress:4.8-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: wordpress-mysql
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 80
          name: wordpress
        volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-persistent-storage
        persistentVolumeClaim:
          claimName: wp-pv-claim
```
**mysql-deployment.yaml:**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      runtimeClassName: gvisor # ADD THIS LINE
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
```

-Note that apart from `runtimeClassName: gvisor`, nothing else about the
-Deployment has is changed.
+Note that apart from `runtimeClassName: gvisor`, nothing else about the
+Deployment is changed.

You are now ready to deploy the entire application. Just create a secret to
store MySQL's password and *apply* both deployments:

@@ -225,8 +125,8 @@ Congratulations! You have just deployed a WordPress site using GKE Sandbox.

### What's next

-To learn more about GKE Sandbox and how to run your deployment securely, take
-a look at the [documentation][gke-sandbox-docs].
+To learn more about GKE Sandbox and how to run your deployment securely, take a
+look at the [documentation][gke-sandbox-docs].

[gke-sandbox-docs]: https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
[gke-sandbox]: https://cloud.google.com/kubernetes-engine/sandbox/
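The secret-creation and apply commands referred to above fall outside the final
hunk. As a rough sketch of that step, where the secret name `mysql-pass` matches
the `secretKeyRef` entries in the YAML above but the exact commands and password
handling in the tutorial may differ:

```bash
# Sketch: the password value is a placeholder; pick your own.
kubectl create secret generic mysql-pass --from-literal=password=your-password

# Deploy the backend and frontend with the gvisor runtime class applied.
kubectl apply -f mysql-deployment.yaml
kubectl apply -f wordpress-deployment.yaml

# Check that both pods come up.
kubectl get pods
```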