# Performance Guide

[TOC]

gVisor is designed to provide a secure, virtualized environment while preserving key benefits of containerization, such as small fixed overheads and a dynamic resource footprint. For containerized infrastructure, this can provide a turn-key solution for sandboxing untrusted workloads: there are no changes to the fundamental resource model.

gVisor imposes runtime costs over native containers. These costs come in two forms: additional cycles and memory usage, which may manifest as increased latency, reduced throughput or density, or not at all. In general, these costs come from two different sources.

First, the existence of the [Sentry](../README.md#sentry) means that additional memory will be required, and application system calls must traverse additional layers of software. The design emphasizes [security](/docs/architecture_guide/security/) and therefore we chose to use a language for the Sentry that provides benefits in this domain but may not yet offer the raw performance of other choices. Costs imposed by these design choices are **structural costs**.

Second, as gVisor is an independent implementation of the system call surface, many of the subsystems or specific calls are not as optimized as more mature implementations. A good example here is the network stack, which is continuing to evolve but does not support all the advanced recovery mechanisms offered by other stacks and is less CPU efficient. This is an **implementation cost** and is distinct from **structural costs**. Improvements here are ongoing and driven by the workloads that matter to gVisor users and contributors.

This page provides a guide for understanding baseline performance, and calls out distinct **structural costs** and **implementation costs**, highlighting where improvements are possible and where they are not.

While we include a variety of workloads here, it's worth emphasizing that gVisor may not be an appropriate solution for every workload, for reasons other than performance. For example, a sandbox may provide minimal benefit for a trusted database, since _user data would already be inside the sandbox_ and there is no need for an attacker to break out in the first place.

## Methodology

All data below was generated using the [benchmark tools][benchmark-tools] repository, and the machines under test are uniform [Google Compute Engine][gce] Virtual Machines (VMs) with the following specifications:

    Machine type: n1-standard-4 (broadwell)
    Image: Debian GNU/Linux 9 (stretch) 4.19.0-0
    BootDisk: 2048GB SSD persistent disk

Throughout this document, `runsc` is used to indicate the runtime provided by gVisor. When relevant, we use the name `runsc-platform` to describe a specific [platform choice](/docs/architecture_guide/platforms/).

**Except where specified, all tests below are conducted with the `ptrace` platform. The `ptrace` platform works everywhere and does not require hardware virtualization or kernel modifications, but suffers from the highest structural costs by far. This platform is used to provide a clear understanding of the performance model, but in no way represents an ideal scenario. In the future, this guide will be extended to bare metal environments and include additional platforms.**

## Memory access

gVisor does not introduce any additional costs with respect to raw memory accesses. Page faults and other Operating System (OS) mechanisms are translated through the Sentry, but once mappings are installed and available to the application, there is no additional overhead.

{% include graph.html id="sysbench-memory" url="/performance/sysbench-memory.csv" title="perf.py sysbench.memory --runtime=runc --runtime=runsc" %}

The above figure demonstrates the memory transfer rate as measured by `sysbench`.

## Memory usage

The Sentry provides an additional layer of indirection, and it requires memory in order to store state associated with the application. This memory generally consists of a fixed component, plus an amount that varies with the usage of operating system resources (e.g. how many sockets or files are opened).

For many use cases, fixed memory overheads are a primary concern. This may be because sandboxed containers handle a low volume of requests, and it is therefore important to achieve high densities for efficiency.

{% include graph.html id="density" url="/performance/density.csv" title="perf.py density --runtime=runc --runtime=runsc" log="true" y_min="100000" %}

The above figure demonstrates these costs based on three sample applications. This test is the result of running many instances of a container (50, or 5 in the case of redis), calculating available memory on the host before and afterwards, and dividing the difference by the number of containers. This technique is used to measure memory usage, rather than the `usage_in_bytes` value of the container cgroup, because we found that some container runtimes, other than `runc` and `runsc`, do not use an individual container cgroup.

The first application is an instance of `sleep`: a trivial application that does nothing. The second application is a synthetic `node` application which imports a number of modules and listens for requests. The third application is a similar synthetic `ruby` application which does the same. Finally, we include an instance of `redis` storing approximately 1GB of data. In all cases, the sandbox itself is responsible for a small, mostly fixed amount of memory overhead.
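For readers who want a rough feel for this measurement technique, the sketch below illustrates it in Go. It is not the benchmark tool itself: the `alpine` image, the fixed `sleep` workload, the container count and the assumption that Docker has gVisor registered as the `runsc` runtime are all illustrative, and a real measurement would wait for the workloads to settle before sampling memory again.

```go
// memdensity.go: a simplified sketch of the density measurement described
// above. It samples MemAvailable from /proc/meminfo, starts N identical
// containers, samples again, and attributes the difference evenly to each
// container.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// memAvailableKB returns the MemAvailable field from /proc/meminfo in KiB.
func memAvailableKB() (int64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("MemAvailable not found")
}

func main() {
	const n = 50 // container count, as in the sleep/node/ruby tests above
	before, err := memAvailableKB()
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < n; i++ {
		// Start a trivial container; "--runtime=runsc" assumes the gVisor
		// runtime is registered with Docker under that name.
		cmd := exec.Command("docker", "run", "-d", "--rm", "--runtime=runsc",
			"alpine", "sleep", "3600")
		if err := cmd.Run(); err != nil {
			log.Fatal(err)
		}
	}
	after, err := memAvailableKB()
	if err != nil {
		log.Fatal(err)
	}
	// Containers are left running; stop them afterwards to clean up.
	fmt.Printf("approximate overhead: %d KiB per container\n", (before-after)/n)
}
```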
## CPU performance

gVisor does not perform emulation or otherwise interfere with the raw execution of CPU instructions by the application. Therefore, there is no runtime cost imposed for CPU operations.

{% include graph.html id="sysbench-cpu" url="/performance/sysbench-cpu.csv" title="perf.py sysbench.cpu --runtime=runc --runtime=runsc" %}

The above figure demonstrates the `sysbench` measurement of CPU events per second. Events per second is based on a CPU-bound loop that calculates all prime numbers in a specified range. We note that `runsc` does not impose a performance penalty, as the code is executing natively in both cases.

This has important consequences for classes of workloads that are often CPU-bound, such as data processing or machine learning. In these cases, `runsc` will similarly impose minimal runtime overhead.

{% include graph.html id="tensorflow" url="/performance/tensorflow.csv" title="perf.py tensorflow --runtime=runc --runtime=runsc" %}

For example, the above figure shows a sample TensorFlow workload, the [convolutional neural network example][cnn]. The time indicated includes the full start-up and run time for the workload, which trains a model.

## System calls

Some **structural costs** of gVisor are heavily influenced by the [platform choice](/docs/architecture_guide/platforms/), which implements system call interception. Today, gVisor supports a variety of platforms. These platforms present distinct performance, compatibility and security trade-offs. For example, the KVM platform has low overhead system call interception but runs poorly with nested virtualization.

{% include graph.html id="syscall" url="/performance/syscall.csv" title="perf.py syscall --runtime=runc --runtime=runsc-ptrace --runtime=runsc-kvm" y_min="100" log="true" %}

The above figure demonstrates the time required for a raw system call on various platforms. The test is implemented by a custom binary which performs a large number of system calls and calculates the average time required.
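As an illustration of this kind of measurement (not the actual benchmark binary, whose details may differ), timing a tight loop of cheap system calls is enough to expose the interception cost. Run the same program under `runc` and `runsc` to compare:

```go
// syscallbench.go: a minimal sketch of the raw system call measurement
// described above. It issues a large number of cheap system calls and
// reports the average latency per call.
package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	const iterations = 1_000_000
	start := time.Now()
	for i := 0; i < iterations; i++ {
		// syscall.Getpid issues a real getpid(2) system call on Linux, so
		// every iteration crosses the system call interception boundary.
		syscall.Getpid()
	}
	elapsed := time.Since(start)
	fmt.Printf("average time per system call: %v\n", elapsed/iterations)
}
```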
This cost will principally impact applications that are system call bound, which tend to be high-performance data stores and static network services. In general, the impact of system call interception will be lower the more work an application does.

{% include graph.html id="redis" url="/performance/redis.csv" title="perf.py redis --runtime=runc --runtime=runsc" %}

For example, `redis` is an application that performs relatively little work in userspace: in general it reads from a connected socket, reads or modifies some data, and writes a result back to the socket. The above figure shows the results of running a [comprehensive set of benchmarks][redis-benchmark]. We can see that small operations impose a large overhead, while larger operations, such as `LRANGE`, where more work is done in the application, have a smaller relative overhead.

Some of these costs are **structural costs**, and `redis` is likely to remain a challenging performance scenario. However, optimizing the [platform](/docs/architecture_guide/platforms/) will also have a dramatic impact.

## Start-up time

For many use cases, the ability to spin up containers quickly and efficiently is important. A sandbox may be short-lived and perform minimal user work (e.g. a function invocation).

{% include graph.html id="startup" url="/performance/startup.csv" title="perf.py startup --runtime=runc --runtime=runsc" %}

The above figure indicates the total time required to start a container through [Docker][docker]. This benchmark uses three different applications. First, an alpine Linux container that executes `true`. Second, a `node` application that loads a number of modules and binds an HTTP server. The time is measured by a successful request to the bound port. Finally, a `ruby` application that similarly loads a number of modules and binds an HTTP server.

> Note: most of the time overhead above is associated with Docker itself. This is evident with the empty `runc` benchmark. To avoid these costs with `runsc`, you may also consider using `runsc do` mode or invoking the [OCI runtime](../user_guide/quick_start/oci.md) directly.
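A rough way to reproduce the trivial alpine case is sketched below. This is a simplified illustration, not the benchmark harness: it assumes Docker is installed with gVisor registered as the `runsc` runtime, it measures the full `docker run` path (including the Docker overhead noted above), it omits the HTTP probe used for the `node` and `ruby` variants, and a real benchmark would average many runs.

```go
// startup.go: a rough sketch of the start-up measurement for a container
// that immediately executes `true`, compared across runtimes.
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

// timeRun measures a single cold "docker run" of a trivial container with
// the given runtime and returns the elapsed wall-clock time.
func timeRun(rt string) time.Duration {
	start := time.Now()
	cmd := exec.Command("docker", "run", "--rm", "--runtime="+rt, "alpine", "true")
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("docker run failed: %v: %s", err, out)
	}
	return time.Since(start)
}

func main() {
	for _, rt := range []string{"runc", "runsc"} {
		fmt.Printf("%s: %v\n", rt, timeRun(rt))
	}
}
```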
## Network

Networking is mostly bound by **implementation costs**, and gVisor's network stack is improving quickly.

While typically not an important metric in practice for common sandbox use cases, `iperf` is nevertheless a common microbenchmark used to measure raw throughput.

{% include graph.html id="iperf" url="/performance/iperf.csv" title="perf.py iperf --runtime=runc --runtime=runsc" %}

The above figure shows the result of an `iperf` test between two instances. For the upload case, the specified runtime is used for the `iperf` client, and in the download case, the specified runtime is the server. A native runtime is always used for the other endpoint in the test.

{% include graph.html id="applications" metric="requests_per_second" url="/performance/applications.csv" title="perf.py http.(node|ruby) --connections=25 --runtime=runc --runtime=runsc" %}

The above figure shows the result of simple `node` and `ruby` web services that render a template upon receiving a request. Because these synthetic benchmarks do minimal work per request, much like the `redis` case, they suffer from high overheads. In practice, the more work an application does, the smaller the impact of **structural costs** becomes.

## File system

Some aspects of file system performance are also reflective of **implementation costs**, and an area where gVisor's implementation is improving quickly.

In terms of raw disk I/O, gVisor does not introduce significant fundamental overhead. For general file operations, gVisor introduces a small fixed overhead for data that transitions across the sandbox boundary. This manifests as **structural costs** in some cases, since these operations must be routed through the [Gofer](../README.md#gofer) as a result of our [Security Model](/docs/architecture_guide/security/), but in most cases they are dominated by **implementation costs**, due to an internal [Virtual File System][vfs] (VFS) implementation that needs improvement.

{% include graph.html id="fio-bw" url="/performance/fio.csv" title="perf.py fio --engine=sync --runtime=runc --runtime=runsc" log="true" %}

The above figure demonstrates the results of `fio` for reads and writes to and from the disk. In this case, the disk quickly becomes the bottleneck and dominates other costs.

{% include graph.html id="fio-tmpfs-bw" url="/performance/fio-tmpfs.csv" title="perf.py fio --engine=sync --runtime=runc --tmpfs=True --runtime=runsc" log="true" %}

The above figure shows the raw I/O performance of using a `tmpfs` mount, which is sandbox-internal in the case of `runsc`. Generally these operations are similarly bound to the cost of copying data around in memory, and we don't see the cost of VFS operations.

{% include graph.html id="httpd100k" metric="transfer_rate" url="/performance/httpd100k.csv" title="perf.py http.httpd --connections=1 --connections=5 --connections=10 --connections=25 --runtime=runc --runtime=runsc" %}

The high costs of VFS operations can manifest in benchmarks that execute many such operations in the hot path for serving requests. The above figure shows the result of using gVisor to serve small pieces of static content, with predictably poor results. This workload represents `apache` serving a single 100k file from the container image to a client running [ApacheBench][ab] with varying levels of concurrency. The high overhead comes principally from the VFS implementation that needs improvement, with several internal serialization points (since all requests are reading the same file). Note that some of the network stack performance issues also impact this benchmark.
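To isolate this kind of hot path, the sketch below repeats the open/read/close pattern that serving the same static file implies. It is not the apache benchmark itself, and the 100k file path is an illustrative assumption (create the file beforehand); running the same program inside `runc` and `runsc` containers gives a rough sense of per-operation VFS overhead without the network stack in the picture.

```go
// vfsloop.go: a small sketch that repeatedly opens, reads and closes a
// single file and reports the average time per iteration, mimicking the
// per-request file access pattern in the static content benchmark above.
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

func main() {
	const path = "/tmp/file-100k" // illustrative: a pre-created 100 KiB file
	const iterations = 10000

	start := time.Now()
	for i := 0; i < iterations; i++ {
		f, err := os.Open(path)
		if err != nil {
			log.Fatal(err)
		}
		// Read the whole file, as the server does for each request.
		if _, err := io.Copy(io.Discard, f); err != nil {
			log.Fatal(err)
		}
		f.Close()
	}
	fmt.Printf("average per iteration: %v\n", time.Since(start)/iterations)
}
```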
{% include graph.html id="ffmpeg" url="/performance/ffmpeg.csv" title="perf.py media.ffmpeg --runtime=runc --runtime=runsc" %}

For benchmarks that are bound by raw disk I/O and a mix of compute, file system operations are less of an issue. The above figure shows the total time required for an `ffmpeg` container to start, load and transcode a 27MB input video.

[ab]: https://en.wikipedia.org/wiki/ApacheBench
[benchmark-tools]: https://github.com/google/gvisor/tree/master/test/benchmarks
[gce]: https://cloud.google.com/compute/
[cnn]: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py
[docker]: https://docker.io
[redis-benchmark]: https://redis.io/topics/benchmarks
[vfs]: https://en.wikipedia.org/wiki/Virtual_file_system