From 22f1890a9beab11d8cfdceba3a4d66f8bbbb468c Mon Sep 17 00:00:00 2001 From: Ian Lewis Date: Fri, 29 Mar 2019 22:40:11 -0400 Subject: Initial commit --- content/docs/LICENSE | 395 ++++++++++++ content/docs/_index.md | 29 + content/docs/architecture_guide/Layers.png | Bin 0 -> 11044 bytes content/docs/architecture_guide/Layers.svg | 1 + .../architecture_guide/Machine-Virtualization.png | Bin 0 -> 13205 bytes .../architecture_guide/Machine-Virtualization.svg | 1 + .../architecture_guide/Rule-Based-Execution.png | Bin 0 -> 6780 bytes .../architecture_guide/Rule-Based-Execution.svg | 1 + content/docs/architecture_guide/Sentry-Gofer.png | Bin 0 -> 9064 bytes content/docs/architecture_guide/Sentry-Gofer.svg | 1 + content/docs/architecture_guide/_index.md | 80 +++ content/docs/architecture_guide/overview.md | 88 +++ content/docs/architecture_guide/performance.md | 39 ++ content/docs/architecture_guide/security.md | 221 +++++++ content/docs/community/_index.md | 25 + content/docs/includes/index.md | 3 + content/docs/includes/install_gvisor.md | 30 + content/docs/user_guide/FAQ.md | 36 ++ content/docs/user_guide/_index.md | 8 + content/docs/user_guide/checkpoint_restore.md | 107 ++++ content/docs/user_guide/compatibility/_index.md | 43 ++ content/docs/user_guide/compatibility/amd64.md | 710 +++++++++++++++++++++ content/docs/user_guide/debugging.md | 33 + content/docs/user_guide/docker.md | 81 +++ content/docs/user_guide/filesystem.md | 1 + content/docs/user_guide/kubernetes.md | 16 + content/docs/user_guide/networking.md | 36 ++ content/docs/user_guide/oci.md | 53 ++ content/docs/user_guide/platforms.md | 108 ++++ 29 files changed, 2146 insertions(+) create mode 100644 content/docs/LICENSE create mode 100644 content/docs/_index.md create mode 100644 content/docs/architecture_guide/Layers.png create mode 100644 content/docs/architecture_guide/Layers.svg create mode 100644 content/docs/architecture_guide/Machine-Virtualization.png create mode 100644 
content/docs/architecture_guide/Machine-Virtualization.svg create mode 100644 content/docs/architecture_guide/Rule-Based-Execution.png create mode 100644 content/docs/architecture_guide/Rule-Based-Execution.svg create mode 100644 content/docs/architecture_guide/Sentry-Gofer.png create mode 100644 content/docs/architecture_guide/Sentry-Gofer.svg create mode 100644 content/docs/architecture_guide/_index.md create mode 100644 content/docs/architecture_guide/overview.md create mode 100644 content/docs/architecture_guide/performance.md create mode 100644 content/docs/architecture_guide/security.md create mode 100644 content/docs/community/_index.md create mode 100644 content/docs/includes/index.md create mode 100644 content/docs/includes/install_gvisor.md create mode 100644 content/docs/user_guide/FAQ.md create mode 100644 content/docs/user_guide/_index.md create mode 100644 content/docs/user_guide/checkpoint_restore.md create mode 100644 content/docs/user_guide/compatibility/_index.md create mode 100644 content/docs/user_guide/compatibility/amd64.md create mode 100644 content/docs/user_guide/debugging.md create mode 100644 content/docs/user_guide/docker.md create mode 100644 content/docs/user_guide/filesystem.md create mode 100644 content/docs/user_guide/kubernetes.md create mode 100644 content/docs/user_guide/networking.md create mode 100644 content/docs/user_guide/oci.md create mode 100644 content/docs/user_guide/platforms.md (limited to 'content/docs') diff --git a/content/docs/LICENSE b/content/docs/LICENSE new file mode 100644 index 000000000..b6988e7ed --- /dev/null +++ b/content/docs/LICENSE @@ -0,0 +1,395 @@ +Attribution 4.0 International + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. 
Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. + Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. 
Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More_considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution 4.0 International Public License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution 4.0 International Public License ("Public License"). To the +extent this Public License may be interpreted as a contract, You are +granted the Licensed Rights in consideration of Your acceptance of +these terms and conditions, and the Licensor grants You such rights in +consideration of benefits the Licensor receives from making the +Licensed Material available under these terms and conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. 
Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + d. Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + e. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + f. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + g. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + h. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + i. 
Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + j. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + k. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part; and + + b. produce, reproduce, and Share Adapted Material. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. 
To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + 4. 
If You Share Adapted Material You produce, the Adapter's + License You apply must not prevent recipients of the Adapted + Material from complying with this Public License. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material; and + + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. 
The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the "Licensor." The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. 
Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. diff --git a/content/docs/_index.md b/content/docs/_index.md new file mode 100644 index 000000000..238ece1e1 --- /dev/null +++ b/content/docs/_index.md @@ -0,0 +1,29 @@ ++++ +title = "gVisor Documentation" ++++ +gVisor is a user-space kernel, written in Go, that implements a substantial +portion of the [Linux system call interface][linux]. It provides an additional +layer of isolation between running applications and the host operating system. + +gVisor includes an [Open Container Initiative (OCI)][oci] runtime called `runsc` +that makes it easy to work with existing container tooling. The `runsc` runtime +integrates with Docker and Kubernetes, making it simple to run sandboxed +containers. + +gVisor takes a distinct approach to container sandboxing and makes a different +set of technical trade-offs compared to existing sandbox technologies, thus +providing new tools and ideas for the container security landscape. + +Check out the [gVisor Quick Start](./user_guide/docker/) to get started +using gVisor. + +## How this documentation is organized + +- The [User Guide](./user_guide/) contains info on how to use gVisor + and integrate it into your application or platform. 
+- The [Architecture Guide](./architecture_guide/) explains + gVisor's architecture and design philosophy. Start here if you would like to + know more about how gVisor works and why it was created. + +[linux]: https://en.wikipedia.org/wiki/Linux_kernel_interfaces +[oci]: https://www.opencontainers.org diff --git a/content/docs/architecture_guide/Layers.png b/content/docs/architecture_guide/Layers.png new file mode 100644 index 000000000..308c6c451 Binary files /dev/null and b/content/docs/architecture_guide/Layers.png differ diff --git a/content/docs/architecture_guide/Layers.svg b/content/docs/architecture_guide/Layers.svg new file mode 100644 index 000000000..0a366f841 --- /dev/null +++ b/content/docs/architecture_guide/Layers.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/content/docs/architecture_guide/Machine-Virtualization.png b/content/docs/architecture_guide/Machine-Virtualization.png new file mode 100644 index 000000000..1ba2ed6b2 Binary files /dev/null and b/content/docs/architecture_guide/Machine-Virtualization.png differ diff --git a/content/docs/architecture_guide/Machine-Virtualization.svg b/content/docs/architecture_guide/Machine-Virtualization.svg new file mode 100644 index 000000000..5352da07b --- /dev/null +++ b/content/docs/architecture_guide/Machine-Virtualization.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/content/docs/architecture_guide/Rule-Based-Execution.png b/content/docs/architecture_guide/Rule-Based-Execution.png new file mode 100644 index 000000000..b42654a90 Binary files /dev/null and b/content/docs/architecture_guide/Rule-Based-Execution.png differ diff --git a/content/docs/architecture_guide/Rule-Based-Execution.svg b/content/docs/architecture_guide/Rule-Based-Execution.svg new file mode 100644 index 000000000..bd6717043 --- /dev/null +++ b/content/docs/architecture_guide/Rule-Based-Execution.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git 
a/content/docs/architecture_guide/Sentry-Gofer.png b/content/docs/architecture_guide/Sentry-Gofer.png new file mode 100644 index 000000000..ca2c27ef7 Binary files /dev/null and b/content/docs/architecture_guide/Sentry-Gofer.png differ diff --git a/content/docs/architecture_guide/Sentry-Gofer.svg b/content/docs/architecture_guide/Sentry-Gofer.svg new file mode 100644 index 000000000..5c10750d2 --- /dev/null +++ b/content/docs/architecture_guide/Sentry-Gofer.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/content/docs/architecture_guide/_index.md b/content/docs/architecture_guide/_index.md new file mode 100644 index 000000000..c1c16a79c --- /dev/null +++ b/content/docs/architecture_guide/_index.md @@ -0,0 +1,80 @@ ++++ +title = "Architecture Guide" +weight = 20 ++++ +gVisor provides a fully virtualized environment in order to sandbox untrusted +containers. The system interfaces normally implemented by the host kernel are +moved into a distinct, per-sandbox user space kernel in order to minimize the +risk of an exploit. However, gVisor does not introduce large fixed overheads, +and still retains a process-like model with respect to resource utilization. + +## How is this different? + +Two other approaches are commonly taken to provide stronger isolation than +native containers. + +**Machine-level virtualization**, such as [KVM][kvm] and [Xen][xen], exposes +virtualized hardware to a guest kernel via a Virtual Machine Monitor (VMM). This +virtualized hardware is generally enlightened (paravirtualized) and additional +mechanisms can be used to improve the visibility between the guest and host +(e.g. balloon drivers, paravirtualized spinlocks). Running containers in +distinct virtual machines can provide great isolation, compatibility and +performance (though nested virtualization may bring challenges in this area), +but for containers it often requires additional proxies and agents, and may +require a larger resource footprint and slower start-up times.
+ +![Machine-level virtualization](Machine-Virtualization.png "Machine-level virtualization") + +**Rule-based execution**, such as [seccomp][seccomp], [SELinux][selinux] and +[AppArmor][apparmor], allows the specification of a fine-grained security policy +for an application or container. These schemes typically rely on hooks +implemented inside the host kernel to enforce the rules. If the surface can be +made small enough (i.e. a sufficiently complete policy is defined), then this is an +excellent way to sandbox applications and maintain native performance. However, +in practice it can be extremely difficult (if not impossible) to reliably define +a policy for arbitrary, previously unknown applications, making this approach +challenging to apply universally. + +![Rule-based execution](Rule-Based-Execution.png "Rule-based execution") + +Rule-based execution is often combined with additional layers for +defense-in-depth. + +**gVisor** provides a third isolation mechanism, distinct from those above. + +gVisor intercepts application system calls and acts as the guest kernel, without +the need for translation through virtualized hardware. gVisor may be thought of +as either a merged guest kernel and VMM, or as seccomp on steroids. This +architecture allows it to provide a flexible resource footprint (i.e. one based +on threads and memory mappings, not fixed guest physical resources) while also +lowering the fixed costs of virtualization. However, this comes at the price of +reduced application compatibility and higher per-system call overhead. + +![gVisor](Layers.png "gVisor") + +On top of this, gVisor employs rule-based execution to provide defense-in-depth +(details below). + +gVisor's approach is similar to [User Mode Linux (UML)][uml], although UML +virtualizes hardware internally and thus provides a fixed resource footprint. + +Each of the above approaches may excel in distinct scenarios.
For example, +machine-level virtualization will face challenges achieving high density, while +gVisor may provide poor performance for system call-heavy workloads. + +### Why Go? + +gVisor is written in [Go][golang] in order to avoid security pitfalls that can +plague kernels. With Go, there are strong types, built-in bounds checks, no +uninitialized variables, no use-after-free, no stack overflow, and a built-in +race detector. (The use of Go has its challenges too, and isn't free.) + +[apparmor]: https://wiki.ubuntu.com/AppArmor +[golang]: https://golang.org +[kvm]: https://www.linux-kvm.org +[oci]: https://www.opencontainers.org +[sandbox]: https://en.wikipedia.org/wiki/Sandbox_(computer_security) +[seccomp]: https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt +[selinux]: https://selinuxproject.org +[uml]: http://user-mode-linux.sourceforge.net/ +[xen]: https://www.xenproject.org diff --git a/content/docs/architecture_guide/overview.md b/content/docs/architecture_guide/overview.md new file mode 100644 index 000000000..dc963dc70 --- /dev/null +++ b/content/docs/architecture_guide/overview.md @@ -0,0 +1,88 @@ ++++ +title = "Overview & Platforms" +weight = 10 ++++ +A gVisor sandbox consists of multiple processes when running. These processes +collectively comprise a shared environment in which one or more containers can +be run. + +Each sandbox has its own isolated instance of: + +* The **Sentry**, a user-space kernel that runs the container and intercepts + and responds to system calls made by the application. + +Each container running in the sandbox has its own isolated instance of: + +* A **Gofer** which provides file system access to the container. + +![gVisor architecture diagram](../Sentry-Gofer.png "gVisor architecture diagram") + +## runsc + +The entrypoint to running a sandboxed container is the `runsc` executable. +`runsc` implements the [Open Container Initiative (OCI)][oci] runtime +specification.
This means that OCI-compatible _filesystem bundles_ can be run by +`runsc`. Filesystem bundles consist of a `config.json` file containing +container configuration, and a root filesystem for the container. Please see +the [OCI runtime spec][runtime-spec] for more information on filesystem bundles. +`runsc` implements multiple commands that perform various functions such as +starting, stopping, listing, and querying the status of containers. + +## The Sentry + +The Sentry is the largest component of gVisor. It can be thought of as a +userspace OS kernel. The Sentry implements all the kernel functionality needed +by the untrusted application. It implements all of the supported system calls, +signal delivery, memory management and page faulting logic, the threading +model, and more. + +When the untrusted application makes a system call, the currently used platform +redirects the call to the Sentry, which will do the necessary work to service the system +call. It is important to note that the Sentry will not simply pass through +system calls to the host kernel. As a userspace application, the Sentry will +make some host system calls to support its operation, but it will not allow the +application to directly control the system calls it makes. + +The Sentry aims to present an equivalent environment to (upstream) Linux v4.4. + +I/O operations that extend beyond the sandbox (not internal /proc files, pipes, +etc.) are sent to the Gofer, described below. + +## Platforms + +gVisor requires a platform to implement interruption of syscalls, basic context +switching, and memory mapping functionality. + +### ptrace + +The ptrace platform uses `PTRACE_SYSEMU` to execute user code without executing +host system calls. This platform can run anywhere that ptrace works (even VMs +without nested virtualization). + +### KVM (experimental) + +The KVM platform allows the Sentry to act as both guest OS and VMM, switching +back and forth between the two worlds seamlessly.
The KVM platform can run on +bare metal or on a VM with nested virtualization enabled. While there is no +virtualized hardware layer (the sandbox retains a process model), gVisor +leverages virtualization extensions available on modern processors in order to +improve isolation and performance of address space switches. + +## Gofer + +The Gofer is a normal host Linux process. The Gofer is started with each sandbox +and connected to the Sentry. The Sentry process is started in a restricted +seccomp container without access to file system resources. The Gofer provides +access to file system resources to the Sentry via the 9P protocol and provides +an additional level of isolation. + +## Application + +The application (also called the untrusted application) is a normal Linux binary +provided to gVisor in an OCI runtime bundle. gVisor aims to provide an +environment equivalent to Linux v4.4, so applications should be able to run +unmodified. However, gVisor does not presently implement every system call, +/proc file, or /sys file, so some incompatibilities may occur. + +[oci]: https://www.opencontainers.org +[runtime-spec]: https://github.com/opencontainers/runtime-spec diff --git a/content/docs/architecture_guide/performance.md b/content/docs/architecture_guide/performance.md new file mode 100644 index 000000000..f2e928b84 --- /dev/null +++ b/content/docs/architecture_guide/performance.md @@ -0,0 +1,39 @@ ++++ +title = "Performance" +weight = 30 ++++ +gVisor is designed to provide a secure, virtualized environment while preserving +key benefits of containerization such as small fixed overheads and a dynamic +resource footprint. For containerized infrastructure, this can provide an “easy +button” for sandboxing untrusted workloads: there are no changes to the +fundamental resource model. + +However, there are clear trade-offs in this approach. gVisor does not fully +implement the system call surface provided by an upstream Linux kernel.
We are +always working to improve this support, and current limitations are described +in [Compatibility](../../user_guide/compatibility). + +gVisor also imposes runtime costs over native containers. These costs come in +two forms (additional cycles and memory usage) and from two different +sources. First, the existence of the Sentry itself means that additional memory +will be required, and application system calls generally traverse additional +layers. We place an emphasis on [Security](../security/) and therefore chose to +use a language for the Sentry that provides many benefits in this domain, but +may not offer the raw performance of other choices. Costs imposed by this design +are structural costs. + +Second, as gVisor is a fresh implementation of the system call surface, many of +the subsystems or specific calls are not as optimized as more mature +implementations. A good example here is the network stack, which is continuing +to evolve but does not support all the advanced recovery mechanisms offered by +other stacks and is less CPU-efficient. This is an implementation cost and should +not be confused with structural costs. Improvements here are ongoing and largely +driven by the workloads that matter to gVisor contributors and users. + +## Structural Costs + +The structural costs of gVisor are heavily influenced by the platform choice, +which implements system call interception. Today, gVisor supports a variety of +platforms. These platforms present distinct performance, compatibility and +security trade-offs. For example, the KVM platform has low-overhead system call +interception but runs poorly with nested virtualization.
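Because the platform choice drives these structural costs, it can help to make that choice explicit for each host. The following is a minimal sketch, not gVisor's own selection logic. It assumes that `runsc` accepts a top-level `--platform` flag with the values `ptrace` and `kvm`. It uses a deliberately simple heuristic: use KVM only when `/dev/kvm` is a character device the current user can read and write, and otherwise fall back to ptrace, which runs anywhere ptrace works. The function name `choose_platform` is hypothetical. The check also cannot tell whether the host is itself a VM using nested virtualization, which is exactly the case where KVM is described above as running poorly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose a gVisor platform for this host.
# Heuristic (an assumption, not gVisor's own logic): use KVM only when the
# KVM device node exists and is readable/writable by the current user;
# otherwise fall back to ptrace. Run it as the same user that will run runsc.
# NOTE: this does not detect nested virtualization.
choose_platform() {
  local kvm_dev="${1:-/dev/kvm}"
  if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
    echo "kvm"
  else
    echo "ptrace"
  fi
}

platform="$(choose_platform)"
echo "selected platform: ${platform}"
# Example invocation (assumes runsc's top-level --platform flag;
# <container-id> is a placeholder):
#   sudo runsc --platform="${platform}" run <container-id>
```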
diff --git a/content/docs/architecture_guide/security.md b/content/docs/architecture_guide/security.md new file mode 100644 index 000000000..935301fc7 --- /dev/null +++ b/content/docs/architecture_guide/security.md @@ -0,0 +1,221 @@ ++++ +title = "Security Model" +weight = 20 ++++ +gVisor was created in order to provide additional defense against the +exploitation of kernel bugs when running untrusted code. In order to understand +how gVisor achieves this goal, it is first necessary to understand the basic +threat model. + +## Threats: the Anatomy of an Exploit + +An exploit takes advantage of a software or hardware bug in order to escalate +privileges, gain access to privileged data, or disrupt services. All of the +possible interactions that a malicious application can have with the rest of the +system (attack vectors) define the attack surface. We categorize these attack +vectors into several common classes. + +### System API + +An operating system or hypervisor exposes an abstract System API in the form of +system calls and traps. This API may be documented and stable, as with Linux, or +it may be hidden behind a library, as with Windows (e.g. win32.dll or +ntdll.dll). The System API includes all standard interfaces that application +code uses to interact with the system. This includes high-level abstractions +that are derived from low-level system calls, such as system files, sockets and +namespaces. + +Although the System API is exposed to applications by design, bugs and race +conditions within the kernel or hypervisor may occasionally be exploitable via +the API. A typical exploit might perform some combination of the following: + + 1. Opening or creating some combination of files, sockets or other descriptors. + 1. Passing crafted, malicious arguments, structures or packets. + 1. Racing with multiple threads in order to hit specific code paths.
+ +For example, for the “Dirty COW” privilege escalation bug (CVE-2016-5195), an +application would open a specific file in /proc or use a specific ptrace system +call, and use multiple threads in order to trigger a race condition when +touching a fresh page of memory. The attacker then gains control over a page of +memory belonging to the system. With additional privileges or access to +privileged data in the kernel, an attacker will often be able to employ +additional techniques to gain full access to the rest of the system. + +While bugs in the implementation of the System API are readily fixed, they are +also the most common form of exploit. The exposure created by this class of +exploit is what gVisor aims to minimize and control, described in detail below. + +### System ABI + +Hardware and software exploits occasionally exist in execution paths that are +not part of an intended System API. In this case, exploits may be found as part +of implicit actions the hardware or privileged system code takes in response to +certain events, such as traps or interrupts. For example, the recent “POPSS” +flaw (CVE-2018-8897) required only native code execution (no specific system +call or file access). In that case, the Xen hypervisor was similarly vulnerable, +highlighting that hypervisors are not immune to this vector. + +### Side Channels + +Hardware side channels may be exploitable by any code running on a system: +native, sandboxed, or virtualized. However, many host-level mitigations against +hardware side channels are still effective with a sandbox. For example, kernels +built with retpoline protect against some speculative execution attacks +(Spectre) and frame poisoning may protect against L1 terminal fault (L1TF) +attacks. Hypervisors may introduce additional complications in this regard, as +there is no mitigation against an application in a normally functioning Virtual +Machine (VM) exploiting the L1TF vulnerability for another VM on the sibling +hyperthread.
+ +### What’s missing? + +These categories in no way represent an exhaustive list of exploits, as we focus +only on running untrusted code from within the operating system or hypervisor. +We do not consider the many other ways that a more generic adversary may +interact with a system, such as inserting a portable storage device with a +malicious filesystem image, using a combination of crafted keyboard or touch +inputs, or saturating a network device with ill-formed ICMP packets. + +Furthermore, high-level systems may contain exploitable components. An attacker +need not escalate privileges within a container if there’s an exploitable +network-accessible service on the host or some other API path. A sandbox is not +a substitute for a secure architecture. + +## Goals: Limiting Exposure + +gVisor’s primary design goal is to minimize the System API attack vector while +still providing a process model. There are two primary security principles that +inform this design. First, the application’s direct interactions with the host +System API are intercepted by the Sentry, which implements the System API +instead. Second, the System API accessible to the Sentry itself is minimized to +a safer, restricted set. The first principle minimizes the possibility of direct +exploitation of the host System API by applications, and the second principle +minimizes indirect exploitability, which is the exploitation by an exploited or +buggy Sentry (e.g. chaining an exploit). + +The first principle is similar to the security basis for a Virtual Machine (VM). +With a VM, an application’s interactions with the host are replaced by +interactions with a guest operating system and a set of virtualized hardware +devices. These hardware devices are then implemented via the host System API by +a Virtual Machine Monitor (VMM). For both the Sentry and a VMM, it’s worth +noting that while direct interactions are minimized, indirect interactions are +still possible. 
For example, a read on a host-backed file in the Sentry will +ultimately result in a host read system call (made by the Sentry, not by passing +through arguments from the application), similar to how a read on a block +device in a VMM will often result in a host read system call from the backing +file. The same applies to a write on a socket, or a write on a tap device. + +The key difference here is that the Sentry implements a second System API +directly instead of relying on virtualized hardware and a guest operating +system. This selects a distinct set of trade-offs, largely in the performance +and compatibility domains. Since sandbox transitions of the nature described +above are generally expensive, a guest operating system will typically take +ownership of resources. For example, in the above case, the guest operating +system may read the block device data into a local page cache to avoid subsequent +reads. This may lead to better performance but lower efficiency, since memory +may be wasted or duplicated. The Sentry opts instead to defer to the host for +many operations during runtime, for improved efficiency but lower performance in +some use cases. + +gVisor relies on the host operating system and the platform for defense against +hardware-based attacks. Given the nature of these vulnerabilities, there is +little defense that gVisor can provide (there’s no guarantee that additional +hardware measures, such as virtualization, memory encryption, etc., would +actually decrease the attack surface). Note that this is true even when using +hardware virtualization for acceleration, as the host kernel or hypervisor is +ultimately responsible for defending against attacks from within malicious +guests. + +### What can a sandbox do? + +We allow a sandbox to do the following: + + 1. Communicate with a Gofer process via a connected socket. The sandbox may + receive new file descriptors from the Gofer process, corresponding to opened + files. + 1.
Make a minimal set of host system calls. The calls do not include the + creation of new sockets (unless host networking mode is enabled) or opening + files. The calls include duplication and closing of file descriptors, + synchronization, timers and signal management. + 1. Read and write packets to a virtual Ethernet device. This is not required if + host networking is enabled. + +## Principles: Defense-in-Depth + +For gVisor development, there are several engineering principles that are +employed in order to ensure that the system meets its design goals. + + 1. No system call is passed through directly to the host. Every supported call + has a distinct implementation in the Sentry that is unlikely to suffer from + identical vulnerabilities that may appear in the host. This has the + consequence that all kernel features used by applications require an + implementation within the Sentry. + 1. Only common, universal functionality is implemented. Some filesystems, + network devices or modules may expose specialized functionality to user + space applications via mechanisms such as extended attributes, raw sockets + or ioctls. Since the Sentry is responsible for implementing the full system + call surface, we do not implement or pass through these specialized APIs. + 1. The host surface exposed to the Sentry is minimized. While the system call + surface is not trivial, it is explicitly enumerated and controlled. The + Sentry is not permitted to open new files, create new sockets or do many + other interesting things on the host. + +Additionally, we have practical restrictions that are imposed on the project to +minimize the risk of Sentry exploitability. For example: + + 1. Unsafe code is carefully controlled. All unsafe code is isolated in files + that end with “_unsafe.go”, in order to facilitate validation and auditing. + No file without the unsafe suffix may import the unsafe package. + 1. No CGo is allowed. The Sentry must be a pure Go binary. + 1.
External imports are not generally allowed within the core packages. Only + limited external imports are used within the setup code. The code available + inside the Sentry is carefully controlled, to ensure that the above rules + are effective. + +Finally, we recognize that security is a process, and that vigilance is +critical. Beyond our security disclosure process, the Sentry is fuzzed +continuously to identify potential bugs and races proactively, and production +crashes are recorded and triaged to similarly identify material issues. + +## FAQ + +### Is this more or less secure than a Virtual Machine? + +The security of a VM depends to a large extent on what is exposed from the host +kernel and user space support code. For example, device emulation code in the +host kernel (e.g. APIC) or optimizations (e.g. vhost) can be more complex than a +simple system call, and exploits carry the same risks. Similarly, the user space +support code is frequently unsandboxed, and exploits, while rare, may allow +unfettered access to the system. + +Some platforms leverage the same virtualization hardware as VMs in order to +provide better system call interception performance. However, gVisor does not +implement any device emulation, and instead opts to use a sandboxed host System +API directly. Both approaches significantly reduce the original attack surface. +Ultimately, since gVisor uses the same hardware mechanism, one should not assume +that the mere use of virtualization hardware makes a system more or less secure, +just as it would be a mistake to make the claim that the use of an engine makes +a car safe. + +### Does this stop hardware side channels? + +In general, gVisor does not provide protection against hardware side channels, +although it may make exploits that rely on direct access to the host System API +more difficult to use. To minimize exposure, you should follow relevant guidance +from vendors and keep your host kernel and firmware up-to-date.
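Because gVisor relies on the host for these defenses, one practical way to follow that advice is to check what the host kernel itself reports. The following is a minimal sketch. It assumes a kernel recent enough to expose `/sys/devices/system/cpu/vulnerabilities/` (older kernels do not provide it). The helper name `report_mitigations` is hypothetical. It only reads and prints the kernel's own status strings, so it is a starting point for vendor guidance, not a substitute for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host kernel's view of CPU vulnerability
# mitigations (e.g. spectre_v2, l1tf). Read-only; it changes nothing.
report_mitigations() {
  local dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
  if [ ! -d "$dir" ]; then
    echo "no vulnerability reporting at ${dir}; the kernel may be too old" >&2
    return 1
  fi
  local f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip anything that is not a plain file
    printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
  done
}

# Run on the host (not inside the sandbox); tolerate kernels without the directory.
report_mitigations || true
```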
+ +### Is this just a ptrace sandbox? + +No: the term “ptrace sandbox” generally refers to software that uses ptrace in +order to inspect and authorize system calls made by applications, enforcing a +specific policy. These commonly suffer from two issues. First, vulnerable system +calls may be authorized by the sandbox, as the application still has direct +access to some System API. Second, it’s impossible to avoid time-of-check to +time-of-use race conditions without disabling multi-threading. + +In gVisor, the platforms that use ptrace operate differently. The stubs that are +traced are never allowed to continue execution into the host kernel and complete +a call directly. Instead, all system calls are interpreted and handled by the +Sentry itself, which reflects the resulting register state back into the tracee before +continuing execution in user space. This is very similar to the mechanism used +by User-Mode Linux (UML). diff --git a/content/docs/community/_index.md b/content/docs/community/_index.md new file mode 100644 index 000000000..44b2ac0dc --- /dev/null +++ b/content/docs/community/_index.md @@ -0,0 +1,25 @@ ++++ +title = "Community & Contributing" ++++ +The authoritative document for community resources and organization is the +[community repository][community], which contains the project's [governance +model][governance] and [code of conduct][codeofconduct]. Individual +repositories have their own guidelines and processes for contributing. See the +[canonical list of repositories][repositories] for more information. + +The project maintains two mailing lists: + + * [gvisor-users][gvisor-users] for announcements and general discussion. + * [gvisor-dev][gvisor-dev] for development and contribution. + +The community calendar shows upcoming public meetings and opportunities to +collaborate.
+ + + +[community]: https://gvisor.googlesource.com/community +[governance]: https://gvisor.googlesource.com/community/+/refs/heads/master/README.md +[gvisor-dev]: https://groups.google.com/forum/#!forum/gvisor-dev +[gvisor-users]: https://groups.google.com/forum/#!forum/gvisor-users +[codeofconduct]: https://gvisor.googlesource.com/community/+/refs/heads/master/CODE_OF_CONDUCT.md +[repositories]: https://gvisor.googlesource.com/ diff --git a/content/docs/includes/index.md b/content/docs/includes/index.md new file mode 100644 index 000000000..cbb7365a6 --- /dev/null +++ b/content/docs/includes/index.md @@ -0,0 +1,3 @@ ++++ +headless = true ++++ diff --git a/content/docs/includes/install_gvisor.md b/content/docs/includes/install_gvisor.md new file mode 100644 index 000000000..2db11f179 --- /dev/null +++ b/content/docs/includes/install_gvisor.md @@ -0,0 +1,30 @@ +> Note: gVisor requires x86\_64 Linux 3.17+. + +The easiest way to get `runsc` is from the [latest nightly +build][latest-nightly]. After you download the binary, check it against the +SHA512 [checksum file][latest-hash]. + +Older builds can also be found here: +`https://storage.googleapis.com/gvisor/releases/nightly/${yyyy-mm-dd}/runsc` + +With corresponding SHA512 checksums here: +`https://storage.googleapis.com/gvisor/releases/nightly/${yyyy-mm-dd}/runsc.sha512` + +**It is important to copy this binary to some place that is accessible to all +users, and make it executable to all users**, since `runsc` executes itself as +user `nobody` to avoid unnecessary privileges. The `/usr/local/bin` directory is +a good place to put the `runsc` binary.
+ +```bash +( + # Abort on the first failure, e.g. a checksum mismatch. + set -e + wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc + wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc.sha512 + sha512sum -c runsc.sha512 + chmod a+x runsc + sudo mv runsc /usr/local/bin +) +``` + +[latest-nightly]: https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc +[latest-hash]: https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc.sha512 +[oci]: https://www.opencontainers.org diff --git a/content/docs/user_guide/FAQ.md b/content/docs/user_guide/FAQ.md new file mode 100644 index 000000000..b94b93b1f --- /dev/null +++ b/content/docs/user_guide/FAQ.md @@ -0,0 +1,36 @@ ++++ +title = "FAQ" +weight = 1000 ++++ +## My container runs fine with `runc` but fails with `runsc` + +If you’re having problems running a container with `runsc`, it’s most likely due +to a compatibility issue or a missing feature in gVisor. See +[Debugging](../debugging/). + +## When I run my container, docker fails with: `flag provided but not defined: -console` + +You're using an old version of Docker. See [Docker Quick Start](../docker/). + +## I can’t see a file copied with: `docker cp` + +For performance reasons, gVisor caches directory contents, and therefore it may +not realize a new file was copied to a given directory. To invalidate the cache +and force a refresh, create a file under the directory in question and list the +contents again. + +This bug is tracked in [bug #4](https://github.com/google/gvisor/issues/4). + +Note that `kubectl cp` works because it does the copy by exec'ing inside the +sandbox, and thus the gVisor cache is aware of the new files and directories. + +There are also different filesystem modes that can be used to avoid this issue. +See [Filesystem](../filesystem/). + +## What's the security model? + +See the [Security Model](../../architecture_guide/security/). + +## What's the expected performance? + +See the [Performance Guide](../../architecture_guide/performance/).
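The workaround from the `docker cp` answer in this FAQ (create a file under the affected directory, then list it again) can be scripted from the host. The following is a minimal sketch. The function name `refresh_dir` and the marker file name `.gvisor-refresh` are hypothetical, and it assumes the container image provides `sh`, `touch` and `ls`. The directory is passed to `sh -c` as a positional argument rather than spliced into the command string, so paths containing spaces or quotes are handled safely.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the docker cp caching issue: from inside the
# sandbox, create a file in the target directory, then list the directory
# so the newly copied contents are picked up.
refresh_dir() {
  local container="${1:-}" dir="${2:-}"
  if [ -z "$container" ] || [ -z "$dir" ]; then
    echo "usage: refresh_dir <container> <directory>" >&2
    return 2
  fi
  # Inside the sh -c script, "$1" is the directory passed after the "sh" argv[0].
  docker exec "$container" sh -c 'touch "$1/.gvisor-refresh" && ls "$1"' sh "$dir"
}

# Example (hypothetical names):
#   docker cp ./app.conf mycontainer:/etc/app/
#   refresh_dir mycontainer /etc/app
```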
diff --git a/content/docs/user_guide/_index.md b/content/docs/user_guide/_index.md new file mode 100644 index 000000000..82b4cf845 --- /dev/null +++ b/content/docs/user_guide/_index.md @@ -0,0 +1,8 @@ ++++ +title = "User Guide" +weight = 10 ++++ + +Using gVisor for the first time? To get started, use the [Docker Quick +Start](./docker/), the [OCI Quick Start](./oci/), or select a specific topic via +the menu. diff --git a/content/docs/user_guide/checkpoint_restore.md b/content/docs/user_guide/checkpoint_restore.md new file mode 100644 index 000000000..fdecb2d08 --- /dev/null +++ b/content/docs/user_guide/checkpoint_restore.md @@ -0,0 +1,107 @@ ++++ +title = "Checkpoint/Restore" +weight = 90 ++++ +gVisor has the ability to checkpoint a process, save its current state in a +state file, and restore into a new container using the state file. + +## How to use checkpoint/restore + +Checkpoint/restore functionality is currently available via raw `runsc` +commands. To use the checkpoint command, first run a container. + +```bash +runsc run +``` + +To checkpoint the container, the `--image-path` flag must be provided. This is +the directory path within which the checkpoint state-file will be created. The +file will be called `checkpoint.img`, and necessary directories will be created +if they do not yet exist. + +> Note: Two checkpoints cannot be saved to the same directory; every image-path +provided must be unique. + +```bash +runsc checkpoint --image-path= +``` + +There is also an optional `--leave-running` flag that allows the container to +continue to run after the checkpoint has been made. (By default, containers stop +their processes after committing a checkpoint.) + +> Note: All top-level runsc flags needed when calling run must be provided to +checkpoint if --leave-running is used. + +> Note: --leave-running works by causing an immediate restore, so the +container, although it will maintain its given container ID, may have a different +process ID.
+ +```bash +runsc checkpoint --image-path= --leave-running +``` + +To restore, provide the image path to the `checkpoint.img` file created during +the checkpoint. Because containers stop by default after checkpointing, restore +needs to happen in a new container (restore is a command that parallels start). + +```bash +runsc create + +runsc restore --image-path= +``` + +## How to use checkpoint/restore in Docker + +Currently checkpoint/restore through `runsc` is not entirely compatible with +Docker, although there has been progress made from both gVisor and Docker to +enable compatibility. Here, we document the ideal workflow. + +Run a container: + +```bash +docker run [options] --runtime=runsc +``` + +Checkpoint a container: + +```bash +docker checkpoint create +``` + +Create a new container into which to restore: + +```bash +docker create [options] --runtime=runsc +``` + +Restore a container: + +```bash +docker start --checkpoint --checkpoint-dir= +``` + +### Issues Preventing Compatibility with Docker + +#### [Moby #37360][leave-running] + +Docker 18.03.0-ce and earlier versions hang when checkpointing and do not +create the checkpoint. To successfully use this feature, install a custom +version of docker-ce from the moby repository. This issue is caused by an +improper implementation of the `--leave-running` flag. This issue is fixed in +newer releases. + +#### Docker does not support restoration into new containers + +Docker currently expects the container which created the checkpoint to be the +same container used to restore, which is not possible in runsc. When Docker +supports container migration and therefore restoration into new containers, this +will be the flow. + +#### [Moby #37344][checkpoint-dir] + +Docker does not currently support the `--checkpoint-dir` flag, but this will be +required when restoring from a checkpoint made in another container.
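The raw `runsc` flow described earlier can be wrapped so that each checkpoint automatically gets a fresh `--image-path`, which the checkpoint note requires to be unique. This is a minimal sketch under several assumptions:

* Flags come before the container ID (`runsc checkpoint --image-path=PATH ID`, `runsc create ID`, `runsc restore --image-path=PATH ID`).
* `runsc create` is run from the new container's bundle directory.
* GNU `date` is available for the nanosecond `%N` field.

The function names and the timestamp naming scheme are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around the raw runsc checkpoint/restore commands.
# Assumption: flags precede the container ID, as in other runsc commands.

# Checkpoint a container into a new, never-reused directory; print the path.
checkpoint_unique() {
  local id="$1" root="$2"
  # Nanosecond timestamp (GNU date) keeps each image path distinct.
  local path="${root%/}/${id}-$(date +%Y%m%dT%H%M%S%N)"
  if [ -e "$path" ]; then
    echo "image path already exists: ${path}" >&2
    return 1
  fi
  runsc checkpoint --image-path="$path" "$id" || return
  echo "$path"
}

# Restore into a NEW container; run this from the new container's bundle directory.
restore_new() {
  local new_id="$1" path="$2"
  runsc create "$new_id" || return
  runsc restore --image-path="$path" "$new_id"
}
```

For example, with hypothetical container IDs, run `img="$(checkpoint_unique web1 /var/lib/runsc-checkpoints)"` and then `restore_new web2 "$img"` from the bundle directory of `web2`. Generating the path instead of reusing one prevents a second checkpoint from colliding with the first.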
+ +[leave-running]: https://github.com/moby/moby/pull/37360 +[checkpoint-dir]: https://github.com/moby/moby/issues/37344 diff --git a/content/docs/user_guide/compatibility/_index.md b/content/docs/user_guide/compatibility/_index.md new file mode 100644 index 000000000..cbde08fdb --- /dev/null +++ b/content/docs/user_guide/compatibility/_index.md @@ -0,0 +1,43 @@ ++++ +title = "Compatibility" +weight = 100 ++++ +gVisor implements a large portion of the Linux surface, and while we strive to +make it broadly compatible, there are (and always will be) unimplemented +features and bugs. The only real way to know if it will work is to try. If you +find a container that doesn’t work and there is no known issue, please [file a +bug][bug] indicating the full command you used to run the image. + +If you're able to provide the [debug logs](../debugging/), the +problem is likely to be fixed much faster. + +## What works? + +The following applications/images have been tested: + +* elasticsearch +* golang +* httpd +* java8 +* jenkins +* mariadb +* memcached +* mongo +* mysql +* nginx +* node +* php +* postgres +* prometheus +* python +* redis +* registry +* tomcat +* wordpress + +[bug]: https://github.com/google/gvisor/issues + +## Syscall Reference + +This section contains an architecture-specific syscall reference guide. These +tables are automatically generated from source-level annotations. diff --git a/content/docs/user_guide/compatibility/amd64.md b/content/docs/user_guide/compatibility/amd64.md new file mode 100644 index 000000000..8a0bf2385 --- /dev/null +++ b/content/docs/user_guide/compatibility/amd64.md @@ -0,0 +1,710 @@ ++++ +title = "AMD64" +weight = 10 ++++ +This table is a reference of Linux syscalls for AMD64 and their compatibility +status in gVisor. gVisor does not support all syscalls, and some syscalls may +have a partial implementation. + +Of 329 syscalls, 47 syscalls have a full or partial implementation. There are +currently 51 unimplemented syscalls.
231 syscalls are not yet documented. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
#NameSupportGitHub IssueNotes
68msggetUnimplementedReturns ENOSYS
69msgsndUnimplementedReturns ENOSYS
70msgrcvUnimplementedReturns ENOSYS
71msgctlUnimplementedReturns ENOSYS
122setfsuidUnimplementedReturns ENOSYS
123setfsgidUnimplementedReturns ENOSYS
134uselibUnimplementedReturns ENOSYS; Obsolete
135setpersonalityPartialReturns EINVAL; Unable to change personality
136ustatUnimplementedReturns ENOSYS; Needs filesystem support
139sysfsUnimplementedReturns ENOSYS
142schedsetparamPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_nice; ENOSYS otherwise
148schedrrgetintervalPartialReturns EPERM
153vhangupPartialReturns EPERM
154modifyldtPartialReturns EPERM
155pivotrootPartialReturns EPERM
156sysctlPartialReturns EPERM
159adjtimexPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_time; ENOSYS otherwise
163acctPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_pacct; ENOSYS otherwise
164settimeofdayPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_time; ENOSYS otherwise
167swaponPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise
168swapoffPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise
169rebootPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_boot; ENOSYS otherwise
172ioplPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_rawio; ENOSYS otherwise
173iopermPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_rawio; ENOSYS otherwise
174createmodulePartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_module; ENOSYS otherwise
175initmodulePartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_module; ENOSYS otherwise
176deletemodulePartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_module; ENOSYS otherwise
177getkernelsymsUnimplementedReturns ENOSYS; Not supported in > 2.6
178querymoduleUnimplementedReturns ENOSYS; Not supported in > 2.6
179quotactlPartialReturns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise
180nfsservctlUnimplementedReturns ENOSYS; Does not exist > 3.1
| 181 | getpmsg | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 182 | putpmsg | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 183 | afs_syscall | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 184 | tuxcall | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 185 | security | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 187 | readahead | Unimplemented | Returns ENOSYS |
| 188 | setxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 189 | lsetxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 190 | fsetxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 191 | getxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 192 | lgetxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 193 | fgetxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 194 | listxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 195 | llistxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 196 | flistxattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 197 | removexattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 198 | lremovexattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 199 | fremovexattr | Partial | Returns ENOTSUP; Requires filesystem support |
| 205 | set_thread_area | Unimplemented | Returns ENOSYS; Expected to return ENOSYS on 64-bit |
| 211 | get_thread_area | Unimplemented | Returns ENOSYS; Expected to return ENOSYS on 64-bit |
| 212 | lookup_dcookie | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise |
| 214 | epoll_ctl_old | Unimplemented | Returns ENOSYS; Deprecated |
| 215 | epoll_wait_old | Unimplemented | Returns ENOSYS; Deprecated |
| 216 | remap_file_pages | Unimplemented | Returns ENOSYS; Deprecated |
| 220 | semtimedop | Unimplemented | Returns ENOSYS |
| 236 | vserver | Unimplemented | Returns ENOSYS; Not implemented in Linux |
| 237 | mbind | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_nice; ENOSYS otherwise |
| 240 | mq_open | Unimplemented | Returns ENOSYS |
| 241 | mq_unlink | Unimplemented | Returns ENOSYS |
| 242 | mq_timedsend | Unimplemented | Returns ENOSYS |
| 243 | mq_timedreceive | Unimplemented | Returns ENOSYS |
| 244 | mq_notify | Unimplemented | Returns ENOSYS |
| 245 | mq_getsetattr | Unimplemented | Returns ENOSYS |
| 248 | add_key | Partial | Returns EACCES; Not available to user |
| 249 | request_key | Partial | Returns EACCES; Not available to user |
| 250 | keyctl | Partial | Returns EACCES; Not available to user |
| 251 | ioprio_set | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise |
| 252 | ioprio_get | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise |
| 256 | migrate_pages | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_nice; ENOSYS otherwise |
| 273 | set_robust_list | Unimplemented | Returns ENOSYS; Obsolete |
| 274 | get_robust_list | Unimplemented | Returns ENOSYS; Obsolete |
| 275 | splice | Unimplemented | Returns ENOSYS |
| 276 | tee | Unimplemented | Returns ENOSYS |
| 278 | vmsplice | Unimplemented | Returns ENOSYS |
| 279 | move_pages | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_nice; ENOSYS otherwise |
| 282 | signalfd | Unimplemented | Returns ENOSYS |
| 289 | signalfd4 | Unimplemented | Returns ENOSYS |
| 298 | perf_event_open | Partial | Returns ENODEV; No support for perf counters |
| 300 | fanotify_init | Unimplemented | Returns ENOSYS; Needs CONFIG_FANOTIFY |
| 301 | fanotify_mark | Unimplemented | Returns ENOSYS; Needs CONFIG_FANOTIFY |
| 303 | name_to_handle_at | Partial | Returns EOPNOTSUPP; Needs filesystem support |
| 304 | open_by_handle_at | Partial | Returns EOPNOTSUPP; Needs filesystem support |
| 305 | clock_adjtime | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_time; ENOSYS otherwise |
| 308 | setns | Unimplemented | Returns ENOSYS |
| 310 | process_vm_readv | Unimplemented | Returns ENOSYS |
| 311 | process_vm_writev | Unimplemented | Returns ENOSYS |
| 312 | kcmp | Partial | Returns EPERM or ENOSYS; Requires cap_sys_ptrace |
| 313 | finit_module | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_module; ENOSYS otherwise |
| 314 | sched_setattr | Unimplemented | Returns ENOSYS |
| 315 | sched_getattr | Unimplemented | Returns ENOSYS |
| 316 | renameat2 | Unimplemented | Returns ENOSYS |
| 319 | memfd_create | Unimplemented | Returns ENOSYS |
| 321 | bpf | Partial | Returns EPERM or ENOSYS; Returns EPERM if the process does not have cap_sys_admin; ENOSYS otherwise |
| 322 | execveat | Unimplemented | Returns ENOSYS |
| 323 | userfaultfd | Unimplemented | Returns ENOSYS |
| 324 | membarrier | Unimplemented | Returns ENOSYS |
| 326 | copy_file_range | Unimplemented | Returns ENOSYS |
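To check how a particular call from the table above behaves in your own sandbox, you can invoke it directly and look at `errno`. The following is a minimal sketch, not part of gVisor: it assumes an x86-64 Linux host or sandbox, glibc's `syscall(2)` wrapper reached through Python's `ctypes`, and the x86-64 syscall numbers listed in the table. The helper name `probe` is hypothetical.

```python
import ctypes
import errno

# Sketch: call raw system call numbers through libc's syscall(2) wrapper and
# report how the kernel (or the gVisor Sentry) answers. The numbers are the
# x86-64 values from the compatibility table; on other architectures the
# same numbers refer to different system calls.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long

MEMFD_CREATE = 319
TEE = 276


def probe(nr, *args):
    """Invoke system call `nr` and return "OK" or the symbolic errno name."""
    ctypes.set_errno(0)
    ret = _libc.syscall(ctypes.c_long(nr), *(ctypes.c_long(a) for a in args))
    if ret == -1:
        err = ctypes.get_errno()
        return errno.errorcode.get(err, str(err))
    return "OK"


if __name__ == "__main__":
    # The arguments are deliberately invalid so that neither call creates
    # anything: on Linux, a NULL name makes memfd_create fail with EFAULT and
    # fds of -1 make tee fail with EBADF. Calls the table marks as
    # "Unimplemented" are expected to report ENOSYS instead under runsc.
    print("memfd_create:", probe(MEMFD_CREATE, 0, 0))
    print("tee:", probe(TEE, -1, -1, 1, 0))
```

Comparing the output of the same script under `runc` and under `runsc` is a quick way to confirm whether a failure comes from gVisor's syscall surface or from the application itself.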
diff --git a/content/docs/user_guide/debugging.md b/content/docs/user_guide/debugging.md
new file mode 100644
index 000000000..561aeb8d7
--- /dev/null
+++ b/content/docs/user_guide/debugging.md
@@ -0,0 +1,33 @@
++++
+title = "Debugging"
+weight = 120
++++
+
+To enable debug and system call logging, add the `runtimeArgs` below to your
+[Docker](../docker/) configuration (`/etc/docker/daemon.json`):
+
+```json
+{
+  "runtimes": {
+    "runsc": {
+      "path": "/usr/local/bin/runsc",
+      "runtimeArgs": [
+        "--debug-log=/tmp/runsc/",
+        "--debug",
+        "--strace"
+      ]
+    }
+  }
+}
+```
+
+You may also want to pass `--log-packets` to troubleshoot network problems. Then
+restart the Docker daemon:
+
+```bash
+sudo systemctl restart docker
+```
+
+Run your container again, and inspect the files under `/tmp/runsc`. The log file
+with `boot` in its name contains the strace logs from your application, which
+can help identify missing or broken system calls in gVisor.
diff --git a/content/docs/user_guide/docker.md b/content/docs/user_guide/docker.md
new file mode 100644
index 000000000..3123785a7
--- /dev/null
+++ b/content/docs/user_guide/docker.md
@@ -0,0 +1,81 @@
++++
+title = "Docker Quick Start"
+weight = 10
++++
+This guide will help you quickly get started running Docker containers using
+gVisor with the default platform.
+
+## Install gVisor
+
+{{% readfile file="docs/includes/install_gvisor.md" markdown="true" %}}
+
+## Configuring Docker
+
+> Note: This guide requires Docker. Refer to the [Docker documentation][docker] for
+> how to install it.
+
+First, you will need to configure Docker to use `runsc` by adding a runtime
+entry to your Docker configuration (`/etc/docker/daemon.json`). You may have to
+create this file if it does not exist. Some Docker versions also require you
+to [specify the `storage-driver` field][storage-driver].
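If you prefer to script this change rather than edit the file by hand, the sketch below merges a runtime entry into an existing `daemon.json` while preserving other settings such as `storage-driver`. It relies only on the `runtimes`/`path`/`runtimeArgs` layout used in this guide; the function `add_runtime` and its defaults are hypothetical helpers, not part of Docker or gVisor.

```python
import json
import os


def add_runtime(config_path, name="runsc", path="/usr/local/bin/runsc",
                runtime_args=None):
    """Add or replace one entry under "runtimes" in a Docker daemon.json.

    Existing top-level settings (for example "storage-driver") and other
    runtimes are preserved. Returns the resulting configuration dict.
    """
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            text = f.read()
        if text.strip():  # tolerate an existing but empty file
            config = json.loads(text)

    entry = {"path": path}
    if runtime_args:
        entry["runtimeArgs"] = list(runtime_args)
    config.setdefault("runtimes", {})[name] = entry

    # Write to a temporary file first, then atomically replace the original
    # so that an interrupted run never leaves a half-written daemon.json.
    tmp_path = config_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    os.replace(tmp_path, config_path)
    return config
```

For example, calling `add_runtime("/etc/docker/daemon.json")` as root adds the plain `runsc` entry, and passing `name="runsc-kvm", runtime_args=["--platform=kvm"]` adds a second runtime alongside it. The Docker daemon still has to be restarted afterwards.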
+
+In the end, the file should look something like:
+
+```json
+{
+  "runtimes": {
+    "runsc": {
+      "path": "/usr/local/bin/runsc"
+    }
+  }
+}
+```
+
+You must restart the Docker daemon after making changes to this file; this is
+typically done via `systemd`:
+
+```bash
+sudo systemctl restart docker
+```
+
+## Running a container
+
+Now run your container using the `runsc` runtime:
+
+```bash
+docker run --runtime=runsc hello-world
+```
+
+You can also open an interactive terminal to explore the container:
+
+```bash
+docker run --runtime=runsc -it ubuntu /bin/bash
+```
+
+## Verify the runtime
+
+You can verify that you are running in gVisor using the `dmesg` command.
+
+```text
+$ docker run --runtime=runsc -it ubuntu dmesg
+[    0.000000] Starting gVisor...
+[    0.354495] Daemonizing children...
+[    0.564053] Constructing home...
+[    0.976710] Preparing for the zombie uprising...
+[    1.299083] Creating process schedule...
+[    1.479987] Committing treasure map to memory...
+[    1.704109] Searching for socket adapter...
+[    1.748935] Generating random numbers by fair dice roll...
+[    2.059747] Digging up root...
+[    2.259327] Checking naughty and nice process list...
+[    2.610538] Rewriting operating system in Javascript...
+[    2.613217] Ready!
+```
+
+Note that this output is easily replicated by an attacker, so applications should
+never use `dmesg` to verify the runtime in a security-sensitive context.
+
+Next, try running gVisor using the [KVM platform](../platforms/).
+
+[docker]: https://docs.docker.com/install/
+[storage-driver]: https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-storage-driver
diff --git a/content/docs/user_guide/filesystem.md b/content/docs/user_guide/filesystem.md
new file mode 100644
index 000000000..314040804
--- /dev/null
+++ b/content/docs/user_guide/filesystem.md
@@ -0,0 +1 @@
+TODO: Add information about shared/exclusive modes?
diff --git a/content/docs/user_guide/kubernetes.md b/content/docs/user_guide/kubernetes.md
new file mode 100644
index 000000000..a1150622f
--- /dev/null
+++ b/content/docs/user_guide/kubernetes.md
@@ -0,0 +1,16 @@
++++
+title = "Kubernetes"
+weight = 30
++++
+gVisor can run sandboxed containers in a Kubernetes cluster with Minikube. After
+the gVisor addon is enabled, pods with `io.kubernetes.cri.untrusted-workload`
+set to true will execute with `runsc`. Follow [these instructions][minikube] to
+enable the gVisor addon.
+
+You can also set up Kubernetes nodes to run pods in gVisor using the `containerd`
+CRI runtime and the `gvisor-containerd-shim`. Pods with the
+`io.kubernetes.cri.untrusted-workload` annotation set to true will execute with
+`runsc`. You can find instructions [here][gvisor-containerd-shim].
+
+[minikube]: https://github.com/kubernetes/minikube/blob/master/deploy/addons/gvisor/README.md
+[gvisor-containerd-shim]: https://github.com/google/gvisor-containerd-shim
diff --git a/content/docs/user_guide/networking.md b/content/docs/user_guide/networking.md
new file mode 100644
index 000000000..09d4b9789
--- /dev/null
+++ b/content/docs/user_guide/networking.md
@@ -0,0 +1,36 @@
++++
+title = "Networking"
+weight = 50
++++
+gVisor implements its own network stack called [netstack][netstack]. All aspects
+of the network stack are handled inside the Sentry (including TCP connection
+state, control messages, and packet assembly), keeping it isolated from the host
+network stack. Data link layer packets are written directly to the virtual
+device inside the network namespace set up by Docker or Kubernetes.
+
+A network passthrough mode is also supported, but comes at the cost of reduced
+isolation.
+
+## Enabling network passthrough
+
+For high-performance networking applications, you may choose to disable the user
+space network stack and instead use the host network stack. Note that this mode
+reduces isolation from the host.
+
+Add the following `runtimeArgs` to your Docker configuration
+(`/etc/docker/daemon.json`) and restart the Docker daemon:
+
+```json
+{
+  "runtimes": {
+    "runsc": {
+      "path": "/usr/local/bin/runsc",
+      "runtimeArgs": [
+        "--network=host"
+      ]
+    }
+  }
+}
+```
+
+[netstack]: https://github.com/google/netstack
diff --git a/content/docs/user_guide/oci.md b/content/docs/user_guide/oci.md
new file mode 100644
index 000000000..49b94bdd3
--- /dev/null
+++ b/content/docs/user_guide/oci.md
@@ -0,0 +1,53 @@
++++
+title = "OCI Quick Start"
+weight = 20
++++
+This guide will quickly get you started running your first gVisor sandbox
+container using the runtime directly with the default platform.
+
+## Install gVisor
+
+{{% readfile file="docs/includes/install_gvisor.md" markdown="true" %}}
+
+## Run an OCI compatible container
+
+Now we will create an [OCI][oci] container bundle to run our container. First,
+we will create a root directory for our bundle.
+
+```bash
+{
+  mkdir bundle
+  cd bundle
+}
+```
+
+Create a root file system for the container. We will use the Docker hello-world
+image as the basis for our container.
+
+```bash
+{
+  mkdir rootfs
+  docker export $(docker create hello-world) | tar -xf - -C rootfs
+}
+```
+
+Next, create a specification file called `config.json` that contains our
+container specification. We will update the default command it runs to `/hello`
+in the `hello-world` container.
+
+```bash
+{
+  runsc spec
+  sed -i 's;"sh";"/hello";' config.json
+}
+```
+
+Finally, run the container.
+
+```bash
+sudo runsc run hello
+```
+
+Next, try [running gVisor using Docker](../docker/).
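The `sed` invocation in the OCI steps above rewrites any occurrence of the string `"sh"` in `config.json`. As a sketch of a more targeted alternative, the helper below parses the file and changes only the `process.args` field defined by the OCI runtime specification. It assumes `config.json` was generated by `runsc spec` and contains a `process` object; `set_process_args` is a hypothetical name, not part of `runsc`.

```python
import json


def set_process_args(spec_path, args):
    """Replace process.args in an OCI runtime spec file (config.json).

    Unlike a textual substitution, this changes only the command line and
    leaves every other field of the spec untouched.
    """
    with open(spec_path) as f:
        spec = json.load(f)
    spec["process"]["args"] = list(args)
    with open(spec_path, "w") as f:
        json.dump(spec, f, indent=4)
        f.write("\n")
    return spec
```

Run from inside the bundle directory, `set_process_args("config.json", ["/hello"])` has the same intended effect as the `sed` command.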
+
+[oci]: https://opencontainers.org/
diff --git a/content/docs/user_guide/platforms.md b/content/docs/user_guide/platforms.md
new file mode 100644
index 000000000..755dadf75
--- /dev/null
+++ b/content/docs/user_guide/platforms.md
@@ -0,0 +1,108 @@
++++
+title = "Platforms (KVM)"
+weight = 30
++++
+
+This document will help you set up your system to use a different gVisor
+platform.
+
+## What is a Platform?
+
+gVisor requires a *platform* to implement basic context switching and memory
+mapping functionality. These are described in more depth in the [Architecture
+Guide](../../architecture_guide/).
+
+## Selecting a Platform
+
+The platform is selected by a `--platform` command line flag passed to `runsc`.
+To select a different platform, modify your Docker configuration
+(`/etc/docker/daemon.json`) to pass this argument:
+
+```json
+{
+  "runtimes": {
+    "runsc": {
+      "path": "/usr/local/bin/runsc",
+      "runtimeArgs": [
+        "--platform=kvm"
+      ]
+    }
+  }
+}
+```
+
+Then restart the Docker daemon.
+
+## Example: Using the KVM Platform
+
+The KVM platform is currently experimental; however, it provides several
+benefits over the default ptrace platform.
+
+### Prerequisites
+
+You will also need to have KVM installed on your system. If you are running a
+Debian-based system such as Debian or Ubuntu, you can usually do this by
+installing the `qemu-kvm` package.
+
+```bash
+sudo apt-get install qemu-kvm
+```
+
+If you are using a virtual machine, you will need to make sure that nested
+virtualization is configured. Here are links to documents on how to set up
+nested virtualization in several popular environments:
+
+ * Google Cloud: [Enabling Nested Virtualization for VM Instances][nested-gcp]
+ * Microsoft Azure: [How to enable nested virtualization in an Azure VM][nested-azure]
+ * VirtualBox: [Nested Virtualization][nested-virtualbox]
+ * KVM: [Nested Guests][nested-kvm]
+
+### Configuring Docker
+
+As described above, you will need to configure Docker to use `runsc` with the KVM
+platform. In the Docker Quick Start, you configured Docker to use `runsc` as
+the runtime. Docker allows you to add multiple runtimes to its
+configuration.
+
+Add a new runtime entry for the KVM platform to your Docker configuration
+(`/etc/docker/daemon.json`) that passes the `--platform=kvm` runtime
+argument.
+
+In the end, the file should look something like:
+
+```json
+{
+  "runtimes": {
+    "runsc": {
+      "path": "/usr/local/bin/runsc"
+    },
+    "runsc-kvm": {
+      "path": "/usr/local/bin/runsc",
+      "runtimeArgs": [
+        "--platform=kvm"
+      ]
+    }
+  }
+}
+```
+
+You must restart the Docker daemon after making changes to this file; this is
+typically done via `systemd`:
+
+```bash
+sudo systemctl restart docker
+```
+
+## Running a container
+
+Now run your container using the `runsc-kvm` runtime. This will run the
+container using the KVM platform:
+
+```bash
+docker run --runtime=runsc-kvm hello-world
+```
+
+[nested-azure]: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nested-virtualization
+[nested-gcp]: https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances
+[nested-virtualbox]: https://www.virtualbox.org/manual/UserManual.html#nested-virt
+[nested-kvm]: https://www.linux-kvm.org/page/Nested_Guests
--
cgit v1.2.3