diff options
author | Adin Scannell <ascannell@google.com> | 2019-11-18 13:40:27 -0800 |
---|---|---|
committer | Adin Scannell <ascannell@google.com> | 2020-04-21 12:00:59 -0700 |
commit | 957e26a6f30d40e2bff042d76a327d0a2cfbabae (patch) | |
tree | 3e95d46355585ae4661de5cef30cdca72a7c94bb /website/content/_posts | |
parent | dc2f198866c5fd8162a79978eb3633975d3ba11f (diff) |
Move website to a simpler jekyll-based template
This will allow us to merge the site into the main repository.
This merge allows the documentation to be kept up-to-date and
synchronized with the main project. Builds will be triggered on any
update, removing the need for the cron-based reploy.
Diffstat (limited to 'website/content/_posts')
-rw-r--r-- | website/content/_posts/2019-11-18-security-basics-figure1.png | bin | 0 -> 19088 bytes | |||
-rw-r--r-- | website/content/_posts/2019-11-18-security-basics-figure2.png | bin | 0 -> 17642 bytes | |||
-rw-r--r-- | website/content/_posts/2019-11-18-security-basics-figure3.png | bin | 0 -> 16471 bytes | |||
-rw-r--r-- | website/content/_posts/2019-11-18-security-basics.md | 224 | ||||
-rw-r--r-- | website/content/_posts/2020-04-02-networking-security-figure1.png | bin | 0 -> 29775 bytes | |||
-rw-r--r-- | website/content/_posts/2020-04-02-networking-security.md | 63 |
6 files changed, 287 insertions, 0 deletions
diff --git a/website/content/_posts/2019-11-18-security-basics-figure1.png b/website/content/_posts/2019-11-18-security-basics-figure1.png Binary files differnew file mode 100644 index 000000000..2a8134a7a --- /dev/null +++ b/website/content/_posts/2019-11-18-security-basics-figure1.png diff --git a/website/content/_posts/2019-11-18-security-basics-figure2.png b/website/content/_posts/2019-11-18-security-basics-figure2.png Binary files differnew file mode 100644 index 000000000..f8b416e1d --- /dev/null +++ b/website/content/_posts/2019-11-18-security-basics-figure2.png diff --git a/website/content/_posts/2019-11-18-security-basics-figure3.png b/website/content/_posts/2019-11-18-security-basics-figure3.png Binary files differnew file mode 100644 index 000000000..833e3e2b5 --- /dev/null +++ b/website/content/_posts/2019-11-18-security-basics-figure3.png diff --git a/website/content/_posts/2019-11-18-security-basics.md b/website/content/_posts/2019-11-18-security-basics.md new file mode 100644 index 000000000..ef2e9a37e --- /dev/null +++ b/website/content/_posts/2019-11-18-security-basics.md @@ -0,0 +1,224 @@ +--- +title: gVisor Security Basics - Part 1 +layout: post +author: jsprad, zkoopmans +permlink: /blog/:title/ +--- + +# Part 1 - Introduction + +This blog is a space for engineers and community members to share perspectives and deep dives on technology and design within the gVisor project. + +Though our logo suggests we're in the business of space exploration (or perhaps fighting sea monsters), we're actually in the business of sandboxing Linux containers. + +When we created gVisor, we had three specific goals in mind; _container-native security_, _resource efficiency_, and _platform portability_. To put it simply, gVisor provides _efficient defense-in-depth for containers anywhere_. + +This post addresses gVisor's _container-native security_, specifically how gVisor provides strong isolation between an application and the host OS. + +Future posts will address _resource efficiency_ (how gVisor preserves container benefits like fast starts, smaller snapshots, and less memory overhead than VMs) and _platform portability_ (run gVisor wherever Linux OCI containers run). + +Delivering on each of these goals requires careful security considerations and a robust design. + + +## What does "sandbox" mean? + +gVisor allows the execution of untrusted containers, preventing them from adversely affecting the host. This means that the untrusted container is prevented from attacking or spying on either the host kernel or any other peer userspace processes on the host. + +For example, if you are a cloud container hosting service, running containers from different customers on the same virtual machine means that compromises expose customer data. Properly configured, gVisor can provide sufficient isolation to allow different customers to run containers on the same host. There are many aspects to the proper configuration, including limiting file and network access, which we will discuss in future posts. + + +## The cost of compromise + +gVisor was designed around the premise that any security boundary could potentially be compromised with enough time and resources. We tried to optimize for a solution that was as costly and time-consuming for an attacker as possible, at every layer. + +Consequently, gVisor was built through a combination of intentional design principles and specific technology choices that work together to provide the security isolation needed for running hostile containers on a host. We'll dig into it in the next section! + + +# Design Principles + +gVisor was designed with some [common secure design principles](https://www.owasp.org/index.php/Security_by_Design_Principles) in mind: Defense-in-Depth, Principle of Least-Privilege, Attack Surface Reduction and Secure-by-Default[^1]. + +In general, Design Principles outline good engineering practices, but in the case of security, they also can be thought of as a set of tactics. In a real-life castle, there is no single defensive feature. Rather, there are many in combination: redundant walls, scattered draw bridges, small bottle-neck entrances, moats, etc. + +A simplified version of the design is below ([more detailed version](https://gvisor.dev/docs/architecture_guide/))[^2]: + +---- + +![Figure 1](./2019-11-18-security-basics-figure1.png) + +Figure 1: Simplified design of gVisor. + +---- + +In order to discuss design principles, the following components are important to know: + +* runsc - binary that packages the Sentry, platform, and Gofer(s) that run containers. runsc is the drop-in binary for running gVisor in Docker and Kubernetes. +* Untrusted Application - container running in the sandbox. Untrusted application/container are used interchangeably in this article. +* Platform Syscall Switcher - intercepts syscalls from the application and passes them to the Sentry with no further handling. +* Sentry - The "application kernel" in userspace that serves the untrusted application. Each application instance has its own Sentry. The Sentry handles syscalls, routes I/O to gofers, and manages memory and CPU, all in userspace. The Sentry is allowed to make limited, filtered syscalls to the host OS. +* Gofer - a process that specifically handles different types of I/O for the Sentry (usually disk I/O). Gofers are also allowed to make filtered syscalls to the Host OS. +* Host OS - the actual OS on which gVisor containers are running, always some flavor of Linux (sorry, Windows/MacOS users). + +It is important to emphasize what is being protected from the untrusted application in this diagram: the host OS and other userspace applications. + +In this post, we are only discussing security-related features of gVisor, and you might ask, "What about performance, compatibility and stability?" We will cover these considerations in future posts. + + +## Defense-in-Depth + +For gVisor, Defense-in-Depth means each component of the software stack trusts the other components as little as possible. + +It may seem strange that we would want our own software components to distrust each other. But by limiting the trust between small, discrete components, each component is forced to defend itself against potentially malicious input. And when you stack these components on top of each other, you can ensure that multiple security barriers must be overcome by an attacker. + +And this leads us to how Defense-in-Depth is applied to gVisor: no single vulnerability should compromise the host. + +In the "Attacker's Advantage / Defender's Dilemma," the defender must succeed all the time while the attacker only needs to succeed once. Defense in Depth inverts this principle: once the attacker successfully compromises any given software component, they are immediately faced with needing to compromise a subsequent, distinct layer in order to move laterally or acquire more privilege. + +For example, the untrusted container is isolated from the Sentry. The Sentry is isolated from host I/O operations by serving those requests in separate processes called Gofers. And both the untrusted container and its associated Gofers are isolated from the host process that is running the sandbox. + +An additional benefit is that this generally leads to more robust and stable software, forcing interfaces to be strictly defined and tested to ensure all inputs are properly parsed and bounds checked. + + +## Least-Privilege + +The principle of Least-Privilege implies that each software component has only the permissions it needs to function, and no more. + +Least-Privilege is applied throughout gVisor. Each component and more importantly, each interface between the components, is designed so that only the minimum level of permission is required for it to perform its function. Specifically, the closer you are to the untrusted application, the less privilege you have. + +---- + +![Figure 2](./2019-11-18-security-basics-figure2.png) + +Figure 2: runsc components and their privileges. + +---- + +This is evident in how runsc (the drop in gVisor binary for Docker/Kubernetes) constructs the sandbox. The Sentry has the least privilege possible (it can't even open a file!). Gofers are only allowed file access, so even if it were compromised, the host network would be unavailable. Only the runsc binary itself has full access to the host OS, and even runsc's access to the host OS is often limited through capabilities / chroot / namespacing. + +Designing a system with Defense-in-Depth and Least-Privilege in mind encourages small, separate, single-purpose components, each with very restricted privileges. + + +## Attack Surface Reduction + +There are no bugs in unwritten code. + +In other words, gVisor supports a feature if and only if it is needed to run host Linux containers. + + +### Host Application/Sentry Interface: + +There are a lot of things gVisor does not need to do. For example, it does not need to support arbitrary device drivers, nor does it need to support video playback. By not implementing what will not be used, we avoid introducing potential bugs in our code. + +That is not to say gVisor has limited functionality! Quite the opposite, we analyzed what is actually needed to run Linux containers and today the Sentry supports 237 syscalls[^3]<sup>,</sup>[^4], along with the range of critical /proc and /dev files. However, gVisor does not support every syscall in the Linux kernel. There are about 350 syscalls[^5] within the 5.3.11 version of the Linux kernel, many of which do not apply to Linux containers that typically host cloud-like workloads. For example, we don't support old versions of epoll (epoll_ctl_old, epoll_wait_old), because they are deprecated in Linux and no supported workloads use them. + +Furthermore, any exploited vulnerabilities in the implemented syscalls (or Sentry code in general) only apply to gaining control of the Sentry. More on this in a later post. + + +### Sentry/Host OS Interface: + +The Sentry's interactions with the Host OS are restricted in many ways. For instance, no syscall is "passed-through" from the untrusted application to the host OS. All syscalls are intercepted and interpreted. In the case where the Sentry needs to call the Host OS, we severely limit the syscalls that the Sentry itself is allowed to make to the host kernel[^6]. + +For example, there are many file-system based attacks, where manipulation of files or their paths, can lead to compromise of the host[^7]. As a result, the Sentry does not allow any syscall that creates or opens a file descriptor. All file descriptors must be donated to the sandbox. By disallowing open or creation of file descriptors, we eliminate entire categories of these file-based attacks. + +This does not affect functionality though. For example, during startup, runsc will donate FDs the Sentry that allow for mapping STDIN/STDOUT/STDERR to the sandboxed application. Also the Gofer may donate an FD to the Sentry, allowing for direct access to some files. And most files will be remotely accessed through the Gofers, in which case no FDs are donated to the Sentry. + +The Sentry itself is only allowed access to specific [whitelisted syscalls](https://github.com/google/gvisor/blob/master/runsc/boot/config.go). Without networking, the Sentry needs 53 host syscalls in order to function, and with networking, it uses an additional 15[^8]. By limiting the whitelist to only these needed syscalls, we radically reduce the amount of host OS attack surface. If any attempts are made to call something outside the whitelist, it is immediately blocked and the sandbox is killed by the Host OS. + + +### Sentry/Gofer Interface: + +The Sentry communicates with the Gofer through a local unix domain socket (UDS) via a version of the 9P protocol[^9]. The UDS file descriptor is passed to the sandbox during initialization and all communication between the Sentry and Gofer happens via 9P. We will go more into how Gofers work in future posts. + + +### End Result + +So, of the 350 syscalls in the Linux kernel, the Sentry needs to implement only 237 of them to support containers. At most, the Sentry only needs to call 68 of the host Linux syscalls. In other words, with gVisor, applications get the vast majority (and growing) functionality of Linux containers for only 68 possible syscalls to the Host OS. 350 syscalls to 68 is attack surface reduction. + +---- + +![Figure 3](./2019-11-18-security-basics-figure3.png) + +Figure 3: Reduction of Attack Surface of the Syscall Table. Note that the Senty's Syscall Emulation Layer keeps the Containerized Process from ever calling the Host OS. + +--- + +## Secure-by-default + +The default choice for a user should be safe. If users need to run a less secure configuration of the sandbox for the sake of performance or application compatibility, they must make the choice explicitly. + +An example of this might be a networking application that is performance sensitive. Instead of using the safer, Go-based Netstack in the Sentry, the untrusted container can instead use the host Linux networking stack directly. However, this means the untrusted container will be directly interacting with the host, without the safety benefits of the sandbox. It also means that an attack could directly compromise the host through his path. + +These less secure configurations are **not** the default. In fact, the user must take action to change the configuration and run in a less secure mode. Additionally, these actions make it very obvious that a less secure configuration is being used. + +This can be as simple as forcing a default runtime flag option to the secure option. gVisor does this by always using its internal netstack by default. However, for certain performance sensitive applications, we allow the usage of the host OS networking stack, but it requires the user to actively set a flag[^10]. + + +# Technology Choices + +Technology choices for gVisor mainly involve things that will give us a security boundary. + +At a higher level, boundaries in software might be describing a great many things. It may be discussing the boundaries between threads, boundaries between processes, boundaries between CPU privilege levels, and more. + +Security boundaries are interfaces that are designed and built so that entire classes of bugs/vulnerabilities are eliminated. + +For example, the Sentry and Gofers are implemented using Go. Go was chosen for a number of the features it provided. Go is a fast, statically-typed, compiled language that has efficient multi-threading support, garbage collection and a constrained set of "unsafe" operations. + +Using these features enabled safe array and pointer handling. This means entire classes of vulnerabilities were eliminated, such as buffer overflows and use-after-free. + +Another example is our use of very strict syscall switching to ensure that the Sentry is always the first software component that parses and interprets the calls being made by the untrusted container. Here is an instance where different platforms use different solutions, but all of them share this common trait, whether it is through the use of ptrace "a la PTRACE_ATTACH"[^11] or kvm's ring0[^12]. + +Finally, one of the most restrictive choices was to use seccomp, to restrict the Sentry from being able to open or create a file descriptor on the host. All file I/O is required to go through Gofers. Preventing the opening or creation of file descriptions eliminates whole categories of bugs around file permissions [like this one](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557)[^13]. + + +# To be continued - Part 2 + +In part 2 of this blog post, we will explore gVisor from an attacker's point of view. We will use it as an opportunity to examine the specific strengths and weaknesses of each gVisor component. + +We will also use it to introduce Google's Vulnerability Reward Program[^14], and other ways the community can contribute to help make gVisor safe, fast and stable. + +<!-- Footnotes themselves at the bottom. --> + +## Notes + +[^1]: + [https://www.owasp.org/index.php/Security_by_Design_Principles](https://www.owasp.org/index.php/Security_by_Design_Principles) + +[^2]: + [https://gvisor.dev/docs/architecture_guide](https://gvisor.dev/docs/architecture_guide/) + +[^3]: + [https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/linux/linux64_amd64.go](https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/syscalls.go) + +[^4]: + Internally that is, it doesn't call to the Host OS to implement them, in fact that is explicitly disallowed, more on that in the future. + +[^5]: + [https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345](https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345) + +[^6]: + [https://github.com/google/gvisor/tree/master/runsc/boot/filter](https://github.com/google/gvisor/tree/master/runsc/boot/filter) + +[^7]: + [https://en.wikipedia.org/wiki/Dirty_COW](https://en.wikipedia.org/wiki/Dirty_COW) + +[^8]: + [https://github.com/google/gvisor/blob/master/runsc/boot/config.go](https://github.com/google/gvisor/blob/master/runsc/boot/config.go) + +[^9]: + [https://en.wikipedia.org/wiki/9P_(protocol)](https://en.wikipedia.org/wiki/9P_(protocol)) + +[^10]: + [https://gvisor.dev/docs/user_guide/networking/#network-passthrough](https://gvisor.dev/docs/user_guide/networking/#network-passthrough) + +[^11]: + [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390) + +[^12]: + [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182) + +[^13]: + [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557) + +[^14]: + [https://www.google.com/about/appsecurity/reward-program/index.html](https://www.google.com/about/appsecurity/reward-program/index.html) + diff --git a/website/content/_posts/2020-04-02-networking-security-figure1.png b/website/content/_posts/2020-04-02-networking-security-figure1.png Binary files differnew file mode 100644 index 000000000..b49cb0242 --- /dev/null +++ b/website/content/_posts/2020-04-02-networking-security-figure1.png diff --git a/website/content/_posts/2020-04-02-networking-security.md b/website/content/_posts/2020-04-02-networking-security.md new file mode 100644 index 000000000..fe3bc62df --- /dev/null +++ b/website/content/_posts/2020-04-02-networking-security.md @@ -0,0 +1,63 @@ +--- +title: gVisor Networking Security +layout: post +author: igudger +permlink: /blog/:title/ +--- + +In our [first blog post](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/), we covered some secure design principles and how they guided the architecture of gVisor as a whole. In this post, we will cover how these principles guided the networking architecture of gVisor, and the tradeoffs involved. In particular, we will cover how these principles culminated in two networking modes, how they work, and the properties of each. + +## gVisor's security architecture in the context of networking + +Linux networking is complicated. The TCP protocol is over 40 years old, and has been repeatedly extended over the years to keep up with the rapid pace of network infrastructure improvements, all while maintaining compatibility. On top of that, Linux networking has a fairly large API surface. Linux supports [over 150 options](https://github.com/google/gvisor/blob/960f6a975b7e44c0efe8fd38c66b02017c4fe137/pkg/sentry/strace/socket.go#L476-L644) for the most common socket types alone. In fact, the net subsystem is one of the largest and fastest growing in Linux at approximately 1.1 million lines of code. For comparison, that is several times the size of the entire gVisor codebase. + +At the same time, networking is increasingly important. The cloud era is arguably about making everything a network service, and in order to make that work, the interconnect performance is critical. Adding networking support to gVisor was difficult, not just due to the inherent complexity, but also because it has the potential to significantly weaken gVisor's security model. + +As outlined in the previous blog post, gVisor's [secure design principles](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#design-principles) are: + +1. Defense in Depth: each component of the software stack trusts each other component as little as possible. +1. Least Privilege: each software component has only the permissions it needs to function, and no more. +1. Attack Surface Reduction: limit the surface area of the host exposed to the sandbox. +1. Secure by Default: the default choice for a user should be safe. + +gVisor manifests these principles as a multi-layered system. An application running in the sandbox interacts with the Sentry, a userspace kernel, which mediates all interactions with the Host OS and beyond. The Sentry is written in pure Go with minimal unsafe code, making it less vulnerable to buffer overflows and related memory bugs that can lead to a variety of compromises including code injection. It emulates Linux using only a minimal and audited set of Host OS syscalls that limit the Host OS's attack surface exposed to the Sentry itself. The syscall restrictions are enforced by running the Sentry with seccomp filters, which enforce that the Sentry can only use the expected set of syscalls. The Sentry runs as an unprivileged user and in namespaces, which, along with the seccomp filters, ensure that the Sentry is run with the Least Privilege required. + +gVisor's multi-layered design provides Defense in Depth. The Sentry, which does not trust the application because it may attack the Sentry and try to bypass it, is the first layer. The sandbox that the Sentry runs in is the second layer. If the Sentry were compromised, the attacker would still be in a highly restrictive sandbox which they must also break out of in order to compromise the Host OS. + +To enable networking functionality while preserving gVisor's security properties, we implemented a [userspace network stack](https://github.com/google/gvisor/tree/master/pkg/tcpip) in the Sentry, which we creatively named Netstack. Netstack is also written in Go, not only to avoid unsafe code in the network stack itself, but also to avoid a complicated and unsafe Foreign Function Interface. Having its own integrated network stack allows the Sentry to implement networking operations using up to three Host OS syscalls to read and write packets. These syscalls allow a very minimal set of operations which are already allowed (either through the same or a similar syscall). Moreover, because packets typically come from off-host (e.g. the internet), the Host OS's packet processing code has received a lot of scrutiny, hopefully resulting in a high degree of hardening. + +---- + +![Figure 1](./2020-04-02-networking-security-figure1.png) + +Figure 1: Netstack and gVisor + +---- + +## Writing a network stack + +Netstack was written from scratch specifically for gVisor. Because Netstack was designed and implemented to be modular, flexible and self-contained, there are now several more projects using Netstack in creative and exciting ways. As we discussed, a custom network stack has enabled a variety of security-related goals which would not have been possible any other way. This came at a cost though. Network stacks are complex and writing a new one comes with many challenges, mostly related to application compatibility and performance. + +Compatibility issues typically come in two forms: missing features, and features with behavior that differs from Linux (usually due to bugs). Both of these are inevitable in an implementation of a complex system spanning many quickly evolving and ambiguous standards. However, we have invested heavily in this area, and the vast majority of applications have no issues using Netstack. For example, [we now support setting 34 different socket options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764) versus [only 7 in our initial git commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702). We are continuing to make good progress in this area. + +Performance issues typically come from TCP behavior and packet processing speed. To improve our TCP behavior, we are working on implementing the full set of TCP RFCs. There are many RFCs which are significant to performance (e.g. [RACK](https://tools.ietf.org/id/draft-ietf-tcpm-rack-03.html) and [BBR](https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00)) that we have yet to implement. This mostly affects TCP performance with non-ideal network conditions (e.g. cross continent connections). Faster packet processing mostly improves TCP performance when network conditions are very good (e.g. within a datacenter). Our primary strategy here is to reduce interactions with the Go runtime, specifically the garbage collector (GC) and scheduler. We are currently optimizing buffer management to reduce the amount of garbage, which will lower the GC cost. To reduce scheduler interactions, we are re-architecting the TCP implementation to use fewer goroutines. Performance today is good enough for most applications and we are making steady improvements. For example, since May of 2019, we have improved the Netstack runsc [iperf3 download benchmark](https://github.com/google/gvisor/blob/master/benchmarks/suites/network.py) score by roughly 15% and upload score by around 10,000X. Current numbers are about 17 Gbps download and about 8 Gbps upload versus about 42 Gbps and 43 Gbps for native (Linux) respectively. + + +## An alternative + +We also offer an alternative network mode: passthrough. This name can be misleading as syscalls are never passed through from the app to the Host OS. Instead, the passthrough mode implements networking in gVisor using the Host OS's network stack. (This mode is called [hostinet](https://github.com/google/gvisor/tree/master/pkg/sentry/socket/hostinet) in the codebase.) Passthrough mode can improve performance for some use cases as the Host OS's network stack has had an enormous number of person-years poured into making it highly performant. However, there is a rather large downside to using passthrough mode: it weakens gVisor's security model by increasing the Host OS's Attack Surface. This is because using the Host OS's network stack requires the Sentry to use the Host OS's [Berkeley socket interface](https://en.wikipedia.org/wiki/Berkeley_sockets). The Berkeley socket interface is a much larger API surface than the packet interface that our network stack uses. When passthrough mode is in use, the Sentry is allowed to use [15 additional syscalls](https://github.com/google/gvisor/blob/b1576e533223e98ebe4bd1b82b04e3dcda8c4bf1/runsc/boot/filter/config.go#L312-L517). Further, this set of syscalls includes some that allow the Sentry to create file descriptors, something that [we don't normally allow](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#sentry-host-os-interface) as it opens up classes of file-based attacks. + +There are some networking features that we can't implement on top of syscalls that we feel are safe (most notably those behind [ioctl](http://man7.org/linux/man-pages/man2/ioctl.2.html)) and therefore are not supported. Because of this, we actually support fewer networking features in passthrough mode than we do in Netstack, reducing application compatibility. That's right: using our networking stack provides better overall application compatibility than using our passthrough mode. + +That said, gVisor with passthrough networking still provides a high level of isolation. Applications cannot specify host syscall arguments directly, and the sentry's seccomp policy restricts its syscall use significantly more than a general purpose seccomp policy. + + +## Secure by Default + +The goal of the Secure by Default principle is to make it easy to securely sandbox containers. Of course, disabling network access entirely is the most secure option, but that is not practical for most applications. To make gVisor Secure by Default, we have made Netstack the default networking mode in gVisor as we believe that it provides significantly better isolation. For this reason we strongly caution users from changing the default unless Netstack flat out won't work for them. The passthrough mode option is still provided, but we want users to make an informed decision when selecting it. + +Another way in which gVisor makes it easy to securely sandbox containers is by allowing applications to run unmodified, with no special configuration needed. In order to do this, gVisor needs to support all of the features and syscalls that applications use. Neither seccomp nor gVisor's passthrough mode can do this as applications commonly use syscalls which are too dangerous to be included in a secure policy. Even if this dream isn't fully realized today, gVisor's architecture with Netstack makes this possible. + +## Give Netstack a Try + +If you haven't already, try running a workload in gVisor with Netstack. You can find instructions on how to get started in our [Quick Start](https://gvisor.dev/docs/user_guide/quick_start/docker/). We want to hear about both your successes and any issues you encounter. We welcome your contributions, whether that be verbal feedback or code contributions, via our [Gitter channel](https://gitter.im/gvisor/community), [email list](https://groups.google.com/forum/#!forum/gvisor-users), [issue tracker](https://gvisor.dev/issue/new), and [Github repository](https://github.com/google/gvisor). Feel free to express interest in an [open issue](https://gvisor.dev/issue/), or reach out if you aren't sure where to start. |