Diffstat (limited to 'website/blog')
-rw-r--r-- | website/blog/2019-11-18-security-basics.md | 140
-rw-r--r-- | website/blog/2020-04-02-networking-security.md | 110
-rw-r--r-- | website/blog/BUILD | 37
3 files changed, 151 insertions, 136 deletions
diff --git a/website/blog/2019-11-18-security-basics.md b/website/blog/2019-11-18-security-basics.md index bec929528..ed6d97ffe 100644 --- a/website/blog/2019-11-18-security-basics.md +++ b/website/blog/2019-11-18-security-basics.md @@ -1,11 +1,4 @@ ---- -title: gVisor Security Basics - Part 1 -layout: post -authors: -- jsprad -- zkoopmans -permalink: /blog/2019/11/18/gvisor-security-basics-part-1/ ---- +# gVisor Security Basics - Part 1 This blog is a space for engineers and community members to share perspectives and deep dives on technology and design within the gVisor project. Though our @@ -51,10 +44,10 @@ into it in the next section! # Design Principles -gVisor was designed with some [common secure design -principles](https://www.owasp.org/index.php/Security_by_Design_Principles) in -mind: Defense-in-Depth, Principle of Least-Privilege, Attack Surface Reduction -and Secure-by-Default[^1]. +gVisor was designed with some +[common secure design principles](https://www.owasp.org/index.php/Security_by_Design_Principles) +in mind: Defense-in-Depth, Principle of Least-Privilege, Attack Surface +Reduction and Secure-by-Default[^1]. In general, Design Principles outline good engineering practices, but in the case of security, they also can be thought of as a set of tactics. In a @@ -62,30 +55,44 @@ real-life castle, there is no single defensive feature. Rather, there are many in combination: redundant walls, scattered draw bridges, small bottle-neck entrances, moats, etc. -A simplified version of the design is below ([more detailed -version](/docs/architecture_guide/))[^2]: +A simplified version of the design is below +([more detailed version](/docs/architecture_guide/))[^2]: ----- +-------------------------------------------------------------------------------- ![Figure 1](/assets/images/2019-11-18-security-basics-figure1.png) Figure 1: Simplified design of gVisor. 
----- +-------------------------------------------------------------------------------- In order to discuss design principles, the following components are important to know: -* runsc - binary that packages the Sentry, platform, and Gofer(s) that run containers. runsc is the drop-in binary for running gVisor in Docker and Kubernetes. -* Untrusted Application - container running in the sandbox. Untrusted application/container are used interchangeably in this article. -* Platform Syscall Switcher - intercepts syscalls from the application and passes them to the Sentry with no further handling. -* Sentry - The "application kernel" in userspace that serves the untrusted application. Each application instance has its own Sentry. The Sentry handles syscalls, routes I/O to gofers, and manages memory and CPU, all in userspace. The Sentry is allowed to make limited, filtered syscalls to the host OS. -* Gofer - a process that specifically handles different types of I/O for the Sentry (usually disk I/O). Gofers are also allowed to make filtered syscalls to the Host OS. -* Host OS - the actual OS on which gVisor containers are running, always some flavor of Linux (sorry, Windows/MacOS users). - -It is important to emphasize what is being protected from the untrusted application in this diagram: the host OS and other userspace applications. - -In this post, we are only discussing security-related features of gVisor, and you might ask, "What about performance, compatibility and stability?" We will cover these considerations in future posts. +* runsc - binary that packages the Sentry, platform, and Gofer(s) that run + containers. runsc is the drop-in binary for running gVisor in Docker and + Kubernetes. +* Untrusted Application - container running in the sandbox. Untrusted + application/container are used interchangeably in this article. +* Platform Syscall Switcher - intercepts syscalls from the application and + passes them to the Sentry with no further handling. 
+* Sentry - The "application kernel" in userspace that serves the untrusted + application. Each application instance has its own Sentry. The Sentry + handles syscalls, routes I/O to gofers, and manages memory and CPU, all in + userspace. The Sentry is allowed to make limited, filtered syscalls to the + host OS. +* Gofer - a process that specifically handles different types of I/O for the + Sentry (usually disk I/O). Gofers are also allowed to make filtered syscalls + to the Host OS. +* Host OS - the actual OS on which gVisor containers are running, always some + flavor of Linux (sorry, Windows/MacOS users). + +It is important to emphasize what is being protected from the untrusted +application in this diagram: the host OS and other userspace applications. + +In this post, we are only discussing security-related features of gVisor, and +you might ask, "What about performance, compatibility and stability?" We will +cover these considerations in future posts. ## Defense-in-Depth @@ -127,13 +134,13 @@ minimum level of permission is required for it to perform its function. Specifically, the closer you are to the untrusted application, the less privilege you have. ----- +-------------------------------------------------------------------------------- ![Figure 2](/assets/images/2019-11-18-security-basics-figure2.png) Figure 2: runsc components and their privileges. ----- +-------------------------------------------------------------------------------- This is evident in how runsc (the drop in gVisor binary for Docker/Kubernetes) constructs the sandbox. The Sentry has the least privilege possible (it can't @@ -183,9 +190,8 @@ itself is allowed to make to the host kernel[^6]. For example, there are many file-system based attacks, where manipulation of files or their paths, can lead to compromise of the host[^7]. As a result, the Sentry does not allow any syscall that creates or opens a file descriptor. All -file descriptors must be donated to the sandbox. 
By disallowing open or -creation of file descriptors, we eliminate entire categories of these file-based -attacks. +file descriptors must be donated to the sandbox. By disallowing open or creation +of file descriptors, we eliminate entire categories of these file-based attacks. This does not affect functionality though. For example, during startup, runsc will donate FDs the Sentry that allow for mapping STDIN/STDOUT/STDERR to the @@ -193,8 +199,8 @@ sandboxed application. Also the Gofer may donate an FD to the Sentry, allowing for direct access to some files. And most files will be remotely accessed through the Gofers, in which case no FDs are donated to the Sentry. -The Sentry itself is only allowed access to specific [whitelisted -syscalls](https://github.com/google/gvisor/blob/master/runsc/boot/config.go). +The Sentry itself is only allowed access to specific +[whitelisted syscalls](https://github.com/google/gvisor/blob/master/runsc/boot/config.go). Without networking, the Sentry needs 53 host syscalls in order to function, and with networking, it uses an additional 15[^8]. By limiting the whitelist to only these needed syscalls, we radically reduce the amount of host OS attack surface. @@ -216,13 +222,15 @@ the host Linux syscalls. In other words, with gVisor, applications get the vast majority (and growing) functionality of Linux containers for only 68 possible syscalls to the Host OS. 350 syscalls to 68 is attack surface reduction. ----- +-------------------------------------------------------------------------------- ![Figure 3](/assets/images/2019-11-18-security-basics-figure3.png) -Figure 3: Reduction of Attack Surface of the Syscall Table. Note that the Senty's Syscall Emulation Layer keeps the Containerized Process from ever calling the Host OS. +Figure 3: Reduction of Attack Surface of the Syscall Table. Note that the +Senty's Syscall Emulation Layer keeps the Containerized Process from ever +calling the Host OS. 
----- +-------------------------------------------------------------------------------- ## Secure-by-default @@ -279,8 +287,8 @@ ring0[^12]. Finally, one of the most restrictive choices was to use seccomp, to restrict the Sentry from being able to open or create a file descriptor on the host. All file I/O is required to go through Gofers. Preventing the opening or creation of file -descriptions eliminates whole categories of bugs around file permissions [like -this one](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557)[^13]. +descriptions eliminates whole categories of bugs around file permissions +[like this one](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557)[^13]. # To be continued - Part 2 @@ -294,44 +302,18 @@ stable. ## Notes -[^1]: - [https://www.owasp.org/index.php/Security_by_Design_Principles](https://www.owasp.org/index.php/Security_by_Design_Principles) - -[^2]: - [https://gvisor.dev/docs/architecture_guide](https://gvisor.dev/docs/architecture_guide/) - -[^3]: - [https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/linux/linux64_amd64.go](https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/syscalls.go) - -[^4]: - Internally that is, it doesn't call to the Host OS to implement them, in fact that is explicitly disallowed, more on that in the future. 
- -[^5]: - [https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345](https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345) - -[^6]: - [https://github.com/google/gvisor/tree/master/runsc/boot/filter](https://github.com/google/gvisor/tree/master/runsc/boot/filter) - -[^7]: - [https://en.wikipedia.org/wiki/Dirty_COW](https://en.wikipedia.org/wiki/Dirty_COW) - -[^8]: - [https://github.com/google/gvisor/blob/master/runsc/boot/config.go](https://github.com/google/gvisor/blob/master/runsc/boot/config.go) - -[^9]: - [https://en.wikipedia.org/wiki/9P_(protocol)](https://en.wikipedia.org/wiki/9P_(protocol)) - -[^10]: - [https://gvisor.dev/docs/user_guide/networking/#network-passthrough](https://gvisor.dev/docs/user_guide/networking/#network-passthrough) - -[^11]: - [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390) - -[^12]: - [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182) - -[^13]: - [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557) - -[^14]: - [https://www.google.com/about/appsecurity/reward-program/index.html](https://www.google.com/about/appsecurity/reward-program/index.html) +[^1]: [https://www.owasp.org/index.php/Security_by_Design_Principles](https://www.owasp.org/index.php/Security_by_Design_Principles) +[^2]: [https://gvisor.dev/docs/architecture_guide](https://gvisor.dev/docs/architecture_guide/) +[^3]: 
[https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/linux/linux64_amd64.go](https://github.com/google/gvisor/blob/master/pkg/sentry/syscalls/syscalls.go) +[^4]: Internally that is, it doesn't call to the Host OS to implement them, in + fact that is explicitly disallowed, more on that in the future. +[^5]: [https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345](https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscalls/syscall_64.tbl#L345) +[^6]: [https://github.com/google/gvisor/tree/master/runsc/boot/filter](https://github.com/google/gvisor/tree/master/runsc/boot/filter) +[^7]: [https://en.wikipedia.org/wiki/Dirty_COW](https://en.wikipedia.org/wiki/Dirty_COW) +[^8]: [https://github.com/google/gvisor/blob/master/runsc/boot/config.go](https://github.com/google/gvisor/blob/master/runsc/boot/config.go) +[^9]: [https://en.wikipedia.org/wiki/9P_(protocol)](https://en.wikipedia.org/wiki/9P_\(protocol\)) +[^10]: [https://gvisor.dev/docs/user_guide/networking/#network-passthrough](https://gvisor.dev/docs/user_guide/networking/#network-passthrough) +[^11]: [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ptrace/subprocess.go#L390) +[^12]: [https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182](https://github.com/google/gvisor/blob/c7e901f47a09eaac56bd4813227edff016fa6bff/pkg/sentry/platform/ring0/kernel_amd64.go#L182) +[^13]: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4557) +[^14]: [https://www.google.com/about/appsecurity/reward-program/index.html](https://www.google.com/about/appsecurity/reward-program/index.html) diff --git a/website/blog/2020-04-02-networking-security.md 
b/website/blog/2020-04-02-networking-security.md index 745246ac4..78f0a6714 100644 --- a/website/blog/2020-04-02-networking-security.md +++ b/website/blog/2020-04-02-networking-security.md @@ -1,14 +1,8 @@ ---- -title: gVisor Networking Security -layout: post -authors: -- igudger -permalink: /blog/2020/04/02/gvisor-networking-security/ ---- - -In our [first blog -post](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/), we -covered some secure design principles and how they guided the architecture of +# gVisor Networking Security + +In our +[first blog post](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/), +we covered some secure design principles and how they guided the architecture of gVisor as a whole. In this post, we will cover how these principles guided the networking architecture of gVisor, and the tradeoffs involved. In particular, we will cover how these principles culminated in two networking modes, how they @@ -19,9 +13,8 @@ work, and the properties of each. Linux networking is complicated. The TCP protocol is over 40 years old, and has been repeatedly extended over the years to keep up with the rapid pace of network infrastructure improvements, all while maintaining compatibility. On top -of that, Linux networking has a fairly large API surface. Linux supports [over -150 -options](https://github.com/google/gvisor/blob/960f6a975b7e44c0efe8fd38c66b02017c4fe137/pkg/sentry/strace/socket.go#L476-L644) +of that, Linux networking has a fairly large API surface. Linux supports +[over 150 options](https://github.com/google/gvisor/blob/960f6a975b7e44c0efe8fd38c66b02017c4fe137/pkg/sentry/strace/socket.go#L476-L644) for the most common socket types alone. In fact, the net subsystem is one of the largest and fastest growing in Linux at approximately 1.1 million lines of code. For comparison, that is several times the size of the entire gVisor codebase. @@ -32,13 +25,16 @@ work, the interconnect performance is critical. 
Adding networking support to gVisor was difficult, not just due to the inherent complexity, but also because it has the potential to significantly weaken gVisor's security model. -As outlined in the previous blog post, gVisor's [secure design -principles](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#design-principles) +As outlined in the previous blog post, gVisor's +[secure design principles](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#design-principles) are: -1. Defense in Depth: each component of the software stack trusts each other component as little as possible. -1. Least Privilege: each software component has only the permissions it needs to function, and no more. -1. Attack Surface Reduction: limit the surface area of the host exposed to the sandbox. +1. Defense in Depth: each component of the software stack trusts each other + component as little as possible. +1. Least Privilege: each software component has only the permissions it needs + to function, and no more. +1. Attack Surface Reduction: limit the surface area of the host exposed to the + sandbox. 1. Secure by Default: the default choice for a user should be safe. gVisor manifests these principles as a multi-layered system. An application @@ -61,25 +57,25 @@ the Sentry were compromised, the attacker would still be in a highly restrictive sandbox which they must also break out of in order to compromise the Host OS. To enable networking functionality while preserving gVisor's security -properties, we implemented a [userspace network -stack](https://github.com/google/gvisor/tree/master/pkg/tcpip) in the Sentry, -which we creatively named Netstack. Netstack is also written in Go, not only to -avoid unsafe code in the network stack itself, but also to avoid a complicated -and unsafe Foreign Function Interface. 
Having its own integrated network stack -allows the Sentry to implement networking operations using up to three Host OS -syscalls to read and write packets. These syscalls allow a very minimal set of -operations which are already allowed (either through the same or a similar -syscall). Moreover, because packets typically come from off-host (e.g. the -internet), the Host OS's packet processing code has received a lot of scrutiny, -hopefully resulting in a high degree of hardening. - ----- +properties, we implemented a +[userspace network stack](https://github.com/google/gvisor/tree/master/pkg/tcpip) +in the Sentry, which we creatively named Netstack. Netstack is also written in +Go, not only to avoid unsafe code in the network stack itself, but also to avoid +a complicated and unsafe Foreign Function Interface. Having its own integrated +network stack allows the Sentry to implement networking operations using up to +three Host OS syscalls to read and write packets. These syscalls allow a very +minimal set of operations which are already allowed (either through the same or +a similar syscall). Moreover, because packets typically come from off-host (e.g. +the internet), the Host OS's packet processing code has received a lot of +scrutiny, hopefully resulting in a high degree of hardening. + +-------------------------------------------------------------------------------- ![Figure 1](/assets/images/2020-04-02-networking-security-figure1.png) Figure 1: Netstack and gVisor ----- +-------------------------------------------------------------------------------- ## Writing a network stack @@ -96,10 +92,10 @@ with behavior that differs from Linux (usually due to bugs). Both of these are inevitable in an implementation of a complex system spanning many quickly evolving and ambiguous standards. However, we have invested heavily in this area, and the vast majority of applications have no issues using Netstack. 
For -example, [we now support setting 34 different socket -options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764) -versus [only 7 in our initial git -commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702). +example, +[we now support setting 34 different socket options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764) +versus +[only 7 in our initial git commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702). We are continuing to make good progress in this area. Performance issues typically come from TCP behavior and packet processing speed. @@ -117,8 +113,8 @@ which will lower the GC cost. To reduce scheduler interactions, we are re-architecting the TCP implementation to use fewer goroutines. Performance today is good enough for most applications and we are making steady improvements. For example, since May of 2019, we have improved the Netstack -runsc [iperf3 download -benchmark](https://github.com/google/gvisor/blob/master/benchmarks/suites/network.py) +runsc +[iperf3 download benchmark](https://github.com/google/gvisor/blob/master/benchmarks/suites/network.py) score by roughly 15% and upload score by around 10,000X. Current numbers are about 17 Gbps download and about 8 Gbps upload versus about 42 Gbps and 43 Gbps for native (Linux) respectively. @@ -135,15 +131,15 @@ the Host OS's network stack has had an enormous number of person-years poured into making it highly performant. However, there is a rather large downside to using passthrough mode: it weakens gVisor's security model by increasing the Host OS's Attack Surface. 
This is because using the Host OS's network stack -requires the Sentry to use the Host OS's [Berkeley socket -interface](https://en.wikipedia.org/wiki/Berkeley_sockets). The Berkeley socket -interface is a much larger API surface than the packet interface that our -network stack uses. When passthrough mode is in use, the Sentry is allowed to -use [15 additional -syscalls](https://github.com/google/gvisor/blob/b1576e533223e98ebe4bd1b82b04e3dcda8c4bf1/runsc/boot/filter/config.go#L312-L517). +requires the Sentry to use the Host OS's +[Berkeley socket interface](https://en.wikipedia.org/wiki/Berkeley_sockets). The +Berkeley socket interface is a much larger API surface than the packet interface +that our network stack uses. When passthrough mode is in use, the Sentry is +allowed to use +[15 additional syscalls](https://github.com/google/gvisor/blob/b1576e533223e98ebe4bd1b82b04e3dcda8c4bf1/runsc/boot/filter/config.go#L312-L517). Further, this set of syscalls includes some that allow the Sentry to create file -descriptors, something that [we don't normally -allow](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#sentry-host-os-interface) +descriptors, something that +[we don't normally allow](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#sentry-host-os-interface) as it opens up classes of file-based attacks. There are some networking features that we can't implement on top of syscalls @@ -181,13 +177,13 @@ architecture with Netstack makes this possible. ## Give Netstack a Try If you haven't already, try running a workload in gVisor with Netstack. You can -find instructions on how to get started in our [Quick -Start](/docs/user_guide/quick_start/docker/). We want to hear about both your -successes and any issues you encounter. 
We welcome your contributions, whether -that be verbal feedback or code contributions, via our [Gitter -channel](https://gitter.im/gvisor/community), [email -list](https://groups.google.com/forum/#!forum/gvisor-users), [issue -tracker](https://gvisor.dev/issue/new), and [Github -repository](https://github.com/google/gvisor). Feel free to express interest in -an [open issue](https://gvisor.dev/issue/), or reach out if you aren't sure -where to start. +find instructions on how to get started in our +[Quick Start](/docs/user_guide/quick_start/docker/). We want to hear about both +your successes and any issues you encounter. We welcome your contributions, +whether that be verbal feedback or code contributions, via our +[Gitter channel](https://gitter.im/gvisor/community), +[email list](https://groups.google.com/forum/#!forum/gvisor-users), +[issue tracker](https://gvisor.dev/issue/new), and +[Github repository](https://github.com/google/gvisor). Feel free to express +interest in an [open issue](https://gvisor.dev/issue/), or reach out if you +aren't sure where to start. diff --git a/website/blog/BUILD b/website/blog/BUILD new file mode 100644 index 000000000..01c1f5a6e --- /dev/null +++ b/website/blog/BUILD @@ -0,0 +1,37 @@ +load("//website:defs.bzl", "doc", "docs") + +package( + default_visibility = ["//website:__pkg__"], + licenses = ["notice"], +) + +exports_files(["index.html"]) + +doc( + name = "security_basics", + src = "2019-11-18-security-basics.md", + authors = [ + "jsprad", + "zkoopmans", + ], + layout = "post", + permalink = "/blog/2019/11/18/gvisor-security-basics-part-1/", +) + +doc( + name = "networking_security", + src = "2020-04-02-networking-security.md", + authors = [ + "igudger", + ], + layout = "post", + permalink = "/blog/2020/04/02/gvisor-networking-security/", +) + +docs( + name = "posts", + deps = [ + ":" + rule + for rule in existing_rules() + ], +) |