-rw-r--r-- | content/blog/2_networking_security/index.md | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/content/blog/2_networking_security/index.md b/content/blog/2_networking_security/index.md
index 6b869ce43..29b010820 100644
--- a/content/blog/2_networking_security/index.md
+++ b/content/blog/2_networking_security/index.md
@@ -25,7 +25,7 @@ gVisor manifests these principles as a multi-layered system. An application runn
 gVisor's multi-layered design provides Defense in Depth. The Sentry, which does not trust the application because it may attack the Sentry and try to bypass it, is the first layer. The sandbox that the Sentry runs in is the second layer. If the Sentry were compromised, the attacker would still be in a highly restrictive sandbox which they must also break out of in order to compromise the Host OS.
-To enable networking functionality while preserving gVisor's security properties, we implemented a [userspace network stack](https://github.com/google/gvisor/tree/master/pkg/tcpip) in the Sentry, which we creatively named netstack. Netstack is also written in Go, not only to avoid unsafe code in the network stack itself, but also to avoid a complicated and unsafe Foreign Function Interface. Having its own integrated network stack allows the Sentry to implement networking operations using up to three Host OS syscalls to read and write packets. These syscalls allow a very minimal set of operations which are already allowed (either through the same or a similar syscall). Moreover, because packets typically come from off-host (e.g. the internet), the Host OS's packet processing code has received a lot of scrutiny, hopefully resulting in a high degree of hardening.
+To enable networking functionality while preserving gVisor's security properties, we implemented a [userspace network stack](https://github.com/google/gvisor/tree/master/pkg/tcpip) in the Sentry, which we creatively named Netstack. Netstack is also written in Go, not only to avoid unsafe code in the network stack itself, but also to avoid a complicated and unsafe Foreign Function Interface. Having its own integrated network stack allows the Sentry to implement networking operations using up to three Host OS syscalls to read and write packets. These syscalls allow a very minimal set of operations which are already allowed (either through the same or a similar syscall). Moreover, because packets typically come from off-host (e.g. the internet), the Host OS's packet processing code has received a lot of scrutiny, hopefully resulting in a high degree of hardening.
 ----
@@ -39,24 +39,24 @@ Figure 1: Netstack and gVisor
 Netstack was written from scratch specifically for gVisor. There are now other users (e.g. [Fuchsia](https://fuchsia.googlesource.com/fuchsia/+/refs/heads/master/src/connectivity/network/netstack/)), but they came later. As we discussed, a custom network stack has enabled a variety of security-related goals which would not have been possible any other way. This came at a cost though. Network stacks are complex and writing a new one comes with many challenges, mostly related to application compatibility and performance.
-Compatibility issues typically come in two forms: missing features, and features with behavior that differs from Linux (usually due to bugs). Both of these are inevitable in an implementation of a complex system spanning many quickly evolving and ambiguous standards. However, we have invested heavily in this area, and the vast majority of applications have no issues using netstack. For example, [we now support setting 34 different socket options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764) versus [only 7 in our initial git commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702). We are continuing to make good progress in this area.
+Compatibility issues typically come in two forms: missing features, and features with behavior that differs from Linux (usually due to bugs). Both of these are inevitable in an implementation of a complex system spanning many quickly evolving and ambiguous standards. However, we have invested heavily in this area, and the vast majority of applications have no issues using Netstack. For example, [we now support setting 34 different socket options](https://github.com/google/gvisor/blob/815df2959a76e4a19f5882e40402b9bbca9e70be/pkg/sentry/socket/netstack/netstack.go#L830-L1764) versus [only 7 in our initial git commit](https://github.com/google/gvisor/blob/d02b74a5dcfed4bfc8f2f8e545bca4d2afabb296/pkg/sentry/socket/epsocket/epsocket.go#L445-L702). We are continuing to make good progress in this area.
-Performance issues typically come from TCP behavior and packet processing speed. To improve our TCP behavior, we are working on implementing the full set of TCP RFCs. There are many RFCs which are significant to performance (e.g. [RACK](https://tools.ietf.org/id/draft-ietf-tcpm-rack-03.html) and [BBR](https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00)) that we have yet to implement. This mostly affects TCP performance with non-ideal network conditions (e.g. cross continent connections). Faster packet processing mostly improves TCP performance when network conditions are very good (e.g. within a datacenter). Our primary strategy here is to reduce interactions with the Go runtime, specifically the garbage collector (GC) and scheduler. We are currently optimizing buffer management to reduce the amount of garbage, which will lower the GC cost. To reduce scheduler interactions, we are re-architecting the TCP implementation to use fewer goroutines. Performance today is good enough for most applications and we are making steady improvements. For example, since May of 2019, we have improved the netstack runsc [iperf3 download benchmark](https://github.com/google/gvisor/blob/master/benchmarks/suites/network.py) score by roughly 15% and upload score by around 10,000X. Current numbers are about 17 Gbps download and about 8 Gbps upload versus about 42 Gbps and 43 Gbps for native (Linux) respectively.
+Performance issues typically come from TCP behavior and packet processing speed. To improve our TCP behavior, we are working on implementing the full set of TCP RFCs. There are many RFCs which are significant to performance (e.g. [RACK](https://tools.ietf.org/id/draft-ietf-tcpm-rack-03.html) and [BBR](https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00)) that we have yet to implement. This mostly affects TCP performance with non-ideal network conditions (e.g. cross continent connections). Faster packet processing mostly improves TCP performance when network conditions are very good (e.g. within a datacenter). Our primary strategy here is to reduce interactions with the Go runtime, specifically the garbage collector (GC) and scheduler. We are currently optimizing buffer management to reduce the amount of garbage, which will lower the GC cost. To reduce scheduler interactions, we are re-architecting the TCP implementation to use fewer goroutines. Performance today is good enough for most applications and we are making steady improvements. For example, since May of 2019, we have improved the Netstack runsc [iperf3 download benchmark](https://github.com/google/gvisor/blob/master/benchmarks/suites/network.py) score by roughly 15% and upload score by around 10,000X. Current numbers are about 17 Gbps download and about 8 Gbps upload versus about 42 Gbps and 43 Gbps for native (Linux) respectively.
 ## An alternative
 We also offer an alternative network mode: passthrough. This name can be misleading as syscalls are never passed through from the app to the Host OS. Instead, the passthrough mode implements networking in gVisor using the Host OS's network stack. (This mode is called [hostinet](https://github.com/google/gvisor/tree/master/pkg/sentry/socket/hostinet) in the codebase.) Passthrough mode can improve performance for some use cases as the Host OS's network stack has had an enormous number of person-years poured into making it highly performant. However, there is a rather large downside to using passthrough mode: it weakens gVisor's security model by increasing the Host OS's Attack Surface. This is because using the Host OS's network stack requires the Sentry to use the Host OS's [Berkeley socket interface](https://en.wikipedia.org/wiki/Berkeley_sockets). The Berkeley socket interface is a much larger API surface than the packet interface that our network stack uses. When passthrough mode is in use, the Sentry is allowed to use [15 additional syscalls](https://github.com/google/gvisor/blob/b1576e533223e98ebe4bd1b82b04e3dcda8c4bf1/runsc/boot/filter/config.go#L312-L517). Further, this set of syscalls includes some that allow the Sentry to create file descriptors, something that [we don't normally allow](https://gvisor.dev/blog/2019/11/18/gvisor-security-basics-part-1/#sentry-host-os-interface) as it opens up classes of file-based attacks.
-There are some networking features that we can't implement on top of syscalls that we feel are safe (most notably those behind [ioctl](http://man7.org/linux/man-pages/man2/ioctl.2.html)) and therefore are not supported. Because of this, we actually support fewer networking features in passthrough mode than we do in netstack, reducing application compatibility. That's right: using our networking stack provides better overall application compatibility than using our passthrough mode.
+There are some networking features that we can't implement on top of syscalls that we feel are safe (most notably those behind [ioctl](http://man7.org/linux/man-pages/man2/ioctl.2.html)) and therefore are not supported. Because of this, we actually support fewer networking features in passthrough mode than we do in Netstack, reducing application compatibility. That's right: using our networking stack provides better overall application compatibility than using our passthrough mode.
 That said, gVisor with passthrough networking still provides a high level of isolation. Applications cannot specify host syscall arguments directly, and the sentry's seccomp policy restricts its syscall use significantly more than a general purpose seccomp policy.
 ## Secure by Default
-The goal of the Secure by Default principle is to make it easy to securely sandbox containers. Of course, disabling network access entirely is the most secure option, but that is not practical for most applications. To make gVisor Secure by Default, we have made netstack the default networking mode in gVisor as we believe that it provides significantly better isolation. For this reason we strongly caution users from changing the default unless netstack flat out won't work for them. The passthrough mode option is still provided, but we want users to make an informed decision when selecting it.
+The goal of the Secure by Default principle is to make it easy to securely sandbox containers. Of course, disabling network access entirely is the most secure option, but that is not practical for most applications. To make gVisor Secure by Default, we have made Netstack the default networking mode in gVisor as we believe that it provides significantly better isolation. For this reason we strongly caution users from changing the default unless Netstack flat out won't work for them. The passthrough mode option is still provided, but we want users to make an informed decision when selecting it.
-Another way in which gVisor makes it easy to securely sandbox containers is by allowing applications to run unmodified, with no special configuration needed. In order to do this, gVisor needs to support all of the features and syscalls that applications use. Neither seccomp nor gVisor's passthrough mode can do this as applications commonly use syscalls which are too dangerous to be included in a secure policy. Even if this dream isn't fully realized today, gVisor's architecture with netstack makes this possible.
+Another way in which gVisor makes it easy to securely sandbox containers is by allowing applications to run unmodified, with no special configuration needed. In order to do this, gVisor needs to support all of the features and syscalls that applications use. Neither seccomp nor gVisor's passthrough mode can do this as applications commonly use syscalls which are too dangerous to be included in a secure policy. Even if this dream isn't fully realized today, gVisor's architecture with Netstack makes this possible.
-If you haven't already, try running a workload in gVisor with netstack. You can find instructions on how to get started in our [Quick Start](https://gvisor.dev/docs/user_guide/quick_start/docker/). We want to hear about both your successes and any issues you encounter. We welcome your contributions, whether that be verbal feedback or code contributions, via our [Gitter channel](https://gitter.im/gvisor/community), [email list](https://groups.google.com/forum/#!forum/gvisor-users), [issue tracker](https://gvisor.dev/issue/new), and [Github repository](https://github.com/google/gvisor). Feel free to express interest in an [open issue](https://gvisor.dev/issue/), or reach out if you aren't sure where to start.
+If you haven't already, try running a workload in gVisor with Netstack. You can find instructions on how to get started in our [Quick Start](https://gvisor.dev/docs/user_guide/quick_start/docker/). We want to hear about both your successes and any issues you encounter. We welcome your contributions, whether that be verbal feedback or code contributions, via our [Gitter channel](https://gitter.im/gvisor/community), [email list](https://groups.google.com/forum/#!forum/gvisor-users), [issue tracker](https://gvisor.dev/issue/new), and [Github repository](https://github.com/google/gvisor). Feel free to express interest in an [open issue](https://gvisor.dev/issue/), or reach out if you aren't sure where to start.