summaryrefslogtreecommitdiffhomepage
path: root/pkg
AgeCommit message (Collapse)Author
2021-05-14Don't read forwarding from netstack in sentryGhanan Gowripalan
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables: ip_forward - BOOLEAN 0 - disabled (default) not 0 - enabled Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers) /proc/sys/net/ipv4/ip_forward only does work when its value is changed and always returns the last written value. The last written value may not reflect the current state of the netstack (e.g. when `ip_forward` was written a value of "1" then disable forwarding on an interface) so there is no need for sentry to probe netstack to get the current forwarding state of interfaces. ``` ~$ cat /proc/sys/net/ipv4/ip_forward 0 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo sysctl -w net.ipv4.conf.wlp1s0.forwarding=0 net.ipv4.conf.wlp1s0.forwarding = 0 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo bash -c "echo 0 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward sysctl: unable to open directory "/proc/sys/fs/binfmt_misc/" net.ipv4.conf.all.forwarding = 0 net.ipv4.conf.default.forwarding = 0 net.ipv4.conf.eno1.forwarding = 0 net.ipv4.conf.lo.forwarding = 0 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 0 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 0 ``` In the above example we can see that writing "1" to /proc/sys/net/ipv4/ip_forward configures the stack to be a router (all interfaces are configured to enable forwarding). However, if we manually update an interace (`wlp1s0`) to not forward packets, /proc/sys/net/ipv4/ip_forward continues to return the last written value of "1", even though not all interfaces will forward packets. Also note that writing the same value twice has no effect; work is performed iff the value changes. This change also removes the 'unset' state from sentry's ip forwarding data structures as an 'unset' ip forwarding value is the same as leaving forwarding disabled as the stack is always brought up with forwarding initially disabled; disabling forwarding on a newly created stack is a no-op. PiperOrigin-RevId: 373853106
2021-05-14pkg/buffer: Remove dependency to safemem, code no longer usedTing-Yu Wang
PiperOrigin-RevId: 373846881
2021-05-14Fix panic on consume in a mixed push/consume caseTing-Yu Wang
headerOffset() is incorrectly taking account of previous push(), so it thinks there is more data to consume. This change switches to use pk.reserved as pivot point. Reported-by: syzbot+64fef9acd509976f9ce7@syzkaller.appspotmail.com PiperOrigin-RevId: 373846283
2021-05-14Fix cgroup hierarchy registration.Rahat Mahmood
Previously, registration was racy because we were publishing hierarchies in the registry without fully initializing the underlying filesystem. This led to concurrent mount(2)s discovering the partially intialized filesystems and dropping the final refs on them which cause them to be freed prematurely. Reported-by: syzbot+13f54e77bdf59f0171f0@syzkaller.appspotmail.com Reported-by: syzbot+2c7f0a9127ac6a84f17e@syzkaller.appspotmail.com PiperOrigin-RevId: 373824552
2021-05-13Check filter table when forwarding IP packetsGhanan Gowripalan
This change updates the forwarding path to perform the forwarding hook with iptables so that the filter table is consulted before a packet is forwarded Updates #170. Test: iptables_test.TestForwardingHook PiperOrigin-RevId: 373702359
2021-05-13Apply SWS avoidance to ACKs with window updatesMithun Iyer
When recovering from a zero-receive-window situation, and asked to send out an ACK, ensure that we apply SWS avoidance in our window updates. Fixes #5984 PiperOrigin-RevId: 373689578
2021-05-13Migrate PacketBuffer to use pkg/bufferTing-Yu Wang
Benchmark iperf3: Before After native->runsc 5.14 5.01 (Gbps) runsc->native 4.15 4.07 (Gbps) It did introduce overhead, mainly at the bridge between pkg/buffer and VectorisedView, the ExtractVV method. Once endpoints start migrating away from VV, this overhead will be gone. Updates #2404 PiperOrigin-RevId: 373651666
2021-05-13Rename SetForwarding to SetForwardingDefaultAndAllNICsGhanan Gowripalan
...to make it clear to callers that all interfaces are updated with the forwarding flag and that future NICs will be created with the new forwarding state. PiperOrigin-RevId: 373618435
2021-05-12Fix TODO comments.Ian Lewis
Fix TODO comments referring to incorrect issue numbers. Also fix the link in issue reviver comments to include the right url fragment. PiperOrigin-RevId: 373491821
2021-05-12Send ICMP errors when unable to forward fragmented packetsNick Brown
Before this change, we would silently drop packets when the packet was too big to be sent out through the NIC (and, for IPv4 packets, if DF was set). This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when the packet can't be fragmented. PiperOrigin-RevId: 373480078
2021-05-12Merge pull request #5975 from kevinGC:align32-mipsgVisor bot
PiperOrigin-RevId: 373466994
2021-05-12Fix not calling decRef on merged segmentsTing-Yu Wang
This code path is for outgoing packets, and we don't currently do memory accounting on this path. So it wasn't breaking anything. This change did not add a test for ref-counting issue fixed, but will switch to the leak-checking ref-counter later when all ref-counting issues are fixed. PiperOrigin-RevId: 373447913
2021-05-12Document design details for refsvfs2 template.Dean Deng
PiperOrigin-RevId: 373437576
2021-05-12Internal change.gVisor bot
PiperOrigin-RevId: 373417636
2021-05-12Use an exhaustive list of architecturesKevin Krakauer
2021-05-12enable building //pkg/tcpip on 32-bit MIPSKevin Krakauer
N.B. we don't explicitly support MIPS, but there's no reason it shouldn't work.
2021-05-11Merge pull request #5694 from kevinGC:align32gVisor bot
PiperOrigin-RevId: 373271579
2021-05-11[syserror] Refactor abi/linux.ErrnoZach Koopmans
PiperOrigin-RevId: 373265454
2021-05-11Internal change.gVisor bot
PiperOrigin-RevId: 373221316
2021-05-11Change AcquireAssignedAddress to use RLock.Bhasker Hariharan
This is a hot path for all incoming packets and we don't need an exclusive lock here as we are not modifying any of the fields protected by mu here. PiperOrigin-RevId: 373181254
2021-05-11Move multicounter testutil functions out of network/ipArthur Sfez
This is in preparation of having aggregated NIC stats at the stack level. These validation functions will be needed outside of the network layer packages to test aggregated NIC stats. PiperOrigin-RevId: 373180565
2021-05-11Process Hop-by-Hop header when forwarding IPv6 packetsNick Brown
Currently, we process IPv6 extension headers when receiving packets but not when forwarding them. This is fine for the most part, with with one exception: RFC 8200 requires that we process the Hop-by-Hop headers even while forwarding packets. This CL adds that support by invoking the Hop-by-hop logic performed when receiving packets during forwarding as well. PiperOrigin-RevId: 373145478
2021-05-07explicitly 0-index backing arrayKevin Krakauer
2021-05-07Merge pull request #5758 from zhlhahaha:2125gVisor bot
PiperOrigin-RevId: 372608247
2021-05-07Init all vCPU when initializing machine on ARM64howard zhang
This patch is to solve problem that vCPU timer mess up when adding vCPU dynamically on ARM64, for detailed information please refer to: https://github.com/google/gvisor/issues/5739 There is no influence on x86 and here are main changes for ARM64: 1. create maxVCPUs number of vCPU in machine initialization 2. we want to sync gvisor vCPU number with host CPU number, so use smaller number between runtime.NumCPU and KVM_CAP_MAX_VCPUS to be maxVCPUS 3. put unused vCPUs into architecture-specific map initialvCPUs 4. When machine need to bind a new vCPU with tid, rather than creating new one, it would pick a vCPU from map initalvCPUs 5. change the setSystemTime function. When vCPU number increasing, the time cost for function setTSC(use syscall to set cntvoff) is liner growth from around 300 ns to 100000 ns, and this leads to the function setSystemTimeLegacy can not get correct offset value. 6. initializing StdioFDs and goferFD before a platform to avoid StdioFDs confects with vCPU fds Signed-off-by: howard zhang <howard.zhang@arm.com>
2021-05-06Implement /proc/cmdlineSteve Silva
This change implements /proc/cmdline with a basic faux command line "BOOT_IMAGE=/vmlinuz-[version]-gvisor quiet" so apps that may expect it do not receive errors. Also tests for the existence of /proc/cmdline as part of the system call test suite PiperOrigin-RevId: 372462070
2021-05-06fix build constraintsKevin Krakauer
2021-05-06fix rebase errorKevin Krakauer
2021-05-06mark types as saveableKevin Krakauer
2021-05-06Added more commentsKevin Krakauer
2021-05-06Added commentsKevin Krakauer
2021-05-06Moved to atomicbitops and renamed filesKevin Krakauer
2021-05-06Fix alignment issue with 64-bit atomics on 32 bit machinesKevin Krakauer
2021-05-06Solicit routers as long as RAs are handledGhanan Gowripalan
...to conform with Linux's `accept_ra` sysctl option. ``` accept_ra - INTEGER Accept Router Advertisements; autoconfigure using them. It also determines whether or not to transmit Router Solicitations. If and only if the functional setting is to accept Router Advertisements, Router Solicitations will be transmitted. Possible values are: 0 Do not accept Router Advertisements. 1 Accept Router Advertisements if forwarding is disabled. 2 Overrule forwarding behaviour. Accept Router Advertisements even if forwarding is enabled. Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. ``` With this change, routers may be solicited even if the stack is has forwarding enabled, as long as the interface is configured to handle RAs when forwarding is enabled. PiperOrigin-RevId: 372406501
2021-05-05Allow handling RAs when forwarding is enabledGhanan Gowripalan
...to conform with Linux's `accept_ra` sysctl option. ``` accept_ra - INTEGER Accept Router Advertisements; autoconfigure using them. It also determines whether or not to transmit Router Solicitations. If and only if the functional setting is to accept Router Advertisements, Router Solicitations will be transmitted. Possible values are: 0 Do not accept Router Advertisements. 1 Accept Router Advertisements if forwarding is disabled. 2 Overrule forwarding behaviour. Accept Router Advertisements even if forwarding is enabled. Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. ``` PiperOrigin-RevId: 372214640
2021-05-05Send ICMP errors when the network is unreachableNick Brown
Before this change, we would silently drop packets when unable to determine a route to the destination host. This change brings us into line with RFC 792 (IPv4) and RFC 4443 (IPv6), both of which specify that gateways should return an ICMP error to the sender when unable to reach the destination. Startblock: has LGTM from asfez and then add reviewer ghanan PiperOrigin-RevId: 372214051
2021-05-05Don't cleanup NDP state when enabling forwardingGhanan Gowripalan
...to match linux behaviour: ``` $ sudo sysctl net.ipv6.conf.eno1.forwarding net.ipv6.conf.eno1.forwarding = 0 $ ip addr list dev eno1 2: eno1: <...> ... inet6 PREFIX:TEMP_IID/64 scope global temporary dynamic valid_lft 209363sec preferred_lft 64024sec inet6 PREFIX:GLOBAL_STABLE_IID/64 scope global dynamic mngtmpaddr ... valid_lft 209363sec preferred_lft 209363sec inet6 fe80::LINKLOCAL_STABLE_IID/64 scope link valid_lft forever preferred_lft forever $ sudo sysctl -w "net.ipv6.conf.all.forwarding=1" net.ipv6.conf.all.forwarding = 1 $ sudo sysctl net.ipv6.conf.eno1.forwarding net.ipv6.conf.eno1.forwarding = 1 $ ip addr list dev eno1 2: eno1: <...> ... inet6 PREFIX:TEMP_IID/64 scope global temporary dynamic valid_lft 209339sec preferred_lft 64000sec inet6 PREFIX:GLOBAL_STABLE_IID/64 scope global dynamic mngtmpaddr ... valid_lft 209339sec preferred_lft 209339sec inet6 fe80::LINKLOCAL_STABLE_IID/64 scope link valid_lft forever preferred_lft forever $ ip -6 route list ... PREFIX::/64 dev eno1 proto ra metric 100 expires 209241sec pref medium default via fe80::ROUTER_IID dev eno1 proto ra ... ``` PiperOrigin-RevId: 372146689
2021-05-05Fix a race in reading last seen ICMP error during handshakeMithun Iyer
On receiving an ICMP error during handshake, the error is propagated by reading `endpoint.lastError`. This can race with the socket layer invoking getsockopt() with SO_ERROR where the same value is read and cleared, causing the handshake to bail out with a non-error state. Fix the race by checking for lastError state and failing the handshake with ErrConnectionAborted if the lastError was read and cleared by say SO_ERROR. The race mentioned in the bug, is caught only with the newly added tcp_test unit test, where we have control over stopping/resuming protocol loop. Adding a packetimpact test as well for sanity testing of ICMP error handling during handshake. Fixes #5922 PiperOrigin-RevId: 372135662
2021-05-05[perf] Fix profiling in benchmarking jobs.Ayush Ranjan
Due to https://github.com/moby/moby/issues/42345, the docker daemon is passing the incorrect `--root` flag to runsc. So our profiler is not able to find the container stat files where it expects them to be. PiperOrigin-RevId: 372067954
2021-05-04Fix tcp_test listen backlog expectationMithun Iyer
Listen backlog value is 1 more than what is configured by the socket layer listen call. TestListenBacklogFull expects this behavior which is incorrect as it directly invokes endpoint Listen and with cl/369974744, backlog++ logic is moved to the callers of Listen(). This test passes sometimes, because the handshakes could overlap causing the last SYN to arrive at the listener before the previous handshake is enqueued to the accept queue. In such a case the accept queue is still not full and the SYN is replied to. The final ACK of this last handshake would get dropped eventually. PiperOrigin-RevId: 372041827
2021-05-04Use cmp.Diff for tcpip.Error comparisonMithun Iyer
PiperOrigin-RevId: 372021039
2021-05-04Remove uses of the binary package from the rest of the sentry.Rahat Mahmood
PiperOrigin-RevId: 372020696
2021-05-04Add TODOs to old reference counting utility.Dean Deng
PiperOrigin-RevId: 372012795
2021-05-04Make Mount.Type optional for bind mountsFabricio Voznika
According to the OCI spec Mount.Type is an optional field and it defaults to "bind" when any of "bind" or "rbind" is included in Mount.Options. Also fix the shim to remove bind/rbind from options when mount is converted from bind to tmpfs inside the Sentry. Fixes #2330 Fixes #3274 PiperOrigin-RevId: 371996891
2021-05-03Merge pull request #5903 from zchee:safecopy/fix-argsgVisor bot
PiperOrigin-RevId: 371829568
2021-05-03Implement standard clock safelyGhanan Gowripalan
Previously, tcpip.StdClock depended on linking with the unexposed method time.now to implement tcpip.Clock using the time package. This change updates the standard clock to not require manually linking to this unexported method and use publicly documented functions from the time package. PiperOrigin-RevId: 371805101
2021-05-03Convey GSO capabilities through GSOEndpointGhanan Gowripalan
...as all GSO capable endpoints must implement GSOEndpoint. PiperOrigin-RevId: 371804175
2021-05-03netstack: Add a test for mixed Push/ConsumeTing-Yu Wang
Not really designed to be used this way, but it works and it's been relied upon. Add a test. PiperOrigin-RevId: 371802756
2021-05-03Fix deadlock in /proc/[pid]/fd/[num]Fabricio Voznika
In order to resolve path names, fsSymlink.Readlink() may need to reenter kernfs. Change the code so that kernfs.Inode.Readlink() is called without locks and document the new contract. PiperOrigin-RevId: 371770222
2021-05-01[perf] Check caching on IncRef'd dentries before the others.Ayush Ranjan
When a child is added to a parent (directory) dentry, both child and parent are queued for checkCachingLocked(). Make sure that the parent is queued first because the parent gained a ref and so could be removed from the LRU cache hence making space for the new child. This could prevent an LRU cache eviction. In practice, this did seem to help. ~800 RPCs were reduced while building //absl/... (ABSL build benchmark). Evictions hurt in 2 ways - create renameMu contention and destroy a possibly useful dentry which will have to be re-walked and re-opened later. Follow up fix for #5859. PiperOrigin-RevId: 371509392