summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2021-04-17Avoid ignoring incoming packet by demuxer on endpoint lookup failureMithun Iyer
This fixes a race that occurs while the endpoint is being unregistered and the transport demuxer attempts to match the incoming packet to any endpoint. The race specifically occurs when the unregistration (and deletion of the endpoint) occurs, after a successful endpointsByNIC lookup and before the endpoints map is further looked up with ingress NICID of the packet. The fix is to notify the caller of lookup-with-NICID failure, so that the logic falls through to handling unknown destination packets. For TCP this can mean replying back with RST. The syscall test in this CL catches this race as the ACK completing the handshake could get silently dropped on a listener close, causing no RST sent to the peer and timing out the poll waiting for POLLHUP. Fixes #5850 PiperOrigin-RevId: 369023779
2021-04-16Enlarge port range and fix integer overflowKevin Krakauer
Also count failed TCP port allocations PiperOrigin-RevId: 368939619
2021-04-15Add field support to the sentry metrics.Nayana Bidari
Fields allow counter metrics to have multiple tabular values. At most one field is supported at the moment. PiperOrigin-RevId: 368767040
2021-04-15Reduce tcp_x_test runtime and memory usageKevin Krakauer
Reduce the ephemeral port range, which decreases the calls to makeEP. PiperOrigin-RevId: 368748379
2021-04-15Use nicer formatting for IP addresses in testsKevin Krakauer
This was semi-automated -- there are many addresses that were not replaced. Future commits should clean those up. Parse4 and Parse6 were given their own package because //pkg/test can introduce dependency cycles, as it depends transitively on //pkg/tcpip and some other netstack packages. PiperOrigin-RevId: 368726528
2021-04-14Automatically enforce limited netstack dependenciesKevin Krakauer
Netstack is supposed to be somewhat independent of the rest of gVisor, and others should be able to use it without pulling in excessive dependencies. Currently, there is no way to fight dependency creep besides careful code review. This change introduces a test rule `netstack_deps_check` that ensures the target only relies on gVisor targets and a short allowlist of external dependencies. Users who add a dependency will see an error and have to manually update the allowlist. The set of packages to test comes from //runsc, as it uses packages we would expect users to commonly rely on. It was generated via: $ find ./runsc -name BUILD | xargs grep tcpip | awk '{print $2}' | sort | uniq (Note: We considered giving //pkg/tcpip it's own go.mod, but this breaks go tooling.) PiperOrigin-RevId: 368456711
2021-04-13Fix listener close, client connect raceMithun Iyer
Fix a race where the ACK completing the handshake can be dropped by a closing listener without RST to the peer. The listener close would reset the accepted queue and that causes the connecting endpoint in SYNRCVD state to drop the ACK thinking the queue if filled up. PiperOrigin-RevId: 368165509
2021-04-12Drop locks before calling waiterQueue.NotifyTamir Duberstein
Holding this lock can cause the user's callback to deadlock if it attempts to inspect the accept queue. PiperOrigin-RevId: 368068334
2021-04-10Use the SecureRNG to generate listener noncesTamir Duberstein
Some other cleanup while I'm here: - Remove unused arguments - Handle some unhandled errors - Remove redundant casts - Remove redundant parens - Avoid shadowing `hash` package name PiperOrigin-RevId: 367816161
2021-04-10Don't store accepted endpoints in a channelTamir Duberstein
Use a linked list with cached length and capacity. The current channel is already composed with a mutex and condition variable, and is never used for its channel-like properties. Channels also require eager allocation equal to their capacity, which a linked list does not. PiperOrigin-RevId: 367766626
2021-04-09iptables: support postrouting hook and SNAT targetToshi Kikuchi
The current SNAT implementation has several limitations: - SNAT source port has to be specified. It is not optional. - SNAT source port range is not supported. - SNAT for UDP is a one-way translation. No response packets are handled (because conntrack doesn't support UDP currently). - SNAT and REDIRECT can't work on the same connection. Fixes #5489 PiperOrigin-RevId: 367750325
2021-04-09Move maxListenBacklog check to sentryMithun Iyer
Move maxListenBacklog check to the caller of endpoint Listen so that it is applicable to Unix domain sockets as well. This was changed in cl/366935921. Reported-by: syzbot+a35ae7cdfdde0c41cf7a@syzkaller.appspotmail.com PiperOrigin-RevId: 367728052
2021-04-09Rename IsV6LinkLocalAddress to IsV6LinkLocalUnicastAddressGhanan Gowripalan
To match the V4 variant. PiperOrigin-RevId: 367691981
2021-04-09Remove duplicate accept queue fullness checkTamir Duberstein
Both code paths perform this check; extract it and remove the comment that suggests it is unique to one of the paths. PiperOrigin-RevId: 367666160
2021-04-09Propagate SYN handling errorTamir Duberstein
Both callers of this function still drop this error on the floor, but progress is progress. Updates #4690. PiperOrigin-RevId: 367604788
2021-04-08Do not forward link-local packetsGhanan Gowripalan
As per RFC 3927 section 7 and RFC 4291 section 2.5.6. Test: forward_test.TestMulticastForwarding PiperOrigin-RevId: 367519336
2021-04-08Join all routers group when forwarding is enabledGhanan Gowripalan
See comments inline code for rationale. Test: ip_test.TestJoinLeaveAllRoutersGroup PiperOrigin-RevId: 367449434
2021-04-06Do not perform MLD for certain multicast scopesGhanan Gowripalan
...as per RFC 2710 section 5 page 10. Test: ipv6_test.TestMLDSkipProtocol PiperOrigin-RevId: 367031126
2021-04-05Fix listen backlog handling to be in parity with LinuxMithun Iyer
- Change the accept queue full condition for a listening endpoint to only honor completed (and delivered) connections. - Use syncookies if the number of incomplete connections is beyond listen backlog. This also cleans up the SynThreshold option code as that is no longer used with this change. - Added a new stack option to unconditionally generate syncookies. Similar to sysctl -w net.ipv4.tcp_syncookies=2 on Linux. - Enable keeping of incomplete connections beyond listen backlog. - Drop incoming SYNs only if the accept queue is filled up. - Drop incoming ACKs that complete handshakes when accept queue is full - Enable the stack to accept one more connection than programmed by listen backlog. - Handle backlog argument being zero, negative for listen, as Linux. - Add syscall and packetimpact tests to reflect the changes above. - Remove TCPConnectBacklog test which is polling for completed connections on the client side which is not reflective of whether the accept queue is filled up by the test. The modified syscall test in this CL addresses testing of connecting sockets. Fixes #3153 PiperOrigin-RevId: 366935921
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-24Fix data race in fdbased when accessing fanoutID.Bhasker Hariharan
PiperOrigin-RevId: 364859173
2021-03-24Unexpose immutable fields in stack.RouteNick Brown
This change sets the inner `routeInfo` struct to be a named private member and replaces direct access with access through getters. Note that direct access to the fields of `routeInfo` is still possible through the `RouteInfo` struct. Fixes #4902 PiperOrigin-RevId: 364822872
2021-03-23Use constant (TestInitialSequenceNumber) instead of integer (789) in tests.Nayana Bidari
PiperOrigin-RevId: 364596526
2021-03-23Explicitly allow martian loopback packetsGhanan Gowripalan
...instead of opting out of them. Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. A previous change provided the facility to deny these martian loopback packets but this change requires client to opt-in to accepting martian loopback packets as accepting martian loopback packets are not meant to be accepted, as per RFC 1122 section 3.2.1.3.g: (g) { 127, <any> } Internal host loopback address. Addresses of this form MUST NOT appear outside a host. PiperOrigin-RevId: 364581174
2021-03-22Return tcpip.Error from (*Stack).GetMainNICAddressGhanan Gowripalan
PiperOrigin-RevId: 364381970
2021-03-17Do not use martian loopback packets in testsGhanan Gowripalan
Transport demuxer and UDP tests should not use a loopback address as the source address for packets injected into the stack as martian loopback packets will be dropped in a later change. PiperOrigin-RevId: 363479681
2021-03-17Drop loopback traffic from outside of the stackGhanan Gowripalan
Loopback traffic should be stack-local but gVisor has some clients that depend on the ability to receive loopback traffic that originated from outside of the stack. Because of this, we guard this change behind IP protocol options. Test: integration_test.TestExternalLoopbackTraffic PiperOrigin-RevId: 363461242
2021-03-16Fix tcp_fin_retransmission_netstack_testZeling Feng
Netstack does not check ACK number for FIN-ACK packets and goes into TIMEWAIT unconditionally. Fixing the state machine will give us back the retransmission of FIN. PiperOrigin-RevId: 363301883
2021-03-16Fix a race with synRcvdCount and acceptMithun Iyer
There is a race in handling new incoming connections on a listening endpoint that causes the endpoint to reply to more incoming SYNs than what is permitted by the listen backlog. The race occurs when there is a successful passive connection handshake and the synRcvdCount counter is decremented, followed by the endpoint delivered to the accept queue. In the window of time between synRcvdCount decrementing and the endpoint being enqueued for accept, new incoming SYNs can be handled without honoring the listen backlog value, as the backlog could be perceived not full. Fixes #5637 PiperOrigin-RevId: 363279372
2021-03-16Unexport methods on NDPOptionGhanan Gowripalan
They are not used outside of the header package. PiperOrigin-RevId: 363237708
2021-03-16Detect looped-back NDP DAD messagesGhanan Gowripalan
...as per RFC 7527. If a looped-back DAD message is received, do not fail DAD since our own DAD message does not indicate that a neighbor has the address assigned. Test: ndp_test.TestDADResolveLoopback PiperOrigin-RevId: 363224288
2021-03-16Do not call into Stack from LinkAddressRequestGhanan Gowripalan
Calling into the stack from LinkAddressRequest is not needed as we already have a reference to the network endpoint (IPv6) or network interface (IPv4/ARP). PiperOrigin-RevId: 363213973
2021-03-15Make netstack (//pkg/tcpip) buildable for 32 bitKevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-11improve readability of ports packageKevin Krakauer
Lots of small changes: - simplify package API via Reservation type - rename some single-letter variable names that were hard to follow - rename some types PiperOrigin-RevId: 362442366
2021-03-09Give TCP flags a dedicated typeZeling Feng
- Implement Stringer for it so that we can improve error messages. - Use TCPFlags through the code base. There used to be a mixed usage of byte, uint8 and int as TCP flags. PiperOrigin-RevId: 361940150
2021-03-08Implement /proc/sys/net/ipv4/ip_local_port_rangeKevin Krakauer
Speeds up the socket stress tests by a couple orders of magnitude. PiperOrigin-RevId: 361721050
2021-03-05Increment the counters when sending Echo requestsArthur Sfez
Updates #5597 PiperOrigin-RevId: 361252003
2021-03-05Fix network protocol/endpoint lock order violationGhanan Gowripalan
IPv4 would violate the lock ordering of protocol > endpoint when closing network endpoints by calling `ipv4.protocol.forgetEndpoint` while holding the network endpoint lock. PiperOrigin-RevId: 361232817
2021-03-05Include duplicate address holder info in DADResultGhanan Gowripalan
The integrator may be interested in who owns a duplicate address so pass this information (if available) along. Fixes #5605. PiperOrigin-RevId: 361213556
2021-03-05Make stack.DADResult an interfaceGhanan Gowripalan
While I'm here, update NDPDispatcher.OnDuplicateAddressDetectionStatus to take a DADResult and rename it to OnDuplicateAddressDetectionResult. Fixes #5606. PiperOrigin-RevId: 360965416
2021-03-04Nit fix: Should use maxTimeout in backoffTimerTing-Yu Wang
The only user is in (*handshake).complete and it specifies MaxRTO, so there is no behavior changes. PiperOrigin-RevId: 360954447
2021-03-03Deflake //pkg/tcpip/tests/integration:forward_testTing-Yu Wang
clientEP.Connect may fail because serverEP was not listening. PiperOrigin-RevId: 360780667
2021-03-03Make dedicated methods for data operations in PacketBufferTing-Yu Wang
One of the preparation to decouple underlying buffer implementation. There are still some methods that tie to VectorisedView, and they will be changed gradually in later CLs. This CL also introduce a new ICMPv6ChecksumParams to replace long list of parameters when calling ICMPv6Checksum, aiming to be more descriptive. PiperOrigin-RevId: 360778149
2021-03-03Assert UpdatedAtNanos in neighbor cache testsSam Balana
Changes the neighbor_cache_test.go tests to always assert UpdatedAtNanos. Completes the assertion of UpdatedAtNanos in every NUD test, a field that was historically not checked due to the lack of a deterministic, controllable clock. This is no longer true with the tcpip.Clock interface. While the tests have been adjusted to use Clock, asserting by the UpdatedAtNanos was neglected. Fixes #4663 PiperOrigin-RevId: 360730077
2021-03-03Add checklocks analyzer.Bhasker Hariharan
This validates that struct fields if annotated with "// checklocks:mu" where "mu" is a mutex field in the same struct then access to the field is only done with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name> For more details please refer to README.md. PiperOrigin-RevId: 360729328
2021-03-03Export stats that were forgottenArthur Sfez
While I'm here, simplify the comments and unify naming of certain stats across protocols. PiperOrigin-RevId: 360728849
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-03-02Plumb link address request errors up to requesterTamir Duberstein
Prevent the situation where callers to (*stack).GetLinkAddress provide incorrect arguments and are unable to observe this condition. Updates #5583. PiperOrigin-RevId: 360481557
2021-03-01tcp: endpoint.Write has to send all data that has been read from payloadAndrei Vagin
io.Reader.ReadFull returns the number of bytes copied and an error if fewer bytes were read. PiperOrigin-RevId: 360247614
2021-02-26Fix panic due to zero length writes in TCP.Bhasker Hariharan
There is a short race where in Write an endpoint can transition from writable to non-writable state due to say an incoming RST during the time we release the endpoint lock and reacquire after copying the payload. In such a case if the write happens to be a zero sized write we end up trying to call sendData() even though nothing was queued. This can panic when trying to enable/disable TCP timers if the endpoint had already transitioned to a CLOSED/ERROR state due to the incoming RST as we cleanup timers when the protocol goroutine terminates. Sadly the race window is small enough that my attempts at reproducing the panic in a syscall test has not been successful. PiperOrigin-RevId: 359887905