summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip
AgeCommit message (Collapse)Author
2021-09-14Fix bug in RecvMMsgDispatcher.Bhasker Hariharan
Fixed a bug introduced in the following commit: https://github.com/google/gvisor/commit/979d6e7d77b17e94defc29515180cc75d3560383 The commit introduced a bug which causes the recvmmsg dispatcher to never exit as BlockingPoll is now called with two fds and poll will not return an error anymore if one of the FD is closed. We need to explicitly check the events for each FD to determine if the sentry FD is closed. ReadV dispatcher does not have the same issue as Readv does not rely on sk_err field of the underlying socket to determine if the socket is in an error state. Recvmmsg OTOH seems to get confused and always returns EAGAIN if poll() is called which queries the sk_err field and clears it. PiperOrigin-RevId: 396676135
2021-09-14Defer mutex unlockingGhanan Gowripalan
PiperOrigin-RevId: 396670516
2021-09-13Accept packets destined to bound addressGhanan Gowripalan
...if bound to an address. We previously checked the source of a packet instead of the destination of a packet when bound to an address. PiperOrigin-RevId: 396497647
2021-09-13Set NICID before delivering packet to raw endpointGhanan Gowripalan
...as raw endpoints expect the packet's NICID to be set. PiperOrigin-RevId: 396446552
2021-09-13Separate IPv4 ToS & IPv6 TClass in dgram endpointGhanan Gowripalan
Setting the ToS for IPv4 packets (SOL_IP, IP_TOS) should not affect the Traffic Class of IPv6 packets (SOL_IPV6, IPV6_TCLASS). Also only return the ToS value XOR Traffic Class as a packet cannot be both an IPv4 and an IPv6 packet; It is invalid to return both the IPv4 ToS and IPv6 Traffic Class control messages when reading packets. Updates #6389. PiperOrigin-RevId: 396399096
2021-09-13Support anonymous structs in checklocks.Adin Scannell
Fixes #6558 PiperOrigin-RevId: 396393293
2021-09-09Remove linux-compat loopback hacks from packet endpointGhanan Gowripalan
Previously, gVisor did not represent loopback devices as an ethernet device as Linux does. To maintain Linux API compatibility for packet sockets, a workaround was used to add an ethernet header if a link header was not already present in the packet buffer delivered to a packet endpoint. However, this workaround is a bug for non-ethernet based interfaces; not all links use an ethernet header (e.g. pure L3/TUN interfaces). As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents loopback devices as an ethernet-based device so this workaround can now be removed. BUG: https://fxbug.dev/81592 Updates #6530, #6531. PiperOrigin-RevId: 395819151
2021-09-09Remove link/packetsocketGhanan Gowripalan
This change removes NetworkDispatcher.DeliverOutboundPacket. Since all packet writes go through the NIC (the only NetworkDispatcher), we can deliver outgoing packets to interested packet endpoints before writing the packet to the link endpoint as the stack expects that all packets that get delivered to a link endpoint are transmitted on the wire. That is, link endpoints no longer need to let the stack know when it writes a packet as the stack already knows about the packet it writes through a link endpoint. PiperOrigin-RevId: 395761629
2021-09-07Remove protocolMainLoop unused return valueArthur Sfez
PiperOrigin-RevId: 395325998
2021-09-01Support sending with packet socketsGhanan Gowripalan
...through the loopback interface, only. This change only supports sending on packet sockets through the loopback interface as the loopback interface is the only interface used in packet socket syscall tests - the other link endpoints are not excercised with the existing test infrastructure. Support for sending on packet sockets through the other interfaces will be added as needed. BUG: https://fxbug.dev/81592 PiperOrigin-RevId: 394368899
2021-09-01Out-of-order segment should not block in-sequence segments.Bhasker Hariharan
For a small receive buffer the first out-of-order segment will get accepted and fill up the receive buffer today. This change now includes the size of the out-of-order segment when checking whether to queue the out of order segment or not. PiperOrigin-RevId: 394351309
2021-09-01Extract network datagram endpoint common facilitiesGhanan Gowripalan
...from the UDP endpoint. Datagram-based transport endpoints (e.g. UDP, RAW IP) can share a lot of their write path due to the datagram-based nature of these endpoints. Extract the common facilities from UDP so they can be shared with other transport endpoints (in a later change). Test: UDP syscall tests. PiperOrigin-RevId: 394347774
2021-08-30Avoid pseudo endpoint for TSVal generationZeling Feng
PiperOrigin-RevId: 393808461
2021-08-27Add LinkEndpoint.WriteRawPacket with stubsGhanan Gowripalan
...returning unsupported errors. PiperOrigin-RevId: 393388991
2021-08-26Add Stack.Seed() backZeling Feng
... because it is still used by fuchsia. PiperOrigin-RevId: 393246904
2021-08-26Centralize TCP timestamp logicTamir Duberstein
Remove freestanding functions that convert time values to raw integers; centralize time->uint32 logic in methods on tcp.endpoint. Importantly, the knowledge that TSVal is in milliseconds now lives in adjacent functions rather than being spread around various files. Incidental cleanup: - Remove unused constant - Remove redundant conversion - Remove redundant parentheses - Add missing error check PiperOrigin-RevId: 393184768
2021-08-26Avoid unhandled error warningsTamir Duberstein
PiperOrigin-RevId: 393104589
2021-08-26Remove unused argumentTamir Duberstein
PiperOrigin-RevId: 393100095
2021-08-26Pass must-not-be-nil by valueTamir Duberstein
PiperOrigin-RevId: 393095246
2021-08-25Improve TestTimestampSynCookiesZeling Feng
.. by advancing the clock so that NowMonotonic does not return 0. PiperOrigin-RevId: 393005373
2021-08-25Avoid the appearance of allocationTamir Duberstein
PiperOrigin-RevId: 393004533
2021-08-24Measure RTT during handshake since Linux does the sameZeling Feng
Some tcp unit tests are affected by this change: - Some retransmission tests assumed RTO=1s when connection is established. This is no longer true because minRTO was set to 3s in tests so now RTO becomes 3s after the first updateRTO call. Set minRTO=1s for these tests. - Some RACK enabled tests are affected because now that RTT is initialized, and the estimated RTT is quite small, spurious TLP might be sent out and causing flakes, introduce an artificial delay for these tests so that the estimated RTT is larger. PiperOrigin-RevId: 392768725
2021-08-19Add loopback interface as an ethernet-based deviceGhanan Gowripalan
...to match Linux behaviour. We can see evidence of Linux representing loopback as an ethernet-based device below: ``` # EUI-48 based MAC addresses. $ ip link show lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # tcpdump showing ethernet frames when sniffing loopback and logging the # link-type as EN10MB (Ethernet). $ sudo tcpdump -i lo -e -c 2 -n tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0 03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0 2 packets captured 116 packets received by filter 0 packets dropped by kernel ``` Wireshark shows a similar result as the tcpdump example above. Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162 PiperOrigin-RevId: 391836719
2021-08-19Use a hash function to generate tcp timestamp offsetZeling Feng
Also fix an option parsing error in checker.TCPTimestampChecker while I am here. PiperOrigin-RevId: 391828329
2021-08-18Split TCP secrets from Stack to tcp.protocolZeling Feng
Use different secrets for different purposes (port picking, ISN generation, tsOffset generation) and moved the secrets from stack.Stack to tcp.protocol. PiperOrigin-RevId: 391641238
2021-08-13[syserror] Remove pkg syserror.Zach Koopmans
Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-13Free multicastMemberships on UDP endpoint closeGhanan Gowripalan
tcpip.Endpoint.Close is documented to free all resources associated with an endpoint so we don't need to create an empty map to clear the multicast memberships. PiperOrigin-RevId: 390609826
2021-08-12[syserror] Convert remaining syserror definitions to linuxerr.Zach Koopmans
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-08-12Add support for TCP send buffer auto tuning.Nayana Bidari
Send buffer size in TCP indicates the amount of bytes available for the sender to transmit. This change will allow TCP to update the send buffer size when - TCP enters established state. - ACK is received. The auto tuning is disabled when the send buffer size is set with the SO_SNDBUF option. PiperOrigin-RevId: 390312274
2021-08-11[op] Make PacketBuffer Clone() do a deeper copy.Ayush Ranjan
Earlier PacketBuffer.Clone() would do a shallow top level copy of the packet buffer - which involved sharing the *buffer.Buffer between packets. Reading or writing to the buffer in one packet would impact the other. This caused modifications in one packet to affect the other's pkt.Views() which is not desired. Change the clone to do a deeper copy of the underlying buffer list and buffer pointers. The payload buffers (which are immutable) are still shared. This change makes the Clone() operation more expensive as we now need to allocate the entire buffer list. Added unit test to test integrity of packet data after cloning. Reported-by: syzbot+7ffff9a82a227b8f2e31@syzkaller.appspotmail.com Reported-by: syzbot+7d241de0d9072b2b6075@syzkaller.appspotmail.com Reported-by: syzbot+212bc4d75802fa461521@syzkaller.appspotmail.com PiperOrigin-RevId: 390277713
2021-08-05Automated rollback of changelist 384508720Kevin Krakauer
PiperOrigin-RevId: 389035388
2021-07-30Support RTM_DELLINKZeling Feng
This change will allow us to remove the default link in a packetimpact test so we can reduce indeterministic behaviors as required in https://fxbug.dev/78430. This will also help with testing #1388. Updates #578, #1388. PiperOrigin-RevId: 387896847
2021-07-30checklinkname: rudimentary type-checking of linkname directivesMichael Pratt
This CL introduces a 'checklinkname' analyzer, which provides rudimentary type-checking that verifies that function signatures on the local and remote sides of //go:linkname directives match expected values. If the Go standard library changes the definitions of any of these function, checklinkname will flag the change as a finding, providing an error informing the gVisor team to adapt to the upstream changes. This allows us to eliminate the majority of gVisor's forward-looking negative build tags, as we can catch mismatches in testing [1]. The remaining forward-looking negative build tags are covering shared struct definitions, which I hope to add to checklinkname in a future CL. [1] Of course, semantics/requirements can change without the signature changing, so we still must be careful, but this covers the common case. PiperOrigin-RevId: 387873847
2021-07-28Explicitly encode the pcap packet headers to reduce CPU cost of pcap generation.gVisor bot
PiperOrigin-RevId: 387513118
2021-07-21Add metric to count number of segments acknowledged by DSACK.Nayana Bidari
- Creates new metric "/tcp/segments_acked_with_dsack" to count the number of segments acked with DSACK. - Added check to verify the metric is getting incremented when a DSACK is sent in the unit tests. PiperOrigin-RevId: 386135949
2021-07-20Enable RACK by default in netstack.Nayana Bidari
PiperOrigin-RevId: 385944428
2021-07-20Expose local address from raw socketsGhanan Gowripalan
PiperOrigin-RevId: 385940836
2021-07-20Add go:build directives as required by Go 1.17's gofmt.Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-15netstack: support SO_RCVBUFFORCEKevin Krakauer
TCP is fully supported. As with SO_RCVBUF, other transport protocols perform no-ops per DefaultSocketOptionsHandler.OnSetReceiveBufferSize. PiperOrigin-RevId: 385023239
2021-07-14Set tcp endpoint state atomicallyTamir Duberstein
PiperOrigin-RevId: 384776517
2021-07-13netstack: atomically update buffer sizesKevin Krakauer
Previously, two calls to set the send or receive buffer size could have raced and left state wherein: - The actual size depended on one call - The value returned by getsockopt() depended on the other PiperOrigin-RevId: 384508720
2021-07-13Deflake TestRouterSolicitationGhanan Gowripalan
Before this change, transmission of the first router solicitation races with the adding of an IPv6 link-local address. This change creates the NIC in the disabled state and is only enabled after the address is added (if required) to avoid this race. PiperOrigin-RevId: 384493553
2021-07-12netstack: move SO_SNDBUF/RCVBUF clamping logic out of //pkg/tcpipKevin Krakauer
- Keeps Linux-specific behavior out of //pkg/tcpip - Makes it clearer that clamping is done only for setsockopt calls from users - Removes code duplication PiperOrigin-RevId: 384389809
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-12Prevent interleaving in sniffer pcap outputTamir Duberstein
Remove "partial write" handling as io.Writer.Write is not permitted to return a nil error on partial writes, and this code was already panicking on non-nil errors. PiperOrigin-RevId: 384289970
2021-07-08Do not queue zero sized segments.Bhasker Hariharan
Commit 16b751b6c610ec2c5a913cb8a818e9239ee7da71 introduced a bug where writes of zero size would end up queueing a zero sized segment which will cause the sandbox to panic when trying to send a zero sized segment(e.g. after an RTO) as netstack asserts that the all non FIN segments have size > 0. This change adds the check for a zero sized payload back to avoid queueing such segments. The associated test panics without the fix and passes with it. PiperOrigin-RevId: 383677884
2021-07-07Move time.Now() call to snifferTamir Duberstein
PiperOrigin-RevId: 383481745
2021-07-07Use time package-level variableTamir Duberstein
PiperOrigin-RevId: 383426091
2021-07-02Discover more specific routes as per RFC 4191Ghanan Gowripalan
More-specific route discovery allows hosts to pick a more appropriate router for off-link destinations. Fixes #6172. PiperOrigin-RevId: 382779880
2021-07-01Mix checklocks and atomic analyzers.Adin Scannell
This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300