summaryrefslogtreecommitdiffhomepage
path: root/pkg/tcpip/link
AgeCommit message (Collapse)Author
2021-10-07add convenient wrapper for eventfdKevin Krakauer
The same create/write/read pattern is copied around several places. It's easier to understand in a package with names and comments, and we can reuse the smart blocking code in package rawfile. PiperOrigin-RevId: 401647108
2021-10-05Add server implementation for sharedmem endpoints.Bhasker Hariharan
PiperOrigin-RevId: 401088040
2021-09-14Fix bug in RecvMMsgDispatcher.Bhasker Hariharan
Fixed a bug introduced in the following commit: https://github.com/google/gvisor/commit/979d6e7d77b17e94defc29515180cc75d3560383 The commit introduced a bug which causes the recvmmsg dispatcher to never exit as BlockingPoll is now called with two fds and poll will not return an error anymore if one of the FD is closed. We need to explicitly check the events for each FD to determine if the sentry FD is closed. ReadV dispatcher does not have the same issue as Readv does not rely on sk_err field of the underlying socket to determine if the socket is in an error state. Recvmmsg OTOH seems to get confused and always returns EAGAIN if poll() is called which queries the sk_err field and clears it. PiperOrigin-RevId: 396676135
2021-09-09Remove linux-compat loopback hacks from packet endpointGhanan Gowripalan
Previously, gVisor did not represent loopback devices as an ethernet device as Linux does. To maintain Linux API compatibility for packet sockets, a workaround was used to add an ethernet header if a link header was not already present in the packet buffer delivered to a packet endpoint. However, this workaround is a bug for non-ethernet based interfaces; not all links use an ethernet header (e.g. pure L3/TUN interfaces). As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents loopback devices as an ethernet-based device so this workaround can now be removed. BUG: https://fxbug.dev/81592 Updates #6530, #6531. PiperOrigin-RevId: 395819151
2021-09-09Remove link/packetsocketGhanan Gowripalan
This change removes NetworkDispatcher.DeliverOutboundPacket. Since all packet writes go through the NIC (the only NetworkDispatcher), we can deliver outgoing packets to interested packet endpoints before writing the packet to the link endpoint as the stack expects that all packets that get delivered to a link endpoint are transmitted on the wire. That is, link endpoints no longer need to let the stack know when it writes a packet as the stack already knows about the packet it writes through a link endpoint. PiperOrigin-RevId: 395761629
2021-09-01Support sending with packet socketsGhanan Gowripalan
...through the loopback interface, only. This change only supports sending on packet sockets through the loopback interface as the loopback interface is the only interface used in packet socket syscall tests - the other link endpoints are not excercised with the existing test infrastructure. Support for sending on packet sockets through the other interfaces will be added as needed. BUG: https://fxbug.dev/81592 PiperOrigin-RevId: 394368899
2021-08-27Add LinkEndpoint.WriteRawPacket with stubsGhanan Gowripalan
...returning unsupported errors. PiperOrigin-RevId: 393388991
2021-08-19Add loopback interface as an ethernet-based deviceGhanan Gowripalan
...to match Linux behaviour. We can see evidence of Linux representing loopback as an ethernet-based device below: ``` # EUI-48 based MAC addresses. $ ip link show lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # tcpdump showing ethernet frames when sniffing loopback and logging the # link-type as EN10MB (Ethernet). $ sudo tcpdump -i lo -e -c 2 -n tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0 03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0 2 packets captured 116 packets received by filter 0 packets dropped by kernel ``` Wireshark shows a similar result as the tcpdump example above. Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162 PiperOrigin-RevId: 391836719
2021-08-13[syserror] Remove pkg syserror.Zach Koopmans
Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12[syserror] Convert remaining syserror definitions to linuxerr.Zach Koopmans
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-07-30Support RTM_DELLINKZeling Feng
This change will allow us to remove the default link in a packetimpact test so we can reduce indeterministic behaviors as required in https://fxbug.dev/78430. This will also help with testing #1388. Updates #578, #1388. PiperOrigin-RevId: 387896847
2021-07-30checklinkname: rudimentary type-checking of linkname directivesMichael Pratt
This CL introduces a 'checklinkname' analyzer, which provides rudimentary type-checking that verifies that function signatures on the local and remote sides of //go:linkname directives match expected values. If the Go standard library changes the definitions of any of these function, checklinkname will flag the change as a finding, providing an error informing the gVisor team to adapt to the upstream changes. This allows us to eliminate the majority of gVisor's forward-looking negative build tags, as we can catch mismatches in testing [1]. The remaining forward-looking negative build tags are covering shared struct definitions, which I hope to add to checklinkname in a future CL. [1] Of course, semantics/requirements can change without the signature changing, so we still must be careful, but this covers the common case. PiperOrigin-RevId: 387873847
2021-07-28Explicitly encode the pcap packet headers to reduce CPU cost of pcap generation.gVisor bot
PiperOrigin-RevId: 387513118
2021-07-20Add go:build directives as required by Go 1.17's gofmt.Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-12Prevent interleaving in sniffer pcap outputTamir Duberstein
Remove "partial write" handling as io.Writer.Write is not permitted to return a nil error on partial writes, and this code was already panicking on non-nil errors. PiperOrigin-RevId: 384289970
2021-07-07Move time.Now() call to snifferTamir Duberstein
PiperOrigin-RevId: 383481745
2021-07-07Use time package-level variableTamir Duberstein
PiperOrigin-RevId: 383426091
2021-06-30[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.Zach Koopmans
Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29Merge pull request #6085 from liornm:fix-tun-no_pigVisor bot
PiperOrigin-RevId: 382202462
2021-06-29[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVALZach Koopmans
Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-29Fix TUN IFF_NO_PI bugliornm
When TUN is created with IFF_NO_PI flag, there will be no Ethernet header and no packet info, therefore, both read and write will fail. This commit fix this bug.
2021-06-24Internal change.Jamie Liu
PiperOrigin-RevId: 381375705
2021-06-09Avoid fanout group collisions with best effortKevin Krakauer
Running multiple instances of netstack in the same network namespace can cause collisions when enabling packet fanout for fdbased endpoints. The only bulletproof fix is to run in different network namespaces, but by using `getpid()` instead of 0 as the fanout ID starting point we can avoid collisions in the common case, particularly when testing/experimenting. Addresses #6124
2021-05-22Remove detritusTamir Duberstein
- Unused constants - Unused functions - Unused arguments - Unkeyed literals - Unnecessary conversions PiperOrigin-RevId: 375253464
2021-05-03Convey GSO capabilities through GSOEndpointGhanan Gowripalan
...as all GSO capable endpoints must implement GSOEndpoint. PiperOrigin-RevId: 371804175
2021-04-27Remove uses of the binary package from networking code.Rahat Mahmood
Co-Author: ayushranjan PiperOrigin-RevId: 370785009
2021-04-21Only carry GSO options in the packet bufferGhanan Gowripalan
With this change, GSO options no longer needs to be passed around as a function argument in the write path. This change is done in preparation for a later change that defers segmentation, and may change GSO options for a packet as it flows down the stack. Updates #170. PiperOrigin-RevId: 369774872
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-24Fix data race in fdbased when accessing fanoutID.Bhasker Hariharan
PiperOrigin-RevId: 364859173
2021-03-15Make netstack (//pkg/tcpip) buildable for 32 bitKevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-09Give TCP flags a dedicated typeZeling Feng
- Implement Stringer for it so that we can improve error messages. - Use TCPFlags through the code base. There used to be a mixed usage of byte, uint8 and int as TCP flags. PiperOrigin-RevId: 361940150
2021-03-03Make dedicated methods for data operations in PacketBufferTing-Yu Wang
One of the preparation to decouple underlying buffer implementation. There are still some methods that tie to VectorisedView, and they will be changed gradually in later CLs. This CL also introduce a new ICMPv6ChecksumParams to replace long list of parameters when calling ICMPv6Checksum, aiming to be more descriptive. PiperOrigin-RevId: 360778149
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-24Move //pkg/gate.Gate to //pkg/sync.Jamie Liu
- Use atomic add rather than CAS in every Gate method, which is slightly faster in most cases. - Implement Close wakeup using gopark/goready to avoid channel allocation. New benchmarks: name old time/op new time/op delta GateEnterLeave-12 16.7ns ± 1% 10.3ns ± 1% -38.44% (p=0.000 n=9+8) GateClose-12 50.2ns ± 8% 42.4ns ± 6% -15.44% (p=0.000 n=10+10) GateEnterLeaveAsyncClose-12 972ns ± 2% 640ns ± 7% -34.15% (p=0.000 n=9+10) PiperOrigin-RevId: 359336344
2021-02-18Bump build constraints to Go 1.18Michael Pratt
These are bumped to allow early testing of Go 1.17. Use will be audited closer to the 1.17 release. PiperOrigin-RevId: 358278615
2021-02-06Synchronously send packets over pipe link endpointGhanan Gowripalan
Before this change, packets were delivered asynchronously to the remote end of a pipe. This was to avoid a deadlock during link resolution where the stack would attempt to double-lock a mutex (see removed comments in the parent commit for details). As of https://github.com/google/gvisor/commit/4943347137, we do not hold locks while sending link resolution probes so the deadlock will no longer occur. PiperOrigin-RevId: 356066224
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-25fdbased: Dedup code related to iovec readingTing-Yu Wang
PiperOrigin-RevId: 353755271
2021-01-21Queue packets in WritePackets when resolving link addressGhanan Gowripalan
Test: integration_test.TestWritePacketsLinkResolution Fixes #4458. PiperOrigin-RevId: 353108826
2021-01-21Populate EgressRoute, GSO, Netproto in NICGhanan Gowripalan
fdbased and qdisc layers expect these fields to already be populated before being reached. PiperOrigin-RevId: 353099492
2021-01-15Support GetLinkAddress with neighborCacheGhanan Gowripalan
Test: integration_test.TestGetLinkAddress PiperOrigin-RevId: 352119404
2021-01-15Only pass stack.Route's fields to LinkEndpointsGhanan Gowripalan
stack.Route is used to send network packets and resolve link addresses. A LinkEndpoint does not need to do either of these and only needs the route's fields at the time of the packet write request. Since LinkEndpoints only need the route's fields when writing packets, pass a stack.RouteInfo instead. PiperOrigin-RevId: 352108405
2021-01-15Populate EgressRoute, GSO, Netproto for batch writesGhanan Gowripalan
We loop over the list of packets anyways so setting these aren't expensive. Now that they are populated only by the link endpoint that uses them, TCP does not need to. PiperOrigin-RevId: 352090853
2021-01-13Switch uses of os.Getenv that check for empty string to os.LookupEnv.Dean Deng
Whether the variable was found is already returned by syscall.Getenv. os.Getenv drops this value while os.Lookupenv passes it along. PiperOrigin-RevId: 351674032
2021-01-12Fix simple mistakes identified by goreportcard.Adin Scannell
These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-06Do not filter frames in ethernet link endpointGhanan Gowripalan
Ethernet frames are usually filtered at the hardware-level so there is no need to filter the frames in software. For test purposes, a new link endpoint was introduced to filter frames based on their destination. PiperOrigin-RevId: 350422941
2020-12-22Correctly log sniffed ARP packetsTamir Duberstein
This condition was inverted in 360006d. PiperOrigin-RevId: 348679088
2020-12-22Invoke address resolution upon subsequent traffic to Failed neighborPeter Johnston
Removes the period of time in which subseqeuent traffic to a Failed neighbor immediately fails with ErrNoLinkAddress. A Failed neighbor is one in which address resolution fails; or in other words, the neighbor's IP address cannot be translated to a MAC address. This means removing the Failed state for linkAddrCache and allowing transitiong out of Failed into Incomplete for neighborCache. Previously, both caches would transition entries to Failed after address resolution fails. In this state, any subsequent traffic requested within an unreachable time would immediately fail with ErrNoLinkAddress. This does not follow RFC 4861 section 7.3.3: If address resolution fails, the entry SHOULD be deleted, so that subsequent traffic to that neighbor invokes the next-hop determination procedure again. Invoking next-hop determination at this point ensures that alternate default routers are tried. The API for getting a link address for a given address, whether through the link address cache or the neighbor table, is updated to optionally take a callback which will be called when address resolution completes. This allows `Route` to handle completing link resolution internally, so callers of (*Route).Resolve (e.g. endpoints) don’t have to keep track of when it completes and update the Route accordingly. This change also removes the wakers from LinkAddressCache, NeighborCache, and Route in favor of the callbacks, and callers that previously used a waker can now just pass a callback to (*Route).Resolve that will notify the waker on resolution completion. Fixes #4796 Startblock: has LGTM from sbalana and then add reviewer ghanan PiperOrigin-RevId: 348597478
2020-12-10Disable host reassembly for fragments.Bhasker Hariharan
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket to ensure that fragments are delivered inorder to the right dispatcher. But this prevents fragments from being delivered to gvisor at all and makes testing of gvisor's fragment reassembly code impossible. The potential impact from this is minimal since IP Fragmentation is not really that prevelant and in cases where we do get fragments we may deliver the fragment out of order to the TCP layer as multiple network dispatchers may process the fragments and deliver a reassembled fragment after the next packet has been delivered to the TCP endpoint. While not desirable I believe the impact from this is minimal due to low prevalence of fragmentation. Also removed PktType and Hatype fields when binding the socket as these are not used when binding. Its just confusing to have them specified. See: https://man7.org/linux/man-pages/man7/packet.7.html "Fields used for binding are sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex." Fixes #5055 PiperOrigin-RevId: 346919439