summaryrefslogtreecommitdiffhomepage
path: root/pkg
AgeCommit message (Collapse)Author
2020-09-09Add option to replace linkAddrCache with neighborCacheSam Balana
This change adds an option to replace the current implementation of ARP through linkAddrCache, with an implementation of NUD through neighborCache. Switching to using NUD for both ARP and NDP is beneficial for the reasons described by RFC 4861 Section 3.1: "[Using NUD] significantly improves the robustness of packet delivery in the presence of failing routers, partially failing or partitioned links, or nodes that change their link-layer addresses. For instance, mobile nodes can move off-link without losing any connectivity due to stale ARP caches." "Unlike ARP, Neighbor Unreachability Detection detects half-link failures and avoids sending traffic to neighbors with which two-way connectivity is absent." Along with these changes exposes the API for querying and operating the neighbor cache. Operations include: - Create a static entry - List all entries - Delete all entries - Remove an entry by address This also exposes the API to change the NUD protocol constants on a per-NIC basis to allow Neighbor Discovery to operate over links with widely varying performance characteristics. See [RFC 4861 Section 10][1] for the list of constants. Finally, an API for subscribing to NUD state changes is exposed through NUDDispatcher. See [RFC 4861 Appendix C][3] for the list of edges. Tests: pkg/tcpip/network/arp:arp_test + TestDirectRequest pkg/tcpip/network/ipv6:ipv6_test + TestLinkResolution + TestNDPValidation + TestNeighorAdvertisementWithTargetLinkLayerOption + TestNeighorSolicitationResponse + TestNeighorSolicitationWithSourceLinkLayerOption + TestRouterAdvertValidation pkg/tcpip/stack:stack_test + TestCacheWaker + TestForwardingWithFakeResolver + TestForwardingWithFakeResolverManyPackets + TestForwardingWithFakeResolverManyResolutions + TestForwardingWithFakeResolverPartialTimeout + TestForwardingWithFakeResolverTwoPackets + TestIPv6SourceAddressSelectionScopeAndSameAddress [1]: https://tools.ietf.org/html/rfc4861#section-10 [2]: https://tools.ietf.org/html/rfc4861#appendix-C Fixes #1889 Fixes #1894 Fixes #1895 Fixes #1947 Fixes #1948 Fixes #1949 Fixes #1950 PiperOrigin-RevId: 328365034
2020-09-09Support SO_LINGER socket option.Nayana Bidari
When SO_LINGER option is enabled, the close will not return until all the queued messages are sent and acknowledged for the socket or linger timeout is reached. If the option is not set, close will return immediately. This option is mainly supported for connection oriented protocols such as TCP. PiperOrigin-RevId: 328350576
2020-09-09Fix TCP_LINGER2 behavior to match linux.Bhasker Hariharan
We still deviate a bit from linux in how long we will actually wait in FIN-WAIT-2. Linux seems to cap it with TIME_WAIT_LEN and it's not completely obvious as to why it's done that way. For now I think we can ignore that and fix it if it really is an issue. PiperOrigin-RevId: 328324922
2020-09-09Fix deadlock in gofer direct IO.Dean Deng
Fixes several java runtime tests: java/nio/channels/FileChannel/directio/ReadDirect.java java/nio/channels/FileChannel/directio/PreadDirect.java Updates #3576. PiperOrigin-RevId: 328281849
2020-09-09Automated rollback of changelist 327325153Ghanan Gowripalan
PiperOrigin-RevId: 328259353
2020-09-09Flush in fsimpl/gofer.regularFileFD.OnClose() if there are no dirty pages.Jamie Liu
This is closer to indistinguishable from VFS1 behavior. PiperOrigin-RevId: 328256068
2020-09-09Add check for same source in merkle tree libgVisor bot
If the data is in the same Reader as the merkle tree, we should verify from the first layer in the tree, instead of from the beginning. PiperOrigin-RevId: 328230988
2020-09-09Remove go profiling flag from dockerutil.Zach Koopmans
Go profiling was removed from runsc debug in a previous change. PiperOrigin-RevId: 328203826
2020-09-09Bump build constraints to 1.17Michael Pratt
This enables pre-release testing with 1.16. The intention is to replace these with a nogo check before the next release. PiperOrigin-RevId: 328193911
2020-09-09Consider loopback bound to all addresses in subnetGhanan Gowripalan
When a loopback interface is configurd with an address and associated subnet, the loopback should treat all addresses in that subnet as an address it owns. This is mimicking linux behaviour as seen below: ``` $ ip addr show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ... link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever $ ping 192.0.2.1 PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data. ^C --- 192.0.2.1 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1018ms $ ping 192.0.2.2 PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data. ^C --- 192.0.2.2 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2039ms $ sudo ip addr add 192.0.2.1/24 dev lo $ ip addr show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group ... link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet 192.0.2.1/24 scope global lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever $ ping 192.0.2.1 PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data. 64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.131 ms 64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.046 ms 64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.048 ms ^C --- 192.0.2.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2042ms rtt min/avg/max/mdev = 0.046/0.075/0.131/0.039 ms $ ping 192.0.2.2 PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data. 64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.131 ms 64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.069 ms 64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.049 ms 64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.035 ms ^C --- 192.0.2.2 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3049ms rtt min/avg/max/mdev = 0.035/0.071/0.131/0.036 ms ``` Test: integration_test.TestLoopbackAcceptAllInSubnet PiperOrigin-RevId: 328188546
2020-09-09Update inotify documentation for gofer filesystem.Dean Deng
We now allow hard links to be created within gofer fs (see github.com/google/gvisor/commit/f20e63e31b56784c596897e86f03441f9d05f567). Update the inotify documentation accordingly. PiperOrigin-RevId: 328177485
2020-09-09Implement GetFilesystem for verity fsgVisor bot
verity GetFilesystem is implemented by mounting the underlying file system, save the mount, and store both the underlying root dentry and root Merkle file dentry in verity's root dentry. PiperOrigin-RevId: 327959334
2020-09-09[vfs] Allow mountpoint to be an existing non-directory.Ayush Ranjan
Unlike linux mount(2), OCI spec allows mounting on top of an existing non-directory file. PiperOrigin-RevId: 327914342
2020-09-09stateify: Fix pretty print not printing odd numbered fields.Ting-Yu Wang
PiperOrigin-RevId: 327902182
2020-09-09Provide fdReader/Writer for FileDescriptiongVisor bot
fdReader/Writer implements io.Reader/Writer so that they can be passed to Merkle tree library. PiperOrigin-RevId: 327901376
2020-09-09Internal change.gVisor bot
PiperOrigin-RevId: 327892274
2020-09-09Clarify seek behaviour for kernfs.GenericDirectoryFD.Rahat Mahmood
- Remove comment about GenericDirectoryFD not being compatible with dynamic directories. It is currently being used to implement dynamic directories. - Try to handle SEEK_END better than setting the offset to infinity. SEEK_END is poorly defined for dynamic directories anyways, so at least try make it work correctly for the static entries. Updates #1193. PiperOrigin-RevId: 327890128
2020-09-09Pass overlay credentials via context in copy up.Nicolas Lacasse
Some VFS operations (those which operate on FDs) get their credentials via the context instead of via an explicit creds param. For these cases, we must pass the overlay credentials on the context. PiperOrigin-RevId: 327881259
2020-09-09Make mounts ReadWrite first, then later change to ReadOnly.Nicolas Lacasse
This lets us create "synthetic" mountpoint directories in ReadOnly mounts during VFS setup. Also add context.WithMountNamespace, as some filesystems (like overlay) require a MountNamespace on ctx to handle vfs.Filesystem Operations. PiperOrigin-RevId: 327874971
2020-09-09Fix parent directory creation in CreateDeviceFile.Nicolas Lacasse
It was not properly creating recursive directories. Added tests for this case. Updates #1196 PiperOrigin-RevId: 327850811
2020-09-09[vfs] Create recursive dir creation util.Ayush Ranjan
Refactored the recursive dir creation util in runsc/boot/vfs.go to be more flexible. PiperOrigin-RevId: 327719100
2020-09-09stateify: Fix afterLoad not being called for root objectTing-Yu Wang
PiperOrigin-RevId: 327711264
2020-09-09Add reference count checking to the fsimpl/host package.Dean Deng
Includes a minor refactor for inode construction. Updates #1486. PiperOrigin-RevId: 327694933
2020-09-09Consistent precondition formattingMichael Pratt
Our "Preconditions:" blocks are very useful to determine the input invariants, but they are bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit in simple rules: 1. Cases with a single condition are placed on a single line. 2. Cases with multiple conditions are placed in a bulleted list. This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style. PiperOrigin-RevId: 327687465
2020-09-09Skip listening TCP ports when trying to bind a free port.Bhasker Hariharan
PiperOrigin-RevId: 327686558
2020-09-09Only use the NextHeader value of the first IPv6 fragment extension header.Arthur Sfez
As per RFC 8200 Section 4.5: The Next Header field of the last header of the Per-Fragment headers is obtained from the Next Header field of the first fragment's Fragment header. Test: - pkg/tcpip/network/ipv6:ipv6_test - pkg/tcpip/network/ipv4:ipv4_test - pkg/tcpip/network/fragmentation:fragmentation_test Updates #2197 PiperOrigin-RevId: 327671635
2020-09-09Use a explicit random src for RandomID.Bhasker Hariharan
PiperOrigin-RevId: 327659759
2020-09-09Fix tabs in lock-ordering doc.Nicolas Lacasse
PiperOrigin-RevId: 327654207
2020-09-09Move boot.Config to its own packageFabricio Voznika
Updates #3494 PiperOrigin-RevId: 327548511
2020-09-09Remove path walk from localFile.MknodFabricio Voznika
Replace mknod call with mknodat equivalent to protect against symlink attacks. Also added Mknod tests. Remove goferfs reliance on gofer to check for file existence before creating a synthetic entry. Updates #2923 PiperOrigin-RevId: 327544516
2020-09-09ip6tables: move ipv4-specific logic into its own fileKevin Krakauer
A later change will introduce the equivalent IPv6 logic. #3549 PiperOrigin-RevId: 327499064
2020-09-09Remove use of channels from p9.connState legacy transport.Jamie Liu
- Remove sendDone, which currently does nothing whatsoever (errors sent to the channel are completely unused). Instead, have request handlers log errors they get from p9.send() inline. - Replace recvOkay and recvDone with recvMu/recvIdle/recvShutdown. In addition to being slightly clearer (IMO), this eliminates the p9.connState.service() goroutine, significantly reducing the overhead involved in passing connection receive access between goroutines (from buffered chan send/recv + unbuffered chan send/recv to just a mutex unlock/lock). PiperOrigin-RevId: 327476755
2020-09-09Change runtimeoptions proto handling.Fabricio Voznika
Stolen from cl/327337408 (ascannell is OOO) PiperOrigin-RevId: 327475423
2020-08-19Return appropriate errors when file locking is unsuccessful.Dean Deng
test_eintr now passes in the Python runtime tests. Updates #3515. PiperOrigin-RevId: 327441081
2020-08-19[vfs] Allow offsets for special files other than regular files.Ayush Ranjan
Some character and block devices can be seekable. So allow their FD to maintain file offset. PiperOrigin-RevId: 327370684
2020-08-19Get rid of kernfs.Inode.Destroy.Dean Deng
This interface method is unneeded. PiperOrigin-RevId: 327370325
2020-08-19Move ERESTART* error definitions to syserror package.Dean Deng
This is needed to avoid circular dependencies between the vfs and kernel packages. PiperOrigin-RevId: 327355524
2020-08-19Don't set atime if mount is readonlyFabricio Voznika
Updates #1035 PiperOrigin-RevId: 327351475
2020-08-19Add more information to panic when device ID don't matchFabricio Voznika
PiperOrigin-RevId: 327351357
2020-08-19RACK: Create a new list for segments.Nayana Bidari
RACK requires the segments to be in the order of their transmission or retransmission times. This cl creates a new list and moves the retransmitted segments to the end of the list. PiperOrigin-RevId: 327325153
2020-08-19Avoid holding locks when opening files in VFS2.Jamie Liu
Fixes #3243, #3521 PiperOrigin-RevId: 327308890
2020-08-19Wait for all p9 handlers to complete before server shutdown.Jamie Liu
... including those invoked via flipcall. PiperOrigin-RevId: 327283194
2020-08-19[vfs2] Implement /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem.Ayush Ranjan
Updates #1035 PiperOrigin-RevId: 327253907
2020-08-19Add a skeleton for verity file systemgVisor bot
PiperOrigin-RevId: 327123477
2020-08-19Stop masking the IO error in handleIOError.Nicolas Lacasse
PiperOrigin-RevId: 327123331
2020-08-19Add Verify in merkle tree librarygVisor bot
Verify checks input data against the merkle tree, and compares the root hash with expectation. PiperOrigin-RevId: 327116711
2020-08-19[vfs] Do O_DIRECTORY check after resolving symlinks.Ayush Ranjan
Fixes python runtime test test_glob. Updates #3515 We were checking is the to-be-opened dentry is a dir or not before resolving symlinks. We should check that after resolving symlinks. This was preventing us from opening a symlink which pointed to a directory with O_DIRECTORY. Also added this check in tmpfs and removed a duplicate check. PiperOrigin-RevId: 327085895
2020-08-19Remove address range functionsGhanan Gowripalan
Should have been removed in cl/326791119 https://github.com/google/gvisor/commit/9a7b5830aa063895f67ca0fdf653a46906374613 PiperOrigin-RevId: 327074156
2020-08-19Remove weak references from unix sockets.Dean Deng
The abstract socket namespace no longer holds any references on sockets. Instead, TryIncRef() is used when a socket is being retrieved in BoundEndpoint(). Abstract sockets are now responsible for removing themselves from the namespace they are in, when they are destroyed. Updates #1486. PiperOrigin-RevId: 327064173
2020-08-19Add a unit test for out of order IP reassemblyArthur Sfez
PiperOrigin-RevId: 327042869