gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2020-12-10	Disable host reassembly for fragments.	Bhasker Hariharan
	fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket to ensure that fragments are delivered inorder to the right dispatcher. But this prevents fragments from being delivered to gvisor at all and makes testing of gvisor's fragment reassembly code impossible. The potential impact from this is minimal since IP Fragmentation is not really that prevelant and in cases where we do get fragments we may deliver the fragment out of order to the TCP layer as multiple network dispatchers may process the fragments and deliver a reassembled fragment after the next packet has been delivered to the TCP endpoint. While not desirable I believe the impact from this is minimal due to low prevalence of fragmentation. Also removed PktType and Hatype fields when binding the socket as these are not used when binding. Its just confusing to have them specified. See: https://man7.org/linux/man-pages/man7/packet.7.html "Fields used for binding are sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex." Fixes #5055 PiperOrigin-RevId: 346919439
2020-12-03	Make `stack.Route` thread safe	Peter Johnston
	Currently we rely on the user to take the lock on the endpoint that owns the route, in order to modify it safely. We can instead move `Route.RemoteLinkAddress` under `Route`'s mutex, and allow non-locking and thread-safe access to other fields of `Route`. PiperOrigin-RevId: 345461586
2020-11-25	Make stack.Route safe to access concurrently	Ghanan Gowripalan
	Multiple goroutines may use the same stack.Route concurrently so the stack.Route should make sure that any functions called on it are thread-safe. Fixes #4073 PiperOrigin-RevId: 344320491
2020-11-18	Introduce stack.WritePacketToRemote, remove LinkEndpoint.WriteRawPacket	Bruno Dal Bo
	Redefine stack.WritePacket into stack.WritePacketToRemote which lets the NIC decide whether to append link headers. PiperOrigin-RevId: 343071742
2020-11-17	Add per-sniffer instance log prefix	Bruno Dal Bo
	A prefix associated with a sniffer instance can help debug situations where more than one NIC (i.e. more than one sniffer) exists. PiperOrigin-RevId: 342950027
2020-11-16	Remove ARP address workaround	Ghanan Gowripalan
	- Make AddressableEndpoint optional for NetworkEndpoint. Not all NetworkEndpoints need to support addressing (e.g. ARP), so AddressableEndpoint should only be implemented for protocols that support addressing such as IPv4 and IPv6. With this change, tcpip.ErrNotSupported will be returned by the stack when attempting to modify addresses on a network endpoint that does not support addressing. Now that packets are fully handled at the network layer, and (with this change) addresses are optional for network endpoints, we no longer need the workaround for ARP where a fake ARP address was added to each NIC that performs ARP so that packets would be delivered to the ARP layer. PiperOrigin-RevId: 342722547
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRef calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
2020-11-06	Trim link headers from buffer clone when sniffing	Ghanan Gowripalan
	PiperOrigin-RevId: 341135083
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order. With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued. Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker. Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks. The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection. This approach has several benefits over the former, including: - Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time. - Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added). - Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient. - Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected. Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object. Updates #1486. PiperOrigin-RevId: 338685972
2020-10-20	Fix nogo test in //pkg/tcpip/...	Ting-Yu Wang
	PiperOrigin-RevId: 338168977
2020-10-16	Don't include link header when forwarding packets	Ghanan Gowripalan
	Before this change, if a link header was included in an incoming packet that is forwarded, the packet that gets sent out will take the original packet and add a link header to it while keeping the old link header. This would make the sent packet look like: OUTGOING LINK HDR \| INCOMING LINK HDR \| NETWORK HDR \| ... Obviously this is incorrect as we should drop the incoming link header and only include the outgoing link header. This change fixes this bug. Test: integration_test.TestForwarding PiperOrigin-RevId: 337571447
2020-10-09	Automated rollback of changelist 336304024	Ghanan Gowripalan
	PiperOrigin-RevId: 336339194
2020-10-09	Automated rollback of changelist 336185457	Bhasker Hariharan
	PiperOrigin-RevId: 336304024
2020-10-08	Do not resolve routes immediately	Ghanan Gowripalan
	When a response needs to be sent to an incoming packet, the stack should consult its neighbour table to determine the remote address's link address. When an entry does not exist in the stack's neighbor table, the stack should queue the packet while link resolution completes. See comments. PiperOrigin-RevId: 336185457
2020-09-18	Use common parsing utilities when sniffing	Ghanan Gowripalan
	Extract parsing utilities so they can be used by the sniffer. Fixes #3930 PiperOrigin-RevId: 332401880
2020-09-16	Gracefully translate unknown errno.	Ting-Yu Wang
	Neither POSIX.1 nor Linux defines an upperbound for errno. PiperOrigin-RevId: 332085017
2020-08-25	Use new reference count utility throughout gvisor.	Dean Deng
	This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted soon anyway. The following structs now use the new tool, with leak check enabled: devpts:rootInode fuse:inode kernfs:Dentry kernfs:dir kernfs:readonlyDir kernfs:StaticDirectory proc:fdDirInode proc:fdInfoDirInode proc:subtasksInode proc:taskInode proc:tasksInode vfs:FileDescription vfs:MountNamespace vfs:Filesystem sys:dir kernel:FSContext kernel:ProcessGroup kernel:Session shm:Shm mm:aioMappable mm:SpecialMappable transport:queue And the following use the template, but because they currently are not leak checked, a TODO is left instead of enabling leak check in this patch: kernel:FDTable tun:tunEndpoint Updates #1486. PiperOrigin-RevId: 328460377
2020-08-24	Bump build constraints to 1.17	Michael Pratt
	This enables pre-release testing with 1.16. The intention is to replace these with a nogo check before the next release. PiperOrigin-RevId: 328193911
2020-08-13	Migrate to PacketHeader API for PacketBuffer.	Ting-Yu Wang
	Formerly, when a packet is constructed or parsed, all headers are set by the client code. This almost always involved prepending to pk.Header buffer or trimming pk.Data portion. This is known to prone to bugs, due to the complexity and number of the invariants assumed across netstack to maintain. In the new PacketHeader API, client will call Push()/Consume() method to construct/parse an outgoing/incoming packet. All invariants, such as slicing and trimming, are maintained by the API itself. NewPacketBuffer() is introduced to create new PacketBuffer. Zero value is no longer valid. PacketBuffer now assumes the packet is a concatenation of following portions: * LinkHeader * NetworkHeader * TransportHeader * Data Any of them could be empty, or zero-length. PiperOrigin-RevId: 326507688
2020-08-11	Fix-up issue comment.	Adin Scannell
	PiperOrigin-RevId: 326129258
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-22	Support for receiving outbound packets in AF_PACKET.	Bhasker Hariharan
	Updates #173 PiperOrigin-RevId: 322665518
2020-07-15	fdbased: Vectorized write for packet; relax writev syscall filter.	Ting-Yu Wang
	Now it calls pkt.Data.ToView() when writing the packet. This may require copying when the packet is large, which puts the worse case in an even worse situation. This sent out in a separate preparation change as it requires syscall filter changes. This change will be followed by the change for the adoption of the new PacketHeader API. PiperOrigin-RevId: 321447003
2020-07-15	Fix minor bugs in a couple of interface IOCTLs.	Bhasker Hariharan
	gVisor incorrectly returns the wrong ARP type for SIOGIFHWADDR. This breaks tcpdump as it tries to interpret the packets incorrectly. Similarly, SIOCETHTOOL is used by tcpdump to query interface properties which fails with an EINVAL since we don't implement it. For now change it to return EOPNOTSUPP to indicate that we don't support the query rather than return EINVAL. NOTE: ARPHRD types for link endpoints are distinct from NIC capabilities and NIC flags. In Linux all 3 exist eg. ARPHRD types are stored in dev->type field while NIC capabilities are more like the device features which can be queried using SIOCETHTOOL but not modified and NIC Flags are fields that can be modified from user space. eg. NIC status (UP/DOWN/MULTICAST/BROADCAST) etc. Updates #2746 PiperOrigin-RevId: 321436525
2020-07-13	Fix recvMMsgDispatcher not slicing link header correctly.	Ting-Yu Wang
	PiperOrigin-RevId: 321035635
2020-07-06	Fix NonBlockingWrite3 not writing b3 if b2 is zero-length.	Ting-Yu Wang
	PiperOrigin-RevId: 319882171
2020-06-22	Extract common nested LinkEndpoint pattern	Bruno Dal Bo
	... and unify logic for detached netsted endpoints. sniffer.go caused crashes if a packet delivery is attempted when the dispatcher is nil. Extracted the endpoint nesting logic into a common composable type so it can be used by the Fuchsia Netstack (the pattern is widespread there). PiperOrigin-RevId: 317682842
2020-06-10	Remove duplicate and incorrect size check	Tamir Duberstein
	Minimum header sizes are already checked in each `case` arm below. Worse, the ICMP entries in transportProtocolMinSizes are incorrect, and produce false "raw packet" logs. PiperOrigin-RevId: 315730073
2020-06-10	Replace use of %v in sniffer	Tamir Duberstein
	PiperOrigin-RevId: 315711208
2020-06-03	Pass PacketBuffer as pointer.	Ting-Yu Wang
	Historically we've been passing PacketBuffer by shallow copying through out the stack. Right now, this is only correct as the caller would not use PacketBuffer after passing into the next layer in netstack. With new buffer management effort in gVisor/netstack, PacketBuffer will own a Buffer (to be added). Internally, both PacketBuffer and Buffer may have pointers and shallow copying shouldn't be used. Updates #2404. PiperOrigin-RevId: 314610879
2020-05-29	Update Go version build tags	Michael Pratt
	None of the dependencies have changed in 1.15. It may be possible to simplify some of the wrappers in rawfile following 1.13, but that can come in a later change. PiperOrigin-RevId: 313863264
2020-05-27	Remove linkEP from DeliverNetworkPacket	Sam Balana
	The specified LinkEndpoint is not being used in a significant way. No behavior change, existing tests pass. This change is a breaking change. PiperOrigin-RevId: 313496602
2020-05-27	Fix tiny typo.	Kevin Krakauer
	PiperOrigin-RevId: 313414690
2020-05-11	Automated rollback of changelist 310417191	Bhasker Hariharan
	PiperOrigin-RevId: 310963404
2020-05-07	Automated rollback of changelist 309339316	Bhasker Hariharan
	PiperOrigin-RevId: 310417191
2020-05-06	sniffer: fix accidental logging of good packets as bad	Kevin Krakauer
	We need to check vv.Size() instead of len(tcp), as tcp will always be 20 bytes long. PiperOrigin-RevId: 310218351
2020-05-01	Automated rollback of changelist 308674219	Kevin Krakauer
	PiperOrigin-RevId: 309491861
2020-04-30	Enable FIFO QDisc by default in runsc.	Bhasker Hariharan
	Updates #231 PiperOrigin-RevId: 309339316
2020-04-30	FIFO QDisc implementation	Bhasker Hariharan
	Updates #231 PiperOrigin-RevId: 309323808
2020-04-27	Automated rollback of changelist 308163542	gVisor bot
	PiperOrigin-RevId: 308674219
2020-04-23	Remove View.First() and View.RemoveFirst()	Kevin Krakauer
	These methods let users eaily break the VectorisedView abstraction, and allowed netstack to slip into pseudo-enforcement of the "all headers are in the first View" invariant. Removing them and replacing with PullUp(n) breaks this reliance and will make it easier to add iptables support and rework network buffer management. The new View.PullUp(n) method is low cost in the common case, when when all the headers fit in the first View. PiperOrigin-RevId: 308163542
2020-04-21	Automated rollback of changelist 307477185	gVisor bot
	PiperOrigin-RevId: 307598974
2020-04-17	Remove View.First() and View.RemoveFirst()	Kevin Krakauer
	These methods let users eaily break the VectorisedView abstraction, and allowed netstack to slip into pseudo-enforcement of the "all headers are in the first View" invariant. Removing them and replacing with PullUp(n) breaks this reliance and will make it easier to add iptables support and rework network buffer management. The new View.PullUp(n) method is low cost in the common case, when when all the headers fit in the first View.
2020-04-17	Allow caller-defined sinks for packet sniffing.	Tamir Duberstein
	PiperOrigin-RevId: 307053624
2020-04-16	Properly delegate Wait	Tamir Duberstein
	PiperOrigin-RevId: 306959393
2020-04-15	Deduplicate packet logging	Tamir Duberstein
	PiperOrigin-RevId: 306677789
2020-04-14	Reduce flakiness in tcp_test.	Bhasker Hariharan
	Tests now use a MinRTO of 3s instead of default 200ms. This reduced flakiness in a lot of the congestion control/recovery tests which were flaky due to retransmit timer firing too early in case the test executors were overloaded. This change also bumps some of the timeouts in tests which were too sensitive to timer variations and reduces the number of slow start iterations which can make the tests run for too long and also trigger retansmit timeouts etc if the executor is overloaded. PiperOrigin-RevId: 306562645
2020-04-08	Fix all printf formatting errors.	Adin Scannell
	Updates #2243
2020-04-03	Refactor software GSO code.	Bhasker Hariharan
	Software GSO implementation currently has a complicated code path with implicit assumptions that all packets to WritePackets carry same Data and it does this to avoid allocations on the path etc. But this makes it hard to reuse the WritePackets API. This change breaks all such assumptions by introducing a new Vectorised View API ReadToVV which can be used to cleanly split a VV into multiple independent VVs. Further this change also makes packet buffers linkable to form an intrusive list. This allows us to get rid of the array of packet buffers that are passed in the WritePackets API call and replace it with a list of packet buffers. While this code does introduce some more allocations in the benchmarks it doesn't cause any degradation. Updates #231 PiperOrigin-RevId: 304731742
2020-03-24	Add support for setting TCP segment hash.	Bhasker Hariharan
	This allows the link layer endpoints to consistenly hash a TCP segment to a single underlying queue in case a link layer endpoint does support multiple underlying queues. Updates #231 PiperOrigin-RevId: 302760664