gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-10-26	Simplify vfs.NewDisconnectedMount signature and callpoints.	Ayush Ranjan
	vfs.NewDisconnectedMount has no error paths. Its much prettier without the error return value. Also simplify MountDisconnected which would immediately drop the refs taken by NewDisconnectedMount. Instead make it directly call newMount. PiperOrigin-RevId: 405767966
2021-10-26	Obtain ref on root dentry in mqfs.GetFilesystem.	Ayush Ranjan
	As documented in FilesystemType.GetFilesystem, a reference should be taken on the returned dentry and filesystem by GetFilesystem implementation. mqfs did not do that. Additionally cleanup and clarify ref counting of dentry, filesystem and mount in mqfs. Reported-by: syzbot+a2c54bfb6e1525228e5f@syzkaller.appspotmail.com Reported-by: syzbot+ccd305cdab11cfebbfff@syzkaller.appspotmail.com PiperOrigin-RevId: 405700565
2021-10-20	Report correct error when restore fails	Fabricio Voznika
	When file corruption is detected, report vfs.ErrCorruption to distinguish corruption error from other restore errors. Updates #1035 PiperOrigin-RevId: 404588445
2021-10-11	Merge pull request #6428 from dillanzhou:fix_epoll_vfs2	gVisor bot
	PiperOrigin-RevId: 402323053
2021-09-21	[lisa] Implement lisafs protocol methods in VFS2 gofer client and fsgofer.	Ayush Ranjan
	Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs instead of p9 when lisafs is enabled. Implements the handlers for those methods in fsgofer. Fixes #5465 PiperOrigin-RevId: 398080310
2021-09-16	Limit most file mmaps to the range of an int64.	Jamie Liu
	In the general case, files may have offsets between MaxInt64 and MaxUint64; in Linux pgoff is consistently represented by an unsigned long, and in gVisor the offset types in memmap.MappableRange are uint64. However, regular file mmap is constrained to int64 offsets (on 64-bit systems) by mm/mmap.c:file_mmap_size_max() => MAX_LFS_FILESIZE == LLONG_MAX. As a related fix, check for chunkStart overflow in fsutil.HostFileMapper; chunk offsets are uint64s, but as noted above some file types may use uint64 offsets beyond MaxInt64. Reported-by: syzbot+71342a1585aed97ed9f7@syzkaller.appspotmail.com PiperOrigin-RevId: 397136751
2021-08-20	Remove experimental warning in the VFS2 README.	Jamie Liu
	PiperOrigin-RevId: 392078690
2021-08-13	[syserror] Remove pkg syserror.	Zach Koopmans
	Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12	[syserror] Convert remaining syserror definitions to linuxerr.	Zach Koopmans
	Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-08-09	vfs2/epoll: fix missing event trigger in epoll-in-epoll case	Jielong Zhou
	In some cases, epoll fd would be registered in another epoll fd. Process may call epoll_wait on the upper layer epoll fd, and the lower layer epoll fd should generate EPOLLIN event if itself get any event. But in VFS2, events generated for epoll fd could only be cleaned when (EpollInstance).ReadEvents is called. And this function is only called when epoll_wait on the epoll fd. Therefore, when epoll_wait on the upper layer epoll fd, the events generated in lower layer epoll fd would not be cleaned even if it's not valid anymore, and lower layer epoll fd would not report event to upper layer even if new event is triggered. In this commit, (EpollInstance).Readiness would also clean invalid events. So, when epoll_wait on the upper layer epoll fd, Readiness function called on lower layer epoll fd would clean invalid events. And lower layer could report event to upper layer if new event is triggered. A syscall test case is added to verify the commit. Fixes https://github.com/google/gvisor/issues/6427 Signed-off-by: Jielong Zhou <jielong.zjl@antgroup.com>
2021-07-13	Do not require O_PATH flag to enable verity	Chong Cai
	Remove the hack in gVisor vfs that allows verity to bypass the O_PATH check, since ioctl is not allowed on fds opened with O_PATH in linux. Verity still opens the lowerFD with O_PATH to open it as a symlink, but the API no longer expects O_PATH to open a fd to be verity enabled. Now only O_FOLLOW should be specified when opening and enabling verity features. PiperOrigin-RevId: 384567833
2021-07-12	[syserror] Update syserror to linuxerr for more errors.	Zach Koopmans
	Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01	[syserror] Update several syserror errors to linuxerr equivalents.	Zach Koopmans
	Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30	[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.	Zach Koopmans
	Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29	[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVAL	Zach Koopmans
	Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-10	Minor VFS2 xattr changes.	Jamie Liu
	- Allow the gofer client to use most xattr namespaces. As documented by the updated comment, this is consistent with e.g. Linux's FUSE client, and allows gofers to provide extended attributes from FUSE filesystems. - Make tmpfs' listxattr omit xattrs in the "trusted" namespace for non-privileged users. PiperOrigin-RevId: 378778854
2021-06-01	vfs: Don't allow to mount anything on top of detached mounts	Andrei Vagin
	PiperOrigin-RevId: 376932659
2021-05-14	Resolve remaining O_PATH TODOs.	Dean Deng
	O_PATH is now implemented in vfs2. Fixes #2782. PiperOrigin-RevId: 373861410
2021-04-29	Remove ResolvingPath.Restart	Fabricio Voznika
	PiperOrigin-RevId: 371163405
2021-04-28	Automated rollback of changelist 369686285	Fabricio Voznika
	PiperOrigin-RevId: 371015541
2021-04-22	Also report mount options through /proc/<pid>/mounts.	Rahat Mahmood
	PiperOrigin-RevId: 369967629
2021-04-21	Automated rollback of changelist 369325957	Michael Pratt
	PiperOrigin-RevId: 369686285
2021-04-19	Add MultiGetAttr message to 9P	Fabricio Voznika
	While using remote-validation, the vast majority of time spent during FS operations is re-walking the path to check for modifications and then closing the file given that in most cases it has not been modified externally. This change introduces a new 9P message called MultiGetAttr which bulks query attributes of several files in one shot. The returned attributes are then used to update cached dentries before they are walked. File attributes are updated for files that still exist. Dentries that have been deleted are removed from the cache. And negative cache entries are removed if a new file/directory was created externally. Similarly, synthetic dentries are replaced if a file/directory is created externally. The bulk update needs to be carefull not to follow symlinks, cross mount points, because the gofer doesn't know how to resolve symlinks and where mounts points are located. It also doesn't walk to the parent ("..") to avoid deadlocks. Here are the results: Workload VFS1 VFS2 Change bazel action 115s 70s 28.8s Stat/100 11,043us 7,623us 974us Updates #1638 PiperOrigin-RevId: 369325957
2021-04-02	Implement cgroupfs.	Rahat Mahmood
	A skeleton implementation of cgroupfs. It supports trivial cpu and memory controllers with no support for hierarchies. PiperOrigin-RevId: 366561126
2021-03-29	[syserror] Split usermem package	Zach Koopmans
	Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-24	Add POLLRDNORM/POLLWRNORM support.	Bhasker Hariharan
	On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-11	Report filesystem-specific mount options.	Rahat Mahmood
	PiperOrigin-RevId: 362406813
2021-03-03	Add checklocks analyzer.	Bhasker Hariharan
	This validates that struct fields if annotated with "// checklocks:mu" where "mu" is a mutex field in the same struct then access to the field is only done with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name> For more details please refer to README.md. PiperOrigin-RevId: 360729328
2021-02-19	Don't hold baseEndpoint.mu while calling EventUpdate().	Nicolas Lacasse
	This removes a three-lock deadlock between fdnotifier.notifier.mu, epoll.EventPoll.listsMu, and baseEndpoint.mu. A lock order comment was added to epoll/epoll.go. Also fix unsafe access of baseEndpoint.connected/receiver. PiperOrigin-RevId: 358515191
2021-02-17	Add gohacks.Slice/StringHeader.	Jamie Liu
	See https://github.com/golang/go/issues/19367 for rationale. Note that the upstream decision arrived at in that thread, while useful for some of our use cases, doesn't account for all of our SliceHeader use cases (we often use SliceHeader to extract pointers from slices in a way that avoids bounds checking and/or handles nil slices correctly) and also doesn't exist yet. PiperOrigin-RevId: 358071574
2021-02-11	Internal change.	gVisor bot
	PiperOrigin-RevId: 357090170
2021-02-10	Support setgid directories in tmpfs and kernfs	Kevin Krakauer
	PiperOrigin-RevId: 356868412
2021-02-10	Don't allow to umount the namespace root mount	Andrei Vagin
	Linux does the same thing. Reported-by: syzbot+6c79385c930c929d1d9e@syzkaller.appspotmail.com PiperOrigin-RevId: 356854562
2021-02-09	Add cleanup TODO for integer-based proc files.	Dean Deng
	PiperOrigin-RevId: 356645022
2021-02-03	[vfs] Make sticky bit check consistent with Linux.	Ayush Ranjan
	Our implementation of vfs.CheckDeleteSticky was not consistent with Linux, specifically not consistent with fs/linux.h:check_sticky(). One of the biggest differences was that the vfs implementation did not allow the owner of the sticky directory to delete files inside it that belonged to other users. This change makes our implementation consistent with Linux. Also adds an integration test to check for this. This bug is also present in VFS1. Updates #3027 PiperOrigin-RevId: 355557425
2021-01-28	Add O_PATH support in vfs2	gVisor bot
	PiperOrigin-RevId: 354367665
2021-01-26	Move inotify events from syscall to vfs layer.	Dean Deng
	This also causes inotify events to be generated when reading files for exec. This change also requires us to adjust splice+inotify tests due to discrepancies between gVisor and Linux behavior. Note that these discrepancies existed before; we just did not exercise them previously. See comment for more details. Fixes #5348. PiperOrigin-RevId: 353907187
2021-01-26	Do not generate extraneous IN_CLOSE inotify events.	Dean Deng
	IN_CLOSE should only be generated when a file description loses its last reference; not when a file descriptor is closed. See fs/file_table.c:__fput. Updates #5348. PiperOrigin-RevId: 353810697
2021-01-22	Implement F_GETLK fcntl.	Dean Deng
	Fixes #5113. PiperOrigin-RevId: 353313374
2021-01-20	Move Lock/UnlockPOSIX into LockFD util.	Dean Deng
	PiperOrigin-RevId: 352904728
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2020-12-09	Add //pkg/sync:generic_atomicptrmap.	Jamie Liu
	AtomicPtrMap is a generic concurrent map from arbitrary keys to arbitrary pointer values. Benchmarks: name time/op StoreDelete/RWMutexMap-12 335ns ± 1% StoreDelete/SyncMap-12 705ns ± 3% StoreDelete/AtomicPtrMap-12 287ns ± 4% StoreDelete/AtomicPtrMapSharded-12 289ns ± 1% LoadOrStoreDelete/RWMutexMap-12 342ns ± 2% LoadOrStoreDelete/SyncMap-12 662ns ± 2% LoadOrStoreDelete/AtomicPtrMap-12 290ns ± 7% LoadOrStoreDelete/AtomicPtrMapSharded-12 293ns ± 2% LookupPositive/RWMutexMap-12 101ns ±26% LookupPositive/SyncMap-12 202ns ± 2% LookupPositive/AtomicPtrMap-12 71.1ns ± 2% LookupPositive/AtomicPtrMapSharded-12 73.2ns ± 1% LookupNegative/RWMutexMap-12 119ns ± 1% LookupNegative/SyncMap-12 154ns ± 1% LookupNegative/AtomicPtrMap-12 84.7ns ± 3% LookupNegative/AtomicPtrMapSharded-12 86.8ns ± 1% Concurrent/FixedKeys_1PercentWrites_RWMutexMap-12 1.32µs ± 2% Concurrent/FixedKeys_1PercentWrites_SyncMap-12 52.7ns ±10% Concurrent/FixedKeys_1PercentWrites_AtomicPtrMap-12 31.8ns ±20% Concurrent/FixedKeys_1PercentWrites_AtomicPtrMapSharded-12 24.0ns ±15% Concurrent/FixedKeys_10PercentWrites_RWMutexMap-12 860ns ± 3% Concurrent/FixedKeys_10PercentWrites_SyncMap-12 68.8ns ±20% Concurrent/FixedKeys_10PercentWrites_AtomicPtrMap-12 98.6ns ± 7% Concurrent/FixedKeys_10PercentWrites_AtomicPtrMapSharded-12 42.0ns ±25% Concurrent/FixedKeys_50PercentWrites_RWMutexMap-12 1.17µs ± 3% Concurrent/FixedKeys_50PercentWrites_SyncMap-12 136ns ±34% Concurrent/FixedKeys_50PercentWrites_AtomicPtrMap-12 286ns ± 3% Concurrent/FixedKeys_50PercentWrites_AtomicPtrMapSharded-12 115ns ±35% Concurrent/ChangingKeys_1PercentWrites_RWMutexMap-12 1.27µs ± 2% Concurrent/ChangingKeys_1PercentWrites_SyncMap-12 5.01µs ± 3% Concurrent/ChangingKeys_1PercentWrites_AtomicPtrMap-12 38.1ns ± 3% Concurrent/ChangingKeys_1PercentWrites_AtomicPtrMapSharded-12 22.6ns ± 2% Concurrent/ChangingKeys_10PercentWrites_RWMutexMap-12 1.08µs ± 2% Concurrent/ChangingKeys_10PercentWrites_SyncMap-12 5.97µs ± 1% Concurrent/ChangingKeys_10PercentWrites_AtomicPtrMap-12 390ns ± 2% Concurrent/ChangingKeys_10PercentWrites_AtomicPtrMapSharded-12 93.6ns ± 1% Concurrent/ChangingKeys_50PercentWrites_RWMutexMap-12 1.77µs ± 2% Concurrent/ChangingKeys_50PercentWrites_SyncMap-12 8.07µs ± 2% Concurrent/ChangingKeys_50PercentWrites_AtomicPtrMap-12 1.61µs ± 2% Concurrent/ChangingKeys_50PercentWrites_AtomicPtrMapSharded-12 386ns ± 1% Updates #231 PiperOrigin-RevId: 346614776
2020-12-04	Overlay runsc regular file mounts with regular files.	Jamie Liu
	Fixes #4991 PiperOrigin-RevId: 345800333
2020-12-03	Implement `fcntl` options `F_GETSIG` and `F_SETSIG`.	Etienne Perot
	These options allow overriding the signal that gets sent to the process when I/O operations are available on the file descriptor, rather than the default `SIGIO` signal. Doing so also populates `siginfo` to contain extra information about which file descriptor caused the event (`si_fd`) and what events happened on it (`si_band`). The logic around which FD is populated within `si_fd` matches Linux's, which means it has some weird edge cases where that value may not actually refer to a file descriptor that is still valid. This CL also ports extra S/R logic regarding async handler in VFS2. Without this, async I/O handlers aren't properly re-registered after S/R. PiperOrigin-RevId: 345436598
2020-12-02	Remove FileReadWriteSeeker from vfs.	Jamie Liu
	Previous experience has shown that these types of wrappers tends to create two kinds of problems: hidden allocations (e.g. each call to FileReadWriteSeeker.Read/Write allocates a usermem.BytesIO on the heap) and hidden lock ordering problems (e.g. VFS1 splice deadlocks). Since this is only needed by fsimpl/verity, move it there. PiperOrigin-RevId: 345377830
2020-11-18	Port filesystem metrics to VFS2.	Jamie Liu
	PiperOrigin-RevId: 343196927
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRef calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
2020-11-03	Fix more nogo tests	Ting-Yu Wang
	PiperOrigin-RevId: 340536306