gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-11-17	tmpfs: make sure that a dentry will not be destroyed before the open() call	Andrei Vagin
	If we don't hold a reference, the dentry can be destroyed by another thread. Reported-by: syzbot+f2132e50060c41f6d41f@syzkaller.appspotmail.com PiperOrigin-RevId: 342951940
2020-11-13	fs/tmpfs: change regularFile.size atomically	Andrei Vagin
	PiperOrigin-RevId: 342221309
2020-11-13	fs/tmpfs: use atomic operations to access inode.mode	Andrei Vagin
	PiperOrigin-RevId: 342214859
2020-11-11	Read fsimpl/tmpfs timestamps atomically.	Jamie Liu
	PiperOrigin-RevId: 341982672
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRefs() calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
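	A minimal sketch of the idea, not the refsvfs2 template itself: the `Refs` type, its field, and the panic messages below are hypothetical. It only illustrates why starting the count at 1 makes a missing InitRefs() call fail loudly on the first IncRef or DecRef.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Refs is a hypothetical reference counter used for illustration only.
// Its zero value is deliberately unusable: InitRefs must be called first.
type Refs struct {
	refCount int64
}

// InitRefs sets the count to 1, which represents the creator's reference.
func (r *Refs) InitRefs() {
	atomic.StoreInt64(&r.refCount, 1)
}

// IncRef adds a reference. A result of 1 or less means the counter was
// never initialized or had already dropped to zero, so it panics.
func (r *Refs) IncRef() {
	if v := atomic.AddInt64(&r.refCount, 1); v <= 1 {
		panic(fmt.Sprintf("IncRef on uninitialized or released Refs (count is now %d)", v))
	}
}

// DecRef drops a reference and runs destroy when the count reaches zero.
// A negative result means there was no reference to drop, so it panics.
func (r *Refs) DecRef(destroy func()) {
	v := atomic.AddInt64(&r.refCount, -1)
	if v < 0 {
		panic(fmt.Sprintf("DecRef on uninitialized or released Refs (count is now %d)", v))
	}
	if v == 0 && destroy != nil {
		destroy()
	}
}

func main() {
	var r Refs
	r.InitRefs()
	r.IncRef()
	r.DecRef(nil)
	r.DecRef(func() { fmt.Println("destroyed") })
}
```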
2020-11-03	Make pipe min/max sizes match linux.	Nicolas Lacasse
	The default pipe size already matched linux, and is unchanged. Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We were plumbing usermem.PageSize everywhere, so this is no functional change. PiperOrigin-RevId: 340497006
2020-10-23	Support VFS2 save/restore.	Jamie Liu
	Inode number consistency checks are now skipped in save/restore tests for reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename. They pass in VFS1 due to the bug described in new test case SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs. Fixes #1663 PiperOrigin-RevId: 338776148
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order.
	With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued.
	Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker.
	Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks.
	The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection.
	This approach has several benefits over the former, including:
	- Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time.
	- Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added).
	- Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient.
	- Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected.
	Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object. Updates #1486. PiperOrigin-RevId: 338685972
2020-10-13	Don't read beyond EOF when inserting into sentry page cache.	Jamie Liu
	The sentry page cache stores file contents at page granularity; this is necessary for memory mappings. Thus file offset ranges passed to fsutil.FileRangeSet.Fill() must be page-aligned. If the read callback passed to Fill() returns (partial read, nil error) when reading up to EOF (which is the case for p9.ClientFile.ReadAt() since 9P's Rread cannot convey both a partial read and EOF), Fill() will re-invoke the read callback to try to read from EOF to the end of the containing page, which is harmless but needlessly expensive. Fix this by handling file size explicitly in fsutil.FileRangeSet.Fill(). PiperOrigin-RevId: 336934075
2020-10-13	[vfs2] Don't take reference in Task.MountNamespaceVFS2 and MountNamespace.Root.	Dean Deng
	This fixes reference leaks related to accidentally forgetting to DecRef() after calling one or the other. PiperOrigin-RevId: 336918922
2020-10-13	[vfs2] Destroy all tmpfs files when the filesystem is released.	Dean Deng
	In addition to fixing reference leaks, this change also releases memory used by regular tmpfs files once the containing filesystem is released. PiperOrigin-RevId: 336833111
2020-10-13	[vfs2] Add FilesystemType.Release to avoid reference leaks.	Dean Deng
	Singleton filesystems like devpts and devtmpfs have a single filesystem shared among all mounts, so they acquire a "self-reference" when initialized that must be released when the entire virtual filesystem is released at sandbox exit. PiperOrigin-RevId: 336828852
2020-09-28	Support inotify in overlayfs.	Dean Deng
	Fixes #1479, #317. PiperOrigin-RevId: 334258052
2020-09-24	Add basic stateify annotations.	Adin Scannell
	Updates #1663 PiperOrigin-RevId: 333539293
2020-09-18	Use a tmpfs file for shared anonymous and /dev/zero mmap on VFS2.	Jamie Liu
	This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()). We don't do the same thing on VFS1 for reasons documented by the updated comment. PiperOrigin-RevId: 332514849
2020-09-17	fsimpl: improve the "implements" comments	Tiwei Bie
	As noticed by @ayushr2, the "implements" comments are not consistent, e.g.
	// IterDirents implements kernfs.inodeDynamicLookup.
	// Generate implements vfs.DynamicBytesSource.Generate.
	This patch improves this by making the comments like this consistently include the package name (when the interface and struct are not in the same package) and method name. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
2020-09-08	Honor readonly flag for root mount	Fabricio Voznika
	Updates #1487 PiperOrigin-RevId: 330580699
2020-09-08	[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.	Ayush Ranjan
	PiperOrigin-RevId: 330554450
2020-09-02	[vfs] Implement xattr for overlayfs.	Ayush Ranjan
	PiperOrigin-RevId: 329825497
2020-08-27	unix: return ECONNREFUSED if a socket file exists but a socket isn't bound to it	Andrei Vagin
	PiperOrigin-RevId: 328843560
2020-08-26	tmpfs: Allow xattrs in the trusted namespace if creds has CAP_SYS_ADMIN.	Nicolas Lacasse
	This is needed to support the overlay opaque attribute. PiperOrigin-RevId: 328552985
2020-08-25	Return non-zero size for tmpfs statfs(2).	Jamie Liu
	This does not implement accepting or enforcing any size limit, which will be more complex and has performance implications; it just returns a fixed non-zero size. Updates #1936 PiperOrigin-RevId: 328428588
2020-08-21	Make mounts ReadWrite first, then later change to ReadOnly.	Nicolas Lacasse
	This lets us create "synthetic" mountpoint directories in ReadOnly mounts during VFS setup. Also add context.WithMountNamespace, as some filesystems (like overlay) require a MountNamespace on ctx to handle vfs.Filesystem Operations. PiperOrigin-RevId: 327874971
2020-08-20	Consistent precondition formatting	Michael Pratt
	Our "Preconditions:" blocks are very useful to determine the input invariants, but they are a bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit in simple rules:
	1. Cases with a single condition are placed on a single line.
	2. Cases with multiple conditions are placed in a bulleted list.
	This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style. PiperOrigin-RevId: 327687465
2020-08-18	Avoid holding locks when opening files in VFS2.	Jamie Liu
	Fixes #3243, #3521 PiperOrigin-RevId: 327308890
2020-08-17	[vfs] Do O_DIRECTORY check after resolving symlinks.	Ayush Ranjan
	Fixes python runtime test test_glob. Updates #3515. We were checking whether the to-be-opened dentry is a dir before resolving symlinks. We should check that after resolving symlinks. This was preventing us from opening a symlink which pointed to a directory with O_DIRECTORY. Also added this check in tmpfs and removed a duplicate check. PiperOrigin-RevId: 327085895
2020-08-12	Add reference leak checking to vfs2 tmpfs.inode.	Dean Deng
	Updates #1486. PiperOrigin-RevId: 326354750
2020-08-12	[vfs2][gofer] Return appropriate errors when opening and creating files.	Ayush Ranjan
	Fixes php test ext/standard/tests/file/touch_variation5.phpt on vfs2. Updates #3516. Also spotted a bug with O_EXCL, where we did not return EEXIST when we tried to open the root of the filesystem with O_EXCL | O_CREAT. Added some more tests for open() corner cases. PiperOrigin-RevId: 326346863
2020-08-05	Correctly decrement link counts in tmpfs rename operations.	Dean Deng
	When a directory is replaced by a rename operation, its link count should reach zero. We were missing the link from `dir/.` PiperOrigin-RevId: 325141730
2020-08-05	Add missing case in tmpfs.inode.direntType.	Dean Deng
	This was discovered by syzkaller. PiperOrigin-RevId: 325025193
2020-08-04	Automated rollback of changelist 324906582	Dean Deng
	PiperOrigin-RevId: 324931854
2020-08-04	Add reference counting utility to VFS2.	Dean Deng
	The utility has several differences from the VFS1 equivalent:
	- There are no weak references, which have a significant overhead
	- In order to print useful debug messages with the type of the reference-counted object, we use a generic Refs object with the owner type as a template parameter. In vfs1, this was accomplished by storing a type name and caller stack directly in the ref count, which increases the struct size by 6x. (Note that the caller stack was needed because fs types like Dirent were shared by all fs implementations; in vfs2, each impl has its own data structures, so this is no longer necessary.)
	As an example, the utility is added to tmpfs.inode. Updates #1486. PiperOrigin-RevId: 324906582
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-23	Add permission checks to vfs2 truncate.	Dean Deng
	- Check write permission on truncate(2). Unlike ftruncate(2), truncate(2) fails if the user does not have write permissions on the file.
	- For gofers under InteropModeShared, check file type before making a truncate request. We should fail early and avoid making an rpc when possible. Furthermore, depending on the remote host's failure may give us unexpected behavior--if the host converts the truncate request to an ftruncate syscall on an open fd, we will get EINVAL instead of EISDIR.
	Updates #2923. PiperOrigin-RevId: 322913569
2020-07-22	[vfs2][tmpfs] Implement O_APPEND	Ayush Ranjan
	Updates #2923 PiperOrigin-RevId: 322671489
2020-07-07	Fix mknod and inotify syscall test	Ayush Ranjan
	This change fixes a few things:
	- creating sockets using mknod(2) is supported via vfs2
	- fsgofer can create regular files via mknod(2)
	- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
	Updates #2923 PiperOrigin-RevId: 320074267
2020-07-01	Update preadv2/pwritev2 flag handling in vfs2.	Dean Deng
	We do not support RWF_SYNC/RWF_DSYNC and probably shouldn't silently accept them, since the user may incorrectly believe that we are synchronizing I/O. Remove the pwritev2 test verifying that we support these flags. gvisor.dev/issue/2601 is the tracking bug for deciding which RWF_.* flags we need and supporting them. Updates #2923, #2601. PiperOrigin-RevId: 319351286
2020-07-01	[vfs2][gofer] Update file size to 0 on O_TRUNC	Ayush Ranjan
	Some Open:TruncateXxx syscall tests were failing because the file size was not being updated when the file was opened with O_TRUNC. Fixes Truncate tests in test/syscalls:open_test_runsc_ptrace_vfs2. Updates #2923 PiperOrigin-RevId: 319340127
2020-07-01	Port fallocate to VFS2.	Zach Koopmans
	PiperOrigin-RevId: 319283715
2020-06-30	Allow O_DIRECT on vfs2 tmpfs files.	Dean Deng
	Updates #2923. PiperOrigin-RevId: 319153792
2020-06-27	Support sticky bit in vfs2.	Dean Deng
	Updates #2923. PiperOrigin-RevId: 318648128
2020-06-23	Complete inotify IN_EXCL_UNLINK implementation in VFS2.	Dean Deng
	Events were only skipped on parent directories after their children were unlinked; events on the unlinked file itself need to be skipped as well. As a result, all Watches.Notify() calls need to know whether the dentry where the call came from was unlinked. Updates #1479. PiperOrigin-RevId: 317979476
2020-06-23	Support inotify in vfs2 gofer fs.	Dean Deng
	Because there is no inode structure stored in the sandbox, inotify watches must be held on the dentry. This would be an issue in the presence of hard links, where multiple dentries would need to share the same set of watches, but in VFS2, we do not support the internal creation of hard links on gofer fs. As a result, we make the assumption that every dentry corresponds to a unique inode. Furthermore, dentries can be cached and then evicted, even if the underlying file has not been deleted. We must prevent this from occurring if there are any watches that would be lost. Note that if the dentry was deleted or invalidated (d.vfsd.IsDead()), we should still destroy it along with its watches. Additionally, when a dentry’s last watch is removed, we cache it if it also has zero references. This way, the dentry can eventually be evicted from memory if it is no longer needed. This is accomplished with a new dentry method, OnZeroWatches(), which is called by Inotify.RmWatch and Inotify.Release. Note that it must be called after all inotify locks are released to avoid violating lock order. Stress tests are added to make sure that inotify operations don't deadlock with gofer.OnZeroWatches. Updates #1479. PiperOrigin-RevId: 317958034
2020-06-19	Fix bugs in vfs2 to make symlink tests pass.	Dean Deng
	- Return ENOENT if target path is empty.
	- Make sure open(2) with O_CREAT|O_EXCL returns EEXIST when necessary.
	- Correctly update atime in tmpfs using touchATime().
	Updates #2923. PiperOrigin-RevId: 317382655
2020-06-19	Fix vfs2 handling of preadv2/pwritev2 flags.	Dean Deng
	Check for unsupported flags, and silently support RWF_HIPRI by doing nothing. From pkg/abi/linux/file.go: "gVisor does not implement the RWF_HIPRI feature, but the flag is accepted as a valid flag argument for preadv2/pwritev2." Updates #2923. PiperOrigin-RevId: 317330631
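	A minimal sketch of the flag check described above. The RWF_* values are the ones defined in Linux's include/uapi/linux/fs.h; the `checkRWFlags` helper, the `acceptedFlags` mask, and the `errNotSupported` error (standing in for an EOPNOTSUPP-style errno) are assumptions for illustration and not the gVisor code.

```go
package main

import (
	"errors"
	"fmt"
)

// Per-call read/write flags, with values from Linux's uapi headers.
const (
	RWF_HIPRI  = 0x1
	RWF_DSYNC  = 0x2
	RWF_SYNC   = 0x4
	RWF_NOWAIT = 0x8
	RWF_APPEND = 0x10
)

// acceptedFlags is the assumed set of flags that are accepted. RWF_HIPRI is
// accepted as a no-op: it is a valid argument but has no effect.
const acceptedFlags = RWF_HIPRI

// errNotSupported is a placeholder for the errno a real syscall would return.
var errNotSupported = errors.New("operation not supported")

// checkRWFlags rejects any flag outside acceptedFlags.
func checkRWFlags(flags uint32) error {
	if flags&^acceptedFlags != 0 {
		return errNotSupported
	}
	return nil
}

func main() {
	fmt.Println(checkRWFlags(0))
	fmt.Println(checkRWFlags(RWF_HIPRI))
	fmt.Println(checkRWFlags(RWF_HIPRI | RWF_SYNC))
}
```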
2020-06-18	Fix vfs2 tmpfs link permission checks.	Dean Deng
	Updates #2923. PiperOrigin-RevId: 317246916
2020-06-17	Implement Sync() to directories	Fabricio Voznika
	Updates #1035, #1199 PiperOrigin-RevId: 317028108
2020-06-17	Implement POSIX locks	Fabricio Voznika
	- Change FileDescriptionImpl Lock/UnlockPOSIX signature to take {start,length,whence}, so the correct offset can be calculated in the implementations.
	- Create PosixLocker interface to make it possible to share the same locking code from different implementations.
	Closes #1480 PiperOrigin-RevId: 316910286
2020-06-09	Implement flock(2) in VFS2	Fabricio Voznika
	LockFD is the generic implementation that can be embedded in FileDescriptionImpl implementations. Unique lock ID is maintained in vfs.FileDescription and is created on demand. Updates #1480 PiperOrigin-RevId: 315604825
2020-06-08	Implement VFS2 tmpfs mount options.	Jamie Liu
	As in VFS1, the mode, uid, and gid options are supported. Updates #1197 PiperOrigin-RevId: 315340510