gvisor - Container Runtime Sandbox

Age	Commit message	Author
2020-09-16	Merge pull request #3934 from avagin:feature/fuse	gVisor bot
	PiperOrigin-RevId: 332122081
2020-09-16	Update fuse.md design doc with design details	Jinmou Li

2020-09-16	Revert "fuse: add benchmarking support for FUSE"	Andrei Vagin
	test/fuse/benchmark/read_benchmark.cc:34: Failure Expected: (fuse_prefix) != (nullptr), actual: NULL vs (nullptr)
	external/com_google_benchmark/src/benchmark_runner.cc:120: RunInThread: Check `st.iterations() >= st.max_iterations' failed. Benchmark returned before State::KeepRunning() returned false!
	--- FAIL: Benchmarks_BM_Read/262144/real_time (0.29s)
	runner.go:502: test "Benchmarks.BM_Read/262144/real_time" failed with error exit status 134, want nil
	FAIL
2020-09-16	fuse: add benchmarking support for FUSE	Boyuan He & Ridwan Sharif
	This change adds the following:
	- Add support for containerizing syscall tests for FUSE
	- Mount tmpfs in the container so we can run benchmarks against it
	- Run the server in a background process
	- Benchmarks for the FUSE syscalls
	Co-authored-by: Ridwan Sharif <ridwanmsharif@google.com>
2020-09-14	Correct FDSize in /proc/[pid]/status.	Jamie Liu
	In Linux, FDSize is fs/proc/array.c:task_state() => struct fdtable::max_fds, which is set to the underlying array's length in fs/file.c:alloc_fdtable().
	Follow-up changes:
	- Remove FDTable.GetRefs() and FDTable.GetRefsVFS2(), which are unused.
	- Reset FDTable.used to 0 during restore, since the subsequent calls to FDTable.setAll() increment it again, causing its value to be doubled. (After this CL, FDTable.used is only used to avoid reallocation in FDTable.GetFDs(), so this fix is not very visible.)
	PiperOrigin-RevId: 331588190
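The restore fix described above can be illustrated with a toy model. This is a minimal sketch, not gVisor's FDTable: the `fdTable` type, its `set` and `restore` methods, and the `resetUsed` switch are all hypothetical names introduced only to show why a counter that is both saved and re-incremented ends up doubled.

```go
package main

import "fmt"

// fdTable is a hypothetical, simplified descriptor table.
// It is NOT gVisor's FDTable; it only models the counter.
type fdTable struct {
	slots []bool // slots[i] reports whether descriptor i is in use
	used  int    // number of descriptors in use
}

// set marks fd as in use, growing the table when needed, and
// increments used if the slot was previously free.
func (t *fdTable) set(fd int) {
	for fd >= len(t.slots) {
		t.slots = append(t.slots, false)
	}
	if !t.slots[fd] {
		t.slots[fd] = true
		t.used++
	}
}

// restore rebuilds the slots from a saved descriptor list.
// With resetUsed == false the saved counter is kept and then
// incremented again by every set call, so it is doubled.
func (t *fdTable) restore(fds []int, resetUsed bool) {
	t.slots = nil
	if resetUsed {
		t.used = 0
	}
	for _, fd := range fds {
		t.set(fd)
	}
}

func main() {
	t := &fdTable{}
	for _, fd := range []int{0, 1, 2} {
		t.set(fd)
	}
	t.restore([]int{0, 1, 2}, false)
	fmt.Println("without reset:", t.used) // prints "without reset: 6"
	t.restore([]int{0, 1, 2}, true)
	fmt.Println("with reset:", t.used) // prints "with reset: 3"
}
```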
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-11	Fix host unix socket to not swallow EOF incorrectly.	Bhasker Hariharan
	Fixes an error where, when the receive buffer is larger than the host send buffer size for a host-backed Unix dgram socket, we would end up swallowing EOF from the recvmsg syscall, causing read() to block forever. PiperOrigin-RevId: 331192810
2020-09-08	[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.	Ayush Ranjan
	PiperOrigin-RevId: 330554450
2020-09-02	[vfs] Implement xattr for overlayfs.	Ayush Ranjan
	PiperOrigin-RevId: 329825497
2020-09-01	Refactor tty codebase to use master-replica terminology.	Ayush Ranjan
	Updates #2972 PiperOrigin-RevId: 329584905
2020-09-01	[go-marshal] Enable auto-marshalling for fs/tty.	Ayush Ranjan
	PiperOrigin-RevId: 329564614
2020-08-27	[go-marshal] Enable auto-marshalling for tundev.	Ayush Ranjan
	PiperOrigin-RevId: 328863725
2020-08-25	Return non-zero size for tmpfs statfs(2).	Jamie Liu
	This does not implement accepting or enforcing any size limit, which will be more complex and has performance implications; it just returns a fixed non-zero size. Updates #1936 PiperOrigin-RevId: 328428588
2020-08-25	[go-marshal] Enable auto-marshalling for host tty.	Ayush Ranjan
	PiperOrigin-RevId: 328415633
2020-08-20	Consistent precondition formatting	Michael Pratt
	Our "Preconditions:" blocks are very useful for determining the input invariants, but they are a bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit two simple rules:
	1. Cases with a single condition are placed on a single line.
	2. Cases with multiple conditions are placed in a bulleted list.
	This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style.
	PiperOrigin-RevId: 327687465
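The two formatting rules can be illustrated with a pair of doc comments. This is a sketch only: the functions `clamp` and `copyRange` are hypothetical, and the `* ` bullet marker is an assumption about the style guide rather than something stated in the commit message.

```go
package main

import "fmt"

// clamp limits v to the range [lo, hi].
//
// Preconditions: lo <= hi.
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// copyRange returns a copy of src[start:end].
//
// Preconditions:
// * 0 <= start.
// * start <= end.
// * end <= len(src).
func copyRange(src []byte, start, end int) []byte {
	out := make([]byte, end-start)
	copy(out, src[start:end])
	return out
}

func main() {
	fmt.Println(clamp(5, 0, 3))
	fmt.Println(string(copyRange([]byte("hello"), 1, 3)))
}
```

A single condition stays on the "Preconditions:" line; two or more conditions each get their own bullet.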
2020-08-18	Move ERESTART* error definitions to syserror package.	Dean Deng
	This is needed to avoid circular dependencies between the vfs and kernel packages. PiperOrigin-RevId: 327355524
2020-08-05	Add loss recovery option for TCP.	Nayana Bidari
	/proc/sys/net/ipv4/tcp_recovery is used to enable RACK loss recovery in TCP. PiperOrigin-RevId: 325157807
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-27	Move platform.File in memmap	Andrei Vagin
	The subsequent systrap changes will need to import memmap from the platform package. PiperOrigin-RevId: 323409486
2020-07-24	Enable automated marshalling for netstack.	Ayush Ranjan
	PiperOrigin-RevId: 322954792
2020-07-15	fdbased: Vectorized write for packet; relax writev syscall filter.	Ting-Yu Wang
	Now it calls pkt.Data.ToView() when writing the packet. This may require copying when the packet is large, which makes the worst case even worse. This is sent out as a separate preparatory change because it requires syscall filter changes. It will be followed by the change that adopts the new PacketHeader API. PiperOrigin-RevId: 321447003
2020-06-18	Remove various uses of 'whitelist'	Michael Pratt
	Updates #2972 PiperOrigin-RevId: 317113059
2020-06-17	Remove various uses of 'blacklist'	Michael Pratt
	Updates #2972 PiperOrigin-RevId: 316942245
2020-06-11	Merge pull request #2863 from lubinszARM:pr_sndbuf	gVisor bot
	PiperOrigin-RevId: 315991648
2020-06-10	Deleting the maxSendBufferSize from fs/host	Bin Lu
	When I do high-performance networking, the value of wmem_max is often set very high, especially for 10/25/50 Gigabit NICs. I think this restriction may not be suitable. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-06-09	Implement flock(2) in VFS2	Fabricio Voznika
	LockFD is the generic implementation that can be embedded in FileDescriptionImpl implementations. Unique lock ID is maintained in vfs.FileDescription and is created on demand. Updates #1480 PiperOrigin-RevId: 315604825
2020-06-09	Don't WriteOut to readonly mounts	Fabricio Voznika
	When the file closes, it attempts to write dirty cached attributes to the file. This should not be done when the mount is readonly. PiperOrigin-RevId: 315585058
2020-06-08	Combine executable lookup code	Fabricio Voznika
	The executable lookup code paths for run vs. exec and for VFS1 vs. VFS2 were slightly different from each other. Combine them all into the same logic. PiperOrigin-RevId: 315426443
2020-06-03	Fix data race on f.offset.	Nicolas Lacasse
	We must hold f.mu to write f.offset. PiperOrigin-RevId: 314582968
2020-06-02	Add some detail to milestone #1	Ridwan Sharif
	This change adds more information about what needs to be done to implement `/dev/fuse`
2020-05-29	Refactor the ResolveExecutablePath logic.	Nicolas Lacasse
	PiperOrigin-RevId: 313871804
2020-05-28	Merge pull request #2792 from avagin:g3doc/fuse/refs	gVisor bot
	PiperOrigin-RevId: 313600051
2020-05-27	g3doc/fuse: add more references	Andrei Vagin

2020-05-26	Merge pull request #2751 from mrahatm:fuse	gVisor bot
	PiperOrigin-RevId: 313300882
2020-05-26	Write initial design doc for FUSE.	Rahat Mahmood

2020-05-19	Implement mmap for host fs in vfs2.	Dean Deng
	In VFS1, both fs/host and fs/gofer used the same utils for host file mappings. Refactor parts of fsimpl/gofer to create similar utils to share with fsimpl/host (memory accounting code moved to fsutil, page rounding arithmetic moved to usermem). Updates #1476. PiperOrigin-RevId: 312345090
2020-05-13	Enable overlayfs_stale_read by default for runsc.	Jamie Liu
	Linux 4.18 and later make reads and writes coherent between pre-copy-up and post-copy-up FDs representing the same file on an overlay filesystem. However, memory mappings remain incoherent:
	- Documentation/filesystems/overlayfs.rst, "Non-standard behavior": "If a file residing on a lower layer is opened for read-only and then memory mapped with MAP_SHARED, then subsequent changes to the file are not reflected in the memory mapping."
	- fs/overlayfs/file.c:ovl_mmap() passes through to the underlying FD without any management of coherence in the overlay.
	- Experimentally on Linux 5.2:
	```
	$ cat mmap_cat_page.c
	#include <err.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(int argc, char **argv) {
	  if (argc < 2) {
	    errx(1, "syntax: %s [FILE]", argv[0]);
	  }
	  const int fd = open(argv[1], O_RDONLY);
	  if (fd < 0) {
	    err(1, "open(%s)", argv[1]);
	  }
	  /* Map the first page of the file read-only and shared. */
	  const size_t page_size = sysconf(_SC_PAGE_SIZE);
	  void *page = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
	  if (page == MAP_FAILED) {
	    err(1, "mmap");
	  }
	  /* Print the mapped contents each time a character arrives on stdin. */
	  for (;;) {
	    write(1, page, strnlen(page, page_size));
	    if (getc(stdin) == EOF) {
	      break;
	    }
	  }
	  return 0;
	}
	$ gcc -O2 -o mmap_cat_page mmap_cat_page.c
	$ mkdir lowerdir upperdir workdir overlaydir
	$ echo old > lowerdir/file
	$ sudo mount -t overlay -o "lowerdir=lowerdir,upperdir=upperdir,workdir=workdir" none overlaydir
	$ ./mmap_cat_page overlaydir/file
	old
	^Z
	[1]+ Stopped ./mmap_cat_page overlaydir/file
	$ echo new > overlaydir/file
	$ cat overlaydir/file
	new
	$ fg
	./mmap_cat_page overlaydir/file
	old
	```
	Therefore, while the VFS1 gofer client's behavior of reopening read FDs is only necessary pre-4.18, replacing existing memory mappings (in both sentry and application address spaces) with mappings of the new FD is required regardless of kernel version, and this latter behavior is common to both VFS1 and VFS2. Re-document accordingly, and change the runsc flag to enabled by default.
	New test:
	- Before this CL: https://source.cloud.google.com/results/invocations/5b222d2c-e918-4bae-afc4-407f5bac509b
	- After this CL: https://source.cloud.google.com/results/invocations/f28c747e-d89c-4d8c-a461-602b33e71aab
	PiperOrigin-RevId: 311361267
2020-05-12	Don't allow rename across different gofer or tmpfs mounts.	Nicolas Lacasse
	Fixes #2651. PiperOrigin-RevId: 311193661
2020-05-07	Update privateunixsocket TODOs.	Dean Deng
	Synthetic sockets do not have the race condition issue in VFS2, and we will get rid of privateunixsocket as well. Fixes #1200. PiperOrigin-RevId: 310386474
2020-05-06	Add maximum memory limit.	Nicolas Lacasse
	PiperOrigin-RevId: 310179277
2020-05-05	Translate p9.NoUID/GID to OverflowUID/GID.	Jamie Liu
	p9.NoUID/GID (== uint32(-1) == auth.NoID) is not a valid auth.KUID/KGID; in particular, using it for file ownership causes capabilities to be ineffective since file capabilities require that the file's KUID and KGID are mapped into the capability holder's user namespace [1], and auth.NoID is not mapped into any user namespace. Map p9.NoUID/GID to a different, valid KUID/KGID; in the unlikely case that an application actually using the overflow KUID/KGID attempts an operation that is consequently permitted by client permission checks, the remote operation will still fail with EPERM. Since this changes the VFS2 gofer client to no longer ignore the invalid IDs entirely, this CL both permits and requires that we change synthetic mount point creation to use root credentials. [1] See fs.Inode.CheckCapability or vfs.GenericCheckPermissions. PiperOrigin-RevId: 309856455
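The translation described above amounts to replacing one sentinel value with a valid ID. The sketch below assumes the overflow ID is 65534, which is Linux's default overflow UID/GID; the names `noID`, `overflowID`, and `translateID` are hypothetical and are not the identifiers used in gVisor.

```go
package main

import "fmt"

const (
	// noID mirrors the uint32(-1) sentinel from the commit message.
	noID = ^uint32(0)
	// overflowID is assumed to be 65534, Linux's default overflow ID.
	overflowID = 65534
)

// translateID maps the "no ID" sentinel to the overflow ID and
// leaves every other ID unchanged.
func translateID(id uint32) uint32 {
	if id == noID {
		return overflowID
	}
	return id
}

func main() {
	fmt.Println(translateID(noID)) // prints 65534
	fmt.Println(translateID(1000)) // prints 1000
}
```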
2020-04-28	Support pipes and sockets in VFS2 gofer fs.	Dean Deng
	Named pipes and sockets can be represented in two ways in gofer fs:
	1. As a file on the remote filesystem. In this case, all file operations are passed through 9p.
	2. As a synthetic file that is internal to the sandbox. In this case, the dentry stores an endpoint or VFSPipe for sockets and pipes respectively, which replaces interactions with the remote fs through the gofer.
	In gofer.filesystem.MknodAt, we attempt to call mknod(2) through 9p, and if it fails, fall back to the synthetic version. Updates #1200.
	PiperOrigin-RevId: 308828161
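The try-remote-then-fall-back pattern described above can be sketched as follows. Everything here is hypothetical: `node`, `mknodAt`, and the `remote` callback only model the control flow and are not gVisor's gofer client API.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a dentry.
type node struct {
	name      string
	synthetic bool // true if the file exists only inside the sandbox
}

// errRemoteUnsupported is a made-up error for the failing case.
var errRemoteUnsupported = errors.New("remote mknod not supported")

// mknodAt first tries to create the file remotely. If that
// fails, it falls back to a synthetic, sandbox-internal node.
func mknodAt(name string, remote func(string) error) node {
	if err := remote(name); err != nil {
		return node{name: name, synthetic: true}
	}
	return node{name: name}
}

func main() {
	ok := func(string) error { return nil }
	fail := func(string) error { return errRemoteUnsupported }
	fmt.Println(mknodAt("pipe0", ok).synthetic)
	fmt.Println(mknodAt("pipe1", fail).synthetic)
}
```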
2020-04-27	Import host sockets.	Dean Deng
	The FileDescription implementation for hostfs sockets uses the standard Unix socket implementation (unix.SocketVFS2), but is also tied to a hostfs dentry. Updates #1672, #1476 PiperOrigin-RevId: 308716426
2020-04-24	Port SCM Rights to VFS2.	Dean Deng
	Fixes #1477. PiperOrigin-RevId: 308317511
2020-04-23	Port devpts to VFS2.	Nicolas Lacasse
	PiperOrigin-RevId: 308164359
2020-04-22	Move user home detection to its own library.	Nicolas Lacasse
	PiperOrigin-RevId: 307977689
2020-04-21	Misc VFS2 fixes	Fabricio Voznika
	- Fix defer operation ordering in kernfs.Filesystem.AccessAt()
	- Add AT_NULL entry in /proc/pid/auxv
	- Fix line padding in /proc/pid/maps
	- Fix linux_dirent serialization for getdents(2)
	- Remove file creation flags from vfs.FileDescription.statusFlags()
	Updates #1193, #1035
	PiperOrigin-RevId: 307704159
2020-04-21	Sentry metrics updates.	Dave Bailey
	Sentry metrics with nanosecond units are labeled as such, and non-cumulative sentry metrics are supported. PiperOrigin-RevId: 307621080
2020-04-13	Remove obsolete TODOs for b/38173783	Jon Budd
	The comments in the ticket indicate that this behavior is fine and that the ticket should be closed, so we shouldn't need pointers to the ticket. PiperOrigin-RevId: 306266071
2020-04-10	Use O_CLOEXEC when dup'ing FDs	Fabricio Voznika
	The sentry doesn't allow execve, but it's a good defense-in-depth measure. PiperOrigin-RevId: 305958737