gvisor - Container Runtime Sandbox

Age	Commit message	Author
2021-02-24	Kernfs should not try to rename a file to itself.	Nicolas Lacasse
	One precondition of VFS.PrepareRenameAt is that the `from` and `to` dentries are not the same. Kernfs was not checking this, which could lead to a deadlock. PiperOrigin-RevId: 359385974
2021-02-24	Use mapped device number + topmost inode number for all files in VFS2 overlay.	Jamie Liu
	Before this CL, VFS2's overlayfs uses a single private device number and an autoincrementing generated inode number for directories; this is consistent with Linux's overlayfs in the non-samefs non-xino case. However, this breaks some applications more consistently than on Linux due to more aggressive caching of Linux overlayfs dentries.
	Switch from using mapped device numbers + the topmost layer's inode number for just non-copied-up non-directory files, to doing so for all files. This still allows directory dev/ino numbers to change across copy-up, but otherwise keeps them consistent.
	Fixes #5545:
	```
	$ docker run --runtime=runsc-vfs2-overlay --rm ubuntu:focal bash -c "mkdir -p 1/2/3/4/5/6/7/8 && rm -rf 1 && echo done"
	done
	```
	PiperOrigin-RevId: 359350716
2021-02-24	Add YAMA security module restrictions on ptrace(2).	Dean Deng
	Restrict ptrace(2) according to the default configuration of the YAMA security module (mode 1), which is a common default among various Linux distributions. The new access checks only permit the tracer to proceed if one of the following conditions is met:
	a) The tracer is already attached to the tracee.
	b) The target is a descendant of the tracer.
	c) The target has explicitly given permission to the tracer through the PR_SET_PTRACER prctl.
	d) The tracer has CAP_SYS_PTRACE.
	See security/yama/yama_lsm.c for more details.
	Note that these checks are added to CanTrace, which is checked for PTRACE_ATTACH as well as some other operations, e.g., checking a process' memory layout through /proc/[pid]/mem.
	Since this patch adds restrictions to ptrace, it may break compatibility for applications run by non-root users that, for instance, rely on being able to trace processes that are not descended from the tracer (e.g., `gdb -p`). YAMA restrictions can be turned off by setting /proc/sys/kernel/yama/ptrace_scope to 0, or exceptions can be made on a per-process basis with the PR_SET_PTRACER prctl.
	Reported-by: syzbot+622822d8bca08c99e8c8@syzkaller.appspotmail.com
	PiperOrigin-RevId: 359237723
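The four conditions (a) through (d) above can be sketched as a single predicate. This is a minimal illustration, not the sentry's CanTrace: the `task` type, its fields, and `yamaAllows` are hypothetical names invented for the example, and the PR_SET_PTRACER case is simplified to an exact match on the declared tracer.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a sentry task; the fields are
// illustrative and are not gVisor's real types.
type task struct {
	parent        *task // nil for the root of the process tree
	tracer        *task // set once a tracer is attached
	allowedTracer *task // set by the target via PR_SET_PTRACER (simplified)
	capSysPtrace  bool  // whether the task holds CAP_SYS_PTRACE
}

// isDescendant reports whether ancestor appears on target's parent chain.
func isDescendant(target, ancestor *task) bool {
	for p := target.parent; p != nil; p = p.parent {
		if p == ancestor {
			return true
		}
	}
	return false
}

// yamaAllows applies conditions (a) through (d) in order and permits the
// trace as soon as one of them holds.
func yamaAllows(tracer, target *task) bool {
	switch {
	case target.tracer == tracer: // (a) already attached
		return true
	case isDescendant(target, tracer): // (b) target descends from tracer
		return true
	case target.allowedTracer == tracer: // (c) PR_SET_PTRACER exception
		return true
	case tracer.capSysPtrace: // (d) privileged tracer
		return true
	}
	return false
}

func main() {
	shell := &task{}
	child := &task{parent: shell}
	debugger := &task{} // unrelated process, like `gdb -p`

	fmt.Println(yamaAllows(shell, child))    // true: child descends from shell
	fmt.Println(yamaAllows(debugger, child)) // false: no condition holds
	child.allowedTracer = debugger
	fmt.Println(yamaAllows(debugger, child)) // true: explicit exception
}
```

The unrelated debugger case is the `gdb -p` compatibility break the message warns about; it succeeds only after the target opts in or the tracer is privileged.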
2021-02-11	Unconditionally check for directory-ness in overlay.filesystem.UnlinkAt().	Jamie Liu
	PiperOrigin-RevId: 357106080
2021-02-11	Internal change.	gVisor bot
	PiperOrigin-RevId: 357090170
2021-02-11	Assign controlling terminal when tty is opened and support NOCTTY	Kevin Krakauer
	PiperOrigin-RevId: 357015186
2021-02-10	Support setgid directories in tmpfs and kernfs	Kevin Krakauer
	PiperOrigin-RevId: 356868412
2021-02-09	Add support for setting SO_SNDBUF for unix domain sockets.	Bhasker Hariharan
	The limits for snd/rcv buffers for unix domain sockets are controlled by the following sysctls on Linux:
	- net.core.rmem_default
	- net.core.rmem_max
	- net.core.wmem_default
	- net.core.wmem_max
	Today in gVisor we do not expose these sysctls, but we do support setting the equivalent in netstack via the stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet, or even without any networking stack at all. This means that ideally these sysctls need to live as globals in gVisor. Rather than make this a big change, for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before, where SO_SNDBUF was hardcoded to 16 KiB). Further, we bump the initial limit to a default value of 208 KiB to match Linux, up from the paltry 16 KiB we use today.
	Updates #5132
	PiperOrigin-RevId: 356665498
2021-02-09	pipe: writeLocked has to return ErrWouldBlock if the pipe is full	Andrei Vagin
	PiperOrigin-RevId: 356450303
2021-02-05	[vfs] Handle `.` and `..` as last path component names in kernfs Rename.	Ayush Ranjan
	According to vfs.FilesystemImpl.RenameAt documentation:
	- If the last path component in rp is "." or "..", and opts.Flags contains RENAME_NOREPLACE, RenameAt returns EEXIST.
	- If the last path component in rp is "." or "..", and opts.Flags does not contain RENAME_NOREPLACE, RenameAt returns EBUSY.
	Reported-by: syzbot+6189786e64fe13fe43f8@syzkaller.appspotmail.com
	PiperOrigin-RevId: 355959266
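The two documented cases reduce to one early check on the final path component. The sketch below assumes a hypothetical helper name, `checkRenameLastComponent`, and is not the kernfs code; the value of `renameNoReplace` mirrors Linux's RENAME_NOREPLACE flag (bit 0).

```go
package main

import (
	"fmt"
	"syscall"
)

// renameNoReplace mirrors Linux's RENAME_NOREPLACE flag (1 << 0).
const renameNoReplace = 0x1

// checkRenameLastComponent is a hypothetical helper: it rejects "." and ".."
// as the last component of the rename target, choosing the errno by flag.
func checkRenameLastComponent(name string, flags uint32) error {
	if name != "." && name != ".." {
		return nil // ordinary name, nothing to reject here
	}
	if flags&renameNoReplace != 0 {
		return syscall.EEXIST
	}
	return syscall.EBUSY
}

func main() {
	fmt.Println(checkRenameLastComponent("file", 0))
	fmt.Println(checkRenameLastComponent(".", renameNoReplace))
	fmt.Println(checkRenameLastComponent("..", 0))
}
```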
2021-02-04	Call kernfs.syntheticDir.InitRefs() on creation.	Nicolas Lacasse
	PiperOrigin-RevId: 355675900
2021-02-03	[vfs] Make sticky bit check consistent with Linux.	Ayush Ranjan
	Our implementation of vfs.CheckDeleteSticky was not consistent with Linux, specifically not consistent with include/linux/fs.h:check_sticky(). One of the biggest differences was that the vfs implementation did not allow the owner of the sticky directory to delete files inside it that belonged to other users. This change makes our implementation consistent with Linux. Also adds an integration test to check for this. This bug is also present in VFS1. Updates #3027 PiperOrigin-RevId: 355557425
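The Linux rule described above can be sketched as follows: in a sticky directory, deletion is permitted for the file's owner, the directory's owner, or a caller with CAP_FOWNER. The `inode` and `creds` types and the `mayDeleteSticky` name are hypothetical, and the capability check is reduced to a boolean; this is not vfs.CheckDeleteSticky itself.

```go
package main

import "fmt"

const modeSticky = 0o1000 // S_ISVTX

// inode and creds are hypothetical, minimal stand-ins for the example.
type inode struct {
	mode uint32
	uid  uint32
}

type creds struct {
	fsuid     uint32
	capFowner bool // simplified: Linux checks CAP_FOWNER relative to the inode
}

// mayDeleteSticky reports whether the sticky bit permits c to delete victim
// from dir. Ordinary write permission on dir is checked elsewhere.
func mayDeleteSticky(c creds, dir, victim inode) bool {
	if dir.mode&modeSticky == 0 {
		return true // sticky bit not set, no extra restriction
	}
	if victim.uid == c.fsuid {
		return true // caller owns the file
	}
	if dir.uid == c.fsuid {
		return true // caller owns the sticky directory
	}
	return c.capFowner
}

func main() {
	dir := inode{mode: 0o1777, uid: 1000}
	file := inode{mode: 0o644, uid: 2000}

	fmt.Println(mayDeleteSticky(creds{fsuid: 1000}, dir, file)) // true: directory owner
	fmt.Println(mayDeleteSticky(creds{fsuid: 3000}, dir, file)) // false: unrelated user
}
```

The first call is the case the commit message singles out: the directory owner deleting a file that belongs to another user.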
2021-01-29	Fix deadlock in specialFileFD.pwrite	Fabricio Voznika
	When the file is regular and the metadata cache is authoritative, the metadata lock is taken. The code deadlocks trying to acquire the metadata lock again to update timestamps. PiperOrigin-RevId: 354584594
2021-01-28	Change tcpip.Error to an interface	Tamir Duberstein
	This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28	[vfs] Fix rename implementation in OrderedChildren.	Ayush Ranjan
	Fixes #3027, as there is just one writable user of OrderedChildren's rename, unlink and rmdir (kernfs.syntheticDirectory), but it doesn't support the sticky bit yet. Fuse, which is the other writable user, implements its own Inode operations. PiperOrigin-RevId: 354386522
2021-01-26	Initialize timestamps for gofer synthetic children.	Dean Deng
	Contrary to the comment on the socket test, the failure was due to an issue with goferfs rather than kernfs. PiperOrigin-RevId: 353918021
2021-01-22	Implement F_GETLK fcntl.	Dean Deng
	Fixes #5113. PiperOrigin-RevId: 353313374
2021-01-20	Don't use task goroutine context in fsimpl tests.	Jamie Liu
	PiperOrigin-RevId: 352908368
2021-01-20	Move Lock/UnlockPOSIX into LockFD util.	Dean Deng
	PiperOrigin-RevId: 352904728
2021-01-20	Fix refcount increments in gofer.filesystem.Sync.	Jamie Liu
	Fixes #5263 PiperOrigin-RevId: 352903844
2021-01-14	Check for existence before permissions	Fabricio Voznika
	Return EEXIST when overwriting a file as long as the caller has exec permission on the parent directory, even if the caller doesn't have write permission. Also reordered the mount write check, which happens before permission is checked. Closes #5164 PiperOrigin-RevId: 351868123
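The ordering this message describes can be sketched as three checks in sequence: search (exec) permission on the parent, then existence, then write permission. The `dir` type and `checkCreate` are hypothetical names for illustration, and the mount write check mentioned above is deliberately left out of the sketch.

```go
package main

import (
	"fmt"
	"syscall"
)

// dir is a hypothetical parent directory as seen by one caller.
type dir struct {
	canExec  bool            // caller may search the directory
	canWrite bool            // caller may modify the directory
	children map[string]bool // names that already exist
}

// checkCreate checks existence before write permission, so an existing name
// yields EEXIST even for a caller who could not have created it.
func checkCreate(d dir, name string) error {
	if !d.canExec {
		return syscall.EACCES // cannot even look the name up
	}
	if d.children[name] {
		return syscall.EEXIST
	}
	if !d.canWrite {
		return syscall.EACCES
	}
	return nil
}

func main() {
	readOnly := dir{canExec: true, children: map[string]bool{"present": true}}

	fmt.Println(checkCreate(readOnly, "present")) // existing name: EEXIST
	fmt.Println(checkCreate(readOnly, "absent"))  // new name, no write permission: EACCES
}
```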
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-05	fs/fuse: check that a task has a specified file descriptor	Andrei Vagin
	Reported-by: syzbot+814105309d2ae8651084@syzkaller.appspotmail.com PiperOrigin-RevId: 350159452
2020-12-31	Add missing error checks for FileDescription.Init.	Dean Deng
	Syzkaller discovered this bug in pipefs by doing something quite strange:
		creat(&(0x7f0000002a00)='./file1\x00', 0x0)
		mount(&(0x7f0000000440)=ANY=[], &(0x7f00000002c0)='./file1\x00', &(0x7f0000000300)='devtmpfs\x00', 0x20000d, 0x0)
		creat(&(0x7f0000000000)='./file1/file0\x00', 0x0)
	This can be reproduced with:
		touch mymount
		mkfifo /dev/mypipe
		mount -o ro -t devtmpfs devtmpfs mymount
		echo 123 > mymount/mypipe
	PiperOrigin-RevId: 349687714
2020-12-17	Set verityMu to be state nosave	Chong Cai
	PiperOrigin-RevId: 348092999
2020-12-17	Fix seek on /proc/pid/cmdline when task is zombie.	Nicolas Lacasse
	PiperOrigin-RevId: 348056159
2020-12-17	Set process group and session on host TTY	Fabricio Voznika
	Closes #5128 PiperOrigin-RevId: 348052446
2020-12-15	Change violation mode to an enum	Chong Cai
	PiperOrigin-RevId: 347706953
2020-12-15	Internal change.	gVisor bot
	PiperOrigin-RevId: 347671070
2020-12-11	Internal change.	gVisor bot
	PiperOrigin-RevId: 347091372
2020-12-11	Remove existing nogo exceptions.	Adin Scannell
	PiperOrigin-RevId: 347047550
2020-12-10	Change merkle root file name to avoid collision	Chong Cai
	PiperOrigin-RevId: 346923826
2020-12-07	Fix error handling on fusefs mount.	Rahat Mahmood
	Don't propagate arbitrary golang errors up from fusefs because errors that don't map to an errno result in a sentry panic. Reported-by: syzbot+697cb635346e456fddfc@syzkaller.appspotmail.com PiperOrigin-RevId: 346220306
2020-12-04	Overlay runsc regular file mounts with regular files.	Jamie Liu
	Fixes #4991 PiperOrigin-RevId: 345800333
2020-12-03	Implement `fcntl` options `F_GETSIG` and `F_SETSIG`.	Etienne Perot
	These options allow overriding the signal that gets sent to the process when I/O operations are available on the file descriptor, rather than the default `SIGIO` signal. Doing so also populates `siginfo` to contain extra information about which file descriptor caused the event (`si_fd`) and what events happened on it (`si_band`). The logic around which FD is populated within `si_fd` matches Linux's, which means it has some weird edge cases where that value may not actually refer to a file descriptor that is still valid. This CL also ports extra S/R logic regarding async handler in VFS2. Without this, async I/O handlers aren't properly re-registered after S/R. PiperOrigin-RevId: 345436598
2020-12-02	Remove FileReadWriteSeeker from vfs.	Jamie Liu
	Previous experience has shown that these types of wrappers tend to create two kinds of problems: hidden allocations (e.g. each call to FileReadWriteSeeker.Read/Write allocates a usermem.BytesIO on the heap) and hidden lock ordering problems (e.g. VFS1 splice deadlocks). Since this is only needed by fsimpl/verity, move it there. PiperOrigin-RevId: 345377830
2020-12-02	Clean up verity tests.	Dean Deng
	Refactor some utilities and rename some others for clarity. PiperOrigin-RevId: 345247836
2020-12-02	Add /proc/sys/kernel/sem.	Jing Chen
	PiperOrigin-RevId: 345178956
2020-11-24	Remove outdated TODO.	Dean Deng
	The bug has been fixed. PiperOrigin-RevId: 344088206
2020-11-23	Don't evict gofer.dentries with inotify watches before saving.	Jamie Liu
	PiperOrigin-RevId: 343959348
2020-11-20	Refactor verity test for readability	Chong Cai
	1. Add getD/getDentry methods to avoid long casting line in each test
	2. Factor all calls to vfs.OpenAt/UnlinkAt/RenameAt on lower filesystem to their own method (for both lower file and lower Merkle file) so the tests are more readable
	3. Add descriptive test names for delete/remove tests
	PiperOrigin-RevId: 343540202
2020-11-19	Remove racy stringification of socket fds from /proc/net/*.	Rahat Mahmood
	PiperOrigin-RevId: 343398191
2020-11-18	[vfs] kernfs: Do not panic if destroyed dentry is cached.	Ayush Ranjan
	If a kernfs user does not cache dentries, then cacheLocked will destroy the dentry. The current DecRef implementation will be racy in this case, as the following can happen:
	- Goroutine 1 calls DecRef and decreases ref count from 1 to 0.
	- Goroutine 2 acquires d.fs.mu for reading and calls IncRef, increasing the ref count from 0 to 1.
	- Goroutine 2 releases d.fs.mu and calls DecRef again, decreasing ref count from 1 to 0.
	- Goroutine 1 now acquires d.fs.mu and calls cacheLocked, which destroys the dentry.
	- Goroutine 2 now acquires d.fs.mu and calls cacheLocked to find that the dentry is already destroyed!
	Earlier we would panic in this case; we can instead just return rather than adding complexity to handle this race. This is similar to what the gofer client does.
	We do not want to lock d.fs.mu in the case that the filesystem caches dentries (common case as procfs and sysfs do this) to prevent congestion due to lock contention.
	PiperOrigin-RevId: 343229496
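The interleaving listed above can be replayed deterministically on one goroutine to show why returning early from cacheLocked is sufficient. This is a sketch under stated assumptions: the `dentry` and `filesystem` types, `dropRef`, `afterZero`, and `replay` are hypothetical, a plain mutex stands in for d.fs.mu, and none of this is the kernfs implementation.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// dentry and filesystem are hypothetical, stripped-down stand-ins.
type dentry struct {
	refs      int64
	destroyed bool
	destroys  int // how many times destruction actually ran
}

type filesystem struct {
	mu sync.Mutex
}

func (d *dentry) incRef() { atomic.AddInt64(&d.refs, 1) }

// dropRef decrements the count and reports whether it reached zero.
func (d *dentry) dropRef() bool { return atomic.AddInt64(&d.refs, -1) == 0 }

// cacheLocked destroys an unreferenced dentry. Finding the dentry already
// destroyed is tolerated by returning instead of panicking.
// Precondition: fs.mu is held.
func (fs *filesystem) cacheLocked(d *dentry) {
	if d.destroyed {
		return // another goroutine won the race
	}
	if atomic.LoadInt64(&d.refs) != 0 {
		return // resurrected since the count hit zero
	}
	d.destroyed = true
	d.destroys++
}

// afterZero is the slow path a goroutine takes once it saw the count hit zero.
func (fs *filesystem) afterZero(d *dentry) {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.cacheLocked(d)
}

// replay runs the five steps in order and returns the number of destructions.
func replay() int {
	fs := &filesystem{}
	d := &dentry{refs: 1}

	g1Zero := d.dropRef() // goroutine 1: 1 -> 0
	d.incRef()            // goroutine 2: 0 -> 1
	g2Zero := d.dropRef() // goroutine 2: 1 -> 0
	if g1Zero {
		fs.afterZero(d) // goroutine 1 destroys the dentry
	}
	if g2Zero {
		fs.afterZero(d) // goroutine 2 finds it destroyed and returns
	}
	return d.destroys
}

func main() {
	fmt.Println(replay()) // 1: destroyed exactly once, no panic
}
```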
2020-11-18	Port filesystem metrics to VFS2.	Jamie Liu
	PiperOrigin-RevId: 343196927
2020-11-17	fs/fuse: don't dereference fuse.DeviceFD.fs if it is nil	Andrei Vagin
	PiperOrigin-RevId: 342992936
2020-11-17	tmpfs: make sure that a dentry will not be destroyed before the open() call	Andrei Vagin
	If we don't hold a reference, the dentry can be destroyed by another thread. Reported-by: syzbot+f2132e50060c41f6d41f@syzkaller.appspotmail.com PiperOrigin-RevId: 342951940
2020-11-17	Add consistent precondition formatting for verity	Chong Cai
	Also add the lock order for verity fs, and add a lock to protect dentry hash. PiperOrigin-RevId: 342946537
2020-11-13	Have fuse.DeviceFD hold reference on fuse.filesystem.	Jamie Liu
	This is actually just b/168751672 again; cl/332394146 was incorrectly reverted by cl/341411151. Document the reference holder to reduce the likelihood that this happens again. Also document a few other bugs observed in the process. PiperOrigin-RevId: 342339144
2020-11-13	fs/tmpfs: change regularFile.size atomically	Andrei Vagin
	PiperOrigin-RevId: 342221309
2020-11-13	fs/tmpfs: use atomic operations to access inode.mode	Andrei Vagin
	PiperOrigin-RevId: 342214859