summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/fsimpl
AgeCommit message (Collapse)Author
2021-03-03Add checklocks analyzer.Bhasker Hariharan
This validates that struct fields if annotated with "// checklocks:mu" where "mu" is a mutex field in the same struct then access to the field is only done with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name> For more details please refer to README.md. PiperOrigin-RevId: 360729328
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-24Kernfs should not try to rename a file to itself.Nicolas Lacasse
One precondition of VFS.PrepareRenameAt is that the `from` and `to` dentries are not the same. Kernfs was not checking this, which could lead to a deadlock. PiperOrigin-RevId: 359385974
2021-02-24Use mapped device number + topmost inode number for all files in VFS2 overlay.Jamie Liu
Before this CL, VFS2's overlayfs uses a single private device number and an autoincrementing generated inode number for directories; this is consistent with Linux's overlayfs in the non-samefs non-xino case. However, this breaks some applications more consistently than on Linux due to more aggressive caching of Linux overlayfs dentries. Switch from using mapped device numbers + the topmost layer's inode number for just non-copied-up non-directory files, to doing so for all files. This still allows directory dev/ino numbers to change across copy-up, but otherwise keeps them consistent. Fixes #5545: ``` $ docker run --runtime=runsc-vfs2-overlay --rm ubuntu:focal bash -c "mkdir -p 1/2/3/4/5/6/7/8 && rm -rf 1 && echo done" done ``` PiperOrigin-RevId: 359350716
2021-02-24Add YAMA security module restrictions on ptrace(2).Dean Deng
Restrict ptrace(2) according to the default configurations of the YAMA security module (mode 1), which is a common default among various Linux distributions. The new access checks only permit the tracer to proceed if one of the following conditions is met: a) The tracer is already attached to the tracee. b) The target is a descendant of the tracer. c) The target has explicitly given permission to the tracer through the PR_SET_PTRACER prctl. d) The tracer has CAP_SYS_PTRACE. See security/yama/yama_lsm.c for more details. Note that these checks are added to CanTrace, which is checked for PTRACE_ATTACH as well as some other operations, e.g., checking a process' memory layout through /proc/[pid]/mem. Since this patch adds restrictions to ptrace, it may break compatibility for applications run by non-root users that, for instance, rely on being able to trace processes that are not descended from the tracer (e.g., `gdb -p`). YAMA restrictions can be turned off by setting /proc/sys/kernel/yama/ptrace_scope to 0, or exceptions can be made on a per-process basis with the PR_SET_PTRACER prctl. Reported-by: syzbot+622822d8bca08c99e8c8@syzkaller.appspotmail.com PiperOrigin-RevId: 359237723
2021-02-11Unconditionally check for directory-ness in overlay.filesystem.UnlinkAt().Jamie Liu
PiperOrigin-RevId: 357106080
2021-02-11Internal change.gVisor bot
PiperOrigin-RevId: 357090170
2021-02-11Assign controlling terminal when tty is opened and support NOCTTYKevin Krakauer
PiperOrigin-RevId: 357015186
2021-02-10Support setgid directories in tmpfs and kernfsKevin Krakauer
PiperOrigin-RevId: 356868412
2021-02-09Add support for setting SO_SNDBUF for unix domain sockets.Bhasker Hariharan
The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-09pipe: writeLocked has to return ErrWouldBlock if the pipe is fullAndrei Vagin
PiperOrigin-RevId: 356450303
2021-02-05[vfs] Handle `.` and `..` as last path component names in kernfs Rename.Ayush Ranjan
According to vfs.FilesystemImpl.RenameAt documentation: - If the last path component in rp is "." or "..", and opts.Flags contains RENAME_NOREPLACE, RenameAt returns EEXIST. - If the last path component in rp is "." or "..", and opts.Flags does not contain RENAME_NOREPLACE, RenameAt returns EBUSY. Reported-by: syzbot+6189786e64fe13fe43f8@syzkaller.appspotmail.com PiperOrigin-RevId: 355959266
2021-02-04Call kernfs.syntheticDir.InitRefs() on creation.Nicolas Lacasse
PiperOrigin-RevId: 355675900
2021-02-03[vfs] Make sticky bit check consistent with Linux.Ayush Ranjan
Our implementation of vfs.CheckDeleteSticky was not consistent with Linux, specifically not consistent with fs/linux.h:check_sticky(). One of the biggest differences was that the vfs implementation did not allow the owner of the sticky directory to delete files inside it that belonged to other users. This change makes our implementation consistent with Linux. Also adds an integration test to check for this. This bug is also present in VFS1. Updates #3027 PiperOrigin-RevId: 355557425
2021-01-29Fix deadlock in specialFileFD.pwriteFabricio Voznika
When file is regular and metadata cache is authoritative, metadata lock is taken. The code deadlocks trying to acquire the metadata lock again to update time stampts. PiperOrigin-RevId: 354584594
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-28[vfs] Fix rename implementation in OrderedChildren.Ayush Ranjan
Fixes #3027 as there is just 1 writable user using OrderedChildren's rename, unlink and rmdir (kernfs.syntheticDirectory) but it doesn't support the sticky bit yet. Fuse which is the other writable user implements its own Inode operations. PiperOrigin-RevId: 354386522
2021-01-26Initialize timestamps for gofer synthetic children.Dean Deng
Contrary to the comment on the socket test, the failure was due to an issue with goferfs rather than kernfs. PiperOrigin-RevId: 353918021
2021-01-22Implement F_GETLK fcntl.Dean Deng
Fixes #5113. PiperOrigin-RevId: 353313374
2021-01-20Don't use task goroutine context in fsimpl tests.Jamie Liu
PiperOrigin-RevId: 352908368
2021-01-20Move Lock/UnlockPOSIX into LockFD util.Dean Deng
PiperOrigin-RevId: 352904728
2021-01-20Fix refcount increments in gofer.filesystem.Sync.Jamie Liu
Fixes #5263 PiperOrigin-RevId: 352903844
2021-01-14Check for existence before permissionsFabricio Voznika
Return EEXIST when overwritting a file as long as the caller has exec permission on the parent directory, even if the caller doesn't have write permission. Also reordered the mount write check, which happens before permission is checked. Closes #5164 PiperOrigin-RevId: 351868123
2021-01-12Fix simple mistakes identified by goreportcard.Adin Scannell
These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-05fs/fuse: check that a task has a specified file descriptorAndrei Vagin
Reported-by: syzbot+814105309d2ae8651084@syzkaller.appspotmail.com PiperOrigin-RevId: 350159452
2020-12-31Add missing error checks for FileDescription.Init.Dean Deng
Syzkaller discovered this bug in pipefs by doing something quite strange: creat(&(0x7f0000002a00)='./file1\x00', 0x0) mount(&(0x7f0000000440)=ANY=[], &(0x7f00000002c0)='./file1\x00', &(0x7f0000000300)='devtmpfs\x00', 0x20000d, 0x0) creat(&(0x7f0000000000)='./file1/file0\x00', 0x0) This can be reproduced with: touch mymount mkfifo /dev/mypipe mount -o ro -t devtmpfs devtmpfs mymount echo 123 > mymount/mypipe PiperOrigin-RevId: 349687714
2020-12-17Set verityMu to be state nosaveChong Cai
PiperOrigin-RevId: 348092999
2020-12-17Fix seek on /proc/pid/cmdline when task is zombie.Nicolas Lacasse
PiperOrigin-RevId: 348056159
2020-12-17Set process group and session on host TTYFabricio Voznika
Closes #5128 PiperOrigin-RevId: 348052446
2020-12-15Change violation mode to an enumChong Cai
PiperOrigin-RevId: 347706953
2020-12-15Internal change.gVisor bot
PiperOrigin-RevId: 347671070
2020-12-11Internal change.gVisor bot
PiperOrigin-RevId: 347091372
2020-12-11Remove existing nogo exceptions.Adin Scannell
PiperOrigin-RevId: 347047550
2020-12-10Change merkle root file name to avoid collisionChong Cai
PiperOrigin-RevId: 346923826
2020-12-07Fix error handling on fusefs mount.Rahat Mahmood
Don't propagate arbitrary golang errors up from fusefs because errors that don't map to an errno result in a sentry panic. Reported-by: syzbot+697cb635346e456fddfc@syzkaller.appspotmail.com PiperOrigin-RevId: 346220306
2020-12-04Overlay runsc regular file mounts with regular files.Jamie Liu
Fixes #4991 PiperOrigin-RevId: 345800333
2020-12-03Implement `fcntl` options `F_GETSIG` and `F_SETSIG`.Etienne Perot
These options allow overriding the signal that gets sent to the process when I/O operations are available on the file descriptor, rather than the default `SIGIO` signal. Doing so also populates `siginfo` to contain extra information about which file descriptor caused the event (`si_fd`) and what events happened on it (`si_band`). The logic around which FD is populated within `si_fd` matches Linux's, which means it has some weird edge cases where that value may not actually refer to a file descriptor that is still valid. This CL also ports extra S/R logic regarding async handler in VFS2. Without this, async I/O handlers aren't properly re-registered after S/R. PiperOrigin-RevId: 345436598
2020-12-02Remove FileReadWriteSeeker from vfs.Jamie Liu
Previous experience has shown that these types of wrappers tends to create two kinds of problems: hidden allocations (e.g. each call to FileReadWriteSeeker.Read/Write allocates a usermem.BytesIO on the heap) and hidden lock ordering problems (e.g. VFS1 splice deadlocks). Since this is only needed by fsimpl/verity, move it there. PiperOrigin-RevId: 345377830
2020-12-02Clean up verity tests.Dean Deng
Refactor some utilities and rename some others for clarity. PiperOrigin-RevId: 345247836
2020-12-02Add /proc/sys/kernel/sem.Jing Chen
PiperOrigin-RevId: 345178956
2020-11-24Remove outdated TODO.Dean Deng
The bug has been fixed. PiperOrigin-RevId: 344088206
2020-11-23Don't evict gofer.dentries with inotify watches before saving.Jamie Liu
PiperOrigin-RevId: 343959348
2020-11-20Refactor verity test for readabilityChong Cai
1. Add getD/getDentry methods to avoid long casting line in each test 2. Factor all calls to vfs.OpenAt/UnlinkAt/RenameAt on lower filesystem to their own method (for both lower file and lower Merkle file) so the tests are more readable 3. Add descriptive test names for delete/remove tests PiperOrigin-RevId: 343540202
2020-11-19Remove racy stringification of socket fds from /proc/net/*.Rahat Mahmood
PiperOrigin-RevId: 343398191
2020-11-18[vfs] kernfs: Do not panic if destroyed dentry is cached.Ayush Ranjan
If a kernfs user does not cache dentries, then cacheLocked will destroy the dentry. The current DecRef implementation will be racy in this case as the following can happen: - Goroutine 1 calls DecRef and decreases ref count from 1 to 0. - Goroutine 2 acquires d.fs.mu for reading and calls IncRef and increasing the ref count from 0 to 1. - Goroutine 2 releases d.fs.mu and calls DecRef again decreasing ref count from 1 to 0. - Goroutine 1 now acquires d.fs.mu and calls cacheLocked which destroys the dentry. - Goroutine 2 now acquires d.fs.mu and calls cacheLocked to find that the dentry is already destroyed! Earlier we would panic in this case, we could instead just return instead of adding complexity to handle this race. This is similar to what the gofer client does. We do not want to lock d.fs.mu in the case that the filesystem caches dentries (common case as procfs and sysfs do this) to prevent congestion due to lock contention. PiperOrigin-RevId: 343229496
2020-11-18Port filesystem metrics to VFS2.Jamie Liu
PiperOrigin-RevId: 343196927
2020-11-17fs/fuse: don't dereference fuse.DeviceFD.fs if it is nilAndrei Vagin
PiperOrigin-RevId: 342992936
2020-11-17tmpfs: make sure that a dentry will not be destroyed before the open() callAndrei Vagin
If we don't hold a reference, the dentry can be destroyed by another thread. Reported-by: syzbot+f2132e50060c41f6d41f@syzkaller.appspotmail.com PiperOrigin-RevId: 342951940
2020-11-17Add consistent precondition formatting for verityChong Cai
Also add the lock order for verity fs, and add a lock to protect dentry hash. PiperOrigin-RevId: 342946537
2020-11-13Have fuse.DeviceFD hold reference on fuse.filesystem.Jamie Liu
This is actually just b/168751672 again; cl/332394146 was incorrectly reverted by cl/341411151. Document the reference holder to reduce the likelihood that this happens again. Also document a few other bugs observed in the process. PiperOrigin-RevId: 342339144