summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2019-05-30gvisor: socket() returns EPROTONOSUPPORT if protocol is not supportedAndrei Vagin
PiperOrigin-RevId: 250426407
2019-05-30Always wait on tracee childrenMichael Pratt
After bf959931ddb88c4e4366e96dd22e68fa0db9527c ("wait/ptrace: assume __WALL if the child is traced") (Linux 4.7), tracees are always eligible for waiting, regardless of type. PiperOrigin-RevId: 250399527
2019-05-30Remove obsolete bug.Adin Scannell
The original bug is no longer relevant, and the FIXME here contains lots of obsolete information. PiperOrigin-RevId: 249924036
2019-05-24Remove obsolete TODO.Adin Scannell
We don't need to model internal interfaces after the system call interfaces (which are objectively worse and simply use a flag to distinguish between two logically different operations). PiperOrigin-RevId: 249916814 Change-Id: I45d02e0ec0be66b782a685b1f305ea027694cab9
2019-05-23gvisor: interrupt the sendfile system call if a task has been interruptedAndrei Vagin
sendfile can be called for a big range and it can require significant amount of time to process it, so we need to handle task interrupts in this system call. PiperOrigin-RevId: 249781023 Change-Id: Ifc2ec505d74c06f5ee76f93b8d30d518ec2d4015
2019-05-23Added boilerplate code for ext4 fs.Ayush Ranjan
Initialized BUILD with license Mount is still unimplemented and is not meant to be part of this CL. Rest of the fs interface is implemented. Referenced the Linux kernel appropriately when needed PiperOrigin-RevId: 249741997 Change-Id: Id1e4c7c9e68b3f6946da39896fc6a0c3dcd7f98c
2019-05-23Initial support for bind mountsFabricio Voznika
Separate MountSource from Mount. This is needed to allow mounts to be shared by multiple containers within the same pod. PiperOrigin-RevId: 249617810 Change-Id: Id2944feb7e4194951f355cbe6d4944ae3c02e468
2019-05-22Log unhandled faults only at DEBUG level.Adin Scannell
PiperOrigin-RevId: 249561399 Change-Id: Ic73c68c8538bdca53068f38f82b7260939addac2
2019-05-22Add WCLONE / WALL support to waitidMichael Pratt
The previous commit adds WNOTHREAD support to waitid, so we may as well complete the upstream change. Linux added WCLONE, WALL, WNOTHREAD support to waitid(2) in 91c4e8ea8f05916df0c8a6f383508ac7c9e10dba ("wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL"). i.e., Linux 4.7. PiperOrigin-RevId: 249560587 Change-Id: Iff177b0848a3f7bae6cb5592e44500c5a942fbeb
2019-05-22Remove obsolete TODO.Adin Scannell
There no obvious reason to require that BlockSize and StatFS are MountSource operations. Today they are in INodeOperations, and they can be moved elsewhere in the future as part of a normal refactor process. PiperOrigin-RevId: 249549982 Change-Id: Ib832e02faeaf8253674475df4e385bcc53d780f3
2019-05-22Add support for wait(WNOTHREAD)Michael Pratt
PiperOrigin-RevId: 249537694 Change-Id: Iaa4bca73a2d8341e03064d59a2eb490afc3f80da
2019-05-22UDP and TCP raw socket support.Kevin Krakauer
PiperOrigin-RevId: 249511348 Change-Id: I34539092cc85032d9473ff4dd308fc29dc9bfd6b
2019-05-22Move wait constants to abi/linux packageMichael Pratt
Updates #214 PiperOrigin-RevId: 249483756 Change-Id: I0d3cf4112bed75a863d5eb08c2063fbc506cd875
2019-05-21Clean up pipe internals and add fcntl supportAdin Scannell
Pipe internals are made more efficient by avoiding garbage collection. A pool is now used that can be shared by all pipes, and buffers are chained via an intrusive list. The documentation for pipe structures and methods is also simplified and clarified. The pipe tests are now parameterized, so that they are run on all different variants (named pipes, small buffers, default buffers). The pipe buffer sizes are exposed by fcntl, which is now supported by this change. A size change test has been added to the suite. These new tests uncovered a bug regarding the semantics of open named pipes with O_NONBLOCK, which is also fixed by this CL. This fix also addresses the lack of the O_LARGEFILE flag for named pipes. PiperOrigin-RevId: 249375888 Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
2019-05-21Fix inconsistencies in ELF anonymous mappingsMichael Pratt
* A segment with filesz == 0, memsz > 0 should be an anonymous only mapping. We were failing to load such an ELF. * Anonymous pages are always mapped RW, regardless of the segment protections. PiperOrigin-RevId: 249355239 Change-Id: I251e5c0ce8848cf8420c3aadf337b0d77b1ad991
2019-05-21Add basic plumbing for splice and stub implementation.Adin Scannell
This does not actually implement an efficient splice or sendfile. Rather, it adds a generic plumbing to the file internals so that this can be added. All file implementations use the stub fileutil.NoSplice implementation, which causes sendfile and splice to fall back to an internal copy. A basic splice system call interface is added, along with a test. PiperOrigin-RevId: 249335960 Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
2019-05-21Remove unused struct member.Neel Natu
Remove unused struct member. PiperOrigin-RevId: 249300446 Change-Id: Ifb16538f684bc3200342462c3da927eb564bf52d
2019-05-20Forward named pipe creation to the goferMichael Pratt
The backing 9p server must allow named pipe creation, which the runsc fsgofer currently does not. There are small changes to the overlay here. GetFile may block when opening a named pipe, which can cause a deadlock: 1. open(O_RDONLY) -> copyMu.Lock() -> GetFile() 2. open(O_WRONLY) -> copyMu.Lock() -> Deadlock A named pipe usable for writing must already be on the upper filesystem, but we are still taking copyMu for write when checking for upper. That can be changed to a read lock to fix the common case. However, a named pipe on the lower filesystem would still deadlock in open(O_WRONLY) when it tries to actually perform copy up (which would simply return EINVAL). Move the copy up type check before taking copyMu for write to avoid this. p9 must be modified, as it was incorrectly removing the file mode when sending messages on the wire. PiperOrigin-RevId: 249154033 Change-Id: Id6637130e567b03758130eb6c7cdbc976384b7d6
2019-05-20Fix incorrect tmpfs timestamp updatesMichael Pratt
* Creation of files, directories (and other fs objects) in a directory should always update ctime. * Same for removal. * atime should not be updated on lookup, only readdir. I've also renamed some misleading functions that update mtime and ctime. PiperOrigin-RevId: 249115063 Change-Id: I30fa275fa7db96d01aa759ed64628c18bb3a7dc7
2019-05-17Return EPERM for mknodMichael Pratt
This more directly matches what Linux does with unsupported nodes. PiperOrigin-RevId: 248780425 Change-Id: I17f3dd0b244f6dc4eb00e2e42344851b8367fbec
2019-05-17Fix gofer rename ctime and cleanup stat_times testMichael Pratt
There is a lot of redundancy that we can simplify in the stat_times test. This will make it easier to add new tests. However, the simplification reveals that cached uattrs on goferfs don't properly update ctime on rename. PiperOrigin-RevId: 248773425 Change-Id: I52662728e1e9920981555881f9a85f9ce04041cf
2019-05-15gofer: don't call hostfile.Close if hostFile is nilAndrei Vagin
PiperOrigin-RevId: 248437159 Change-Id: Ife71f6ca032fca59ec97a82961000ed0af257101
2019-05-14Start of support for /proc/pid/cgroup file.Nicolas Lacasse
PiperOrigin-RevId: 248263378 Change-Id: Ic057d2bb0b6212110f43ac4df3f0ac9bf931ab98
2019-05-14Remove false commentMichael Pratt
PiperOrigin-RevId: 248249285 Change-Id: I9b6d267baa666798b22def590ff20c9a118efd47
2019-05-10Add pgalloc.DelayedEvictionManual.Jamie Liu
PiperOrigin-RevId: 247667272 Change-Id: I16b04e11bb93f50b7e05e888992303f730e4a877
2019-05-09Implement fallocate(2)Fabricio Voznika
Closes #225 PiperOrigin-RevId: 247508791 Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
2019-05-08Set the FilesytemType in MountSource from the Filesystem.Nicolas Lacasse
And stop storing the Filesystem in the MountSource. This allows us to decouple the MountSource filesystem type from the name of the filesystem. PiperOrigin-RevId: 247292982 Change-Id: I49cbcce3c17883b7aa918ba76203dfd6d1b03cc8
2019-05-07Remove defers from gofer.contextFileFabricio Voznika
Most are single line methods in hot paths. PiperOrigin-RevId: 247050267 Change-Id: I428d78723fe00b57483185899dc8fa9e1f01e2ea
2019-05-06Ensure all uses of MM.brk occur under MM.mappingMu in MM.Brk().Jamie Liu
PiperOrigin-RevId: 246921386 Change-Id: I71d8908858f45a9a33a0483470d0240eaf0fd012
2019-05-03gofer: don't leak file descriptorsAndrei Vagin
Fixes #219 PiperOrigin-RevId: 246568639 Change-Id: Ic7afd15dde922638d77f6429c508d1cbe2e4288a
2019-04-30Update reference to old typeMichael Pratt
PiperOrigin-RevId: 246036806 Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7
2019-04-30Implement async MemoryFile eviction, and use it in CachingInodeOperations.Jamie Liu
This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
2019-04-29Implement the MSG_CTRUNC msghdr flag for Unix sockets.Ian Gudger
Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29createAt should return all errors from FindInode except ENOENT.Nicolas Lacasse
Previously, createAt was eating all errors from FindInode except for EACCES and proceeding with the creation. This is incorrect, as FindInode can return many other errors (like ENAMETOOLONG) that should stop creation. This CL changes createAt to return all errors encountered except for ENOENT, which we can ignore because we are about to create the thing. PiperOrigin-RevId: 245773222 Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-26kvm: remove non-sane sanity checkAdin Scannell
Apparently some platforms don't have pSize < vSize. Fixes #208 PiperOrigin-RevId: 245480998 Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
2019-04-26Fix reference counting bug in /proc/PID/fdinfo/.Kevin Krakauer
PiperOrigin-RevId: 245452217 Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-25Perform explicit CPUID and FP state compatibility checks on restoreMichael Pratt
PiperOrigin-RevId: 245341004 Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
2019-04-25Don't enforce NAME_MAX in fs.Dirent.walk().Jamie Liu
Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX: $ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/ fs/libfs.c 38: buf->f_namelen = NAME_MAX; 64: if (dentry->d_name.len > NAME_MAX) include/linux/relay.h 74: char base_filename[NAME_MAX]; /* saved base filename */ include/linux/fscrypt.h 149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length. include/linux/exportfs.h 176: * understanding that it is already pointing to a a %NAME_MAX+1 sized Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does* enforce NAME_MAX. PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-22Bugfix: fix fstatat symbol link to dirWei Zhang
For a symbol link to some directory, eg. `/tmp/symlink -> /tmp/dir` `fstatat("/tmp/symlink")` should return symbol link data, but `fstatat("/tmp/symlink/")` (symlink with trailing slash) should return directory data it points following linux behaviour. Currently fstatat() a symlink with trailing slash will get "not a directory" error which is wrong. Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2 PiperOrigin-RevId: 244783916
2019-04-22Fix doc typoMichael Pratt
PiperOrigin-RevId: 244773890 Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
2019-04-22Clean up state error handlingMichael Pratt
PiperOrigin-RevId: 244773836 Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
2019-04-19Add support for the MSG_TRUNC msghdr flag.Ian Gudger
The MSG_TRUNC flag is set in the msghdr when a message is truncated. Fixes google/gvisor#200 PiperOrigin-RevId: 244440486 Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-18Format struct pollfd in poll(2)/ppoll(2)Michael Pratt
I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1) I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s) PiperOrigin-RevId: 244269879 Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
2019-04-18Only emit unimplemented syscall events for unsupported values.Ian Gudger
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER when attempting to set unsupported values. PiperOrigin-RevId: 244229675 Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
2019-04-17Don't allow sigtimedwait to catch unblockable signalsMichael Pratt
The existing logic attempting to do this is incorrect. Unary ^ has higher precedence than &^, so mask always has UnblockableSignals cleared, allowing dequeueSignalLocked to dequeue unblockable signals (which allows userspace to ignore them). Switch the logic so that unblockable signals are always masked. PiperOrigin-RevId: 244058487 Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
2019-04-17Use FD limit and file size limit from hostFabricio Voznika
FD limit and file size limit is read from the host, instead of using hard-coded defaults, given that they effect the sandbox process. Also limit the direct cache to use no more than half if the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-04-17Convert poll/select to operate more directly on linux.PollFDMichael Pratt
Current, doPoll copies the user struct pollfd array into a []syscalls.PollFD, which contains internal kdefs.FD and waiter.EventMask types. While these are currently binary-compatible with the Linux versions, we generally discourage copying directly to internal types (someone may inadvertantly change kdefs.FD to uint64). Instead, copy directly to a []linux.PollFD, which will certainly be binary compatible. Most of syscalls/polling.go is included directly into syscalls/linux/sys_poll.go, as it can then operate directly on linux.PollFD. The additional syscalls.PollFD type is providing little value. I've also added explicit conversion functions for waiter.EventMask, which creates the possibility of a different binary format. PiperOrigin-RevId: 244042947 Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
2019-04-11Format FDs in strace logsMichael Pratt
Normal files display their path in the current mount namespace: I0410 10:57:54.964196 216336 x:0] [ 1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462?s) AT_FDCWD includes the CWD: I0411 12:58:48.278427 1526 x:0] [ 1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0) Sockets (and other non-vfs files) display an inode number (like /proc/PID/fd): I0410 10:54:38.909123 207684 x:0] [ 1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10) I also fixed a few syscall args that should be Path. PiperOrigin-RevId: 243169025 Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a