summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2019-10-01Merge release-20190806.1-221-gdd69b49 (automated)gVisor bot
2019-10-01Disable cpuClockTicker when app is idleMichael Pratt
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to track their CPU usage. This improves latency in the syscall path by avoid expensive monotonic clock calls on every syscall entry/exit. However, this timer fires every 10ms. Thus, when all tasks are idle (i.e., blocked or stopped), this forces a sentry wakeup every 10ms, when we may otherwise be able to sleep until the next app-relevant event. These wakeups cause the sentry to utilize approximately 2% CPU when the application is otherwise idle. Updates to clock are not strictly necessary when the app is idle, as there are no readers of cpuClock. This commit reduces idle CPU by disabling the timer when tasks are completely idle, and computing its effects at the next wakeup. Rather than disabling the timer as soon as the app goes idle, we wait until the next tick, which provides a window for short sleeps to sleep and wakeup without doing the (relatively) expensive work of disabling and enabling the timer. PiperOrigin-RevId: 272265822
2019-10-01Merge release-20190806.1-217-g53cc72d (automated)gVisor bot
2019-10-01Honor X bit on extra anon pages in PT_LOAD segmentsMichael Pratt
Linux changed this behavior in 16e72e9b30986ee15f17fbb68189ca842c32af58 (v4.11). Previously, extra pages were always mapped RW. Now, those pages will be executable if the segment specified PF_X. They still must be writeable. PiperOrigin-RevId: 272256280
2019-10-01Merge release-20190806.1-216-g7a234f7 (automated)gVisor bot
2019-09-30splice: try another fallback option only if the previous one isn't supportedAndrei Vagin
Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com PiperOrigin-RevId: 272110815
2019-10-01Merge release-20190806.1-215-g29a1ba5 (automated)gVisor bot
2019-09-30splice: compare inode numbers only if both ends are pipesAndrei Vagin
It isn't allowed to splice data from and into the same pipe. But right now this check is broken, because we don't check that both ends are pipes. PiperOrigin-RevId: 272107022
2019-10-01Merge release-20190806.1-214-g20841b9 (automated)gVisor bot
2019-09-30Update FIXME bug with GitHub issue.Adin Scannell
PiperOrigin-RevId: 272101930
2019-09-30Merge release-20190806.1-210-g3ad17ff (automated)gVisor bot
2019-09-30Force timestamps to update when set via InodeOperations.SetTimestamps.Nicolas Lacasse
The gofer's CachingInodeOperations implementation contains an optimization for the common open-read-close pattern when we have a host FD. In this case, the host kernel will update the timestamp for us to a reasonably close time, so we don't need an extra RPC to the gofer. However, when the app explicitly sets the timestamps (via futimes or similar) then we actually DO need to update the timestamps, because the host kernel won't do it for us. To fix this, a new boolean `forceSetTimestamps` was added to CachineInodeOperations.SetMaskedAttributes. It is only set by gofer.InodeOperations.SetTimestamps. PiperOrigin-RevId: 272048146
2019-09-30Only copy out remaining time on nanosleep successMichael Pratt
It looks like the old code attempted to do this, but didn't realize that err != nil even in the happy case. PiperOrigin-RevId: 272005887
2019-09-28Merge release-20190806.1-207-geebc38b (automated)gVisor bot
2019-09-27Merge pull request #882 from DarcySail:darcy_faster_CopyStringIngVisor bot
PiperOrigin-RevId: 271675009
2019-09-27Merge release-20190806.1-204-g6a54aa1 (automated)gVisor bot
2019-09-27Merge release-20190806.1-203-g8539abc (automated)gVisor bot
2019-09-27Merge pull request #864 from tanjianfeng:fix-861gVisor bot
PiperOrigin-RevId: 271649711
2019-09-27Merge release-20190806.1-201-gabbee56 (automated)gVisor bot
2019-09-27Implement SO_BINDTODEVICE sockoptgVisor bot
PiperOrigin-RevId: 271644926
2019-09-26Merge release-20190806.1-197-g5434926 (automated)gVisor bot
2019-09-26Make raw socket tests pass in environments with or without CAP_NET_RAW.Kevin Krakauer
PiperOrigin-RevId: 271442321
2019-09-25Merge release-20190806.1-182-g99c86b8 (automated)gVisor bot
2019-09-25Merge pull request #863 from tanjianfeng:fix-862gVisor bot
PiperOrigin-RevId: 271168948
2019-09-25Merge release-20190806.1-180-g76ff194 (automated)gVisor bot
2019-09-24gvisor: change syscall.RawSyscall to syscall.RawSyscall6 where requiredgVisor bot
Before https://golang.org/cl/173160 syscall.RawSyscall would zero out the last three register arguments to the system call. That no longer happens. For system calls that take more than three arguments, use RawSyscall6 to ensure that we pass zero, not random data, for the additional arguments. PiperOrigin-RevId: 271062527
2019-09-24Merge release-20190806.1-178-g502f8f2 (automated)gVisor bot
2019-09-24Stub out readahead implementation.Adin Scannell
Closes #261 PiperOrigin-RevId: 270973347
2019-09-24tty: fix sending SIGTTOU on tty writehenry.tjf
How to reproduce: $ echo "timeout 10 ls" > foo.sh $ chmod +x foo.sh $ ./foo.sh (will hang here for 10 secs, and the output of ls does not show) When "ls" process writes to stdout, it receives SIGTTOU signal, and hangs there. Until "timeout" process timeouts, and kills "ls" process. The expected result is: "ls" writes its output into tty, and terminates immdedately, then "timeout" process receives SIGCHLD and terminates. The reason for this failure is that we missed the check for TOSTOP (if set, background processes will receive the SIGTTOU signal when they do write). We use drivers/tty/n_tty.c:n_tty_write() as a reference. Fixes: #862 Reported-by: chris.zn <chris.zn@antfin.com> Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Signed-off-by: chenglang.hy <chenglang.hy@antfin.com>
2019-09-23Merge release-20190806.1-168-g03ee55c (automated)gVisor bot
2019-09-23netstack: convert more socket options to {Set,Get}SockOptIntAndrei Vagin
PiperOrigin-RevId: 270763208
2019-09-23internal BUILD file cleanup.gVisor bot
PiperOrigin-RevId: 270680704
2019-09-20Change vfs.Dirent.Off to NextOff.Jamie Liu
"d_off is the distance from the start of the directory to the start of the next linux_dirent." - getdents(2). PiperOrigin-RevId: 270349685
2019-09-20fix set hostnameJianfeng Tan
Previously, when we set hostname: $ strace hostname abc ... sethostname("abc", 3) = -1 ENAMETOOLONG (File name too long) ... According to man 2 sethostname: "The len argument specifies the number of bytes in name. (Thus, name does not require a terminating null byte.)" We wrongly use the CopyStringIn() to check terminating zero byte in the implementation of sethostname syscall. To fix this, we use CopyInBytes() instead. Fixes: #861 Reported-by: chenglang.hy <chenglang.hy@antfin.com> Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-09-19Merge release-20190806.1-162-g75781ab (automated)gVisor bot
2019-09-19Remove defer from hot path and ensure Atomic is applied consistently.Adin Scannell
PiperOrigin-RevId: 270114317
2019-09-19Merge release-20190806.1-161-g1c0324d (automated)gVisor bot
2019-09-19Merge pull request #876 from xiaobo55x:hostcpugVisor bot
PiperOrigin-RevId: 270094324
2019-09-19Merge release-20190806.1-159-g0a8a75f (automated)gVisor bot
2019-09-19Job control: controlling TTYs and foreground process groups.Kevin Krakauer
Adresses a deadlock with the rolled back change: https://github.com/google/gvisor/commit/b6a5b950d28e0b474fdad160b88bc15314cf9259 Creating a session from an orphaned process group was causing a lock to be acquired twice by a single goroutine. This behavior is addressed, and a test (OrphanRegression) has been added to pty.cc. Implemented the following ioctls: - TIOCSCTTY - set controlling TTY - TIOCNOTTY - remove controlling tty, maybe signal some other processes - TIOCGPGRP - get foreground process group. Also enables tcgetpgrp(). - TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp(). Next steps are to actually turn terminal-generated control characters (e.g. C^c) into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when appropriate. PiperOrigin-RevId: 270088599
2019-09-19Accelerate byte lookup in string with `bytealg/indexbyte`Hang Su
`bytealg/indexbyte` will use AVX or SSE instruction set, if possible, which could accelerate `CopyStringIn` function by 28%. In worst case(CPU doesn't support SSE), `bytealg/indexbyte` will degenerate to traversal lookup. When dealing with short strings, `bytealg/indexbyte` has the same performance level as before. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Signed-off-by: Hang Su <darcy.sh@antfin.com>
2019-09-18Enable pkg/sentry/hostcpu support on arm64.Haibo Xu
Signed-off-by: Haibo Xu haibo.xu@arm.com Change-Id: I333872da9bdf56ddfa8ab2f034dfc1f36a7d3132
2019-09-18Merge release-20190806.1-156-gc98e7f0 (automated)gVisor bot
2019-09-18Signalfd supportAdin Scannell
Note that the exact semantics for these signalfds are slightly different from Linux. These signalfds are bound to the process at creation time. Reads, polls, etc. are all associated with signals directed at that task. In Linux, all signalfd operations are associated with current, regardless of where the signalfd originated. In practice, this should not be an issue given how signalfds are used. In order to fix this however, we will need to plumb the context through all the event APIs. This gets complicated really quickly, because the waiter APIs are all netstack-specific, and not generally exposed to the context. Probably not worthwhile fixing immediately. PiperOrigin-RevId: 269901749
2019-09-17Merge release-20190806.1-151-g3b7119a (automated)gVisor bot
2019-09-17platform/ptrace: log exit code for stub processesAndrei Vagin
PiperOrigin-RevId: 269631877
2019-09-14Merge release-20190806.1-145-g239a07a (automated)gVisor bot
2019-09-13gvisor: return ENOTDIR from the unlink syscallAndrei Vagin
ENOTDIR has to be returned when a component used as a directory in pathname is not, in fact, a directory. PiperOrigin-RevId: 269037893
2019-09-13Merge release-20190806.1-142-g7c6ab6a (automated)gVisor bot
2019-09-12Implement splice methods for pipes and sockets.Adin Scannell
This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructoring with the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elide locks and use the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207