path: root/pkg/sentry
Age Commit message Author
2019-10-29Disallow execveat on interpreter scripts with fd opened with O_CLOEXEC.Dean Deng
When an interpreter script is opened with O_CLOEXEC and the resulting fd is passed into execveat, an ENOENT error should occur (the script would otherwise be inaccessible to the interpreter). This matches the actual behavior of Linux's execveat. PiperOrigin-RevId: 277306680
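The rule described above can be sketched as a small check. This is a minimal illustration only, not gVisor's code: the `execFile` type, the `checkExecveat` function, and the `errNoEnt` value are hypothetical names, and detecting a script by its `#!` prefix is an assumption made for the example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errNoEnt stands in for ENOENT (hypothetical; not gVisor's error type).
var errNoEnt = errors.New("ENOENT")

// execFile is a hypothetical description of the file passed to execveat.
type execFile struct {
	header      []byte // first bytes of the file
	closeOnExec bool   // fd was opened with O_CLOEXEC
}

// checkExecveat rejects interpreter scripts whose fd is close-on-exec,
// because the script would be inaccessible to the interpreter.
func checkExecveat(f execFile) error {
	isScript := bytes.HasPrefix(f.header, []byte("#!"))
	if isScript && f.closeOnExec {
		return errNoEnt
	}
	return nil
}

func main() {
	script := execFile{header: []byte("#!/bin/sh\n"), closeOnExec: true}
	fmt.Println(checkExecveat(script))
}
```

A non-script binary opened with O_CLOEXEC passes the check in this sketch, since only interpreter scripts need to be reopened by another program.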
2019-10-28Update commentMichael Pratt
FDTable.GetFile doesn't exist. PiperOrigin-RevId: 277089842
2019-10-25Aggregate arguments for loading executables into a single struct.Dean Deng
This change simplifies the function signatures of functions related to loading executables, such as LoadTaskImage, Load, loadBinary. PiperOrigin-RevId: 276821187
2019-10-25Convert DelayOption to the newer/faster SockOpt int type.Ian Gudger
DelayOption is set on all new endpoints in gVisor. PiperOrigin-RevId: 276746791
2019-10-25platform/ptrace: use tgkill instead of killAndrei Vagin
The syscall filters don't allow kill, just tgkill. PiperOrigin-RevId: 276718421
2019-10-24Handle AT_SYMLINK_NOFOLLOW flag for execveat.Dean Deng
PiperOrigin-RevId: 276441249
2019-10-23Handle AT_EMPTY_PATH flag in execveat.Dean Deng
PiperOrigin-RevId: 276419967
2019-10-23Merge pull request #641 from tanjianfeng:mastergVisor bot
PiperOrigin-RevId: 276380008
2019-10-23Keep minimal available fd to accelerate fd allocationDarcySail
Use fd.next to store the iteration start position, which can be used to accelerate allocating new FDs. Also add the corresponding gtest benchmark to measure performance. @tanjianfeng COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/758 from DarcySail:master 96685ec7886dfe1a64988406831d3bc002b438cc PiperOrigin-RevId: 276351250
2019-10-22Update const names to be Go style.Ian Lewis
PiperOrigin-RevId: 276165962
2019-10-22platform/ptrace: exit without panic if a stub process has been killed by SIGKILLAndrei Vagin
SIGKILL can be sent only by a user or the OOM killer. In both cases, we don't need to panic. PiperOrigin-RevId: 276150120
2019-10-21Remove old TODO.Nicolas Lacasse
PiperOrigin-RevId: 275956240
2019-10-21Add basic implementation of execveat syscall and associated tests.Dean Deng
Allow file descriptors of directories as well as AT_FDCWD. PiperOrigin-RevId: 275929668
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-10-19Add support for pipes in VFS2.Kevin Krakauer
PiperOrigin-RevId: 275650307
2019-10-17Refactor pipe to support VFS2.Kevin Krakauer
* Pulls common functionality (IO and locking on open) into pipe_util.go.
* Adds pipe/vfs.go, which implements a subset of vfs.FileDescriptionImpl.

A subsequent change will add support for pipes in memfs. PiperOrigin-RevId: 275322385
2019-10-16Reorder BUILD license and load functions in gvisor.Kevin Krakauer
PiperOrigin-RevId: 275139066
2019-10-16Add sublevel to kernel versionMichael Pratt
Standard Linux kernel versions are VERSION.PATCHLEVEL.SUBLEVEL (e.g., 4.4.0), even when the sublevel is 0. Match this standard. PiperOrigin-RevId: 275125715
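As a small illustration of the format, the sublevel is always printed, even when it is 0. The `kernelRelease` helper is a hypothetical name, not a gVisor function.

```go
package main

import "fmt"

// kernelRelease formats VERSION.PATCHLEVEL.SUBLEVEL, always including
// the sublevel. Hypothetical helper for illustration only.
func kernelRelease(version, patchlevel, sublevel int) string {
	return fmt.Sprintf("%d.%d.%d", version, patchlevel, sublevel)
}

func main() {
	fmt.Println(kernelRelease(4, 4, 0))
}
```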
2019-10-16Fix problem with open FD when copy up is triggered in overlayfsFabricio Voznika
Linux kernels before 4.19 don't implement a feature that updates open FDs after a file is opened for write (and is copied to the upper layer). Already open FDs will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. A flag was added to force read-only files to be reopened when the same file is opened for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-16Support O_SYNC and O_DSYNC flags.Nicolas Lacasse
When any of these flags are set, all writes will trigger a subsequent fsync call. This behavior already existed for "write-through" mounts. O_DIRECT is treated as an alias for O_SYNC. Better support coming soon. PiperOrigin-RevId: 275114392
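The behavior described above can be sketched with an in-memory file. Everything here is an assumption for illustration: the flag constants are placeholders rather than the real O_SYNC, O_DSYNC, and O_DIRECT values, and `backingFile` and `writeWithFlags` are hypothetical names.

```go
package main

import "fmt"

// Placeholder flag bits; these are NOT the real Linux flag values.
const (
	flagSync   = 1 << iota // stands in for O_SYNC
	flagDSync              // stands in for O_DSYNC
	flagDirect             // stands in for O_DIRECT, treated like O_SYNC
)

// backingFile is a hypothetical in-memory file that counts fsync calls.
type backingFile struct {
	data  []byte
	syncs int
}

func (b *backingFile) write(p []byte) int {
	b.data = append(b.data, p...)
	return len(p)
}

func (b *backingFile) fsync() { b.syncs++ }

// writeWithFlags writes p and follows the write with an fsync when any
// of the synchronous flags is set.
func writeWithFlags(b *backingFile, flags int, p []byte) int {
	n := b.write(p)
	if flags&(flagSync|flagDSync|flagDirect) != 0 {
		b.fsync()
	}
	return n
}

func main() {
	f := &backingFile{}
	writeWithFlags(f, 0, []byte("a"))
	writeWithFlags(f, flagDSync, []byte("b"))
	fmt.Println(len(f.data), f.syncs)
}
```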
2019-10-16Fix syscall changes lost in rebaseMichael Pratt
These syscalls were changed in the amd64 file around the time the arm64 PR was sent out, so their changes got lost. Updates #63 PiperOrigin-RevId: 275114194
2019-10-16Merge pull request #736 from tanjianfeng:fix-unixgVisor bot
PiperOrigin-RevId: 275114157
2019-10-15Minor vfs.FileDescriptionImpl fixes.Jamie Liu
- Pass context.Context to OnClose().
- Pass memmap.MMapOpts to ConfigureMMap() by pointer so that implementations can actually mutate it as required.

PiperOrigin-RevId: 274934967
2019-10-15epsocket: support /proc/net/snmpJianfeng Tan
Netstack has its own stats; we use them to fill /proc/net/snmp. Note that some metrics are not recorded in Netstack and will be shown as 0 in the proc file. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
2019-10-15netstack: add counters for tcp CurrEstab and EstabResetsJianfeng Tan
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-10-15hostinet: support /proc/net/snmp and /proc/net/devJianfeng Tan
For hostinet, we inherit the data from host procfs. To do that, we cache the fds for these files for later reads. Fixes #506 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
2019-10-15support /proc/net/routeJianfeng Tan
This proc file reports routing information to applications inside the container. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15support /proc/net/snmpJianfeng Tan
This proc file contains statistics according to [1]. [1] https://tools.ietf.org/html/rfc2013 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-14Internal change.gVisor bot
PiperOrigin-RevId: 274700093
2019-10-10Allow for zero byte iovec with MSG_PEEK | MSG_TRUNC in recvmsg.Ian Lewis
This allows for peeking at the length of the next message on a netlink socket without pulling it off the socket's buffer/queue, allowing tools like 'ip' to work. This CL also fixes an issue where dump_done_errno was not included in the NLMSG_DONE message's payload. Issue #769 PiperOrigin-RevId: 274068637
2019-10-10Fix bugs in fragment handling.Bhasker Hariharan
Strengthen the header.IPv4.IsValid check to correctly check for IHL/TotalLength fields. Also add a check to make sure fragmentOffsets + size of the fragment do not cause a wrap around for the end of the fragment. PiperOrigin-RevId: 274049313
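A wrap-around check of the kind described above can be sketched with 16-bit arithmetic. This is an illustrative sketch, not the netstack code: the `fragmentEnd` function and its choice of `uint16` operands are assumptions made for the example.

```go
package main

import "fmt"

// fragmentEnd returns the offset of the last byte of a fragment and
// reports whether the range is valid. It rejects empty fragments and
// ranges whose end wraps around the 16-bit offset space.
// Hypothetical helper for illustration only.
func fragmentEnd(offset, size uint16) (last uint16, ok bool) {
	if size == 0 {
		return 0, false
	}
	last = offset + size - 1 // may wrap in uint16 arithmetic
	if last < offset {
		return 0, false // wrapped: end is before the start
	}
	return last, true
}

func main() {
	fmt.Println(fragmentEnd(1480, 1480))
	fmt.Println(fragmentEnd(65528, 16))
}
```

Comparing the computed end against the start is enough here because a single unsigned addition can wrap at most once.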
2019-10-10Fix signalfd polling.Adin Scannell
The signalfd descriptors otherwise always show as available. This can lead programs to spin, assuming they are looking to see what signals are pending. Updates #139 PiperOrigin-RevId: 274017890
2019-10-09Internal change.gVisor bot
PiperOrigin-RevId: 273861936
2019-10-09Merge pull request #811 from lubinszARM:pr_testutilgVisor bot
PiperOrigin-RevId: 273781641
2019-10-07Implement IP_TTL.Ian Gudger
Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341
2019-10-07Remove unnecessary context parameter for new pipes.Kevin Krakauer
PiperOrigin-RevId: 273421634
2019-10-07Rename epsocket to netstack.Kevin Krakauer
PiperOrigin-RevId: 273365058
2019-10-07Merge pull request #753 from lubinszARM:pr_syscall_linuxgVisor bot
PiperOrigin-RevId: 273364848
2019-10-04Add sanity check that overlayCreate is called with an overlay parent inode.Nicolas Lacasse
PiperOrigin-RevId: 272987037
2019-10-04Change linux.FileMode from uint to uint16, and update VFS to use FileMode.Kevin Krakauer
In Linux (include/linux/types.h), mode_t is an unsigned short. PiperOrigin-RevId: 272956350
2019-10-03Don't report partialResult errors from sendfileAndrei Vagin
The input file descriptor is always a regular file, so sendfile can't lose any data if it is unable to write it to the output file descriptor. Reported-by: syzbot+22d22330a35fa1c02155@syzkaller.appspotmail.com PiperOrigin-RevId: 272730357
2019-10-02Merge pull request #865 from tanjianfeng:fix-829gVisor bot
PiperOrigin-RevId: 272522508
2019-10-02fs/proc: report PID-s from a pid namespace of the proc mountAndrei Vagin
Right now, we can find more than one process with PID 1 in /proc.

    $ for i in `seq 10`; do
    > unshare -fp sleep 1000 &
    > done
    $ ls /proc
    1 1 1 1 12 18 24 29 6           loadavg net  sys         version
    1 1 1 1 16 20 26 32 cpuinfo     meminfo self thread-self
    1 1 1 1 17 21 28 36 filesystems mounts  stat uptime

PiperOrigin-RevId: 272506593
2019-10-02Merge branch 'master' into pr_syscall_linuxAndrei Vagin
2019-10-01Include AT_SECURE in the aux vectorMichael Pratt
gVisor does not currently implement the functionality that would result in AT_SECURE = 1, but Linux includes AT_SECURE = 0 in the normal case, so we should do the same. PiperOrigin-RevId: 272311488
2019-10-01Disable cpuClockTicker when app is idleMichael Pratt
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to track their CPU usage. This improves latency in the syscall path by avoiding expensive monotonic clock calls on every syscall entry/exit.

However, this timer fires every 10ms. Thus, when all tasks are idle (i.e., blocked or stopped), this forces a sentry wakeup every 10ms, when we may otherwise be able to sleep until the next app-relevant event. These wakeups cause the sentry to utilize approximately 2% CPU when the application is otherwise idle.

Updates to the clock are not strictly necessary when the app is idle, as there are no readers of cpuClock. This commit reduces idle CPU by disabling the timer when tasks are completely idle, and computing its effects at the next wakeup.

Rather than disabling the timer as soon as the app goes idle, we wait until the next tick, which provides a window for short sleeps to sleep and wake up without doing the (relatively) expensive work of disabling and enabling the timer. PiperOrigin-RevId: 272265822
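The "compute its effects at the next wakeup" step can be sketched without real timers. This is a simplified model under assumptions, not the sentry implementation: the `idleClock` type and its methods are hypothetical, time is passed in as plain milliseconds, and the one-tick grace window before disabling is left out.

```go
package main

import "fmt"

// idleClock is a hypothetical stand-in for a tick-driven CPU clock.
type idleClock struct {
	periodMs int64  // tick period (10 in the commit message)
	ticks    uint64 // value tasks would read as their CPU clock
	idle     bool   // true while the ticker is disabled
	idleAtMs int64  // time at which the ticker was disabled
}

// tick is what the periodic timer would call; it does nothing while idle.
func (c *idleClock) tick() {
	if !c.idle {
		c.ticks++
	}
}

// sleep disables the ticker when all tasks are idle.
func (c *idleClock) sleep(nowMs int64) {
	c.idle = true
	c.idleAtMs = nowMs
}

// wake re-enables the ticker and adds the ticks that would have fired
// while it was disabled.
func (c *idleClock) wake(nowMs int64) {
	if !c.idle {
		return
	}
	c.ticks += uint64((nowMs - c.idleAtMs) / c.periodMs)
	c.idle = false
}

func main() {
	c := &idleClock{periodMs: 10}
	c.tick()
	c.tick()
	c.sleep(20)
	c.wake(55)
	fmt.Println(c.ticks)
}
```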
2019-10-01Honor X bit on extra anon pages in PT_LOAD segmentsMichael Pratt
Linux changed this behavior in 16e72e9b30986ee15f17fbb68189ca842c32af58 (v4.11). Previously, extra pages were always mapped RW. Now, those pages will be executable if the segment specified PF_X. They still must be writeable. PiperOrigin-RevId: 272256280
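The post-v4.11 rule can be sketched as a protection computation. The `PF_*` and `PROT_*` values below match the standard ELF and mmap constants, but the `anonPageProt` helper is a hypothetical name used only for illustration, not the loader's actual function.

```go
package main

import "fmt"

// ELF program header flags.
const (
	pfX = 0x1 // PF_X
	pfW = 0x2 // PF_W
	pfR = 0x4 // PF_R
)

// mmap protection bits.
const (
	protRead  = 0x1 // PROT_READ
	protWrite = 0x2 // PROT_WRITE
	protExec  = 0x4 // PROT_EXEC
)

// anonPageProt returns the protections for the extra anonymous pages of
// a PT_LOAD segment: always readable and writable, and executable only
// if the segment specified PF_X. Hypothetical helper.
func anonPageProt(segmentFlags uint32) int {
	prot := protRead | protWrite
	if segmentFlags&pfX != 0 {
		prot |= protExec
	}
	return prot
}

func main() {
	fmt.Println(anonPageProt(pfR|pfW), anonPageProt(pfR|pfW|pfX))
}
```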
2019-09-30splice: try another fallback option only if the previous one isn't supportedAndrei Vagin
Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com PiperOrigin-RevId: 272110815
2019-09-30splice: compare inode numbers only if both ends are pipesAndrei Vagin
Splicing data from a pipe into the same pipe isn't allowed. But right now this check is broken, because we don't check that both ends are pipes. PiperOrigin-RevId: 272107022
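The corrected check can be sketched as follows: inode numbers are compared only when both ends are pipes. The `spliceEnd` type and `samePipe` function are hypothetical names, not the sentry's types.

```go
package main

import "fmt"

// spliceEnd is a hypothetical description of one end of a splice call.
type spliceEnd struct {
	isPipe bool
	inode  uint64
}

// samePipe reports whether in and out refer to the same pipe. Inode
// numbers are only meaningful to compare when both ends are pipes.
func samePipe(in, out spliceEnd) bool {
	return in.isPipe && out.isPipe && in.inode == out.inode
}

func main() {
	pipe := spliceEnd{isPipe: true, inode: 7}
	file := spliceEnd{isPipe: false, inode: 7}
	fmt.Println(samePipe(pipe, pipe), samePipe(pipe, file))
}
```

In this sketch a regular file that happens to share an inode number with a pipe is not rejected, which is the case the broken check got wrong.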
2019-09-30Update FIXME bug with GitHub issue.Adin Scannell
PiperOrigin-RevId: 272101930