gvisor - Container Runtime Sandbox

Age	Commit message	Author
2019-12-16	Fix deadlock in overlay bind	Yong He
	Copying up the parent when binding a UDS on overlayfs has been supported since commit 02ab1f187cd24c67b754b004229421d189cee264. However, using copyUp in overlayBind can cause the sentry to get stuck because of a deadlock on renameMu:
	1. [Process A] Invokes a Unix socket bind operation; renameMu is held in fs.(Dirent).genericCreate by process A.
	2. [Process B] Invokes a read syscall on /proc/task/mounts and waits on Lock of renameMu in fs.(MountNamespace).FindMount.
	3. [Process A] Continues the Unix socket bind operation and waits on RLock of renameMu in fs.copyUp.
	The root cause is a recursive read lock of renameMu in the bind call trace: if a write lock is requested between the two read locks, a deadlock occurs. Fixes #1397
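	The lock ordering described in this commit message can be reproduced in miniature with Go's sync.RWMutex, where a pending writer blocks new readers. The sketch below is illustrative only and is not gVisor code: mu merely stands in for renameMu, and nestedReadWouldBlock is a hypothetical helper. It uses TryRLock (Go 1.18+) to observe that the nested read lock would block, without actually deadlocking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedReadWouldBlock reports whether a nested read lock would block while
// the caller already holds a read lock and a writer is queued behind it.
// mu is a stand-in for renameMu; none of this is gVisor code.
func nestedReadWouldBlock() bool {
	var mu sync.RWMutex

	// "Process A": outer read lock (genericCreate in the commit message).
	mu.RLock()

	// "Process B": a writer queues up behind the outer read lock.
	writerDone := make(chan struct{})
	go func() {
		mu.Lock()
		mu.Unlock()
		close(writerDone)
	}()

	// "Process A" again: poll until the writer is queued. From then on a
	// nested RLock would wait for the writer, which waits for us.
	blocked := false
	for i := 0; i < 1000; i++ {
		if !mu.TryRLock() {
			blocked = true
			break
		}
		mu.RUnlock()
		time.Sleep(time.Millisecond)
	}

	// Release the outer read lock so the writer can finish.
	mu.RUnlock()
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested read lock would block:", nestedReadWouldBlock())
}
```

	Replacing TryRLock with a plain RLock at the point where it fails would hang forever, which is the situation the fix avoids by not re-acquiring the read lock.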
2019-12-09	Redirect TODOs to gvisor.dev	Fabricio Voznika
	PiperOrigin-RevId: 284606233
2019-12-06	Implement TTY field in control.Processes().	Nicolas Lacasse
	Threadgroups already know their TTY (if they have one), which now contains the TTY Index, and is returned in the Processes() call. PiperOrigin-RevId: 284263850
2019-12-05	Create correct file for /proc/[pid]/task/[tid]/io	Zach Koopmans
	PiperOrigin-RevId: 284038840
2019-12-03	Fix printing /proc/[pid]/io for /proc/[pid]/task/[tid]/io.	Zach Koopmans
	PiperOrigin-RevId: 283630669
2019-11-26	Allow open(O_TRUNC) and (f)truncate for proc files.	Ian Lewis
	This allows writable proc and device files to be opened with O_CREAT|O_TRUNC. This is encountered most frequently when interacting with proc or device files via the command line, e.g.
	$ echo 8192 1048576 4194304 > /proc/sys/net/ipv4/tcp_rmem
	Also adds a test for the behavior of open(O_TRUNC), truncate, and ftruncate on named pipes. Fixes #1116 PiperOrigin-RevId: 282677425
2019-11-25	Merge pull request #1176 from xiaobo55x:runsc_boot	gVisor bot
	PiperOrigin-RevId: 282382564
2019-11-21	Import and structure cleanup.	Adin Scannell
	PiperOrigin-RevId: 281795269
2019-11-20	Pass OpenTruncate to gofer in Open call when opening file with O_TRUNC.	Nicolas Lacasse
	Note that the Sentry still calls Truncate() on the file before calling Open. A new p9 version check was added to ensure that the p9 server can handle the OpenTruncate flag. If not, the flag is stripped before sending. PiperOrigin-RevId: 281609112
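	A version-gated flag like the one described here can be sketched as follows. This is a minimal illustration and not the p9 package's API: the type openFlags, the flag values, the constant versionWithOpenTruncate, and the helper flagsForServer are all hypothetical names and numbers chosen for the example.

```go
package main

import "fmt"

// openFlags and the values below are hypothetical, not the real p9 encoding.
type openFlags uint32

const (
	openReadWrite openFlags = 0x2
	openTruncate  openFlags = 0x200
)

// versionWithOpenTruncate is a made-up protocol version for illustration.
const versionWithOpenTruncate = 7

// flagsForServer strips openTruncate when the server is too old to
// understand it, and passes the flags through unchanged otherwise.
func flagsForServer(flags openFlags, serverVersion uint32) openFlags {
	if serverVersion < versionWithOpenTruncate {
		return flags &^ openTruncate
	}
	return flags
}

func main() {
	f := openReadWrite | openTruncate
	fmt.Printf("old server: %#x\n", flagsForServer(f, 6))
	fmt.Printf("new server: %#x\n", flagsForServer(f, 7))
}
```

	Stripping the flag is safe in this scheme because, as the commit message notes, the file is still truncated separately before Open is called.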
2019-11-18	Merge pull request #1177 from xiaobo55x:fs_host	gVisor bot
	PiperOrigin-RevId: 281112758
2019-11-14	Check that a file is a regular file with open(O_TRUNC).	Kevin Krakauer
	It was possible to panic the sentry by opening a cache revalidating folder with O_TRUNC|O_CREAT. Avoids breaking php tests. PiperOrigin-RevId: 280533213
2019-11-12	Use overlay MountSource when binding socket in overlay.	Nicolas Lacasse
	PiperOrigin-RevId: 280131840
2019-11-13	Enable sentry/fs/host support on arm64.	Haibo Xu
	The newfstatat() syscall is not supported on arm64, so we fall back to the fstatat() syscall. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: Iea95550ea53bcf85c01f7b3b95da70ad0952177d
2019-11-13	Enable runsc/boot support on arm64.	Haibo Xu
	This patch also includes a minor change to replace syscall.Dup2 with syscall.Dup3, which was missed in a previous commit (ref a25a976). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-08	Automated rollback of changelist 278417533	Kevin Krakauer
	PiperOrigin-RevId: 279365629
2019-11-04	Check that a file is a regular file with open(O_TRUNC).	Kevin Krakauer
	It was possible to panic the sentry by opening a cache revalidating folder with O_TRUNC|O_CREAT. PiperOrigin-RevId: 278417533
2019-10-23	Merge pull request #641 from tanjianfeng:master	gVisor bot
	PiperOrigin-RevId: 276380008
2019-10-16	Reorder BUILD license and load functions in gvisor.	Kevin Krakauer
	PiperOrigin-RevId: 275139066
2019-10-16	Fix problem with open FD when copy up is triggered in overlayfs	Fabricio Voznika
	Linux kernels before 4.19 don't implement the feature that updates open FDs after a file is opened for write (and copied to the upper layer). Already open FDs continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. A flag was added to force read-only files to be reopened when the same file is opened for write. This is only needed when using kernels prior to 4.19. Closes #1006
	It's difficult to really test this because we never run tests on older kernels. I'm adding a test in GKE, which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-16	Support O_SYNC and O_DSYNC flags.	Nicolas Lacasse
	When any of these flags are set, all writes will trigger a subsequent fsync call. This behavior already existed for "write-through" mounts. O_DIRECT is treated as an alias for O_SYNC. Better support coming soon. PiperOrigin-RevId: 275114392
2019-10-16	Merge pull request #736 from tanjianfeng:fix-unix	gVisor bot
	PiperOrigin-RevId: 275114157
2019-10-15	support /proc/net/route	Jianfeng Tan
	This proc file reports routing information to applications inside the container. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
2019-10-15	support /proc/net/snmp	Jianfeng Tan
	This proc file contains statistics according to [1]. [1] https://tools.ietf.org/html/rfc2013 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
2019-10-07	Remove unnecessary context parameter for new pipes.	Kevin Krakauer
	PiperOrigin-RevId: 273421634
2019-10-04	Add sanity check that overlayCreate is called with an overlay parent inode.	Nicolas Lacasse
	PiperOrigin-RevId: 272987037
2019-10-02	Merge pull request #865 from tanjianfeng:fix-829	gVisor bot
	PiperOrigin-RevId: 272522508
2019-10-02	fs/proc: report PID-s from a pid namespace of the proc mount	Andrei Vagin
	Right now, we can find more than one process with PID 1 in /proc.
	$ for i in `seq 10`; do
	> unshare -fp sleep 1000 &
	> done
	$ ls /proc
	1 1 1 1 12 18 24 29 6 loadavg net sys version
	1 1 1 1 16 20 26 32 cpuinfo meminfo self thread-self
	1 1 1 1 17 21 28 36 filesystems mounts stat uptime
	PiperOrigin-RevId: 272506593
2019-10-01	Disable cpuClockTicker when app is idle	Michael Pratt
	Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to track their CPU usage. This improves latency in the syscall path by avoiding expensive monotonic clock calls on every syscall entry/exit. However, this timer fires every 10ms. Thus, when all tasks are idle (i.e., blocked or stopped), this forces a sentry wakeup every 10ms, when we may otherwise be able to sleep until the next app-relevant event. These wakeups cause the sentry to utilize approximately 2% CPU when the application is otherwise idle. Updates to the clock are not strictly necessary when the app is idle, as there are no readers of cpuClock. This commit reduces idle CPU by disabling the timer when tasks are completely idle, and computing its effects at the next wakeup. Rather than disabling the timer as soon as the app goes idle, we wait until the next tick, which provides a window for short sleeps to sleep and wake up without doing the (relatively) expensive work of disabling and enabling the timer. PiperOrigin-RevId: 272265822
2019-09-30	splice: try another fallback option only if the previous one isn't supported	Andrei Vagin
	Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com PiperOrigin-RevId: 272110815
2019-09-30	Force timestamps to update when set via InodeOperations.SetTimestamps.	Nicolas Lacasse
	The gofer's CachingInodeOperations implementation contains an optimization for the common open-read-close pattern when we have a host FD. In this case, the host kernel will update the timestamp for us to a reasonably close time, so we don't need an extra RPC to the gofer. However, when the app explicitly sets the timestamps (via futimes or similar), we actually DO need to update the timestamps, because the host kernel won't do it for us. To fix this, a new boolean `forceSetTimestamps` was added to CachingInodeOperations.SetMaskedAttributes. It is only set by gofer.InodeOperations.SetTimestamps. PiperOrigin-RevId: 272048146
2019-09-24	tty: fix sending SIGTTOU on tty write	henry.tjf
	How to reproduce:
	$ echo "timeout 10 ls" > foo.sh
	$ chmod +x foo.sh
	$ ./foo.sh
	(hangs here for 10 seconds, and the output of ls does not show)
	When the "ls" process writes to stdout, it receives SIGTTOU and hangs until the "timeout" process times out and kills it. The expected result is that "ls" writes its output to the tty and terminates immediately, then the "timeout" process receives SIGCHLD and terminates.
	The reason for this failure is that we missed the check for TOSTOP (if set, background processes receive SIGTTOU when they write). We use drivers/tty/n_tty.c:n_tty_write() as a reference.
	Fixes: #862
	Reported-by: chris.zn <chris.zn@antfin.com>
	Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
	Signed-off-by: chenglang.hy <chenglang.hy@antfin.com>
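	The TOSTOP check mentioned in this commit message boils down to a small decision, sketched below. This is a deliberately simplified model and not the sentry's tty code: the tty and writer types and shouldSendSIGTTOU are hypothetical, and the sketch leaves out the other conditions Linux also checks (for example, whether SIGTTOU is blocked or ignored). The tostop constant uses the x86 Linux value of TOSTOP.

```go
package main

import "fmt"

// tostop is the TOSTOP local-mode bit as defined on x86 Linux (0400 octal).
const tostop = 0x100

// tty and writer are hypothetical stand-ins for terminal and task state.
type tty struct {
	lflag           uint32 // termios local modes
	foregroundGroup int    // foreground process group ID
}

type writer struct {
	processGroup int
}

// shouldSendSIGTTOU reports whether a write to the terminal should signal
// the writer: only when TOSTOP is set and the writer is in the background.
func shouldSendSIGTTOU(t tty, w writer) bool {
	if t.lflag&tostop == 0 {
		return false // TOSTOP clear: background writes are allowed
	}
	return w.processGroup != t.foregroundGroup
}

func main() {
	background := writer{processGroup: 42}
	fmt.Println("TOSTOP clear:", shouldSendSIGTTOU(tty{foregroundGroup: 7}, background))
	fmt.Println("TOSTOP set:  ", shouldSendSIGTTOU(tty{lflag: tostop, foregroundGroup: 7}, background))
}
```

	In the reproduction above, TOSTOP is not set, so under this model the background "ls" should be allowed to write without being signaled.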
2019-09-20	Implement /proc/net/tcp6	Jianfeng Tan
	Fixes: #829 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Signed-off-by: Jielong Zhou <jielong.zjl@antfin.com>
2019-09-19	Job control: controlling TTYs and foreground process groups.	Kevin Krakauer
	Addresses a deadlock with the rolled back change: https://github.com/google/gvisor/commit/b6a5b950d28e0b474fdad160b88bc15314cf9259
	Creating a session from an orphaned process group was causing a lock to be acquired twice by a single goroutine. This behavior is addressed, and a test (OrphanRegression) has been added to pty.cc.
	Implemented the following ioctls:
	- TIOCSCTTY - set controlling TTY
	- TIOCNOTTY - remove controlling TTY, maybe signal some other processes
	- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
	- TIOCSPGRP - set foreground process group. Also enables tcsetpgrp().
	Next steps are to actually turn terminal-generated control characters (e.g. ^C) into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when appropriate. PiperOrigin-RevId: 270088599
2019-09-13	gvisor: return ENOTDIR from the unlink syscall	Andrei Vagin
	ENOTDIR has to be returned when a component used as a directory in pathname is not, in fact, a directory. PiperOrigin-RevId: 269037893
2019-09-12	Implement splice methods for pipes and sockets.	Adin Scannell
	This also allows the tee(2) implementation to be enabled, since dup can now be properly supported via WriteTo. Note that this change necessitated some minor restructuring of the fs.FileOperations splice methods. If the *fs.File is passed through directly, then only public API methods are accessible, which will deadlock immediately since the locking is already done by fs.Splice. Instead, we pass through an abstract io.Reader or io.Writer, which elides locks and uses the underlying fs.FileOperations directly. PiperOrigin-RevId: 268805207
2019-09-12	Remove go_test from go_stateify and go_marshal	Michael Pratt
	They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-08-30	Remove support for non-incremental mapped accounting.	Jamie Liu
	PiperOrigin-RevId: 266496644
2019-08-30	Automated rollback of changelist 261387276	Bhasker Hariharan
	PiperOrigin-RevId: 266491264
2019-08-29	Implement /proc/net/udp.	Rahat Mahmood
	PiperOrigin-RevId: 266229756
2019-08-29	Add limit_host_fd_translation Gofer mount option.	Jamie Liu
	PiperOrigin-RevId: 266177409
2019-08-27	Mount volumes as super user	Fabricio Voznika
	This used to be the case, but regressed after a recent change. Also made a few fixes around it and cleaned up the code a bit. Closes #720 PiperOrigin-RevId: 265717496
2019-08-22	unix: return ECONNRESET if peer closed with data not read	Jianfeng Tan
	For SOCK_STREAM type unix sockets, we shall return ECONNRESET if the peer is closed with data not read. We explicitly set a flag when closing one end, to differentiate from just shutdown (where zero shall be returned). Fixes: #735 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-08-14	Replace uintptr with int64 when returning lengths	Tamir Duberstein
	This is in accordance with newer parts of the standard library. PiperOrigin-RevId: 263449916
2019-08-13	Fix file mode check in pipeOperations	Fabricio Voznika
	PiperOrigin-RevId: 263203441
2019-08-09	ext: Move to pkg/sentry/fsimpl.	Ayush Ranjan
	fsimpl is the keeper of all filesystem implementations in VFS2. PiperOrigin-RevId: 262617869
2019-08-08	ext: Benchmark tests.	Ayush Ranjan
	Added benchmark tests which emulate memfs benchmarks.
	Stat benchmarks
	BenchmarkVFS2Ext4fsStat/1-12 10000000 145 ns/op
	BenchmarkVFS2Ext4fsStat/2-12 10000000 170 ns/op
	BenchmarkVFS2Ext4fsStat/3-12 10000000 202 ns/op
	BenchmarkVFS2Ext4fsStat/8-12 3000000 374 ns/op
	BenchmarkVFS2Ext4fsStat/64-12 500000 2159 ns/op
	BenchmarkVFS2Ext4fsStat/100-12 300000 3459 ns/op
	BenchmarkVFS1TmpfsStat/1-12 5000000 348 ns/op
	BenchmarkVFS1TmpfsStat/2-12 3000000 487 ns/op
	BenchmarkVFS1TmpfsStat/3-12 2000000 655 ns/op
	BenchmarkVFS1TmpfsStat/8-12 1000000 1365 ns/op
	BenchmarkVFS1TmpfsStat/64-12 200000 9565 ns/op
	BenchmarkVFS1TmpfsStat/100-12 100000 15158 ns/op
	BenchmarkVFS2MemfsStat/1-12 10000000 133 ns/op
	BenchmarkVFS2MemfsStat/2-12 10000000 155 ns/op
	BenchmarkVFS2MemfsStat/3-12 10000000 182 ns/op
	BenchmarkVFS2MemfsStat/8-12 5000000 310 ns/op
	BenchmarkVFS2MemfsStat/64-12 1000000 1659 ns/op
	BenchmarkVFS2MemfsStat/100-12 500000 2787 ns/op
	Mount Stat benchmarks
	BenchmarkVFS2ExtfsMountStat/1-12 5000000 245 ns/op
	BenchmarkVFS2ExtfsMountStat/2-12 5000000 266 ns/op
	BenchmarkVFS2ExtfsMountStat/3-12 5000000 304 ns/op
	BenchmarkVFS2ExtfsMountStat/8-12 3000000 456 ns/op
	BenchmarkVFS2ExtfsMountStat/64-12 500000 2308 ns/op
	BenchmarkVFS2ExtfsMountStat/100-12 300000 3482 ns/op
	BenchmarkVFS1TmpfsMountStat/1-12 3000000 488 ns/op
	BenchmarkVFS1TmpfsMountStat/2-12 2000000 658 ns/op
	BenchmarkVFS1TmpfsMountStat/3-12 2000000 806 ns/op
	BenchmarkVFS1TmpfsMountStat/8-12 1000000 1514 ns/op
	BenchmarkVFS1TmpfsMountStat/64-12 100000 10037 ns/op
	BenchmarkVFS1TmpfsMountStat/100-12 100000 15280 ns/op
	BenchmarkVFS2MemfsMountStat/1-12 10000000 212 ns/op
	BenchmarkVFS2MemfsMountStat/2-12 5000000 232 ns/op
	BenchmarkVFS2MemfsMountStat/3-12 5000000 264 ns/op
	BenchmarkVFS2MemfsMountStat/8-12 3000000 390 ns/op
	BenchmarkVFS2MemfsMountStat/64-12 1000000 1813 ns/op
	BenchmarkVFS2MemfsMountStat/100-12 500000 2812 ns/op
	PiperOrigin-RevId: 262477158
2019-08-08	Return a well-defined socket address type from socket functions.	Rahat Mahmood
	Previously we were representing socket addresses as an interface{}, which allowed any type which could be binary.Marshal()ed to be used as a socket address. This is fine when the address is passed to userspace via the Linux ABI, but is problematic when used from within the sentry, such as by networking procfs files. PiperOrigin-RevId: 262460640
2019-08-07	ext: Seek unit tests.	Ayush Ranjan
	PiperOrigin-RevId: 262264674
2019-08-07	ext: StatAt unit tests.	Ayush Ranjan
	PiperOrigin-RevId: 262249166
2019-08-07	ext: Read unit tests.	Ayush Ranjan
	PiperOrigin-RevId: 262242410