gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-07-13	Implement Registry.Remove.	Zyad A. Ali
	Remove implements the behaviour or msgctl(IPC_RMID). Updates #135
2021-07-13	Implement Registry.FindOrCreate.	Zyad A. Ali
	FindOrCreate implements the behaviour of msgget(2). Updates #135
2021-07-13	Create package msgqueue.	Zyad A. Ali
	Create package msgqueue, define primitives to be used for message queues, and add a msgqueue.Registry to IPCNamespace. Updates #135
2021-07-13	Create ipc.Registry.	Zyad A. Ali
	Create ipc.Registry to hold fields, and define functionality common to all SysV registries, and have registries use it.
2021-07-13	Create ipc package and ipc.Object.	Zyad A. Ali
	Create ipc.Object to define fields and functionality used in SysV mechanisms, and have them use it.
2021-07-12	netstack: move SO_SNDBUF/RCVBUF clamping logic out of //pkg/tcpip	Kevin Krakauer
	- Keeps Linux-specific behavior out of //pkg/tcpip - Makes it clearer that clamping is done only for setsockopt calls from users - Removes code duplication PiperOrigin-RevId: 384389809
2021-07-12	Fix deadlock in procfs	Fabricio Voznika
	Kernfs provides an internal mechanism to defer calls to `DecRef()` because on the last reference `Filesystem.mu` must be held and most places that need to call `DecRef()` are inside the lock. The same can be true for filesystems that extend kernfs. procfs needs to look up files and `DecRef()` them inside the `kernfs.Filesystem.mu`. If the files happen to be procfs files, it can deadlock trying to decrement if it's the last reference. This change extends the mechanism to external callers to defer DecRefs to `vfs.FileDescription` and `vfs.VirtualDentries`. PiperOrigin-RevId: 384361647
2021-07-12	Fix stdios ownership	Fabricio Voznika
	Set stdio ownership based on the container's user to ensure the user can open/read/write to/from stdios. 1. stdios in the host are changed to have the owner be the same uid/gid of the process running the sandbox. This ensures that the sandbox has full control over it. 2. stdios owner owner inside the sandbox is changed to match the container's user to give access inside the container and make it behave the same as runc. Fixes #6180 PiperOrigin-RevId: 384347009
2021-07-12	[syserror] Update syserror to linuxerr for more errors.	Zach Koopmans
	Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-12	Mark all functions that are called from a forked child with go:norace	Andrei Vagin
	PiperOrigin-RevId: 384305599
2021-07-12	Go 1.17 support for the KVM platform	Michael Pratt
	Go 1.17 adds a new register-based calling convention. While transparent for most applications, the KVM platform needs special work in a few cases. First of all, we need the actual address of some assembly functions, rather than the address of a wrapper. See http://gvisor.dev/pr/5832 for complete discussion of this. More relevant to this CL is that ABI0-to-ABIInternal wrappers (i.e., calls from assembly to Go) access the G via FS_BASE. The KVM quite fast-and-loose about the Go environment, often calling into (nosplit) Go functions with uninitialized FS_BASE. That will no longer work in Go 1.17, so this CL changes the platform to consistently restore FS_BASE before calling into Go code. This CL does not affect arm64 code. Go 1.17 does not support the register-based calling convention for arm64 (it will come in 1.18), but arm64 also does not use a non-standard register like FS_BASE for TLS, so it may not require any changes. PiperOrigin-RevId: 384234305
2021-07-09	Drop unnecessary checklocksignore.	Adin Scannell
	PiperOrigin-RevId: 383940663
2021-07-08	Fix some //pkg/seccomp bugs.	Jamie Liu
	- LockOSThread() around prctl(PR_SET_NO_NEW_PRIVS) => seccomp(). go:nosplit "mostly" prevents async preemption, but IIUC preemption is still permitted during function prologues: funcpctab "".seccomp [valfunc=pctopcdata] 0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT\|ABIInternal, $72-32 0 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) TEXT "".seccomp(SB), NOSPLIT\|ABIInternal, $72-32 0 -1 00000 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) SUBQ $72, SP 4 00004 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) MOVQ BP, 64(SP) 9 00009 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) LEAQ 64(SP), BP e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $0, gclocals·ba30782f8935b28ed1adaec603e72627(SB) e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $1, gclocals·663f8c6bfa83aa777198789ce63d9ab4(SB) e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:110) FUNCDATA $2, "".seccomp.stkobj(SB) e 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) PCDATA $0, $-2 e -2 00014 (gvisor/pkg/seccomp/seccomp_unsafe.go:111) MOVQ "".ptr+88(SP), AX (-1 is objabi.PCDATA_UnsafePointSafe and -2 is objabi.PCDATA_UnsafePointUnsafe, from Go's cmd/internal/objabi.) - Handle non-errno failures from seccomp() with SECCOMP_FILTER_FLAG_TSYNC. PiperOrigin-RevId: 383757580
2021-07-08	Replace kernel.ExitStatus with linux.WaitStatus.	Jamie Liu
	PiperOrigin-RevId: 383705129
2021-07-08	devpts: Notify of echo'd input queue bytes only after locks have been released.	Etienne Perot
	PiperOrigin-RevId: 383684320
2021-07-02	Merge pull request #6258 from liornm:fix-iptables-input-interface	gVisor bot
	PiperOrigin-RevId: 382788878
2021-07-01	Mix checklocks and atomic analyzers.	Adin Scannell
	This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01	Strace: handle null paths	Fabricio Voznika
	PiperOrigin-RevId: 382603592
2021-07-01	[syserror] Update several syserror errors to linuxerr equivalents.	Zach Koopmans
	Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30	[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.	Zach Koopmans
	Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29	Sort children map before hash	Chong Cai
	The unordered map may generate different hash due to its order. The children map needs to be sorted each time before hashing to avoid false verification failure due to the map. Store the sorted children map in verity dentry to avoid sorting it each time verification happens. Also serialize the whole VerityDescriptor struct to hash now that the map is removed from it. PiperOrigin-RevId: 382201560
2021-06-29	Add SIOCGIFFLAGS ioctl support to hostinet.	Lucas Manning
	PiperOrigin-RevId: 382194711
2021-06-29	[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVAL	Zach Koopmans
	Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-29	Fix iptables List entries Input interface field	liornm
	In Linux the list entries command returns the name of the input interface assigned to the iptable rule. iptables -S > -A FORWARD -i docker0 -o docker0 -j ACCEPT Meanwhile, in gVsior this interface name is ignored. iptables -S > -A FORWARD -o docker0 -j ACCEPT
2021-06-28	Allow VFS2 gofer client to mmap from sentry page cache when forced.	Jamie Liu
	PiperOrigin-RevId: 381982257
2021-06-24	Internal change.	Jamie Liu
	PiperOrigin-RevId: 381375705
2021-06-24	Delete sentry metrics /watchdog/{stuck_startup_detected, stuck_tasks_detected}	Nayana Bidari
	- These metrics are replaced with WeirdnessMetric with fields watchdog_stuck_startup and watchdog_stuck_tasks. PiperOrigin-RevId: 381365617
2021-06-24	CreateProcessGroup has to check whether a target process stil exists or not	Andrei Vagin
	A caller of CreateProcessGroup looks up a thread group without locks, so the target process can exit before CreateProcessGroup will be called. Reported-by: syzbot+6abb7c34663dacbd55a8@syzkaller.appspotmail.com PiperOrigin-RevId: 381351069
2021-06-23	Use memutil.MapFile for the memory accounting page.	Jamie Liu
	PiperOrigin-RevId: 381145216
2021-06-23	Fix PR_SET_PTRACER applicability to non-leader threads.	Jamie Liu
	Compare if (!thread_group_leader(tracee)) tracee = rcu_dereference(tracee->group_leader); in security/yama/yama_lsm.c:ptracer_exception_found(). PiperOrigin-RevId: 381074242
2021-06-22	[syserror] Add conversions to linuxerr with temporary Equals method.	Zach Koopmans
	Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-22	Merge pull request #5051 from lubinszARM:pr_escapes_1	gVisor bot
	PiperOrigin-RevId: 380904249
2021-06-22	Trigger poll/epoll events on zero-length hostinet sendmsg	Ian Lewis
	Fixes #2726 PiperOrigin-RevId: 380753516
2021-06-21	clean up tcpdump TODOs	Kevin Krakauer
	tcpdump is largely supported. We've also chose not to implement writeable AF_PACKET sockets, and there's a bug specifically for promiscuous mode (#3333). Fixes #173. PiperOrigin-RevId: 380733686
2021-06-17	remove outdated ip6tables TODOs	Kevin Krakauer
	IPv6 SO_ORIGINAL_DST is supported, and the flag check as-written will detect when other flags are needed. Fixes #3549. PiperOrigin-RevId: 380059115
2021-06-17	Move tcpip.Clock impl to Timekeeper	Tamir Duberstein
	...and pass it explicitly. This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed. PiperOrigin-RevId: 380039167
2021-06-16	[syserror] Refactor linuxerr and error package.	Zach Koopmans
	Move Error struct to pkg/errors package for use in multiple places. Move linuxerr static definitions under pkg/errors/linuxerr. Add a lookup list for quick lookup of errors.Error by errno. This is useful when converting syserror errors and unix.Errno/syscall.Errrno values to errors.Error. Update benchmarks routines to include conversions. The below benchmarks show *errors.Error usage to be comparable to using unix.Errno. BenchmarkAssignUnix BenchmarkAssignUnix-32 787875022 1.284 ns/op BenchmarkAssignLinuxerr BenchmarkAssignLinuxerr-32 1000000000 1.209 ns/op BenchmarkAssignSyserror BenchmarkAssignSyserror-32 759269229 1.429 ns/op BenchmarkCompareUnix BenchmarkCompareUnix-32 1000000000 1.310 ns/op BenchmarkCompareLinuxerr BenchmarkCompareLinuxerr-32 1000000000 1.241 ns/op BenchmarkCompareSyserror BenchmarkCompareSyserror-32 147196165 8.248 ns/op BenchmarkSwitchUnix BenchmarkSwitchUnix-32 373233556 3.664 ns/op BenchmarkSwitchLinuxerr BenchmarkSwitchLinuxerr-32 476323929 3.294 ns/op BenchmarkSwitchSyserror BenchmarkSwitchSyserror-32 39293408 29.62 ns/op BenchmarkReturnUnix BenchmarkReturnUnix-32 1000000000 0.5042 ns/op BenchmarkReturnLinuxerr BenchmarkReturnLinuxerr-32 1000000000 0.8152 ns/op BenchmarkConvertUnixLinuxerr BenchmarkConvertUnixLinuxerr-32 739948875 1.547 ns/op BenchmarkConvertUnixLinuxerrZero BenchmarkConvertUnixLinuxerrZero-32 977733974 1.489 ns/op PiperOrigin-RevId: 379806801
2021-06-16	kvm: mark UpperHalf PTE-s as global	Andrei Vagin
	UpperHalf is shared with all address spaces. PiperOrigin-RevId: 379790539
2021-06-16	Merge pull request #5991 from zhlhahaha:2165	gVisor bot
	PiperOrigin-RevId: 379766106
2021-06-14	Fix typo	Michael Pratt
	PiperOrigin-RevId: 379337677
2021-06-14	Cleanup iptables bug TODOs	Kevin Krakauer
	There are many references to unimplemented iptables features that link to #170, but that bug is about Istio support specifically. Istio is supported, so the references should change. Some TODOs are addressed, some removed because they are not features requested by users, and some are left as implementation notes. Fixes #170. PiperOrigin-RevId: 379328488
2021-06-13	Remove usermem dependency from marshal	Ian Lewis
	Both marshal and usermem are depended on by many packages and a dependency on marshal can often create circular dependencies. marshal should consider adding internal dependencies carefully moving forward. Fixes #6160 PiperOrigin-RevId: 379199882
2021-06-11	Fix //test/syscalls:exec_test_native	Zach Koopmans
	Later kernels add empty arguments to argv, throwing off return values for the exec_basic_workload.cc binary. This is result of a bug introduced by ccbb18b67323b "exec/binfmt_script: Don't modify bprm->buf and then return - ENOEXEC". Before this change, an empty interpreter string was reported if the first non-space/non-tab character after "#!" was '\0' (end of file, previously- overwritten trailing space or tab, or previously-overwritten first newline). After this change, an empty interpreter string is reported if all characters after "#!" are spaces or tabs, or the first non-space non-tab character is at i_end, which is the position of the first newline after "#!". However, if there is no newline after "#!" (as in ExecTest.InterpreterScriptNoPath), then i_end = buf_end (= bprm->buf + sizeof(bprm->buf) - 1, the last possible byte in the buffer) and neither condition holds. Change white space for script inputs to take into account the above bug. Co-authored-by: Andrei Vagin <avagin@gmail.com> PiperOrigin-RevId: 378997171
2021-06-10	Minor VFS2 xattr changes.	Jamie Liu
	- Allow the gofer client to use most xattr namespaces. As documented by the updated comment, this is consistent with e.g. Linux's FUSE client, and allows gofers to provide extended attributes from FUSE filesystems. - Make tmpfs' listxattr omit xattrs in the "trusted" namespace for non-privileged users. PiperOrigin-RevId: 378778854
2021-06-10	Fix lock ordering issue when enumerating cgroup tasks.	Rahat Mahmood
	The control files enumerating tasks and threads residing in cgroupfs incorrectly locks cgroupfs.filesystem.tasksMu before kernel.TaskSet.mu. The contents of these control files are inherently racy anyways, so use a snapshot of the tasks in the cgroup and drop tasksMu before resolving pids/tids (which acquires TaskSet.mu). PiperOrigin-RevId: 378767060
2021-06-10	Set RLimits during `runsc exec`	Fabricio Voznika
	PiperOrigin-RevId: 378726430
2021-06-10	Add /proc/sys/vm/max_map_count	Fabricio Voznika
	Set it to int32 max because gVisor doesn't have a limit. Fixes #2337 PiperOrigin-RevId: 378722230
2021-06-10	Parse mmap protection and flags in strace	Fabricio Voznika
	PiperOrigin-RevId: 378712518
2021-06-10	Report task exit in /proc/[pid]/{stat,status} before task goroutine exit.	Jamie Liu
	Between when runExitNotify.execute() returns nil (indicating that the task goroutine should exit) and when Task.run() advances Task.gosched.State to TaskGoroutineNonexistent (indicating that the task goroutine is exiting), there is a race window in which the Task is waitable (since TaskSet.mu is unlocked and Task.exitParentNotified is true) but will be reported by /proc/[pid]/status as running. Close the window by checking Task.exitState before task goroutine exit. PiperOrigin-RevId: 378711484
2021-06-10	[op] Move SignalInfo to abi/linux package.	Ayush Ranjan
	Fixes #214 PiperOrigin-RevId: 378680466