gvisor - Container Runtime Sandbox

Age	Commit message	Author
2019-06-21	Merge 5ba16d51 (automated)	gVisor bot

2019-06-21	Merge c0317b28 (automated)	gVisor bot

2019-06-21	Merge f94653b3 (automated)	gVisor bot

2019-06-21	kernel: call t.mu.Unlock() explicitly in WithMuLocked	Andrei Vagin
	defer here doesn't improve readability, and we know it is slower than the explicit call. PiperOrigin-RevId: 254441473
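The trade-off described in this commit can be sketched as follows. This is a minimal illustration, not the gVisor code: the `counter` type and both helper names are hypothetical. The two helpers behave the same when `f` returns normally. With the explicit call the mutex stays locked if `f` panics, which is the price paid for avoiding the defer overhead that Go toolchains of that period had.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical stand-in for a structure guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// withMuLockedDefer runs f with c.mu held and releases it via defer.
func (c *counter) withMuLockedDefer(f func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	f()
}

// withMuLockedExplicit runs f with c.mu held and unlocks explicitly.
// Unlike the defer variant, the mutex remains locked if f panics.
func (c *counter) withMuLockedExplicit(f func()) {
	c.mu.Lock()
	f()
	c.mu.Unlock()
}

func main() {
	c := &counter{}
	c.withMuLockedDefer(func() { c.n++ })
	c.withMuLockedExplicit(func() { c.n++ })
	fmt.Println(c.n)
}
```

Since Go 1.14 the cost of defer in simple functions like these is much smaller, so whether the explicit form is still worthwhile is best decided with a benchmark.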
2019-06-21	Merge 335fd987 (automated)	gVisor bot

2019-06-21	Merge 054b5632 (automated)	gVisor bot

2019-06-21	Update comment	Fabricio Voznika
	PiperOrigin-RevId: 254428866
2019-06-21	Merge dc36c34a (automated)	gVisor bot

2019-06-20	Merge 3c7448ab (automated)	gVisor bot

2019-06-20	Merge 292f70cb (automated)	gVisor bot

2019-06-20	Merge 0b213507 (automated)	gVisor bot

2019-06-20	Merge 7e495156 (automated)	gVisor bot

2019-06-20	Merge c2d87d5d (automated)	gVisor bot

2019-06-19	Merge 773423a9 (automated)	gVisor bot

2019-06-19	Merge 9d2efaac (automated)	gVisor bot

2019-06-19	Merge f7428af9 (automated)	gVisor bot

2019-06-19	Add MountNamespace to task.	Nicolas Lacasse
	This allows tasks to have distinct mount namespaces, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better name. PiperOrigin-RevId: 254009310
2019-06-19	Merge ca245a42 (automated)	gVisor bot

2019-06-18	Merge 546b2948 (automated)	gVisor bot

2019-06-18	Merge 0e07c94d (automated)	gVisor bot

2019-06-18	Merge bdb19b82 (automated)	gVisor bot

2019-06-18	Merge ec15fb11 (automated)	gVisor bot

2019-06-18	Merge 8ab0848c (automated)	gVisor bot

2019-06-18	gvisor/fs: don't update file.offset for sockets, pipes, etc	Andrei Vagin
	Sockets, pipes, and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644
2019-06-18	Merge 66cc0e9f (automated)	gVisor bot

2019-06-17	Merge 99d28637 (automated)	gVisor bot

2019-06-14	Merge a8608c50 (automated)	gVisor bot

2019-06-14	Skip tids that are still in use when allocating	Yong He
	When the leader of a process group (session) exits, the process group ID (session ID) is still held by the other processes in the process group, so the process group ID (session ID) cannot be reused. If the process group ID (session ID) is reused as the process group ID of a new process, session creation fails, and runsc later crashes when it accesses the process group. The fix skips a tid when allocating a new tid if it is in use by a process group (session).

	The runsc crash can easily be reproduced with the following steps:

	1. Build the test program and run it inside the container:

	    #include <stdio.h>
	    #include <stdlib.h>
	    #include <sys/types.h>
	    #include <unistd.h>

	    int main(int argc, char *argv[])
	    {
	        pid_t cpid, spid;

	        cpid = fork();
	        if (cpid == -1) {
	            perror("fork");
	            exit(EXIT_FAILURE);
	        }
	        if (cpid == 0) {
	            /* Child: become the leader of a new session. */
	            pid_t sid = setsid();
	            printf("Start New Session %ld\n", (long)sid);
	            printf("Child PID %ld / PPID %ld / PGID %ld / SID %ld\n",
	                   (long)getpid(), (long)getppid(),
	                   (long)getpgid(getpid()), (long)getsid(getpid()));
	            spid = fork();
	            if (spid == 0) {
	                /* Grandchild: new process group in the same session. */
	                setpgid(getpid(), getpid());
	                printf("Set GrandSon as New Process Group\n");
	                printf("GrandSon PID %ld / PPID %ld / PGID %ld / SID %ld\n",
	                       (long)getpid(), (long)getppid(),
	                       (long)getpgid(getpid()), (long)getsid(getpid()));
	                while (1) {
	                    usleep(1);
	                }
	            }
	            /* The session leader exits while the grandchild keeps running. */
	            sleep(3);
	            exit(0);
	        } else {
	            exit(0);
	        }
	        return 0;
	    }

	2. Build the hello program:

	    #include <stdio.h>
	    #include <unistd.h>

	    int main(int argc, char *argv[])
	    {
	        printf("Current PID is %ld\n", (long)getpid());
	        return 0;
	    }

	3. Run a script on the host that runs hello inside the container. You can speed up the test by setting TasksLimit to a lower value.

	    for (( i=0; i<65535; i++ ))
	    do
	        docker exec <container id> /test/hello
	    done

	4. When a hello process reuses the process group of the looping process, runsc crashes:

	    panic: runtime error: invalid memory address or nil pointer dereference
	    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x79f0c8]

	    goroutine 612475 [running]:
	    gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*ProcessGroup).decRefWithParent(0x0, 0x0)
	        pkg/sentry/kernel/sessions.go:160 +0x78
	    gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).exitNotifyLocked(0xc000663500, 0x0)
	        pkg/sentry/kernel/task_exit.go:672 +0x2b7
	    gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runExitNotify).execute(0x0, 0xc000663500, 0x0, 0x0)
	        pkg/sentry/kernel/task_exit.go:542 +0xc4
	    gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run(0xc000663500, 0xc)
	        pkg/sentry/kernel/task_run.go:91 +0x194
	    created by gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Start
	        pkg/sentry/kernel/task_start.go:286 +0xfe
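The idea behind the fix can be sketched as an allocator that treats an ID as taken while any task, process group, or session still uses it. This is a simplified illustration under stated assumptions, not the gVisor implementation: the `tidAllocator` type, its fields, and its methods are all hypothetical.

```go
package main

import "fmt"

// tidAllocator is a hypothetical allocator that hands out IDs in the
// range [1, limit], wrapping around when it reaches the limit.
type tidAllocator struct {
	last  int          // most recently considered ID
	limit int          // highest ID that may be allocated
	tasks map[int]bool // IDs held by live tasks
	pgids map[int]bool // IDs held by live process groups
	sids  map[int]bool // IDs held by live sessions
}

// inUse reports whether id is still referenced by a task, a process
// group, or a session.
func (a *tidAllocator) inUse(id int) bool {
	return a.tasks[id] || a.pgids[id] || a.sids[id]
}

// allocate returns the next free ID, skipping any ID that is still in
// use. It returns false if every ID in the range is taken.
func (a *tidAllocator) allocate() (int, bool) {
	for i := 0; i < a.limit; i++ {
		a.last++
		if a.last > a.limit {
			a.last = 1
		}
		if !a.inUse(a.last) {
			a.tasks[a.last] = true
			return a.last, true
		}
	}
	return 0, false
}

func main() {
	a := &tidAllocator{
		last:  1,
		limit: 4,
		tasks: map[int]bool{1: true},
		// The task with ID 2 has exited, but its process group lives on.
		pgids: map[int]bool{2: true},
		sids:  map[int]bool{},
	}
	for i := 0; i < 3; i++ {
		id, ok := a.allocate()
		fmt.Println(id, ok)
	}
}
```

In this sketch ID 2 is never handed out while the process group still holds it, which is the property the reproduction steps above violate on an unfixed runsc.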
2019-06-14	Merge 3d71c627 (automated)	gVisor bot

2019-06-14	Merge 3e9b8ecb (automated)	gVisor bot

2019-06-13	Plumb context through more layers of filesystem.	Ian Gudger
	All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13	Fix deadlock in fasync.	Ian Gudger
	The deadlock can occur when both ends of a connected Unix socket which has FIOASYNC enabled on at least one end are closed at the same time. One end notifies that it is closing, calling (*waiter.Queue).Notify which takes waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which takes FileAsync.mu. The other end tries to unregister for notifications by calling (*FileAsync).Unregister, which takes FileAsync.mu and calls (*waiter.Queue).EventUnregister which takes waiter.Queue.mu. This is fixed by moving the calls to waiter.Waitable.EventRegister and waiter.Waitable.EventUnregister outside of the protection of any mutex used in (*FileAsync).Callback. The new test is related, but does not cover this particular situation. Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter calling (*FileAsync).Callback could not and did not. This is fixed by making FileAsync.e.Callback immutable before passing it to the waiter for the first time. Fixes #346 PiperOrigin-RevId: 253138340
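The lock-ordering rule behind this fix can be sketched as follows. This is a simplified model under stated assumptions, not the gVisor code: `queue`, `entry`, and `fileAsync` are hypothetical stand-ins for `waiter.Queue`, `waiter.Entry`, and `FileAsync`. The notifier takes the queue lock and then the per-file lock, so the register and unregister paths must never call into the queue while holding the per-file lock. The callback is also set once, before the entry is first handed to the queue.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical queue entry with an immutable callback.
type entry struct {
	callback func()
}

// queue is a hypothetical notification queue.
type queue struct {
	mu      sync.RWMutex
	entries map[*entry]struct{}
}

func (q *queue) register(e *entry) {
	q.mu.Lock()
	q.entries[e] = struct{}{}
	q.mu.Unlock()
}

func (q *queue) unregister(e *entry) {
	q.mu.Lock()
	delete(q.entries, e)
	q.mu.Unlock()
}

// notify holds q.mu (read) while invoking callbacks, so a callback
// acquires its own lock after q.mu.
func (q *queue) notify() {
	q.mu.RLock()
	for e := range q.entries {
		e.callback()
	}
	q.mu.RUnlock()
}

// fileAsync is a hypothetical async-notification state for one file.
type fileAsync struct {
	mu         sync.Mutex
	e          entry
	registered bool
	signals    int
}

func newFileAsync() *fileAsync {
	a := &fileAsync{}
	a.e.callback = a.callback // set once, before the queue ever sees it
	return a
}

func (a *fileAsync) callback() {
	a.mu.Lock()
	if a.registered {
		a.signals++
	}
	a.mu.Unlock()
}

func (a *fileAsync) register(q *queue) {
	a.mu.Lock()
	a.registered = true
	a.mu.Unlock()
	q.register(&a.e) // q.mu is taken only after a.mu is released
}

func (a *fileAsync) unregister(q *queue) {
	a.mu.Lock()
	a.registered = false
	a.mu.Unlock()
	q.unregister(&a.e) // q.mu is taken only after a.mu is released
}

func main() {
	q := &queue{entries: map[*entry]struct{}{}}
	a := newFileAsync()
	a.register(q)
	q.notify()
	a.unregister(q)
	q.notify()
	fmt.Println(a.signals)
}
```

Because `a.mu` is never held while `q.mu` is acquired, the only nesting order left is `q.mu` then `a.mu`, and the cycle described in the commit message cannot form in this model.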
2019-06-13	Merge add40fd6 (automated)	gVisor bot

2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-13	Merge 4fdd560b (automated)	gVisor bot

2019-06-13	Merge 9f77b36f (automated)	gVisor bot

2019-06-12	Merge 356d1be1 (automated)	gVisor bot

2019-06-12	Merge df110ad4 (automated)	gVisor bot

2019-06-11	Merge 69c8657a (automated)	gVisor bot

2019-06-11	Merge 478a0873 (automated)	gVisor bot

2019-06-11	Merge fc746efa (automated)	gVisor bot

2019-06-11	Merge 847c4b97 (automated)	gVisor bot

2019-06-11	Merge a775ae82 (automated)	gVisor bot

2019-06-11	Merge 307a9854 (automated)	gVisor bot

2019-06-11	Merge 74e397e3 (automated)	gVisor bot

2019-06-10	Add introspection for Linux/AMD64 syscalls	Ian Lewis
	Adds simple introspection for syscall compatibility information to Linux/AMD64. Syscalls registered in the syscall table now have associated metadata like name, support level, notes, and URLs to relevant issues. Syscall information can be exported as a table, JSON, or CSV using the new 'runsc help syscalls' command. Users can use this info to debug and get info on the compatibility of the version of runsc they are running or to generate documentation. PiperOrigin-RevId: 252558304
2019-06-10	Merge 589f36ac (automated)	gVisor bot

2019-06-10	Merge a00157cc (automated)	gVisor bot

2019-06-10	Store more information in the kernel socket table.	Rahat Mahmood
	Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in Linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895
2019-06-06	"Implement" mbind(2).	Jamie Liu
	We still only advertise a single NUMA node, and ignore mempolicy accordingly, but mbind() at least now succeeds and has effects reflected by get_mempolicy(). Also fix handling of nodemasks: round sizes to unsigned long (as documented and done by Linux), and zero trailing bits when copying them out. PiperOrigin-RevId: 251950859
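The nodemask handling mentioned here can be sketched as two small helpers. This is an assumption-laden illustration, not the gVisor code: both function names are hypothetical, a 64-bit `unsigned long` is assumed, and the kernel-side mask is modeled as a single word because only one NUMA node is advertised. The exact interpretation of `maxnode` in Linux has further quirks that are not modeled.

```go
package main

import "fmt"

// bitsPerLong assumes a 64-bit unsigned long, as on Linux/AMD64.
const bitsPerLong = 64

// nodemaskBytes returns the size in bytes of a nodemask that holds
// maxnode bits, rounded up to a whole number of unsigned longs.
func nodemaskBytes(maxnode uint) uint {
	longs := (maxnode + bitsPerLong - 1) / bitsPerLong
	return longs * (bitsPerLong / 8)
}

// copyOutNodemask builds the buffer that would be copied to user space:
// the first word carries the mask with bits at or above maxnode cleared,
// and every remaining word is zero.
func copyOutNodemask(mask uint64, maxnode uint) []uint64 {
	out := make([]uint64, (maxnode+bitsPerLong-1)/bitsPerLong)
	if len(out) == 0 {
		return out
	}
	out[0] = mask
	if maxnode < bitsPerLong {
		out[0] &= (uint64(1) << maxnode) - 1
	}
	return out
}

func main() {
	fmt.Println(nodemaskBytes(1), nodemaskBytes(64), nodemaskBytes(65))
	fmt.Println(copyOutNodemask(0xff, 4))
	fmt.Println(copyOutNodemask(0x1, 128))
}
```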