summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/kernel
AgeCommit message (Collapse)Author
2019-06-21Merge 5ba16d51 (automated)gVisor bot
2019-06-21Merge c0317b28 (automated)gVisor bot
2019-06-21Merge f94653b3 (automated)gVisor bot
2019-06-21kernel: call t.mu.Unlock() explicitly in WithMuLockedAndrei Vagin
defer here doesn't improve readability, but we know it slower that the explicit call. PiperOrigin-RevId: 254441473
2019-06-21Merge 335fd987 (automated)gVisor bot
2019-06-21Merge 054b5632 (automated)gVisor bot
2019-06-21Update commentFabricio Voznika
PiperOrigin-RevId: 254428866
2019-06-21Merge dc36c34a (automated)gVisor bot
2019-06-20Merge 3c7448ab (automated)gVisor bot
2019-06-20Merge 292f70cb (automated)gVisor bot
2019-06-20Merge 0b213507 (automated)gVisor bot
2019-06-20Merge 7e495156 (automated)gVisor bot
2019-06-20Merge c2d87d5d (automated)gVisor bot
2019-06-19Merge 773423a9 (automated)gVisor bot
2019-06-19Merge 9d2efaac (automated)gVisor bot
2019-06-19Merge f7428af9 (automated)gVisor bot
2019-06-19Add MountNamespace to task.Nicolas Lacasse
This allows tasks to have distinct mount namespace, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better naem. PiperOrigin-RevId: 254009310
2019-06-19Merge ca245a42 (automated)gVisor bot
2019-06-18Merge 546b2948 (automated)gVisor bot
2019-06-18Merge 0e07c94d (automated)gVisor bot
2019-06-18Merge bdb19b82 (automated)gVisor bot
2019-06-18Merge ec15fb11 (automated)gVisor bot
2019-06-18Merge 8ab0848c (automated)gVisor bot
2019-06-18gvisor/fs: don't update file.offset for sockets, pipes, etcAndrei Vagin
sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644
2019-06-18Merge 66cc0e9f (automated)gVisor bot
2019-06-17Merge 99d28637 (automated)gVisor bot
2019-06-14Merge a8608c50 (automated)gVisor bot
2019-06-14Skip tid allocation which is usingYong He
When leader of process group (session) exit, the process group ID (session ID) is holding by other processes in the process group, so the process group ID (session ID) can not be reused. If reusing the process group ID (seession ID) as new process group ID for new process, this will cause session create failed, and later runsc crash when access process group. The fix skip the tid if it is using by a process group (session) when allocating a new tid. We could easily reproduce the runsc crash follow these steps: 1. build test program, and run inside container int main(int argc, char *argv[]) { pid_t cpid, spid; cpid = fork(); if (cpid == -1) { perror("fork"); exit(EXIT_FAILURE); } if (cpid == 0) { pid_t sid = setsid(); printf("Start New Session %ld\n",sid); printf("Child PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); spid = fork(); if (spid == 0) { setpgid(getpid(), getpid()); printf("Set GrandSon as New Process Group\n"); printf("GrandSon PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); while(1) { usleep(1); } } sleep(3); exit(0); } else { exit(0); } return 0; } 2. build hello program int main(int argc, char *argv[]) { printf("Current PID is %ld\n", (long) getpid()); return 0; } 3. run script on host which run hello inside container, you can speed up the test with set TasksLimit as lower value. for (( i=0; i<65535; i++ )) do docker exec <container id> /test/hello done 4. when hello process reusing the process group of loop process, runsc will crash. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x79f0c8] goroutine 612475 [running]: gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*ProcessGroup).decRefWithParent(0x0, 0x0) pkg/sentry/kernel/sessions.go:160 +0x78 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).exitNotifyLocked(0xc000663500, 0x0) pkg/sentry/kernel/task_exit.go:672 +0x2b7 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runExitNotify).execute(0x0, 0xc000663500, 0x0, 0x0) pkg/sentry/kernel/task_exit.go:542 +0xc4 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run(0xc000663500, 0xc) pkg/sentry/kernel/task_run.go:91 +0x194 created by gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Start pkg/sentry/kernel/task_start.go:286 +0xfe
2019-06-14Merge 3d71c627 (automated)gVisor bot
2019-06-14Merge 3e9b8ecb (automated)gVisor bot
2019-06-13Plumb context through more layers of filesytem.Ian Gudger
All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709
2019-06-13Fix deadlock in fasync.Ian Gudger
The deadlock can occur when both ends of a connected Unix socket which has FIOASYNC enabled on at least one end are closed at the same time. One end notifies that it is closing, calling (*waiter.Queue).Notify which takes waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which takes FileAsync.mu. The other end tries to unregister for notifications by calling (*FileAsync).Unregister, which takes FileAsync.mu and calls (*waiter.Queue).EventUnregister which takes waiter.Queue.mu. This is fixed by moving the calls to waiter.Waitable.EventRegister and waiter.Waitable.EventUnregister outside of the protection of any mutex used in (*FileAsync).Callback. The new test is related, but does not cover this particular situation. Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter calling (*FileAsync).Callback could not and did not. This is fixed by making FileAsync.e.Callback immutable before passing it to the waiter for the first time. Fixes #346 PiperOrigin-RevId: 253138340
2019-06-13Merge add40fd6 (automated)gVisor bot
2019-06-13Update canonical repository.Adin Scannell
This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-13Merge 4fdd560b (automated)gVisor bot
2019-06-13Merge 9f77b36f (automated)gVisor bot
2019-06-12Merge 356d1be1 (automated)gVisor bot
2019-06-12Merge df110ad4 (automated)gVisor bot
2019-06-11Merge 69c8657a (automated)gVisor bot
2019-06-11Merge 478a0873 (automated)gVisor bot
2019-06-11Merge fc746efa (automated)gVisor bot
2019-06-11Merge 847c4b97 (automated)gVisor bot
2019-06-11Merge a775ae82 (automated)gVisor bot
2019-06-11Merge 307a9854 (automated)gVisor bot
2019-06-11Merge 74e397e3 (automated)gVisor bot
2019-06-10Add introspection for Linux/AMD64 syscallsIan Lewis
Adds simple introspection for syscall compatibility information to Linux/AMD64. Syscalls registered in the syscall table now have associated metadata like name, support level, notes, and URLs to relevant issues. Syscall information can be exported as a table, JSON, or CSV using the new 'runsc help syscalls' command. Users can use this info to debug and get info on the compatibility of the version of runsc they are running or to generate documentation. PiperOrigin-RevId: 252558304
2019-06-10Merge 589f36ac (automated)gVisor bot
2019-06-10Merge a00157cc (automated)gVisor bot
2019-06-10Store more information in the kernel socket table.Rahat Mahmood
Store enough information in the kernel socket table to distinguish between different types of sockets. Previously we were only storing the socket family, but this isn't enough to classify sockets. For example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets are SOCK_DGRAM sockets with a particular protocol. Instead of creating more sub-tables, flatten the socket table and provide a filtering mechanism based on the socket entry. Also generate and store a socket entry index ("sl" in linux) which allows us to output entries in a stable order from procfs. PiperOrigin-RevId: 252495895
2019-06-06"Implement" mbind(2).Jamie Liu
We still only advertise a single NUMA node, and ignore mempolicy accordingly, but mbind() at least now succeeds and has effects reflected by get_mempolicy(). Also fix handling of nodemasks: round sizes to unsigned long (as documented and done by Linux), and zero trailing bits when copying them out. PiperOrigin-RevId: 251950859