summaryrefslogtreecommitdiffhomepage
AgeCommit message (Collapse)Author
2019-06-24Add CLOCK_BOOTTIME as a CLOCK_MONOTONIC aliasAdrien Leravat
Makes CLOCK_BOOTTIME available with * clock_gettime * timerfd_create * clock_gettime vDSO CLOCK_BOOTTIME is implemented as an alias to CLOCK_MONOTONIC. CLOCK_MONOTONIC already keeps track of time across save and restore. This is the closest possible behavior to Linux CLOCK_BOOTIME, as there is no concept of suspend/resume. Updates google/gvisor#218
2019-06-24fs: synchronize concurrent writes into files with O_APPENDAndrei Vagin
For files with O_APPEND, a file write operation gets a file size and uses it as offset to call an inode write operation. This means that all other operations which can change a file size should be blocked while the write operation doesn't complete. PiperOrigin-RevId: 254873771
2019-06-24Add O_EXITKILL to ptrace options.Adin Scannell
This prevents a race before PDEATH_SIG can take effect during a sentry crash. Discovered and solution by avagin@. PiperOrigin-RevId: 254871534
2019-06-24Implement /proc/net/tcp.Rahat Mahmood
PiperOrigin-RevId: 254854346
2019-06-24platform/ptrace: specify PTRACE_O_TRACEEXIT for stub-processesAndrei Vagin
The tracee is stopped early during process exit, when registers are still available, allowing the tracer to see where the exit occurred, whereas the normal exit notifi? cation is done after the process is finished exiting. Without this option, dumpAndPanic fails to get registers. PiperOrigin-RevId: 254852917
2019-06-24Use correct statx syscall number for amd64.Nicolas Lacasse
The previous number was for the arm architecture. Also change the statx tests to force them to run on gVisor, which would have caught this issue. PiperOrigin-RevId: 254846831
2019-06-24Allow to change logging options using 'runsc debug'Fabricio Voznika
New options are: runsc debug --strace=off|all|function1,function2 runsc debug --log-level=warning|info|debug runsc debug --log-packets=true|false Updates #407 PiperOrigin-RevId: 254843128
2019-06-24Add regression test for #128 (fixed in ab6774ce)brb-g
Tests run at HEAD (35719d52): ``` $ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)') <snip> //test/syscalls:getdents_test_native PASSED in 0.3s //test/syscalls:getdents_test_runsc_ptrace PASSED in 4.9s //test/syscalls:getdents_test_runsc_ptrace_overlay PASSED in 4.7s //test/syscalls:getdents_test_runsc_ptrace_shared PASSED in 5.2s //test/syscalls:getdents_test_runsc_kvm FAILED in 4.0s ``` Tests run at ab6774ce~1 (6f933a93): ``` $ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)') //test/syscalls:getdents_test_native PASSED in 0.2s //test/syscalls:getdents_test_runsc_kvm FAILED in 4.2s /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_kvm/test.log //test/syscalls:getdents_test_runsc_ptrace FAILED in 5.3s /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace/test.log //test/syscalls:getdents_test_runsc_ptrace_overlay FAILED in 4.9s /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_overlay/test.log //test/syscalls:getdents_test_runsc_ptrace_shared FAILED in 5.2s /usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_shared/test.log ``` (I think all runsc_kvm tests are broken on my machine -- I'll rerun them if you can point me at the documentation to set it up)
2019-06-24Return ENOENT when reading /proc/{pid}/task of an exited processchris.zn
There will be a deadloop when we use getdents to read /proc/{pid}/task of an exited process Like this: Process A is running Process B: open /proc/{pid of A}/task Process A exits Process B: getdents /proc/{pid of A}/task Then, process B will fall into deadloop, and return "." and ".." in loops and never ends. This patch returns ENOENT when use getdents to read /proc/{pid}/task if the process is just exited. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-06-22Implement statx.Nicolas Lacasse
We don't have the plumbing for btime yet, so that field is left off. The returned mask indicates that btime is absent. Fixes #343 PiperOrigin-RevId: 254575752
2019-06-21Fix the logic for sending zero window updates.Bhasker Hariharan
Today we have the logic split in two places between endpoint Read() and the worker goroutine which actually sends a zero window. This change makes it so that when a zero window ACK is sent we set a flag in the endpoint which can be read by the endpoint to decide if it should notify the worker to send a nonZeroWindow update. The worker now does not do the check again but instead sends an ACK and flips the flag right away. Similarly today when SO_RECVBUF is set the SetSockOpt call has logic to decide if a zero window update is required. Rather than do that we move the logic to the worker goroutine and it can check the zeroWindow flag and send an update if required. PiperOrigin-RevId: 254505447
2019-06-21gvisor/fs: getdents returns 0 if offset is equal to FileMaxOffsetAndrei Vagin
FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called. PiperOrigin-RevId: 254498777
2019-06-21Remove O(n) lookup on unlink/renameMichael Pratt
Currently, the path tracking in the gofer involves an O(n) lookup of child fidRefs. This causes a significant overhead on unlinks in directories with lots of child fidRefs (<4k). In this transition, pathNode moves from sync.Map to normal synchronized maps. There is a small chance of contention in walk, but the lock is held for a very short time (and sync.Map also had a chance of requiring locking). OTOH, sync.Map makes it very difficult to add a fidRef reverse map. PiperOrigin-RevId: 254489952
2019-06-21Deflake TestSimpleReceive failures due to timeoutsBrad Burlage
This test will occasionally fail waiting to read a packet. From repeated runs, I've seen it up to 1.5s for waitForPackets to complete. PiperOrigin-RevId: 254484627
2019-06-21ext4 block group descriptor implementation in disk layout package.Ayush Ranjan
PiperOrigin-RevId: 254482180
2019-06-21Add //pkg/flipcall.Jamie Liu
Flipcall is a (conceptually) simple local-only RPC mechanism. Compared to unet, Flipcall does not support passing FDs (support for which will be provided out of band by another package), requires users to establish connections manually, and requires user management of concurrency since each connected Endpoint pair supports only a single RPC at a time; however, it improves performance by using shared memory for data (reducing memory copies) and using futexes for control signaling (which is much cheaper than sendto/recvfrom/sendmsg/recvmsg). PiperOrigin-RevId: 254471986
2019-06-21Add list of stuck tasks to panic messageFabricio Voznika
PiperOrigin-RevId: 254450309
2019-06-21Update pathNode documentation to reflect realityMichael Pratt
Neither fidRefs or children are (directly) synchronized by mu. Remove the preconditions that say so. That said, the surrounding does enforce some synchronization guarantees (e.g., fidRef.renameChildTo does not atomically replace the child in the maps). I've tried to note the need for callers to do this synchronization. I've also renamed the maps to what are (IMO) clearer names. As is, it is not obvious that pathNode.fidRefs is a map of *child* fidRefs rather than self fidRefs. PiperOrigin-RevId: 254446965
2019-06-21kernel: call t.mu.Unlock() explicitly in WithMuLockedAndrei Vagin
defer here doesn't improve readability, but we know it slower that the explicit call. PiperOrigin-RevId: 254441473
2019-06-21Delete dangling comment line.Nicolas Lacasse
This was from an old comment, which was superseded by the existing comment which is correct. PiperOrigin-RevId: 254434535
2019-06-21Update commentFabricio Voznika
PiperOrigin-RevId: 254428866
2019-06-20Close FD on TcpSocketTest loop failure.Ian Gudger
This helps prevent the blocking call from getting stuck and causing a test timeout. PiperOrigin-RevId: 254325926
2019-06-20Deflake TestSIGALRMToMainThread.Neel Natu
Bump up the threshold on number of SIGALRMs received by worker threads from 50 to 200. Even with the new threshold we still expect that the majority of SIGALRMs are received by the thread group leader. PiperOrigin-RevId: 254289787
2019-06-20Preallocate auth.NewAnonymousCredentials() in contexttest.TestContext.Jamie Liu
Otherwise every call to, say, fs.ContextCanAccessFile() in a benchmark using contexttest allocates new auth.Credentials, a new auth.UserNamespace, ... PiperOrigin-RevId: 254261051
2019-06-20Add package docs to seqfile and ramfsMichael Pratt
These are the only packages missing docs: https://godoc.org/gvisor.dev/gvisor PiperOrigin-RevId: 254261022
2019-06-20Unmark amutex_test as flaky.Rahat Mahmood
PiperOrigin-RevId: 254254058
2019-06-20Implement madvise(MADV_DONTFORK)Neel Natu
PiperOrigin-RevId: 254253777
2019-06-20Drop extra characterMichael Pratt
PiperOrigin-RevId: 254237530
2019-06-19Deflake SendFileTest_Shutdown.Ian Gudger
The sendfile syscall's backing doSplice contained a race with regard to blocking. If the first attempt failed with syserror.ErrWouldBlock and then the blocking file became ready before registering a waiter, we would just return the ErrWouldBlock (even if we were supposed to block). PiperOrigin-RevId: 254114432
2019-06-19Mark tcp_socket test flaky (for real)Michael Pratt
The tag on the binary has no effect. It must be on the test. PiperOrigin-RevId: 254103480
2019-06-19Deflake mount_test.Nicolas Lacasse
Inode ids are only stable across Save/Restore if we have an open FD on the inode. All tests that compare inode ids must therefor hold an FD open. PiperOrigin-RevId: 254086603
2019-06-19Abort loop on failureMichael Pratt
As-is, on failure these will infinite loop, resulting in test timeout instead of failure. PiperOrigin-RevId: 254074989
2019-06-19Add renamed children pathNodes to target parentMichael Pratt
Otherwise future renames may miss Renamed calls. PiperOrigin-RevId: 254060946
2019-06-19fileOp{On,At} should pass the remaning symlink traversal count.Nicolas Lacasse
And methods that do more traversals should use the remaining count rather than resetting. PiperOrigin-RevId: 254041720
2019-06-19Add MountNamespace to task.Nicolas Lacasse
This allows tasks to have distinct mount namespace, instead of all sharing the kernel's root mount namespace. Currently, the only way for a task to get a different mount namespace than the kernel's root is by explicitly setting a different MountNamespace in CreateProcessArgs, and nothing does this (yet). In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a new container inside runsc. Note that "MountNamespace" is a poor term for this thing. It's more like a distinct VFS tree. When we get around to adding real mount namespaces, this will need a better naem. PiperOrigin-RevId: 254009310
2019-06-19Mark tcp_socket test flaky.Neel Natu
PiperOrigin-RevId: 253997465
2019-06-18Attempt to fix TestPipeWritesAccumulateFabricio Voznika
Test fails because it's reading 4KB instead of the expected 64KB. Changed the test to read pipe buffer size instead of hardcode and added some logging in case the reason for failure was not pipe buffer size. PiperOrigin-RevId: 253916040
2019-06-18Use return values from syscalls in eventfd tests.Rahat Mahmood
PiperOrigin-RevId: 253890611
2019-06-18Kill sandbox process when 'runsc do' exitsFabricio Voznika
PiperOrigin-RevId: 253882115
2019-06-18Add Container/Sandbox args struct for creationFabricio Voznika
There were 3 string arguments that could be easily misplaced and it makes it easier to add new arguments, especially for Container that has dozens of callers. PiperOrigin-RevId: 253872074
2019-06-18Replace usage of deprecated strtoul/strtoullBrad Burlage
PiperOrigin-RevId: 253864770
2019-06-18Fix PipeTest_Streaming timeoutFabricio Voznika
Test was calling Size() inside read and write loops. Size() makes 2 syscalls to return the pipe size, making the test do a lot more work than it should. PiperOrigin-RevId: 253824690
2019-06-18gvisor/fs: don't update file.offset for sockets, pipes, etcAndrei Vagin
sockets, pipes and other non-seekable file descriptors don't use file.offset, so we don't need to update it. With this change, we will be able to call file operations without locking the file.mu mutex. This is already used for pipes in the splice system call. PiperOrigin-RevId: 253746644
2019-06-18gvisor/kokoro: don't modify tests names in the BUILD fileAndrei Vagin
PiperOrigin-RevId: 253746380
2019-06-17gvisor/bazel: use python2 to build runsc-debianAndrei Vagin
$ bazel build runsc:runsc-debian File ".../bazel_tools/tools/build_defs/pkg/make_deb.py", line 311, in GetFlagValue: flagvalue = flagvalue.decode('utf-8') AttributeError: 'str' object has no attribute 'decode' make_deb.py is incompatible with Python3. https://github.com/bazelbuild/bazel/issues/8443 PiperOrigin-RevId: 253691923
2019-06-17Internal change.gVisor bot
PiperOrigin-RevId: 253559564
2019-06-14Enable Receive Buffer Auto-Tuning for runsc.Bhasker Hariharan
Updates #230 PiperOrigin-RevId: 253225078
2019-06-14Skip tid allocation which is usingYong He
When leader of process group (session) exit, the process group ID (session ID) is holding by other processes in the process group, so the process group ID (session ID) can not be reused. If reusing the process group ID (seession ID) as new process group ID for new process, this will cause session create failed, and later runsc crash when access process group. The fix skip the tid if it is using by a process group (session) when allocating a new tid. We could easily reproduce the runsc crash follow these steps: 1. build test program, and run inside container int main(int argc, char *argv[]) { pid_t cpid, spid; cpid = fork(); if (cpid == -1) { perror("fork"); exit(EXIT_FAILURE); } if (cpid == 0) { pid_t sid = setsid(); printf("Start New Session %ld\n",sid); printf("Child PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); spid = fork(); if (spid == 0) { setpgid(getpid(), getpid()); printf("Set GrandSon as New Process Group\n"); printf("GrandSon PID %ld / PPID %ld / PGID %ld / SID %ld\n", getpid(),getppid(),getpgid(getpid()),getsid(getpid())); while(1) { usleep(1); } } sleep(3); exit(0); } else { exit(0); } return 0; } 2. build hello program int main(int argc, char *argv[]) { printf("Current PID is %ld\n", (long) getpid()); return 0; } 3. run script on host which run hello inside container, you can speed up the test with set TasksLimit as lower value. for (( i=0; i<65535; i++ )) do docker exec <container id> /test/hello done 4. when hello process reusing the process group of loop process, runsc will crash. panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x79f0c8] goroutine 612475 [running]: gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*ProcessGroup).decRefWithParent(0x0, 0x0) pkg/sentry/kernel/sessions.go:160 +0x78 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).exitNotifyLocked(0xc000663500, 0x0) pkg/sentry/kernel/task_exit.go:672 +0x2b7 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runExitNotify).execute(0x0, 0xc000663500, 0x0, 0x0) pkg/sentry/kernel/task_exit.go:542 +0xc4 gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run(0xc000663500, 0xc) pkg/sentry/kernel/task_run.go:91 +0x194 created by gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Start pkg/sentry/kernel/task_start.go:286 +0xfe
2019-06-13Add support for TCP receive buffer auto tuning.Bhasker Hariharan
The implementation is similar to linux where we track the number of bytes consumed by the application to grow the receive buffer of a given TCP endpoint. This ensures that the advertised window grows at a reasonable rate to accomodate for the sender's rate and prevents large amounts of data being held in stack buffers if the application is not actively reading or not reading fast enough. The original paper that was used to implement the linux receive buffer auto- tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf NOTE: Linux does not implement DRS as defined in that paper, it's just a good reference to understand the solution space. Updates #230 PiperOrigin-RevId: 253168283
2019-06-13Plumb context through more layers of filesytem.Ian Gudger
All functions which allocate objects containing AtomicRefCounts will soon need a context. PiperOrigin-RevId: 253147709