Age | Commit message (Collapse) | Author |
|
Get/Set pipe size and ioctl support were missing from
overlayfs. It required moving the pipe.Sizer interface
to fs so that overlay could get access.
Fixes #318
PiperOrigin-RevId: 255511125
|
|
Addresses obvious typos, in the documentation only.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
|
|
Currently, the overlay dirCache is only used for a single logical use of
getdents. i.e., it is discard when the FD is closed or seeked back to
the beginning.
But the initial work of getting the directory contents can be quite
expensive (particularly sorting large directories), so we should keep it
as long as possible.
This is very similar to the readdirCache in fs/gofer.
Since the upper filesystem does not have to allow caching readdir
entries, the new CacheReaddir MountSourceOperations method controls this
behavior.
This caching should be trivially movable to all Inodes if desired,
though that adds an additional copy step for non-overlay Inodes.
(Overlay Inodes already do the extra copy).
PiperOrigin-RevId: 255477592
|
|
PiperOrigin-RevId: 255465635
|
|
PiperOrigin-RevId: 255462850
|
|
The code was wrongly assuming that only read access was
required from the lower overlay when checking for permissions.
This allowed non-writable files to be writable in the overlay.
Fixes #316
PiperOrigin-RevId: 255263686
|
|
If we have a symlink whose target does not exist, creating the symlink (either
via 'creat' or 'open' with O_CREAT flag) should create the target of the
symlink. Previously, gVisor would error with EEXIST in this case
PiperOrigin-RevId: 255232944
|
|
Go was going to change the behavior of SysProcAttr.Ctty such that it must be an
FD in the *parent* FD table:
https://go-review.googlesource.com/c/go/+/178919/
However, after some debate, it was decided that this change was too
backwards-incompatible, and so it was reverted.
https://github.com/golang/go/issues/29458
The behavior going forward is unchanged: the Ctty FD must be an FD in the
*child* FD table.
PiperOrigin-RevId: 255228476
|
|
Updates #179
PiperOrigin-RevId: 255081565
|
|
To accompany flipcall connections in cases where passing FDs is required
(as for gofers).
PiperOrigin-RevId: 255062277
|
|
An upcoming change in Go 1.13 [1] changes the semantics of the SysProcAttr.Ctty
field. Prior to the change, the FD must be an FD in the child process's FD
table (aka "post-shuffle"). After the change, the FD must be an FD in the
current process's FD table (aka "pre-shuffle").
To be compatible with both versions this CL introduces a new boolean
"CttyFdIsPostShuffle" which indicates whether a pre- or post-shuffle FD should
be provided. We use build tags to chose the correct one.
1: https://go-review.googlesource.com/c/go/+/178919/
PiperOrigin-RevId: 255015303
|
|
Credentials are immutable and even before these changes we could read them
without locks, but we needed to take a task lock to get a credential object
from a task object.
It is possible to avoid this lock, if we will guarantee that a credential
object will not be changed after setting it on a task.
PiperOrigin-RevId: 254989492
|
|
When we reopen file by path, we can't be sure that
we will open exactly the same file. The file can be
deleted and another one with the same name can be
created.
PiperOrigin-RevId: 254898594
|
|
Makes CLOCK_BOOTTIME available with
* clock_gettime
* timerfd_create
* clock_gettime vDSO
CLOCK_BOOTTIME is implemented as an alias to CLOCK_MONOTONIC.
CLOCK_MONOTONIC already keeps track of time across save
and restore. This is the closest possible behavior to Linux
CLOCK_BOOTIME, as there is no concept of suspend/resume.
Updates google/gvisor#218
|
|
For files with O_APPEND, a file write operation gets a file size and uses it as
offset to call an inode write operation. This means that all other operations
which can change a file size should be blocked while the write operation doesn't
complete.
PiperOrigin-RevId: 254873771
|
|
This prevents a race before PDEATH_SIG can take effect during
a sentry crash.
Discovered and solution by avagin@.
PiperOrigin-RevId: 254871534
|
|
PiperOrigin-RevId: 254854346
|
|
The tracee is stopped early during process exit, when registers are still
available, allowing the tracer to see where the exit occurred, whereas the
normal exit notifi? cation is done after the process is finished exiting.
Without this option, dumpAndPanic fails to get registers.
PiperOrigin-RevId: 254852917
|
|
The previous number was for the arm architecture.
Also change the statx tests to force them to run on gVisor, which would have
caught this issue.
PiperOrigin-RevId: 254846831
|
|
New options are:
runsc debug --strace=off|all|function1,function2
runsc debug --log-level=warning|info|debug
runsc debug --log-packets=true|false
Updates #407
PiperOrigin-RevId: 254843128
|
|
Tests run at HEAD (35719d52):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
<snip>
//test/syscalls:getdents_test_native PASSED in 0.3s
//test/syscalls:getdents_test_runsc_ptrace PASSED in 4.9s
//test/syscalls:getdents_test_runsc_ptrace_overlay PASSED in 4.7s
//test/syscalls:getdents_test_runsc_ptrace_shared PASSED in 5.2s
//test/syscalls:getdents_test_runsc_kvm FAILED in 4.0s
```
Tests run at ab6774ce~1 (6f933a93):
```
$ bazel test $(bazel query 'filter(".*getdents.*", //test/syscalls:all)')
//test/syscalls:getdents_test_native PASSED in 0.2s
//test/syscalls:getdents_test_runsc_kvm FAILED in 4.2s
/usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_kvm/test.log
//test/syscalls:getdents_test_runsc_ptrace FAILED in 5.3s
/usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace/test.log
//test/syscalls:getdents_test_runsc_ptrace_overlay FAILED in 4.9s
/usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_overlay/test.log
//test/syscalls:getdents_test_runsc_ptrace_shared FAILED in 5.2s
/usr/local/google/home/brb/.cache/bazel/_bazel_brb/967240a6aae7d353a221d73f4375e038/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/syscalls/getdents_test_runsc_ptrace_shared/test.log
```
(I think all runsc_kvm tests are broken on my machine -- I'll rerun them
if you can point me at the documentation to set it up)
|
|
There will be a deadloop when we use getdents to read /proc/{pid}/task
of an exited process
Like this:
Process A is running
Process B: open /proc/{pid of A}/task
Process A exits
Process B: getdents /proc/{pid of A}/task
Then, process B will fall into deadloop, and return "." and ".."
in loops and never ends.
This patch returns ENOENT when use getdents to read /proc/{pid}/task
if the process is just exited.
Signed-off-by: chris.zn <chris.zn@antfin.com>
|
|
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.
Fixes #343
PiperOrigin-RevId: 254575752
|
|
Today we have the logic split in two places between endpoint Read() and the
worker goroutine which actually sends a zero window. This change makes it so
that when a zero window ACK is sent we set a flag in the endpoint which can be
read by the endpoint to decide if it should notify the worker to send a
nonZeroWindow update.
The worker now does not do the check again but instead sends an ACK and flips
the flag right away.
Similarly today when SO_RECVBUF is set the SetSockOpt call has logic
to decide if a zero window update is required. Rather than do that we move
the logic to the worker goroutine and it can check the zeroWindow flag
and send an update if required.
PiperOrigin-RevId: 254505447
|
|
FileMaxOffset is a special case when lseek(d, 0, SEEK_END) has been called.
PiperOrigin-RevId: 254498777
|
|
Currently, the path tracking in the gofer involves an O(n) lookup of
child fidRefs. This causes a significant overhead on unlinks in
directories with lots of child fidRefs (<4k).
In this transition, pathNode moves from sync.Map to normal synchronized
maps. There is a small chance of contention in walk, but the lock is
held for a very short time (and sync.Map also had a chance of requiring
locking).
OTOH, sync.Map makes it very difficult to add a fidRef reverse map.
PiperOrigin-RevId: 254489952
|
|
This test will occasionally fail waiting to read a packet. From repeated runs,
I've seen it up to 1.5s for waitForPackets to complete.
PiperOrigin-RevId: 254484627
|
|
PiperOrigin-RevId: 254482180
|
|
Flipcall is a (conceptually) simple local-only RPC mechanism. Compared
to unet, Flipcall does not support passing FDs (support for which will
be provided out of band by another package), requires users to establish
connections manually, and requires user management of concurrency since
each connected Endpoint pair supports only a single RPC at a time;
however, it improves performance by using shared memory for data
(reducing memory copies) and using futexes for control signaling (which
is much cheaper than sendto/recvfrom/sendmsg/recvmsg).
PiperOrigin-RevId: 254471986
|
|
PiperOrigin-RevId: 254450309
|
|
Neither fidRefs or children are (directly) synchronized by mu. Remove
the preconditions that say so.
That said, the surrounding does enforce some synchronization guarantees
(e.g., fidRef.renameChildTo does not atomically replace the child in the
maps). I've tried to note the need for callers to do this
synchronization.
I've also renamed the maps to what are (IMO) clearer names. As is, it is
not obvious that pathNode.fidRefs is a map of *child* fidRefs rather
than self fidRefs.
PiperOrigin-RevId: 254446965
|
|
defer here doesn't improve readability, but we know it slower that
the explicit call.
PiperOrigin-RevId: 254441473
|
|
This was from an old comment, which was superseded by the
existing comment which is correct.
PiperOrigin-RevId: 254434535
|
|
PiperOrigin-RevId: 254428866
|
|
This helps prevent the blocking call from getting stuck and causing a test
timeout.
PiperOrigin-RevId: 254325926
|
|
Bump up the threshold on number of SIGALRMs received by worker
threads from 50 to 200. Even with the new threshold we still
expect that the majority of SIGALRMs are received by the
thread group leader.
PiperOrigin-RevId: 254289787
|
|
Otherwise every call to, say, fs.ContextCanAccessFile() in a benchmark
using contexttest allocates new auth.Credentials, a new
auth.UserNamespace, ...
PiperOrigin-RevId: 254261051
|
|
These are the only packages missing docs:
https://godoc.org/gvisor.dev/gvisor
PiperOrigin-RevId: 254261022
|
|
PiperOrigin-RevId: 254254058
|
|
PiperOrigin-RevId: 254253777
|
|
PiperOrigin-RevId: 254237530
|
|
The sendfile syscall's backing doSplice contained a race with regard to
blocking. If the first attempt failed with syserror.ErrWouldBlock and then
the blocking file became ready before registering a waiter, we would just
return the ErrWouldBlock (even if we were supposed to block).
PiperOrigin-RevId: 254114432
|
|
The tag on the binary has no effect. It must be on the test.
PiperOrigin-RevId: 254103480
|
|
Inode ids are only stable across Save/Restore if we have an open FD on the
inode. All tests that compare inode ids must therefor hold an FD open.
PiperOrigin-RevId: 254086603
|
|
As-is, on failure these will infinite loop, resulting in test timeout
instead of failure.
PiperOrigin-RevId: 254074989
|
|
Otherwise future renames may miss Renamed calls.
PiperOrigin-RevId: 254060946
|
|
And methods that do more traversals should use the remaining count rather than
resetting.
PiperOrigin-RevId: 254041720
|
|
This allows tasks to have distinct mount namespace, instead of all sharing the
kernel's root mount namespace.
Currently, the only way for a task to get a different mount namespace than the
kernel's root is by explicitly setting a different MountNamespace in
CreateProcessArgs, and nothing does this (yet).
In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a
new container inside runsc.
Note that "MountNamespace" is a poor term for this thing. It's more like a
distinct VFS tree. When we get around to adding real mount namespaces, this
will need a better naem.
PiperOrigin-RevId: 254009310
|
|
PiperOrigin-RevId: 253997465
|
|
Test fails because it's reading 4KB instead of the
expected 64KB. Changed the test to read pipe buffer
size instead of hardcode and added some logging in
case the reason for failure was not pipe buffer size.
PiperOrigin-RevId: 253916040
|