Age | Commit message | Author |
|
PiperOrigin-RevId: 254254058
|
|
PiperOrigin-RevId: 254253777
|
|
|
|
|
|
The sendfile syscall's backing doSplice contained a race with regard to
blocking. If the first attempt failed with syserror.ErrWouldBlock and then
the blocking file became ready before registering a waiter, we would just
return the ErrWouldBlock (even if we were supposed to block).
PiperOrigin-RevId: 254114432
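
The race described above can be illustrated with a minimal sketch. This is not the actual doSplice code: `source`, `tryRead`, `register`, `makeReady`, and `blockingRead` are hypothetical names, and `errWouldBlock` stands in for syserror.ErrWouldBlock. The point is that the operation is retried after the waiter is registered, so readiness that arrives in the window between the first attempt and registration is not lost.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errWouldBlock is a hypothetical stand-in for syserror.ErrWouldBlock.
var errWouldBlock = errors.New("operation would block")

// source is a hypothetical file-like object that becomes ready at some point.
type source struct {
	mu      sync.Mutex
	ready   bool
	waiters []chan struct{}
}

// tryRead returns errWouldBlock until the source is ready.
func (s *source) tryRead() (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.ready {
		return 0, errWouldBlock
	}
	return 42, nil
}

// register adds a waiter that is signalled only on future readiness changes.
func (s *source) register() chan struct{} {
	ch := make(chan struct{}, 1)
	s.mu.Lock()
	s.waiters = append(s.waiters, ch)
	s.mu.Unlock()
	return ch
}

// makeReady marks the source ready and notifies the waiters registered so far.
func (s *source) makeReady() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ready = true
	for _, ch := range s.waiters {
		select {
		case ch <- struct{}{}:
		default:
		}
	}
}

// blockingRead retries the read after registering a waiter. The hook runs in
// the window between the first attempt and registration to simulate the race.
func blockingRead(s *source, hook func()) (int, error) {
	n, err := s.tryRead()
	if err != errWouldBlock {
		return n, err
	}
	if hook != nil {
		hook()
	}
	ch := s.register()
	for {
		// Re-check after registering: the source may have become ready
		// before the waiter existed, in which case no signal will arrive.
		n, err = s.tryRead()
		if err != errWouldBlock {
			return n, err
		}
		<-ch
	}
}

func main() {
	s := &source{}
	// The source becomes ready after the first attempt but before registration.
	n, err := blockingRead(s, s.makeReady)
	fmt.Println(n, err) // prints "42 <nil>"
}
```

Without the re-check inside the loop, the call in `main` would either return the would-block error or wait forever for a notification that was already delivered.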
|
|
|
|
|
|
|
|
|
|
Otherwise future renames may miss Renamed calls.
PiperOrigin-RevId: 254060946
|
|
|
|
And methods that do more traversals should use the remaining count rather than
resetting.
PiperOrigin-RevId: 254041720
|
|
|
|
This allows tasks to have distinct mount namespaces, instead of all sharing the
kernel's root mount namespace.
Currently, the only way for a task to get a different mount namespace than the
kernel's root is by explicitly setting a different MountNamespace in
CreateProcessArgs, and nothing does this (yet).
In a follow-up CL, we will set CreateProcessArgs.MountNamespace when creating a
new container inside runsc.
Note that "MountNamespace" is a poor term for this thing. It's more like a
distinct VFS tree. When we get around to adding real mount namespaces, this
will need a better name.
PiperOrigin-RevId: 254009310
|
|
|
|
The test fails because it reads 4KB instead of the
expected 64KB. Changed the test to read the pipe buffer
size instead of hardcoding it, and added some logging in
case the failure was not caused by the pipe buffer size.
PiperOrigin-RevId: 253916040
|
|
|
|
|
|
|
|
|
|
|
|
Sockets, pipes, and other non-seekable file descriptors don't
use file.offset, so we don't need to update it.
With this change, we will be able to call file operations
without locking the file.mu mutex. This is already used for
pipes in the splice system call.
PiperOrigin-RevId: 253746644
|
|
|
|
|
|
|
|
When the leader of a process group (session) exits, the process
group ID (session ID) is still held by the other processes in
the process group, so the process group ID (session ID)
cannot be reused.
If the process group ID (session ID) is reused as the process
group ID of a new process, session creation fails, and runsc
later crashes when it accesses the process group.
The fix skips a tid if it is in use by a process group
(session) when allocating a new tid.
The runsc crash can be reproduced easily by following
these steps:
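1. Build the test program and run it inside the container:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        pid_t cpid, spid;
        cpid = fork();
        if (cpid == -1) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (cpid == 0) {
            /* Child: become the leader of a new session. */
            pid_t sid = setsid();
            printf("Start New Session %ld\n", (long) sid);
            printf("Child PID %ld / PPID %ld / PGID %ld / SID %ld\n",
                   (long) getpid(), (long) getppid(),
                   (long) getpgid(getpid()), (long) getsid(getpid()));
            spid = fork();
            if (spid == 0) {
                /* Grandchild: new process group in the same session; loops forever. */
                setpgid(getpid(), getpid());
                printf("Set GrandSon as New Process Group\n");
                printf("GrandSon PID %ld / PPID %ld / PGID %ld / SID %ld\n",
                       (long) getpid(), (long) getppid(),
                       (long) getpgid(getpid()), (long) getsid(getpid()));
                while (1) {
                    usleep(1);
                }
            }
            /* The session leader exits while the grandchild keeps the session alive. */
            sleep(3);
            exit(0);
        } else {
            exit(0);
        }
        return 0;
    }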
2. Build the hello program:
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        printf("Current PID is %ld\n", (long) getpid());
        return 0;
    }
3. Run a script on the host that runs hello inside the container. You can
speed up the test by setting TasksLimit to a lower value.
    for (( i=0; i<65535; i++ ))
    do
        docker exec <container id> /test/hello
    done
4. When a hello process reuses the process group ID of the looping process,
runsc will crash:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x79f0c8]
goroutine 612475 [running]:
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*ProcessGroup).decRefWithParent(0x0, 0x0)
pkg/sentry/kernel/sessions.go:160 +0x78
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).exitNotifyLocked(0xc000663500, 0x0)
pkg/sentry/kernel/task_exit.go:672 +0x2b7
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runExitNotify).execute(0x0, 0xc000663500, 0x0, 0x0)
pkg/sentry/kernel/task_exit.go:542 +0xc4
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run(0xc000663500, 0xc)
pkg/sentry/kernel/task_run.go:91 +0x194
created by gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Start
pkg/sentry/kernel/task_start.go:286 +0xfe
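
The fix described above (skipping a tid that is still in use by a process group or session) can be sketched as follows. This is a simplified illustration, not the sentry's allocator: the `allocator` type, its fields, and the `next` method are hypothetical names chosen for the example.

```go
package main

import "fmt"

// allocator is a hypothetical, simplified thread ID allocator.
type allocator struct {
	last     int          // most recently allocated ID
	limit    int          // largest ID that may be handed out
	tasks    map[int]bool // IDs of live tasks
	pgroups  map[int]bool // IDs still held by process groups
	sessions map[int]bool // IDs still held by sessions
}

// inUse reports whether id is held by a task, a process group, or a session.
func (a *allocator) inUse(id int) bool {
	return a.tasks[id] || a.pgroups[id] || a.sessions[id]
}

// next returns the next free ID after a.last, wrapping around at a.limit.
// It returns false if every ID is in use.
func (a *allocator) next() (int, bool) {
	id := a.last
	for i := 0; i < a.limit; i++ {
		id++
		if id > a.limit {
			id = 1
		}
		// Skip IDs that outlive their leader as a process group or session ID.
		if !a.inUse(id) {
			a.last = id
			a.tasks[id] = true
			return id, true
		}
	}
	return 0, false
}

func main() {
	a := &allocator{
		last:     4,
		limit:    8,
		tasks:    map[int]bool{1: true},
		pgroups:  map[int]bool{5: true}, // leader 5 exited, group still alive
		sessions: map[int]bool{6: true}, // leader 6 exited, session still alive
	}
	id, ok := a.next()
	fmt.Println(id, ok) // prints "7 true"
}
```

In this sketch IDs 5 and 6 have no live task, but they are skipped because a process group and a session still hold them.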
|
|
|
|
The implementation is similar to Linux, where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.
This ensures that the advertised window grows at a reasonable rate to accommodate
the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.
The original paper that was used to implement the Linux receive buffer
auto-tuning is available at https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf
NOTE: Linux does not implement DRS as defined in that paper; it's just a good
reference to understand the solution space.
Updates #230
PiperOrigin-RevId: 253168283
|
|
|
|
All functions which allocate objects containing AtomicRefCounts will soon need
a context.
PiperOrigin-RevId: 253147709
|
|
|
|
The deadlock can occur when both ends of a connected Unix socket which has
FIOASYNC enabled on at least one end are closed at the same time. One end
notifies that it is closing, calling (*waiter.Queue).Notify which takes
waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which
takes FileAsync.mu. The other end tries to unregister for notifications by
calling (*FileAsync).Unregister, which takes FileAsync.mu and calls
(*waiter.Queue).EventUnregister which takes waiter.Queue.mu.
This is fixed by moving the calls to waiter.Waitable.EventRegister and
waiter.Waitable.EventUnregister outside of the protection of any mutex used
in (*FileAsync).Callback.
The new test is related, but does not cover this particular situation.
Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked
FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter
calling (*FileAsync).Callback could not and did not. This is fixed by making
FileAsync.e.Callback immutable before passing it to the waiter for the first
time.
Fixes #346
PiperOrigin-RevId: 253138340
|
|
SO_TYPE was already implemented for everything but netlink sockets.
PiperOrigin-RevId: 253138157
|
|
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
PiperOrigin-RevId: 253122166
|
|
|
|
|
|
|
|
PiperOrigin-RevId: 252918338
|
|
Change-Id: I7457a11de4725e1bf3811420c505d225b1cb6943
|
|
|
|
This CL also cleans up the error returned when setting congestion
control, which incorrectly returned EINVAL instead of ENOENT.
PiperOrigin-RevId: 252889093
|
|
|
|
|
|
PiperOrigin-RevId: 252855280
|
|
|
|
|
|
For sendfile(2), we propagate a TCP error through the system call layer.
This should be eaten if there is a partial result. This change also adds
a test to ensure that there is no panic in this case, for both TCP sockets
and unix domain sockets.
PiperOrigin-RevId: 252746192
|
|
|