Age | Commit message | Author |
|
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.
1. stdios in the host are changed to be owned by the same uid/gid
as the process running the sandbox. This ensures that the
sandbox has full control over them.
2. stdios owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
|
|
PiperOrigin-RevId: 384344990
|
|
PiperOrigin-RevId: 378726430
|
|
When loading cgroups for another process, `/proc/self` was used in
a few places, causing the end state to be a mix of the target process
and self. This is now fixed to always use the proper `/proc/[pid]`
path.
Added net_prio and net_cls to the list of optional controllers. This
allows runsc to execute when these cgroups are disabled, as long
as there are no net_prio or net_cls limits that need to be applied.
Deflake TestMultiContainerEvent.
Closes #5875
Closes #5887
PiperOrigin-RevId: 372242687
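The two cgroup fixes above can be sketched as follows. The function names and the `known`/`available`/`withLimits` inputs are hypothetical and not runsc's cgroup API. The sketch builds the membership path from the target pid instead of `/proc/self`, and tolerates a missing `net_cls`/`net_prio` controller only when no limits need it.

```go
package main

import (
	"fmt"
	"strconv"
)

// procCgroupPath returns the cgroup membership file of the given process.
// Always building the path from the target pid, rather than "/proc/self",
// keeps the caller's own cgroups from leaking into the result.
func procCgroupPath(pid int) string {
	return "/proc/" + strconv.Itoa(pid) + "/cgroup"
}

// optionalControllers may be missing on the host as long as the spec does
// not set limits for them.
var optionalControllers = map[string]bool{
	"net_cls":  true,
	"net_prio": true,
}

// checkControllers returns an error if any controller in known is missing
// from available, unless it is optional and has no limits in withLimits.
func checkControllers(known []string, available, withLimits map[string]bool) error {
	for _, c := range known {
		if available[c] {
			continue
		}
		if optionalControllers[c] && !withLimits[c] {
			continue // disabled on the host, but nothing needs to be applied
		}
		return fmt.Errorf("cgroup controller %q is not available", c)
	}
	return nil
}

func main() {
	fmt.Println(procCgroupPath(1234)) // /proc/1234/cgroup
	known := []string{"cpu", "memory", "net_cls", "net_prio"}
	available := map[string]bool{"cpu": true, "memory": true}
	fmt.Println(checkControllers(known, available, nil)) // <nil>
}
```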
|
|
According to the OCI spec, Mount.Type is an optional field and it
defaults to "bind" when either "bind" or "rbind" is included in
Mount.Options.
Also fix the shim to remove bind/rbind from options when mount is
converted from bind to tmpfs inside the Sentry.
Fixes #2330
Fixes #3274
PiperOrigin-RevId: 371996891
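A sketch of the defaulting rule and the option cleanup described above. `Mount` redeclares only the fields needed from the OCI runtime-spec mount so the example is self-contained. `effectiveType` and `toTmpfs` are hypothetical names, not the shim's or runsc's functions.

```go
package main

import "fmt"

// Mount mirrors the relevant fields of an OCI runtime-spec mount;
// redeclared here to keep the sketch self-contained.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// isBind reports whether the options request a bind mount.
func isBind(opts []string) bool {
	for _, o := range opts {
		if o == "bind" || o == "rbind" {
			return true
		}
	}
	return false
}

// effectiveType returns the mount type, defaulting to "bind" when Type is
// empty and the options include "bind" or "rbind".
func effectiveType(m Mount) string {
	if m.Type == "" && isBind(m.Options) {
		return "bind"
	}
	return m.Type
}

// toTmpfs converts a bind mount to tmpfs and drops bind/rbind from the
// options, since they do not apply to tmpfs. The input is not modified.
func toTmpfs(m Mount) Mount {
	out := m
	out.Type = "tmpfs"
	out.Options = nil // fresh slice so m.Options is never aliased
	for _, o := range m.Options {
		if o != "bind" && o != "rbind" {
			out.Options = append(out.Options, o)
		}
	}
	return out
}

func main() {
	m := Mount{Destination: "/data", Source: "/host/data", Options: []string{"rbind", "ro"}}
	fmt.Println(effectiveType(m))   // bind
	fmt.Println(toTmpfs(m).Options) // [ro]
}
```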
|
|
PiperOrigin-RevId: 369505182
|
|
VFS1 skips over mounts that override files in /dev because the list of
files is hardcoded. This is not needed for VFS2, and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran out
of FDs to connect to the gofer.
PiperOrigin-RevId: 365858436
|
|
PiperOrigin-RevId: 362406813
|
|
PiperOrigin-RevId: 361962416
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in the unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
`runsc ps` currently returns the PID in each task's immediate PID namespace,
which is confusing when there are multiple PID namespaces. We should
return only PIDs in the root namespace.
Before:
```
1000 1 0 0 ? 02:24 250ms chrome
1000 1 0 0 ? 02:24 40ms dumb-init
1000 1 0 0 ? 02:24 240ms chrome
1000 2 1 0 ? 02:24 2.78s node
```
After:
```
UID PID PPID C TTY STIME TIME CMD
1000 1 0 0 ? 12:35 0s dumb-init
1000 2 1 7 ? 12:35 240ms node
1000 13 2 21 ? 12:35 2.33s chrome
1000 27 13 3 ? 12:35 260ms chrome
```
Signed-off-by: Daniel Dao <dqminh@cloudflare.com>
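A sketch of why the "Before" output shows repeated PID 1 and how reporting root-namespace PIDs fixes it. The `task` type and its `nsPIDs` field are hypothetical stand-ins for sentry tasks. Index 0 holds the PID in the root namespace and the last index the PID in the task's innermost namespace, which is what the old output effectively reported.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical stand-in for a sentry task. nsPIDs holds the
// task's PID in each namespace it is visible in, from the root namespace
// (index 0) down to its innermost namespace (last index).
type task struct {
	nsPIDs []int
	parent *task
	cmd    string
}

// rootPID returns the task's PID in the root namespace.
func rootPID(t *task) int { return t.nsPIDs[0] }

// innermostPID is what the "Before" output effectively reported.
func innermostPID(t *task) int { return t.nsPIDs[len(t.nsPIDs)-1] }

// psRows formats "PID PPID CMD" rows using root-namespace PIDs only,
// sorted by PID.
func psRows(tasks []*task) []string {
	sorted := append([]*task(nil), tasks...)
	sort.Slice(sorted, func(i, j int) bool { return rootPID(sorted[i]) < rootPID(sorted[j]) })
	var rows []string
	for _, t := range sorted {
		ppid := 0
		if t.parent != nil {
			ppid = rootPID(t.parent)
		}
		rows = append(rows, fmt.Sprintf("%d %d %s", rootPID(t), ppid, t.cmd))
	}
	return rows
}

func main() {
	initTask := &task{nsPIDs: []int{1}, cmd: "dumb-init"}
	node := &task{nsPIDs: []int{2}, parent: initTask, cmd: "node"}
	// chrome runs in a nested PID namespace where it is PID 1.
	chrome := &task{nsPIDs: []int{13, 1}, parent: node, cmd: "chrome"}
	for _, r := range psRows([]*task{chrome, node, initTask}) {
		fmt.Println(r)
	}
}
```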
|
|
Updates #3481
Closes #5430
PiperOrigin-RevId: 358923208
|
|
Previously, loader.signalProcess was inconsistently using both the root and
the container's PID namespace to find the process. It used the root namespace
for the exec'd process and the container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will be, after
https://github.com/google/gvisor/pull/5519).
PiperOrigin-RevId: 358836297
|
|
A panic was seen on code paths such as control.ExecAsync, where
ctx does not have a Task.
Reported-by: syzbot+55ce727161cf94a7b7d6@syzkaller.appspotmail.com
PiperOrigin-RevId: 355960596
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
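The proportional split described above amounts to container_i = pod_usage × sentry_i / Σ sentry. Below is a minimal sketch, assuming per-container sentry usage is available as a map keyed by container ID. `apportionCPU` is a hypothetical name.

```go
package main

import "fmt"

// apportionCPU splits podUsage (CPU time for the whole pod, e.g. in
// nanoseconds) across containers in proportion to each container's
// sentry-internal usage. If no usage was recorded, every container gets 0.
func apportionCPU(podUsage uint64, sentryUsage map[string]uint64) map[string]uint64 {
	var total uint64
	for _, u := range sentryUsage {
		total += u
	}
	out := make(map[string]uint64, len(sentryUsage))
	for id, u := range sentryUsage {
		if total == 0 {
			out[id] = 0
			continue
		}
		// float64 avoids overflowing podUsage*u in 64-bit integers; the
		// result is truncated toward zero.
		out[id] = uint64(float64(podUsage) * float64(u) / float64(total))
	}
	return out
}

func main() {
	usage := apportionCPU(1000, map[string]uint64{"a": 300, "b": 100})
	fmt.Println(usage["a"], usage["b"]) // 750 250
}
```

Summing the per-container results gives back (up to rounding) the pod total, which is what `kubectl top pods` relies on.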
|
|
Updates #1663
PiperOrigin-RevId: 355077816
|
|
Updates #5226
PiperOrigin-RevId: 353262133
|
|
These primarily address simplifications and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
Closes #5226
PiperOrigin-RevId: 351259576
|
|
This allows finding all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.
Apart from being inefficient, this caused problems when state files
were stale or corrupt, making it impossible to create any container.
Also adjust commands `list` and `debug` to skip over files that fail
to load.
Resolves #5052
PiperOrigin-RevId: 348050637
|
|
PiperOrigin-RevId: 345399936
|
|
Container is not thread-safe; locking must be done in the caller.
The test was calling Container.Wait() from multiple threads with
no synchronization.
Also removed Container.WaitPID from the test because the process might
have already exited when wait is called.
PiperOrigin-RevId: 343176280
|
|
Fixes #2714
PiperOrigin-RevId: 342950412
|
|
Due to a typo, doDestroyNotStartedTest was being tested
twice instead of doDestroyStartingTest.
PiperOrigin-RevId: 340969797
|
|
This was causing gvisor-containerd-shim to crash because the command
succeeded, but there was no stat present.
PiperOrigin-RevId: 340964921
|
|
When OOM score adjustment needs to be set, all the containers need to be
loaded to find all containers that belong to the sandbox. However, each
load signals the container to ensure it is still alive. OOM score
adjustment is set during creation and deletion of every container, generating
a flood of signals to all containers. The fix removes the signal check
when it's not needed.
There was also a race when fetching the OOM score adjustment value from the parent
while the sandbox exits (the time it took to signal containers above
made this window quite large). The fix is to store the original value
in the sandbox state file and use it when the value needs to be restored.
Also added more logging and made the existing logs more consistent to help with
debugging.
PiperOrigin-RevId: 340940799
|
|
PiperOrigin-RevId: 338372736
|
|
|
|
There were a few problems with cgroups:
- cleanup loop was breaking too early
- parsing of /proc/[pid]/cgroup was skipping "name=systemd"
because "name=" was not being removed from the name.
- When no limits were specified, fillFromAncestor was not being
called, causing a failure to set cpuset.mems
Updates #4536
PiperOrigin-RevId: 337947356
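A sketch of the parsing fix above. Each line of `/proc/[pid]/cgroup` has the form `hierarchy-ID:controller-list:cgroup-path`, and named hierarchies appear as `name=systemd`. `parseProcCgroup` is a hypothetical function. Keying the map by the bare name (`systemd`) is an assumption about how the result is used.

```go
package main

import (
	"fmt"
	"strings"
)

// parseProcCgroup parses the contents of /proc/[pid]/cgroup into a map from
// controller name to cgroup path. Named hierarchies such as "name=systemd"
// are stored under their bare name ("systemd").
func parseProcCgroup(contents string) (map[string]string, error) {
	paths := make(map[string]string)
	for _, line := range strings.Split(strings.TrimSpace(contents), "\n") {
		if line == "" {
			continue
		}
		// hierarchy-ID:controller-list:cgroup-path (the path may contain ':').
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("invalid cgroup line %q", line)
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			ctrl = strings.TrimPrefix(ctrl, "name=")
			if ctrl == "" {
				continue // cgroup v2 unified entry "0::/..." has no controllers
			}
			paths[ctrl] = parts[2]
		}
	}
	return paths, nil
}

func main() {
	paths, err := parseProcCgroup("4:cpu,cpuacct:/pod\n1:name=systemd:/user.slice\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(paths["cpu"], paths["systemd"]) // /pod /user.slice
}
```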
|
|
|
|
When all container tasks finish, they release the mount which in turn
will close the 9P session to the gofer. The gofer exits when the connection
closes, triggering the gofer monitor. The gofer monitor will _think_ that
the gofer died prematurely and destroy the container. Then when the caller
attempts to wait for the container, e.g. to get the exit code, wait fails
saying the container doesn't exist.
The gofer monitor now just SIGKILLs the container and lets the normal teardown
process happen, which will eventually destroy the container at the right
time. Also fixed an issue with exec racing with the container's init process
exiting.
Closes #1487
PiperOrigin-RevId: 335537350
|
|
Updates #1487
PiperOrigin-RevId: 335516732
|
|
|
|
Based on the architecture, apply a different syscall number for
sched_rr_get_interval.
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
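A sketch of selecting the syscall number per architecture. On Linux, `sched_rr_get_interval` is 148 in the x86-64 table and 127 in the asm-generic table used by arm64. Real code would typically rely on build-tagged constants such as `syscall.SYS_SCHED_RR_GET_INTERVAL`. The explicit map and `sysnoFor` are hypothetical and only keep the sketch portable.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedRRGetIntervalNR maps GOARCH to the Linux syscall number for
// sched_rr_get_interval: x86-64 has its own table, while arm64 uses the
// asm-generic one.
var schedRRGetIntervalNR = map[string]uintptr{
	"amd64": 148,
	"arm64": 127,
}

// sysnoFor returns the syscall number for arch, or an error if unknown.
func sysnoFor(arch string) (uintptr, error) {
	nr, ok := schedRRGetIntervalNR[arch]
	if !ok {
		return 0, fmt.Errorf("unsupported arch %q", arch)
	}
	return nr, nil
}

func main() {
	nr, err := sysnoFor(runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("sched_rr_get_interval is syscall %d on %s\n", nr, runtime.GOARCH)
}
```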
|
|
Gofer panics are suppressed by p9 server and an error
is returned to the caller, making it effectively the
same as returning EROFS.
PiperOrigin-RevId: 332282959
|
|
All tests under runsc are passing with overlay enabled.
Updates #1487, #1199
PiperOrigin-RevId: 332181267
|
|
ptrace was always selected as a dimension before, but not
anymore. Some tests were specifying "overlay" expecting that
to be in addition to the default.
PiperOrigin-RevId: 332004111
|
|
Useful when you want to run multiple containers with the same config.
And runc does that too.
|
|
Updates #1487
PiperOrigin-RevId: 330580699
|
|
VFS1 and VFS2 host FDs have different dup'ing behavior,
making it error-prone to write code for both. Change the contract
so that FDs are released as they are used, so the caller
can simply defer a block that closes all remaining files.
This also addresses handling of partial failures.
With this fix, more VFS2 tests can be enabled.
Updates #1487
PiperOrigin-RevId: 330112266
|
|
Updates #2972
PiperOrigin-RevId: 329584905
|
|
Updates #3494
PiperOrigin-RevId: 327548511
|
|
The bazel server was being started as the wrong user, leading to issues
where the container would suddenly exit during a build.
We can also simplify the waiting logic by starting the container in two
separate steps: those that must complete first, then the asynchronous bit.
PiperOrigin-RevId: 323391161
|
|
PiperOrigin-RevId: 321449877
|
|
Much like the boot process, apply pdeathsig to the gofer for cases where
the sandbox lifecycle is attached to the parent (runsc run/do).
This isn't strictly necessary, as the gofer normally exits once the
sentry disappears, but this makes that extra reliable.
|
|
- Combine process creation code that is shared between
root and subcontainer processes
- Move root container information into a struct for
clarity
Updates #2714
PiperOrigin-RevId: 321204798
|
|
PiperOrigin-RevId: 321053634
|
|
Container restart test is disabled for VFS2 for now.
Updates #1487
PiperOrigin-RevId: 320296401
|
|
Fixes #701
PiperOrigin-RevId: 316025635
|
|
Executable lookup differed slightly between run and exec, and
between VFS1 and VFS2. Combine them all into the same
logic.
PiperOrigin-RevId: 315426443
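A sketch of a single executable lookup path shared by every caller. `resolveExecutable` is hypothetical. The `exists` callback abstracts the filesystem check, so the same logic could run against any filesystem view. Names containing a slash are resolved against the working directory. Other names are searched in each PATH entry, with relative (including empty) entries also resolved against the working directory.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveExecutable finds name using one set of rules for every caller.
func resolveExecutable(name, wd string, path []string, exists func(string) bool) (string, error) {
	if strings.Contains(name, "/") {
		p := name
		if !filepath.IsAbs(p) {
			p = filepath.Join(wd, p)
		}
		if exists(p) {
			return p, nil
		}
		return "", fmt.Errorf("executable %q not found", p)
	}
	for _, dir := range path {
		if !filepath.IsAbs(dir) {
			// Relative (including empty) PATH entries are resolved against wd.
			dir = filepath.Join(wd, dir)
		}
		if p := filepath.Join(dir, name); exists(p) {
			return p, nil
		}
	}
	return "", fmt.Errorf("executable %q not found in PATH", name)
}

func main() {
	fs := map[string]bool{"/usr/bin/env": true}
	p, err := resolveExecutable("env", "/", []string{"/bin", "/usr/bin"}, func(p string) bool { return fs[p] })
	fmt.Println(p, err) // /usr/bin/env <nil>
}
```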
|