Age | Commit message (Collapse) | Author |
|
Also adds support for clearing the setuid bit when appropriate (writing,
truncating, changing size, changing UID, or changing GID).
VFS2 only.
PiperOrigin-RevId: 364661835
|
|
These host calls are needed for Verity fs to generate/verify hashes.
PiperOrigin-RevId: 364598180
|
|
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:
```
"mounts": [
...
{
"destination": "/dev",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
...
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=67108864"
]
},
...
```
(This is mostly consistent with how Linux is usually configured, except that
/dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer
implements OCI-runtime-spec-undocumented behavior to create
/dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently
switches /dev to devtmpfs. In VFS1, this is necessary to get device files like
/dev/null at all, since VFS1 doesn't support real device special files, only
what is hardcoded in devfs. VFS2 does support device special files, but using
devtmpfs is the easiest way to get pre-created files in /dev.)
runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1,
this appears to be to avoid introducing a submount overlay for /dev, and is
mostly fine since the typical mode for the /dev/shm mount is ~consistent with
the mode of the /dev/shm directory provided by devfs (modulo the sticky bit).
In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs'
/dev/shm mode is correct for the mount point but not the mount. So turn off
this behavior for VFS2.
After this change:
```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..
$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```
Fixes #5687
PiperOrigin-RevId: 363699385
|
|
PiperOrigin-RevId: 362360425
|
|
PiperOrigin-RevId: 361689477
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
Add reverse operation to mitigate that just enables
all CPUs.
PiperOrigin-RevId: 360511215
|
|
Previously, loader.signalProcess was inconsitently using both root and
container's PID namespace to find the process. It used root namespace
for the exec'd process and container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will after
https://github.com/google/gvisor/pull/5519).
PiperOrigin-RevId: 358836297
|
|
PiperOrigin-RevId: 356772367
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
|
|
Whether the variable was found is already returned by syscall.Getenv.
os.Getenv drops this value while os.Lookupenv passes it along.
PiperOrigin-RevId: 351674032
|
|
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
Closes #5226
PiperOrigin-RevId: 351259576
|
|
This includes minor fix-ups:
* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.
PiperOrigin-RevId: 350205718
|
|
Closes #5052
PiperOrigin-RevId: 349579814
|
|
This allows for a model of profiling when you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect length blocking and mutex
profiles.
PiperOrigin-RevId: 349483418
|
|
This allows to find all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.
Apert from being inneficient, it has caused problems when state files
are stale or corrupt, causing inavalability to create any container.
Also adjust commands `list` and `debug` to skip over files that fail
to load.
Resolves #5052
PiperOrigin-RevId: 348050637
|
|
This command takes instruction pointers from stdin and converts them into their
corresponding file names and line/column numbers in the runsc source code. The
inputs are not interpreted as actual addresses, but as synthetic values that are
exposed through /sys/kernel/debug/kcov. One can extract coverage information
from kcov and translate those values into locations in the source code by
running symbolize on the same runsc binary.
This will allow us to generate syzkaller coverage reports.
PiperOrigin-RevId: 347089624
|
|
c.Usage() only returns a string; f.Usage() will print the usage message.
PiperOrigin-RevId: 345500123
|
|
Fixes #2714
PiperOrigin-RevId: 342950412
|
|
This was causing gvisor-containerd-shim to crash because the command
suceeded, but there was no stat present.
PiperOrigin-RevId: 340964921
|
|
When OOM score adjustment needs to be set, all the containers need to be
loaded to find all containers that belong to the sandbox. However, each
load signals the container to ensure it is still alive. OOM score
adjustment is set during creation and deletion of every container, generating
a flood of signals to all containers. The fix removes the signal check
when it's not needed.
There is also a race fetching OOM score adjustment value from the parent when
the sandbox exits at the same time (the time it took to signal containers above
made this window quite large). The fix is to store the original value
in the sandbox state file and use it when the value needs to be restored.
Also add more logging and made the existing ones more consistent to help with
debugging.
PiperOrigin-RevId: 340940799
|
|
PiperOrigin-RevId: 340536306
|
|
PiperOrigin-RevId: 339385609
|
|
|
|
Subcontainers are only configured when the container starts, however because
start doesn't load the spec, flag annotations that may override flags were
not getting applied to the configuration.
Updates #3494
PiperOrigin-RevId: 338610953
|
|
In case setting up network fails, log a warning and fallback to internal
network.
Closes #4498
PiperOrigin-RevId: 337442632
|
|
Gofer panics are suppressed by p9 server and an error
is returned to the caller, making it effectively the
same as returning EROFS.
PiperOrigin-RevId: 332282959
|
|
Updates #2972
PiperOrigin-RevId: 329584905
|
|
This allows runsc flags to be set per sandbox instance. For
example, K8s pod annotations can be used to enable
--debug for a single pod, making troubleshoot much easier.
Similarly, features like --vfs2 can be enabled for
experimentation without affecting other pods in the node.
Closes #3494
PiperOrigin-RevId: 329542815
|
|
Updates #3494
PiperOrigin-RevId: 327548511
|
|
Also removes `--profile-goroutine` because it's equivalent
to `debug --stacks`.
PiperOrigin-RevId: 325061502
|
|
- Combine process creation code that is shared between
root and subcontainer processes
- Move root container information into a struct for
clarity
Updates #2714
PiperOrigin-RevId: 321204798
|
|
PiperOrigin-RevId: 321053634
|
|
Adds a netns flag to runsc spec that allows users to specify a network
namespace path when creating a sample config.json file. Also, adds the ability
to specify the command arguments used when running the container.
This will make it easier for new users to create sample OCI bundles without
having to edit the config.json by hand.
PiperOrigin-RevId: 320486267
|
|
Previously, it was not possible to encode/decode an object graph which
contained a pointer to a field within another type. This was because the
encoder was previously unable to disambiguate a pointer to an object and a
pointer within the object.
This CL remedies this by constructing an address map tracking the full memory
range object occupy. The encoded Refvalue message has been extended to allow
references to children objects within another object. Because the encoding
process may learn about object structure over time, we cannot encode any
objects under the entire graph has been generated.
This CL also updates the state package to use standard interfaces intead of
reflection-based dispatch in order to improve performance overall. This
includes a custom wire protocol to significantly reduce the number of
allocations and take advantage of structure packing.
As part of these changes, there are a small number of minor changes in other
places of the code base:
* The lists used during encoding are changed to use intrusive lists with the
objectEncodeState directly, which required that the ilist Len() method is
updated to work properly with the ElementMapper mechanism.
* A bug is fixed in the list code wherein Remove() called on an element that is
already removed can corrupt the list (removing the element if there's only a
single element). Now the behavior is correct.
* Standard error wrapping is introduced.
* Compressio was updated to implement the new wire.Reader and wire.Writer
inteface methods directly. The lack of a ReadByte and WriteByte caused issues
not due to interface dispatch, but because underlying slices for a Read or
Write call through an interface would always escape to the heap!
* Statify has been updated to support the new APIs.
See README.md for a description of how the new mechanism works.
PiperOrigin-RevId: 318010298
|
|
PiperOrigin-RevId: 315583963
|
|
SetTraceback("all") does not include all goroutines in panics (you didn't think
it was that simple, did you?). It includes all _user_ goroutines; those started
by the runtime (such as GC workers) are excluded.
Switch to "system" to additionally include runtime goroutines, which are useful
to track down bugs in the runtime itself.
PiperOrigin-RevId: 314204473
|
|
No writes are expected to the underlying filesystem when
using --overlay.
PiperOrigin-RevId: 314171457
|
|
|
|
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
in one place.
* Drop "user-space kernel" and use "application kernel". The term "user-space
kernel" is confusing when some platform implementation do not run in
user-space (instead running in guest ring zero).
* Clear up the relationship between the Platform page in the user guide and the
Platform page in the architecture guide, and ensure they are cross-linked.
* Restore the call-to-action quick start link in the main page, and drop the
GitHub link (which also appears in the top-right).
* Improve image formatting by centering all doc and blog images, and move the
image captions to the alt text.
PiperOrigin-RevId: 311845158
|
|
We can register any number of tables with any number of architectures, and
need not limit the definitions to the architecture in question. This allows
runsc to generate documentation for all architectures simultaneously.
Similarly, this simplifies the VFSv2 patching process.
PiperOrigin-RevId: 310224827
|
|
Previously we unconditionally failed to cleanup the networking files
(hostname, resolve.conf, hosts), and failed to cleanup the netns, etc on
partial setup failure.
We can drop the iptables commands from cleanup, as the routes
automatically go away when the device is deleted. Those commands were
failing previously.
Forward signals to the container, allowing it to exit normally when a
signal is received, and then for runsc to run the cleanup. This doesn't
cover cleanup when runsc is signalled before the container start, it
covers the most common case.
Fixes #2539
Fixes #2540
|
|
This change adds a layer of abstraction around the internal Docker APIs,
and eliminates all direct dependencies on Dockerfiles in the infrastructure.
A subsequent change will automated the generation of local images (with
efficient caching). Note that this change drops the use of bazel container
rules, as that experiment does not seem to be viable.
PiperOrigin-RevId: 308095430
|
|
PiperOrigin-RevId: 307941984
|
|
This is to make easier to find corresponding logs in
case test fails.
PiperOrigin-RevId: 307104283
|
|
PiperOrigin-RevId: 305592245
|
|
PiperOrigin-RevId: 301949722
|
|
When the sandbox runs in attached more, e.g. runsc do, runsc run, the
sandbox lifetime is controlled by the parent process. This wasn't working
in all cases because PR_GET_PDEATHSIG doesn't propagate through execve
when the process changes uid/gid. So it was getting dropped when the
sandbox execve's to change to user nobody.
PiperOrigin-RevId: 300601247
|
|
|