Age | Commit message (Collapse) | Author |
|
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs.
When the flag is enabled, the gofer process and the client communicate using
lisafs protocol instead of 9P.
Added a filesystem option in fsimpl/gofer which indicates if lisafs is being
used. That will be used to gate lisafs on the gofer client.
Note that this change does not make the gofer client use lisafs just yet.
Updates #5465
PiperOrigin-RevId: 397917844
|
|
PiperOrigin-RevId: 397114051
|
|
Add global flags -profile-{block,cpu,heap,mutex} and -trace which
enable collection of the specified profile for the entire duration of a
container execution. This provides a way to definitively start profiling
before that application starts, rather than attempting to race with an
out-of-band `runsc debug`.
Note that only the main boot process is profiled.
This exposed a bug in Task.traceExecEvent: a crash when tracing and
-race are enabled. traceExecEvent is called off of the task goroutine,
but uses the Task as a context, which is a violation of the Task
contract. Switching to the AsyncContext fixes the issue.
Fixes #220
|
|
PiperOrigin-RevId: 396938550
|
|
PiperOrigin-RevId: 396754242
|
|
Rootless mode seems to work fine for simple containers with runsc run,
so allow its use.
Since runsc run is more widely used, require a workable --network option
is passed rather than automatically switching like runsc do does.
Fixes #3036
|
|
Updates #220
|
|
Add Event controls and implement "stream" commands.
PiperOrigin-RevId: 390691702
|
|
Add Usage controls and implement "usage/usagefd" commands.
PiperOrigin-RevId: 390507423
|
|
PiperOrigin-RevId: 390467957
|
|
Refactor mitigate to use /sys/devices/system/cpu/smt/control instead
of individual CPU control files.
PiperOrigin-RevId: 389215975
|
|
Add Fs controls and implement "cat" command.
PiperOrigin-RevId: 388812540
|
|
PiperOrigin-RevId: 386988406
|
|
PiperOrigin-RevId: 385894869
|
|
PiperOrigin-RevId: 384586164
|
|
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.
1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. stdios owner owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
|
|
PiperOrigin-RevId: 384344990
|
|
This patch fixes the local timezone support in logs by creating
etc/localtime in the rootfs of sandbox process and gofer process
based on the current /etc/localtime on host.
Before this patch, the timestamps in sandbox and gofer logs will
fallback to UTC timezone after execving "/proc/self/exe" which
may not be very convenient for users to analyse the logs:
I0708 15:37:43.825100 1 chroot.go:69] Setting up sandbox chroot in "/tmp"
I0708 15:37:43.825189 1 chroot.go:31] Mounting "proc" at "/tmp/proc"
......
I0708 15:37:43.850926 1 cmd.go:73] Execve "/proc/self/exe" again, bye!
I0708 07:37:43.856719 1 main.go:218] ***************************
I0708 07:37:43.856751 1 main.go:219] Args: [runsc-sandbox --root=/run/...]
I0708 07:37:43.856785 1 main.go:220] Version release-20210628.0-27-g02fec8dba5a6
I0708 07:37:43.856795 1 main.go:221] GOOS: linux
I0708 07:37:43.856803 1 main.go:222] GOARCH: amd64
......
Fixes #1984
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
|
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
|
PiperOrigin-RevId: 383750666
|
|
PiperOrigin-RevId: 383705129
|
|
PiperOrigin-RevId: 382845950
|
|
This change makes the checklocks analyzer considerable more powerful, adding:
* The ability to traverse complex structures, e.g. to have multiple nested
fields as part of the annotation.
* The ability to resolve simple anonymous functions and closures, and perform
lock analysis across these invocations. This does not apply to closures that
are passed elsewhere, since it is not possible to know the context in which
they might be invoked.
* The ability to annotate return values in addition to receivers and other
parameters, with the same complex structures noted above.
* Ignoring locking semantics for "fresh" objects, i.e. objects that are
allocated in the local frame (typically a new-style function).
* Sanity checking of locking state across block transitions and returns, to
ensure that no unexpected locks are held.
Note that initially, most of these findings are excluded by a comprehensive
nogo.yaml. The findings that are included are fundamental lock violations.
The changes here should be relatively low risk, minor refactorings to either
include necessary annotations to simplify the code structure (in general
removing closures in favor of methods) so that the analyzer can be easily
track the lock state.
This change additional includes two changes to nogo itself:
* Sanity checking of all types to ensure that the binary and ast-derived
types have a consistent objectpath, to prevent the bug above from occurring
silently (and causing much confusion). This also requires a trick in
order to ensure that serialized facts are consumable downstream. This can
be removed with https://go-review.googlesource.com/c/tools/+/331789 merged.
* A minor refactoring to isolation the objdump settings in its own package.
This was originally used to implement the sanity check above, but this
information is now being passed another way. The minor refactor is preserved
however, since it cleans up the code slightly and is minimal risk.
PiperOrigin-RevId: 382613300
|
|
It defaults to true and setting it to false can cause filesytem corruption.
PiperOrigin-RevId: 378518663
|
|
PiperOrigin-RevId: 372993341
|
|
Add a coverage-report flag that will cause the sandbox to generate a coverage
report (with suffix .cov) in the debug log directory upon exiting. For the
report to be generated, runsc must have been built with the following Bazel
flags: `--collect_code_coverage --instrumentation_filter=...`.
With coverage reports, we should be able to aggregate results across all tests
to surface code coverage statistics for the project as a whole.
The report is simply a text file with each line representing a covered block
as `file:start_line.start_col,end_line.end_col`. Note that this is similar to
the format of coverage reports generated with `go test -coverprofile`,
although we omit the count and number of statements, which are not useful for
us.
Some simple ways of getting coverage reports:
bazel test <some_test> --collect_code_coverage \
--instrumentation_filter=//pkg/...
bazel build //runsc --collect_code_coverage \
--instrumentation_filter=//pkg/...
runsc -coverage-report=dir/ <other_flags> do ...
PiperOrigin-RevId: 368952911
|
|
PiperOrigin-RevId: 368919504
|
|
This is needed to enable Xattrs features required by verity.
PiperOrigin-RevId: 366843640
|
|
Implement a new runsc command to set up a sandbox with verityfs and
run the measure tool. This is loosely forked from the do command, and
currently requires the caller to provide the measure tool binary.
PiperOrigin-RevId: 366553769
|
|
As MDS side channel attack does not affect ARM64, we disable
mitigate on ARM64 in case misusage.
For more detail, please refer to:
https://access.redhat.com/security/vulnerabilities/mds
Signed-off-by: Howard Zhang <howard.zhang@arm.com>
|
|
VFS1 skips over mounts that overrides files in /dev because the list of
files is hardcoded. This is not needed for VFS2 and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran short
of FDs to connect to the gofer.
PiperOrigin-RevId: 365858436
|
|
Also adds support for clearing the setuid bit when appropriate (writing,
truncating, changing size, changing UID, or changing GID).
VFS2 only.
PiperOrigin-RevId: 364661835
|
|
These host calls are needed for Verity fs to generate/verify hashes.
PiperOrigin-RevId: 364598180
|
|
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:
```
"mounts": [
...
{
"destination": "/dev",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
...
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=67108864"
]
},
...
```
(This is mostly consistent with how Linux is usually configured, except that
/dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer
implements OCI-runtime-spec-undocumented behavior to create
/dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently
switches /dev to devtmpfs. In VFS1, this is necessary to get device files like
/dev/null at all, since VFS1 doesn't support real device special files, only
what is hardcoded in devfs. VFS2 does support device special files, but using
devtmpfs is the easiest way to get pre-created files in /dev.)
runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1,
this appears to be to avoid introducing a submount overlay for /dev, and is
mostly fine since the typical mode for the /dev/shm mount is ~consistent with
the mode of the /dev/shm directory provided by devfs (modulo the sticky bit).
In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs'
/dev/shm mode is correct for the mount point but not the mount. So turn off
this behavior for VFS2.
After this change:
```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..
$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```
Fixes #5687
PiperOrigin-RevId: 363699385
|
|
PiperOrigin-RevId: 362360425
|
|
PiperOrigin-RevId: 361689477
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
Add reverse operation to mitigate that just enables
all CPUs.
PiperOrigin-RevId: 360511215
|
|
Previously, loader.signalProcess was inconsitently using both root and
container's PID namespace to find the process. It used root namespace
for the exec'd process and container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will after
https://github.com/google/gvisor/pull/5519).
PiperOrigin-RevId: 358836297
|
|
PiperOrigin-RevId: 356772367
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
|
|
Whether the variable was found is already returned by syscall.Getenv.
os.Getenv drops this value while os.Lookupenv passes it along.
PiperOrigin-RevId: 351674032
|
|
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
Closes #5226
PiperOrigin-RevId: 351259576
|
|
This includes minor fix-ups:
* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.
PiperOrigin-RevId: 350205718
|
|
Closes #5052
PiperOrigin-RevId: 349579814
|
|
This allows for a model of profiling when you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect length blocking and mutex
profiles.
PiperOrigin-RevId: 349483418
|
|
This allows to find all containers inside a sandbox more efficiently.
This operation is required every time a container starts and stops,
and previously required loading *all* container state files to check
whether the container belonged to the sandbox.
Apert from being inneficient, it has caused problems when state files
are stale or corrupt, causing inavalability to create any container.
Also adjust commands `list` and `debug` to skip over files that fail
to load.
Resolves #5052
PiperOrigin-RevId: 348050637
|
|
This command takes instruction pointers from stdin and converts them into their
corresponding file names and line/column numbers in the runsc source code. The
inputs are not interpreted as actual addresses, but as synthetic values that are
exposed through /sys/kernel/debug/kcov. One can extract coverage information
from kcov and translate those values into locations in the source code by
running symbolize on the same runsc binary.
This will allow us to generate syzkaller coverage reports.
PiperOrigin-RevId: 347089624
|
|
c.Usage() only returns a string; f.Usage() will print the usage message.
PiperOrigin-RevId: 345500123
|