Age | Commit message (Collapse) | Author |
|
It was confusing to find functions relating to root and non-root
containers. Replace "non-root" and "subcontainer" and make naming
consistent in Sandbox and controller.
PiperOrigin-RevId: 384512518
|
|
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.
1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. stdios owner owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
|
|
PiperOrigin-RevId: 384344990
|
|
PiperOrigin-RevId: 383705129
|
|
Update/remove most syserror errors to linuxerr equivalents. For list
of removed errors, see //pkg/syserror/syserror.go.
PiperOrigin-RevId: 382574582
|
|
PiperOrigin-RevId: 382194711
|
|
PiperOrigin-RevId: 381964660
|
|
PiperOrigin-RevId: 381561785
|
|
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors.
This will facilitate removal of syserror definitions in a followup, and
finding needed conversions from unix.Errno to linuxerr.
PiperOrigin-RevId: 380909667
|
|
The typical sequence of calls to start a container looks like this
ct, err := container.New(conf, containerArgs)
defer ct.Destroy()
ct.Start(conf)
ws, err := ct.Wait()
For the root container, ct.Destroy() kills the sandbox process. This
doesn't look like a right wait to stop it. For example, all ongoing rpc
calls are aborted in this case. If everything is going alright, we can
just wait and it will exit itself.
Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
...and pass it explicitly.
This reverts commit b63e61828d0652ad1769db342c17a3529d2d24ed.
PiperOrigin-RevId: 380039167
|
|
PiperOrigin-RevId: 378726430
|
|
Fixes #214
PiperOrigin-RevId: 378680466
|
|
PiperOrigin-RevId: 378677167
|
|
It defaults to true and setting it to false can cause filesytem corruption.
PiperOrigin-RevId: 378518663
|
|
HostFileMapper.RegenerateMappings calls mmap with
MAP_SHARED|MAP_FIXED and these were not allowed.
Closes #6116
PiperOrigin-RevId: 377428463
|
|
Avoids a race condition at kernel initialization.
Updates #6057.
PiperOrigin-RevId: 377357723
|
|
...except in tests.
Note this replaces some uses of a cryptographic RNG with a plain RNG.
PiperOrigin-RevId: 376070666
|
|
PiperOrigin-RevId: 375843579
|
|
Remove useless conversions. Avoid unhandled errors.
PiperOrigin-RevId: 375834275
|
|
O_PATH is now implemented in vfs2.
Fixes #2782.
PiperOrigin-RevId: 373861410
|
|
PiperOrigin-RevId: 372608247
|
|
This patch is to solve problem that vCPU timer mess up when
adding vCPU dynamically on ARM64, for detailed information
please refer to:
https://github.com/google/gvisor/issues/5739
There is no influence on x86 and here are main changes for
ARM64:
1. create maxVCPUs number of vCPU in machine initialization
2. we want to sync gvisor vCPU number with host CPU number,
so use smaller number between runtime.NumCPU and
KVM_CAP_MAX_VCPUS to be maxVCPUS
3. put unused vCPUs into architecture-specific map initialvCPUs
4. When machine need to bind a new vCPU with tid, rather
than creating new one, it would pick a vCPU from map initalvCPUs
5. change the setSystemTime function. When vCPU number increasing,
the time cost for function setTSC(use syscall to set cntvoff) is
liner growth from around 300 ns to 100000 ns, and this leads to
the function setSystemTimeLegacy can not get correct offset
value.
6. initializing StdioFDs and goferFD before a platform to avoid
StdioFDs confects with vCPU fds
Signed-off-by: howard zhang <howard.zhang@arm.com>
|
|
According to the OCI spec Mount.Type is an optional field and it
defaults to "bind" when any of "bind" or "rbind" is included in
Mount.Options.
Also fix the shim to remove bind/rbind from options when mount is
converted from bind to tmpfs inside the Sentry.
Fixes #2330
Fixes #3274
PiperOrigin-RevId: 371996891
|
|
Weirdness metric contains fields to track the number of clock fallback,
partial result and vsyscalls. This metric will avoid the overhead of
having three different metrics (fallbackMetric, partialResultMetric,
vsyscallCount).
PiperOrigin-RevId: 369970218
|
|
In the previous spot, there was a roughly 50% chance that leak checking would
actually run. Move it to the waitContainer() call on the root container, where
it is guaranteed to run before the sandbox process is terminated. Add it to
runsc/cli/main.go as well for good measure, in case the sandbox exit path does
not involve waitContainer().
PiperOrigin-RevId: 369329796
|
|
Add a coverage-report flag that will cause the sandbox to generate a coverage
report (with suffix .cov) in the debug log directory upon exiting. For the
report to be generated, runsc must have been built with the following Bazel
flags: `--collect_code_coverage --instrumentation_filter=...`.
With coverage reports, we should be able to aggregate results across all tests
to surface code coverage statistics for the project as a whole.
The report is simply a text file with each line representing a covered block
as `file:start_line.start_col,end_line.end_col`. Note that this is similar to
the format of coverage reports generated with `go test -coverprofile`,
although we omit the count and number of statements, which are not useful for
us.
Some simple ways of getting coverage reports:
bazel test <some_test> --collect_code_coverage \
--instrumentation_filter=//pkg/...
bazel build //runsc --collect_code_coverage \
--instrumentation_filter=//pkg/...
runsc -coverage-report=dir/ <other_flags> do ...
PiperOrigin-RevId: 368952911
|
|
Allow user mounting a verity fs on an existing mount by specifying mount
flags root_hash and lower_path.
PiperOrigin-RevId: 366843846
|
|
A skeleton implementation of cgroupfs. It supports trivial cpu and
memory controllers with no support for hierarchies.
PiperOrigin-RevId: 366561126
|
|
Implement a new runsc command to set up a sandbox with verityfs and
run the measure tool. This is loosely forked from the do command, and
currently requires the caller to provide the measure tool binary.
PiperOrigin-RevId: 366553769
|
|
VFS1 skips over mounts that overrides files in /dev because the list of
files is hardcoded. This is not needed for VFS2 and a recent change
lifted this restriction. However, parts of the code were still skipping
/dev mounts even in VFS2, causing the loader to panic when it ran short
of FDs to connect to the gofer.
PiperOrigin-RevId: 365858436
|
|
--file-access-mounts flag is similar to --file-access, but controls
non-root mounts that were previously mounted in shared mode only.
This gives more flexibility to control how mounts are shared within
a container.
PiperOrigin-RevId: 364669882
|
|
containerd usually configures both /dev and /dev/shm as tmpfs mounts, e.g.:
```
"mounts": [
...
{
"destination": "/dev",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
...
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/moby/10eedbd6a0e7937ddfcab90f2c25bd9a9968b734c4ae361318142165d445e67e/shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=67108864"
]
},
...
```
(This is mostly consistent with how Linux is usually configured, except that
/dev is conventionally devtmpfs, not regular tmpfs. runc/libcontainer
implements OCI-runtime-spec-undocumented behavior to create
/dev/{ptmx,fd,stdin,stdout,stderr} in non-bind /dev mounts. runsc silently
switches /dev to devtmpfs. In VFS1, this is necessary to get device files like
/dev/null at all, since VFS1 doesn't support real device special files, only
what is hardcoded in devfs. VFS2 does support device special files, but using
devtmpfs is the easiest way to get pre-created files in /dev.)
runsc ignores many /dev submounts in the spec, including /dev/shm. In VFS1,
this appears to be to avoid introducing a submount overlay for /dev, and is
mostly fine since the typical mode for the /dev/shm mount is ~consistent with
the mode of the /dev/shm directory provided by devfs (modulo the sticky bit).
In VFS2, this is vestigial (VFS2 does not use submount overlays), and devtmpfs'
/dev/shm mode is correct for the mount point but not the mount. So turn off
this behavior for VFS2.
After this change:
```
$ docker run --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 360 Mar 18 00:16 ..
$ docker run --runtime=runsc --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwx 1 root root 0 Mar 18 00:16 .
dr-xr-xr-x 1 root root 0 Mar 18 00:16 ..
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal ls -lah /dev/shm
total 0
drwxrwxrwt 2 root root 40 Mar 18 00:16 .
drwxr-xr-x 5 root root 320 Mar 18 00:16 ..
```
Fixes #5687
PiperOrigin-RevId: 363699385
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
Syzkaller hosts contains many audit messages that runsc tries
to call the clock_nanosleep syscall.
PiperOrigin-RevId: 359331413
|
|
Previously, loader.signalProcess was inconsitently using both root and
container's PID namespace to find the process. It used root namespace
for the exec'd process and container's PID namespace for other processes.
This fixes the code to use the root PID namespace across the board, which
is the same PID reported in `runsc ps` (or soon will after
https://github.com/google/gvisor/pull/5519).
PiperOrigin-RevId: 358836297
|
|
Operations are now shut down automatically by the main Stop
command, and it is not necessary to call Stop during Destroy.
Fixes #5454
PiperOrigin-RevId: 357295930
|
|
Some versions of the Go runtime call getcpu(), so add it for compatibility. The
hostcpu package already uses getcpu() on arm64.
PiperOrigin-RevId: 355717757
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
|
|
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
Closes #5226
PiperOrigin-RevId: 351259576
|
|
This includes minor fix-ups:
* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.
PiperOrigin-RevId: 350205718
|
|
PiperOrigin-RevId: 350159657
|
|
This allows for a model of profiling when you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect length blocking and mutex
profiles.
PiperOrigin-RevId: 349483418
|
|
PiperOrigin-RevId: 348055514
|
|
Closes #5048
PiperOrigin-RevId: 348050472
|
|
There are surprisingly few syscall tests that run with hostinet. For example
running the following command only returns two results:
`bazel query test/syscalls:all | grep hostnet`
I think as a result, as our control messages evolved, hostinet was left
behind. Update it to support all control messages netstack supports.
This change also updates sentry's control message parsing logic to make it up to
date with all the control messages we support.
PiperOrigin-RevId: 347508892
|
|
PiperOrigin-RevId: 347047550
|
|
PiperOrigin-RevId: 346101076
|
|
Fixes #4991
PiperOrigin-RevId: 345800333
|