Age | Commit message (Collapse) | Author |
|
In --network=sandbox mode, we create SOCK_RAW sockets on the given
network device. The code tries to force-set 4MiB rcv and snd buffers
on it, but in certain situations it might fail. There is no reason
for refusing the sandbox startup in such case - we should bump the
buffers to max availabe size and just move on.
|
|
OCI configuration includes support for specifying seccomp filters. In runc,
these filter configurations are converted into seccomp BPF programs and loaded
into the kernel via libseccomp. runsc needs to be a static binary so, for
runsc, we cannot rely on a C library and need to implement the functionality
in Go.
The generator added here implements basic support for taking OCI seccomp
configuration and converting it into a seccomp BPF program with the same
behavior as a program generated by libseccomp.
- New conditional operations were added to pkg/seccomp to support operations
available in OCI.
- AllowAny and AllowValue were renamed to MatchAny and EqualTo to better reflect
that syscalls matching the conditionals result in the provided action not
simply SCMP_RET_ALLOW.
- BuildProgram in pkg/seccomp no longer panics if provided an empty list of
rules. It now builds a program with the architecture sanity check only.
- ProgramBuilder now allows adding labels that are unused. However, backwards
jumps are still not permitted.
Fixes #510
PiperOrigin-RevId: 331938697
|
|
Updates #1487
PiperOrigin-RevId: 330580699
|
|
The existing implementation for TransportProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.
This change introduces marker interfaces for transport protocol options
that may be set or queried which transport protocol option types
implement to ensure that invalid types are caught at compile time.
Different interfaces are used to allow the compiler to enforce read-only
or set-only socket options.
RELNOTES: n/a
PiperOrigin-RevId: 330559811
|
|
VFS1 and VFS2 host FDs have different dupping behavior,
making error prone to code for both. Change the contract
so that FDs are released as they are used, so the caller
can simple defer a block that closes all remaining files.
This also addresses handling of partial failures.
With this fix, more VFS2 tests can be enabled.
Updates #1487
PiperOrigin-RevId: 330112266
|
|
PiperOrigin-RevId: 329710371
|
|
This is to cover the common pattern: open->read/write->close,
where SetAttr needs to be called to update atime/mtime before
the file is closed.
Benchmark results:
BM_OpenReadClose/10240 CPU
setattr+clunk: 63783 ns
VFS2: 68109 ns
VFS1: 72507 ns
Updates #1198
PiperOrigin-RevId: 329628461
|
|
Updates #2972
PiperOrigin-RevId: 329584905
|
|
This allows runsc flags to be set per sandbox instance. For
example, K8s pod annotations can be used to enable
--debug for a single pod, making troubleshoot much easier.
Similarly, features like --vfs2 can be enabled for
experimentation without affecting other pods in the node.
Closes #3494
PiperOrigin-RevId: 329542815
|
|
Currently the stdio FDs are not dupped and will be closed
unexpectedly in VFS2 when starting a child container. This
patch fixes this issue.
Fixes: #3821
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
|
The existing implementation for NetworkProtocol.{Set}Option take
arguments of an empty interface type which all types (implicitly)
implement; any type may be passed to the functions.
This change introduces marker interfaces for network protocol options
that may be set or queried which network protocol option types implement
to ensure that invalid types are caught at compile time. Different
interfaces are used to allow the compiler to enforce read-only or
set-only socket options.
PiperOrigin-RevId: 328980359
|
|
Use reflection and tags to provide automatic conversion from
Config to flags. This makes adding new flags less error-prone,
skips flags using default values (easier to read), and makes
tests correctly use default flag values for test Configs.
Updates #3494
PiperOrigin-RevId: 328662070
|
|
In Linux, a kernel configuration is set that compiles the kernel with a
custom function that is called at the beginning of every basic block, which
updates the memory-mapped coverage information. The Go coverage tool does not
allow us to inject arbitrary instructions into basic blocks, but it does
provide data that we can convert to a kcov-like format and transfer them to
userspace through a memory mapping.
Note that this is not a strict implementation of kcov, which is especially
tricky to do because we do not have the same coverage tools available in Go
that that are available for the actual Linux kernel. In Linux, a kernel
configuration is set that compiles the kernel with a custom function that is
called at the beginning of every basic block to write program counters to the
kcov memory mapping. In Go, however, coverage tools only give us a count of
basic blocks as they are executed. Every time we return to userspace, we
collect the coverage information and write out PCs for each block that was
executed, providing userspace with the illusion that the kcov data is always
up to date. For convenience, we also generate a unique synthetic PC for each
block instead of using actual PCs. Finally, we do not provide thread-specific
coverage data (each kcov instance only contains PCs executed by the thread
owning it); instead, we will supply data for any file specified by --
instrumentation_filter.
Also, fix issue in nogo that was causing pkg/coverage:coverage_nogo
compilation to fail.
PiperOrigin-RevId: 328426526
|
|
The debian rules are also moved to the top-level, since they
apply to binaries outside the //runsc directory.
Fixes #3665
PiperOrigin-RevId: 328379709
|
|
Unlike linux mount(2), OCI spec allows mounting on top of an existing
non-directory file.
PiperOrigin-RevId: 327914342
|
|
This lets us create "synthetic" mountpoint directories in ReadOnly mounts
during VFS setup.
Also add context.WithMountNamespace, as some filesystems (like overlay) require
a MountNamespace on ctx to handle vfs.Filesystem Operations.
PiperOrigin-RevId: 327874971
|
|
Refactored the recursive dir creation util in runsc/boot/vfs.go to be more
flexible.
PiperOrigin-RevId: 327719100
|
|
Updates #3494
PiperOrigin-RevId: 327548511
|
|
Replace mknod call with mknodat equivalent to protect
against symlink attacks. Also added Mknod tests.
Remove goferfs reliance on gofer to check for file
existence before creating a synthetic entry.
Updates #2923
PiperOrigin-RevId: 327544516
|
|
PiperOrigin-RevId: 327300635
|
|
Running garbage collection enqueues all finalizers, which are used by the
refs/refs_vfs2 packages to detect reference leaks. Note that even with GC,
there is no guarantee that all finalizers will be run before the program exits.
This is a best effort attempt to activate leak checks as much as possible.
Updates #3545.
PiperOrigin-RevId: 325834438
|
|
Earlier we were using NLink to decide if /tmp is empty or not. However, NLink
at best tells us about the number of subdirectories (via the ".." entries).
NLink = n + 2 for n subdirectories. But it does not tell us if the directory is
empty. There still might be non-directory files. We could also not rely on
NLink because host overlayfs always returned 1.
VFS1 uses Readdir to decide if the directory is empty. Used a similar approach.
We now use IterDirents to decide if the "/tmp" directory is empty.
Fixes #3369
PiperOrigin-RevId: 325554234
|
|
PiperOrigin-RevId: 325266487
|
|
Also removes `--profile-goroutine` because it's equivalent
to `debug --stacks`.
PiperOrigin-RevId: 325061502
|
|
The loader dup's stdio FD into stable FD's starting at a fixed
number. During tests, it's possible that the target FD is already
in use. Added check to error early so it's easier to debug failures.
Also bumped up the starting FD number to prevent collisions.
PiperOrigin-RevId: 324917299
|
|
context is passed to DecRef() and Release() which is
needed for SO_LINGER implementation.
PiperOrigin-RevId: 324672584
|
|
9P2000.L is silent as to how readdir RPCs interact with directory mutation. The
most performant option is for Treaddir with offset=0 to restart iteration,
avoiding needing to walk+open+clunk a new directory fid between invocations of
getdents64(2), and the VFS2 gofer client assumes this is the case. Make this
actually true for the runsc fsgofer.
Fixes #3344, #3345, #3355
PiperOrigin-RevId: 324090384
|
|
PiperOrigin-RevId: 324080111
|
|
PiperOrigin-RevId: 323638518
|
|
The bazel server was being started as the wrong user, leading to issues
where the container would suddenly exit during a build.
We can also simplify the waiting logic by starting the container in two
separate steps: those that must complete first, then the asynchronous bit.
PiperOrigin-RevId: 323391161
|
|
... when it is possible.
The guitar gVisorKernel*Workflow-s runs test with the local execution_method.
In this case, blaze runs test cases locally without sandboxes. This means
that all tests run in the same network namespace. We have a few tests which
use hard-coded network ports and they can fail if one of these port will be
used by someone else or by another test cases.
PiperOrigin-RevId: 323137254
|
|
Implement WalkGetAttr() to reuse the stat that is already
needed for Walk(). In addition, cache file QID, so it
doesn't need to stat the file to compute it.
open(2) time improved by 10%:
Baseline: 6780 ns
Change: 6083 ns
Also fixed file type which was not being set in all places.
PiperOrigin-RevId: 323102560
|
|
Allow FUSE filesystems to be mounted using libfuse.
The appropriate flags and mount options are parsed and
understood by fusefs.
|
|
Open tries to reuse the control file to save syscalls and
file descriptors when opening a file. However, when the
control file was opened using O_PATH (e.g. no file permission
to open readonly), Open() would not check for it.
PiperOrigin-RevId: 322821729
|
|
Updates #173
PiperOrigin-RevId: 322665518
|
|
PiperOrigin-RevId: 321449877
|
|
Now it calls pkt.Data.ToView() when writing the packet. This may require
copying when the packet is large, which puts the worse case in an even worse
situation.
This sent out in a separate preparation change as it requires syscall filter
changes. This change will be followed by the change for the adoption of the new
PacketHeader API.
PiperOrigin-RevId: 321447003
|
|
Much like the boot process, apply pdeathsig to the gofer for cases where
the sandbox lifecycle is attached to the parent (runsc run/do).
This isn't strictly necessary, as the gofer normally exits once the
sentry disappears, but this makes that extra reliable.
|
|
PiperOrigin-RevId: 321411758
|
|
- Combine process creation code that is shared between
root and subcontainer processes
- Move root container information into a struct for
clarity
Updates #2714
PiperOrigin-RevId: 321204798
|
|
PiperOrigin-RevId: 321053634
|
|
The go.mod dependency tree for the shim was somehow contradictory. After
resolving these issues (e.g. explicitly imported k8s 1.14, pulling a
specific dbus version), and adding all dependencies, the shim can now be
build as part of the regular bazel tree.
As part of this process, minor cleanup was done in all the source files:
headers were standardized (and include "The gVisor Authors" in addition
to the "The containerd Authors" if originally derived from containerd
sources), and comments were cleaned up to meet coding standards.
This change makes the containerd installation dynamic, so that multiple
versions can be tested, and drops the static installer for the VM image
itself.
This change also updates test/root/crictl_test.go and related utilities,
so that the containerd tests can be run on any version (and in cases
where it applies, they can be run on both v1 and v2 as parameterized
tests).
|
|
Adds a netns flag to runsc spec that allows users to specify a network
namespace path when creating a sample config.json file. Also, adds the ability
to specify the command arguments used when running the container.
This will make it easier for new users to create sample OCI bundles without
having to edit the config.json by hand.
PiperOrigin-RevId: 320486267
|
|
This change gates all FUSE commands (by gating /dev/fuse) behind a runsc
flag. In order to use FUSE commands, use the --fuse flag with the --vfs2
flag. Check if FUSE is enabled by running dmesg in the sandbox.
|
|
Container restart test is disabled for VFS2 for now.
Updates #1487
PiperOrigin-RevId: 320296401
|
|
PiperOrigin-RevId: 320281516
|
|
Removed VDSO dependency on VFS1.
Resolves #2921
PiperOrigin-RevId: 320122176
|
|
This change fixes a few things:
- creating sockets using mknod(2) is supported via vfs2
- fsgofer can create regular files via mknod(2)
- mode = 0 for mknod(2) will be interpreted as regular file in vfs2 as well
Updates #2923
PiperOrigin-RevId: 320074267
|
|
|
|
|