Age | Commit message (Collapse) | Author |
|
|
|
PiperOrigin-RevId: 407392578
|
|
|
|
PiperOrigin-RevId: 407177936
|
|
copy and setup PERMANENT (static) ARP entries
from CNI namespace to the sandbox
Fixes #3301
|
|
This is part of cgroupv2 patch set. Here we add a Cgroup interface that
both v1 and v2 need to conform to, and port cgroupv1 to use that first.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
|
|
|
|
gVisor was previously reporting the lower of cgroup limit or 2GB as total
memory. This may cause applications to make bad decisions based on amount
of memory available to them when more than 2GB is required.
This change makes the lower of cgroup limit or the host total memory to be
reported inside the sandbox. This also is more inline with docker which always
reports host total memory. Note that reporting cgroup limit is strictly better
than host total memory when there is a limit set.
Fixes #5608
PiperOrigin-RevId: 403241608
|
|
|
|
Add global flags -profile-{block,cpu,heap,mutex} and -trace which
enable collection of the specified profile for the entire duration of a
container execution. This provides a way to definitively start profiling
before that application starts, rather than attempting to race with an
out-of-band `runsc debug`.
Note that only the main boot process is profiled.
This exposed a bug in Task.traceExecEvent: a crash when tracing and
-race are enabled. traceExecEvent is called off of the task goroutine,
but uses the Task as a context, which is a violation of the Task
contract. Switching to the AsyncContext fixes the issue.
Fixes #220
|
|
|
|
Add Event controls and implement "stream" commands.
PiperOrigin-RevId: 390691702
|
|
|
|
Add Usage controls and implement "usage/usagefd" commands.
PiperOrigin-RevId: 390507423
|
|
|
|
Add Fs controls and implement "cat" command.
PiperOrigin-RevId: 388812540
|
|
|
|
Also change runsc pause/resume cmd to access Lifecycle instead of
containerManager.
PiperOrigin-RevId: 388534928
|
|
|
|
* First, we don't need to poll child processes.
* Second, the 5 seconds timeout is too small if a host is overloaded.
* Third, this can hide bugs in the code when we wait a process that
isn't going to exit.
PiperOrigin-RevId: 386337586
|
|
|
|
It was confusing to find functions relating to root and non-root
containers. Replace "non-root" and "subcontainer" and make naming
consistent in Sandbox and controller.
PiperOrigin-RevId: 384512518
|
|
|
|
Set stdio ownership based on the container's user to ensure the
user can open/read/write to/from stdios.
1. stdios in the host are changed to have the owner be the same
uid/gid of the process running the sandbox. This ensures that the
sandbox has full control over it.
2. stdios owner owner inside the sandbox is changed to match the
container's user to give access inside the container and make it
behave the same as runc.
Fixes #6180
PiperOrigin-RevId: 384347009
|
|
|
|
PiperOrigin-RevId: 384344990
|
|
|
|
The typical sequence of calls to start a container looks like this
ct, err := container.New(conf, containerArgs)
defer ct.Destroy()
ct.Start(conf)
ws, err := ct.Wait()
For the root container, ct.Destroy() kills the sandbox process. This
doesn't look like a right wait to stop it. For example, all ongoing rpc
calls are aborted in this case. If everything is going alright, we can
just wait and it will exit itself.
Reported-by: syzbot+084fca334720887441e7@syzkaller.appspotmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
|
|
|
|
PiperOrigin-RevId: 374981100
|
|
|
|
When loading cgroups for another process, `/proc/self` was used in
a few places, causing the end state to be a mix of the process
and self. This is now fixes to always use the proper `/proc/[pid]`
path.
Added net_prio and net_cls to the list of optional controllers. This
is to allow runsc to execute then these cgroups are disabled as long
as there are no net_prio and net_cls limits that need to be applied.
Deflake TestMultiContainerEvent.
Closes #5875
Closes #5887
PiperOrigin-RevId: 372242687
|
|
|
|
Add a coverage-report flag that will cause the sandbox to generate a coverage
report (with suffix .cov) in the debug log directory upon exiting. For the
report to be generated, runsc must have been built with the following Bazel
flags: `--collect_code_coverage --instrumentation_filter=...`.
With coverage reports, we should be able to aggregate results across all tests
to surface code coverage statistics for the project as a whole.
The report is simply a text file with each line representing a covered block
as `file:start_line.start_col,end_line.end_col`. Note that this is similar to
the format of coverage reports generated with `go test -coverprofile`,
although we omit the count and number of statements, which are not useful for
us.
Some simple ways of getting coverage reports:
bazel test <some_test> --collect_code_coverage \
--instrumentation_filter=//pkg/...
bazel build //runsc --collect_code_coverage \
--instrumentation_filter=//pkg/...
runsc -coverage-report=dir/ <other_flags> do ...
PiperOrigin-RevId: 368952911
|
|
|
|
PiperOrigin-RevId: 367446222
|
|
|
|
The syscall package has been deprecated in favor of golang.org/x/sys.
Note that syscall is still used in some places because the following don't seem
to have an equivalent in unix package:
- syscall.SysProcIDMap
- syscall.Credential
Updates #214
PiperOrigin-RevId: 361381490
|
|
|
|
Because we lack gVisor-internal cgroups, we take the CPU usage of the entire pod
and divide it proportionally according to sentry-internal usage stats.
This fixes `kubectl top pods`, which gets a pod's CPU usage by summing the usage
of its containers.
Addresses #172.
PiperOrigin-RevId: 355229833
|
|
|
|
These are primarily simplification and lint mistakes. However, minor
fixes are also included and tests added where appropriate.
PiperOrigin-RevId: 351425971
|
|
|
|
This includes minor fix-ups:
* Handle SIGTERM in runsc debug, to exit gracefully.
* Fix cmd.debug.go opening all profiles as RDONLY.
* Fix the test name in fio_test.go, and encode the block size in the test.
PiperOrigin-RevId: 350205718
|
|
|
|
This allows for a model of profiling when you can start collection, and
it will terminate when the sandbox terminates. Without this synchronous
call, it is effectively impossible to collect length blocking and mutex
profiles.
PiperOrigin-RevId: 349483418
|
|
|
|
fdbased endpoint was enabling fragment reassembly on the host AF_PACKET socket
to ensure that fragments are delivered inorder to the right dispatcher. But this
prevents fragments from being delivered to gvisor at all and makes testing of
gvisor's fragment reassembly code impossible.
The potential impact from this is minimal since IP Fragmentation is not really
that prevelant and in cases where we do get fragments we may deliver the
fragment out of order to the TCP layer as multiple network dispatchers may
process the fragments and deliver a reassembled fragment after the next packet
has been delivered to the TCP endpoint. While not desirable I believe the impact
from this is minimal due to low prevalence of fragmentation.
Also removed PktType and Hatype fields when binding the socket as these are not
used when binding. Its just confusing to have them specified.
See: https://man7.org/linux/man-pages/man7/packet.7.html
"Fields used for binding are
sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex."
Fixes #5055
PiperOrigin-RevId: 346919439
|
|
|
|
Closes #4022
PiperOrigin-RevId: 343378647
|