Age | Commit message (Collapse) | Author |
|
Docker expects containers to be created before they are restored.
However, gVisor restoring requires specificactions regarding the kernel
and the file system. These actions were originally in booting the sandbox.
Now setting up the file system is deferred until a call to a call to
runsc start. In the restore case, the kernel is destroyed and a new kernel
is created in the same process, as we need the same process for Docker.
These changes required careful execution of concurrent processes which
required the use of a channel.
Full docker integration still needs the ability to restore into the same
container.
PiperOrigin-RevId: 205161441
Change-Id: Ie1d2304ead7e06855319d5dc310678f701bd099f
|
|
PiperOrigin-RevId: 204522004
Change-Id: I4819dc025f0a1df03ceaaba7951b1902d44562b3
|
|
Closing the control server will block until all open requests have completed.
If a control server method panics, we end up stuck because the defer'd Destroy
function will never return.
PiperOrigin-RevId: 204354676
Change-Id: I6bb1d84b31242d7c3f20d5334b1c966bd6a61dbf
|
|
PiperOrigin-RevId: 204196916
Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16
|
|
80bdf8a4068de3ac4a73b6b61a0cdcfe3e3571af accidentally moved vdso into an
inner scope, never assigning the vdso variable passed to the Kernel and
thus skipping VDSO mappings.
Fix this and remove the ability for loadVDSO to skip VDSO mappings,
since tests that do so are gone.
PiperOrigin-RevId: 203169135
Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
|
|
PiperOrigin-RevId: 203161098
Change-Id: Ia1904420cb3ee830899d24a4fe418bba6533be64
|
|
Add option to redirect packet back to netstack if it's destined to itself.
This fixes the problem where connecting to the local NIC address would
not work, e.g.:
echo bar | nc -l -p 8080 &
echo foo | nc 192.168.0.2 8080
PiperOrigin-RevId: 203157739
Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4
|
|
The /proc and /sys mounts are "mandatory" in the sense that they should be
mounted in the sandbox even when they are not included in the spec. Runsc
treats /tmp similarly, because it is faster to use the internal tmpfs
implementation instead of proxying to the host.
However, the spec may contain submounts of these mandatory mounts (particularly
for /tmp). In those cases, we must mount our mandatory mounts before the
submount, otherwise the submount will be masked.
Since the mandatory mounts are all top-level directories, we can mount them
right after the root.
PiperOrigin-RevId: 203145635
Change-Id: Id69bae771d32c1a5b67e08c8131b73d9b42b2fbf
|
|
glibc's malloc also uses SYS_TIME. Permit it.
#0 0x0000000000de6267 in time ()
#1 0x0000000000db19d8 in get_nprocs ()
#2 0x0000000000d8a31a in arena_get2.part ()
#3 0x0000000000d8ab4a in malloc ()
#4 0x0000000000d3c6b5 in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 140737488355328ull, 0ul, __sanitizer::SizeClassMap<3ul, 4ul, 8ul, 17ul, 64ul, 14ul>, 20ul, __sanitizer::TwoLevelByteMap<32768ull, 4096ull, __sanitizer::NoOpMapUnmapCallback>, __sanitizer::NoOpMapUnmapCallback> >*, unsigned long) ()
#5 0x0000000000d4cd70 in __tsan_go_start ()
#6 0x00000000004617a3 in racecall ()
#7 0x00000000010f4ea0 in runtime.findfunctab ()
#8 0x000000000043f193 in runtime.racegostart ()
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
[mpratt@google.com: updated comments and commit message]
Signed-off-by: Michael Pratt <mpratt@google.com>
Change-Id: Ibe2d0dc3035bf5052d5fb802cfaa37c5e0e7a09a
PiperOrigin-RevId: 203042627
|
|
Closes #2
PiperOrigin-RevId: 202997196
Change-Id: I0c9f6f5a8a1abe1ae427bca5f590bdf9f82a6675
|
|
Updated how restoring occurs through boot.go with a separate Restore function.
This prevents a new process and new mounts from being created.
Added tests to ensure the container is restored.
Registered checkpoint and restore commands so they can be used.
Docker support for these commands is still limited.
Working on #80.
PiperOrigin-RevId: 202710950
Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
|
|
Users can now call "runsc wait <container id>" to wait on a particular process
inside the container. -pid can also be used to wait on a specific PID.
Manually tested the wait subcommand for a single waiter and multiple waiters
(simultaneously 2 processes waiting on the container and 2 processes waiting on
a PID within the container).
PiperOrigin-RevId: 202548978
Change-Id: Idd507c2cdea613c3a14879b51cfb0f7ea3fb3d4c
|
|
Closes #66
PiperOrigin-RevId: 202496258
Change-Id: Ib9287c5bf1279ffba1db21ebd9e6b59305cddf34
|
|
PiperOrigin-RevId: 202494747
Change-Id: I4d4a18e71468690b785060e580a5f83c616bd90f
|
|
PiperOrigin-RevId: 202185642
Change-Id: I2eefcc0b2ffadc6ef21d177a8a4ab0cda91f3399
|
|
PiperOrigin-RevId: 201995800
Change-Id: I770190d135e14ec7da4b3155009fe10121b2a502
|
|
PiperOrigin-RevId: 201978212
Change-Id: Ie3df1fd41d5293fff66b546a0c68c3bf98126067
|
|
PiperOrigin-RevId: 201742160
Change-Id: Ia9fa1442287c5f9e1196fb117c41536a80f6bb31
|
|
SIGUSR2 was being masked out to be used as a way to dump sentry
stacks. This could cause compatibility problems in cases anyone
uses SIGUSR2 to communicate with the container init process.
PiperOrigin-RevId: 201575374
Change-Id: I312246e828f38ad059139bb45b8addc2ed055d74
|
|
Before a container can be restored, the mounts must be configured.
The root and submounts and their key information is compiled into a
RestoreEnvironment.
Future code will be added to set this created environment before
restoring a container.
Tests to ensure the correct environment were added.
PiperOrigin-RevId: 201544637
Change-Id: Ia894a8b0f80f31104d1c732e113b1d65a4697087
|
|
PiperOrigin-RevId: 201539050
Change-Id: I36cbf270fa5ad25de507ecb919e4005eda6aa16d
|
|
It prints sandbox stacks to the log to help debug stuckness. I expect
that many more options will be added in the future.
PiperOrigin-RevId: 201405931
Change-Id: I87e560800cd5a5a7b210dc25a5661363c8c3a16e
|
|
Containers are created as processes in the sandbox. Of the many things that
don't work yet, the biggest issue is that the fsgofer is launched with its root
as the sandbox's root directory. Thus, when a container is started and wants to
read anything (including the init binary of the container), the gofer tries to
serve from sandbox's root (which basically just has pause), not the container's.
PiperOrigin-RevId: 201294560
Change-Id: I6423aa8830538959c56ae908ce067e4199d627b1
|
|
When running multi-container, child containers are added after the filters have
been installed. Thus, lstat must be in the set of allowed syscalls.
PiperOrigin-RevId: 201269550
Change-Id: I03f2e6675a53d462ed12a0f651c10049b76d4c52
|
|
Resume checks the status of the container and unpauses the kernel
if its status is paused. Otherwise nothing happens.
Tests were added to ensure that the process is in the correct state
after various commands.
PiperOrigin-RevId: 201251234
Change-Id: Ifd11b336c33b654fea6238738f864fcf2bf81e19
|
|
A file descriptor was added as a flag to boot so a state file can restore a
container that was checkpointed.
PiperOrigin-RevId: 201068699
Change-Id: I18e96069488ffa3add468861397f3877725544aa
|
|
PiperOrigin-RevId: 201018483
Change-Id: I52fe3d01c83c8a2f0e9275d9d88c37e46fa224a2
|
|
Like runc, the pause command will pause the processes of the given container.
It will set that container's status to "paused."
A resume command will be be added to unpause and continue running the process.
PiperOrigin-RevId: 200789624
Change-Id: I72a5d7813d90ecfc4d01cc252d6018855016b1ea
|
|
PiperOrigin-RevId: 200768923
Change-Id: I4b8da10bcac296e8171fe6754abec5aabfec5e65
|
|
The right number to use is the number of processors assigned to the cgroup. But until
we make the sandbox join the respective cgroup, just use the number of processors on
the host.
Closes #65, closes #66
PiperOrigin-RevId: 200725483
Change-Id: I34a566b1a872e26c66f56fa6e3100f42aaf802b1
|
|
golang.org/cl/108538 replaces pselect6 with nanosleep in runtime.usleep. Update
the filters accordingly.
PiperOrigin-RevId: 200574612
Change-Id: Ifb2296fcb3781518fc047aabbbffedb9ae488cd7
|
|
Boot loader tries to stat mount to determine whether it's a file or not. This
may file if the sandbox process doesn't have access to the file. Instead, add
overlay on top of file, which is better anyway since we don't want to propagate
changes to the host.
PiperOrigin-RevId: 200411261
Change-Id: I14222410e8bc00ed037b779a1883d503843ffebb
|
|
PiperOrigin-RevId: 200274828
Change-Id: I956703217df08d8650a881479b7ade8f9f119912
|
|
This is the first iteration of checkpoint that actually saves to a file.
Tests for checkpoint are included.
Ran into an issue when private unix sockets are enabled. An error message
was added for this case and the mutex state was set.
PiperOrigin-RevId: 200269470
Change-Id: I28d29a9f92c44bf73dc4a4b12ae0509ee4070e93
|
|
runsc now mounts the devpts filesystem, so you get a real terminal using
ssh+sshd.
PiperOrigin-RevId: 200244830
Change-Id: If577c805ad0138fda13103210fa47178d8ac6605
|
|
Unit tests call runsc directly now, so all command line arguments
are valid. On the other hand, enabling debug in the test binary
doesn't affect runsc. It needs to be set in the config.
PiperOrigin-RevId: 200237706
Change-Id: I0b5922db17f887f58192dbc2f8dd2fd058b76ec7
|
|
PiperOrigin-RevId: 199808391
Change-Id: Ib37a4fb6193dc85c1f93bc16769d6aa41854b9d4
|
|
Functionality for checkpoint is not complete, more to come.
PiperOrigin-RevId: 199500803
Change-Id: Iafb0fcde68c584270000fea898e6657a592466f7
|
|
PiperOrigin-RevId: 199175296
Change-Id: I694ad1cfa65572c92f77f22421fdcac818f44630
|
|
PiperOrigin-RevId: 198919043
Change-Id: I7f1f0a3b3430cd0936a4ee4fc6859aab71820bdf
|
|
Container user might not have enough priviledge to walk directories and
mount filesystems. Instead, create superuser to perform these steps of
the configuration.
PiperOrigin-RevId: 197953667
Change-Id: I643650ab654e665408e2af1b8e2f2aa12d58d4fb
|
|
PiperOrigin-RevId: 197058289
Change-Id: I3946c25028b7e032be4894d61acb48ac0c24d574
|
|
This is another step towards multi-container support.
Previously, we delivered signals directly to the sandbox process (which then
forwarded the signal to PID 1 inside the sandbox). Similarly, we waited on a
container by waiting on the sandbox process itself. This approach will not work
when there are multiple containers inside the sandbox, and we need to
signal/wait on individual containers.
This CL adds two new messages, ContainerSignal and ContainerWait. These
messages include the id of the container to signal/wait. The controller inside
the sandbox receives these messages and signals/waits on the appropriate
process inside the sandbox.
The container id is plumbed into the sandbox, but it currently is not used. We
still end up signaling/waiting on PID 1 in all cases. Once we actually have
multiple containers inside the sandbox, we will need to keep some sort of map
of container id -> pid (or possibly pid namespace), and signal/kill the
appropriate process for the container.
PiperOrigin-RevId: 197028366
Change-Id: I07b4d5dc91ecd2affc1447e6b4bdd6b0b7360895
|
|
This is a necessary prerequisite for supporting multiple containers in a single
sandbox.
All the commands (in cmd package) now call operations on Containers (container
package). When a Container first starts, it will create a Sandbox with the same
ID.
The Sandbox class is now simpler, as it only knows how to create boot/gofer
processes, and how to forward commands into the running boot process.
There are TODOs sprinkled around for additional support for multiple
containers. Most notably, we need to detect when a container is intended to run
in an existing sandbox (by reading the metadata), and then have some way to
signal to the sandbox to start a new container. Other urpc calls into the
sandbox need to pass the container ID, so the sandbox can run the operation on
the given container. These are only half-plummed through right now.
PiperOrigin-RevId: 196688269
Change-Id: I1ecf4abbb9dd8987a53ae509df19341aaf42b5b0
|
|
When file is backed by host FD, atime and mtime for the host file and the
cached attributes in the Sentry must be close together. In this case,
the call to update atime and mtime can be skipped. This is important when
host filesystem is using overlay because updating atime and mtime explicitly
forces a copy up for every file that is touched.
PiperOrigin-RevId: 196176413
Change-Id: I3933ea91637a071ba2ea9db9d8ac7cdba5dc0482
|
|
Two changes in this CL:
First, make the "boot" process sleep when it encounters an error to give the
controller time to send the error back to the "start" process. Otherwise the
"boot" process exits immediately and the control connection errors with EOF.
Secondly, open the log file with O_APPEND, not O_TRUNC. Docker uses the same
log file for all runtime commands, and setting O_TRUNC causes them to get
destroyed. Furthermore, containerd parses these log files in the event of an
error, and it does not like the file being truncated out from underneath it.
Now, when trying to run a binary that does not exist in the image, the error
message is more reasonable:
$ docker run alpine /not/found
docker: Error response from daemon: OCI runtime start failed: /usr/local/google/docker/runtimes/runscd did not terminate sucessfully: error starting sandbox: error starting application [/not/found]: failed to create init process: no such file or directory
Fixes #32
PiperOrigin-RevId: 196027084
Change-Id: Iabc24c0bdd8fc327237acc051a1655515f445e68
|
|
There was a typo and one new capability missing from the list
PiperOrigin-RevId: 195427713
Change-Id: I6d9e1c6e77b48fe85ef10d9f54c70c8a7271f6e7
|
|
PiperOrigin-RevId: 195049322
Change-Id: I09f6dd58cf10a2e50e53d17d2823d540102913c5
|
|
PiperOrigin-RevId: 195047018
Change-Id: I6d99528a00a2125f414e1e51e067205289ec9d3d
|
|
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
|