Age | Commit message (Collapse) | Author |
|
host.endpoint contained duplicated logic from the sockerpair implementation and
host.ConnectedEndpoint. Remove host.endpoint in favor of a
host.ConnectedEndpoint wrapped in a socketpair end.
PiperOrigin-RevId: 217240096
Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
|
|
- Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged,
and it is no longer exported.
- fs.MayDelete now checks that the victim is not the process root. This aligns
with Linux's namei.c:may_delete().
- Fix "is-ancestor" checks to actually compare all ancestors, not just the
parents.
- Fix handling of paths that end in dots, which are handled differently in
Rename vs. Unlink.
PiperOrigin-RevId: 217239274
Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb
|
|
We treat handle the boot process stdio separately from the application stdio
(which gets passed via flags), but we were still sending both to same place. As
a result, some logs that are written directly to os.Stderr by the boot process
were ending up in the application logs.
This CL starts sendind boot process stdio to the null device (since we don't
have any better options). The boot process is already configured to send all
logs (and panics) to the log file, so we won't miss anything important.
PiperOrigin-RevId: 217173020
Change-Id: I5ab980da037f34620e7861a3736ba09c18d73794
|
|
PiperOrigin-RevId: 217155458
Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7
|
|
This enables ifconfig to display MTU.
PiperOrigin-RevId: 216917021
Change-Id: Id513b23d9d76899bcb71b0b6a25036f41629a923
|
|
The spec command is analygous to the 'runc spec' command and allows for
the convenient creation of a config.json file for users that don't have
runc handy.
Change-Id: Ifdfec37e023048ea461c32da1a9042a45b37d856
PiperOrigin-RevId: 216907826
|
|
This is done to further isolate the gofer from the host.
PiperOrigin-RevId: 216790991
Change-Id: Ia265b77e4e50f815d08f743a05669f9d75ad7a6f
|
|
PiperOrigin-RevId: 216780438
Change-Id: Ide637fe36f8d2a61fea9e5b16d1b3401f2540416
|
|
It's possible for Start() and Wait() calls to race, if the sandboxed
application is short-lived. If the application finishes before (or during) the
Wait RPC, then Wait will fail. In practice this looks like "connection
refused" or "EOF" errors when waiting for an RPC response.
This race is especially bad in tests, where we often run "true" inside a
sandbox.
This CL does a best-effort fix, by returning the sandbox exit status as the
container exit status. In most cases, these are the same.
This fixes the remaining flakes in runsc/container:container_test.
PiperOrigin-RevId: 216777793
Change-Id: I9dfc6e6ec885b106a736055bc7a75b2008dfff7a
|
|
PiperOrigin-RevId: 216770391
Change-Id: Idcdc28b2fe9e1b0b63b8119d445f05a8bcbce81e
|
|
This is a breaking change if you're using --debug-log-dir.
The fix is to replace it with --debug-log and add a '/' at
the end:
--debug-log-dir=/tmp/runsc ==> --debug-log=/tmp/runsc/
PiperOrigin-RevId: 216761212
Change-Id: I244270a0a522298c48115719fa08dad55e34ade1
|
|
This should reduce use-after-free errors and accidental close via create or
remove. This change includes one functional fix as well: when closing via
remove, the closed field was not set and the finalizer was not freed, so the
file would have been clunked at some random point in the future.
PiperOrigin-RevId: 216750000
Change-Id: Ice3292c6feb953ae97abac308afbafd2d9410402
|
|
Verify that cgroup is being properly set.
PiperOrigin-RevId: 216736137
Change-Id: I0e27fd604eca67e7dd2e3548dc372ca9cc416309
|
|
This change introduces a new flags to create/run called
--user-log. Logs to this files are visible to users and
are meant to help debugging problems with their images
and containers.
For now only unsupported syscalls are sent to this log,
and only minimum support was added. We can build more
infrastructure around it as needed.
PiperOrigin-RevId: 216735977
Change-Id: I54427ca194604991c407d49943ab3680470de2d0
|
|
PiperOrigin-RevId: 216733414
Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0
|
|
This is a defense-in-depth measure. If the sentry is compromised, this prevents
system call injection to the stubs. There is some complexity with respect to
ptrace and seccomp interactions, so this protection is not really available
for kernel versions < 4.8; this is detected dynamically.
Note that this also solves the vsyscall emulation issue by adding in
appropriate trapping for those system calls. It does mean that a compromised
sentry could theoretically inject these into the stub (ignoring the trap and
resume, thereby allowing execution), but they are harmless.
PiperOrigin-RevId: 216647581
Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
|
|
PiperOrigin-RevId: 216616873
Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a
|
|
When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY
in the new process, not the current process. The ioctl call is made after
duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD.
This fixes the "bad address" flakes in runsc/container:container_test, although
some other flakes remain.
PiperOrigin-RevId: 216594394
Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
|
|
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578
PiperOrigin-RevId: 216591484
|
|
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.
When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.
To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
|
|
PiperOrigin-RevId: 216554791
Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
|
|
Sandbox creation uses the limits and reservations configured in the
OCI spec and set cgroup options accordinly. Then it puts both the
sandbox and gofer processes inside the cgroup.
It also allows the cgroup to be pre-configured by the caller. If the
cgroup already exists, sandbox and gofer processes will join the
cgroup but it will not modify the cgroup with spec limits.
PiperOrigin-RevId: 216538209
Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
|
|
PiperOrigin-RevId: 216472439
Change-Id: Ic4cb86c8e0a9cb022d3ceed9dc5615266c307cf9
|
|
PiperOrigin-RevId: 216431260
Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
|
|
Also properly add padding after Procs in the linux.Sysinfo
structure. This will be implicitly padded to 64bits so we
need to do the same.
PiperOrigin-RevId: 216372907
Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
|
|
We were previously only sending to the originator of the process group.
Integration test was changed to test this behavior. It fails without the
corresponding code change.
PiperOrigin-RevId: 216297263
Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
|
|
PiperOrigin-RevId: 216281263
Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
|
|
We accidentally set the wrong maximum. I've also added PATH_MAX and
NAME_MAX to the linux abi package.
PiperOrigin-RevId: 216221311
Change-Id: I44805fcf21508831809692184a0eba4cee469633
|
|
- Shared futex objects on shared mappings are represented by Mappable +
offset, analogous to Linux's use of inode + offset. Add type
futex.Key, and change the futex.Manager bucket API to use futex.Keys
instead of addresses.
- Extend the futex.Checker interface to be able to return Keys for
memory mappings. It returns Keys rather than just mappings because
whether the address or the target of the mapping is used in the Key
depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this
matters because using mapping target for a futex on a MAP_PRIVATE
mapping causes it to stop working across COW-breaking.
- futex.Manager.WaitComplete depends on atomic updates to
futex.Waiter.addr to determine when it has locked the right bucket,
which is much less straightforward for struct futex.Waiter.key. Switch
to an atomically-accessed futex.Waiter.bucket pointer.
- futex.Manager.Wake now needs to take a futex.Checker to resolve
addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit
path to perform a shared futex wakeup (Linux:
kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid,
FUTEX_WAKE, ...)). This is a problem because futexChecker is in the
syscalls/linux package. Move it to kernel.
PiperOrigin-RevId: 216207039
Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
|
|
Docker and Containerd both eat the boot processes stderr, making it difficult
to track down panics (which are always written to stderr).
This CL makes the boot process dup its debug log FD to stderr, so that panics
will be captured in the debug log, which is better than nothing.
This is the 3rd try at this CL. Previous attempts were foiled because Docker
expects the 'create' command to pass its stdio directly to the container, so
duping stderr in 'create' caused the applications stderr to go to the log file,
which breaks many applications (including our mysql test).
I added a new image_test that makes sure stdout and stderr are handled
correctly.
PiperOrigin-RevId: 215767328
Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
|
|
Sandbox was setting chroot, but was not chaging the working
dir. Added test to ensure this doesn't happen in the future.
PiperOrigin-RevId: 215676270
Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
|
|
PiperOrigin-RevId: 215674589
Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
|
|
PiperOrigin-RevId: 215664253
Change-Id: Ice2500e669194630c9d03903c35622afb92dcba5
|
|
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
|
|
PiperOrigin-RevId: 215655197
Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
|
|
PiperOrigin-RevId: 215633475
Change-Id: I7bc471e3b9a2c725fb5e15b3bbcba2ee1ea574b1
|
|
PiperOrigin-RevId: 215620949
Change-Id: I519da4b44386d950443e5784fb8c48ff9a36c5d3
|
|
This can happen if an error is encountered during Create() which causes the
container to be destroyed and set to state Stopped.
Without this transition, errors during Create get hidden by the later panic.
PiperOrigin-RevId: 215599193
Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
|
|
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3.
I switched back to Math.Pow format to make the arithmetic easier to inspect.
PiperOrigin-RevId: 215588140
Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
|
|
Docker and containerd do not expose runsc's stderr, so tracking down sentry
panics can be painful.
If we have a debug log file, we should send panics (and all stderr data) to the
log file.
PiperOrigin-RevId: 215585559
Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
|
|
We were previously using the sandbox process's stdio as the root container's
stdio. This makes it difficult/impossible to distinguish output application
output from sandbox output, such as panics, which are always written to stderr.
Also close the console socket when we are done with it.
PiperOrigin-RevId: 215585180
Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
|
|
PiperOrigin-RevId: 215574070
Change-Id: Ib36e804adebaf756adb9cbc2752be9789691530b
|
|
PiperOrigin-RevId: 215489101
Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
|
|
Docker.Run only returns a single argument.
PiperOrigin-RevId: 215427309
Change-Id: I1eebbc628853ca57f79d25e18d4f04dfa5a2a003
|
|
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.
However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.
This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.
One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.
To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.
Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.
Example:
root@:/usr/local/apache2# sleep 100
^C
root@:/usr/local/apache2# sleep 100
^Z
[1]+ Stopped sleep 100
root@:/usr/local/apache2# fg
sleep 100
^C
root@:/usr/local/apache2#
PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
|
|
PiperOrigin-RevId: 215278262
Change-Id: Icd10384c99802be6097be938196044386441e282
|
|
PiperOrigin-RevId: 215274663
Change-Id: I051721f459084db3aa608432831170cd47ae7df0
|
|
There was a race where we checked task.Parent() != nil, and then later called
task.Parent() again, assuming that it is not nil. If the task is exiting, the
parent may have been set to nil in between the two calls, causing a panic.
This CL changes the code to only call task.Parent() once.
PiperOrigin-RevId: 215274456
Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59
|
|
And remove multicontainer option.
PiperOrigin-RevId: 215236981
Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
|
|
This makes runsc more friendly to run without docker or K8s.
PiperOrigin-RevId: 215165586
Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
|