Age | Commit message | Author
2018-10-10 | Add seccomp filter configuration to ptrace stubs. | Adin Scannell
This is a defense-in-depth measure. If the sentry is compromised, this prevents system call injection to the stubs. There is some complexity with respect to ptrace and seccomp interactions, so this protection is not really available for kernel versions < 4.8; this is detected dynamically. Note that this also solves the vsyscall emulation issue by adding in appropriate trapping for those system calls. It does mean that a compromised sentry could theoretically inject these into the stub (ignoring the trap and resume, thereby allowing execution), but they are harmless. PiperOrigin-RevId: 216647581 Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-10-10 | Removes irrelevant TODO. | Kevin Krakauer
PiperOrigin-RevId: 216616873 Change-Id: I4d974ab968058eadd01542081e18a987ef08f50a
2018-10-10 | runsc: Pass controlling TTY by FD in the *new* process, not current process. | Nicolas Lacasse
When setting Cmd.SysProcAttr.Ctty, the FD must be the FD of the controlling TTY in the new process, not the current process. The ioctl call is made after duping all FDs in Cmd.ExtraFiles, which may stomp on the old TTY FD. This fixes the "bad address" flakes in runsc/container:container_test, although some other flakes remain. PiperOrigin-RevId: 216594394 Change-Id: Idfd1677abb866aa82ad7e8be776f0c9087256862
2018-10-10 | Support for older Linux kernels without getrandom | Jonathan Giannuzzi
Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578 PiperOrigin-RevId: 216591484
2018-10-10 | Enforce message size limits and avoid host calls with too many iovecs | Michael Pratt
Currently, in the face of FileMem fragmentation and a large sendmsg or recvmsg call, host sockets may pass > 1024 iovecs to the host, which will immediately cause the host to return EMSGSIZE. When we detect this case, use a single intermediate buffer to pass to the kernel, copying to/from the src/dst buffer. To avoid creating unbounded intermediate buffers, enforce message size checks and truncation w.r.t. the send buffer size. The same functionality is added to netstack unix sockets for feature parity. PiperOrigin-RevId: 216590198 Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
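The send-side logic can be sketched in Go as follows. This is a hypothetical model, not the sentry's code: `prepareSend` and `maxIovecs` are illustrative names, the limit of 1024 matches Linux's `UIO_MAXIOV`, and the sketch only models truncation to the send buffer size; how each socket type reports an oversized message is left out.

```go
package main

// maxIovecs mirrors Linux's UIO_MAXIOV (1024); a host sendmsg with more
// iovecs than this fails with EMSGSIZE.
const maxIovecs = 1024

// prepareSend (hypothetical) truncates the message to sendBufSize and,
// if it still spans more than maxIovecs buffers, copies it into a single
// intermediate buffer so the host sees one iovec. The size limit is
// applied first, which keeps the intermediate buffer bounded.
func prepareSend(bufs [][]byte, sendBufSize int) [][]byte {
	var out [][]byte
	remaining := sendBufSize
	for _, b := range bufs {
		if remaining == 0 {
			break
		}
		if len(b) > remaining {
			b = b[:remaining]
		}
		out = append(out, b)
		remaining -= len(b)
	}
	if len(out) <= maxIovecs {
		return out // few enough iovecs: pass them through unchanged
	}
	flat := make([]byte, 0, sendBufSize-remaining)
	for _, b := range out {
		flat = append(flat, b...)
	}
	return [][]byte{flat}
}
```

The receive path would do the inverse: read into one bounded buffer, then copy out to the fragmented destination.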
2018-10-10 | When creating a new process group, add it to the session. | Nicolas Lacasse
PiperOrigin-RevId: 216554791 Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
2018-10-10 | Add sandbox to cgroup | Fabricio Voznika
Sandbox creation uses the limits and reservations configured in the OCI spec and sets cgroup options accordingly. Then it puts both the sandbox and gofer processes inside the cgroup. It also allows the cgroup to be pre-configured by the caller. If the cgroup already exists, the sandbox and gofer processes will join it, but the cgroup will not be modified with the spec limits. PiperOrigin-RevId: 216538209 Change-Id: If2c65ffedf55820baab743a0edcfb091b89c1019
2018-10-09 | Add tests to verify gofer is chroot'ed | Fabricio Voznika
PiperOrigin-RevId: 216472439 Change-Id: Ic4cb86c8e0a9cb022d3ceed9dc5615266c307cf9
2018-10-09 | Add new netstack metrics to the sentry | Ian Gudger
PiperOrigin-RevId: 216431260 Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
2018-10-09 | Add memunit to sysinfo(2). | Brian Geffon
Also properly add padding after Procs in the linux.Sysinfo structure. The C structure is implicitly padded to 64 bits after Procs, so we need to do the same. PiperOrigin-RevId: 216372907 Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-08 | Job control signals must be sent to all processes in the FG process group. | Nicolas Lacasse
We were previously only sending to the originator of the process group. Integration test was changed to test this behavior. It fails without the corresponding code change. PiperOrigin-RevId: 216297263 Change-Id: I7e41cfd6bdd067f4b9dc215e28f555fb5088916f
2018-10-08 | Uncapitalize error | Michael Pratt
PiperOrigin-RevId: 216281263 Change-Id: Ie0c189e7f5934b77c6302336723bc1181fd2866c
2018-10-08 | Statfs Namelen should be NAME_MAX not PATH_MAX | Michael Pratt
We accidentally set the wrong maximum. I've also added PATH_MAX and NAME_MAX to the linux abi package. PiperOrigin-RevId: 216221311 Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08 | Implement shared futexes. | Jamie Liu
- Shared futex objects on shared mappings are represented by Mappable + offset, analogous to Linux's use of inode + offset. Add type futex.Key, and change the futex.Manager bucket API to use futex.Keys instead of addresses.
- Extend the futex.Checker interface to be able to return Keys for memory mappings. It returns Keys rather than just mappings because whether the address or the target of the mapping is used in the Key depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this matters because using the mapping target for a futex on a MAP_PRIVATE mapping causes it to stop working across COW-breaking.
- futex.Manager.WaitComplete depends on atomic updates to futex.Waiter.addr to determine when it has locked the right bucket, which is much less straightforward for struct futex.Waiter.key. Switch to an atomically-accessed futex.Waiter.bucket pointer.
- futex.Manager.Wake now needs to take a futex.Checker to resolve addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit path to perform a shared futex wakeup (Linux: kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid, FUTEX_WAKE, ...)). This is a problem because futexChecker is in the syscalls/linux package. Move it to kernel.
PiperOrigin-RevId: 216207039 Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
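The keying rule for shared and private mappings can be sketched in Go as below. This is a simplified, hypothetical model (`mappable`, `mapping`, `futexKey` and `keyFor` are illustrative names, not gVisor's `futex.Key` or `futex.Checker`): a `MAP_SHARED` mapping is keyed by its target plus offset, so two mappings of the same page agree on the key, while a `MAP_PRIVATE` mapping is keyed by virtual address so the futex keeps working after copy-on-write gives the mapping a new page.

```go
package main

// keyKind distinguishes how a futex word is identified.
type keyKind int

const (
	keyPrivate keyKind = iota // identified by virtual address
	keyShared                 // identified by (mappable, offset)
)

// mappable stands in for a Mappable such as a shared file.
type mappable struct{ name string }

type futexKey struct {
	Kind     keyKind
	Mappable *mappable // nil for private keys
	Offset   uint64    // virtual address (private) or target offset (shared)
}

type mapping struct {
	Start, End uint64 // virtual range [Start, End)
	Target     *mappable
	TargetOff  uint64 // offset into Target that Start maps to
	Shared     bool   // true for MAP_SHARED, false for MAP_PRIVATE
}

// keyFor returns the key for the futex word at addr inside m.
func keyFor(m mapping, addr uint64) futexKey {
	if m.Shared {
		return futexKey{Kind: keyShared, Mappable: m.Target, Offset: m.TargetOff + (addr - m.Start)}
	}
	return futexKey{Kind: keyPrivate, Offset: addr}
}
```

Because the key depends on the mapping type, the lookup has to return a key rather than the bare mapping, which matches the reasoning in the second bullet.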
2018-10-04 | Capture boot panics in debug log. | Nicolas Lacasse
Docker and containerd both eat the boot process's stderr, making it difficult to track down panics (which are always written to stderr). This CL makes the boot process dup its debug log FD to stderr, so that panics will be captured in the debug log, which is better than nothing. This is the 3rd try at this CL. Previous attempts were foiled because Docker expects the 'create' command to pass its stdio directly to the container, so duping stderr in 'create' caused the application's stderr to go to the log file, which breaks many applications (including our mysql test). I added a new image_test that makes sure stdout and stderr are handled correctly. PiperOrigin-RevId: 215767328 Change-Id: Icebac5a5dcf39b623b79d7a0e2f968e059130059
2018-10-03 | Fix sandbox chroot | Fabricio Voznika
Sandbox was setting chroot, but was not changing the working dir. Added test to ensure this doesn't happen in the future. PiperOrigin-RevId: 215676270 Change-Id: I14352d3de64a4dcb90e50948119dc8328c9c15e1
2018-10-03 | Fix panic if FIOASYNC callback is registered and triggered without target | Ian Gudger
PiperOrigin-RevId: 215674589 Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03 | Bump rules_go to v0.15.4 and go toolchain to v1.11.1. | Nicolas Lacasse
PiperOrigin-RevId: 215664253 Change-Id: Ice2500e669194630c9d03903c35622afb92dcba5
2018-10-03 | Implement TIOCSCTTY ioctl as a noop. | Nicolas Lacasse
PiperOrigin-RevId: 215658757 Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03 | Add S/R support for FIOASYNC | Ian Gudger
PiperOrigin-RevId: 215655197 Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
2018-10-03 | Automated rollback of changelist 215585559 | Nicolas Lacasse
PiperOrigin-RevId: 215633475 Change-Id: I7bc471e3b9a2c725fb5e15b3bbcba2ee1ea574b1
2018-10-03 | Add //pkg/sync:generic_atomicptr. | Jamie Liu
PiperOrigin-RevId: 215620949 Change-Id: I519da4b44386d950443e5784fb8c48ff9a36c5d3
2018-10-03 | runsc: Allow state transition from Creating to Stopped. | Nicolas Lacasse
This can happen if an error is encountered during Create() which causes the container to be destroyed and set to state Stopped. Without this transition, errors during Create get hidden by the later panic. PiperOrigin-RevId: 215599193 Change-Id: Icd3f42e12c685cbf042f46b3929bccdf30ad55b0
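A sketch of a transition table that permits the new edge, in Go. The state names and the other allowed transitions are assumptions for illustration; runsc's real table may differ, and this version returns an error rather than panicking on an invalid transition.

```go
package main

import "fmt"

// State is a hypothetical container lifecycle state.
type State int

const (
	Creating State = iota
	Created
	Running
	Stopped
)

// validTransitions is an assumed table. Creating -> Stopped is the edge
// this change adds, so a failed Create can mark the container Stopped.
var validTransitions = map[State][]State{
	Creating: {Created, Stopped},
	Created:  {Running, Stopped},
	Running:  {Stopped},
}

// transition returns an error if moving from "from" to "to" is not allowed.
func transition(from, to State) error {
	for _, s := range validTransitions[from] {
		if s == to {
			return nil
		}
	}
	return fmt.Errorf("invalid state transition: %d -> %d", from, to)
}
```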
2018-10-03 | Fix arithmetic error in multi_container_test. | Nicolas Lacasse
We add an additional (2^3)-1=7 processes, but the code was only waiting for 3. I switched back to math.Pow format to make the arithmetic easier to inspect. PiperOrigin-RevId: 215588140 Change-Id: Iccad4d6f977c1bfc5c4b08d3493afe553fe25733
2018-10-03 | runsc: Dup debug log file to stderr, so sentry panics don't get lost. | Nicolas Lacasse
Docker and containerd do not expose runsc's stderr, so tracking down sentry panics can be painful. If we have a debug log file, we should send panics (and all stderr data) to the log file. PiperOrigin-RevId: 215585559 Change-Id: I3844259ed0cd26e26422bcdb40dded302740b8b6
2018-10-03 | runsc: Pass root container's stdio via FD. | Nicolas Lacasse
We were previously using the sandbox process's stdio as the root container's stdio. This makes it difficult or impossible to distinguish application output from sandbox output, such as panics, which are always written to stderr. Also close the console socket when we are done with it. PiperOrigin-RevId: 215585180 Change-Id: I980b8c69bd61a8b8e0a496fd7bc90a06446764e0
2018-10-03 | Add TIOCINQ to allowed seccomp when hostinet is used | Fabricio Voznika
PiperOrigin-RevId: 215574070 Change-Id: Ib36e804adebaf756adb9cbc2752be9789691530b
2018-10-02 | Bump some timeouts in the image tests. | Nicolas Lacasse
PiperOrigin-RevId: 215489101 Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
2018-10-02 | Fix compilation bug. | Nicolas Lacasse
Docker.Run only returns a single value. PiperOrigin-RevId: 215427309 Change-Id: I1eebbc628853ca57f79d25e18d4f04dfa5a2a003
2018-10-01 | runsc: Support job control signals in "exec -it". | Nicolas Lacasse
Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and sends signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process.
This change makes "runsc exec" handle all signals and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process.
One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash.
To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set and read via ioctls. When an incoming ContainerSignal urpc comes in, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader.
Note that "runsc exec" now handles signals properly, but "runsc run" does not. That will come in a later CL, as this one is complex enough already.
Example:
root@:/usr/local/apache2# sleep 100
^C
root@:/usr/local/apache2# sleep 100
^Z
[1]+ Stopped sleep 100
root@:/usr/local/apache2# fg
sleep 100
^C
root@:/usr/local/apache2#
PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
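The central idea, delivering a forwarded signal to the tty's foreground process group rather than to the process that "runsc exec" started, can be modeled in a few lines of Go. This is a hypothetical, simplified model: `process`, `tty` and `forwardSignal` are illustrative names, and the foreground group is assumed to be whatever was last set with `TIOCSPGRP`.

```go
package main

// process is a simplified stand-in for a sandboxed process.
type process struct {
	pid, pgid int
	signals   []int // signals delivered so far, for illustration
}

// tty records the foreground process group (set via TIOCSPGRP).
type tty struct {
	fgPgid int
}

// forwardSignal delivers sig to every process in the tty's foreground
// process group and returns how many processes received it.
func forwardSignal(term *tty, procs []*process, sig int) int {
	n := 0
	for _, p := range procs {
		if p.pgid == term.fgPgid {
			p.signals = append(p.signals, sig)
			n++
		}
	}
	return n
}
```

With bash in one group and "sleep 100" made the foreground job in another, a forwarded SIGINT reaches only sleep, which is the behavior shown in the example above.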
2018-10-01 | Add itimer types to linux package, strace | Michael Pratt
PiperOrigin-RevId: 215278262 Change-Id: Icd10384c99802be6097be938196044386441e282
2018-10-01 | Fix ruby image tests. | Nicolas Lacasse
PiperOrigin-RevId: 215274663 Change-Id: I051721f459084db3aa608432831170cd47ae7df0
2018-10-01 | Fix possible panic in control.Processes. | Nicolas Lacasse
There was a race where we checked task.Parent() != nil, and then later called task.Parent() again, assuming that it is not nil. If the task is exiting, the parent may have been set to nil in between the two calls, causing a panic. This CL changes the code to only call task.Parent() once. PiperOrigin-RevId: 215274456 Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59
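The fix amounts to reading the parent pointer once and using only that snapshot. A minimal Go sketch, assuming a mutex-protected parent field (`task`, `Parent` and `parentPID` are hypothetical stand-ins, not the sentry's types). The racy form would be `if t.Parent() != nil { return t.Parent().pid }`, where the second call can observe nil.

```go
package main

import "sync"

// task is a hypothetical stand-in whose parent may be cleared on exit.
type task struct {
	mu     sync.Mutex
	parent *task
	pid    int
}

// Parent returns the current parent, which may become nil concurrently.
func (t *task) Parent() *task {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.parent
}

// parentPID calls Parent exactly once, so the nil check and the
// dereference always see the same value. It returns 0 if there is none.
func parentPID(t *task) int {
	if p := t.Parent(); p != nil {
		return p.pid
	}
	return 0
}
```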
2018-10-01 | Make multi-container the default mode for runsc | Fabricio Voznika
And remove multicontainer option. PiperOrigin-RevId: 215236981 Change-Id: I9fd1d963d987e421e63d5817f91a25c819ced6cb
2018-09-30 | Don't fail if Root is readonly and is not a mount point | Fabricio Voznika
This makes runsc more friendly to run without docker or K8s. PiperOrigin-RevId: 215165586 Change-Id: Id45a9fc24a3c09b1645f60dbaf70e64711a7a4cd
2018-09-30 | Removed duplicate/stale TODOs | Fabricio Voznika
PiperOrigin-RevId: 215162121 Change-Id: I35f06ac3235cf31c9e8a158dcf6261a7ded6c4c4
2018-09-28 | Add test for 'signal --all' with stopped container | Fabricio Voznika
PiperOrigin-RevId: 215025517 Change-Id: I04b9d8022b3d9dfe279e466ddb91310b9860b9af
2018-09-28 | Made a few changes to make testutil.Docker easier to use | Fabricio Voznika
PiperOrigin-RevId: 215023376 Change-Id: I139569bd15c013e5dd0f60d0c98a64eaa0ba9e8e
2018-09-28 | runsc: allow `kill --all` when container is in stopped state. | Lantao Liu
PiperOrigin-RevId: 215009105 Change-Id: I1ab12eddf7694c4db98f6dafca9dae352a33f7c4
2018-09-28 | Add ruby image tests | Fabricio Voznika
PiperOrigin-RevId: 215009066 Change-Id: I54ab920fa649cf4d0817f7cb8ea76f9126523330
2018-09-28 | Make runsc kill and delete more conformant to the "spec" | Fabricio Voznika
PiperOrigin-RevId: 214976251 Change-Id: I631348c3886f41f63d0e77e7c4f21b3ede2ab521
2018-09-28 | Change tcpip.Route.Mask to tcpip.AddressMask. | Googler
PiperOrigin-RevId: 214975659 Change-Id: I7bd31a2c54f03ff52203109da312e4206701c44c
2018-09-28 | Clarify CLA requirements and Gerrit error | Michael Pratt
Call out the error that Gerrit returns if there is no CLA on file. PiperOrigin-RevId: 214964718 Change-Id: I3d92e3eb73f178e8c4c52b5defbe8d21db536215
2018-09-28 | Require AF_UNIX sockets from the gofer | Michael Pratt
host.endpoint already has the check, but it is missing from host.ConnectedEndpoint. PiperOrigin-RevId: 214962762 Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6
2018-09-28 | Block for link address resolution | Sepehr Raissian
Previously, if address resolution for UDP or Ping sockets required sending packets using Write in the transport layer, Resolve would return ErrWouldBlock and Write would return ErrNoLinkAddress. Meanwhile, startAddressResolution would run in the background. Further calls to Write using the same address would also return ErrNoLinkAddress until resolution had completed successfully. Since Write is not allowed to block and system calls need to be interruptible in the system call layer, the caller of Write is responsible for blocking upon return of ErrWouldBlock. Now, when startAddressResolution is called, a notification channel for the completion of the address resolution is returned. The channel is passed up to the caller of Write along with ErrNoLinkAddress. Once address resolution is complete (successfully or not), the channel is closed. The caller then calls Write again to send packets and to check whether address resolution completed successfully. Fixes google/gvisor#5 Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93 PiperOrigin-RevId: 214962200
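The resulting calling convention can be modeled in Go as below. Only `ErrNoLinkAddress` comes from the description above; `resolver`, `write`, `complete`, `sendBlocking` and `ErrHostUnreachable` are hypothetical names, and resolution is driven by hand instead of by real ARP or NDP traffic. `write` never blocks; the caller that is allowed to block waits for the channel to close and then retries.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNoLinkAddress is named after the error in the change description;
// ErrHostUnreachable and everything below are hypothetical.
var (
	ErrNoLinkAddress   = errors.New("no remote link address")
	ErrHostUnreachable = errors.New("host unreachable")
)

type resolver struct {
	mu      sync.Mutex
	cache   map[string]string        // IP -> resolved link address
	failed  map[string]bool          // IPs whose resolution failed
	pending map[string]chan struct{} // IP -> closed when resolution ends
}

func newResolver() *resolver {
	return &resolver{
		cache:   map[string]string{},
		failed:  map[string]bool{},
		pending: map[string]chan struct{}{},
	}
}

// write never blocks. If the link address is unknown it returns
// ErrNoLinkAddress and a channel that is closed once resolution ends.
func (r *resolver) write(ip string) (string, <-chan struct{}, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if mac, ok := r.cache[ip]; ok {
		return mac, nil, nil
	}
	if r.failed[ip] {
		return "", nil, ErrHostUnreachable
	}
	ch, ok := r.pending[ip]
	if !ok {
		// A real stack would start address resolution here.
		ch = make(chan struct{})
		r.pending[ip] = ch
	}
	return "", ch, ErrNoLinkAddress
}

// complete records the outcome (mac == "" means failure) and wakes
// every caller waiting on the pending channel.
func (r *resolver) complete(ip, mac string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if mac != "" {
		r.cache[ip] = mac
	} else {
		r.failed[ip] = true
	}
	if ch, ok := r.pending[ip]; ok {
		close(ch)
		delete(r.pending, ip)
	}
}

// sendBlocking is the blocking caller: it waits for the channel to
// close, then calls write again to learn whether resolution succeeded.
func sendBlocking(r *resolver, ip string) (string, error) {
	for {
		mac, ch, err := r.write(ip)
		if !errors.Is(err, ErrNoLinkAddress) {
			return mac, err
		}
		<-ch // a real syscall would also wake on signals or timeouts
	}
}
```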
2018-09-28 | Switch to root in userns when CAP_SYS_CHROOT is also missing | Fabricio Voznika
Some tests check current capabilities and re-run the tests as root inside userns if required capabilities are missing. It was checking for CAP_SYS_ADMIN only; CAP_SYS_CHROOT is also required now. PiperOrigin-RevId: 214949226 Change-Id: Ic81363969fa76c04da408fae8ea7520653266312
2018-09-27 | Merge Loader.containerRootTGs and execProcess into a single map | Fabricio Voznika
It's easier to manage a single map with the processes that we're interested in tracking. This will make the next change, which cleans up the map on destroy, easier. PiperOrigin-RevId: 214894210 Change-Id: I099247323a0487cd0767120df47ba786fac0926d
2018-09-27 | Move common test code to function | Fabricio Voznika
PiperOrigin-RevId: 214890335 Change-Id: I42743f0ce46a5a42834133bce2f32d187194fc87
2018-09-27 | Forward ioctl(TCSETSF) calls on host ttys to the host kernel. | Nicolas Lacasse
We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but discards pending input. The filters were relaxed to allow host ioctls with the TCSETSF request. This fixes programs like "passwd" that prevent user input from being displayed on the terminal.
Before:
root@b8a0240fc836:/# passwd
Enter new UNIX password: 123
Retype new UNIX password: 123
passwd: password updated successfully
After:
root@ae6f5dabe402:/# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
PiperOrigin-RevId: 214869788 Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
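A sketch of the kind of allowlist this change extends, in Go. The request numbers are the Linux values used on amd64 (from asm-generic/ioctls.h); the map and `forwardToHost` are hypothetical and stand in for the real seccomp filter and ioctl dispatch.

```go
package main

// Linux termios ioctl request numbers as used on amd64.
const (
	TCSETS  = 0x5402 // set attributes immediately
	TCSETSW = 0x5403 // set after pending output drains
	TCSETSF = 0x5404 // like TCSETSW, and also discard pending input
)

// forwardedTTYIoctls is a hypothetical allowlist of tty ioctls passed
// through to the host; TCSETSF is the entry this change adds.
var forwardedTTYIoctls = map[uint]bool{
	TCSETS:  true,
	TCSETSW: true,
	TCSETSF: true,
}

// forwardToHost reports whether request would be sent to the host tty.
func forwardToHost(request uint) bool {
	return forwardedTTYIoctls[request]
}
```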
2018-09-27 | Implement 'runsc kill --all' | Fabricio Voznika
In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, which is inherited by all children. 'kill --all' then iterates over all tasks, comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
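A minimal Go model of the inheritance rule and the scan that 'kill --all' performs. `kernel`, `task`, `newTask` and `tasksInContainer` are hypothetical names; a real implementation would walk the sentry's task set under its locks and deliver signals instead of returning TIDs.

```go
package main

// task is a hypothetical task carrying the container it belongs to.
type task struct {
	tid         int
	containerID string
}

type kernel struct {
	nextTID int
	tasks   []*task
}

// newTask creates a task. A child always inherits its parent's
// containerID; only tasks without a parent take the given ID.
func (k *kernel) newTask(parent *task, containerID string) *task {
	if parent != nil {
		containerID = parent.containerID
	}
	k.nextTID++
	nt := &task{tid: k.nextTID, containerID: containerID}
	k.tasks = append(k.tasks, nt)
	return nt
}

// tasksInContainer returns the TIDs that 'kill --all' would signal.
func tasksInContainer(k *kernel, cid string) []int {
	var tids []int
	for _, tk := range k.tasks {
		if tk.containerID == cid {
			tids = append(tids, tk.tid)
		}
	}
	return tids
}
```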