gvisor - Container Runtime Sandbox

Age	Commit message	Author
2018-10-15	Merge host.endpoint into host.ConnectedEndpoint	Ian Gudger
	host.endpoint contained duplicated logic from the socketpair implementation and host.ConnectedEndpoint. Remove host.endpoint in favor of a host.ConnectedEndpoint wrapped in a socketpair end. PiperOrigin-RevId: 217240096 Change-Id: I4a3d51e3fe82bdf30e2d0152458b8499ab4c987c
2018-10-15	Clean up Rename and Unlink checks for EBUSY.	Nicolas Lacasse
	- Change Dirent.Busy => Dirent.isMountPoint. The function body is unchanged, and it is no longer exported.
	- fs.MayDelete now checks that the victim is not the process root. This aligns with Linux's namei.c:may_delete().
	- Fix "is-ancestor" checks to actually compare all ancestors, not just the parents.
	- Fix handling of paths that end in dots, which are handled differently in Rename vs. Unlink.
	PiperOrigin-RevId: 217239274 Change-Id: I7a0eb768e70a1b2915017ce54f7f95cbf8edf1fb
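The "is-ancestor" fix amounts to walking the whole parent chain: renaming directory `a` into `a/b/c` must be rejected even though `a` is not the new parent's direct parent. A minimal sketch, assuming a hypothetical `dirent` type with only a name and a parent pointer (not gVisor's `fs.Dirent`, and ignoring the locking a real walk needs):

```go
package main

import "fmt"

// dirent is a hypothetical, stripped-down stand-in for fs.Dirent: each
// node records only its name and its parent (nil for the root).
type dirent struct {
	name   string
	parent *dirent
}

// isDescendantOf reports whether d lies strictly below a. It walks every
// ancestor of d up to the root instead of comparing only d.parent.
func (d *dirent) isDescendantOf(a *dirent) bool {
	for p := d.parent; p != nil; p = p.parent {
		if p == a {
			return true
		}
	}
	return false
}

func main() {
	root := &dirent{name: "/"}
	a := &dirent{name: "a", parent: root}
	b := &dirent{name: "b", parent: a}
	c := &dirent{name: "c", parent: b}

	// A parent-only check would miss this case: a is c's grandparent.
	fmt.Println(c.isDescendantOf(a)) // true
	fmt.Println(a.isDescendantOf(c)) // false
}
```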
2018-10-15	sentry: save fs.Dirent deleted info.	Zhaozhong Ni
	PiperOrigin-RevId: 217155458 Change-Id: Id3265b1ec784787039e2131c80254ac4937330c7
2018-10-12	runsc: Support retrieving MTU via netdevice ioctl.	Kevin Krakauer
	This enables ifconfig to display MTU. PiperOrigin-RevId: 216917021 Change-Id: Id513b23d9d76899bcb71b0b6a25036f41629a923
2018-10-11	Add String() method to AddressMask	Fabricio Voznika
	PiperOrigin-RevId: 216770391 Change-Id: Idcdc28b2fe9e1b0b63b8119d445f05a8bcbce81e
2018-10-11	Add client sanity checking for P9.	Adin Scannell
	This should reduce use-after-free errors and accidental close via create or remove. This change includes one functional fix as well: when closing via remove, the closed field was not set and the finalizer was not freed, so the file would have been clunked at some random point in the future. PiperOrigin-RevId: 216750000 Change-Id: Ice3292c6feb953ae97abac308afbafd2d9410402
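The remove-path fix can be illustrated with a handle that marks itself closed and disarms its finalizer on every path that releases the fid. A minimal sketch, assuming a hypothetical `clientFile` type (not the real p9 client API) that counts clunk messages so the effect is observable; since Tremove also releases the fid on the server, `Remove` sends no separate clunk:

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

var errClosed = errors.New("p9 file already closed")

// clientFile is a hypothetical 9P client file handle (not the real p9 API).
// clunks counts Tclunk messages so the example can observe them.
type clientFile struct {
	fid    uint32
	closed bool
	clunks *int
}

// newClientFile installs a finalizer that clunks the fid if the caller
// never closes the file.
func newClientFile(fid uint32, clunks *int) *clientFile {
	f := &clientFile{fid: fid, clunks: clunks}
	runtime.SetFinalizer(f, (*clientFile).sendClunk)
	return f
}

func (f *clientFile) sendClunk() { *f.clunks++ }

// markClosed is the sanity check shared by every operation that releases
// the fid: it rejects reuse and disarms the finalizer.
func (f *clientFile) markClosed() error {
	if f.closed {
		return errClosed
	}
	f.closed = true
	runtime.SetFinalizer(f, nil)
	return nil
}

// Close releases the fid with an explicit clunk.
func (f *clientFile) Close() error {
	if err := f.markClosed(); err != nil {
		return err
	}
	f.sendClunk()
	return nil
}

// Remove releases the fid implicitly, so it must still mark the file
// closed and clear the finalizer (the step the commit says was missing).
func (f *clientFile) Remove() error {
	return f.markClosed() // a real client would send Tremove here
}

func main() {
	n := 0
	f := newClientFile(1, &n)
	fmt.Println(f.Remove(), f.Close(), n) // <nil> p9 file already closed 0
}
```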
2018-10-11	sentry: allow saving of unlinked files with open fds on virtual fs.	Zhaozhong Ni
	PiperOrigin-RevId: 216733414 Change-Id: I33cd3eb818f0c39717d6656fcdfff6050b37ebb0
2018-10-10	Add seccomp filter configuration to ptrace stubs.	Adin Scannell
	This is a defense-in-depth measure. If the sentry is compromised, this prevents system call injection to the stubs. There is some complexity with respect to ptrace and seccomp interactions, so this protection is not really available for kernel versions < 4.8; this is detected dynamically. Note that this also solves the vsyscall emulation issue by adding in appropriate trapping for those system calls. It does mean that a compromised sentry could theoretically inject these into the stub (ignoring the trap and resume, thereby allowing execution), but they are harmless. PiperOrigin-RevId: 216647581 Change-Id: Id06c232cbac1f9489b1803ec97f83097fcba8eb8
2018-10-10	Support for older Linux kernels without getrandom	Jonathan Giannuzzi
	Change-Id: I1fb9f5b47a264a7617912f6f56f995f3c4c5e578 PiperOrigin-RevId: 216591484
2018-10-10	Enforce message size limits and avoid host calls with too many iovecs	Michael Pratt
	Currently, in the face of FileMem fragmentation and a large sendmsg or recvmsg call, host sockets may pass > 1024 iovecs to the host, which will immediately cause the host to return EMSGSIZE. When we detect this case, use a single intermediate buffer to pass to the kernel, copying to/from the src/dst buffer. To avoid creating unbounded intermediate buffers, enforce message size checks and truncation w.r.t. the send buffer size. The same functionality is added to netstack unix sockets for feature parity. PiperOrigin-RevId: 216590198 Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
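The send-side logic can be sketched as "truncate, then flatten if still too fragmented". A minimal sketch: `prepareSend` is a hypothetical helper, and the assumption that truncation to the send buffer happens before the iovec count is checked is ours; the commit describes the behavior only at a high level.

```go
package main

import "fmt"

// maxIovs matches Linux's UIO_MAXIOV: the host fails sendmsg/recvmsg
// with EMSGSIZE when given more iovecs than this.
const maxIovs = 1024

// prepareSend is a hypothetical helper. It first truncates the message to
// sndBuf bytes, then, if the fragments still exceed maxIovs, copies them
// into one intermediate buffer so the host sees a single iovec. It returns
// the iovecs to pass to the host and the number of bytes they cover.
func prepareSend(frags [][]byte, sndBuf int) ([][]byte, int) {
	var kept [][]byte
	n := 0
	for _, f := range frags {
		room := sndBuf - n
		if room <= 0 {
			break
		}
		if len(f) > room {
			f = f[:room]
		}
		kept = append(kept, f)
		n += len(f)
	}
	if len(kept) <= maxIovs {
		return kept, n // few enough fragments: pass them through
	}
	buf := make([]byte, 0, n)
	for _, f := range kept {
		buf = append(buf, f...)
	}
	return [][]byte{buf}, n
}

func main() {
	// 2000 one-byte fragments, as heavy memory fragmentation might produce.
	frags := make([][]byte, 2000)
	for i := range frags {
		frags[i] = []byte{byte(i)}
	}
	iovs, n := prepareSend(frags, 1500)
	fmt.Println(len(iovs), n) // 1 1500
}
```

Bounding the copy by the send buffer size is what keeps the intermediate buffer from growing with the caller's request.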
2018-10-10	When creating a new process group, add it to the session.	Nicolas Lacasse
	PiperOrigin-RevId: 216554791 Change-Id: Ia6b7a2e6eaad80a81b2a8f2e3241e93ebc2bda35
2018-10-09	Add new netstack metrics to the sentry	Ian Gudger
	PiperOrigin-RevId: 216431260 Change-Id: Ia6e5c8d506940148d10ff2884cf4440f470e5820
2018-10-09	Add memunit to sysinfo(2).	Brian Geffon
	Also properly add padding after Procs in the linux.Sysinfo structure. The C structure is implicitly padded to 64 bits at that point, so we need to do the same. PiperOrigin-RevId: 216372907 Change-Id: I6eb6a27800da61d8f7b7b6e87bf0391a48fdb475
2018-10-08	Statfs Namelen should be NAME_MAX not PATH_MAX	Michael Pratt
	We accidentally set the wrong maximum. I've also added PATH_MAX and NAME_MAX to the linux abi package. PiperOrigin-RevId: 216221311 Change-Id: I44805fcf21508831809692184a0eba4cee469633
2018-10-08	Implement shared futexes.	Jamie Liu
	- Shared futex objects on shared mappings are represented by Mappable + offset, analogous to Linux's use of inode + offset. Add type futex.Key, and change the futex.Manager bucket API to use futex.Keys instead of addresses.
	- Extend the futex.Checker interface to be able to return Keys for memory mappings. It returns Keys rather than just mappings because whether the address or the target of the mapping is used in the Key depends on whether the mapping is MAP_SHARED or MAP_PRIVATE; this matters because using the mapping target for a futex on a MAP_PRIVATE mapping causes it to stop working across COW-breaking.
	- futex.Manager.WaitComplete depends on atomic updates to futex.Waiter.addr to determine when it has locked the right bucket, which is much less straightforward for struct futex.Waiter.key. Switch to an atomically-accessed futex.Waiter.bucket pointer.
	- futex.Manager.Wake now needs to take a futex.Checker to resolve addresses for shared futexes. CLONE_CHILD_CLEARTID requires the exit path to perform a shared futex wakeup (Linux: kernel/fork.c:mm_release() => sys_futex(tsk->clear_child_tid, FUTEX_WAKE, ...)). This is a problem because futexChecker is in the syscalls/linux package. Move it to kernel.
	PiperOrigin-RevId: 216207039 Change-Id: I708d68e2d1f47e526d9afd95e7fed410c84afccf
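The key rule described above can be sketched with a comparable struct that indexes wait buckets directly. A minimal sketch, assuming hypothetical `mappable`, `vma`, and `key` types (simplifications, not gVisor's futex.Key or memmap.Mappable) and an integer `mm` standing in for whatever identifies an address space:

```go
package main

import "fmt"

// mappable is a hypothetical stand-in for a shared backing object (a file
// or shm segment); only pointer identity matters.
type mappable struct{ name string }

// key identifies a futex. It is comparable, so it can index wait buckets
// directly. Private keys use (address space, address); shared keys use
// (backing object, offset), analogous to Linux's inode + offset.
type key struct {
	shared bool
	mm     int       // address-space ID; used for private keys only
	obj    *mappable // backing object; used for shared keys only
	off    uint64    // virtual address (private) or object offset (shared)
}

// vma describes the mapping that contains a futex word.
type vma struct {
	start  uint64 // virtual address where the mapping begins
	objOff uint64 // offset into obj at start
	obj    *mappable
	shared bool // MAP_SHARED (true) or MAP_PRIVATE (false)
}

// keyFor resolves addr to a key. MAP_PRIVATE mappings must use the
// address: after copy-on-write the page no longer belongs to obj.
func keyFor(mm int, m vma, addr uint64) key {
	if !m.shared {
		return key{mm: mm, off: addr}
	}
	return key{shared: true, obj: m.obj, off: m.objOff + (addr - m.start)}
}

func main() {
	shm := &mappable{name: "shm"}
	// Two processes map the same object at different addresses.
	a := vma{start: 0x1000, obj: shm, shared: true}
	b := vma{start: 0x7000, obj: shm, shared: true}
	fmt.Println(keyFor(1, a, 0x1010) == keyFor(2, b, 0x7010)) // true

	// The same private address in two processes is two different futexes.
	p := vma{start: 0x1000, obj: shm}
	fmt.Println(keyFor(1, p, 0x1010) == keyFor(2, p, 0x1010)) // false
}
```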
2018-10-03	Fix panic if FIOASYNC callback is registered and triggered without target	Ian Gudger
	PiperOrigin-RevId: 215674589 Change-Id: I4f8871b64c570dc6da448d2fe351cec8a406efeb
2018-10-03	Implement TIOCSCTTY ioctl as a noop.	Nicolas Lacasse
	PiperOrigin-RevId: 215658757 Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
2018-10-03	Add S/R support for FIOASYNC	Ian Gudger
	PiperOrigin-RevId: 215655197 Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
2018-10-03	Add //pkg/sync:generic_atomicptr.	Jamie Liu
	PiperOrigin-RevId: 215620949 Change-Id: I519da4b44386d950443e5784fb8c48ff9a36c5d3
2018-10-02	Bump some timeouts in the image tests.	Nicolas Lacasse
	PiperOrigin-RevId: 215489101 Change-Id: Iaf96aa8edb1101b70548030c62995841215237d9
2018-10-01	runsc: Support job control signals in "exec -it".	Nicolas Lacasse
	Terminal support in runsc relies on host tty file descriptors that are imported into the sandbox. Application tty ioctls are sent directly to the host fd. However, those host tty ioctls are associated in the host kernel with a host process (in this case runsc), and the host kernel intercepts job control characters like ^C and sends signals to the host process. Thus, typing ^C into a "runsc exec" shell will send a SIGINT to the runsc process.
	This change makes "runsc exec" handle all signals and forward them into the sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is associated with a particular container process in the sandbox, the signal must be associated with the same container process.
	One big difficulty is that the signal should not necessarily be sent to the sandbox process started by "exec", but instead must be sent to the foreground process group for the tty. For example, we may exec "bash", and from bash call "sleep 100". A ^C at this point should SIGINT sleep, not bash.
	To handle this, tty files inside the sandbox must keep track of their foreground process group, which is set and read via ioctls. When a ContainerSignal urpc arrives, we look up the foreground process group via the tty file. Unfortunately, this means we have to expose and cache the tty file in the Loader.
	Note that "runsc exec" now handles signals properly, but "runsc run" does not. That will come in a later CL, as this one is complex enough already.
	Example:
	  root@:/usr/local/apache2# sleep 100
	  ^C
	  root@:/usr/local/apache2# sleep 100
	  ^Z
	  [1]+ Stopped sleep 100
	  root@:/usr/local/apache2# fg
	  sleep 100
	  ^C
	  root@:/usr/local/apache2#
	PiperOrigin-RevId: 215334554 Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
2018-10-01	Add itimer types to linux package, strace	Michael Pratt
	PiperOrigin-RevId: 215278262 Change-Id: Icd10384c99802be6097be938196044386441e282
2018-10-01	Fix possible panic in control.Processes.	Nicolas Lacasse
	There was a race where we checked task.Parent() != nil, and then later called task.Parent() again, assuming that it is not nil. If the task is exiting, the parent may have been set to nil in between the two calls, causing a panic. This CL changes the code to only call task.Parent() once. PiperOrigin-RevId: 215274456 Change-Id: Ib5a537312c917773265ec72016014f7bc59a5f59
2018-09-28	Change tcpip.Route.Mask to tcpip.AddressMask.	Googler
	PiperOrigin-RevId: 214975659 Change-Id: I7bd31a2c54f03ff52203109da312e4206701c44c
2018-09-28	Require AF_UNIX sockets from the gofer	Michael Pratt
	host.endpoint already has the check, but it is missing from host.ConnectedEndpoint. PiperOrigin-RevId: 214962762 Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6
2018-09-28	Block for link address resolution	Sepehr Raissian
	Previously, if address resolution was required for UDP or Ping sockets sending packets with Write in the transport layer, Resolve would return ErrWouldBlock and Write would return ErrNoLinkAddress. Meanwhile, startAddressResolution would run in the background. Further calls to Write using the same address would also return ErrNoLinkAddress until resolution completed successfully. Since Write is not allowed to block, and system calls need to be interruptible in the system call layer, the caller of Write is responsible for blocking upon return of ErrWouldBlock. Now, when startAddressResolution is called, a notification channel for the completion of the address resolution is returned. The channel is passed up to the caller of Write along with ErrNoLinkAddress. Once address resolution is complete (successful or not), the channel is closed. The caller then calls Write again to send packets and check whether address resolution completed successfully. Fixes google/gvisor#5 Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93 PiperOrigin-RevId: 214962200
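The notification-channel handshake can be sketched with a small cache whose lookups never block. A minimal sketch: `linkCache`, `Resolve`, `send`, and the error values are hypothetical names, not netstack's API, and the `lookup` callback stands in for ARP or NDP:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errWouldBlock    = errors.New("operation would block")
	errNoLinkAddress = errors.New("no link address")
)

// linkCache is a hypothetical neighbor cache. Resolve never blocks: on a
// miss it starts at most one background resolution per address and returns
// a channel that is closed when that resolution finishes, success or not.
type linkCache struct {
	mu      sync.Mutex
	entries map[string]string        // network address -> link address
	pending map[string]chan struct{} // resolutions in flight
	lookup  func(string) (string, bool)
}

func newLinkCache(lookup func(string) (string, bool)) *linkCache {
	return &linkCache{
		entries: map[string]string{},
		pending: map[string]chan struct{}{},
		lookup:  lookup,
	}
}

func (c *linkCache) Resolve(addr string) (string, <-chan struct{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if link, ok := c.entries[addr]; ok {
		return link, nil, nil
	}
	ch, ok := c.pending[addr]
	if !ok {
		ch = make(chan struct{})
		c.pending[addr] = ch
		go func() {
			link, found := c.lookup(addr)
			c.mu.Lock()
			if found {
				c.entries[addr] = link
			}
			delete(c.pending, addr)
			c.mu.Unlock()
			close(ch) // wake every waiter, whatever the outcome
		}()
	}
	return "", ch, errWouldBlock
}

// send plays the role of the syscall layer, which is allowed to block
// (interruptibly, in gVisor): it waits on the channel, then checks the result.
func send(c *linkCache, addr string) (string, error) {
	link, ch, err := c.Resolve(addr)
	if err != errWouldBlock {
		return link, err
	}
	<-ch
	c.mu.Lock()
	link, ok := c.entries[addr]
	c.mu.Unlock()
	if !ok {
		return "", errNoLinkAddress
	}
	return link, nil
}

func main() {
	c := newLinkCache(func(addr string) (string, bool) {
		if addr == "10.0.0.1" {
			return "02:00:00:00:00:01", true
		}
		return "", false
	})
	fmt.Println(send(c, "10.0.0.1")) // 02:00:00:00:00:01 <nil>
	fmt.Println(send(c, "10.0.0.2"))
}
```

Keeping the blocking in the caller preserves the rule that Write itself never blocks.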
2018-09-27	Forward ioctl(TCSETSF) calls on host ttys to the host kernel.	Nicolas Lacasse
	We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but discards pending input. The filters were relaxed to allow host ioctls with the TCSETSF argument. This fixes programs like "passwd" that prevent user input from being displayed on the terminal.
	Before:
	  root@b8a0240fc836:/# passwd
	  Enter new UNIX password: 123
	  Retype new UNIX password: 123
	  passwd: password updated successfully
	After:
	  root@ae6f5dabe402:/# passwd
	  Enter new UNIX password:
	  Retype new UNIX password:
	  passwd: password updated successfully
	PiperOrigin-RevId: 214869788 Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27	Implement 'runsc kill --all'	Fabricio Voznika
	In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
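The inherit-then-filter scheme can be sketched with a flat task list. A minimal sketch, assuming hypothetical `kernel`, `fork`, and `killAll` names (not the Sentry's API); a real implementation would also need appropriate locking while iterating:

```go
package main

import "fmt"

// task is a hypothetical task record carrying the ID of the container
// whose init process it descends from.
type task struct {
	tid         int
	containerID string
}

type kernel struct {
	tasks   []*task
	nextTID int
}

func (k *kernel) newTask(containerID string) *task {
	k.nextTID++
	t := &task{tid: k.nextTID, containerID: containerID}
	k.tasks = append(k.tasks, t)
	return t
}

// startContainer creates a container's first task.
func (k *kernel) startContainer(id string) *task { return k.newTask(id) }

// fork creates a child that inherits its parent's container ID.
func (k *kernel) fork(parent *task) *task { return k.newTask(parent.containerID) }

// killAll returns every task that "kill --all" would signal: all tasks
// whose container ID matches, however they were spawned.
func (k *kernel) killAll(id string) []int {
	var tids []int
	for _, t := range k.tasks {
		if t.containerID == id {
			tids = append(tids, t.tid)
		}
	}
	return tids
}

func main() {
	k := &kernel{}
	sh := k.startContainer("c1")
	child := k.fork(sh)
	k.fork(child)
	k.startContainer("c2")
	fmt.Println(k.killAll("c1")) // [1 2 3]
}
```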
2018-09-27	netstack: make go:linkname work for all architectures	Anton Gyllenberg
	The //go:linkname directive requires the presence of assembly files in the package. Even an empty file will do. There was an empty assembly file commit_arm64.s, but that is limited to GOARCH=arm64. Renaming it to empty.s removes the unnecessary build constraint and allows building netstack for architectures other than amd64 and arm64. Without this, building directly with go (not bazel) for e.g. GOARCH=arm gives:
	  sleep/sleep_unsafe.go:88:6: missing function body
	  sleep/sleep_unsafe.go:91:6: missing function body
	Change-Id: I29d1d13e1ff31506a174d4595b8cd57fa58bf52b PiperOrigin-RevId: 214820299
2018-09-27	sentry: export cpuTime function.	Zhaozhong Ni
	PiperOrigin-RevId: 214798278 Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615
2018-09-26	Return correct parent PID	Fabricio Voznika
	The old code was returning the ID of the thread that created the child process. It should be returning the ID of the parent process instead. PiperOrigin-RevId: 214720910 Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36
2018-09-26	Use the ICMP target address in responses	Tamir Duberstein
	There is a subtle bug that is the result of two changes made when upstreaming ICMPv6 support from Fuchsia:
	1) ipv6.endpoint.WritePacket writes the local address it was initialized with, rather than the provided route's local address.
	2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP target address before writing the response.
	The result is that the ICMP response erroneously uses the endpoint's IPv6 address (rather than the ICMP target address) as its source address. When trying to debug this by fixing (2), we ran into problems with bad IPv6 checksums because (1) didn't respect the local address of the route being passed to it. This fixes both problems.
	PiperOrigin-RevId: 214650822 Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69
2018-09-26	Export ipv6 address helpers	Tamir Duberstein
	This is useful for Fuchsia. PiperOrigin-RevId: 214619681 Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9
2018-09-21	Remove unnecessary defer	Ian Gudger
	PiperOrigin-RevId: 214073949 Change-Id: I8fab916cd77362c13dac2c9dcf2ecc1710d87a5e
2018-09-21	Run gofmt -s on everything	Ian Gudger
	PiperOrigin-RevId: 214040901 Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21	Extend tcpip.Address.String to ipv6 addresses	Tamir Duberstein
	PiperOrigin-RevId: 214039349 Change-Id: Ia7d09c5f85eddd1e5634f3c21b0bd60b10be6bd2
2018-09-21	Deflake TestSimpleReceive	Tamir Duberstein
	...by increasing the allotted timeout and using direct comparison rather than reflect.DeepEqual (which should be faster). PiperOrigin-RevId: 214027024 Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc
2018-09-21	Export read-only tcpip.Subnet.Mask	Tamir Duberstein
	PiperOrigin-RevId: 214023383 Change-Id: I5a7572f949840fb68a3ffb7342e6a3524bd00864
2018-09-19	Fix data race on tcp.endpoint.hardError in tcp.(*endpoint).Read	Ian Gudger
	tcp.endpoint.hardError is protected by tcp.endpoint.mu. PiperOrigin-RevId: 213730698 Change-Id: I4e4f322ac272b145b500b1a652fbee0c7b985be2
2018-09-19	Pass local link address to DeliverNetworkPacket	Bert Muthalaly
	This allows a NetworkDispatcher to implement transparent bridging, assuming all implementations of LinkEndpoint.WritePacket call eth.Encode with header.EthernetFields.SrcAddr set to the passed Route.LocalLinkAddress, if it is provided. PiperOrigin-RevId: 213686651 Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-19	Fix RTT estimation when timestamp option is enabled.	Bhasker Hariharan
	From RFC7323#Section-4:
	  The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an implicit assumption that at most one RTTM will be sampled per RTT. When multiple RTTMs per RTT are available to update the RTT estimator, an implementation SHOULD try to adhere to the spirit of the history specified in [RFC6298]. An implementation suggestion is detailed in Appendix G.
	From RFC7323#appendix-G:
	  Appendix G. RTO Calculation Modification
	  Taking multiple RTT samples per window would shorten the history calculated by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a similar history as originally intended by [RFC6298].
	  It is roughly known how many samples a congestion window worth of data will yield, not accounting for ACK compression, and ACK losses. Such events will result in more history of the path being reflected in the final value for RTO, and are uncritical. This modification will ensure that a similar amount of time is taken into account for the RTO estimation, regardless of how many samples are taken per window:
	    ExpectedSamples = ceiling(FlightSize / (SMSS * 2))
	    alpha' = alpha / ExpectedSamples
	    beta' = beta / ExpectedSamples
	  Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".
	  Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and beta' instead:
	    RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|
	    SRTT <- (1 - alpha') * SRTT + alpha' * R'
	    (for each sample R')
	PiperOrigin-RevId: 213644795 Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e
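The Appendix G update can be written out directly. A minimal sketch, not netstack's estimator: it uses float64 seconds for readability, takes alpha = 1/8 and beta = 1/4 from RFC 6298, and clamps ExpectedSamples to at least 1, an assumption of ours to avoid dividing by zero when FlightSize is 0.

```go
package main

import (
	"fmt"
	"math"
)

// Weighting factors from RFC 6298.
const (
	alpha = 1.0 / 8
	beta  = 1.0 / 4
)

// rttEstimator keeps SRTT and RTTVAR in seconds.
type rttEstimator struct {
	srtt, rttvar float64
	seeded       bool
}

// update folds one RTT sample r into the estimate, scaling alpha and beta
// by the expected number of samples per window (RFC 7323, Appendix G).
func (e *rttEstimator) update(r float64, flightSize, smss int) {
	if !e.seeded {
		// First measurement, RFC 6298 rule (2.2).
		e.srtt, e.rttvar, e.seeded = r, r/2, true
		return
	}
	expected := math.Ceil(float64(flightSize) / float64(smss*2))
	if expected < 1 {
		expected = 1 // assumption: avoid dividing by zero for an empty flight
	}
	a, b := alpha/expected, beta/expected
	// RTTVAR must use the old SRTT, so update it first.
	e.rttvar = (1-b)*e.rttvar + b*math.Abs(e.srtt-r)
	e.srtt = (1-a)*e.srtt + a*r
}

func main() {
	var e rttEstimator
	e.update(0.100, 4*1460, 1460) // seeds SRTT = 0.1 s, RTTVAR = 0.05 s
	e.update(0.200, 4*1460, 1460) // ExpectedSamples = 2, so alpha' = 1/16, beta' = 1/8
	fmt.Printf("SRTT=%.5f RTTVAR=%.5f\n", e.srtt, e.rttvar) // SRTT=0.10625 RTTVAR=0.05625
}
```

With four segments in flight, each sample moves SRTT half as far as an unscaled RFC 6298 update would, which is the history-preserving effect the appendix describes.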
2018-09-18	Short-circuit Readdir calls on overlay files when the dirent is frozen.	Nicolas Lacasse
	If we have an overlay file whose corresponding Dirent is frozen, then we should not bother calling Readdir on the upper or lower files, since DirentReaddir will calculate children based on the frozen Dirent tree. A test was added that fails without this change. PiperOrigin-RevId: 213531215 Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d
2018-09-18	Increase state test timeout	Michael Pratt
	PiperOrigin-RevId: 213519378 Change-Id: Iffdb987da3a7209a297ea2df171d2ae5fa9b2b34
2018-09-18	Allow for MSG_CTRUNC in input flags for recv.	Brian Geffon
	PiperOrigin-RevId: 213481363 Change-Id: I8150ea20cebeb207afe031ed146244de9209e745
2018-09-18	Provide better message when memfd_create fails with ENOSYS	Fabricio Voznika
	Updates #100 PiperOrigin-RevId: 213414821 Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37
2018-09-17	Remove memory usage static init	Fabricio Voznika
	panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17	Prevent TCP connect from picking bound ports	Tamir Duberstein
	PiperOrigin-RevId: 213387851 Change-Id: Icc6850761bc11afd0525f34863acd77584155140
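The essence of this fix is that connect's ephemeral-port search must consult the same reservation table that bind uses. A minimal sketch, assuming a hypothetical `portTable` keyed only by port number (a real stack also distinguishes protocol and address) and a linear scan from the bottom of the range (real stacks typically randomize the starting point):

```go
package main

import (
	"errors"
	"fmt"
)

var errNoFreePort = errors.New("no free ephemeral port")

// portTable is a hypothetical reservation table keyed only by port number.
type portTable struct{ reserved map[uint16]bool }

func newPortTable() *portTable { return &portTable{reserved: map[uint16]bool{}} }

// bind reserves a specific port, as bind(2) does.
func (pt *portTable) bind(p uint16) error {
	if pt.reserved[p] {
		return errors.New("address in use")
	}
	pt.reserved[p] = true
	return nil
}

// pickForConnect reserves the first free port in [lo, hi]. Checking the
// table that bind also uses keeps connect from taking a port held by a
// bound but unconnected socket. The uint32 counter avoids overflow at 65535.
func (pt *portTable) pickForConnect(lo, hi uint16) (uint16, error) {
	for p := uint32(lo); p <= uint32(hi); p++ {
		if !pt.reserved[uint16(p)] {
			pt.reserved[uint16(p)] = true
			return uint16(p), nil
		}
	}
	return 0, errNoFreePort
}

func main() {
	pt := newPortTable()
	_ = pt.bind(32768) // a socket is bound but not connected
	p, _ := pt.pickForConnect(32768, 32770)
	fmt.Println(p) // 32769
}
```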
2018-09-17	runsc: Enable waiting on exited processes.	Kevin Krakauer
	This makes `runsc wait` behave more like waitpid()/wait4() in that:
	- Once a process has run to completion, you can wait on it and get its exit code.
	- Processes not waited on will consume memory (like a zombie process).
	PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
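The zombie-like behavior can be sketched as an exit-status table that is only cleared when someone waits. A minimal sketch, assuming a hypothetical `procTable` type (not runsc's implementation) that uses a `sync.Cond` so `wait` can block on running processes:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSuchProcess = errors.New("no such process")

// procTable is a hypothetical record of container processes. Exit statuses
// are kept, like zombies, until someone waits on them.
type procTable struct {
	mu      sync.Mutex
	cond    *sync.Cond
	running map[int]bool
	exited  map[int]int // pid -> exit status, held until reaped
}

func newProcTable() *procTable {
	pt := &procTable{running: map[int]bool{}, exited: map[int]int{}}
	pt.cond = sync.NewCond(&pt.mu)
	return pt
}

func (pt *procTable) start(pid int) {
	pt.mu.Lock()
	pt.running[pid] = true
	pt.mu.Unlock()
}

func (pt *procTable) exit(pid, status int) {
	pt.mu.Lock()
	delete(pt.running, pid)
	pt.exited[pid] = status
	pt.mu.Unlock()
	pt.cond.Broadcast()
}

// wait blocks until pid exits, or returns at once if it already has, then
// reaps the entry so its memory is released.
func (pt *procTable) wait(pid int) (int, error) {
	pt.mu.Lock()
	defer pt.mu.Unlock()
	for {
		if status, ok := pt.exited[pid]; ok {
			delete(pt.exited, pid)
			return status, nil
		}
		if !pt.running[pid] {
			return 0, errNoSuchProcess
		}
		pt.cond.Wait()
	}
}

func main() {
	pt := newProcTable()
	pt.start(7)
	pt.exit(7, 3)           // exits before anyone waits
	fmt.Println(pt.wait(7)) // 3 <nil>
	fmt.Println(pt.wait(7)) // 0 no such process
}
```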
2018-09-17	Allow kernel.(*Task).Block to accept an extract only channel	Ian Gudger
	PiperOrigin-RevId: 213328293 Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
2018-09-17	Add empty .s file to allow `//go:linkname`	Tamir Duberstein
	This was previously broken in 212917409, resulting in "missing function body" compilation errors. PiperOrigin-RevId: 213323695 Change-Id: I32a95b76a1c73fd731f223062ec022318b979bd4