gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2018-09-28	Block for link address resolution	Sepehr Raissian
	Previously, if address resolution for UDP or Ping sockets required sending packets using Write in Transport layer, Resolve would return ErrWouldBlock and Write would return ErrNoLinkAddress. Meanwhile startAddressResolution would run in background. Further calls to Write using same address would also return ErrNoLinkAddress until resolution has been completed successfully. Since Write is not allowed to block and System Calls need to be interruptible in System Call layer, the caller to Write is responsible for blocking upon return of ErrWouldBlock. Now, when startAddressResolution is called a notification channel for the completion of the address resolution is returned. The channel will traverse up to the calling function of Write as well as ErrNoLinkAddress. Once address resolution is complete (success or not) the channel is closed. The caller would call Write again to send packets and check if address resolution was compeleted successfully or not. Fixes google/gvisor#5 Change-Id: Idafaf31982bee1915ca084da39ae7bd468cebd93 PiperOrigin-RevId: 214962200
2018-09-27	Forward ioctl(TCSETSF) calls on host ttys to the host kernel.	Nicolas Lacasse
	We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but discards pending input. The filters were relaxed to allow host ioctls with TCSETSF argument. This fixes programs like "passwd" that prevent user input from being displayed on the terminal. Before: root@b8a0240fc836:/# passwd Enter new UNIX password: 123 Retype new UNIX password: 123 passwd: password updated successfully After: root@ae6f5dabe402:/# passwd Enter new UNIX password: Retype new UNIX password: passwd: password updated successfully PiperOrigin-RevId: 214869788 Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
2018-09-27	Implement 'runsc kill --all'	Fabricio Voznika
	In order to implement kill --all correctly, the Sentry needs to track all tasks that belong to a given container. This change introduces ContainerID to the task, that gets inherited by all children. 'kill --all' then iterates over all tasks comparing the ContainerID field to find all processes that need to be signalled. PiperOrigin-RevId: 214841768 Change-Id: I693b2374be8692d88cc441ef13a0ae34abf73ac6
2018-09-27	netstack: make go:linkname work for all architectures	Anton Gyllenberg
	The //go:linkname directive requires the presence of assembly files in the package. Even an empty file will do. There was an empty assembly file commit_arm64.s, but that is limited to GOARCH=arm64. Renaming to empty.s will remove the unnecessary build constraint and allow building netstack for other architectures than amd64 and arm64. Without this, building directly with go (not bazel) for e.g., GOARCH=arm gives: sleep/sleep_unsafe.go:88:6: missing function body sleep/sleep_unsafe.go:91:6: missing function body Change-Id: I29d1d13e1ff31506a174d4595b8cd57fa58bf52b PiperOrigin-RevId: 214820299
2018-09-27	sentry: export cpuTime function.	Zhaozhong Ni
	PiperOrigin-RevId: 214798278 Change-Id: Id59d1ceb35037cda0689d3a1c4844e96c6957615
2018-09-26	Return correct parent PID	Fabricio Voznika
	Old code was returning ID of the thread that created the child process. It should be returning the ID of the parent process instead. PiperOrigin-RevId: 214720910 Change-Id: I95715c535bcf468ecf1ae771cccd04a4cd345b36
2018-09-26	Use the ICMP target address in responses	Tamir Duberstein
	There is a subtle bug that is the result of two changes made when upstreaming ICMPv6 support from Fuchsia: 1) ipv6.endpoint.WritePacket writes the local address it was initialized with, rather than the provided route's local address 2) ipv6.endpoint.handleICMP doesn't set its route's local address to the ICMP target address before writing the response The result is that the ICMP response erroneously uses the target ipv6 address (rather than icmp) as its source address in the response. When trying to debug this by fixing (2), we ran into problems with bad ipv6 checksums because (1) didn't respect the local address of the route being passed to it. This fixes both problems. PiperOrigin-RevId: 214650822 Change-Id: Ib6148bf432e6428d760ef9da35faef8e4b610d69
2018-09-26	Export ipv6 address helpers	Tamir Duberstein
	This is useful for Fuchsia. PiperOrigin-RevId: 214619681 Change-Id: If5a60dd82365c2eae51a12bbc819e5aae8c76ee9
2018-09-21	Remove unnecessary defer	Ian Gudger
	PiperOrigin-RevId: 214073949 Change-Id: I8fab916cd77362c13dac2c9dcf2ecc1710d87a5e
2018-09-21	Run gofmt -s on everything	Ian Gudger
	PiperOrigin-RevId: 214040901 Change-Id: I74d79497a053da3624921ad2b7c5193ca4a87942
2018-09-21	Extend tcpip.Address.String to ipv6 addresses	Tamir Duberstein
	PiperOrigin-RevId: 214039349 Change-Id: Ia7d09c5f85eddd1e5634f3c21b0bd60b10be6bd2
2018-09-21	Deflake TestSimpleReceive	Tamir Duberstein
	...by increasing the allotted timeout and using direct comparison rather than reflect.DeepEqual (which should be faster). PiperOrigin-RevId: 214027024 Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc
2018-09-21	Export read-only tcpip.Subnet.Mask	Tamir Duberstein
	PiperOrigin-RevId: 214023383 Change-Id: I5a7572f949840fb68a3ffb7342e6a3524bd00864
2018-09-19	Fix data race on tcp.endpoint.hardError in tcp.(*endpoint).Read	Ian Gudger
	tcp.endpoint.hardError is protected by tcp.endpoint.mu. PiperOrigin-RevId: 213730698 Change-Id: I4e4f322ac272b145b500b1a652fbee0c7b985be2
2018-09-19	Pass local link address to DeliverNetworkPacket	Bert Muthalaly
	This allows a NetworkDispatcher to implement transparent bridging, assuming all implementations of LinkEndpoint.WritePacket call eth.Encode with header.EthernetFields.SrcAddr set to the passed Route.LocalLinkAddress, if it is provided. PiperOrigin-RevId: 213686651 Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
2018-09-19	Fix RTT estimation when timestamp option is enabled.	Bhasker Hariharan
	From RFC7323#Section-4 The [RFC6298] RTT estimator has weighting factors, alpha and beta, based on an implicit assumption that at most one RTTM will be sampled per RTT. When multiple RTTMs per RTT are available to update the RTT estimator, an implementation SHOULD try to adhere to the spirit of the history specified in [RFC6298]. An implementation suggestion is detailed in Appendix G. From RFC7323#appendix-G Appendix G. RTO Calculation Modification Taking multiple RTT samples per window would shorten the history calculated by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a similar history as originally intended by [RFC6298]. It is roughly known how many samples a congestion window worth of data will yield, not accounting for ACK compression, and ACK losses. Such events will result in more history of the path being reflected in the final value for RTO, and are uncritical. This modification will ensure that a similar amount of time is taken into account for the RTO estimation, regardless of how many samples are taken per window: ExpectedSamples = ceiling(FlightSize / (SMSS * 2)) alpha' = alpha / ExpectedSamples beta' = beta / ExpectedSamples Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs". Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and beta' instead: RTTVAR <- (1 - beta') * RTTVAR + beta' * \|SRTT - R'\| SRTT <- (1 - alpha') * SRTT + alpha' * R' (for each sample R') PiperOrigin-RevId: 213644795 Change-Id: I52278b703540408938a8edb8c38be97b37f4a10e
2018-09-18	Short-circuit Readdir calls on overlay files when the dirent is frozen.	Nicolas Lacasse
	If we have an overlay file whose corresponding Dirent is frozen, then we should not bother calling Readdir on the upper or lower files, since DirentReaddir will calculate children based on the frozen Dirent tree. A test was added that fails without this change. PiperOrigin-RevId: 213531215 Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d
2018-09-18	Increase state test timeout	Michael Pratt
	PiperOrigin-RevId: 213519378 Change-Id: Iffdb987da3a7209a297ea2df171d2ae5fa9b2b34
2018-09-18	Allow for MSG_CTRUNC in input flags for recv.	Brian Geffon
	PiperOrigin-RevId: 213481363 Change-Id: I8150ea20cebeb207afe031ed146244de9209e745
2018-09-18	Provide better message when memfd_create fails with ENOSYS	Fabricio Voznika
	Updates #100 PiperOrigin-RevId: 213414821 Change-Id: I90c2e6c18c54a6afcd7ad6f409f670aa31577d37
2018-09-17	Remove memory usage static init	Fabricio Voznika
	panic() during init() can be hard to debug. Updates #100 PiperOrigin-RevId: 213391932 Change-Id: Ic103f1981c5b48f1e12da3b42e696e84ffac02a9
2018-09-17	Prevent TCP connect from picking bound ports	Tamir Duberstein
	PiperOrigin-RevId: 213387851 Change-Id: Icc6850761bc11afd0525f34863acd77584155140
2018-09-17	runsc: Enable waiting on exited processes.	Kevin Krakauer
	This makes `runsc wait` behave more like waitpid()/wait4() in that: - Once a process has run to completion, you can wait on it and get its exit code. - Processes not waited on will consume memory (like a zombie process) PiperOrigin-RevId: 213358916 Change-Id: I5b5eca41ce71eea68e447380df8c38361a4d1558
2018-09-17	Allow kernel.(*Task).Block to accept an extract only channel	Ian Gudger
	PiperOrigin-RevId: 213328293 Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
2018-09-17	Add empty .s file to allow `//go:linkname`	Tamir Duberstein
	This was previously broken in 212917409, resulting in "missing function body" compilation errors. PiperOrigin-RevId: 213323695 Change-Id: I32a95b76a1c73fd731f223062ec022318b979bd4
2018-09-17	Implement packet forwarding to enable NAT	Tamir Duberstein
	PiperOrigin-RevId: 213323501 Change-Id: I0996ddbdcf097588745efe35481085d42dbaf446
2018-09-17	Allow NULL data in mount(2)	Michael Pratt
	PiperOrigin-RevId: 213315267 Change-Id: I7562bcd81fb22e90aa9c7dd9eeb94803fcb8c5af
2018-09-14	Avoid reuse of pending SignalInfo objects	newmanwang
	runApp.execute -> Task.SendSignal -> sendSignalLocked -> sendSignalTimerLocked -> pendingSignals.enqueue assumes that it owns the arch.SignalInfo returned from platform.Context.Switch. On the other hand, ptrace.context.Switch assumes that it owns the returned SignalInfo and can safely reuse it on the next call to Switch. The KVM platform always returns a unique SignalInfo. This becomes a problem when the returned signal is not immediately delivered, allowing a future signal in Switch to change the previous pending SignalInfo. This is noticeable in #38 when external SIGINTs are delivered from the PTY slave FD. Note that the ptrace stubs are in the same process group as the sentry, so they are eligible to receive the PTY signals. This should probably change, but is not the only possible cause of this bug. Updates #38 Original change by newmanwang <wcs1011@gmail.com>, updated by Michael Pratt <mpratt@google.com>. Change-Id: I5383840272309df70a29f67b25e8221f933622cd PiperOrigin-RevId: 213071072
2018-09-14	Remove buffer.Prependable.UsedBytes	Tamir Duberstein
	It is the same as buffer.Prependable.View. PiperOrigin-RevId: 213064166 Change-Id: Ib33b8a2c4da864209d9a0be0a1c113be10b520d3
2018-09-14	Reuse readlink parameter, add sockaddr max.	Michael Pratt
	PiperOrigin-RevId: 213058623 Change-Id: I522598c655d633b9330990951ff1c54d1023ec29
2018-09-14	Pass buffer.Prependable by value	Tamir Duberstein
	PiperOrigin-RevId: 213053370 Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
2018-09-14	Make gVisor hard link check match Linux's.	Nicolas Lacasse
	Linux permits hard-linking if the target is owned by the user OR the target has Read+Write permission. PiperOrigin-RevId: 213024613 Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3
2018-09-14	Fix interaction between rt_sigtimedwait and ignored signals.	Jamie Liu
	PiperOrigin-RevId: 213011782 Change-Id: I716c6ea3c586b0c6c5a892b6390d2d11478bc5af
2018-09-13	platform/kvm: Get max vcpu number dynamically by ioctl	Chenggang
	The old kernel version, such as 4.4, only support 255 vcpus. While gvisor is ran on these kernels, it could panic because the vcpu id and vcpu number beyond max_vcpus. Use ioctl(vmfd, _KVM_CHECK_EXTENSION, _KVM_CAP_MAX_VCPUS) to get max vcpus number dynamically. Change-Id: I50dd859a11b1c2cea854a8e27d4bf11a411aa45c PiperOrigin-RevId: 212929704
2018-09-13	Plumb monotonic time to netstack	Ian Gudger
	Netstack needs to be portable, so this seems to be preferable to using raw system calls. PiperOrigin-RevId: 212917409 Change-Id: I7b2073e7db4b4bf75300717ca23aea4c15be944c
2018-09-13	Extend memory usage events to report mapped memory usage.	Rahat Mahmood
	PiperOrigin-RevId: 212887555 Change-Id: I3545383ce903cbe9f00d9b5288d9ef9a049b9f4f
2018-09-13	Format struct itimerspec	Michael Pratt
	PiperOrigin-RevId: 212874745 Change-Id: I0c3e8e6a9e8976631cee03bf0b8891b336ddb8c8
2018-09-13	initArgs must hold a reference on the Root if it is not nil.	Nicolas Lacasse
	The contract in ExecArgs says that a reference on ExecArgs.Root must be held for the lifetime of the struct, but the caller is free to drop the ref after that. As a result, proc.Exec must take an additional ref on Root when it constructs the CreateProcessArgs, since that holds a pointer to Root as well. That ref is dropped in CreateProcess. PiperOrigin-RevId: 212828348 Change-Id: I7f44a612f337ff51a02b873b8a845d3119408707
2018-09-12	Always pass buffer.VectorisedView by value	Tamir Duberstein
	PiperOrigin-RevId: 212757571 Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
2018-09-12	Add multicast support	Tamir Duberstein
	PiperOrigin-RevId: 212750821 Change-Id: I822fd63e48c684b45fd91f9ce057867b7eceb792
2018-09-12	compressio: stop worker-pool reference / dependency loop.	Zhaozhong Ni
	PiperOrigin-RevId: 212732300 Change-Id: I9a0b9b7c28e7b7439d34656dd4f2f6114d173e22
2018-09-12	runsc: Add exec flag that specifies where to save the sandbox-internal pid.	Kevin Krakauer
	This is different from the existing -pid-file flag, which saves a host pid. PiperOrigin-RevId: 212713968 Change-Id: I2c486de8dd5cfd9b923fb0970165ef7c5fc597f0
2018-09-12	Prevent UDP sockets from binding to bound ports	Tamir Duberstein
	PiperOrigin-RevId: 212653818 Change-Id: Ib4e1d754d9cdddeaa428a066cb675e6ec44d91ad
2018-09-11	platform: Pass device fd into platform constructor.	Nicolas Lacasse
	We were previously openining the platform device (i.e. /dev/kvm) inside the platfrom constructor (i.e. kvm.New). This requires that we have RW access to the platform device when constructing the platform. However, now that the runsc sandbox process runs as user "nobody", it is not able to open the platform device. This CL changes the kvm constructor to take the platform device FD, rather than opening the device file itself. The device file is opened outside of the sandbox and passed to the sandbox process. PiperOrigin-RevId: 212505804 Change-Id: I427e1d9de5eb84c84f19d513356e1bb148a52910
2018-09-10	Map committed chunks concurrently in FileMem.LoadFrom.	Jamie Liu
	PiperOrigin-RevId: 212345401 Change-Id: Iac626ee87ba312df88ab1019ade6ecd62c04c75c
2018-09-10	Allow '/dev/zero' to be mapped with unaligned length	Fabricio Voznika
	PiperOrigin-RevId: 212321271 Change-Id: I79d71c2e6f4b8fcd3b9b923fe96c2256755f4c48
2018-09-10	Simplify some code in VectorisedView#ToView.	Bert Muthalaly
	PiperOrigin-RevId: 212317717 Change-Id: Ic77449c53bf2f8be92c9f0a7a726c45bd35ec435
2018-09-07	Update cleanup TODO	Michael Pratt
	PiperOrigin-RevId: 212068327 Change-Id: I3f360cdf7d6caa1c96fae68ae3a1caaf440f0cbe
2018-09-07	runsc: Support multi-container exec.	Nicolas Lacasse
	We must use a context.Context with a Root Dirent that corresponds to the container's chroot. Previously we were using the root context, which does not have a chroot. Getting the correct context required refactoring some of the path-lookup code. We can't lookup the path without a context.Context, which requires kernel.CreateProcArgs, which we only get inside control.Execute. So we have to do the path lookup much later than we previously were. PiperOrigin-RevId: 212064734 Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
2018-09-07	Add 'Starting gVisor...' message to syslog	Fabricio Voznika
	This allows applications to verify they are running with gVisor. It also helps debugging when running with a mix of container runtimes. Closes #54 PiperOrigin-RevId: 212059457 Change-Id: I51d9595ee742b58c1f83f3902ab2e2ecbd5cedec