gvisor - Container Runtime Sandbox

Age	Commit message	Author
2021-01-14	Simplify the pipe implementation.	Jamie Liu
	- Remove the pipe package's dependence on the buffer package, which becomes unused as a result. The buffer package is currently intended to serve two use cases, pipes and temporary buffers, and does neither optimally as a result; this change facilitates retooling the buffer package to better serve the latter.
	- Pass callbacks taking safemem.BlockSeq to the internal pipe I/O methods, which makes most callbacks trivial.
	- Fix VFS1's splice() and tee() to immediately return if a pipe returns a partial write.
	PiperOrigin-RevId: 351911375
2021-01-12	Fix simple mistakes identified by goreportcard.	Adin Scannell
	These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2021-01-07	Implement the semtimedop syscall	Andrei Vagin
	Signed-off-by: Andrei Vagin <avagin@gmail.com>
2021-01-05	Fix panic when parsing SO_TIMESTAMP cmsg	Kevin Krakauer
	PiperOrigin-RevId: 350223482
2020-12-31	Add missing error checks for FileDescription.Init.	Dean Deng
	Syzkaller discovered this bug in pipefs by doing something quite strange:
	creat(&(0x7f0000002a00)='./file1\x00', 0x0)
	mount(&(0x7f0000000440)=ANY=[], &(0x7f00000002c0)='./file1\x00', &(0x7f0000000300)='devtmpfs\x00', 0x20000d, 0x0)
	creat(&(0x7f0000000000)='./file1/file0\x00', 0x0)
	This can be reproduced with:
	touch mymount
	mkfifo /dev/mypipe
	mount -o ro -t devtmpfs devtmpfs mymount
	echo 123 > mymount/mypipe
	PiperOrigin-RevId: 349687714
2020-12-23	vfs1: don't allow to open socket files	Andrei Vagin
	open() has to return ENXIO in this case. O_PATH isn't supported by vfs1. PiperOrigin-RevId: 348820478
2020-12-17	[netstack] Implement MSG_ERRQUEUE flag for recvmsg(2).	Ayush Ranjan
	Introduces the per-socket error queue and the necessary cmsg mechanisms. PiperOrigin-RevId: 348028508
2020-12-15	Implement command SEM_INFO and SEM_STAT for semctl.	Jing Chen
	PiperOrigin-RevId: 347711998
2020-12-15	[syzkaller] Avoid AIOContext from resurrecting after being marked dead.	Ayush Ranjan
	syzkaller reported the closing of a nil channel. This is only possible when the AIOContext was destroyed twice. Some scenarios that could lead to this:
	- It died, then someone called aioCtx.Prepare() on it and killed it again, causing the double destroy. The context could have been destroyed between the call to LookupAIOContext() and Prepare().
	- aioManager was destroyed but did not update the contexts map, so Lookup could still return a dead AIOContext, and someone could then call Prepare on it and kill it again.
	So added a check in aioCtx.Prepare() for the context being dead. This prevents a dead context from resurrecting. Also refactored code to destroy the aioContext consistently. Earlier we were not munmapping the aioContexts that were destroyed upon aioManager destruction.
	Reported-by: syzbot+ef6a588d0ce6059991d2@syzkaller.appspotmail.com
	PiperOrigin-RevId: 347704347
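The dead-context check described above can be illustrated with a minimal Go sketch. It is not the sentry's code: the `aioContext` type, its fields, and the `prepare`/`destroy` methods are hypothetical names chosen for this example. It assumes the context owns a channel that must be closed exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

// aioContext is a hypothetical stand-in for an AIO context; it is not
// the type used by gVisor.
type aioContext struct {
	mu          sync.Mutex
	dead        bool
	outstanding int
	done        chan struct{} // closed exactly once, by destroy
}

func newAIOContext() *aioContext {
	return &aioContext{done: make(chan struct{})}
}

// prepare reserves a slot for a new request. It refuses to do so once
// the context is dead, so a dead context is never brought back to life.
func (c *aioContext) prepare() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.dead {
		return false
	}
	c.outstanding++
	return true
}

// destroy marks the context dead. It is idempotent: only the first
// call closes the channel, so a second call cannot panic.
func (c *aioContext) destroy() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.dead {
		return
	}
	c.dead = true
	close(c.done)
}

func main() {
	ctx := newAIOContext()
	fmt.Println(ctx.prepare()) // true
	ctx.destroy()
	fmt.Println(ctx.prepare()) // false
	ctx.destroy()              // second destroy is a no-op
}
```

Checking the flag under the same lock that `destroy` takes is what closes the window between looking a context up and preparing a request on it.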
2020-12-11	Remove existing nogo exceptions.	Adin Scannell
	PiperOrigin-RevId: 347047550
2020-12-11	[netstack] Decouple tcpip.ControlMessages from the IP control messages.	Ayush Ranjan
	tcpip.ControlMessages can not contain Linux specific structures which makes it painful to convert back and forth from Linux to tcpip back to Linux when passing around control messages in hostinet and raw sockets. Now we convert to the Linux version of the control message as soon as we are out of tcpip. PiperOrigin-RevId: 347027065
2020-12-11	Make semctl IPC_INFO cmd return the index of highest used entry.	Jing Chen
	PiperOrigin-RevId: 346973338
2020-12-09	Add support for IP_RECVORIGDSTADDR IP option.	Bhasker Hariharan
	Fixes #5004 PiperOrigin-RevId: 346643745
2020-12-03	Implement command IPC_INFO for semctl.	Jing Chen
	PiperOrigin-RevId: 345589628
2020-12-03	Implement `fcntl` options `F_GETSIG` and `F_SETSIG`.	Etienne Perot
	These options allow overriding the signal that gets sent to the process when I/O operations are available on the file descriptor, rather than the default `SIGIO` signal. Doing so also populates `siginfo` to contain extra information about which file descriptor caused the event (`si_fd`) and what events happened on it (`si_band`). The logic around which FD is populated within `si_fd` matches Linux's, which means it has some weird edge cases where that value may not actually refer to a file descriptor that is still valid. This CL also ports extra S/R logic regarding async handler in VFS2. Without this, async I/O handlers aren't properly re-registered after S/R. PiperOrigin-RevId: 345436598
2020-11-16	Reset watchdog timer between sendfile() iterations.	Jamie Liu
	As part of this, change Task.interrupted() to not drain Task.interruptChan, and do so explicitly using new function Task.unsetInterrupted() instead. PiperOrigin-RevId: 342768365
2020-11-16	Allow RLIMIT_RSS to be set	Fabricio Voznika
	Closes #4746 PiperOrigin-RevId: 342747165
2020-11-12	Rename kernel.TaskContext to kernel.TaskImage.	Jamie Liu
	This reduces confusion with context.Context (which is also relevant to kernel.Tasks) and is consistent with existing function kernel.LoadTaskImage(). PiperOrigin-RevId: 342167298
2020-11-06	Implement command GETNCNT for semctl.	Jing Chen
	PiperOrigin-RevId: 341154192
2020-11-06	Fix infinite loop when splicing to pipes/eventfds.	Nicolas Lacasse
	Writes to pipes of size < PIPE_BUF are guaranteed to be atomic, so writes larger than that will return EAGAIN if the pipe has capacity < PIPE_BUF. Writes to eventfds will return EAGAIN if the write would cause the eventfd value to go over the max. In both such cases, calling Ready() on the FD will return true (because it is possible to write), but specific kinds of writes will in fact return EAGAIN. This CL fixes an infinite loop in splice and sendfile (VFS1 and VFS2) by forcing skipping the readiness check for the outfile in send, splice, and tee. PiperOrigin-RevId: 341102260
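Why a readiness check can disagree with the write itself is easier to see in a toy model. The sketch below is an assumption-laden illustration, not gVisor's pipe: the `pipe` type, `ready`, and `write` are hypothetical, and the write rule follows the non-blocking semantics documented in pipe(7), where writes of at most PIPE_BUF bytes are all-or-nothing.

```go
package main

import (
	"errors"
	"fmt"
)

const pipeBuf = 4096 // PIPE_BUF on Linux

var errAgain = errors.New("EAGAIN")

// pipe is a toy model that tracks only capacity, not data.
type pipe struct {
	capacity, used int
}

// ready reports whether some write could make progress.
func (p *pipe) ready() bool { return p.used < p.capacity }

// write models a non-blocking write of n bytes. Writes of at most
// pipeBuf bytes are atomic: they either fit entirely or fail with
// errAgain. Larger writes may be partial.
func (p *pipe) write(n int) (int, error) {
	free := p.capacity - p.used
	if free == 0 || (n <= pipeBuf && free < n) {
		return 0, errAgain
	}
	if n > free {
		n = free
	}
	p.used += n
	return n, nil
}

func main() {
	// Only 100 bytes are free, so the pipe is "ready" for writing,
	// yet an atomic 4096-byte write cannot succeed.
	p := &pipe{capacity: 65536, used: 65536 - 100}
	_, err := p.write(pipeBuf)
	fmt.Println(p.ready(), err) // true EAGAIN
}
```

A loop of the form "wait until ready, then retry the write" never makes progress in this state, which is the shape of the infinite loop the commit describes.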
2020-11-03	Make pipe min/max sizes match linux.	Nicolas Lacasse
	The default pipe size already matched linux, and is unchanged. Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We were plumbing usermem.PageSize everywhere, so this is no functional change. PiperOrigin-RevId: 340497006
2020-11-02	Implement command GETZCNT for semctl.	Jing Chen
	PiperOrigin-RevId: 340389884
2020-10-26	Implement command IPC_STAT for semctl.	Jing Chen
	PiperOrigin-RevId: 339166854
2020-10-19	Fix runsc tests on VFS2 overlay.	Jamie Liu
	- Check the sticky bit in overlay.filesystem.UnlinkAt(). Fixes StickyTest.StickyBitPermDenied.
	- When configuring a VFS2 overlay in runsc, copy the lower layer's root owner/group/mode to the upper layer's root (as in the VFS1 equivalent, boot.addOverlay()). This makes the overlay root owned by UID/GID 65534 with mode 0755 rather than owned by UID/GID 0 with mode 01777. Fixes CreateTest.CreateFailsOnUnpermittedDir, which assumes that the test cannot create files in /.
	- MknodTest.UnimplementedTypesReturnError assumes that the creation of device special files is not supported. However, while the VFS2 gofer client still doesn't support device special files, VFS2 tmpfs does, and in the overlay test dimension mknod() targets a tmpfs upper layer. The test initially has all capabilities, including CAP_MKNOD, so its creation of these files succeeds. Constrain these tests to VFS1.
	- Rename overlay.nonDirectoryFD to overlay.regularFileFD and only use it for regular files, using the original FD for pipes and device special files. This is more consistent with Linux (which gets the original inode_operations, and therefore file_operations, for these file types from ovl_fill_inode() => init_special_inode()) and fixes remaining mknod and pipe tests.
	- Read/write 1KB at a time in PipeTest.Streaming, rather than 4 bytes. This isn't strictly necessary, but it makes the test less obnoxiously slow on ptrace.
	Fixes #4407
	PiperOrigin-RevId: 337971042
2020-10-19	splice: return EINVAL if len is negative	Andrei Vagin
	Reported-by: syzbot+0268cc591c0f517a1de0@syzkaller.appspotmail.com PiperOrigin-RevId: 337901664
2020-10-19	pgalloc: Do not hold MemoryFile.mu while calling mincore.	Ayush Ranjan
	This change does the following:
	- Unlocks MemoryFile.mu while calling mincore (checkCommitted), because mincore can take a really long time. Accordingly, looks up the segment in the tree again and handles changes to the segment.
	- MemoryFile.UpdateUsage() can now be called at a frequency of at most 100 Hz. 100 Hz = linux.CLOCKS_PER_SEC.
	Co-authored-by: Jamie Liu <jamieliu@google.com>
	PiperOrigin-RevId: 337865250
2020-10-14	Fix SCM Rights reference leaks.	Dean Deng
	Control messages should be released on Read (which ignores the control message) or zero-byte Send. Otherwise, open fds sent through the control messages will be leaked. PiperOrigin-RevId: 337110774
2020-10-13	[vfs2] Don't take reference in Task.MountNamespaceVFS2 and MountNamespace.Root.	Dean Deng
	This fixes reference leaks related to accidentally forgetting to DecRef() after calling one or the other. PiperOrigin-RevId: 336918922
2020-10-09	Reduce the cost of sysinfo(2).	Jamie Liu
	- sysinfo(2) does not actually require a fine-grained breakdown of memory usage. Accordingly, instead of calling pgalloc.MemoryFile.UpdateUsage() to update the sentry's fine-grained memory accounting snapshot, just use pgalloc.MemoryFile.TotalUsage() (which is a single fstat(), and therefore far cheaper).
	- Use the number of threads in the root PID namespace (i.e. globally) rather than in the task's PID namespace for consistency with Linux (which just reads global variable nr_threads), and add a new method to kernel.PIDNamespace to allow this to be read directly from an underlying map rather than requiring the allocation and population of an intermediate slice.
	PiperOrigin-RevId: 336353100
2020-10-09	syscalls: Don't leak a file on the error path	Andrei Vagin
	Reported-by: syzbot+bb82fb556d5d0a43f632@syzkaller.appspotmail.com PiperOrigin-RevId: 336324720
2020-10-08	Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.	Jamie Liu
	cf. 2a36ab717e8f "rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ" PiperOrigin-RevId: 336186795
2020-10-06	Implement membarrier(2) commands other than *_SYNC_CORE.	Jamie Liu
	Updates #267 PiperOrigin-RevId: 335713923
2020-10-05	Merge pull request #4368 from zhlhahaha:1979	gVisor bot
	PiperOrigin-RevId: 335492800
2020-10-02	Merge pull request #4035 from lubinszARM:pr_misc_01	gVisor bot
	PiperOrigin-RevId: 335051794
2020-09-29	add related arm64 syscall for vfs2	Howard Zhang
	arm64 vfs2: Add support for io_submit/fallocate/sendfile/newfstatat/readahead/fadvise64
	Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2020-09-25	arm64: some minor changes	Bin Lu
	This patch adds minor changes for the Arm64 platform:
	1. Add SetRobustList/GetRobustList support for the arm64 syscall module.
	2. Add newfstatat support for the arm64 vfs2 syscall module.
	3. Add tls value in ProtoBuf.
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-22	Handle EOF properly in splice/sendfile.	Dean Deng
	Use HandleIOErrorVFS2 instead of custom error handling. PiperOrigin-RevId: 333227581
2020-09-18	Use a tmpfs file for shared anonymous and /dev/zero mmap on VFS2.	Jamie Liu
	This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()). We don't do the same thing on VFS1 for reasons documented by the updated comment. PiperOrigin-RevId: 332514849
2020-09-18	Fix definition of SchedParam.	Rahat Mahmood
	Linux defines this struct as: struct sched_param { int priority; } ... in include/linux/sched.h. PiperOrigin-RevId: 332473133
2020-09-15	Enable automated marshalling for the syscall package.	Rahat Mahmood
	PiperOrigin-RevId: 331940975
2020-09-15	Read vfs2 epoll events atomically.	Jamie Liu
	Discovered by ayushranjan@: VFS2 was employing the following algorithm for fetching ready events from an epoll instance:
	- Create a statically sized EpollEvent slice on the stack of size 16.
	- Pass that to EpollInstance.ReadEvents() to populate.
	- EpollInstance.ReadEvents() requeues level-triggered events that it returns back into the ready queue.
	- Write the results to usermem.
	- If the number of results was 16, call EpollInstance.ReadEvents() again in the hope of getting more.
	But this causes duplication of the "requeued" ready level-triggered events. So if the ready queue has >= 16 ready events, the EpollWait for loop will spin until it fills the usermem with `maxEvents` events.
	Fixes #3521
	PiperOrigin-RevId: 331840527
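The duplication described above can be reproduced with a small simulation. The sketch below is illustrative only: `readyQueue`, `readEvents`, `batchedRead`, and `atomicRead` are hypothetical names, every event is assumed to be level-triggered, and `atomicRead` shows one plausible way to read atomically rather than the change that was actually made.

```go
package main

import "fmt"

// readyQueue is a toy epoll ready list in which every event is
// level-triggered.
type readyQueue struct{ events []int }

// readEvents copies up to len(buf) events into buf and requeues them
// at the tail, as happens for level-triggered events.
func (q *readyQueue) readEvents(buf []int) int {
	n := copy(buf, q.events)
	q.events = append(q.events[n:], buf[:n]...)
	return n
}

// batchedRead mimics the old algorithm: read in batches of 16 and read
// again whenever a batch comes back full.
func batchedRead(q *readyQueue, maxEvents int) []int {
	var out []int
	var batch [16]int
	for len(out) < maxEvents {
		want := maxEvents - len(out)
		if want > len(batch) {
			want = len(batch)
		}
		n := q.readEvents(batch[:want])
		out = append(out, batch[:n]...)
		if n < len(batch) {
			break
		}
	}
	return out
}

// atomicRead reads once into a buffer sized for maxEvents, so requeued
// events are never observed twice within a single call.
func atomicRead(q *readyQueue, maxEvents int) []int {
	buf := make([]int, maxEvents)
	return buf[:q.readEvents(buf)]
}

func main() {
	q := &readyQueue{events: make([]int, 16)}
	fmt.Println(len(batchedRead(q, 64))) // 64: the same 16 events, four times
	q = &readyQueue{events: make([]int, 16)}
	fmt.Println(len(atomicRead(q, 64))) // 16
}
```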
2020-09-14	Add note about gofer link(2) limitation	Fabricio Voznika
	PiperOrigin-RevId: 331648296
2020-09-11	Move the 'marshal' and 'primitive' packages to the 'pkg' directory.	Rahat Mahmood
	PiperOrigin-RevId: 331256608
2020-09-08	[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.	Ayush Ranjan
	PiperOrigin-RevId: 330554450
2020-09-03	Adjust input file offset when sendfile only completes a partial write.	Dean Deng
	Fixes #3779. PiperOrigin-RevId: 330057268
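The idea behind this fix, advancing the input offset by what was written rather than by what was read, can be sketched as below. This is a hypothetical model, not the sentry's sendfile: `sendOnce` and `shortWriter` are names invented for the example, and the input file is represented as an in-memory byte slice.

```go
package main

import (
	"fmt"
	"io"
)

// shortWriter is a hypothetical output that accepts at most limit
// bytes per call, to force partial writes.
type shortWriter struct {
	limit int
	data  []byte
}

func (w *shortWriter) Write(p []byte) (int, error) {
	if len(p) > w.limit {
		w.data = append(w.data, p[:w.limit]...)
		return w.limit, io.ErrShortWrite
	}
	w.data = append(w.data, p...)
	return len(p), nil
}

// sendOnce copies up to count bytes of file, starting at *offset, to
// out. The offset advances by the number of bytes actually written, so
// data that was read but not written is sent again by the next call.
func sendOnce(out io.Writer, file []byte, offset *int64, count int) (int, error) {
	if *offset >= int64(len(file)) {
		return 0, nil // end of file
	}
	end := *offset + int64(count)
	if end > int64(len(file)) {
		end = int64(len(file))
	}
	n, err := out.Write(file[*offset:end])
	*offset += int64(n) // bytes written, not bytes read
	return n, err
}

func main() {
	file := []byte("hello world")
	out := &shortWriter{limit: 4}
	var off int64
	n, _ := sendOnce(out, file, &off, 8)
	fmt.Println(n, off) // 4 4
	sendOnce(out, file, &off, 8)
	fmt.Println(string(out.data)) // hello wo
}
```

Had the offset advanced by the 8 bytes read in the first call, the bytes "o wo" would have been skipped and lost.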
2020-09-01	Fix panic when calling dup2().	Nayana Bidari
	PiperOrigin-RevId: 329572337
2020-08-28	Fix EOF handling for splice.	Dean Deng
	Also, add corresponding EOF tests for splice/sendfile. Discovered by syzkaller. PiperOrigin-RevId: 328975990
2020-08-27	Fix vfs2 pipe behavior when splicing to a non-pipe.	Dean Deng
	Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero would not actually empty the pipe. There was no guarantee that the data would actually be consumed on a splice operation unless the output file's implementation of Write/PWrite actually called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed regardless of whether CopyIn is called or not. Furthermore, the number of bytes in the IOSequence for reads is now capped at the amount of data actually available. Before, splicing to /dev/zero would always return the requested splice size without taking the actual available data into account. This change also refactors the case where an input file is spliced into an output pipe so that it follows a similar pattern, which is arguably cleaner anyway. Updates #3576. PiperOrigin-RevId: 328843954
2020-08-24	Update inotify documentation for gofer filesystem.	Dean Deng
	We now allow hard links to be created within gofer fs (see github.com/google/gvisor/commit/f20e63e31b56784c596897e86f03441f9d05f567). Update the inotify documentation accordingly. PiperOrigin-RevId: 328177485
2020-08-21	Make mounts ReadWrite first, then later change to ReadOnly.	Nicolas Lacasse
	This lets us create "synthetic" mountpoint directories in ReadOnly mounts during VFS setup. Also add context.WithMountNamespace, as some filesystems (like overlay) require a MountNamespace on ctx to handle vfs.Filesystem Operations. PiperOrigin-RevId: 327874971