summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/syscalls/linux
AgeCommit message (Collapse)Author
2020-11-16Reset watchdog timer between sendfile() iterations.Jamie Liu
As part of this, change Task.interrupted() to not drain Task.interruptChan, and do so explicitly using new function Task.unsetInterrupted() instead. PiperOrigin-RevId: 342768365
2020-11-16Allow RLIMIT_RSS to be setFabricio Voznika
Closes #4746 PiperOrigin-RevId: 342747165
2020-11-12Rename kernel.TaskContext to kernel.TaskImage.Jamie Liu
This reduces confusion with context.Context (which is also relevant to kernel.Tasks) and is consistent with existing function kernel.LoadTaskImage(). PiperOrigin-RevId: 342167298
2020-11-06Implement command GETNCNT for semctl.Jing Chen
PiperOrigin-RevId: 341154192
2020-11-06Fix infinite loop when splicing to pipes/eventfds.Nicolas Lacasse
Writes to pipes of size < PIPE_BUF are guaranteed to be atomic, so writes larger than that will return EAGAIN if the pipe has capacity < PIPE_BUF. Writes to eventfds will return EAGAIN if the write would cause the eventfd value to go over the max. In both such cases, calling Ready() on the FD will return true (because it is possible to write), but specific kinds of writes will in fact return EAGAIN. This CL fixes an infinite loop in splice and sendfile (VFS1 and VFS2) by forcing skipping the readiness check for the outfile in send, splice, and tee. PiperOrigin-RevId: 341102260
2020-11-03Make pipe min/max sizes match linux.Nicolas Lacasse
The default pipe size already matched linux, and is unchanged. Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We were plumbing usermem.PageSize everywhere, so this is no functional change. PiperOrigin-RevId: 340497006
2020-11-02Implement command GETZCNT for semctl.Jing Chen
PiperOrigin-RevId: 340389884
2020-10-26Implement command IPC_STAT for semctl.Jing Chen
PiperOrigin-RevId: 339166854
2020-10-19Fix runsc tests on VFS2 overlay.Jamie Liu
- Check the sticky bit in overlay.filesystem.UnlinkAt(). Fixes StickyTest.StickyBitPermDenied. - When configuring a VFS2 overlay in runsc, copy the lower layer's root owner/group/mode to the upper layer's root (as in the VFS1 equivalent, boot.addOverlay()). This makes the overlay root owned by UID/GID 65534 with mode 0755 rather than owned by UID/GID 0 with mode 01777. Fixes CreateTest.CreateFailsOnUnpermittedDir, which assumes that the test cannot create files in /. - MknodTest.UnimplementedTypesReturnError assumes that the creation of device special files is not supported. However, while the VFS2 gofer client still doesn't support device special files, VFS2 tmpfs does, and in the overlay test dimension mknod() targets a tmpfs upper layer. The test initially has all capabilities, including CAP_MKNOD, so its creation of these files succeeds. Constrain these tests to VFS1. - Rename overlay.nonDirectoryFD to overlay.regularFileFD and only use it for regular files, using the original FD for pipes and device special files. This is more consistent with Linux (which gets the original inode_operations, and therefore file_operations, for these file types from ovl_fill_inode() => init_special_inode()) and fixes remaining mknod and pipe tests. - Read/write 1KB at a time in PipeTest.Streaming, rather than 4 bytes. This isn't strictly necessary, but it makes the test less obnoxiously slow on ptrace. Fixes #4407 PiperOrigin-RevId: 337971042
2020-10-19splice: return EINVAL is len is negativeAndrei Vagin
Reported-by: syzbot+0268cc591c0f517a1de0@syzkaller.appspotmail.com PiperOrigin-RevId: 337901664
2020-10-19pgalloc: Do not hold MemoryFile.mu while calling mincore.Ayush Ranjan
This change makes the following changes: - Unlocks MemoryFile.mu while calling mincore (checkCommitted) because mincore can take a really long time. Accordingly looks up the segment in the tree tree again and handles changes to the segment. - MemoryFile.UpdateUsage() can now only be called at frequency at most 100Hz. 100 Hz = linux.CLOCKS_PER_SEC. Co-authored-by: Jamie Liu <jamieliu@google.com> PiperOrigin-RevId: 337865250
2020-10-14Fix SCM Rights reference leaks.Dean Deng
Control messages should be released on Read (which ignores the control message) or zero-byte Send. Otherwise, open fds sent through the control messages will be leaked. PiperOrigin-RevId: 337110774
2020-10-13[vfs2] Don't take reference in Task.MountNamespaceVFS2 and MountNamespace.Root.Dean Deng
This fixes reference leaks related to accidentally forgetting to DecRef() after calling one or the other. PiperOrigin-RevId: 336918922
2020-10-09Reduce the cost of sysinfo(2).Jamie Liu
- sysinfo(2) does not actually require a fine-grained breakdown of memory usage. Accordingly, instead of calling pgalloc.MemoryFile.UpdateUsage() to update the sentry's fine-grained memory accounting snapshot, just use pgalloc.MemoryFile.TotalUsage() (which is a single fstat(), and therefore far cheaper). - Use the number of threads in the root PID namespace (i.e. globally) rather than in the task's PID namespace for consistency with Linux (which just reads global variable nr_threads), and add a new method to kernel.PIDNamespace to allow this to be read directly from an underlying map rather than requiring the allocation and population of an intermediate slice. PiperOrigin-RevId: 336353100
2020-10-09syscalls: Don't leak a file on the error pathAndrei Vagin
Reported-by: syzbot+bb82fb556d5d0a43f632@syzkaller.appspotmail.com PiperOrigin-RevId: 336324720
2020-10-08Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.Jamie Liu
cf. 2a36ab717e8f "rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ" PiperOrigin-RevId: 336186795
2020-10-06Implement membarrier(2) commands other than *_SYNC_CORE.Jamie Liu
Updates #267 PiperOrigin-RevId: 335713923
2020-10-05Merge pull request #4368 from zhlhahaha:1979gVisor bot
PiperOrigin-RevId: 335492800
2020-10-02Merge pull request #4035 from lubinszARM:pr_misc_01gVisor bot
PiperOrigin-RevId: 335051794
2020-09-29add related arm64 syscall for vfs2Howard Zhang
arm64 vfs2: Add support for io_submit/fallocate/ sendfile/newfstatat/readahead/fadvise64 Signed-off-by: Howard Zhang <howard.zhang@arm.com>
2020-09-25arm64: some minor changesBin Lu
This patch adds minor changes for Arm64 platform: 1, add SetRobustList/GetRobustList support for arm64 syscall module. 2, add newfstatat support for arm64 vfs2 syscall module. 3, add tls value in ProtoBuf. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-09-22Handle EOF properly in splice/sendfile.Dean Deng
Use HandleIOErrorVFS2 instead of custom error handling. PiperOrigin-RevId: 333227581
2020-09-18Use a tmpfs file for shared anonymous and /dev/zero mmap on VFS2.Jamie Liu
This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()). We don't do the same thing on VFS1 for reasons documented by the updated comment. PiperOrigin-RevId: 332514849
2020-09-18Fix definition of SchedParam.Rahat Mahmood
Linux defines this struct as: struct sched_param { int priority; } ... in include/linux/sched.h. PiperOrigin-RevId: 332473133
2020-09-15Enable automated marshalling for the syscall package.Rahat Mahmood
PiperOrigin-RevId: 331940975
2020-09-15Read vfs2 epoll events atomically.Jamie Liu
Discovered by ayushranjan@: VFS2 was employing the following algorithm for fetching ready events from an epoll instance: - Create a statically sized EpollEvent slice on the stack of size 16. - Pass that to EpollInstance.ReadEvents() to populate. - EpollInstance.ReadEvents() requeues level-triggered events that it returns back into the ready queue. - Write the results to usermem. - If the number of results were = 16 then recall EpollInstance.ReadEvents() in the hopes of getting more. But this will cause duplication of the "requeued" ready level-triggered events. So if the ready queue has >= 16 ready events, the EpollWait for loop will spin until it fills the usermem with `maxEvents` events. Fixes #3521 PiperOrigin-RevId: 331840527
2020-09-14Add note about gofer link(2) limitationFabricio Voznika
PiperOrigin-RevId: 331648296
2020-09-11Move the 'marshal' and 'primitive' packages to the 'pkg' directory.Rahat Mahmood
PiperOrigin-RevId: 331256608
2020-09-08[vfs] Capitalize x in the {Get/Set/Remove/List}xattr functions.Ayush Ranjan
PiperOrigin-RevId: 330554450
2020-09-03Adjust input file offset when sendfile only completes a partial write.Dean Deng
Fixes #3779. PiperOrigin-RevId: 330057268
2020-09-01Fix panic when calling dup2().Nayana Bidari
PiperOrigin-RevId: 329572337
2020-08-28Fix EOF handling for splice.Dean Deng
Also, add corresponding EOF tests for splice/sendfile. Discovered by syzkaller. PiperOrigin-RevId: 328975990
2020-08-27Fix vfs2 pipe behavior when splicing to a non-pipe.Dean Deng
Fixes *.sh Java runtime tests, where splice()-ing from a pipe to /dev/zero would not actually empty the pipe. There was no guarantee that the data would actually be consumed on a splice operation unless the output file's implementation of Write/PWrite actually called VFSPipeFD.CopyIn. Now, whatever bytes are "written" are consumed regardless of whether CopyIn is called or not. Furthermore, the number of bytes in the IOSequence for reads is now capped at the amount of data actually available. Before, splicing to /dev/zero would always return the requested splice size without taking the actual available data into account. This change also refactors the case where an input file is spliced into an output pipe so that it follows a similar pattern, which is arguably cleaner anyway. Updates #3576. PiperOrigin-RevId: 328843954
2020-08-24Update inotify documentation for gofer filesystem.Dean Deng
We now allow hard links to be created within gofer fs (see github.com/google/gvisor/commit/f20e63e31b56784c596897e86f03441f9d05f567). Update the inotify documentation accordingly. PiperOrigin-RevId: 328177485
2020-08-21Make mounts ReadWrite first, then later change to ReadOnly.Nicolas Lacasse
This lets us create "synthetic" mountpoint directories in ReadOnly mounts during VFS setup. Also add context.WithMountNamespace, as some filesystems (like overlay) require a MountNamespace on ctx to handle vfs.Filesystem Operations. PiperOrigin-RevId: 327874971
2020-08-18Move ERESTART* error definitions to syserror package.Dean Deng
This is needed to avoid circular dependencies between the vfs and kernel packages. PiperOrigin-RevId: 327355524
2020-08-17Stop masking the IO error in handleIOError.Nicolas Lacasse
PiperOrigin-RevId: 327123331
2020-08-12Release fd references on aio callback cancellation.Dean Deng
Discovered by reference leak checker on tmpfs.inode. PiperOrigin-RevId: 326294755
2020-08-05Release extra memfd reference.Dean Deng
PiperOrigin-RevId: 325122849
2020-08-04Handle EOF in vfs2 sendfile.Dean Deng
Discovered by syzkaller. PiperOrigin-RevId: 324938438
2020-08-03Remove old TODO.Dean Deng
Fixes #2920. PiperOrigin-RevId: 324695118
2020-08-03Plumbing context.Context to DecRef() and Release().Nayana Bidari
context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-08-03Add inotify events for fallocate and tests for fallocate/sendfile.Dean Deng
Updates #1479, #2923. PiperOrigin-RevId: 324658826
2020-07-31Clean up vfs2 fallocate.Dean Deng
Move to setstat.go and add a FileDescription wrapper method. PiperOrigin-RevId: 324165277
2020-07-30Fix SETOWN_EX return value.Dean Deng
Return on success should be 0, not size of the struct copied out. PiperOrigin-RevId: 324029193
2020-07-27Merge pull request #2958 from lubinszARM:pr_vfs2_1gVisor bot
PiperOrigin-RevId: 323443142
2020-07-23Add permission checks to vfs2 truncate.Dean Deng
- Check write permission on truncate(2). Unlike ftruncate(2), truncate(2) fails if the user does not have write permissions on the file. - For gofers under InteropModeShared, check file type before making a truncate request. We should fail early and avoid making an rpc when possible. Furthermore, depending on the remote host's failure may give us unexpected behavior--if the host converts the truncate request to an ftruncate syscall on an open fd, we will get EINVAL instead of EISDIR. Updates #2923. PiperOrigin-RevId: 322913569
2020-07-23Implement get/set_robust_list.Nicolas Lacasse
PiperOrigin-RevId: 322904430
2020-07-23Added stub FUSE filesystemRidwan Sharif
Allow FUSE filesystems to be mounted using libfuse. The appropriate flags and mount options are parsed and understood by fusefs.
2020-07-23Marshallable socket opitons.Ayush Ranjan
Socket option values are now required to implement marshal.Marshallable. Co-authored-by: Rahat Mahmood <rahat@google.com> PiperOrigin-RevId: 322831612