summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/kernel
AgeCommit message (Collapse)Author
2020-04-10Remove TODO from kernel.StracerFabricio Voznika
The dependency strace=>kernel grew over time. strace also depends on task's FD table and FSContext. It could be fixed with some interfaces the other way, but then we're trading an interface for another, and kernel.Stracer is likely cleaner. Closes #155 PiperOrigin-RevId: 305909678
2020-04-09Merge pull request #2253 from amscanne:nogogVisor bot
PiperOrigin-RevId: 305807868
2020-04-08Clean up TODOsFabricio Voznika
PiperOrigin-RevId: 305592245
2020-04-08Fix all copy locks violations.Adin Scannell
This required minor restructuring of how system call tables were saved and restored, but it makes way more sense this way. Updates #2243
2020-04-06Port timerfd to VFS2.Nicolas Lacasse
PiperOrigin-RevId: 305067208
2020-04-04Record VFS2 sockets in global socket map.Dean Deng
Updates #1476, #1478, #1484, #1485. PiperOrigin-RevId: 304845354
2020-04-03Refactor software GSO code.Bhasker Hariharan
Software GSO implementation currently has a complicated code path with implicit assumptions that all packets to WritePackets carry same Data and it does this to avoid allocations on the path etc. But this makes it hard to reuse the WritePackets API. This change breaks all such assumptions by introducing a new Vectorised View API ReadToVV which can be used to cleanly split a VV into multiple independent VVs. Further this change also makes packet buffers linkable to form an intrusive list. This allows us to get rid of the array of packet buffers that are passed in the WritePackets API call and replace it with a list of packet buffers. While this code does introduce some more allocations in the benchmarks it doesn't cause any degradation. Updates #231 PiperOrigin-RevId: 304731742
2020-04-03Add FileDescriptionImpl for Unix sockets.Dean Deng
This change involves several steps: - Refactor the VFS1 unix socket implementation to share methods between VFS1 and VFS2 where possible. Re-implement the rest. - Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in FileDescriptionDefaultImpl. - Add functions to create and initialize a new Dentry/Inode and FileDescription for a Unix socket file. Updates #1476 PiperOrigin-RevId: 304689796
2020-04-03Ensure EOF is handled propertly during splice.Adin Scannell
PiperOrigin-RevId: 304684417
2020-03-31Implement automated marshalling for slices of Marshallable types.Rahat Mahmood
PiperOrigin-RevId: 304119255
2020-03-31Add socket filesystem and global disconnected socket mount for VFS2.Dean Deng
A socket mount where anonymous sockets will reside is added to the VirtualFilesystem. Socketfs is built on top of kernfs. Updates #1476, #1478, #1484, #1485. PiperOrigin-RevId: 304095251
2020-03-26Support owner matching for iptables.Nayana Bidari
This feature will match UID and GID of the packet creator, for locally generated packets. This match is only valid in the OUTPUT and POSTROUTING chains. Forwarded packets do not have any socket associated with them. Packets from kernel threads do have a socket, but usually no owner.
2020-03-19Remove workMu from tcpip.Endpoint.Bhasker Hariharan
workMu is removed and e.mu is now a mutex that supports TryLock. The packet processing path tries to lock the mutex and if its locked it will just queue the packet and move on. The endpoint.UnlockUser() will process any backlog of packets before unlocking the socket. This simplifies the locking inside tcp endpoints a lot. Further the endpoint.LockUser() implements spinning as long as the lock is not held by another syscall goroutine. This ensures low latency as not spinning leads to the task thread being put to sleep if the lock is held by the packet dispatch path. This is suboptimal as the lower layer rarely holds the lock for long so implementing spinning here helps. If the lock is held by another task goroutine then we just proceed to call LockUser() and the task could be put to sleep. The protocol goroutines themselves just call e.mu.Lock() and block if the lock is currently not available. Updates #231, #357 PiperOrigin-RevId: 301808349
2020-03-18Fix FDTable.NewFDVFS2Fabricio Voznika
It was looking at VFS1 table to determine where to allocate the next FD from. Updates #1035 PiperOrigin-RevId: 301678858
2020-03-16fdtable: don't try to zap fdtable entry if close is called for non-existing fdAndrei Vagin
FDTable.setAll is used to zap entries, but it grows the table up to a specified fd. Reported-by: syzbot+9e281b0750d2d4caa190@syzkaller.appspotmail.com PiperOrigin-RevId: 301280000
2020-03-14Plumb VFS2 imported fds into virtual filesystem.Dean Deng
- When setting up the virtual filesystem, mount a host.filesystem to contain all files that need to be imported. - Make read/preadv syscalls to the host in cases where preadv2 may not be supported yet (likewise for writing). - Make save/restore functions in kernel/kernel.go return early if vfs2 is enabled. PiperOrigin-RevId: 300922353
2020-03-13Add remaining procfs filesFabricio Voznika
Closes #1195 PiperOrigin-RevId: 300867055
2020-03-13Panic if file in FDTable has been destroyedFabricio Voznika
This will give more information about the file to identify where possibly the extra DecRef() would be. PiperOrigin-RevId: 300855874
2020-03-13Fix infinite loop in semaphore.sem.wakeWaiters().Jamie Liu
PiperOrigin-RevId: 300845134
2020-03-13Fix oom_score_adj.Jamie Liu
- Make oomScoreAdj a ThreadGroup field (Linux: signal_struct::oom_score_adj). - Avoid deadlock caused by Task.OOMScoreAdj()/SetOOMScoreAdj() locking Task.mu and TaskSet.mu in the wrong order (via Task.ExitState()). PiperOrigin-RevId: 300814698
2020-03-13Fix lock recursion in kernel.ProcessGroup.SendSignal().Jamie Liu
PiperOrigin-RevId: 300803515
2020-03-06Prevent memory leaks in ilistTamir Duberstein
When list elements are removed from a list but not discarded, it becomes important to invalidate the references they hold to their former neighbors to prevent memory leaks. PiperOrigin-RevId: 299412421
2020-03-05Stub oom_score_adj and oom_score.Ian Lewis
Adds an oom_score_adj and oom_score proc file stub. oom_score_adj accepts writes of values -1000 to 1000 and persists the value with the task. New tasks inherit the parent's oom_score_adj. oom_score is a read-only stub that always returns the value '0'. Issue #202 PiperOrigin-RevId: 299245355
2020-02-28Define CPUIDInstruction for arm64Andrei Vagin
There is no cpuid instruction on arm64, so we need to defined it just to avoid a compile time error. Signed-off-by: Andrei Vagin <avagin@gmail.com>
2020-02-28Make pipe buffer implementation standard.Adin Scannell
A follow-up change will convert the networking code to use this standard pipe implementation. PiperOrigin-RevId: 297903206
2020-02-28Hide /dev/net/tun when using hostinet.Ting-Yu Wang
/dev/net/tun does not currently work with hostinet. This has caused some program starts failing because it thinks the feature exists. PiperOrigin-RevId: 297876196
2020-02-25Add option to skip stuck tasks waiting for address spaceFabricio Voznika
PiperOrigin-RevId: 297192390
2020-02-25Port most syscalls to VFS2.Jamie Liu
pipe and pipe2 aren't ported, pending a slight rework of pipe FDs for VFS2. mount and umount2 aren't ported out of temporary laziness. access and faccessat need additional FSImpl methods to implement properly, but are stubbed to prevent googletest from CHECK-failing. Other syscalls require additional plumbing. Updates #1623 PiperOrigin-RevId: 297188448
2020-02-25Fix nested logging.Adin Scannell
PiperOrigin-RevId: 297175316
2020-02-20Initial network namespace support.gVisor bot
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-20Remove bytes read/written from marshal.Marshallable API.gVisor bot
Users of the API only care about whether the copy in/out succeeds in their entirety, which is already signalled by the returned error. PiperOrigin-RevId: 296297843
2020-02-14Synchronize signalling with S/RgVisor bot
This is to fix a data race between sending an external signal to a ThreadGroup and kernel saving state for S/R. PiperOrigin-RevId: 295244281
2020-02-14Enable automated marshalling for RSeqCriticalSection.gVisor bot
PiperOrigin-RevId: 295226468
2020-02-14Inline vfs.VirtualFilesystem in Kernel structgVisor bot
This saves one pointer dereference per VFS access. Updates #1623 PiperOrigin-RevId: 295216176
2020-02-14Enable automated marshalling for struct stat.gVisor bot
This requires fixing a few build issues for non-am64 platforms. PiperOrigin-RevId: 295196922
2020-02-14Plumb VFS2 inside the SentrygVisor bot
- Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950
2020-02-05Add notes to relevant tests.Adin Scannell
These were out-of-band notes that can help provide additional context and simplify automated imports. PiperOrigin-RevId: 293525915
2020-01-28Changes missing in last submitFabricio Voznika
Updates #1487 Updates #1623 PiperOrigin-RevId: 292040835
2020-01-28Add vfs.FileDescription to FD tableFabricio Voznika
FD table now holds both VFS1 and VFS2 types and uses the correct one based on what's set. Parts of this CL are just initial changes (e.g. sys_read.go, runsc/main.go) to serve as a template for the remaining changes. Updates #1487 Updates #1623 PiperOrigin-RevId: 292023223
2020-01-27Merge pull request #1561 from zhangningdlut:chris_ttygVisor bot
PiperOrigin-RevId: 291821850
2020-01-27Update package locations.Adin Scannell
Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27Fix licenses.Adin Scannell
The preferred Copyright holder is "The gVisor Authors". PiperOrigin-RevId: 291786657
2020-01-27Standardize on tools directory.Adin Scannell
PiperOrigin-RevId: 291745021
2020-01-24Ignore external SIGURGMichael Pratt
Go 1.14+ sends SIGURG to Ms to attempt asynchronous preemption of a G. Since it can't guarantee that a SIGURG is only related to preemption, it continues to forward them to signal.Notify (see runtime.sighandler). We should ignore these signals, as applications shouldn't receive them. Note that this means that truly external SIGURG can no longer be sent to the application (as with SIGCHLD). PiperOrigin-RevId: 291415357
2020-01-23Remove epoll entry from map when dropping it.Nicolas Lacasse
This pattern (delete from map when dropping) is also used in epoll.RemoveEntry, and seems like generally a good idea. PiperOrigin-RevId: 291268208
2020-01-22Move VFS2 handling of FD readability/writability to vfs.FileDescription.Jamie Liu
PiperOrigin-RevId: 291006713
2020-01-15Fix "unlock of unlocked mutex" crash when getting ttychris.zn
This patch holds taskset.mu when getting tty. If we don't do this, it may cause a "unlock of unlocked mutex" problem, since signalHandlers may be replaced by CopyForExec() in runSyscallAfterExecStop after the signalHandlers.mu has been holded in TTY(). The problem is easy to reproduce with keeping to do "runsc ps". The crash log is : fatal error: sync: unlock of unlocked mutex goroutine 5801304 [running]: runtime.throw(0xfd019c, 0x1e) GOROOT/src/runtime/panic.go:774 +0x72 fp=0xc001ba47b0 sp=0xc001ba4780 pc=0x431702 sync.throw(0xfd019c, 0x1e) GOROOT/src/runtime/panic.go:760 +0x35 fp=0xc001ba47d0 sp=0xc001ba47b0 pc=0x431685 sync.(*Mutex).unlockSlow(0xc00cf94a30, 0xc0ffffffff) GOROOT/src/sync/mutex.go:196 +0xd6 fp=0xc001ba47f8 sp=0xc001ba47d0 pc=0x4707d6 sync.(*Mutex).Unlock(0xc00cf94a30) GOROOT/src/sync/mutex.go:190 +0x48 fp=0xc001ba4818 sp=0xc001ba47f8 pc=0x4706e8 gvisor.dev/gvisor/pkg/sentry/kernel.(*ThreadGroup).TTY(0xc011a9e800, 0x0) pkg/sentry/kernel/tty.go:38 +0x88 fp=0xc001ba4868 sp=0xc001ba4818 pc=0x835fa8 gvisor.dev/gvisor/pkg/sentry/control.Processes(0xc00025ae00, 0xc013e397c0, 0x40, 0xc0137b9800, 0x1, 0x7f292e9a4cc0) pkg/sentry/control/proc.go:366 +0x355 fp=0xc001ba49a0 sp=0xc001ba4868 pc=0x9ac4a5 gvisor.dev/gvisor/runsc/boot.(*containerManager).Processes(0xc0003b62c0, 0xc0051423d0, 0xc0137b9800, 0x0, 0x0) runsc/boot/controller.go:228 +0xdf fp=0xc001ba49e8 sp=0xc001ba49a0 pc=0xaf06cf Signed-off-by: chris.zn <chris.zn@antfin.com>
2020-01-09New sync package.Ian Gudger
* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2020-01-06Implement rseq(2)Michael Pratt
PiperOrigin-RevId: 288342928
2020-01-06Cleanup Shm reference handlingMichael Pratt
Currently, shm.Registry.FindByID will return Shm instances without taking an additional reference on them, making it possible for them to disappear. More explicitly handle references. All callers hold a reference for the duration that they hold the instance. Registry.shms may transitively hold Shms with no references, so it must TryIncRef to determine if they are still valid. PiperOrigin-RevId: 288314529