Age | Commit message (Collapse) | Author |
|
In order to make sure all aio goroutines have stopped during S/R, a new
WaitGroup was added to TaskSet, analogous to runningGoroutines. This WaitGroup
is incremented with each aio goroutine, and waited on during kernel.Pause.
The old VFS1 aio code was changed to use this new WaitGroup, rather than
fs.Async. The only uses of fs.Async are now inode and mount Release operations,
which do not call fs.Async recursively. This fixes a lock-ordering violation
that can cause deadlocks.
Updates #1035.
PiperOrigin-RevId: 316689380
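
The WaitGroup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual sentry code: the `taskSet` type, the `aioGoroutines` field name, and the `startAIO`/`pause` helpers are hypothetical stand-ins for TaskSet and kernel.Pause.

```go
package main

import (
	"fmt"
	"sync"
)

// taskSet is a hypothetical stand-in for the kernel's TaskSet.
type taskSet struct {
	// aioGoroutines counts in-flight aio goroutines (assumed field name).
	aioGoroutines sync.WaitGroup
}

// startAIO runs fn on a new goroutine that is tracked by the WaitGroup.
func (ts *taskSet) startAIO(fn func()) {
	ts.aioGoroutines.Add(1)
	go func() {
		defer ts.aioGoroutines.Done()
		fn()
	}()
}

// pause blocks until every tracked aio goroutine has finished.
func (ts *taskSet) pause() {
	ts.aioGoroutines.Wait()
}

// runAndPause starts n tracked goroutines, pauses, and reports how many
// of them had completed by the time pause returned.
func runAndPause(n int) int {
	var ts taskSet
	var mu sync.Mutex
	completed := 0
	for i := 0; i < n; i++ {
		ts.startAIO(func() {
			mu.Lock()
			completed++
			mu.Unlock()
		})
	}
	ts.pause()
	mu.Lock()
	defer mu.Unlock()
	return completed
}

func main() {
	fmt.Println(runAndPause(4))
}
```

Because Add is called before the goroutine starts, a concurrent Wait cannot miss a goroutine that is about to run.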
|
|
PiperOrigin-RevId: 316627764
|
|
PiperOrigin-RevId: 316148074
|
|
gaurav1086:sentry_kernel_timekeeper_use_buffered_channel
PiperOrigin-RevId: 315803553
|
|
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
|
|
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.
Updates #1480
PiperOrigin-RevId: 315604825
|
|
This is mostly syscall plumbing; VFS2 already implements the internals of
mounts. In addition to the syscall definitions, the following mount-related
mechanisms are updated:
- Implement MS_NOATIME for VFS2, but only for tmpfs and goferfs. The other VFS2
filesystems don't implement node-level timestamps yet.
- Implement the 'mode', 'uid' and 'gid' mount options for VFS2's tmpfs.
- Plumb mount namespace ownership, which is necessary for checking appropriate
capabilities during mount(2).
Updates #1035
PiperOrigin-RevId: 315035352
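
As a rough illustration of the 'mode', 'uid' and 'gid' options mentioned above, the sketch below parses a comma-separated mount data string. The `tmpfsOpts` type and `parseTmpfsOpts` function are hypothetical, the default values are assumptions of this sketch, and none of it is taken from the VFS2 tmpfs code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tmpfsOpts is a hypothetical holder for the parsed options.
type tmpfsOpts struct {
	Mode uint32 // permission bits, parsed as octal
	UID  uint32
	GID  uint32
}

// parseTmpfsOpts parses data such as "mode=0755,uid=1000,gid=100".
// The defaults (mode 01777, uid 0, gid 0) are assumptions of this sketch.
func parseTmpfsOpts(data string) (tmpfsOpts, error) {
	opts := tmpfsOpts{Mode: 01777}
	if data == "" {
		return opts, nil
	}
	for _, kv := range strings.Split(data, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return opts, fmt.Errorf("malformed option %q", kv)
		}
		key, val := parts[0], parts[1]
		base := 10
		if key == "mode" {
			base = 8 // mode is given as an octal number
		}
		n, err := strconv.ParseUint(val, base, 32)
		if err != nil {
			return opts, fmt.Errorf("invalid value for %q: %v", key, err)
		}
		switch key {
		case "mode":
			opts.Mode = uint32(n)
		case "uid":
			opts.UID = uint32(n)
		case "gid":
			opts.GID = uint32(n)
		default:
			// This sketch rejects anything it does not know about.
			return opts, fmt.Errorf("unknown option %q", key)
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseTmpfsOpts("mode=0755,uid=1000,gid=100")
	fmt.Println(opts, err)
}
```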
|
|
The current task can share its fdtable with a few other tasks,
but after exec, this should be a completely separate process.
PiperOrigin-RevId: 314999565
|
|
Limited to tmpfs. Inotify support in other filesystem implementations to
follow.
Updates #1479
PiperOrigin-RevId: 313828648
|
|
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we
need to plumb inotify down to each filesystem implementation in order to keep
track of links/inode structures properly.
IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks
that are not present in either vfs1 or vfs2. Those will be addressed in
subsequent changes.
Updates #1479.
PiperOrigin-RevId: 313781995
|
|
Updates #138
PiperOrigin-RevId: 313326354
|
|
* Aggregate architecture Overview in "What is gVisor?" as it makes more sense
in one place.
* Drop "user-space kernel" and use "application kernel". The term "user-space
kernel" is confusing when some platform implementations do not run in
user-space (instead running in guest ring zero).
* Clear up the relationship between the Platform page in the user guide and the
Platform page in the architecture guide, and ensure they are cross-linked.
* Restore the call-to-action quick start link in the main page, and drop the
GitHub link (which also appears in the top-right).
* Improve image formatting by centering all doc and blog images, and move the
image captions to the alt text.
PiperOrigin-RevId: 311845158
|
|
Closes #2612.
PiperOrigin-RevId: 311548074
|
|
Updates #1197, #1198, #1672
PiperOrigin-RevId: 310432006
|
|
They don't depend on anything in VFS2, so they should be their own packages.
PiperOrigin-RevId: 310416807
|
|
We can register any number of tables with any number of architectures, and
need not limit the definitions to the architecture in question. This allows
runsc to generate documentation for all architectures simultaneously.
Similarly, this simplifies the VFSv2 patching process.
PiperOrigin-RevId: 310224827
|
|
This change ensures that even platforms with some TSC issues (e.g. KVM)
can get reliable monotonic time by applying a lower bound on each read.
PiperOrigin-RevId: 309773801
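
One way to picture the lower bound is a clock wrapper that never returns a value smaller than the last one it handed out. The sketch below is only an illustration under that assumption: `boundedClock`, `newBoundedClock` and `clampReadings` are hypothetical names, and the real timekeeper may apply the bound differently.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// boundedClock wraps a raw time source that may occasionally go backwards.
type boundedClock struct {
	mu   sync.Mutex
	last int64        // largest value returned so far
	read func() int64 // raw, possibly non-monotonic reading
}

func newBoundedClock(read func() int64) *boundedClock {
	return &boundedClock{last: math.MinInt64, read: read}
}

// Now returns the raw reading, clamped so it is never below an earlier result.
func (c *boundedClock) Now() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t := c.read(); t > c.last {
		c.last = t
	}
	return c.last
}

// clampReadings feeds a fixed sequence of raw readings through the clock.
func clampReadings(raw []int64) []int64 {
	i := 0
	c := newBoundedClock(func() int64 {
		v := raw[i]
		i++
		return v
	})
	out := make([]int64, 0, len(raw))
	for range raw {
		out = append(out, c.Now())
	}
	return out
}

func main() {
	// The third raw reading goes backwards and is clamped to 12.
	fmt.Println(clampReadings([]int64{10, 12, 11, 15})) // prints [10 12 12 15]
}
```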
|
|
PiperOrigin-RevId: 308617610
|
|
PiperOrigin-RevId: 308472331
|
|
This is needed to set up host fds passed through a Unix socket. Note that
the host package depends on kernel, so we cannot set up the hostfs mount
directly in Kernel.Init as we do for sockfs and pipefs.
Also, adjust sockfs to make its setup look more like hostfs's and pipefs's.
PiperOrigin-RevId: 308274053
|
|
PiperOrigin-RevId: 308170679
|
|
Ensure we use the correct architecture-specific definition of epoll
event, and use go-marshal for serialization.
PiperOrigin-RevId: 308145677
|
|
PiperOrigin-RevId: 308100771
|
|
PiperOrigin-RevId: 307941984
|
|
Included:
- loader_test.go RunTest and TestStartSignal VFS2
- container_test.go TestAppExitStatus on VFS2
- experimental flag added to runsc to turn on VFS2
Note: shared mounts are not yet supported.
PiperOrigin-RevId: 307070753
|
|
Updates #1035
PiperOrigin-RevId: 306968644
|
|
PiperOrigin-RevId: 306891171
|
|
PiperOrigin-RevId: 306306809
|
|
Note that most kinds of sockets are not yet supported in VFS2
(only Unix sockets are partially supported at the moment), so
these syscalls will still generally fail. Enabling them allows
us to begin running socket tests for VFS2 as more features are
ported over.
Updates #1476, #1478, #1484, #1485.
PiperOrigin-RevId: 306292294
|
|
The comments in the ticket indicate that this behavior
is fine and that the ticket should be closed, so we shouldn't
need pointers to the ticket.
PiperOrigin-RevId: 306266071
|
|
noNewPrivileges is ignored if set to false since gVisor assumes that
PR_SET_NO_NEW_PRIVS is always enabled.
PiperOrigin-RevId: 305991947
|
|
The dependency strace=>kernel grew over time. strace also depends on
task's FD table and FSContext. It could be fixed with some interfaces
the other way, but then we're trading an interface for another, and
kernel.Stracer is likely cleaner.
Closes #155
PiperOrigin-RevId: 305909678
|
|
PiperOrigin-RevId: 305807868
|
|
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I5bb8fa7d580d173b1438d6465e1adb442216c8fa
|
|
PiperOrigin-RevId: 305592245
|
|
This required minor restructuring of how system call tables were saved
and restored, but it makes way more sense this way.
Updates #2243
|
|
PiperOrigin-RevId: 305067208
|
|
Updates #1476, #1478, #1484, #1485.
PiperOrigin-RevId: 304845354
|
|
The software GSO implementation currently has a complicated code path with
an implicit assumption that all packets passed to WritePackets carry the same
Data, which it relies on to avoid allocations on that path. But this makes it
hard to reuse the WritePackets API.
This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.
While this code does introduce some more allocations in the benchmarks,
it doesn't cause any degradation.
Updates #231
PiperOrigin-RevId: 304731742
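
The idea of splitting one payload into independent buffers linked in an intrusive list can be sketched as below. This is not the netstack code: `packetBuffer`, `packetList` and `splitPayload` are hypothetical simplifications, and a plain byte slice stands in for a VectorisedView.

```go
package main

import "fmt"

// packetBuffer carries its own link field, which is what makes the list
// intrusive: no separate node or backing array is allocated for the list.
type packetBuffer struct {
	next *packetBuffer
	data []byte
}

// packetList is a minimal singly linked intrusive list of packet buffers.
type packetList struct {
	head, tail *packetBuffer
	length     int
}

// PushBack appends p to the end of the list.
func (l *packetList) PushBack(p *packetBuffer) {
	p.next = nil
	if l.tail == nil {
		l.head = p
	} else {
		l.tail.next = p
	}
	l.tail = p
	l.length++
}

// splitPayload cuts payload into segments of at most mss bytes. Each segment
// gets its own copy of the data, so the buffers are independent of each other.
func splitPayload(payload []byte, mss int) packetList {
	var l packetList
	if mss <= 0 {
		return l
	}
	for len(payload) > 0 {
		n := mss
		if n > len(payload) {
			n = len(payload)
		}
		seg := make([]byte, n)
		copy(seg, payload[:n])
		l.PushBack(&packetBuffer{data: seg})
		payload = payload[n:]
	}
	return l
}

// segments walks the list and returns each buffer's data as a string.
func segments(l packetList) []string {
	var out []string
	for p := l.head; p != nil; p = p.next {
		out = append(out, string(p.data))
	}
	return out
}

func main() {
	fmt.Println(segments(splitPayload([]byte("abcdefg"), 3))) // prints [abc def g]
}
```

The per-segment copy is where the extra allocations come from in this sketch; it trades them for buffers that share no state.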
|
|
This change involves several steps:
- Refactor the VFS1 unix socket implementation to share methods between VFS1
and VFS2 where possible. Re-implement the rest.
- Override the default PRead, Read, PWrite, Write, Ioctl, Release methods in
FileDescriptionDefaultImpl.
- Add functions to create and initialize a new Dentry/Inode and FileDescription
for a Unix socket file.
Updates #1476
PiperOrigin-RevId: 304689796
|
|
PiperOrigin-RevId: 304684417
|
|
PiperOrigin-RevId: 304119255
|
|
SA_RESTORER is always used on the Intel platform, but this flag is optional
on other platforms. Since the vdso is enabled, we can use the sigreturn
trampolines that the vdso provides on the Arm platform instead.
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
A socket mount where anonymous sockets will reside is added to the
VirtualFilesystem. Socketfs is built on top of kernfs.
Updates #1476, #1478, #1484, #1485.
PiperOrigin-RevId: 304095251
|
|
This feature will match UID and GID of the packet creator, for locally
generated packets. This match is only valid in the OUTPUT and POSTROUTING
chains. Forwarded packets do not have any socket associated with them.
Packets from kernel threads do have a socket, but usually no owner.
|
|
workMu is removed and e.mu is now a mutex that supports TryLock. The packet
processing path tries to lock the mutex and if it's locked it will just queue the
packet and move on. The endpoint.UnlockUser() will process any backlog of
packets before unlocking the socket.
This simplifies the locking inside tcp endpoints a lot. Further, the
endpoint.LockUser() implements spinning as long as the lock is not held by
another syscall goroutine. This keeps latency low: without spinning, the
task thread would be put to sleep whenever the lock is held by the packet
dispatch path, which is suboptimal because the lower layer rarely holds the
lock for long.
If the lock is held by another task goroutine then we just proceed to call
LockUser() and the task could be put to sleep.
The protocol goroutines themselves just call e.mu.Lock() and block if the
lock is currently not available.
Updates #231, #357
PiperOrigin-RevId: 301808349
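
The try-lock-or-queue scheme can be sketched as follows. This is a simplified illustration rather than the tcp endpoint code: it uses sync.Mutex.TryLock (Go 1.18 or later) instead of a custom mutex, always enqueues before trying the lock, omits the spinning in LockUser, and the `endpoint` fields and the `deliver` and `demo` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint is a hypothetical, heavily simplified stand-in for a tcp endpoint.
type endpoint struct {
	mu sync.Mutex // owner lock; TryLock requires Go 1.18+

	queueMu   sync.Mutex
	backlog   []string // packets queued while mu was held elsewhere
	processed []string // guarded by mu
}

// LockUser takes the endpoint lock on behalf of a syscall (no spinning here).
func (e *endpoint) LockUser() { e.mu.Lock() }

// UnlockUser processes any backlog before releasing the lock. It re-checks
// the backlog after unlocking so a packet queued in that window is not
// stranded; if TryLock fails, the new holder drains on its own unlock.
func (e *endpoint) UnlockUser() {
	for {
		e.drainLocked()
		e.mu.Unlock()
		if e.backlogLen() == 0 || !e.mu.TryLock() {
			return
		}
	}
}

// deliver is the packet path: queue the packet, then process the queue only
// if the lock is free. It never blocks on mu.
func (e *endpoint) deliver(pkt string) {
	e.queueMu.Lock()
	e.backlog = append(e.backlog, pkt)
	e.queueMu.Unlock()
	if e.mu.TryLock() {
		e.UnlockUser()
	}
}

// drainLocked moves the backlog into processed. The caller must hold mu.
func (e *endpoint) drainLocked() {
	e.queueMu.Lock()
	pending := e.backlog
	e.backlog = nil
	e.queueMu.Unlock()
	e.processed = append(e.processed, pending...)
}

func (e *endpoint) backlogLen() int {
	e.queueMu.Lock()
	defer e.queueMu.Unlock()
	return len(e.backlog)
}

// demo delivers two packets while the lock is held and one after release.
// It returns how many packets were processed while the lock was still held,
// and the final processing order.
func demo() (int, []string) {
	var e endpoint
	e.LockUser()
	e.deliver("a")
	e.deliver("b")
	during := len(e.processed)
	e.UnlockUser()
	e.deliver("c")
	return during, e.processed
}

func main() {
	during, order := demo()
	fmt.Println(during, order) // prints 0 [a b c]
}
```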
|
|
It was looking at the VFS1 table to determine where to
allocate the next FD from.
Updates #1035
PiperOrigin-RevId: 301678858
|
|
FDTable.setAll is used to zap entries, but it grows the table up to
a specified fd.
Reported-by: syzbot+9e281b0750d2d4caa190@syzkaller.appspotmail.com
PiperOrigin-RevId: 301280000
|
|
- When setting up the virtual filesystem, mount a host.filesystem to contain
all files that need to be imported.
- Make read/preadv syscalls to the host in cases where preadv2 may not be
supported yet (likewise for writing).
- Make save/restore functions in kernel/kernel.go return early if vfs2 is
enabled.
PiperOrigin-RevId: 300922353
|
|
Closes #1195
PiperOrigin-RevId: 300867055
|