Age | Commit message (Collapse) | Author |
|
Reported-by: syzbot+39d434b96cf7c29a66ad@syzkaller.appspotmail.com
Reported-by: syzbot+7c38bce6353d91facca3@syzkaller.appspotmail.com
PiperOrigin-RevId: 406024052
|
|
Previously, we recorded a single aggregated count. These per-protocol counts
can help us debug field issues when frames are dropped for this reason.
PiperOrigin-RevId: 405913911
|
|
vfs.NewDisconnectedMount has no error paths. Its much prettier without the
error return value.
Also simplify MountDisconnected which would immediately drop the refs taken by
NewDisconnectedMount. Instead make it directly call newMount.
PiperOrigin-RevId: 405767966
|
|
Right now, each vdso call triggers vmexit. VDSO and VVAR pages are
mapped with VM_IO and get_user_pages fails for such vma-s. KVM was not
able to handle this case up to the v4.8 kernel. This problem was fixed by
add6a0cd1c5ba ("KVM: MMU: try to fix up page faults before giving up").
For some unknown reasons, it still doesn't work in case of nested
virtualization.
Before:
BenchmarkKernelVDSO-6 252519 4598 ns/op
After:
BenchmarkKernelVDSO-6 34431957 34.91 ns/op
PiperOrigin-RevId: 405715941
|
|
As documented in FilesystemType.GetFilesystem, a reference should be taken on
the returned dentry and filesystem by GetFilesystem implementation. mqfs did
not do that.
Additionally cleanup and clarify ref counting of dentry, filesystem and mount
in mqfs.
Reported-by: syzbot+a2c54bfb6e1525228e5f@syzkaller.appspotmail.com
Reported-by: syzbot+ccd305cdab11cfebbfff@syzkaller.appspotmail.com
PiperOrigin-RevId: 405700565
|
|
VFS1 discards the value of f_namelen returned by the filesystem and returns
NAME_MAX unconditionally instead, so it doesn't run into this. Also set
f_frsize for completeness.
PiperOrigin-RevId: 405579707
|
|
As caught by syzkaller, we were leaking non-permission bits while passing the
user generated mode. DynamicBytesFile panics in this case.
Reported-by: syzbot+5abe52d47d56a5a98c89@syzkaller.appspotmail.com
PiperOrigin-RevId: 405481392
|
|
PiperOrigin-RevId: 404901660
|
|
When file corruption is detected, report vfs.ErrCorruption to
distinguish corruption error from other restore errors.
Updates #1035
PiperOrigin-RevId: 404588445
|
|
PiperOrigin-RevId: 404400399
|
|
lisafs.ClientFile.MkdirAt is allowed to return a non-nil Inode and a non-nil
error on an RPC error. The caller must not use the returned (invalid) Inode on
error. But a code path in the gofer client does end up using it.
More specifically, when the Mkdir RPC fails and we end up creating a synthetic
dentry for a mountpoint, we end up returning the (invalid) non-nil Inode to
filesystem.doCreateAt implementation which thinks that a remote file was
created. But that non-nil Inode is actually invalid because the RPC failed.
Things go downhill from there.
Update client to not use childDirInode if RPC failed.
PiperOrigin-RevId: 404396573
|
|
PiperOrigin-RevId: 404382475
|
|
- We should be using a monotonic clock
- This will make future testing easier
Updates #6748.
PiperOrigin-RevId: 404072318
|
|
Updates #1035
PiperOrigin-RevId: 404072231
|
|
Fixes #6590
PiperOrigin-RevId: 404007524
|
|
gVisor was previously reporting the lower of cgroup limit or 2GB as total
memory. This may cause applications to make bad decisions based on amount
of memory available to them when more than 2GB is required.
This change makes the lower of cgroup limit or the host total memory to be
reported inside the sandbox. This also is more inline with docker which always
reports host total memory. Note that reporting cgroup limit is strictly better
than host total memory when there is a limit set.
Fixes #5608
PiperOrigin-RevId: 403241608
|
|
Prior to cl/318010298, //pkg/state couldn't handle pointers to struct fields,
which meant that it couldn't handle intrusive linked lists, which meant that it
couldn't handle waiter.Queue, which meant that it couldn't handle epoll. As a
result, VFS1 unregisters all epoll waiters before saving and re-registers them
after loading, and waitable VFS1 file implementations tag their waiter.Queues
state:"nosave" (causing them to be skipped by the save/restore machinery) or
state:"zerovalue" (causing them to only be checked for zero-value-equality on
save).
VFS2 required cl/318010298 to support save/restore (due to the Impl inheritance
pattern used by vfs.FileDescription, vfs.Dentry, etc.); correspondingly, VFS2
epoll assumes that waiter.Queues *will be* saved and loaded correctly, and VFS2
file implementations do not tag waiter.Queues.
Some waiter.Queues, e.g. pipe.Pipe.Queue and kernel.Task.signalQueue, are used
by both VFS1 and VFS2 (the latter via signalfd); as a result of the above,
tagging these Queues state:"nosave" or state:"zerovalue" breaks VFS2 epoll.
Remove VFS1 epoll unregistration before saving (bringing it in line with VFS2),
and remove these tags from all waiter.Queues.
Also clean up after the epoll test added by cl/402323053, which implied this
issue (by instantiating DisableSave in the new test) without reporting it.
PiperOrigin-RevId: 402596216
|
|
PiperOrigin-RevId: 402323053
|
|
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go
will not clobber floating point registers before/after calling them
respectively. Due to regabig in Go 1.17, this is no longer the case; regabig
(among other things) maintains a zeroed XMM15 during ABIInternal execution,
including by zeroing it after ABI0-to-ABIInternal transitions. In
ring0.sysenter/exception, this happens in
ring0.kernelSyscall/kernelException.abi0 respectively; in
ring0.CPU.SwitchToUser, this happens after returning from
ring0.sysret/iret.abi0. Delete these functions and do floating point save/load
in assembly.
While arm64 doesn't appear to be immediately affected (so this CL permits us to
resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be
incorrect for the same fundamental reason (Go code can't sanely assume what
registers the Go compiler will or won't use) and should be fixed eventually.
PiperOrigin-RevId: 401895658
|
|
listXattr() was doing redundant work. Remove it.
PiperOrigin-RevId: 401871315
|
|
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer
process when the gofer filesystem is the lower layer of an overlay.
The overlay filesystem aggressively queries for "trusted.overlay.opaque" which
in practice is never found in the lower layer gofer. But leads to a lot of
wasted work.
A consequence is that mutable gofer upper layer is not supported anymore but
that is still consistent with VFS1. We can revisit when need arises.
PiperOrigin-RevId: 401860585
|
|
The same create/write/read pattern is copied around several places. It's easier
to understand in a package with names and comments, and we can reuse the smart
blocking code in package rawfile.
PiperOrigin-RevId: 401647108
|
|
- Implements RFC 3522 (Eifel detection algorithm) to detect if the connection
entered loss recovery unnecessarily.
- Added a new metric to count the total number of spurious loss recoveries.
- Added tests to verify the new metric.
PiperOrigin-RevId: 401637359
|
|
PiperOrigin-RevId: 401624134
|
|
Rather than boiling down to an integer eagerly, do it as late as possible.
PiperOrigin-RevId: 401599308
|
|
...all connections should be tracked by ConnTrack, so create a no-op
connection entry on the first hook into IPTables (Prerouting or
Output) and let NAT targets modify the connection entry if they
need to instead of letting the NAT target create their own connection
entry.
This also prepares for "twice-NAT" where a packet may have both DNAT and
SNAT performed on it (which requires the ability to update ConnTrack
entries).
Updates #5696.
PiperOrigin-RevId: 401360377
|
|
PiperOrigin-RevId: 401296116
|
|
PiperOrigin-RevId: 400258924
|
|
For multithreads processes, it is hard to read logs without knowing task pids.
And let's print a decimal return codeo for syscalls. A hex return code are
usefull for system calls that return addresses. For other syscalls, the decimal
form is more readable.
PiperOrigin-RevId: 400035449
|
|
PiperOrigin-RevId: 399560357
|
|
Support mq_open and mq_unlink, and enable syscall tests.
Updates #136
|
|
Remove implements the behaviour of mq_unlink(2).
Updates #136
|
|
Updates #136
|
|
FindOrCreate implements the behaviour of mq_open(2).
Updates #136
|
|
Update RegistryImpl functions to return file descriptions, instead of
queues, and use Views in queue inodes.
Updates #136
|
|
View makes it easier to handle O_RDONLY, O_WRONLY, and ORDWR options in
mq_open(2).
Updates #136
|
|
Updates #136
|
|
Move root dentry and filesystem creation from GetFilesystem to
NewRegistryImpl, create IPCNamespace.InitPosixQueues to create a new
mqueue filesystem for each ipc namespace, and update GetFilesystem to
retreive fs and root dentry from IPCNamespace and return them.
Updates #136
|
|
PiperOrigin-RevId: 399295737
|
|
|
|
Task.netns can be accessed atomically, so Task.mu isn't needed to access it.
PiperOrigin-RevId: 398773947
|
|
PiperOrigin-RevId: 398763161
|
|
This allows to avoind unnecessary lock-ordering dependencies on task.mu.
|
|
Create the /sys/fs/cgroup directory when cgroups are available. This
creates the empty directory to serve as the mountpoint, actually
mounting cgroups is left to the launcher/userspace. This is consistent
with Linux behaviour.
Without this mountpoint, getdents(2) on /sys/fs indicates an empty
directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The
launcher can't create the mountpoint directory since sysfs doesn't
support mkdir.
PiperOrigin-RevId: 398596698
|
|
PiperOrigin-RevId: 398572735
|
|
...instead of an address.
This allows a later change to more precisely select an address
based on the NAT type (source vs. destination NAT).
PiperOrigin-RevId: 398559901
|
|
Call sites for the two checkpoints aren't added yet.
PiperOrigin-RevId: 398375903
|
|
Signed-off-by: Andrei Vagin <avagin@google.com>
|
|
We install seccomp rules so that the SIGSYS signal is generated for
each mmap system call. Then our signal handler executes the real mmap
syscall and if a new regions is created, it maps it to the guest.
Signed-off-by: Andrei Vagin <avagin@google.com>
|
|
|