summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2021-10-11Merge pull request #6428 from dillanzhou:fix_epoll_vfs2gVisor bot
PiperOrigin-RevId: 402323053
2021-10-08Remove ring0 floating point save/load functions on amd64.Jamie Liu
ring0.Save/LoadFloatingPoint() are only usable if the caller can ensure that Go will not clobber floating point registers before/after calling them respectively. Due to regabig in Go 1.17, this is no longer the case; regabig (among other things) maintains a zeroed XMM15 during ABIInternal execution, including by zeroing it after ABI0-to-ABIInternal transitions. In ring0.sysenter/exception, this happens in ring0.kernelSyscall/kernelException.abi0 respectively; in ring0.CPU.SwitchToUser, this happens after returning from ring0.sysret/iret.abi0. Delete these functions and do floating point save/load in assembly. While arm64 doesn't appear to be immediately affected (so this CL permits us to resume usage of Go 1.17), its use of Save/LoadFloatingPoint() still seems to be incorrect for the same fundamental reason (Go code can't sanely assume what registers the Go compiler will or won't use) and should be fixed eventually. PiperOrigin-RevId: 401895658
2021-10-08Remove redundant slice copy in lisafs gofer client.Ayush Ranjan
listXattr() was doing redundant work. Remove it. PiperOrigin-RevId: 401871315
2021-10-08Disallow "trusted" namespace xattr in VFS2 gofer client.Ayush Ranjan
Allowing this namespace makes way for a lot of GetXattr RPCs to the gofer process when the gofer filesystem is the lower layer of an overlay. The overlay filesystem aggressively queries for "trusted.overlay.opaque" which in practice is never found in the lower layer gofer. But leads to a lot of wasted work. A consequence is that mutable gofer upper layer is not supported anymore but that is still consistent with VFS1. We can revisit when need arises. PiperOrigin-RevId: 401860585
2021-10-07add convenient wrapper for eventfdKevin Krakauer
The same create/write/read pattern is copied around several places. It's easier to understand in a package with names and comments, and we can reuse the smart blocking code in package rawfile. PiperOrigin-RevId: 401647108
2021-10-07Add a new metric to detect the number of spurious loss recoveries.Nayana Bidari
- Implements RFC 3522 (Eifel detection algorithm) to detect if the connection entered loss recovery unnecessarily. - Added a new metric to count the total number of spurious loss recoveries. - Added tests to verify the new metric. PiperOrigin-RevId: 401637359
2021-10-07tests: use a proper path to the kvm deviceAndrei Vagin
PiperOrigin-RevId: 401624134
2021-10-07Store timestamps as time.TimeTamir Duberstein
Rather than boiling down to an integer eagerly, do it as late as possible. PiperOrigin-RevId: 401599308
2021-10-06Create null entry connection on first IPTables hookGhanan Gowripalan
...all connections should be tracked by ConnTrack, so create a no-op connection entry on the first hook into IPTables (Prerouting or Output) and let NAT targets modify the connection entry if they need to instead of letting the NAT target create their own connection entry. This also prepares for "twice-NAT" where a packet may have both DNAT and SNAT performed on it (which requires the ability to update ConnTrack entries). Updates #5696. PiperOrigin-RevId: 401360377
2021-10-06Add global lisafs kernel flag.Ayush Ranjan
PiperOrigin-RevId: 401296116
2021-10-01Merge pull request #6551 from sudo-sturbia:msgqueue/procfsgVisor bot
PiperOrigin-RevId: 400258924
2021-09-30kernel: print PID in addition to TID in task log messagesAndrei Vagin
For multithreads processes, it is hard to read logs without knowing task pids. And let's print a decimal return codeo for syscalls. A hex return code are usefull for system calls that return addresses. For other syscalls, the decimal form is more readable. PiperOrigin-RevId: 400035449
2021-09-28Move `safecopy.ReplaceSignalHandler` into `sighandling` package.Etienne Perot
PiperOrigin-RevId: 399560357
2021-09-27Move `sighandling` package out of `sentry`.Etienne Perot
PiperOrigin-RevId: 399295737
2021-09-27Add procfs files for SysV message queues.Zyad A. Ali
2021-09-24Update the comment for Task.netnsAndrei Vagin
Task.netns can be accessed atomically, so Task.mu isn't needed to access it. PiperOrigin-RevId: 398773947
2021-09-24Merge pull request #6647 from avagin:task-netnsgVisor bot
PiperOrigin-RevId: 398763161
2021-09-23kernel: allow to access Task.netns without taking Task.muAndrei Vagin
This allows to avoind unnecessary lock-ordering dependencies on task.mu.
2021-09-23Create the cgroupfs mount point in sysfs.Rahat Mahmood
Create the /sys/fs/cgroup directory when cgroups are available. This creates the empty directory to serve as the mountpoint, actually mounting cgroups is left to the launcher/userspace. This is consistent with Linux behaviour. Without this mountpoint, getdents(2) on /sys/fs indicates an empty directory even if the launcher mounts cgroupfs at /sys/fs/cgroup. The launcher can't create the mountpoint directory since sysfs doesn't support mkdir. PiperOrigin-RevId: 398596698
2021-09-23Merge pull request #6573 from avagin:kvm-seccomp-mmapgVisor bot
PiperOrigin-RevId: 398572735
2021-09-23Pass AddressableEndpoint to IPTablesGhanan Gowripalan
...instead of an address. This allows a later change to more precisely select an address based on the NAT type (source vs. destination NAT). PiperOrigin-RevId: 398559901
2021-09-22Add Execve and ExitNotifyParent checkpoints.Jamie Liu
Call sites for the two checkpoints aren't added yet. PiperOrigin-RevId: 398375903
2021-09-22kvm: check that safecopy is handled correctly in the guest ring0Andrei Vagin
Signed-off-by: Andrei Vagin <avagin@google.com>
2021-09-22kvm: trap mmap syscalls to map new regions to the guestAndrei Vagin
We install seccomp rules so that the SIGSYS signal is generated for each mmap system call. Then our signal handler executes the real mmap syscall and if a new regions is created, it maps it to the guest. Signed-off-by: Andrei Vagin <avagin@google.com>
2021-09-22kvm/arm: calculate virtual-to-physical mappings only onceAndrei Vagin
2021-09-22kvm: fix tests on arm64AV
2021-09-21socket/unix: clean up socket queue after releasing a queue lockAndrei Vagin
A socket queue can contain sockets (others and this one). We have to avoid taking locks of the same class where it is possible. PiperOrigin-RevId: 398100744
2021-09-21[lisa] Implement lisafs protocol methods in VFS2 gofer client and fsgofer.Ayush Ranjan
Introduces RPC methods in lisafs. Makes that gofer client use lisafs RPCs instead of p9 when lisafs is enabled. Implements the handlers for those methods in fsgofer. Fixes #5465 PiperOrigin-RevId: 398080310
2021-09-20[lisa] Plumb lisafs through runsc.Ayush Ranjan
lisafs is only supported in VFS2. Added a runsc flag which enables lisafs. When the flag is enabled, the gofer process and the client communicate using lisafs protocol instead of 9P. Added a filesystem option in fsimpl/gofer which indicates if lisafs is being used. That will be used to gate lisafs on the gofer client. Note that this change does not make the gofer client use lisafs just yet. Updates #5465 PiperOrigin-RevId: 397917844
2021-09-20Internal change.gVisor bot
PiperOrigin-RevId: 397813331
2021-09-19Support IPV6_RECVPKTINFO on UDP socketsGhanan Gowripalan
PiperOrigin-RevId: 397631833
2021-09-17Allow rebinding packet socket protocolGhanan Gowripalan
...to change the network protocol a packet socket may receive packets from. This CL is a portion of an originally larger CL that was split with https://github.com/google/gvisor/commit/a8ad692fd36cbaf7f5a6b9af39d601053dbee338 being the dependent CL. That CL (accidentally) included the change in the endpoint's `afterLoad` method to take the required lock when accessing the endpoint's netProto field. That change should have been in this CL. The CL that made the change mentioned in the commit message is cl/396946187. PiperOrigin-RevId: 397412582
2021-09-16Limit most file mmaps to the range of an int64.Jamie Liu
In the general case, files may have offsets between MaxInt64 and MaxUint64; in Linux pgoff is consistently represented by an unsigned long, and in gVisor the offset types in memmap.MappableRange are uint64. However, regular file mmap is constrained to int64 offsets (on 64-bit systems) by mm/mmap.c:file_mmap_size_max() => MAX_LFS_FILESIZE == LLONG_MAX. As a related fix, check for chunkStart overflow in fsutil.HostFileMapper; chunk offsets are uint64s, but as noted above some file types may use uint64 offsets beyond MaxInt64. Reported-by: syzbot+71342a1585aed97ed9f7@syzkaller.appspotmail.com PiperOrigin-RevId: 397136751
2021-09-16Merge pull request #6579 from prattmic:runsc_do_profilegVisor bot
PiperOrigin-RevId: 397114051
2021-09-16runsc: add global profile collection flagsMichael Pratt
Add global flags -profile-{block,cpu,heap,mutex} and -trace which enable collection of the specified profile for the entire duration of a container execution. This provides a way to definitively start profiling before that application starts, rather than attempting to race with an out-of-band `runsc debug`. Note that only the main boot process is profiled. This exposed a bug in Task.traceExecEvent: a crash when tracing and -race are enabled. traceExecEvent is called off of the task goroutine, but uses the Task as a context, which is a violation of the Task contract. Switching to the AsyncContext fixes the issue. Fixes #220
2021-09-15Pass address properties in a single structTony Gong
Replaced the current AddAddressWithOptions method with AddAddressWithProperties which passes all address properties in a single AddressProperties type. More properties that need to be configured in the future are expected, so adding a type makes adding them easier. PiperOrigin-RevId: 396930729
2021-09-15[bind] Return EINVAL for under sized addressGhanan Gowripalan
...and EAFNOSUPPORT for unexpected address family. To comply with Linux. Updates #6021, #6575. PiperOrigin-RevId: 396893590
2021-09-14Fix race on msgrcv(MSG_COPY).Rahat Mahmood
Previously, we weren't making a copy when a sysv message queue was receiving a message with the MSG_COPY flag. This flag indicates the message being received should be left in the queue and a copy of the message should be returned to userspace. Without the copy, a racing process can modify the original message while it's being marshalled to user memory. Reported-by: syzbot+cb15e644698b20ff4e17@syzkaller.appspotmail.com PiperOrigin-RevId: 396712856
2021-09-10Typo fix.Etienne Perot
PiperOrigin-RevId: 396042572
2021-09-09Remove linux-compat loopback hacks from packet endpointGhanan Gowripalan
Previously, gVisor did not represent loopback devices as an ethernet device as Linux does. To maintain Linux API compatibility for packet sockets, a workaround was used to add an ethernet header if a link header was not already present in the packet buffer delivered to a packet endpoint. However, this workaround is a bug for non-ethernet based interfaces; not all links use an ethernet header (e.g. pure L3/TUN interfaces). As of 3b4bb947517d0d9010120aaa1c3989fd6abf278e, gVisor represents loopback devices as an ethernet-based device so this workaround can now be removed. BUG: https://fxbug.dev/81592 Updates #6530, #6531. PiperOrigin-RevId: 395819151
2021-09-07Stub some memory control files.Rahat Mahmood
PiperOrigin-RevId: 395338926
2021-09-03Add //pkg/sentry/seccheck.Jamie Liu
This defines common infrastructure for dynamically-configured security checks, including an example usage in the clone(2) path. PiperOrigin-RevId: 394797270
2021-09-02unix: avoid taking two endpoint locksAndrei Vagin
If we want to take two endpoint locks, we need to be sure that we always take them in the same order. Accept() locks the listening endpoint to work with acceptedChan and then it calls GetLocalAddress that locks an accepted endpoint. Actually, we can release the listening endpoint lock before calling GetLocalAddress. Reported-by: syzbot+f52bd603f51a4ae91054@syzkaller.appspotmail.com PiperOrigin-RevId: 394553823
2021-09-01Support sending with packet socketsGhanan Gowripalan
...through the loopback interface, only. This change only supports sending on packet sockets through the loopback interface as the loopback interface is the only interface used in packet socket syscall tests - the other link endpoints are not excercised with the existing test infrastructure. Support for sending on packet sockets through the other interfaces will be added as needed. BUG: https://fxbug.dev/81592 PiperOrigin-RevId: 394368899
2021-09-01Extract network datagram endpoint common facilitiesGhanan Gowripalan
...from the UDP endpoint. Datagram-based transport endpoints (e.g. UDP, RAW IP) can share a lot of their write path due to the datagram-based nature of these endpoints. Extract the common facilities from UDP so they can be shared with other transport endpoints (in a later change). Test: UDP syscall tests. PiperOrigin-RevId: 394347774
2021-09-01Don't use reflection in fpu.alignedBytes.Jamie Liu
reflect.ValueOf takes an interface{}, so when passed a slice the compiler emits a call to runtime.convTslice to heap-allocate a copy of the slice header. PiperOrigin-RevId: 394310052
2021-09-01Cache vdso.so's __kernel_rt_sigreturn location.Jamie Liu
PiperOrigin-RevId: 394300607
2021-09-01Propagate vfs.MkdirOptions.ForSyntheticMountpoint to overlay copy-up.Jamie Liu
PiperOrigin-RevId: 394296687
2021-09-01unix: handle a case when a buffer is overflowedAndrei Vagin
Reported-by: syzbot+1aab6800bd14829609b8@syzkaller.appspotmail.com PiperOrigin-RevId: 394279838
2021-08-30Narrow COW-break on thread stacks.Jamie Liu
PiperOrigin-RevId: 393841270