summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry/fs
AgeCommit message (Collapse)Author
2021-09-27Add procfs files for SysV message queues.Zyad A. Ali
2021-09-16Limit most file mmaps to the range of an int64.Jamie Liu
In the general case, files may have offsets between MaxInt64 and MaxUint64; in Linux pgoff is consistently represented by an unsigned long, and in gVisor the offset types in memmap.MappableRange are uint64. However, regular file mmap is constrained to int64 offsets (on 64-bit systems) by mm/mmap.c:file_mmap_size_max() => MAX_LFS_FILESIZE == LLONG_MAX. As a related fix, check for chunkStart overflow in fsutil.HostFileMapper; chunk offsets are uint64s, but as noted above some file types may use uint64 offsets beyond MaxInt64. Reported-by: syzbot+71342a1585aed97ed9f7@syzkaller.appspotmail.com PiperOrigin-RevId: 397136751
2021-09-10Typo fix.Etienne Perot
PiperOrigin-RevId: 396042572
2021-08-27Fix lock order violations: mm.mappingMu > Task.mu.Nicolas Lacasse
Document this ordering in mm/mm.go. PiperOrigin-RevId: 393413203
2021-08-13[syserror] Remove pkg syserror.Zach Koopmans
Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-12[syserror] Convert remaining syserror definitions to linuxerr.Zach Koopmans
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-07-20Add go:build directives as required by Go 1.17's gofmt.Jamie Liu
PiperOrigin-RevId: 385894869
2021-07-12[syserror] Update syserror to linuxerr for more errors.Zach Koopmans
Update the following from syserror to the linuxerr equivalent: EEXIST EFAULT ENOTDIR ENOTTY EOPNOTSUPP ERANGE ESRCH PiperOrigin-RevId: 384329869
2021-07-01Mix checklocks and atomic analyzers.Adin Scannell
This change makes the checklocks analyzer considerable more powerful, adding: * The ability to traverse complex structures, e.g. to have multiple nested fields as part of the annotation. * The ability to resolve simple anonymous functions and closures, and perform lock analysis across these invocations. This does not apply to closures that are passed elsewhere, since it is not possible to know the context in which they might be invoked. * The ability to annotate return values in addition to receivers and other parameters, with the same complex structures noted above. * Ignoring locking semantics for "fresh" objects, i.e. objects that are allocated in the local frame (typically a new-style function). * Sanity checking of locking state across block transitions and returns, to ensure that no unexpected locks are held. Note that initially, most of these findings are excluded by a comprehensive nogo.yaml. The findings that are included are fundamental lock violations. The changes here should be relatively low risk, minor refactorings to either include necessary annotations to simplify the code structure (in general removing closures in favor of methods) so that the analyzer can be easily track the lock state. This change additional includes two changes to nogo itself: * Sanity checking of all types to ensure that the binary and ast-derived types have a consistent objectpath, to prevent the bug above from occurring silently (and causing much confusion). This also requires a trick in order to ensure that serialized facts are consumable downstream. This can be removed with https://go-review.googlesource.com/c/tools/+/331789 merged. * A minor refactoring to isolation the objdump settings in its own package. This was originally used to implement the sanity check above, but this information is now being passed another way. The minor refactor is preserved however, since it cleans up the code slightly and is minimal risk. PiperOrigin-RevId: 382613300
2021-07-01[syserror] Update several syserror errors to linuxerr equivalents.Zach Koopmans
Update/remove most syserror errors to linuxerr equivalents. For list of removed errors, see //pkg/syserror/syserror.go. PiperOrigin-RevId: 382574582
2021-06-30[syserror] Update syserror to linuxerr for EACCES, EBADF, and EPERM.Zach Koopmans
Update all instances of the above errors to the faster linuxerr implementation. With the temporary linuxerr.Equals(), no logical changes are made. PiperOrigin-RevId: 382306655
2021-06-29[syserror] Change syserror to linuxerr for E2BIG, EADDRINUSE, and EINVALZach Koopmans
Remove three syserror entries duplicated in linuxerr. Because of the linuxerr.Equals method, this is a mere change of return values from syserror to linuxerr definitions. Done with only these three errnos as CLs removing all grow to a significantly large size. PiperOrigin-RevId: 382173835
2021-06-24Internal change.Jamie Liu
PiperOrigin-RevId: 381375705
2021-06-22[syserror] Add conversions to linuxerr with temporary Equals method.Zach Koopmans
Add Equals method to compare syserror and unix.Errno errors to linuxerr errors. This will facilitate removal of syserror definitions in a followup, and finding needed conversions from unix.Errno to linuxerr. PiperOrigin-RevId: 380909667
2021-06-10Add /proc/sys/vm/max_map_countFabricio Voznika
Set it to int32 max because gVisor doesn't have a limit. Fixes #2337 PiperOrigin-RevId: 378722230
2021-06-09Change TODO to NOTE.Nicolas Lacasse
It's in VFS1 code, so we probably will not do it. PiperOrigin-RevId: 378474174
2021-05-25setgid directories for VFS1 tmpfs, overlayfs, and goferfsKevin Krakauer
PiperOrigin-RevId: 375780659
2021-05-18Delete /cloud/gvisor/sandbox/sentry/gofer/opened_write_execute_file metricNayana Bidari
This metric is replaced by /cloud/gvisor/sandbox/sentry/suspicious_operations metric with field value opened_write_execute_file. PiperOrigin-RevId: 374509823
2021-05-14Add new metric for suspicious operations.Nayana Bidari
The new metric contains fields and will replace the below existing metric: - opened_write_execute_file PiperOrigin-RevId: 373884604
2021-05-14Don't read forwarding from netstack in sentryGhanan Gowripalan
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables: ip_forward - BOOLEAN 0 - disabled (default) not 0 - enabled Forward Packets between interfaces. This variable is special, its change resets all configuration parameters to their default state (RFC1122 for hosts, RFC1812 for routers) /proc/sys/net/ipv4/ip_forward only does work when its value is changed and always returns the last written value. The last written value may not reflect the current state of the netstack (e.g. when `ip_forward` was written a value of "1" then disable forwarding on an interface) so there is no need for sentry to probe netstack to get the current forwarding state of interfaces. ``` ~$ cat /proc/sys/net/ipv4/ip_forward 0 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 1 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo sysctl -w net.ipv4.conf.wlp1s0.forwarding=0 net.ipv4.conf.wlp1s0.forwarding = 0 ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 1 ~$ sudo bash -c "echo 1 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward net.ipv4.conf.all.forwarding = 1 net.ipv4.conf.default.forwarding = 1 net.ipv4.conf.eno1.forwarding = 1 net.ipv4.conf.lo.forwarding = 1 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 1 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ sudo bash -c "echo 0 > /proc/sys/net/ipv4/ip_forward" ~$ sudo sysctl -a | grep ipv4 | grep forward sysctl: unable to open directory "/proc/sys/fs/binfmt_misc/" net.ipv4.conf.all.forwarding = 0 net.ipv4.conf.default.forwarding = 0 net.ipv4.conf.eno1.forwarding = 0 net.ipv4.conf.lo.forwarding = 0 net.ipv4.conf.wlp1s0.forwarding = 0 net.ipv4.ip_forward = 0 net.ipv4.ip_forward_update_priority = 1 net.ipv4.ip_forward_use_pmtu = 0 ~$ cat /proc/sys/net/ipv4/ip_forward 0 ``` In the above example we can see that writing "1" to /proc/sys/net/ipv4/ip_forward configures the stack to be a router (all interfaces are configured to enable forwarding). However, if we manually update an interace (`wlp1s0`) to not forward packets, /proc/sys/net/ipv4/ip_forward continues to return the last written value of "1", even though not all interfaces will forward packets. Also note that writing the same value twice has no effect; work is performed iff the value changes. This change also removes the 'unset' state from sentry's ip forwarding data structures as an 'unset' ip forwarding value is the same as leaving forwarding disabled as the stack is always brought up with forwarding initially disabled; disabling forwarding on a newly created stack is a no-op. PiperOrigin-RevId: 373853106
2021-04-20Move SO_RCVBUF to socketops.Nayana Bidari
Fixes #2926, #674 PiperOrigin-RevId: 369457123
2021-03-29[syserror] Split usermem packageZach Koopmans
Split usermem package to help remove syserror dependency in go_marshal. New hostarch package contains code not dependent on syserror. PiperOrigin-RevId: 365651233
2021-03-25Use seqfile.SeqHandles correctly in VFS1 /proc/net/.Jamie Liu
Before this change: ``` $ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024 #1: read(128) = 128 #2: read(1024) = EOF $ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024 #1: read(128) = 128 #2: read(1024) = 256 ``` After this change: ``` $ docker run --runtime=runsc --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024 #1: read(128) = 128 #2: read(1024) = 256 $ docker run --runtime=runsc-vfs2 --rm -it -v ~/tmp:/hosttmp ubuntu:focal /hosttmp/issue5732 --bytes1=128 --bytes2=1024 #1: read(128) = 128 #2: read(1024) = 256 ``` Fixes #5732 PiperOrigin-RevId: 365178386
2021-03-24Add POLLRDNORM/POLLWRNORM support.Bhasker Hariharan
On Linux these are meant to be equivalent to POLLIN/POLLOUT. Rather than hack these on in sys_poll etc it felt cleaner to just cleanup the call sites to notify for both events. This is what linux does as well. Fixes #5544 PiperOrigin-RevId: 364859977
2021-03-22Avoid calling sync on each write in writethrough mode.Nicolas Lacasse
PiperOrigin-RevId: 364370595
2021-03-15Make netstack (//pkg/tcpip) buildable for 32 bitKevin Krakauer
Doing so involved breaking dependencies between //pkg/tcpip and the rest of gVisor, which are discouraged anyways. Tested on the Go branch via: gvisor.dev/gvisor/pkg/tcpip/... Addresses #1446. PiperOrigin-RevId: 363081778
2021-03-08Implement /proc/sys/net/ipv4/ip_local_port_rangeKevin Krakauer
Speeds up the socket stress tests by a couple orders of magnitude. PiperOrigin-RevId: 361721050
2021-03-03Add checklocks analyzer.Bhasker Hariharan
This validates that struct fields if annotated with "// checklocks:mu" where "mu" is a mutex field in the same struct then access to the field is only done with "mu" locked. All types that are guarded by a mutex must be annotated with // +checklocks:<mutex field name> For more details please refer to README.md. PiperOrigin-RevId: 360729328
2021-03-03[op] Replace syscall package usage with golang.org/x/sys/unix in pkg/.Ayush Ranjan
The syscall package has been deprecated in favor of golang.org/x/sys. Note that syscall is still used in the following places: - pkg/sentry/socket/hostinet/stack.go: some netlink related functionalities are not yet available in golang.org/x/sys. - syscall.Stat_t is still used in some places because os.FileInfo.Sys() still returns it and not unix.Stat_t. Updates #214 PiperOrigin-RevId: 360701387
2021-02-17Check for directory emptiness in VFS1 overlay rmdir().Jamie Liu
Note that this CL reorders overlayEntry.copyMu before overlayEntry.dirCacheMu in the overlayFileOperations.IterateDir() => readdirEntries() path - but this lock ordering is already required by overlayRemove/Bind() => overlayEntry.markDirectoryDirty(), so this actually just fixes an inconsistency. PiperOrigin-RevId: 358047121
2021-02-11Assign controlling terminal when tty is opened and support NOCTTYKevin Krakauer
PiperOrigin-RevId: 357015186
2021-02-09Add support for setting SO_SNDBUF for unix domain sockets.Bhasker Hariharan
The limits for snd/rcv buffers for unix domain socket is controlled by the following sysctls on linux - net.core.rmem_default - net.core.rmem_max - net.core.wmem_default - net.core.wmem_max Today in gVisor we do not expose these sysctls but we do support setting the equivalent in netstack via stack.Options() method. But AF_UNIX sockets in gVisor can be used without netstack, with hostinet or even without any networking stack at all. Which means ideally these sysctls need to live as globals in gVisor. But rather than make this a big change for now we hardcode the limits in the AF_UNIX implementation itself (which in itself is better than where we were before) where it SO_SNDBUF was hardcoded to 16KiB. Further we bump the initial limit to a default value of 208 KiB to match linux from the paltry 16 KiB we use today. Updates #5132 PiperOrigin-RevId: 356665498
2021-02-09Collapse code that always returns errorTamir Duberstein
PiperOrigin-RevId: 356536548
2021-01-28Change tcpip.Error to an interfaceTamir Duberstein
This makes it possible to add data to types that implement tcpip.Error. ErrBadLinkEndpoint is removed as it is unused. PiperOrigin-RevId: 354437314
2021-01-26Implement error on pointersTamir Duberstein
This improves type-assertion safety. PiperOrigin-RevId: 353931228
2021-01-22Implement F_GETLK fcntl.Dean Deng
Fixes #5113. PiperOrigin-RevId: 353313374
2021-01-12Fix simple mistakes identified by goreportcard.Adin Scannell
These are primarily simplification and lint mistakes. However, minor fixes are also included and tests added where appropriate. PiperOrigin-RevId: 351425971
2020-12-23vfs1: don't allow to open socket filesAndrei Vagin
open() has to return ENXIO in this case. O_PATH isn't supported by vfs1. PiperOrigin-RevId: 348820478
2020-12-11Remove existing nogo exceptions.Adin Scannell
PiperOrigin-RevId: 347047550
2020-12-04Require sync.RWMutex to lock and unlock from the same goroutineMichael Pratt
This is the RWMutex equivalent to the preceding sync.Mutex CL. Updates #4804 PiperOrigin-RevId: 345681051
2020-12-02Add /proc/sys/kernel/sem.Jing Chen
PiperOrigin-RevId: 345178956
2020-11-19Require sync.Mutex to lock and unlock from the same goroutineMichael Pratt
We would like to track locks ordering to detect ordering violations. Detecting violations is much simpler if mutexes must be unlocked by the same goroutine that locked them. Thus, as a first step to tracking lock ordering, add this lock/unlock requirement to gVisor's sync.Mutex. This is more strict than the Go standard library's sync.Mutex, but initial testing indicates only a single lock that is used across goroutines. The new sync.CrossGoroutineMutex relaxes the requirement (but will not provide lock order checking). Due to the additional overhead, enforcement is only enabled with the "checklocks" build tag. Build with this tag using: bazel build --define=gotags=checklocks ... From my spot-checking, this has no changed inlining properties when disabled. Updates #4804 PiperOrigin-RevId: 343370200
2020-11-18Port filesystem metrics to VFS2.Jamie Liu
PiperOrigin-RevId: 343196927
2020-11-16Remove ARP address workaroundGhanan Gowripalan
- Make AddressableEndpoint optional for NetworkEndpoint. Not all NetworkEndpoints need to support addressing (e.g. ARP), so AddressableEndpoint should only be implemented for protocols that support addressing such as IPv4 and IPv6. With this change, tcpip.ErrNotSupported will be returned by the stack when attempting to modify addresses on a network endpoint that does not support addressing. Now that packets are fully handled at the network layer, and (with this change) addresses are optional for network endpoints, we no longer need the workaround for ARP where a fake ARP address was added to each NIC that performs ARP so that packets would be delivered to the ARP layer. PiperOrigin-RevId: 342722547
2020-11-12Fix misuses of kernel.Task as context.Context.Jamie Liu
kernel.Task can only be used as context.Context by that Task's task goroutine. This is violated in at least two places: - In any case where one thread accesses the /proc/[tid] of any other thread, passing the kernel.Task for [tid] as the context.Context is incorrect. - Task.rebuildTraceContext() may be called by Kernel.RebuildTraceContexts() outside the scope of any task goroutine. Fix these (as well as a data race on Task.traceContext discovered during the course of finding the latter). PiperOrigin-RevId: 342174404
2020-11-03Fix more nogo testsTing-Yu Wang
PiperOrigin-RevId: 340536306
2020-11-03Make pipe min/max sizes match linux.Nicolas Lacasse
The default pipe size already matched linux, and is unchanged. Furthermore `atomicIOBytes` is made a proper constant (as it is in Linux). We were plumbing usermem.PageSize everywhere, so this is no functional change. PiperOrigin-RevId: 340497006
2020-10-27Implement /proc/[pid]/memLennart
This PR implements /proc/[pid]/mem for `pkg/sentry/fs` (refer to #2716) and `pkg/sentry/fsimpl`. @majek COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/4060 from lnsp:proc-pid-mem 2caf9021254646f441be618a9bb5528610e44d43 PiperOrigin-RevId: 339369629
2020-10-23Support VFS2 save/restore.Jamie Liu
Inode number consistency checks are now skipped in save/restore tests for reasons described in greatest detail in StatTest.StateDoesntChangeAfterRename. They pass in VFS1 due to the bug described in new test case SimpleStatTest.DifferentFilesHaveDifferentDeviceInodeNumberPairs. Fixes #1663 PiperOrigin-RevId: 338776148
2020-10-13Don't read beyond EOF when inserting into sentry page cache.Jamie Liu
The sentry page cache stores file contents at page granularity; this is necessary for memory mappings. Thus file offset ranges passed to fsutil.FileRangeSet.Fill() must be page-aligned. If the read callback passed to Fill() returns (partial read, nil error) when reading up to EOF (which is the case for p9.ClientFile.ReadAt() since 9P's Rread cannot convey both a partial read and EOF), Fill() will re-invoke the read callback to try to read from EOF to the end of the containing page, which is harmless but needlessly expensive. Fix this by handling file size explicitly in fsutil.FileRangeSet.Fill(). PiperOrigin-RevId: 336934075