Age | Commit message (Collapse) | Author |
|
|
|
This gets us closer to passing the iptables tests and opens up iptables
so it can be worked on by multiple people.
A few restrictions are enforced for security (i.e. we don't want to let
users write a bunch of iptables rules and then just not enforce them):
- Only the filter table is writable.
- Only ACCEPT rules with no matching criteria can be added.
|
|
PiperOrigin-RevId: 288342928
|
|
Add checks for input arguments, file type, permissions, etc. that match
the Linux implementation. A call to get/setxattr that passes all the
checks will still currently return EOPNOTSUPP. Actual support will be
added in following commits.
Only allow user.* extended attributes for the time being.
PiperOrigin-RevId: 285835159
|
|
PiperOrigin-RevId: 285231002
|
|
PiperOrigin-RevId: 284892289
|
|
Some versions of glibc will convert F_GETOWN fcntl(2) calls into F_GETOWN_EX in
some cases.
PiperOrigin-RevId: 284089373
|
|
This involves allowing getsockopt/setsockopt for the corresponding socket
options, as well as allowing hostinet to process control messages received from
the actual recvmsg syscall.
PiperOrigin-RevId: 282851425
|
|
In glibc, some structures are defined differently on different
platforms.
Such as: C.struct_stat
Signed-off-by: Bin Lu <bin.lu@arm.com>
|
|
PiperOrigin-RevId: 276380008
|
|
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.
Binding isn't supported yet.
PiperOrigin-RevId: 275909366
|
|
PiperOrigin-RevId: 275139066
|
|
This proc file reports routing information to applications inside the
container.
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
|
|
In Linux (include/linux/types.h), mode_t is an unsigned short.
PiperOrigin-RevId: 272956350
|
|
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.
In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context. Probably not
worthwhile fixing immediately.
PiperOrigin-RevId: 269901749
|
|
They are no-ops, so the standard rule works fine.
PiperOrigin-RevId: 268776264
|
|
Document limitation of no reasonable implementation for RWF_HIPRI
flag (High Priority Read/Write for block-based file systems).
PiperOrigin-RevId: 264237589
|
|
PiperOrigin-RevId: 263637194
|
|
PiperOrigin-RevId: 263040624
|
|
Now if a process sends an unsupported netlink requests,
an error is returned from the send system call.
The linux kernel works differently in this case. It returns errors in the
nlmsgerr netlink message.
Reported-by: syzbot+571d99510c6f935202da@syzkaller.appspotmail.com
PiperOrigin-RevId: 262690453
|
|
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I1dbd23bb240cca71d0cc30fc75ca5be28cb4c37c
PiperOrigin-RevId: 262619519
|
|
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.
PiperOrigin-RevId: 262460640
|
|
Implements support for RTM_GETROUTE requests for netlink sockets.
Fixes #507
PiperOrigin-RevId: 261051045
|
|
PiperOrigin-RevId: 259437060
|
|
PiperOrigin-RevId: 259376740
|
|
Major differences from the current ("v1") sentry VFS:
- Path resolution is Filesystem-driven (FilesystemImpl methods call
vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a
Dirent tree and calls fs.InodeOperations methods to populate it). This
drastically improves performance, primarily by reducing overhead from
inefficient synchronization and indirection. It also makes it possible
to implement remote filesystem protocols that translate FS system calls
into single RPCs, rather than having to make (at least) one RPC per path
component, significantly reducing the latency of remote filesystems
(especially during cold starts and for uncacheable shared filesystems).
- Mounts are correctly represented as a separate check based on
contextual state (current mount) rather than direct replacement in a
fs.Dirent tree. This makes it possible to support (non-recursive) bind
mounts and mount namespaces.
Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem
that exists primarily to demonstrate intended filesystem implementation
patterns and for benchmarking:
BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op
BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op
BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op
BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op
BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op
BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op
BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op
BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op
BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op
BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op
BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op
BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op
BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op
BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op
BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op
BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op
BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op
BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op
BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op
BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op
BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op
BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op
BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op
BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op
From this we can infer that, on this machine:
- Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1.
- Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a
difference of about 6x.
- The cost of crossing a mount boundary is about 80ns in VFS2
(MemfsMountStat/1 does approximately the same amount of work as
MemfsStat/2, except that it also crosses a mount boundary). This is an
inescapable cost of the separate mount lookup needed to support bind
mounts and mount namespaces.
PiperOrigin-RevId: 258853946
|
|
PiperOrigin-RevId: 258657913
|
|
PiperOrigin-RevId: 258645957
|
|
PiperOrigin-RevId: 257293198
|
|
These are unfortunately unused and unmaintained. They can be brought back in
the future if need requires it.
PiperOrigin-RevId: 255697132
|
|
Addresses obvious typos, in the documentation only.
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65
PiperOrigin-RevId: 255477779
|
|
We don't have the plumbing for btime yet, so that field is left off. The
returned mask indicates that btime is absent.
Fixes #343
PiperOrigin-RevId: 254575752
|
|
PiperOrigin-RevId: 254428866
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.
Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.
PiperOrigin-RevId: 252558304
|
|
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.
PiperOrigin-RevId: 251956740
|
|
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().
Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.
PiperOrigin-RevId: 251950859
|
|
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).
For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.
PiperOrigin-RevId: 251934850
|
|
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.
Lack of support for set-uid binaries or fs creds simplifies things a
bit.
As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.
PiperOrigin-RevId: 251712714
|
|
PiperOrigin-RevId: 249888234
Change-Id: Icfef32c3ed34809c34100c07e93e9581c786776e
|
|
Updates #214
PiperOrigin-RevId: 249483756
Change-Id: I0d3cf4112bed75a863d5eb08c2063fbc506cd875
|
|
Pipe internals are made more efficient by avoiding garbage collection.
A pool is now used that can be shared by all pipes, and buffers are
chained via an intrusive list. The documentation for pipe structures
and methods is also simplified and clarified.
The pipe tests are now parameterized, so that they are run on all
different variants (named pipes, small buffers, default buffers).
The pipe buffer sizes are exposed by fcntl, which is now supported
by this change. A size change test has been added to the suite.
These new tests uncovered a bug regarding the semantics of open
named pipes with O_NONBLOCK, which is also fixed by this CL. This
fix also addresses the lack of the O_LARGEFILE flag for named pipes.
PiperOrigin-RevId: 249375888
Change-Id: I48e61e9c868aedb0cadda2dff33f09a560dee773
|
|
This does not actually implement an efficient splice or sendfile. Rather, it
adds a generic plumbing to the file internals so that this can be added. All
file implementations use the stub fileutil.NoSplice implementation, which
causes sendfile and splice to fall back to an internal copy.
A basic splice system call interface is added, along with a test.
PiperOrigin-RevId: 249335960
Change-Id: Ic5568be2af0a505c19e7aec66d5af2480ab0939b
|
|
Closes #225
PiperOrigin-RevId: 247508791
Change-Id: I04f47cf2770b30043e5a272aba4ba6e11d0476cc
|
|
Change-Id: Ifbd2abf63ea8062a89b83e948d3e9735480d8216
PiperOrigin-RevId: 246559904
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.
PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
|
|
The previous implementation revolved around runes instead of bytes, which caused
weird behavior when converting between the two. For example, peekRune would read
the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an
alias of int32, 0xff was 0-padded to int32(255), which is the hex code point for
?. However, peekRune also returned the length of the byte (1). When calling
utf8.EncodeRune, we only allocated 1 byte, but tried the write the 2-byte
character ?.
tl;dr: I apparently didn't understand runes when I wrote this.
PiperOrigin-RevId: 241789081
Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
|
|
Memfds are simply anonymous tmpfs files with no associated
mounts. Also implementing file seals, which Linux only implements for
memfds at the moment.
PiperOrigin-RevId: 240450031
Change-Id: I31de78b950101ae8d7a13d0e93fe52d98ea06f2f
|
|
It is Implemented without the priority inheritance part given
that gVisor defers scheduling decisions to Go runtime and doesn't
have control over it.
PiperOrigin-RevId: 236989545
Change-Id: I714c8ca0798743ecf3167b14ffeb5cd834302560
|