Age | Commit message (Collapse) | Author |
|
PacketMMap mode has issues due to a kernel bug. This change
reverts us to using recvmmsg instead of a shared ring buffer to
dispatch inbound packets. This will reduce performance but should
be more stable under heavy load till PacketMMap is updated to
use TPacketv3.
See #210 for details.
Perf difference between recvmmsg vs packetmmap.
RecvMMsg :
iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 43478 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 778 MBytes 6.53 Gbits/sec 4349 188 KBytes
[ 4] 1.00-2.00 sec 786 MBytes 6.59 Gbits/sec 4395 212 KBytes
[ 4] 2.00-3.00 sec 756 MBytes 6.34 Gbits/sec 3655 161 KBytes
[ 4] 3.00-4.00 sec 782 MBytes 6.56 Gbits/sec 4419 175 KBytes
[ 4] 4.00-5.00 sec 755 MBytes 6.34 Gbits/sec 4317 187 KBytes
[ 4] 5.00-6.00 sec 774 MBytes 6.49 Gbits/sec 4002 173 KBytes
[ 4] 6.00-7.00 sec 737 MBytes 6.18 Gbits/sec 3904 191 KBytes
[ 4] 7.00-8.00 sec 530 MBytes 4.44 Gbits/sec 3318 189 KBytes
[ 4] 8.00-9.00 sec 487 MBytes 4.09 Gbits/sec 2627 188 KBytes
[ 4] 9.00-10.00 sec 770 MBytes 6.46 Gbits/sec 4221 170 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec 39207 sender
[ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec receiver
iperf Done.
PacketMMap:
bhaskerh@gvisor-bench:~/tensorflow$ iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 43496 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 657 MBytes 5.51 Gbits/sec 0 1.01 MBytes
[ 4] 1.00-2.00 sec 1021 MBytes 8.56 Gbits/sec 0 1.01 MBytes
[ 4] 2.00-3.00 sec 1.21 GBytes 10.4 Gbits/sec 45 1.01 MBytes
[ 4] 3.00-4.00 sec 1018 MBytes 8.54 Gbits/sec 15 1.01 MBytes
[ 4] 4.00-5.00 sec 1.28 GBytes 11.0 Gbits/sec 45 1.01 MBytes
[ 4] 5.00-6.00 sec 1.38 GBytes 11.9 Gbits/sec 0 1.01 MBytes
[ 4] 6.00-7.00 sec 1.34 GBytes 11.5 Gbits/sec 45 856 KBytes
[ 4] 7.00-8.00 sec 1.23 GBytes 10.5 Gbits/sec 0 901 KBytes
[ 4] 8.00-9.00 sec 1010 MBytes 8.48 Gbits/sec 0 923 KBytes
[ 4] 9.00-10.00 sec 1.39 GBytes 11.9 Gbits/sec 0 960 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec 150 sender
[ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec receiver
Updates #210
PiperOrigin-RevId: 244968438
Change-Id: Id461b5cbff2dea6fa55cfc108ea246d8f83da20b
|
|
This CL fixes the following bugs:
- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.
- Increments ringOffsets for frames that are truncated (i.e status is
tpStatusCopy)
- Does not ignore frames with tpStatusLost bit set as they are valid frames and
only indicate that there some frames were lost before this one and metrics can
be retrieved with a getsockopt call.
- Adds checks to make sure blockSize is a multiple of page size. This is
required as the kernel allocates in pages per block and rejects sizes that are
not page aligned with an EINVAL.
Updates #210
PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
|
|
PiperOrigin-RevId: 244959388
Change-Id: Ifb08678d975cf9f694a21012f9a1e9f45b1f197c
|
|
The caller must call Readdir() at least twice to detect
EOF. The old code was always restarting the directory
search and then skipping elements already seen, effectively
doubling the cost to read a directory. The code now
remembers the last offset and doesn't reposition the cursor
if next request comes at the same offset.
PiperOrigin-RevId: 244957816
Change-Id: If21a8dc68b76614adbcf4301439adfda40f2643f
|
|
p9.messageByType was taking 7% of p9.recv before, spending time
with reflection and map lookup. Now it's reduced to 1%.
PiperOrigin-RevId: 244947313
Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5
|
|
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switchs to use fd.FD which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.
PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
|
|
Create, Start, and Destroy were racing to create and destroy the
metadata directory of containers.
This is a re-upload of
https://gvisor-review.googlesource.com/c/gvisor/+/16260, but with the
correct account.
Change-Id: I16b7a9d0971f0df873e7f4145e6ac8f72730a4f1
PiperOrigin-RevId: 244892991
|
|
For a symbol link to some directory, eg.
`/tmp/symlink -> /tmp/dir`
`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.
Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.
Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
|
|
PiperOrigin-RevId: 244773890
Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
|
|
PiperOrigin-RevId: 244773836
Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
|
|
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.
Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.
Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
|
|
The MSG_TRUNC flag is set in the msghdr when a message is truncated.
Fixes google/gvisor#200
PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
|
|
Add a UDP forwarder for intercepting and forwarding UDP sessions.
Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
|
|
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I20103cd6d193431ab7e8120005da1f567b9bc2eb
PiperOrigin-RevId: 244280119
|
|
I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1)
I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s)
PiperOrigin-RevId: 244269879
Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
|
|
Inode ids are only guaranteed to be stable across save/restore if the file is
held open. This CL fixes a simple stat test to allow it to compare symlink and
target by inode id, as long as the link target is held open.
PiperOrigin-RevId: 244238343
Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
|
|
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.
PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
|
|
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.
This means that we can't use endpoint.netProto to set gso.L3HdrLen.
PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
|
|
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).
Switch the logic so that unblockable signals are always masked.
PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
|
|
FD limit and file size limit is read from the host, instead
of using hard-coded defaults, given that they effect the sandbox
process. Also limit the direct cache to use no more than half
if the available FDs.
PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
|
|
Current, doPoll copies the user struct pollfd array into a
[]syscalls.PollFD, which contains internal kdefs.FD and
waiter.EventMask types. While these are currently binary-compatible with
the Linux versions, we generally discourage copying directly to internal
types (someone may inadvertantly change kdefs.FD to uint64).
Instead, copy directly to a []linux.PollFD, which will certainly be
binary compatible. Most of syscalls/polling.go is included directly into
syscalls/linux/sys_poll.go, as it can then operate directly on
linux.PollFD. The additional syscalls.PollFD type is providing little
value.
I've also added explicit conversion functions for waiter.EventMask,
which creates the possibility of a different binary format.
PiperOrigin-RevId: 244042947
Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
|
|
PiperOrigin-RevId: 244036529
Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee
|
|
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
|
|
The file layout in the bucket is changed a little bit recently to support both v1 shim and v2 shim.
PiperOrigin-RevId: 243682904
Change-Id: Ic1373c6dc088ef41f829e7ce3ea3762e1e2b0292
|
|
It provides an easy way to run commands to quickly test gVisor.
By default it maps the host root as the container root with a
writable overlay on top (so the host root is not modified).
Example:
sudo runsc do ls -lh --color
sudo runsc do ~/src/test/my-test.sh
PiperOrigin-RevId: 243178711
Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9
|
|
Normal files display their path in the current mount namespace:
I0410 10:57:54.964196 216336 x:0] [ 1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462?s)
AT_FDCWD includes the CWD:
I0411 12:58:48.278427 1526 x:0] [ 1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0)
Sockets (and other non-vfs files) display an inode number (like
/proc/PID/fd):
I0410 10:54:38.909123 207684 x:0] [ 1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10)
I also fixed a few syscall args that should be Path.
PiperOrigin-RevId: 243169025
Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a
|
|
Change-Id: I93a78a6b2bb2eaa69046c6cfecee2e4cfcf20e44
PiperOrigin-RevId: 243140359
|
|
Change-Id: Ie6b73ac729c8c85b1229e09da5b113be9780fa95
PiperOrigin-RevId: 243131814
|
|
PiperOrigin-RevId: 243018347
Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37
|
|
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
|
|
RootFromContext can return a dirent with reference taken, or nil. We must call
DecRef if (and only if) a real dirent is returned.
PiperOrigin-RevId: 242965515
Change-Id: Ie2b7b4cb19ee09b6ccf788b71f3fd7efcdf35a11
|
|
Even superuser cannot raise RLIMIT_NOFILE above /proc/sys/fs/nr_open, so
start the test by lowering the limits before raising.
Change-Id: Ied6021c64178a6cb9098088a1a3384db523a226f
PiperOrigin-RevId: 242965249
|
|
add renameMu.Lock when oldParent == newParent
in order to avoid data race in following report:
WARNING: DATA RACE
Read at 0x00c000ba2160 by goroutine 405:
gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).fullName()
pkg/sentry/fs/dirent.go:246 +0x6c
gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).FullName()
pkg/sentry/fs/dirent.go:356 +0x8b
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*FDMap).String()
pkg/sentry/kernel/fd_map.go:135 +0x1e0
fmt.(*pp).handleMethods()
GOROOT/src/fmt/print.go:603 +0x404
fmt.(*pp).printArg()
GOROOT/src/fmt/print.go:686 +0x255
fmt.(*pp).doPrintf()
GOROOT/src/fmt/print.go:1003 +0x33f
fmt.Fprintf()
GOROOT/src/fmt/print.go:188 +0x7f
gvisor.googlesource.com/gvisor/pkg/log.(*Writer).Emit()
pkg/log/log.go:121 +0x89
gvisor.googlesource.com/gvisor/pkg/log.GoogleEmitter.Emit()
pkg/log/glog.go:162 +0x1acc
gvisor.googlesource.com/gvisor/pkg/log.(*GoogleEmitter).Emit()
<autogenerated>:1 +0xe1
gvisor.googlesource.com/gvisor/pkg/log.(*BasicLogger).Debugf()
pkg/log/log.go:177 +0x111
gvisor.googlesource.com/gvisor/pkg/log.Debugf()
pkg/log/log.go:235 +0x66
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Debugf()
pkg/sentry/kernel/task_log.go:48 +0xfe
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).DebugDumpState()
pkg/sentry/kernel/task_log.go:66 +0x11f
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
pkg/sentry/kernel/task_run.go:272 +0xc80
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
pkg/sentry/kernel/task_run.go:91 +0x24b
Previous write at 0x00c000ba2160 by goroutine 423:
gvisor.googlesource.com/gvisor/pkg/sentry/fs.Rename()
pkg/sentry/fs/dirent.go:1628 +0x61f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1.1()
pkg/sentry/syscalls/linux/sys_file.go:1864 +0x1f8
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt( gvisor.googlesource.com/g/linux/sys_file.go:51 +0x20f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1()
pkg/sentry/syscalls/linux/sys_file.go:1852 +0x218
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt()
pkg/sentry/syscalls/linux/sys_file.go:51 +0x20f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt()
pkg/sentry/syscalls/linux/sys_file.go:1840 +0x180
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.Rename()
pkg/sentry/syscalls/linux/sys_file.go:1873 +0x60
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).executeSyscall()
pkg/sentry/kernel/task_syscall.go:165 +0x17a
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallInvoke()
pkg/sentry/kernel/task_syscall.go:283 +0xb4
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallEnter()
pkg/sentry/kernel/task_syscall.go:244 +0x10c
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscall()
pkg/sentry/kernel/task_syscall.go:219 +0x1e3
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
pkg/sentry/kernel/task_run.go:215 +0x15a9
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
pkg/sentry/kernel/task_run.go:91 +0x24b
Reported-by: syzbot+e1babbf756fab380dfff@syzkaller.appspotmail.com
Change-Id: Icd2620bb3ea28b817bf0672d454a22b9d8ee189a
PiperOrigin-RevId: 242938741
|
|
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
|
|
DirentCache is already a savable type, and it ensures that it is empty at the
point of Save. There is no reason not to save it along with the MountSource.
This did uncover an issue where not all MountSources were properly flushed
before Save. If a mount point has an open file and is then unmounted, we save
the MountSource without flushing it first. This CL also fixes that by flushing
all MountSources for all open FDs on Save.
PiperOrigin-RevId: 242906637
Change-Id: I3acd9d52b6ce6b8c989f835a408016cb3e67018f
|
|
This also applies these permissions to other static proc files.
Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d
PiperOrigin-RevId: 242898575
|
|
From a recent test failure:
"State:\tD (disk sleep)\n"
"disk sleep" does not match \w+. We need to allow spaces.
PiperOrigin-RevId: 242762469
Change-Id: Ic8d05a16669412a72c1e76b498373e5b22fe64c4
|
|
From sendfile spec and also the linux kernel code, we should
limit the count arg to 'MAX_RW_COUNT'. This patch export
'MAX_RW_COUNT' in kernel pkg and use it in the implementation
of sendfile syscall.
Signed-off-by: Li Qiang <pangpei.lq@antfin.com>
Change-Id: I1086fec0685587116984555abd22b07ac233fbd2
PiperOrigin-RevId: 242745831
|
|
Otherwise, we will not have capabilities in the user namespace.
And this patch adds the noexec option for mounts.
https://github.com/google/gvisor/issues/145
PiperOrigin-RevId: 242706519
Change-Id: I1b78b77d6969bd18038c71616e8eb7111b71207c
|
|
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
|
|
PiperOrigin-RevId: 242690968
Change-Id: I1ac2248b5ab3bcd95beed52ecddbb9f34eeb3775
|
|
PiperOrigin-RevId: 242647530
Change-Id: I1bf9ac1d664f452dc47ca670d408a73538cb482f
|
|
PiperOrigin-RevId: 242573252
Change-Id: Ibb4c6bfae2c2e322bf1cec23181a0ab663d8530a
|
|
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.
PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
|
|
PiperOrigin-RevId: 242531141
Change-Id: I2a3bd815bda09f392f511f47120d5d9e6e86a40d
|
|
We construct a ramfs tree of "scaffolding" directories for all mount points, so
that a directory exists that each mount point can be mounted over.
We were creating these directories without write permissions, which meant that
they were not wribable even when underlayed under a writable filesystem. They
should be writable.
PiperOrigin-RevId: 242507789
Change-Id: I86645e35417560d862442ff5962da211dbe9b731
|
|
PiperOrigin-RevId: 242493066
Change-Id: I2b2b590799d208895c5c16606e4f854dfd112dba
|
|
PiperOrigin-RevId: 242226319
Change-Id: Iefc78656841315f6b7d48bd85db451486850264d
|
|
Strings are a better fit for this usage because they are immutable in Go, and
can contain arbitrary bytes. It also allows us to avoid casting bytes to string
(and the associated allocation) in the hot path when checking for overlay
whiteouts.
PiperOrigin-RevId: 242208856
Change-Id: I7699ae6302492eca71787dd0b72e0a5a217a3db2
|
|
This CL merges all RBE-specific configuration from .bazelrc_rbe into .bazelrc
so that it will be picked up by default by users running bazel.
It also checks in a bazelrc from the upstream bazel-toolchains repository, and
imports that into our repo-specific .bazelrc. This makes it easier to maintain
and update the bazelrc going forward.
Documentation was added to the README.
PiperOrigin-RevId: 242208733
Change-Id: Iea32de9be85b024bd74f88909b56b2a8ab34851a
|