|
This feature allows MemoryFile to delay eviction of "optional"
allocations, such as unused cached file pages.
Note that this incidentally makes CachingInodeOperations writeback
asynchronous, in the sense that it doesn't occur until eviction; this is
necessary because between when a cached page becomes evictable and when
it's evicted, file writes (via CachingInodeOperations.Write) may dirty
the page.
As currently implemented, this feature won't meaningfully impact
steady-state memory usage or caching; the reclaimer goroutine will
schedule eviction as soon as it runs out of other work to do. Future CLs
will increase caching by constraining when eviction is scheduled.
PiperOrigin-RevId: 246014822
Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
|
|
Updates google/gvisor#206
PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
|
|
Test times out when it runs on a single core. Skip until the
bug in the Go runtime is fixed.
PiperOrigin-RevId: 245866466
Change-Id: Ic3e72131c27136d58b71f6b11acc78abf55895d4
|
|
Cache last used messages and reuse them for subsequent requests.
If more messages are needed, they are created outside the cache
on demand.
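A minimal sketch of such a cache, assuming a fixed-size free list built on a buffered channel. `message`, `msgCache`, `get`, and `put` are hypothetical names, not the p9 package's actual types. When the cache is empty, messages are allocated on demand. When it is full, released messages are left to the garbage collector.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a request/response message type.
type message struct {
	tag  uint16
	data []byte
}

// msgCache keeps up to cap(free) recently released messages for reuse.
type msgCache struct {
	free chan *message
}

func newMsgCache(size int) *msgCache {
	return &msgCache{free: make(chan *message, size)}
}

// get returns a cached message if one is available, else a fresh one
// allocated outside the cache.
func (c *msgCache) get() *message {
	select {
	case m := <-c.free:
		return m
	default:
		return &message{}
	}
}

// put resets m and returns it to the cache if there is room.
func (c *msgCache) put(m *message) {
	m.tag = 0
	m.data = m.data[:0]
	select {
	case c.free <- m:
	default: // Cache full; let the garbage collector reclaim m.
	}
}

func main() {
	c := newMsgCache(2)
	a := c.get()
	c.put(a)
	b := c.get()
	fmt.Println(a == b) // true: the released message was reused
}
```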
PiperOrigin-RevId: 245836910
Change-Id: Icf099ddff95df420db8e09f5cdd41dcdce406c61
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
|
|
PiperOrigin-RevId: 245810347
Change-Id: Ia5f4bb268a8207bd2a7d4c77c83cdfbe1483c64f
|
|
PiperOrigin-RevId: 245788366
Change-Id: I17bbecf8493132dbe95564c34c45b838194bfabb
|
|
Previously, createAt was swallowing all errors from FindInode except for EACCES
and proceeding with the creation. This is incorrect, as FindInode can return
many other errors (like ENAMETOOLONG) that should stop creation.
This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the file.
PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
|
|
Add CloseRead and CloseWrite methods that perform shutdown on the read and
write sides of a connection, respectively.
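These methods presumably mirror the half-close API that the standard library already exposes on `*net.TCPConn`. The sketch below uses the standard library over loopback, not gonet, to show the semantics. After the client calls CloseWrite, the server reads EOF but can still send a reply. `halfCloseEcho` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// halfCloseEcho sends msg to a loopback server, shuts down the write side,
// and returns what the server sent back after it saw EOF.
func halfCloseEcho(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		// ReadAll returns once the peer's CloseWrite delivers EOF.
		data, _ := io.ReadAll(c)
		c.Write(data) // The server's write side is still open.
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	tcp := conn.(*net.TCPConn)
	if _, err := tcp.Write([]byte(msg)); err != nil {
		return "", err
	}
	// Shut down only the write side (SHUT_WR); reads still work.
	if err := tcp.CloseWrite(); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(tcp)
	return string(reply), err
}

func main() {
	reply, err := halfCloseEcho("hello")
	fmt.Println(reply, err) // hello <nil>
}
```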
Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1
PiperOrigin-RevId: 245537950
|
|
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
|
|
Apparently some platforms don't have pSize < vSize.
Fixes #208
PiperOrigin-RevId: 245480998
Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
|
|
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max, both of
which are set to 208KB on most systems.
Since we can't expect every gVisor user to bump these, we use
SO_RCVBUFFORCE to exceed the limit. This is possible because runsc runs
with CAP_NET_ADMIN outside the sandbox and can set the option before
the FD is passed to the sentry inside the sandbox.
Updates #211
iperf output with a 4MB buffer:
iperf3 -c 172.17.0.2 -t 100
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes
[ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes
[ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes
[ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes
[ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes
[ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes
[ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes
[ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes
[ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes
[ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes
[ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes
[ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes
[ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes
[ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes
^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender
[ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver
PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
|
|
PiperOrigin-RevId: 245469859
Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
|
|
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
|
|
PiperOrigin-RevId: 245451875
Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
|
|
PiperOrigin-RevId: 245341004
Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
|
|
Maximum filename length is filesystem-dependent and is obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:
$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38: buf->f_namelen = NAME_MAX;
64: if (dentry->d_name.len > NAME_MAX)
include/linux/relay.h
74: char base_filename[NAME_MAX]; /* saved base filename */
include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.
include/linux/exportfs.h
176: * understanding that it is already pointing to a a %NAME_MAX+1 sized
Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.
PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
|
|
See https://git.musl-libc.org/cgit/musl/tree/include/sys/poll.h
PiperOrigin-RevId: 245312375
Change-Id: If749ae3f94ccedc82eb6b594b32155924a354b58
|
|
PiperOrigin-RevId: 245306581
Change-Id: I44a034310809f8e9e651be8023ff1985561602fc
|
|
PiperOrigin-RevId: 245304611
Change-Id: Ie0e9bfc03d064e41d50157eeb4df22b2635f41e2
|
|
Bazel 0.23.0 is required due to the use of cc_flags_supplier.bzl in the vdso
package. cc_flags_supplier.bzl was added in 0.23.0.
PiperOrigin-RevId: 245192715
Change-Id: I4258c064e5cc3bac2a587c887e0d8f87b6678ec7
|
|
TCP tests and the implementation will come in followup CLs.
Updates google/gvisor#206
Updates google/gvisor#207
PiperOrigin-RevId: 245121470
Change-Id: Ib50b62724d3ba0cbfb1374e1f908798431ee2b21
|
|
PacketMMap mode has issues due to a kernel bug. This change
reverts us to using recvmmsg instead of a shared ring buffer to
dispatch inbound packets. This will reduce performance but should
be more stable under heavy load until PacketMMap is updated to
use TPacketv3.
See #210 for details.
Performance difference between recvmmsg and PacketMMap:
RecvMMsg:
iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 43478 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 778 MBytes 6.53 Gbits/sec 4349 188 KBytes
[ 4] 1.00-2.00 sec 786 MBytes 6.59 Gbits/sec 4395 212 KBytes
[ 4] 2.00-3.00 sec 756 MBytes 6.34 Gbits/sec 3655 161 KBytes
[ 4] 3.00-4.00 sec 782 MBytes 6.56 Gbits/sec 4419 175 KBytes
[ 4] 4.00-5.00 sec 755 MBytes 6.34 Gbits/sec 4317 187 KBytes
[ 4] 5.00-6.00 sec 774 MBytes 6.49 Gbits/sec 4002 173 KBytes
[ 4] 6.00-7.00 sec 737 MBytes 6.18 Gbits/sec 3904 191 KBytes
[ 4] 7.00-8.00 sec 530 MBytes 4.44 Gbits/sec 3318 189 KBytes
[ 4] 8.00-9.00 sec 487 MBytes 4.09 Gbits/sec 2627 188 KBytes
[ 4] 9.00-10.00 sec 770 MBytes 6.46 Gbits/sec 4221 170 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec 39207 sender
[ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec receiver
iperf Done.
PacketMMap:
iperf3 -c 172.17.0.2
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 43496 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 657 MBytes 5.51 Gbits/sec 0 1.01 MBytes
[ 4] 1.00-2.00 sec 1021 MBytes 8.56 Gbits/sec 0 1.01 MBytes
[ 4] 2.00-3.00 sec 1.21 GBytes 10.4 Gbits/sec 45 1.01 MBytes
[ 4] 3.00-4.00 sec 1018 MBytes 8.54 Gbits/sec 15 1.01 MBytes
[ 4] 4.00-5.00 sec 1.28 GBytes 11.0 Gbits/sec 45 1.01 MBytes
[ 4] 5.00-6.00 sec 1.38 GBytes 11.9 Gbits/sec 0 1.01 MBytes
[ 4] 6.00-7.00 sec 1.34 GBytes 11.5 Gbits/sec 45 856 KBytes
[ 4] 7.00-8.00 sec 1.23 GBytes 10.5 Gbits/sec 0 901 KBytes
[ 4] 8.00-9.00 sec 1010 MBytes 8.48 Gbits/sec 0 923 KBytes
[ 4] 9.00-10.00 sec 1.39 GBytes 11.9 Gbits/sec 0 960 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec 150 sender
[ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec receiver
Updates #210
PiperOrigin-RevId: 244968438
Change-Id: Id461b5cbff2dea6fa55cfc108ea246d8f83da20b
|
|
This CL fixes the following bugs:
- Uses atomic operations to set/read status instead of
binary.LittleEndian.PutUint32 etc., which are not atomic.
- Increments ringOffsets for frames that are truncated (i.e. status is
tpStatusCopy).
- Does not ignore frames with the tpStatusLost bit set, as they are valid
frames; the bit only indicates that some frames were lost before this one,
and loss metrics can be retrieved with a getsockopt call.
- Adds checks to make sure blockSize is a multiple of the page size. This is
required because the kernel allocates each block in whole pages and rejects
sizes that are not page-aligned with EINVAL.
Updates #210
PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
|
|
PiperOrigin-RevId: 244959388
Change-Id: Ifb08678d975cf9f694a21012f9a1e9f45b1f197c
|
|
The caller must call Readdir() at least twice to detect
EOF. The old code was always restarting the directory
search and then skipping elements already seen, effectively
doubling the cost to read a directory. The code now
remembers the last offset and doesn't reposition the cursor
if the next request arrives at the same offset.
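Here is a toy model of the fix. It assumes an in-memory entry list in place of the gofer's directory stream, and `dirReader` with its `restarts` counter is hypothetical. Sequential calls continue from the remembered cursor, so the extra call needed to detect EOF no longer triggers a rescan.

```go
package main

import "fmt"

// dirReader is a hypothetical directory reader that remembers where the
// previous call stopped.
type dirReader struct {
	entries  []string
	cursor   int // index of the next entry the stream will yield
	restarts int // how many times the stream was repositioned
}

// readdir returns up to n entries starting at offset, repositioning the
// cursor only when offset differs from where the last call stopped.
func (d *dirReader) readdir(offset, n int) []string {
	if offset != d.cursor {
		d.restarts++
		d.cursor = offset
		if d.cursor > len(d.entries) {
			d.cursor = len(d.entries)
		}
	}
	end := d.cursor + n
	if end > len(d.entries) {
		end = len(d.entries)
	}
	out := d.entries[d.cursor:end]
	d.cursor = end
	return out
}

func main() {
	d := &dirReader{entries: []string{"a", "b", "c"}}
	fmt.Println(d.readdir(0, 10)) // [a b c]
	fmt.Println(d.readdir(3, 10)) // []: EOF detected without repositioning
	fmt.Println(d.restarts)       // 0
}
```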
PiperOrigin-RevId: 244957816
Change-Id: If21a8dc68b76614adbcf4301439adfda40f2643f
|
|
p9.messageByType was taking 7% of p9.recv before, spending time
with reflection and map lookup. Now it's reduced to 1%.
PiperOrigin-RevId: 244947313
Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5
|
|
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switches to fd.FD, which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.
PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
|
|
Create, Start, and Destroy were racing to create and destroy the
metadata directory of containers.
This is a re-upload of
https://gvisor-review.googlesource.com/c/gvisor/+/16260, but with the
correct account.
Change-Id: I16b7a9d0971f0df873e7f4145e6ac8f72730a4f1
PiperOrigin-RevId: 244892991
|
|
For a symbolic link to a directory, e.g.
`/tmp/symlink -> /tmp/dir`
`fstatat("/tmp/symlink")` should return data for the symlink itself, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
data for the directory it points to, following Linux behavior.
Currently, fstatat() on a symlink with a trailing slash returns a "not a
directory" error, which is wrong.
Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
|
|
PiperOrigin-RevId: 244773890
Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
|
|
PiperOrigin-RevId: 244773836
Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
|
|
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.
Break out the shutdown(2) with SHUT_RD syscall test into two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.
Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
|
|
The MSG_TRUNC flag is set in the msghdr when a message is truncated.
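You can check the expected Linux behavior from Go with `(*net.UDPConn).ReadMsgUDP`, which returns the recvmsg flags. This sketch runs against the host kernel over loopback, using a hypothetical `truncatedRead` helper.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// truncatedRead sends payload over loopback UDP, receives it into a buffer
// of bufSize bytes, and reports the byte count and whether MSG_TRUNC was set.
func truncatedRead(payload []byte, bufSize int) (int, bool, error) {
	rx, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, false, err
	}
	defer rx.Close()
	tx, err := net.DialUDP("udp4", nil, rx.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return 0, false, err
	}
	defer tx.Close()
	if _, err := tx.Write(payload); err != nil {
		return 0, false, err
	}
	buf := make([]byte, bufSize)
	n, _, flags, _, err := rx.ReadMsgUDP(buf, nil)
	if err != nil {
		return 0, false, err
	}
	return n, flags&syscall.MSG_TRUNC != 0, nil
}

func main() {
	n, trunc, err := truncatedRead([]byte("0123456789"), 4)
	fmt.Println(n, trunc, err) // 4 true <nil>
}
```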
Fixes google/gvisor#200
PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
|
|
Add a UDP forwarder for intercepting and forwarding UDP sessions.
Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
|
|
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I20103cd6d193431ab7e8120005da1f567b9bc2eb
PiperOrigin-RevId: 244280119
|
|
I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1)
I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s)
PiperOrigin-RevId: 244269879
Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
|
|
Inode ids are only guaranteed to be stable across save/restore if the file is
held open. This CL fixes a simple stat test to allow it to compare symlink and
target by inode id, as long as the link target is held open.
PiperOrigin-RevId: 244238343
Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
|
|
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.
PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
|
|
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.
This means that we can't use endpoint.netProto to set gso.L3HdrLen.
PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
|
|
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).
Switch the logic so that unblockable signals are always masked.
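In Go, unary operators bind tighter than any binary operator, so `^set &^ unblockable` parses as `(^set) &^ unblockable`. The sketch below contrasts that form with `^(set &^ unblockable)`. As the commit describes, it treats the mask as the set of signals that must not be dequeued. `SignalSet` and the helpers are hypothetical stand-ins that use Linux signal numbers.

```go
package main

import "fmt"

// SignalSet is a hypothetical signal bitmask: bit n-1 represents signal n.
type SignalSet uint64

// SignalSetOf returns a set containing only sig.
func SignalSetOf(sig int) SignalSet { return SignalSet(1) << uint(sig-1) }

// Linux signal numbers used below.
const (
	sigKill = 9
	sigUSR1 = 10
	sigStop = 19
)

// unblockable holds the signals that can never be blocked.
var unblockable = SignalSetOf(sigKill) | SignalSetOf(sigStop)

// buggyMask parses as (^set) &^ unblockable, so unblockable signals are
// always cleared from the mask and can therefore be dequeued.
func buggyMask(set SignalSet) SignalSet { return ^set &^ unblockable }

// fixedMask strips unblockable signals from set before inverting, so they
// are always present in the mask and never dequeued.
func fixedMask(set SignalSet) SignalSet { return ^(set &^ unblockable) }

func main() {
	set := SignalSetOf(sigUSR1) | SignalSetOf(sigKill)
	fmt.Println(buggyMask(set)&SignalSetOf(sigKill) != 0) // false: SIGKILL unmasked
	fmt.Println(fixedMask(set)&SignalSetOf(sigKill) != 0) // true: SIGKILL masked
}
```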
PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
|
|
The FD limit and file size limit are read from the host, instead
of using hard-coded defaults, given that they affect the sandbox
process. Also limit the dirent cache to use no more than half
of the available FDs.
PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
|
|
Currently, doPoll copies the user struct pollfd array into a
[]syscalls.PollFD, which contains internal kdefs.FD and
waiter.EventMask types. While these are currently binary-compatible with
the Linux versions, we generally discourage copying directly to internal
types (someone may inadvertently change kdefs.FD to uint64).
Instead, copy directly to a []linux.PollFD, which will certainly be
binary compatible. Most of syscalls/polling.go is folded directly into
syscalls/linux/sys_poll.go, as it can then operate directly on
linux.PollFD. The additional syscalls.PollFD type provides little
value.
I've also added explicit conversion functions for waiter.EventMask,
which make it possible for it to use a different binary format.
PiperOrigin-RevId: 244042947
Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
|
|
PiperOrigin-RevId: 244036529
Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee
|
|
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
|
|
The file layout in the bucket was recently changed slightly to support both the v1 and v2 shims.
PiperOrigin-RevId: 243682904
Change-Id: Ic1373c6dc088ef41f829e7ce3ea3762e1e2b0292
|
|
It provides an easy way to run commands to quickly test gVisor.
By default it maps the host root as the container root with a
writable overlay on top (so the host root is not modified).
Example:
sudo runsc do ls -lh --color
sudo runsc do ~/src/test/my-test.sh
PiperOrigin-RevId: 243178711
Change-Id: I05f3d6ce253fe4b5f1362f4a07b5387f6ddb5dd9
|
|
Normal files display their path in the current mount namespace:
I0410 10:57:54.964196 216336 x:0] [ 1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462µs)
AT_FDCWD includes the CWD:
I0411 12:58:48.278427 1526 x:0] [ 1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0)
Sockets (and other non-vfs files) display an inode number (like
/proc/PID/fd):
I0410 10:54:38.909123 207684 x:0] [ 1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10)
I also fixed a few syscall args that should have been Path.
PiperOrigin-RevId: 243169025
Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a
|
|
Change-Id: I93a78a6b2bb2eaa69046c6cfecee2e4cfcf20e44
PiperOrigin-RevId: 243140359
|