Age | Commit message (Collapse) | Author |
|
This requires two changes:
1) Support for more than one socket to join a given multicast group.
2) Duplicate delivery of incoming multicast packets to all sockets listening
for it.
In addition, I tweaked the code (and added a test) to disallow duplicates
IP_ADD_MEMBERSHIP calls for the same group and NIC. This is how Linux does
it.
PiperOrigin-RevId: 246437315
Change-Id: Icad8300b4a8c3f501d9b4cd283bd3beabef88b72
|
|
PiperOrigin-RevId: 246036806
Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7
|
|
This feature allows MemoryFile to delay eviction of "optional"
allocations, such as unused cached file pages.
Note that this incidentally makes CachingInodeOperations writeback
asynchronous, in the sense that it doesn't occur until eviction; this is
necessary because between when a cached page becomes evictable and when
it's evicted, file writes (via CachingInodeOperations.Write) may dirty
the page.
As currently implemented, this feature won't meaningfully impact
steady-state memory usage or caching; the reclaimer goroutine will
schedule eviction as soon as it runs out of other work to do. Future CLs
increase caching by adding constraints on when eviction is scheduled.
PiperOrigin-RevId: 246014822
Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
|
|
Updates google/gvisor#206
PiperOrigin-RevId: 245880573
Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
|
|
Cache last used messages and reuse them for subsequent requests.
If more messages are needed, they are created outside the cache
on demand.
PiperOrigin-RevId: 245836910
Change-Id: Icf099ddff95df420db8e09f5cdd41dcdce406c61
|
|
Based on the guidelines at
https://opensource.google.com/docs/releasing/authors/.
1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./'
2. Manual fixup of "Google Inc" references.
3. Add AUTHORS file. Authors may request to be added to this file.
4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS.
Fixes #209
PiperOrigin-RevId: 245823212
Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
|
|
PiperOrigin-RevId: 245818639
Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
|
|
Previously, createAt was eating all errors from FindInode except for EACCES and
proceeding with the creation. This is incorrect, as FindInode can return many
other errors (like ENAMETOOLONG) that should stop creation.
This CL changes createAt to return all errors encountered except for ENOENT,
which we can ignore because we are about to create the thing.
PiperOrigin-RevId: 245773222
Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
|
|
Add the CloseRead & CloseWrite methods that performs shutdown on the
corresponding Read & Write sides of a connection.
Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1
PiperOrigin-RevId: 245537950
|
|
PiperOrigin-RevId: 245511019
Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
|
|
Apparently some platforms don't have pSize < vSize.
Fixes #208
PiperOrigin-RevId: 245480998
Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
|
|
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.
Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.
Updates #211
iperf output w/ 4MB buffer.
iperf3 -c 172.17.0.2 -t 100
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes
[ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes
[ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes
[ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes
[ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes
[ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes
[ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes
[ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes
[ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes
[ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes
[ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes
[ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes
[ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes
[ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes
^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender
[ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver
PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
|
|
PiperOrigin-RevId: 245452217
Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
|
|
PiperOrigin-RevId: 245341004
Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
|
|
Maximum filename length is filesystem-dependent, and obtained via
statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not
always. For example, VFAT supports filenames of up to 255... UCS-2
characters, which Linux conservatively takes to mean UTF-8-encoded
bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
As a result, Linux's VFS does not enforce NAME_MAX:
$ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/
fs/libfs.c
38: buf->f_namelen = NAME_MAX;
64: if (dentry->d_name.len > NAME_MAX)
include/linux/relay.h
74: char base_filename[NAME_MAX]; /* saved base filename */
include/linux/fscrypt.h
149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length.
include/linux/exportfs.h
176: * understanding that it is already pointing to a a %NAME_MAX+1 sized
Remove this check from core VFS, and add it to ramfs (and by extension
tmpfs), where it is actually applicable:
mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does*
enforce NAME_MAX.
PiperOrigin-RevId: 245324748
Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
|
|
This CL fixes the following bugs:
- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.
- Increments ringOffsets for frames that are truncated (i.e status is
tpStatusCopy)
- Does not ignore frames with tpStatusLost bit set as they are valid frames and
only indicate that there some frames were lost before this one and metrics can
be retrieved with a getsockopt call.
- Adds checks to make sure blockSize is a multiple of page size. This is
required as the kernel allocates in pages per block and rejects sizes that are
not page aligned with an EINVAL.
Updates #210
PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
|
|
p9.messageByType was taking 7% of p9.recv before, spending time
with reflection and map lookup. Now it's reduced to 1%.
PiperOrigin-RevId: 244947313
Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5
|
|
os.NewFile() accounts for 38% of CPU time in localFile.Walk().
This change switchs to use fd.FD which is much cheaper to create.
Now, fd.New() in localFile.Walk() accounts for only 4%.
PiperOrigin-RevId: 244944983
Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
|
|
For a symbol link to some directory, eg.
`/tmp/symlink -> /tmp/dir`
`fstatat("/tmp/symlink")` should return symbol link data, but
`fstatat("/tmp/symlink/")` (symlink with trailing slash) should return
directory data it points following linux behaviour.
Currently fstatat() a symlink with trailing slash will get "not a
directory" error which is wrong.
Signed-off-by: Wei Zhang <zhangwei198900@gmail.com>
Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2
PiperOrigin-RevId: 244783916
|
|
PiperOrigin-RevId: 244773890
Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
|
|
PiperOrigin-RevId: 244773836
Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
|
|
Support shutdown on only the read side of an endpoint. Reads performed
after a call to Shutdown with only the ShutdownRead flag will return
ErrClosedForReceive without data.
Break out the shutdown(2) with SHUT_RD syscall test into to two tests.
The first tests that no packets are sent when shutting down the read
side of a socket. The second tests that, after shutting down the read
side of a socket, unread data can still be read, or an EOF if there is
no more data to read.
Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce
PiperOrigin-RevId: 244459430
|
|
The MSG_TRUNC flag is set in the msghdr when a message is truncated.
Fixes google/gvisor#200
PiperOrigin-RevId: 244440486
Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
|
|
Add a UDP forwarder for intercepting and forwarding UDP sessions.
Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9
PiperOrigin-RevId: 244293576
|
|
I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1)
I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s)
PiperOrigin-RevId: 244269879
Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
|
|
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER
when attempting to set unsupported values.
PiperOrigin-RevId: 244229675
Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
|
|
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.
This means that we can't use endpoint.netProto to set gso.L3HdrLen.
PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
|
|
The existing logic attempting to do this is incorrect. Unary ^ has
higher precedence than &^, so mask always has UnblockableSignals
cleared, allowing dequeueSignalLocked to dequeue unblockable signals
(which allows userspace to ignore them).
Switch the logic so that unblockable signals are always masked.
PiperOrigin-RevId: 244058487
Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
|
|
FD limit and file size limit is read from the host, instead
of using hard-coded defaults, given that they effect the sandbox
process. Also limit the direct cache to use no more than half
if the available FDs.
PiperOrigin-RevId: 244050323
Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
|
|
Current, doPoll copies the user struct pollfd array into a
[]syscalls.PollFD, which contains internal kdefs.FD and
waiter.EventMask types. While these are currently binary-compatible with
the Linux versions, we generally discourage copying directly to internal
types (someone may inadvertantly change kdefs.FD to uint64).
Instead, copy directly to a []linux.PollFD, which will certainly be
binary compatible. Most of syscalls/polling.go is included directly into
syscalls/linux/sys_poll.go, as it can then operate directly on
linux.PollFD. The additional syscalls.PollFD type is providing little
value.
I've also added explicit conversion functions for waiter.EventMask,
which creates the possibility of a different binary format.
PiperOrigin-RevId: 244042947
Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
|
|
PiperOrigin-RevId: 244036529
Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee
|
|
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
|
|
Normal files display their path in the current mount namespace:
I0410 10:57:54.964196 216336 x:0] [ 1] ls X read(0x3 /proc/filesystems, 0x55cee3bdb2c0 "nodev\t9p\nnodev\tdevpts \nnodev\tdevtmpfs\nnodev\tproc\nnodev\tramdiskfs\nnodev\tsysfs\nnodev\ttmpfs\n", 0x1000) = 0x58 (24.462?s)
AT_FDCWD includes the CWD:
I0411 12:58:48.278427 1526 x:0] [ 1] stat_test E newfstatat(AT_FDCWD /home/prattmic, 0x55ea719b564e /proc/self, 0x7ef5cefc2be8, 0x0)
Sockets (and other non-vfs files) display an inode number (like
/proc/PID/fd):
I0410 10:54:38.909123 207684 x:0] [ 1] nc E bind(0x3 socket:[1], 0x55b5a1652040 {Family: AF_INET, Addr: , Port: 8080}, 0x10)
I also fixed a few syscall args that should be Path.
PiperOrigin-RevId: 243169025
Change-Id: Ic7dda6a82ae27062fe2a4a371557acfd6a21fa2a
|
|
PiperOrigin-RevId: 243018347
Change-Id: I1e5b80607c1df0747482abea61db7fcf24536d37
|
|
PiperOrigin-RevId: 242978508
Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
|
|
RootFromContext can return a dirent with reference taken, or nil. We must call
DecRef if (and only if) a real dirent is returned.
PiperOrigin-RevId: 242965515
Change-Id: Ie2b7b4cb19ee09b6ccf788b71f3fd7efcdf35a11
|
|
add renameMu.Lock when oldParent == newParent
in order to avoid data race in following report:
WARNING: DATA RACE
Read at 0x00c000ba2160 by goroutine 405:
gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).fullName()
pkg/sentry/fs/dirent.go:246 +0x6c
gvisor.googlesource.com/gvisor/pkg/sentry/fs.(*Dirent).FullName()
pkg/sentry/fs/dirent.go:356 +0x8b
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*FDMap).String()
pkg/sentry/kernel/fd_map.go:135 +0x1e0
fmt.(*pp).handleMethods()
GOROOT/src/fmt/print.go:603 +0x404
fmt.(*pp).printArg()
GOROOT/src/fmt/print.go:686 +0x255
fmt.(*pp).doPrintf()
GOROOT/src/fmt/print.go:1003 +0x33f
fmt.Fprintf()
GOROOT/src/fmt/print.go:188 +0x7f
gvisor.googlesource.com/gvisor/pkg/log.(*Writer).Emit()
pkg/log/log.go:121 +0x89
gvisor.googlesource.com/gvisor/pkg/log.GoogleEmitter.Emit()
pkg/log/glog.go:162 +0x1acc
gvisor.googlesource.com/gvisor/pkg/log.(*GoogleEmitter).Emit()
<autogenerated>:1 +0xe1
gvisor.googlesource.com/gvisor/pkg/log.(*BasicLogger).Debugf()
pkg/log/log.go:177 +0x111
gvisor.googlesource.com/gvisor/pkg/log.Debugf()
pkg/log/log.go:235 +0x66
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).Debugf()
pkg/sentry/kernel/task_log.go:48 +0xfe
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).DebugDumpState()
pkg/sentry/kernel/task_log.go:66 +0x11f
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
pkg/sentry/kernel/task_run.go:272 +0xc80
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
pkg/sentry/kernel/task_run.go:91 +0x24b
Previous write at 0x00c000ba2160 by goroutine 423:
gvisor.googlesource.com/gvisor/pkg/sentry/fs.Rename()
pkg/sentry/fs/dirent.go:1628 +0x61f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1.1()
pkg/sentry/syscalls/linux/sys_file.go:1864 +0x1f8
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt( gvisor.googlesource.com/g/linux/sys_file.go:51 +0x20f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt.func1()
pkg/sentry/syscalls/linux/sys_file.go:1852 +0x218
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.fileOpAt()
pkg/sentry/syscalls/linux/sys_file.go:51 +0x20f
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.renameAt()
pkg/sentry/syscalls/linux/sys_file.go:1840 +0x180
gvisor.googlesource.com/gvisor/pkg/sentry/syscalls/linux.Rename()
pkg/sentry/syscalls/linux/sys_file.go:1873 +0x60
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).executeSyscall()
pkg/sentry/kernel/task_syscall.go:165 +0x17a
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallInvoke()
pkg/sentry/kernel/task_syscall.go:283 +0xb4
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscallEnter()
pkg/sentry/kernel/task_syscall.go:244 +0x10c
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).doSyscall()
pkg/sentry/kernel/task_syscall.go:219 +0x1e3
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*runApp).execute()
pkg/sentry/kernel/task_run.go:215 +0x15a9
gvisor.googlesource.com/gvisor/pkg/sentry/kernel.(*Task).run()
pkg/sentry/kernel/task_run.go:91 +0x24b
Reported-by: syzbot+e1babbf756fab380dfff@syzkaller.appspotmail.com
Change-Id: Icd2620bb3ea28b817bf0672d454a22b9d8ee189a
PiperOrigin-RevId: 242938741
|
|
PiperOrigin-RevId: 242919489
Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
|
|
DirentCache is already a savable type, and it ensures that it is empty at the
point of Save. There is no reason not to save it along with the MountSource.
This did uncover an issue where not all MountSources were properly flushed
before Save. If a mount point has an open file and is then unmounted, we save
the MountSource without flushing it first. This CL also fixes that by flushing
all MountSources for all open FDs on Save.
PiperOrigin-RevId: 242906637
Change-Id: I3acd9d52b6ce6b8c989f835a408016cb3e67018f
|
|
This also applies these permissions to other static proc files.
Change-Id: I4167e585fed49ad271aa4e1f1260babb3239a73d
PiperOrigin-RevId: 242898575
|
|
From sendfile spec and also the linux kernel code, we should
limit the count arg to 'MAX_RW_COUNT'. This patch export
'MAX_RW_COUNT' in kernel pkg and use it in the implementation
of sendfile syscall.
Signed-off-by: Li Qiang <pangpei.lq@antfin.com>
Change-Id: I1086fec0685587116984555abd22b07ac233fbd2
PiperOrigin-RevId: 242745831
|
|
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
|
|
PiperOrigin-RevId: 242647530
Change-Id: I1bf9ac1d664f452dc47ca670d408a73538cb482f
|
|
Also add kernel.SignalInfoNoInfo, and use it in RLIMIT_FSIZE checks.
PiperOrigin-RevId: 242562428
Change-Id: I4887c0e1c8f5fddcabfe6d4281bf76d2f2eafe90
|
|
We construct a ramfs tree of "scaffolding" directories for all mount points, so
that a directory exists that each mount point can be mounted over.
We were creating these directories without write permissions, which meant that
they were not wribable even when underlayed under a writable filesystem. They
should be writable.
PiperOrigin-RevId: 242507789
Change-Id: I86645e35417560d862442ff5962da211dbe9b731
|
|
Strings are a better fit for this usage because they are immutable in Go, and
can contain arbitrary bytes. It also allows us to avoid casting bytes to string
(and the associated allocation) in the hot path when checking for overlay
whiteouts.
PiperOrigin-RevId: 242208856
Change-Id: I7699ae6302492eca71787dd0b72e0a5a217a3db2
|
|
From the SDM: "The least-significant byte in register EAX (register AL)
will always return 01H. Software should ignore this value and not
interpret it as an informational descriptor."
Unfortunately, online docs [1] [2] (likely based on an old version of the SDM)
say: "The least-significant byte in register EAX (register AL) indicates
the number of times the CPUID instruction must be executed with an input
value of 2 to get a complete description of the processor's caches and
TLBs."
dlang uses this second interpretation [3] and will loop 2^32 times if we
return zero. Fix this by specifying the fixed value of one. We still
don't support exposing the actual cache information, leaving all other
bytes empty. A zero byte means: "Null descriptor, this byte contains no
information."
[1] http://www.sandpile.org/x86/cpuid.htm#level_0000_0002h
[2] https://c9x.me/x86/html/file_module_x86_id_45.html
[3] https://github.com/dlang/druntime/blob/424640864c2aa001731467e96f637bd3e704e481/src/core/cpuid.d#L533-L534
PiperOrigin-RevId: 242046629
Change-Id: Ic0f0a5f974b20f71391cb85645bdcd4003e5fe88
|
|
https://github.com/google/gvisor/issues/145
PiperOrigin-RevId: 242044115
Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
|
|
In particular, ns.IDOfTask and tg.ID are used for gettid and getpid,
respectively, where removing defer saves ~100ns. This may be a small
improvement to application logging, which may call gettid/getpid
frequently.
PiperOrigin-RevId: 242039616
Change-Id: I860beb62db3fe077519835e6bafa7c74cba6ca80
|
|
Change-Id: Ibd6d8a1a63826af6e62a0f0669f8f0866c8091b4
PiperOrigin-RevId: 242037969
|