summaryrefslogtreecommitdiffhomepage
AgeCommit message (Collapse)Author
2019-05-02Add [simple] network support to 'runsc do'Fabricio Voznika
Sandbox always runsc with IP 192.168.10.2 and the peer network adds 1 to the address (192.168.10.3). Sandbox IP can be changed using --ip flag. Here a few examples: sudo runsc do curl www.google.com sudo runsc do --ip=10.10.10.2 bash -c "echo 123 | netcat -l -p 8080" PiperOrigin-RevId: 246421277 Change-Id: I7b3dce4af46a57300350dab41cb27e04e4b6e9da
2019-04-30CONTRIBUTING: fix broken repository linkAdin Scannell
PiperOrigin-RevId: 246079174 Change-Id: I423078a065e0cc5d258d674b4f2f0680a5db0aee
2019-04-30Update reference to old typeMichael Pratt
PiperOrigin-RevId: 246036806 Change-Id: I5554a43a1f8146c927402db3bf98488a2da0fbe7
2019-04-30Implement async MemoryFile eviction, and use it in CachingInodeOperations.Jamie Liu
This feature allows MemoryFile to delay eviction of "optional" allocations, such as unused cached file pages. Note that this incidentally makes CachingInodeOperations writeback asynchronous, in the sense that it doesn't occur until eviction; this is necessary because between when a cached page becomes evictable and when it's evicted, file writes (via CachingInodeOperations.Write) may dirty the page. As currently implemented, this feature won't meaningfully impact steady-state memory usage or caching; the reclaimer goroutine will schedule eviction as soon as it runs out of other work to do. Future CLs increase caching by adding constraints on when eviction is scheduled. PiperOrigin-RevId: 246014822 Change-Id: Ia85feb25a2de92a48359eb84434b6ec6f9bea2cb
2019-04-29Implement the MSG_CTRUNC msghdr flag for Unix sockets.Ian Gudger
Updates google/gvisor#206 PiperOrigin-RevId: 245880573 Change-Id: Ifa715e98d47f64b8a32b04ae9378d6cd6bd4025e
2019-04-29Skip flaky ClockGettime.CputimeIdFabricio Voznika
Test times out when it runs on a single core. Skip until the bug in the Go runtime is fixed. PiperOrigin-RevId: 245866466 Change-Id: Ic3e72131c27136d58b71f6b11acc78abf55895d4
2019-04-29Reduce memory allocations on serving pathFabricio Voznika
Cache last used messages and reuse them for subsequent requests. If more messages are needed, they are created outside the cache on demand. PiperOrigin-RevId: 245836910 Change-Id: Icf099ddff95df420db8e09f5cdd41dcdce406c61
2019-04-29Change copyright notice to "The gVisor Authors"Michael Pratt
Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" | xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29Allow and document bug ids in gVisor codebase.Nicolas Lacasse
PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-29n/aGoogler
PiperOrigin-RevId: 245810347 Change-Id: Ia5f4bb268a8207bd2a7d4c77c83cdfbe1483c64f
2019-04-29Appease googletest deprecationTamir Duberstein
PiperOrigin-RevId: 245788366 Change-Id: I17bbecf8493132dbe95564c34c45b838194bfabb
2019-04-29createAt should return all errors from FindInode except ENOENT.Nicolas Lacasse
Previously, createAt was eating all errors from FindInode except for EACCES and proceeding with the creation. This is incorrect, as FindInode can return many other errors (like ENAMETOOLONG) that should stop creation. This CL changes createAt to return all errors encountered except for ENOENT, which we can ignore because we are about to create the thing. PiperOrigin-RevId: 245773222 Change-Id: I1b317021de70f0550fb865506f6d8147d4aebc56
2019-04-26tcpip/adapters/gonet: add CloseRead & CloseWrite methods to ConnBen Burkert
Add the CloseRead & CloseWrite methods that performs shutdown on the corresponding Read & Write sides of a connection. Change-Id: I3996a2abdc7cd68a2becba44dc4bd9f0919d2ce1 PiperOrigin-RevId: 245537950
2019-04-26Make raw sockets a toggleable feature disabled by default.Kevin Krakauer
PiperOrigin-RevId: 245511019 Change-Id: Ia9562a301b46458988a6a1f0bbd5f07cbfcb0615
2019-04-26kvm: remove non-sane sanity checkAdin Scannell
Apparently some platforms don't have pSize < vSize. Fixes #208 PiperOrigin-RevId: 245480998 Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
2019-04-26Bump the AF_PACKET socket rcv buf size to 4MB by default.Bhasker Hariharan
Packet socket receive buffers default to the sysctl value of net.core.rmem_default and are capped by net.core.rmem_max both which are usually set to 208KB on most systems. Since we can't expect every gVisor user to bump these we use SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs with CAP_NET_ADMIN outside the sandbox and can do this before the FD is passed to the sentry inside the sandbox. Updates #211 iperf output w/ 4MB buffer. iperf3 -c 172.17.0.2 -t 100 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes [ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes [ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes [ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes [ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes [ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes [ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes [ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes [ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes [ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes [ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes [ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes [ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes [ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes [ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes [ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes [ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes ^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender [ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver PiperOrigin-RevId: 245470590 Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
2019-04-26Remove syscall tests' dependency on glogTamir Duberstein
PiperOrigin-RevId: 245469859 Change-Id: I0610e477cc3a884275852e83028ecfb501f2c039
2019-04-26Fix reference counting bug in /proc/PID/fdinfo/.Kevin Krakauer
PiperOrigin-RevId: 245452217 Change-Id: I7164d8f57fe34c17e601079eb9410a6d95af1869
2019-04-26Change name of sticky test arg.Kevin Krakauer
PiperOrigin-RevId: 245451875 Change-Id: Icee2c4ed74564e77454c60d60f456454443ccadf
2019-04-25Perform explicit CPUID and FP state compatibility checks on restoreMichael Pratt
PiperOrigin-RevId: 245341004 Change-Id: Ic4d581039d034a8ae944b43e45e84eb2c3973657
2019-04-25Don't enforce NAME_MAX in fs.Dirent.walk().Jamie Liu
Maximum filename length is filesystem-dependent, and obtained via statfs::f_namelen. This limit is usually 255 bytes (NAME_MAX), but not always. For example, VFAT supports filenames of up to 255... UCS-2 characters, which Linux conservatively takes to mean UTF-8-encoded bytes: fs/fat/inode.c:fat_statfs(), FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. As a result, Linux's VFS does not enforce NAME_MAX: $ rg --maxdepth=1 '\WNAME_MAX\W' fs/ include/linux/ fs/libfs.c 38: buf->f_namelen = NAME_MAX; 64: if (dentry->d_name.len > NAME_MAX) include/linux/relay.h 74: char base_filename[NAME_MAX]; /* saved base filename */ include/linux/fscrypt.h 149: * filenames up to NAME_MAX bytes, since base64 encoding expands the length. include/linux/exportfs.h 176: * understanding that it is already pointing to a a %NAME_MAX+1 sized Remove this check from core VFS, and add it to ramfs (and by extension tmpfs), where it is actually applicable: mm/shmem.c:shmem_dir_inode_operations.lookup == simple_lookup *does* enforce NAME_MAX. PiperOrigin-RevId: 245324748 Change-Id: I17567c4324bfd60e31746a5270096e75db963fac
2019-04-25s,sys/poll.h/,poll.h,gTamir Duberstein
See https://git.musl-libc.org/cgit/musl/tree/include/sys/poll.h PiperOrigin-RevId: 245312375 Change-Id: If749ae3f94ccedc82eb6b594b32155924a354b58
2019-04-25Handle glibc and XSI variants of strerror_rTamir Duberstein
PiperOrigin-RevId: 245306581 Change-Id: I44a034310809f8e9e651be8023ff1985561602fc
2019-04-25Remove useless modifiersTamir Duberstein
PiperOrigin-RevId: 245304611 Change-Id: Ie0e9bfc03d064e41d50157eeb4df22b2635f41e2
2019-04-25Update required bazel version to 0.23.0 in READMEIan Lewis
Bazel 0.23.0 is required due to the use of cc_flags_supplier.bzl in the vdso package. cc_flags_supplier.bzl was added in 0.23.0. PiperOrigin-RevId: 245192715 Change-Id: I4258c064e5cc3bac2a587c887e0d8f87b6678ec7
2019-04-24Add Unix socket tests for the MSG_CTRUNC msghdr flag.Ian Gudger
TCP tests and the implementation will come in followup CLs. Updates google/gvisor#206 Updates google/gvisor#207 PiperOrigin-RevId: 245121470 Change-Id: Ib50b62724d3ba0cbfb1374e1f908798431ee2b21
2019-04-23Revert runsc to use RecvMMsg packet dispatcher.Bhasker Hariharan
PacketMMap mode has issues due to a kernel bug. This change reverts us to using recvmmsg instead of a shared ring buffer to dispatch inbound packets. This will reduce performance but should be more stable under heavy load till PacketMMap is updated to use TPacketv3. See #210 for details. Perf difference between recvmmsg vs packetmmap. RecvMMsg : iperf3 -c 172.17.0.2 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 43478 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 778 MBytes 6.53 Gbits/sec 4349 188 KBytes [ 4] 1.00-2.00 sec 786 MBytes 6.59 Gbits/sec 4395 212 KBytes [ 4] 2.00-3.00 sec 756 MBytes 6.34 Gbits/sec 3655 161 KBytes [ 4] 3.00-4.00 sec 782 MBytes 6.56 Gbits/sec 4419 175 KBytes [ 4] 4.00-5.00 sec 755 MBytes 6.34 Gbits/sec 4317 187 KBytes [ 4] 5.00-6.00 sec 774 MBytes 6.49 Gbits/sec 4002 173 KBytes [ 4] 6.00-7.00 sec 737 MBytes 6.18 Gbits/sec 3904 191 KBytes [ 4] 7.00-8.00 sec 530 MBytes 4.44 Gbits/sec 3318 189 KBytes [ 4] 8.00-9.00 sec 487 MBytes 4.09 Gbits/sec 2627 188 KBytes [ 4] 9.00-10.00 sec 770 MBytes 6.46 Gbits/sec 4221 170 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec 39207 sender [ 4] 0.00-10.00 sec 6.99 GBytes 6.00 Gbits/sec receiver iperf Done. PacketMMap: bhaskerh@gvisor-bench:~/tensorflow$ iperf3 -c 172.17.0.2 Connecting to host 172.17.0.2, port 5201 [ 4] local 172.17.0.1 port 43496 connected to 172.17.0.2 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 657 MBytes 5.51 Gbits/sec 0 1.01 MBytes [ 4] 1.00-2.00 sec 1021 MBytes 8.56 Gbits/sec 0 1.01 MBytes [ 4] 2.00-3.00 sec 1.21 GBytes 10.4 Gbits/sec 45 1.01 MBytes [ 4] 3.00-4.00 sec 1018 MBytes 8.54 Gbits/sec 15 1.01 MBytes [ 4] 4.00-5.00 sec 1.28 GBytes 11.0 Gbits/sec 45 1.01 MBytes [ 4] 5.00-6.00 sec 1.38 GBytes 11.9 Gbits/sec 0 1.01 MBytes [ 4] 6.00-7.00 sec 1.34 GBytes 11.5 Gbits/sec 45 856 KBytes [ 4] 7.00-8.00 sec 1.23 GBytes 10.5 Gbits/sec 0 901 KBytes [ 4] 8.00-9.00 sec 1010 MBytes 8.48 Gbits/sec 0 923 KBytes [ 4] 9.00-10.00 sec 1.39 GBytes 11.9 Gbits/sec 0 960 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec 150 sender [ 4] 0.00-10.00 sec 11.4 GBytes 9.83 Gbits/sec receiver Updates #210 PiperOrigin-RevId: 244968438 Change-Id: Id461b5cbff2dea6fa55cfc108ea246d8f83da20b
2019-04-23Fixes to PacketMMap dispatcher.Bhasker Hariharan
This CL fixes the following bugs: - Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc which are not atomic. - Increments ringOffsets for frames that are truncated (i.e status is tpStatusCopy) - Does not ignore frames with tpStatusLost bit set as they are valid frames and only indicate that there some frames were lost before this one and metrics can be retrieved with a getsockopt call. - Adds checks to make sure blockSize is a multiple of page size. This is required as the kernel allocates in pages per block and rejects sizes that are not page aligned with an EINVAL. Updates #210 PiperOrigin-RevId: 244959464 Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
2019-04-23Add Linux version to requirements sectionFabricio Voznika
PiperOrigin-RevId: 244959388 Change-Id: Ifb08678d975cf9f694a21012f9a1e9f45b1f197c
2019-04-23Remember file position during Readdir()Fabricio Voznika
The caller must call Readdir() at least twice to detect EOF. The old code was always restarting the directory search and then skipping elements already seen, effectively doubling the cost to read a directory. The code now remembers the last offset and doesn't reposition the cursor if next request comes at the same offset. PiperOrigin-RevId: 244957816 Change-Id: If21a8dc68b76614adbcf4301439adfda40f2643f
2019-04-23Remove reflection from 9P serving pathFabricio Voznika
p9.messageByType was taking 7% of p9.recv before, spending time with reflection and map lookup. Now it's reduced to 1%. PiperOrigin-RevId: 244947313 Change-Id: I42813f920557b7656f8b29157eb32acd79e11fa5
2019-04-23Replace os.File with fd.FD in fsgoferFabricio Voznika
os.NewFile() accounts for 38% of CPU time in localFile.Walk(). This change switchs to use fd.FD which is much cheaper to create. Now, fd.New() in localFile.Walk() accounts for only 4%. PiperOrigin-RevId: 244944983 Change-Id: Ic892df96cf2633e78ad379227a213cb93ee0ca46
2019-04-23Fix container_test flakes.Kevin Krakauer
Create, Start, and Destroy were racing to create and destroy the metadata directory of containers. This is a re-upload of https://gvisor-review.googlesource.com/c/gvisor/+/16260, but with the correct account. Change-Id: I16b7a9d0971f0df873e7f4145e6ac8f72730a4f1 PiperOrigin-RevId: 244892991
2019-04-22Bugfix: fix fstatat symbol link to dirWei Zhang
For a symbol link to some directory, eg. `/tmp/symlink -> /tmp/dir` `fstatat("/tmp/symlink")` should return symbol link data, but `fstatat("/tmp/symlink/")` (symlink with trailing slash) should return directory data it points following linux behaviour. Currently fstatat() a symlink with trailing slash will get "not a directory" error which is wrong. Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I63469b1fb89d083d1c1255d32d52864606fbd7e2 PiperOrigin-RevId: 244783916
2019-04-22Fix doc typoMichael Pratt
PiperOrigin-RevId: 244773890 Change-Id: I2d0cd7789771276ba545b38efff6d3e24133baaa
2019-04-22Clean up state error handlingMichael Pratt
PiperOrigin-RevId: 244773836 Change-Id: I32223f79d2314fe1ac4ddfc63004fc22ff634adf
2019-04-19tcpip/transport/tcp: read side only shutdown of an endpointBen Burkert
Support shutdown on only the read side of an endpoint. Reads performed after a call to Shutdown with only the ShutdownRead flag will return ErrClosedForReceive without data. Break out the shutdown(2) with SHUT_RD syscall test into to two tests. The first tests that no packets are sent when shutting down the read side of a socket. The second tests that, after shutting down the read side of a socket, unread data can still be read, or an EOF if there is no more data to read. Change-Id: I9d7c0a06937909cbb466b7591544a4bcaebb11ce PiperOrigin-RevId: 244459430
2019-04-19Add support for the MSG_TRUNC msghdr flag.Ian Gudger
The MSG_TRUNC flag is set in the msghdr when a message is truncated. Fixes google/gvisor#200 PiperOrigin-RevId: 244440486 Change-Id: I03c7d5e7f5935c0c6b8d69b012db1780ac5b8456
2019-04-18tcpip/transport/udp: add Forwarder typeBen Burkert
Add a UDP forwarder for intercepting and forwarding UDP sessions. Change-Id: I2d83c900c1931adfc59a532dd4f6b33a0db406c9 PiperOrigin-RevId: 244293576
2019-04-18Enable vDSO support on arm64.Haibo Xu
Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I20103cd6d193431ab7e8120005da1f567b9bc2eb PiperOrigin-RevId: 244280119
2019-04-18Format struct pollfd in poll(2)/ppoll(2)Michael Pratt
I0410 15:40:38.854295 3776 x:0] [ 1] poll_test E poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT, REvents: ...}], 0x1, 0x1) I0410 15:40:38.854348 3776 x:0] [ 1] poll_test X poll(0x2b00bfb5c020 [{FD: 0x3 anon_inode:[eventfd], Events: POLLOUT|POLLERR|POLLHUP, REvents: POLLOUT}], 0x1, 0x1) = 0x1 (10.765?s) PiperOrigin-RevId: 244269879 Change-Id: If07ba54a486fdeaaedfc0123769b78d1da862307
2019-04-18Keep symlink target open while in test that compares inode ids.Nicolas Lacasse
Inode ids are only guaranteed to be stable across save/restore if the file is held open. This CL fixes a simple stat test to allow it to compare symlink and target by inode id, as long as the link target is held open. PiperOrigin-RevId: 244238343 Change-Id: I74c5115915b1cc032a4c16515a056a480f218f00
2019-04-18Only emit unimplemented syscall events for unsupported values.Ian Gudger
Only emit unimplemented syscall events for setting SO_OOBINLINE and SO_LINGER when attempting to set unsupported values. PiperOrigin-RevId: 244229675 Change-Id: Icc4562af8f733dd75a90404621711f01a32a9fc1
2019-04-18netstack: use a proper network protocol to set gso.L3HdrLenAndrei Vagin
It is possible to create a listening socket which will accept IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber for all accepted endpoints, even if they handle IPv4 connections. This means that we can't use endpoint.netProto to set gso.L3HdrLen. PiperOrigin-RevId: 244227948 Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
2019-04-17Don't allow sigtimedwait to catch unblockable signalsMichael Pratt
The existing logic attempting to do this is incorrect. Unary ^ has higher precedence than &^, so mask always has UnblockableSignals cleared, allowing dequeueSignalLocked to dequeue unblockable signals (which allows userspace to ignore them). Switch the logic so that unblockable signals are always masked. PiperOrigin-RevId: 244058487 Change-Id: Ib19630ac04068a1fbfb9dc4a8eab1ccbdb21edc3
2019-04-17Use FD limit and file size limit from hostFabricio Voznika
FD limit and file size limit is read from the host, instead of using hard-coded defaults, given that they effect the sandbox process. Also limit the direct cache to use no more than half if the available FDs. PiperOrigin-RevId: 244050323 Change-Id: I787ad0fdf07c49d589e51aebfeae477324fe26e6
2019-04-17Convert poll/select to operate more directly on linux.PollFDMichael Pratt
Current, doPoll copies the user struct pollfd array into a []syscalls.PollFD, which contains internal kdefs.FD and waiter.EventMask types. While these are currently binary-compatible with the Linux versions, we generally discourage copying directly to internal types (someone may inadvertantly change kdefs.FD to uint64). Instead, copy directly to a []linux.PollFD, which will certainly be binary compatible. Most of syscalls/polling.go is included directly into syscalls/linux/sys_poll.go, as it can then operate directly on linux.PollFD. The additional syscalls.PollFD type is providing little value. I've also added explicit conversion functions for waiter.EventMask, which creates the possibility of a different binary format. PiperOrigin-RevId: 244042947 Change-Id: I24e5b642002a32b3afb95a9dcb80d4acd1288abf
2019-04-17Internal change.Googler
PiperOrigin-RevId: 244036529 Change-Id: I280f9632a65d2e40d844e0d5ec3a101d808434ee
2019-04-17Return error from fdbased.NewFabricio Voznika
RELNOTES: n/a PiperOrigin-RevId: 244031742 Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
2019-04-15Fix gvisor-containerd-shim download in the test.Lantao Liu
The file layout in the bucket is changed a little bit recently to support both v1 shim and v2 shim. PiperOrigin-RevId: 243682904 Change-Id: Ic1373c6dc088ef41f829e7ce3ea3762e1e2b0292