summaryrefslogtreecommitdiffhomepage
path: root/runsc
AgeCommit message (Collapse)Author
2020-01-16Plumb getting/setting xattrs through InodeOperations and 9p gofer interfaces.Dean Deng
There was a very bare get/setxattr in the InodeOperations interface. Add context.Context to both, size to getxattr, and flags to setxattr. Note that extended attributes are passed around as strings in this implementation, so size is automatically encoded into the value. Size is added in getxattr so that implementations can return ERANGE if a value is larger than can fit in the user-allocated buffer. This prevents us from unnecessarily passing around an arbitrarily large xattr when the user buffer is actually too small. Don't use the existing xattrwalk and xattrcreate messages and define our own, mainly for the sake of simplicity. Extended attributes will be implemented in future commits. PiperOrigin-RevId: 290121300
2020-01-15Bump SO_SNDBUF for fdbased endpoint used by runsc.Bhasker Hariharan
Updates #231 PiperOrigin-RevId: 289897881
2020-01-09New sync package.Ian Gudger
* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2020-01-08Combine various Create*NIC methods into CreateNICWithOptions.Bert Muthalaly
PiperOrigin-RevId: 288779416
2020-01-08Add NIC.isLoopback()Bert Muthalaly
...enabling us to remove the "CreateNamedLoopbackNIC" variant of CreateNIC and all the plumbing to connect it through to where the value is read in FindRoute. PiperOrigin-RevId: 288713093
2019-12-18Increase waitForProcessList timeoutFabricio Voznika
It can take more than 10 seconds when running under --race. PiperOrigin-RevId: 286296060
2019-12-17Leave minimum CPU number as a constantAleksandr Razumov
Remove introduced CPUNumMin config and hard-code it as 2.
2019-12-17Add minimum CPU number and only lower CPUs on --cpu-num-from-quotaAleksandr Razumov
* Add `--cpu-num-min` flag to control minimum CPUs * Only lower CPU count * Fix comments
2019-12-15Set CPU number to CPU quotaAleksandr Razumov
When application is not cgroups-aware, it can spawn excessive threads which often defaults to CPU number. Introduce a opt-in flag that will set CPU number accordingly to CPU quota (if available). Fixes #1391
2019-12-12Add iptables testing framework.Kevin Krakauer
It would be preferrable to test iptables via syscall tests, but there are some problems with that approach: * We're limited to loopback-only, as syscall tests involve only a single container. Other link interfaces (e.g. fdbased) should be tested. * We'd have to shell out to call iptables anyways, as the iptables syscall interface itself is too large and complex to work with alone. * Running the Linux/native version of the syscall test will require root, which is a pain to configure, is inherently unsafe, and could leave host iptables misconfigured. Using the go_test target allows there to be no new test runner. PiperOrigin-RevId: 285274275
2019-12-11Enable IPv6 in runscBhasker Hariharan
Fixes #1341 PiperOrigin-RevId: 285108973
2019-12-11runsc/debug: add an option to list all processesAndrei Vagin
runsc debug --ps list all processes with all threads. This option is added to the debug command but not to the ps command, because it is going to be used for debug purposes and we want to add any useful information without thinking about backward compatibility. This will help to investigate syzkaller issues. PiperOrigin-RevId: 285013668
2019-12-11Finish incomplete comment.Dean Deng
PiperOrigin-RevId: 285012278
2019-12-06Bump up Go 1.13 as minimum requirementFabricio Voznika
PiperOrigin-RevId: 284320186
2019-12-06Merge pull request #1233 from xiaobo55x:compatLoggVisor bot
PiperOrigin-RevId: 284305935
2019-12-06Add runtime tracing.Adin Scannell
This adds meaningful annotations to the trace generated by the runtime/trace package. PiperOrigin-RevId: 284290115
2019-12-06Implement TTY field in control.Processes().Nicolas Lacasse
Threadgroups already know their TTY (if they have one), which now contains the TTY Index, and is returned in the Processes() call. PiperOrigin-RevId: 284263850
2019-12-06Make annotations OCI compliantFabricio Voznika
Changed annotation to follow the standard defined here: https://github.com/opencontainers/image-spec/blob/master/annotations.md PiperOrigin-RevId: 284254847
2019-12-05Fix possible race condition destroying containerFabricio Voznika
When the sandbox is destroyed, making URPC calls to destroy the container will fail. The code was checking if the sandbox was running before attempting to make the URPC call, but that is racy. PiperOrigin-RevId: 284093764
2019-12-03Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the latter. Fixes #1188 PiperOrigin-RevId: 283550925
2019-12-03Enable runsc compatLog support on arm64.Haibo Xu
Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6
2019-11-27Add support for receiving TOS and TCLASS control messages in hostinet.Dean Deng
This involves allowing getsockopt/setsockopt for the corresponding socket options, as well as allowing hostinet to process control messages received from the actual recvmsg syscall. PiperOrigin-RevId: 282851425
2019-11-26Merge pull request #981 from tanjianfeng:fix-898gVisor bot
PiperOrigin-RevId: 282669859
2019-11-25Use mount hints to determine FileAccessTypeFabricio Voznika
PiperOrigin-RevId: 282401165
2019-11-25Merge pull request #1176 from xiaobo55x:runsc_bootgVisor bot
PiperOrigin-RevId: 282382564
2019-11-23gofer: reduce CPU usage on GC as of frequent readdirJianfeng Tan
Refer to golang mallocgc(), each time of allocating an object > 32 KB, a gc will be triggered. When we do readdir, sentry always passes 65535, which leads to a malloc of 65535 * sizeof(p9.Direnta) > 32 KB. Considering we already use slice append, let's avoid defining the capability for this slide. Command for test: Before this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m54.272s user 0m2.010s sys 0m1.740s (CPU usage of Gofer: ~30 cores) (host)$ perf top -p <pid-of-gofer> 42.57% runsc [.] runtime.gcDrain 23.41% runsc [.] runtime.(*lfstack).pop 9.74% runsc [.] runtime.greyobject 8.06% runsc [.] runtime.(*lfstack).push 4.33% runsc [.] runtime.scanobject 1.69% runsc [.] runtime.findObject 1.12% runsc [.] runtime.findrunnable 0.69% runsc [.] runtime.runqgrab ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 2m10.934s user 0m0.280s sys 0m4.260s (CPU usage of Gofer: ~1 core) After this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m22.465s user 0m1.270s sys 0m1.310s (CPU usage of Gofer: ~1 core) $ perf top -p <pid-of-gofer> 20.57% runsc [.] runtime.gcDrain 7.15% runsc [.] runtime.(*lfstack).pop 4.11% runsc [.] runtime.scanobject 3.78% runsc [.] runtime.greyobject 2.78% runsc [.] runtime.(*lfstack).push ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 0m13.338s user 0m0.190s sys 0m3.980s (CPU usage of Gofer: ~0.8 core) Fixes #898 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-11-22Force timezone initialization before filter installationMichael Pratt
The first use of time.Local (usually via time.Time.Date, et. al) performs initialization of the local timezone, which involves open several tzdata files from the host. Since filter installation disallows open, we should explicitly force this initialization rather than implicitly depending on the first logging (or other time) call occurring before filter installation. PiperOrigin-RevId: 282053121
2019-11-13Enable runsc/boot support on arm64.Haibo Xu
This patch also include a minor change to replace syscall.Dup2 with syscall.Dup3 which was missed in a previous commit(ref a25a976). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-06Add p9.OpenTruncate.Jamie Liu
This is required to implement O_TRUNC correctly on filesystems backed by gofers. 9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags bits, e.g. O_RDONLY, O_RDWR, O_WRONLY." open(2): "The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags." The reference 9P2000.L implementation also appears to expect arbitrary flags, not just access modes, in Tlopen.flags: https://github.com/chaos/diod/blob/master/diod/ops.c#L703 PiperOrigin-RevId: 278972683
2019-11-05Fix repository build scripts.Adin Scannell
This fixes a number of issues with the repository build process: * Fix the overall structure of the repository. * Fix the debian package description. * Fix the broken version number for packages. * Update the digest algorithm used for signing the release. I've validated that installation works from a separate staging bucket. Updates #852 PiperOrigin-RevId: 278716914
2019-11-04Add NETLINK_KOBJECT_UEVENT socket supportMichael Pratt
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events. gVisor doesn't have any device events, so our sockets don't need to do anything once created. systemd's device manager needs to be able to create one of these sockets. It also wants to install a BPF filter on the socket. Since we'll never send any messages, the filter would never be invoked, thus we just fake it out. Fixes #1117 Updates #1119 PiperOrigin-RevId: 278405893
2019-11-01Merge pull request #1109 from xiaobo55x:fsgofergVisor bot
PiperOrigin-RevId: 278032567
2019-11-01Allow the watchdog to detect when the sandbox is stuck during setup.Nicolas Lacasse
The watchdog currently can find stuck tasks, but has no way to tell if the sandbox is stuck before the application starts executing. This CL adds a startup timeout and action to the watchdog. If Start() is not called before the given timeout (if non-zero), then the watchdog will take the action. PiperOrigin-RevId: 277970577
2019-10-31Add systemd-cgroup flag option.Ian Lewis
Adds a systemd-cgroup flag option that prints an error letting the user know that systemd cgroups are not supported and points them to the relevant issue. Issue #193 PiperOrigin-RevId: 277837162
2019-10-31Merge pull request #1058 from cmingxu:mastergVisor bot
PiperOrigin-RevId: 277623766
2019-10-30Fix container lockingFabricio Voznika
Sandbox root dir was not being saved with the Container state, so it would point to the wrong directory location when attempting to lock the sandbox. This led to race conditions saving and loading container state. Fixing it, led to multiple deadlocks. I've moved the saving and locking logic to a separate struct and moved the lock file inside the RootDir (instead of container root dir), which allows the lock to be taken inside Destroy, and removes the need to lock the sandbox. PiperOrigin-RevId: 277599612
2019-10-30Store endpoints inside multiPortEndpoint in a sorted orderAndrei Vagin
It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665
2019-10-30Enable runsc/fsgofer support on arm64.Haibo Xu
newfstatat() syscall is not supported on arm64, so we resort to use the fstatat() syscall. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-28Cast the Stat_t.Nlink to uint64 on arm64.Haibo Xu
Since the syscall.Stat_t.Nlink is defined as different types on amd64 and arm64(uint64 and uint32 respectively), we need to cast them to a unified uint64 type in gVisor code. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-24Fix early deletion of rootDirFabricio Voznika
container.startContainers() cannot be called twice in a test (e.g. TestMultiContainerLoadSandbox) because the cleanup function deletes the rootDir, together with information from all other containers that may exist. PiperOrigin-RevId: 276591806
2019-10-23fix typokevin.xu
fix a typo
2019-10-23remove duplicated periodkevin.xu
remove a duplicated period
2019-10-22Merge pull request #1046 from tomlanyon:criogVisor bot
PiperOrigin-RevId: 276172466
2019-10-22netstack/tcp: software segmentation offloadAndrei Vagin
Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows to generate multiple tcp packets and send them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prepandable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-10-20Add runsc OCI annotations to support CRI-O.Tom Lanyon
Obligatory https://xkcd.com/927 Fixes #626
2019-10-18Cleanup host UDS supportMichael Pratt
This change fixes several issues with the fsgofer host UDS support. Notably, it adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes unsafe use of unet.Socket, which could cause a panic if Socket.FD is called when err != nil, and calls to Socket.FD with nothing to prevent the garbage collector from destroying and closing the socket. A set of tests is added to exercise host UDS access. This required extracting most of the syscall test runner into a library that can be used by custom tests. Updates #235 Updates #1003 [1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can only reply to a client that binds first. We don't allow bind, so these are unlikely to be used. PiperOrigin-RevId: 275558502
2019-10-16Fix problem with open FD when copy up is triggered in overlayfsFabricio Voznika
Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopenned when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run on tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-15Make Attach no longer a special snowflakeMichael Pratt
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW file or connect a socket rather than creating a standard control file like localFile.Walk. This is unecessary and error-prone, as the attach point still has to go through Open or Connect which will properly convert the control file to something usable. As such, switch the logic to be equivalent to a simple Walk. Updates #235 PiperOrigin-RevId: 274827872
2019-10-14Merge pull request #997 from dvrkps:patch-1gVisor bot
PiperOrigin-RevId: 274675428