summaryrefslogtreecommitdiffhomepage
path: root/runsc
AgeCommit message (Collapse)Author
2019-12-17Add minimum CPU number and only lower CPUs on --cpu-num-from-quotaAleksandr Razumov
* Add `--cpu-num-min` flag to control minimum CPUs * Only lower CPU count * Fix comments
2019-12-15Set CPU number to CPU quotaAleksandr Razumov
When application is not cgroups-aware, it can spawn excessive threads which often defaults to CPU number. Introduce a opt-in flag that will set CPU number accordingly to CPU quota (if available). Fixes #1391
2019-12-12Add iptables testing framework.Kevin Krakauer
It would be preferrable to test iptables via syscall tests, but there are some problems with that approach: * We're limited to loopback-only, as syscall tests involve only a single container. Other link interfaces (e.g. fdbased) should be tested. * We'd have to shell out to call iptables anyways, as the iptables syscall interface itself is too large and complex to work with alone. * Running the Linux/native version of the syscall test will require root, which is a pain to configure, is inherently unsafe, and could leave host iptables misconfigured. Using the go_test target allows there to be no new test runner. PiperOrigin-RevId: 285274275
2019-12-11Enable IPv6 in runscBhasker Hariharan
Fixes #1341 PiperOrigin-RevId: 285108973
2019-12-11runsc/debug: add an option to list all processesAndrei Vagin
runsc debug --ps list all processes with all threads. This option is added to the debug command but not to the ps command, because it is going to be used for debug purposes and we want to add any useful information without thinking about backward compatibility. This will help to investigate syzkaller issues. PiperOrigin-RevId: 285013668
2019-12-11Finish incomplete comment.Dean Deng
PiperOrigin-RevId: 285012278
2019-12-06Bump up Go 1.13 as minimum requirementFabricio Voznika
PiperOrigin-RevId: 284320186
2019-12-06Merge pull request #1233 from xiaobo55x:compatLoggVisor bot
PiperOrigin-RevId: 284305935
2019-12-06Add runtime tracing.Adin Scannell
This adds meaningful annotations to the trace generated by the runtime/trace package. PiperOrigin-RevId: 284290115
2019-12-06Implement TTY field in control.Processes().Nicolas Lacasse
Threadgroups already know their TTY (if they have one), which now contains the TTY Index, and is returned in the Processes() call. PiperOrigin-RevId: 284263850
2019-12-06Make annotations OCI compliantFabricio Voznika
Changed annotation to follow the standard defined here: https://github.com/opencontainers/image-spec/blob/master/annotations.md PiperOrigin-RevId: 284254847
2019-12-05Fix possible race condition destroying containerFabricio Voznika
When the sandbox is destroyed, making URPC calls to destroy the container will fail. The code was checking if the sandbox was running before attempting to make the URPC call, but that is racy. PiperOrigin-RevId: 284093764
2019-12-03Support IP_TOS and IPV6_TCLASS socket options for hostinet sockets.Dean Deng
There are two potential ways of sending a TOS byte with outgoing packets: including a control message in sendmsg, or setting the IP_TOS/IPV6_TCLASS socket options (for IPV4 and IPV6 respectively). This change lets hostinet support the latter. Fixes #1188 PiperOrigin-RevId: 283550925
2019-12-03Enable runsc compatLog support on arm64.Haibo Xu
Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I3fd5e552f5f03b5144ed52647f75af3b8253b1d6
2019-11-27Add support for receiving TOS and TCLASS control messages in hostinet.Dean Deng
This involves allowing getsockopt/setsockopt for the corresponding socket options, as well as allowing hostinet to process control messages received from the actual recvmsg syscall. PiperOrigin-RevId: 282851425
2019-11-26Merge pull request #981 from tanjianfeng:fix-898gVisor bot
PiperOrigin-RevId: 282669859
2019-11-25Use mount hints to determine FileAccessTypeFabricio Voznika
PiperOrigin-RevId: 282401165
2019-11-25Merge pull request #1176 from xiaobo55x:runsc_bootgVisor bot
PiperOrigin-RevId: 282382564
2019-11-23gofer: reduce CPU usage on GC as of frequent readdirJianfeng Tan
Refer to golang mallocgc(), each time of allocating an object > 32 KB, a gc will be triggered. When we do readdir, sentry always passes 65535, which leads to a malloc of 65535 * sizeof(p9.Direnta) > 32 KB. Considering we already use slice append, let's avoid defining the capability for this slide. Command for test: Before this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m54.272s user 0m2.010s sys 0m1.740s (CPU usage of Gofer: ~30 cores) (host)$ perf top -p <pid-of-gofer> 42.57% runsc [.] runtime.gcDrain 23.41% runsc [.] runtime.(*lfstack).pop 9.74% runsc [.] runtime.greyobject 8.06% runsc [.] runtime.(*lfstack).push 4.33% runsc [.] runtime.scanobject 1.69% runsc [.] runtime.findObject 1.12% runsc [.] runtime.findrunnable 0.69% runsc [.] runtime.runqgrab ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 2m10.934s user 0m0.280s sys 0m4.260s (CPU usage of Gofer: ~1 core) After this change: (container)$ time tree linux-5.3.1 > /dev/null real 0m22.465s user 0m1.270s sys 0m1.310s (CPU usage of Gofer: ~1 core) $ perf top -p <pid-of-gofer> 20.57% runsc [.] runtime.gcDrain 7.15% runsc [.] runtime.(*lfstack).pop 4.11% runsc [.] runtime.scanobject 3.78% runsc [.] runtime.greyobject 2.78% runsc [.] runtime.(*lfstack).push ... (host)$ mkdir test && cd test (host)$ for i in `seq 1 65536`; do mkdir $i; done (container)$ time ls test/ > /dev/null real 0m13.338s user 0m0.190s sys 0m3.980s (CPU usage of Gofer: ~0.8 core) Fixes #898 Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
2019-11-22Force timezone initialization before filter installationMichael Pratt
The first use of time.Local (usually via time.Time.Date, et. al) performs initialization of the local timezone, which involves open several tzdata files from the host. Since filter installation disallows open, we should explicitly force this initialization rather than implicitly depending on the first logging (or other time) call occurring before filter installation. PiperOrigin-RevId: 282053121
2019-11-13Enable runsc/boot support on arm64.Haibo Xu
This patch also include a minor change to replace syscall.Dup2 with syscall.Dup3 which was missed in a previous commit(ref a25a976). Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
2019-11-06Add p9.OpenTruncate.Jamie Liu
This is required to implement O_TRUNC correctly on filesystems backed by gofers. 9P2000.L: "lopen prepares fid for file I/O. flags contains Linux open(2) flags bits, e.g. O_RDONLY, O_RDWR, O_WRONLY." open(2): "The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. ... In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags." The reference 9P2000.L implementation also appears to expect arbitrary flags, not just access modes, in Tlopen.flags: https://github.com/chaos/diod/blob/master/diod/ops.c#L703 PiperOrigin-RevId: 278972683
2019-11-05Fix repository build scripts.Adin Scannell
This fixes a number of issues with the repository build process: * Fix the overall structure of the repository. * Fix the debian package description. * Fix the broken version number for packages. * Update the digest algorithm used for signing the release. I've validated that installation works from a separate staging bucket. Updates #852 PiperOrigin-RevId: 278716914
2019-11-04Add NETLINK_KOBJECT_UEVENT socket supportMichael Pratt
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events. gVisor doesn't have any device events, so our sockets don't need to do anything once created. systemd's device manager needs to be able to create one of these sockets. It also wants to install a BPF filter on the socket. Since we'll never send any messages, the filter would never be invoked, thus we just fake it out. Fixes #1117 Updates #1119 PiperOrigin-RevId: 278405893
2019-11-01Merge pull request #1109 from xiaobo55x:fsgofergVisor bot
PiperOrigin-RevId: 278032567
2019-11-01Allow the watchdog to detect when the sandbox is stuck during setup.Nicolas Lacasse
The watchdog currently can find stuck tasks, but has no way to tell if the sandbox is stuck before the application starts executing. This CL adds a startup timeout and action to the watchdog. If Start() is not called before the given timeout (if non-zero), then the watchdog will take the action. PiperOrigin-RevId: 277970577
2019-10-31Add systemd-cgroup flag option.Ian Lewis
Adds a systemd-cgroup flag option that prints an error letting the user know that systemd cgroups are not supported and points them to the relevant issue. Issue #193 PiperOrigin-RevId: 277837162
2019-10-31Merge pull request #1058 from cmingxu:mastergVisor bot
PiperOrigin-RevId: 277623766
2019-10-30Fix container lockingFabricio Voznika
Sandbox root dir was not being saved with the Container state, so it would point to the wrong directory location when attempting to lock the sandbox. This led to race conditions saving and loading container state. Fixing it, led to multiple deadlocks. I've moved the saving and locking logic to a separate struct and moved the lock file inside the RootDir (instead of container root dir), which allows the lock to be taken inside Destroy, and removes the need to lock the sandbox. PiperOrigin-RevId: 277599612
2019-10-30Store endpoints inside multiPortEndpoint in a sorted orderAndrei Vagin
It is required to guarantee the same order of endpoints after save/restore. PiperOrigin-RevId: 277598665
2019-10-30Enable runsc/fsgofer support on arm64.Haibo Xu
newfstatat() syscall is not supported on arm64, so we resort to use the fstatat() syscall. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I9e89d46c5ec9ae07db201c9da5b6dda9bfd2eaf0
2019-10-28Cast the Stat_t.Nlink to uint64 on arm64.Haibo Xu
Since the syscall.Stat_t.Nlink is defined as different types on amd64 and arm64(uint64 and uint32 respectively), we need to cast them to a unified uint64 type in gVisor code. Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I7542b99b195c708f3fc49b1cbe6adebdd2f6e96b
2019-10-24Fix early deletion of rootDirFabricio Voznika
container.startContainers() cannot be called twice in a test (e.g. TestMultiContainerLoadSandbox) because the cleanup function deletes the rootDir, together with information from all other containers that may exist. PiperOrigin-RevId: 276591806
2019-10-23fix typokevin.xu
fix a typo
2019-10-23remove duplicated periodkevin.xu
remove a duplicated period
2019-10-22Merge pull request #1046 from tomlanyon:criogVisor bot
PiperOrigin-RevId: 276172466
2019-10-22netstack/tcp: software segmentation offloadAndrei Vagin
Right now, we send each tcp packet separately, we call one system call per-packet. This patch allows to generate multiple tcp packets and send them by sendmmsg. The arguable part of this CL is a way how to handle multiple headers. This CL adds the next field to the Prepandable buffer. Nginx test results: Server Software: nginx/1.15.9 Server Hostname: 10.138.0.2 Server Port: 8080 Document Path: /10m.txt Document Length: 10485760 bytes w/o gso: Concurrency Level: 5 Time taken for tests: 5.491 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 18.21 [#/sec] (mean) Time per request: 274.525 [ms] (mean) Time per request: 54.905 [ms] (mean, across all concurrent requests) Transfer rate: 186508.03 [Kbytes/sec] received sw-gso: Concurrency Level: 5 Time taken for tests: 3.852 seconds Complete requests: 100 Failed requests: 0 Total transferred: 1048600200 bytes HTML transferred: 1048576000 bytes Requests per second: 25.96 [#/sec] (mean) Time per request: 192.576 [ms] (mean) Time per request: 38.515 [ms] (mean, across all concurrent requests) Transfer rate: 265874.92 [Kbytes/sec] received w/o gso: $ ./tcp_benchmark --client --duration 15 --ideal [SUM] 0.0-15.1 sec 2.20 GBytes 1.25 Gbits/sec software gso: $ tcp_benchmark --client --duration 15 --ideal --gso $((1<<16)) --swgso [SUM] 0.0-15.1 sec 3.99 GBytes 2.26 Gbits/sec PiperOrigin-RevId: 276112677
2019-10-21AF_PACKET support for netstack (aka epsocket).Kevin Krakauer
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With runsc, you'll need to pass `--net-raw=true` to enable them. Binding isn't supported yet. PiperOrigin-RevId: 275909366
2019-10-20Add runsc OCI annotations to support CRI-O.Tom Lanyon
Obligatory https://xkcd.com/927 Fixes #626
2019-10-18Cleanup host UDS supportMichael Pratt
This change fixes several issues with the fsgofer host UDS support. Notably, it adds support for SOCK_SEQPACKET and SOCK_DGRAM sockets [1]. It also fixes unsafe use of unet.Socket, which could cause a panic if Socket.FD is called when err != nil, and calls to Socket.FD with nothing to prevent the garbage collector from destroying and closing the socket. A set of tests is added to exercise host UDS access. This required extracting most of the syscall test runner into a library that can be used by custom tests. Updates #235 Updates #1003 [1] N.B. SOCK_DGRAM sockets are likely not particularly useful, as a server can only reply to a client that binds first. We don't allow bind, so these are unlikely to be used. PiperOrigin-RevId: 275558502
2019-10-16Fix problem with open FD when copy up is triggered in overlayfsFabricio Voznika
Linux kernel before 4.19 doesn't implement a feature that updates open FD after a file is open for write (and is copied to the upper layer). Already open FD will continue to read the old file content until they are reopened. This is especially problematic for gVisor because it caches open files. Flag was added to force readonly files to be reopenned when the same file is open for write. This is only needed if using kernels prior to 4.19. Closes #1006 It's difficult to really test this because we never run on tests on older kernels. I'm adding a test in GKE which uses kernels with the overlayfs problem for 1.14 and lower. PiperOrigin-RevId: 275115289
2019-10-15Make Attach no longer a special snowflakeMichael Pratt
fsgofer.attachPoint.Attach has a bunch of funky special logic to create a RW file or connect a socket rather than creating a standard control file like localFile.Walk. This is unecessary and error-prone, as the attach point still has to go through Open or Connect which will properly convert the control file to something usable. As such, switch the logic to be equivalent to a simple Walk. Updates #235 PiperOrigin-RevId: 274827872
2019-10-14Merge pull request #997 from dvrkps:patch-1gVisor bot
PiperOrigin-RevId: 274675428
2019-10-11Set base to rootDavor Kapsa
2019-10-10Update TODO for OCI seccomp support.Ian Lewis
PiperOrigin-RevId: 274042343
2019-10-10Remove unnecessary assignment to pathDavor Kapsa
2019-10-10Allow rt_sigreturn in runsc goferMichael Pratt
rt_sigreturn is required for signal handling (e.g., SIGSEGV for nil-pointer dereference). Before this, nil-pointer dereferences cause a syscall violation instead of a panic. PiperOrigin-RevId: 274028767
2019-10-08Remove stale TODOFabricio Voznika
PiperOrigin-RevId: 273630282
2019-10-08Ignore mount options that are not supported in shared mountsFabricio Voznika
Options that do not change mount behavior inside the Sentry are irrelevant and should not be used when looking for possible incompatibilities between master and slave mounts. PiperOrigin-RevId: 273593486
2019-10-07Implement IP_TTL.Ian Gudger
Also change the default TTL to 64 to match Linux. PiperOrigin-RevId: 273430341