Age | Commit message (Collapse) | Author |
|
Refactoring in 0036d1f7eb95bcc52977f15507f00dd07018e7e2 (v4.10) caused Linux to
start unconditionally zeroing the remainder of the last page in the
interpreter. Previously it did not due so if filesz == memsz, and *still* does
not do so when filesz == memsz for loading binaries, only interpreter.
This inconsistency is not worth replicating in gVisor, as it is arguably a bug,
but our tests must ensure we create interpreter ELFs compatible with this new
requirement.
PiperOrigin-RevId: 272266401
|
|
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoid
expensive monotonic clock calls on every syscall entry/exit.
However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.
Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.
Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.
PiperOrigin-RevId: 272265822
|
|
'docker exec' was getting CAP_NET_RAW even when --net-raw=false
because it was not filtered out from when copying container's
capabilities.
PiperOrigin-RevId: 272260451
|
|
Linux changed this behavior in 16e72e9b30986ee15f17fbb68189ca842c32af58
(v4.11). Previously, extra pages were always mapped RW. Now, those pages will
be executable if the segment specified PF_X. They still must be writeable.
PiperOrigin-RevId: 272256280
|
|
PiperOrigin-RevId: 272059043
|
|
It looks like the old code attempted to do this, but didn't realize that err !=
nil even in the happy case.
PiperOrigin-RevId: 272005887
|
|
PiperOrigin-RevId: 271665517
|
|
PiperOrigin-RevId: 271649711
|
|
PiperOrigin-RevId: 271644926
|
|
PiperOrigin-RevId: 271442321
|
|
This change fixes compile errors:
pty.cc:1460:7: error: expected primary-expression before '.' token
...
PiperOrigin-RevId: 271033729
|
|
Closes #261
PiperOrigin-RevId: 270973347
|
|
This makes them run much faster. Also cleaned up the log reporting.
PiperOrigin-RevId: 270799808
|
|
We already do this for `runsc run`, but need to do the same for `runsc exec`.
PiperOrigin-RevId: 270793459
|
|
The test is checking the wrong poll_fd for POLLHUP. The only
reason it passed till now was because it was also checking
for POLLIN which was always true on the other fd from the
previous poll!
PiperOrigin-RevId: 270780401
|
|
PiperOrigin-RevId: 270764996
|
|
Previously, when we set hostname:
$ strace hostname abc
...
sethostname("abc", 3) = -1 ENAMETOOLONG (File name too long)
...
According to man 2 sethostname:
"The len argument specifies the number of bytes in name. (Thus, name
does not require a terminating null byte.)"
We wrongly use the CopyStringIn() to check terminating zero byte in
the implementation of sethostname syscall.
To fix this, we use CopyInBytes() instead.
Fixes: #861
Reported-by: chenglang.hy <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
Adresses a deadlock with the rolled back change:
https://github.com/google/gvisor/commit/b6a5b950d28e0b474fdad160b88bc15314cf9259
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.
Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enabled tcsetpgrp().
Next steps are to actually turn terminal-generated control characters (e.g. C^c)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.
PiperOrigin-RevId: 270088599
|
|
Default of 20 shards was arbitrary and will need fine-tuning in later CLs.
PiperOrigin-RevId: 269922871
|
|
Note that the exact semantics for these signalfds are slightly different from
Linux. These signalfds are bound to the process at creation time. Reads, polls,
etc. are all associated with signals directed at that task. In Linux, all
signalfd operations are associated with current, regardless of where the
signalfd originated.
In practice, this should not be an issue given how signalfds are used. In order
to fix this however, we will need to plumb the context through all the event
APIs. This gets complicated really quickly, because the waiter APIs are all
netstack-specific, and not generally exposed to the context. Probably not
worthwhile fixing immediately.
PiperOrigin-RevId: 269901749
|
|
- Fix ARG syntax in Dockerfiles.
- Fix curl commands in Dockerfiles.
- Fix some paths in proctor binaries.
- Check error from Walk in search helper.
PiperOrigin-RevId: 269641686
|
|
* Use multi-stage builds in Dockerfiles.
* Combine all proctor binaries into a single binary.
* Change the TestRunner interface to reduce code duplication.
PiperOrigin-RevId: 269462101
|
|
absl flags are more modern and we can easily depend on them directly.
The repo now successfully builds with --incompatible_load_cc_rules_from_bzl.
PiperOrigin-RevId: 269387081
|
|
- Sandbox logs are generated when running tests
- Kokoro uploads the sandbox logs
- Supports multiple parallel runs
- Revive script to install locally built runsc with docker
PiperOrigin-RevId: 269337274
|
|
ENOTDIR has to be returned when a component used as a directory in
pathname is not, in fact, a directory.
PiperOrigin-RevId: 269037893
|
|
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.
Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.
PiperOrigin-RevId: 268805207
|
|
A recent Kokoro change pointed to go_tests.cfg (in line with the
other configurations), which unfortunately broke the presubmits.
This change also enabled the KVM tests, which were still using a
remote execution strategy.
This fixes both of these issues and allows presubmits to pass.
One additional test was caught with this case, which seems to
have been broken. It's unclear why this was not being caught.
PiperOrigin-RevId: 268166291
|
|
See https://github.com/bazelbuild/bazel/issues/8743. This will be required in
Bazel 1.0.
Protobuf was updated in
https://github.com/protocolbuffers/protobuf/commit/bf0c69e1302fe9568fbe310cc54b37d20a9d16a3#diff-96239ee297e0a92ac6ff96a6bc434ef0.
GoogleTest was updated in
https://github.com/google/googletest/commit/6fd262ecf787d0dc2a91696fd4bf1d3ee1ebfa14.
gflags has not yet been updated, so the repo still won't build with
--incompatible_load_cc_rules_from_bzl.
Tested with buildifier -warnings=native-cc -lint=warn **/BUILD.
PiperOrigin-RevId: 267638515
|
|
This is done because the root container for CRI is the infrastructure (pause)
container and always gets a low oom_score_adj. We do this to ensure that only
the oom_score_adj of user containers is used to calculated the sandbox
oom_score_adj.
Implemented in runsc rather than the containerd shim as it's a bit cleaner to
implement here (in the shim it would require overwriting the oomScoreAdj and
re-writing out the config.json again). This processing is Kubernetes(CRI)
specific but we are currently only supporting CRI for multi-container support
anyway.
PiperOrigin-RevId: 267507706
|
|
TestNoDuplicates is racy as it tries to read the /proc file system
while the test is running. But it's possible that from the time a
directory entries are read and each entry processed something could
change and in some cases the entry being processed could have been
deleted. In such cases we should not fail the test but just
ignore the error and move on.
PiperOrigin-RevId: 267483094
|
|
- Most AIO tests call io_setup(nr_events = 128). sizeof(struct io_event)
(128*32 = 4096). However, the actual size of the mapping created by
io_setup() is determined by:
(from fs/aio.c:ioctx_alloc())
/*
* We keep track of the number of available ringbuffer slots, to prevent
* overflow (reqs_available), and we also use percpu counters for this.
*
* So since up to half the slots might be on other cpu's percpu counters
* and unavailable, double nr_events so userspace sees what they
* expected: additionally, we move req_batch slots to/from percpu
* counters at a time, so make sure that isn't 0:
*/
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;
(from fs/aio.c:aio_setup_ring())
/* Compensate for the ring buffer's head/tail overlap entry */
nr_events += 2; /* 1 is required, 2 for good luck */
size = sizeof(struct aio_ring);
size += sizeof(struct io_event) * nr_events;
nr_pages = PFN_UP(size);
When we mremap() only the first page of a multi-page AIO ring buffer
mapping, fs/aio.c:aio_ring_mremap() updates struct kioctx::mmap_base -
but struct kioctx::mmap_size is untouched, so sys_io_destroy() =>
kill_ioctx() vm_unmaps() the mremapped page, plus some number of pages
after it. Just get the actual size of the mapping from /proc/self/maps.
- Delete test case MremapOver; while it is correct that Linux will not
complain if you overwrite the AIO ring buffer with another mapping, it
won't actually work in the sense that AIO events will not be written to
the new mapping, because Linux stores the struct pages of the ring
buffer in struct kioctx::ring_pages and writes to those through kmap()
rather than using userspace addresses.
- Don't munmap() after mremap(MREMAP_FIXED) returns EFAULT; see new
comment in factored-out test case MremapExpansion.
PiperOrigin-RevId: 267482903
|
|
PiperOrigin-RevId: 267280086
|
|
The simple test script has gotten out of control. Shard this script into
different pieces and attempt to impose order on overall test structure. This
change helps lay some of the foundations for future improvements.
* The runsc/test directories are moved into just test/.
* The runsc/test/testutil package is split into logical pieces.
* The scripts/ directory contains new top-level targets.
* Each test is now responsible for building targets it requires.
* The install functionality is moved into `runsc` itself for simplicity.
* The existing kokoro run_tests.sh file now just calls all (can be split).
After this change is merged, I will create multiple distinct workflows for
Kokoro, one for each of the scripts currently targeted by `run_tests.sh` today,
which should dramatically reduce the time-to-run for the Kokoro tests, and
provides a better foundation for further improvements to the infrastructure.
PiperOrigin-RevId: 267081397
|
|
PiperOrigin-RevId: 266491264
|
|
PiperOrigin-RevId: 266491246
|
|
Ioctl was returning just the buffer size from epsocket.endpoint
and it was not considering data from epsocket.SocketOperations
that was read from the endpoint, but not yet sent to the caller.
PiperOrigin-RevId: 266485461
|
|
This was accidentally introduced in 31f05d5d4f62c4cd4fe3b95b333d0130aae4b2c1.
Fixes #788.
PiperOrigin-RevId: 266462843
|
|
When abstract unix domain socket paths are displayed in
/proc/net/unix, Linux historically emitted null bytes as padding at
the end of the path. Newer versions of Linux (v4.9,
e7947ea770d0de434d38a0f823e660d3fd4bebb5) display these as '@'
characters.
Update proc_net_unix test to handle both version of the padding.
PiperOrigin-RevId: 266230200
|
|
PiperOrigin-RevId: 266229756
|
|
Using "go run ..." in the ENTRYPOINT causes the go compiler to run each time
the container is started. We can just compile the binary once as part of the
image.
PiperOrigin-RevId: 266212462
|
|
PiperOrigin-RevId: 266199211
|
|
Fix a uninitialized memory bug in pwritev2 test.
PiperOrigin-RevId: 265772176
|
|
When output file is in append mode, sendfile(2) should fail
with EINVAL and not EBADF.
Closes #721
PiperOrigin-RevId: 265718958
|
|
PiperOrigin-RevId: 265535438
|
|
The flake had the call to futex_unlock_pi() returning EINVAL with the
FUTEX_OWNER_DIED set. In this case, userspace has to clean up stale
state. So instead of calling FUTEX_UNLOCK_PI outright, we'll use the
advised atomic compare_exchange as advised in the man page.
PiperOrigin-RevId: 265163920
|
|
In cl/264434674 and cl/264498919, we stop running test cases
in parallel to not overload test hosts. But now tests requires
more time to run, so we need to increase a default number of
shards or a default test timeout. Let's start with increasing
the number of shards and see how it will works.
PiperOrigin-RevId: 264917055
|
|
The gunit macros are not safe to use in the child.
PiperOrigin-RevId: 264904348
|
|
This fixes the issue of not being able to bind to either a multicast or
broadcast address as well as to send and receive data from it. The way to solve
this is to treat these addresses similar to the ANY address and register their
transport endpoint ID with the global stack's demuxer rather than the NIC's.
That way there is no need to require an endpoint with that multicast or
broadcast address. The stack's demuxer is in fact the only correct one to use,
because neither broadcast- nor multicast-bound sockets care which NIC a
packet was received on (for multicast a join is still needed to receive packets
on a NIC).
I also took the liberty of refactoring udp_test.go to consolidate a lot of
duplicate code and make it easier to create repetitive tests that test the same
feature for a variety of packet and socket types. For this purpose I created a
"flowType" that represents two things: 1) the type of packet being sent or
received and 2) the type of socket used for the test. E.g., a "multicastV4in6"
flow represents a V4-mapped multicast packet run through a V6-dual socket.
This allows writing significantly simpler tests. A nice example is testTTL().
PiperOrigin-RevId: 264766909
|
|
test/syscalls/linux/proc_net_tcp.cc:252: Failure
Value of: connect(client->get(), &addr, addrlen)
Expected: not -1 (success)
Actual: -1 (of type int), with errno PosixError(errno=4 Interrupted system call)
PiperOrigin-RevId: 264743815
|
|
goroutine 5 [running]:
os/signal.process(0x10e21c0, 0xc00050c280)
third_party/go/gc/src/os/signal/signal.go:227 +0x164
os/signal.loop()
third_party/go/gc/src/os/signal/signal_unix.go:23 +0x3e
created by os/signal.init.0
third_party/go/gc/src/os/signal/signal_unix.go:29 +0x41
PiperOrigin-RevId: 264518530
|