Age | Commit message (Collapse) | Author |
|
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.
When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.
To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
|
|
PiperOrigin-RevId: 215658757
Change-Id: If63b33293f3e53a7f607ae72daa79e2b7ef6fcfd
|
|
PiperOrigin-RevId: 215655197
Change-Id: I668b1bc7c29daaf2999f8f759138bcbb09c4de6f
|
|
Terminal support in runsc relies on host tty file descriptors that are imported
into the sandbox. Application tty ioctls are sent directly to the host fd.
However, those host tty ioctls are associated in the host kernel with a host
process (in this case runsc), and the host kernel intercepts job control
characters like ^C and send signals to the host process. Thus, typing ^C into a
"runsc exec" shell will send a SIGINT to the runsc process.
This change makes "runsc exec" handle all signals, and forward them into the
sandbox via the "ContainerSignal" urpc method. Since the "runsc exec" is
associated with a particular container process in the sandbox, the signal must
be associated with the same container process.
One big difficulty is that the signal should not necessarily be sent to the
sandbox process started by "exec", but instead must be sent to the foreground
process group for the tty. For example, we may exec "bash", and from bash call
"sleep 100". A ^C at this point should SIGINT sleep, not bash.
To handle this, tty files inside the sandbox must keep track of their
foreground process group, which is set/get via ioctls. When an incoming
ContainerSignal urpc comes in, we look up the foreground process group via the
tty file. Unfortunately, this means we have to expose and cache the tty file in
the Loader.
Note that "runsc exec" now handles signals properly, but "runs run" does not.
That will come in a later CL, as this one is complex enough already.
Example:
root@:/usr/local/apache2# sleep 100
^C
root@:/usr/local/apache2# sleep 100
^Z
[1]+ Stopped sleep 100
root@:/usr/local/apache2# fg
sleep 100
^C
root@:/usr/local/apache2#
PiperOrigin-RevId: 215334554
Change-Id: I53cdce39653027908510a5ba8d08c49f9cf24f39
|
|
host.endpoint already has the check, but it is missing from
host.ConnectedEndpoint.
PiperOrigin-RevId: 214962762
Change-Id: I88bb13a5c5871775e4e7bf2608433df8a3d348e6
|
|
We already forward TCSETS and TCSETSW. TCSETSF is roughly equivalent but
discards pending input.
The filters were relaxed to allow host ioctls with TCSETSF argument.
This fixes programs like "passwd" that prevent user input from being displayed
on the terminal.
Before:
root@b8a0240fc836:/# passwd
Enter new UNIX password: 123
Retype new UNIX password: 123
passwd: password updated successfully
After:
root@ae6f5dabe402:/# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
PiperOrigin-RevId: 214869788
Change-Id: I31b4d1373c1388f7b51d0f2f45ce40aa8e8b0b58
|
|
If we have an overlay file whose corresponding Dirent is frozen, then we should
not bother calling Readdir on the upper or lower files, since DirentReaddir
will calculate children based on the frozen Dirent tree.
A test was added that fails without this change.
PiperOrigin-RevId: 213531215
Change-Id: I4d6c98f1416541a476a34418f664ba58f936a81d
|
|
PiperOrigin-RevId: 213328293
Change-Id: I4164133e6f709ecdb89ffbb5f7df3324c273860a
|
|
PiperOrigin-RevId: 213058623
Change-Id: I522598c655d633b9330990951ff1c54d1023ec29
|
|
Linux permits hard-linking if the target is owned by the user OR the target has
Read+Write permission.
PiperOrigin-RevId: 213024613
Change-Id: If642066317b568b99084edd33ee4e8822ec9cbb3
|
|
We must use a context.Context with a Root Dirent that corresponds to the
container's chroot. Previously we were using the root context, which does not
have a chroot.
Getting the correct context required refactoring some of the path-lookup code.
We can't lookup the path without a context.Context, which requires
kernel.CreateProcArgs, which we only get inside control.Execute. So we have to
do the path lookup much later than we previously were.
PiperOrigin-RevId: 212064734
Change-Id: I84a5cfadacb21fd9c3ab9c393f7e308a40b9b537
|
|
Imported file needs to be closed after it's
been imported.
PiperOrigin-RevId: 211732472
Change-Id: Ia9249210558b77be076bcce465b832a22eed301f
|
|
PiperOrigin-RevId: 211513847
Change-Id: Ib484dd2d921c3e5d70d0e410cd973d3bff4f6b73
|
|
Furthermore, allow for the specification of an ElementMapper. This allows a
single "Element" type to exist on multiple inline lists, and work without
having to embed the entry type.
This is a requisite change for supporting a per-Inode list of Dirents.
PiperOrigin-RevId: 211467497
Change-Id: If2768999b43e03fdaecf8ed15f435fe37518d163
|
|
From //pkg/sentry/context/context.go:
// - It is *not safe* to retain a Context passed to a function beyond the scope
// of that function call.
Passing a stored kernel.Task as a context.Context to
fs.FileOwnerFromContext violates this requirement.
PiperOrigin-RevId: 211143021
Change-Id: I4c5b02bd941407be4c9cfdbcbdfe5a26acaec037
|
|
PiperOrigin-RevId: 210953512
Change-Id: I07d2d7fb0d268aa8eca26d81ef28b5b5c42289ee
|
|
dirent.walk() takes renameMu, but is often called with renameMu already held,
which can lead to a deadlock.
Fix this by requiring renameMu to be held for reading when dirent.walk() is
called. This causes walks and existence checks to block while a rename
operation takes place, but that is what we were already trying to enforce by
taking renameMu in walk() anyways.
PiperOrigin-RevId: 210760780
Change-Id: Id61018e6e4adbeac53b9c1b3aa24ab77f75d8a54
|
|
dirent.go:Rename() walks to the file being replaced and defers
replaced.DecRef(). After the rename, the reference is dropped, triggering a
writeout and SettAttr call to the gofer. Because of lazyOpenForWrite, the gofer
opens the replaced file BY ITS OLD NAME and calls ftruncate on it.
This CL changes Remove to drop the reference on replaced (and thus trigger
writeout) before the actual rename call.
PiperOrigin-RevId: 210756097
Change-Id: I01ea09a5ee6c2e2d464560362f09943641638e0f
|
|
PiperOrigin-RevId: 210616454
Change-Id: I3f536e2b4d603e540cdd9a67c61b8ec3351f4ac3
|
|
PiperOrigin-RevId: 210614669
Change-Id: I408365ff6d6c7765ed7b789446d30e7079cbfc67
|
|
Weak references save / restore involves multiple interface indirection
and cause material latency overhead when there are lots of dirents, each
containing a weak reference map. The nil entries in the map should also
be purged.
PiperOrigin-RevId: 210593727
Change-Id: Ied6f4c3c0726fcc53a24b983d9b3a79121b6b758
|
|
PiperOrigin-RevId: 210459956
Change-Id: I51859b90fa967631e0a54a390abc3b5541fbee66
|
|
When revalidating a Dirent, if the inode id is the same, then we don't need to
throw away the entire Dirent. We can just update the unstable attributes in
place.
If the inode id has changed, then the remote file has been deleted or moved,
and we have no choice but to throw away the dirent we have a look up another.
In this case, we may still end up losing a mounted dirent that is a child of
the revalidated dirent. However, that seems appropriate here because the entire
mount point has been pulled out from underneath us.
Because gVisor's overlay is at the Inode level rather than the Dirent level, we
must pass the parent Inode and name along with the Inode that is being
revalidated.
PiperOrigin-RevId: 210431270
Change-Id: I705caef9c68900234972d5aac4ae3a78c61c7d42
|
|
PiperOrigin-RevId: 210405166
Change-Id: I252766015885c418e914007baf2fc058fec39b3e
|
|
Implements the TIOCGWINSZ and TIOCSWINSZ ioctls, which allow processes to resize
the terminal. This allows, for example, sshd to properly set the window size for
ssh sessions.
PiperOrigin-RevId: 210392504
Change-Id: I0d4789154d6d22f02509b31d71392e13ee4a50ba
|
|
This CL adds terminal support for "docker exec". We previously only supported
consoles for the container process, but not exec processes.
The SYS_IOCTL syscall was added to the default seccomp filter list, but only
for ioctls that get/set winsize and termios structs. We need to allow these
ioctl for all containers because it's possible to run "exec -ti" on a
container that was started without an attached console, after the filters
have been installed.
Note that control-character signals are still not properly supported.
Tested with:
$ docker run --runtime=runsc -it alpine
In another terminial:
$ docker exec -it <containerid> /bin/sh
PiperOrigin-RevId: 210185456
Change-Id: I6d2401e53a7697bb988c120a8961505c335f96d9
|
|
PiperOrigin-RevId: 210182476
Change-Id: I655a2a801e2069108d30323f7f5ae76deb3ea3ec
|
|
PiperOrigin-RevId: 209994384
Change-Id: I16186cf79cb4760a134f3968db30c168a5f4340e
|
|
PiperOrigin-RevId: 209817767
Change-Id: Iddf2b8441bc44f31f9a8cf6f2bd8e7a5b824b487
|
|
As required by the contract in Dirent.flush().
Also inline Dirent.freeze() into Dirent.Freeze(), since it is only called from
there.
PiperOrigin-RevId: 209783626
Change-Id: Ie6de4533d93dd299ffa01dabfa257c9cc259b1f4
|
|
When an inode file state failed to load asynchronuously, we want to report
the error instead of potentially panicing in another async loading goroutine
incorrectly unblocked.
PiperOrigin-RevId: 209683977
Change-Id: I591cde97710bbe3cdc53717ee58f1d28bbda9261
|
|
Numpy needs these.
Also added the "present" directory, since the contents are the same as possible
and online.
PiperOrigin-RevId: 209451777
Change-Id: I2048de3f57bf1c57e9b5421d607ca89c2a173684
|
|
Some linux commands depend on /sys/devices/system/cpu/possible, such
as 'lscpu'.
Add 2 knobs for cpu:
/sys/devices/system/cpu/possible
/sys/devices/system/cpu/online
Both the values are '0 - Kernel.ApplicationCores()-1'.
Change-Id: Iabd8a4e559cbb630ed249686b92c22b4e7120663
PiperOrigin-RevId: 209070163
|
|
Previously, gofer filesystems were configured with the default "fscache"
policy, which caches filesystem metadata and contents aggressively. While this
setting is best for performance, it means that changes from inside the sandbox
may not be immediately propagated outside the sandbox, and vice-versa.
This CL changes volumes and the root fs configuration to use a new
"remote-revalidate" cache policy which tries to retain as much caching as
possible while still making fs changes visible across the sandbox boundary.
This cache policy is enabled by default for the root filesystem. The default
value for the "--file-access" flag is still "proxy", but the behavior is
changed to use the new cache policy.
A new value for the "--file-access" flag is added, called "proxy-exclusive",
which turns on the previous aggressive caching behavior. As the name implies,
this flag should be used when the sandbox has "exclusive" access to the
filesystem.
All volume mounts are configured to use the new cache policy, since it is
safest and most likely to be correct. There is not currently a way to change
this behavior, but it's possible to add such a mechanism in the future. The
configurability is a smaller issue for volumes, since most of the expensive
application fs operations (walking + stating files) will likely served by the
root fs.
PiperOrigin-RevId: 208735037
Change-Id: Ife048fab1948205f6665df8563434dbc6ca8cfc9
|
|
Now, there's a waiter for each end (master and slave) of the TTY, and each
waiter.Entry is only enqueued in one of the waiters.
PiperOrigin-RevId: 208734483
Change-Id: I06996148f123075f8dd48cde5a553e2be74c6dce
|
|
stat()-ing /proc/PID/fd/FD incremented but didn't decrement the refcount for
FD. This behavior wasn't usually noticeable, but in the above case:
- ls would never decrement the refcount of the write end of the pipe to 0.
- This caused the write end of the pipe never to close.
- wc would then hang read()-ing from the pipe.
PiperOrigin-RevId: 208728817
Change-Id: I4fca1ba5ca24e4108915a1d30b41dc63da40604d
|
|
InodeOperations.Bind now returns a Dirent which will be cached in the Dirent
tree.
When an overlay is in-use, Bind cannot return the Dirent created by the upper
filesystem because the Dirent does not know about the overlay. Instead,
overlayBind must create a new overlay-aware Inode and Dirent and return that.
This is analagous to how Lookup and overlayLookup work.
PiperOrigin-RevId: 208670710
Change-Id: I6390affbcf94c38656b4b458e248739b4853da29
|
|
PiperOrigin-RevId: 208533436
Change-Id: Ifc1a4e2d6438a424650bee831c301b1ac0d670a3
|
|
Previously, an overlay would panic if either the upper or lower fs required
revalidation for a given Dirent. Now, we allow revalidation from the upper
file, but not the lower.
If a cached overlay inode does need revalidation (because the upper needs
revalidation), then the entire overlay Inode will be discarded and a new
overlay Inode will be built with a fresh copy of the upper file.
As a side effect of this change, Revalidate must take an Inode instead of a
Dirent, since an overlay needs to revalidate individual Inodes.
PiperOrigin-RevId: 208293638
Change-Id: Ic8f8d1ffdc09114721745661a09522b54420c5f1
|
|
The cache policy determines whether Lookup should return a negative dirent, or
just ENOENT. This CL fixes one spot where we returned a negative dirent without
first consulting the policy.
PiperOrigin-RevId: 208280230
Change-Id: I8f963bbdb45a95a74ad0ecc1eef47eff2092d3a4
|
|
Previously, processes which used file-system Unix Domain Sockets could not be
checkpoint-ed in runsc because the sockets were saved with their inode
numbers which do not necessarily remain the same upon restore. Now,
the sockets are also saved with their paths so that the new inodes
can be determined for the sockets based on these paths after restoring.
Tests for cases with UDS use are included. Test cleanup to come.
PiperOrigin-RevId: 208268781
Change-Id: Ieaa5d5d9a64914ca105cae199fd8492710b1d7ec
|
|
This CL adds a new cache-policy for gofer filesystems that uses the host page
cache, but causes dirents to be reloaded on each Walk, and does not cache
readdir results.
This policy is useful when the remote filesystem may change out from underneath
us, as any remote changes will be reflected on the next Walk.
Importantly, this cache policy is only consistent if we do not use gVisor's
internal page cache, since that page cache is tied to the Inode and may be
thrown away upon Revalidation.
This cache policy should only be used when the gofer supports donating host
FDs, since then gVisor will make use of the host kernel page cache, which will
be consistent for all open files in the gofer. In fact, a panic will be raised
if a file is opened without a donated FD.
PiperOrigin-RevId: 207752937
Change-Id: I233cb78b4695bbe00a4605ae64080a47629329b8
|
|
PiperOrigin-RevId: 207737935
Change-Id: I3a301ece1f1d30909715f36562474e3248b6a0d5
|
|
In other news, apparently proc.fdInfo is the last user of ramfs.File.
PiperOrigin-RevId: 207564572
Change-Id: I5a92515698cc89652b80bea9a32d309e14059869
|
|
PiperOrigin-RevId: 207303405
Change-Id: I17b6433963d78e3631a862b7ac80f566c8e7d106
|
|
PiperOrigin-RevId: 207180809
Change-Id: I08c264812919e81b2c56fdd4a9ef06924de8b52f
|
|
PiperOrigin-RevId: 207125440
Change-Id: I6c572afb4d693ee72a0c458a988b0e96d191cd49
|
|
PiperOrigin-RevId: 207037226
Change-Id: I8b5f1a056d4f3eab17846f2e0193bb737ecb5428
|
|
PiperOrigin-RevId: 207007153
Change-Id: Ifedf1cc3758dc18be16647a4ece9c840c1c636c9
|
|
Currently, there is an attempt to print FD flags, but
they are not decoded into a number, so we see something like this:
/criu # cat /proc/self/fdinfo/0
flags: {%!o(bool=000false)}
Actually, fdinfo has to contain file flags.
Change-Id: Idcbb7db908067447eb9ae6f2c3cfb861f2be1a97
PiperOrigin-RevId: 206794498
|