Age | Commit message | Author |
|
Copying up the parent when binding a UDS on overlayfs was added in commit
02ab1f187cd24c67b754b004229421d189cee264.
However, using copyUp in overlayBind can leave the sentry stuck; the cause
is a deadlock on renameMu.
1 [Process A] Invoke a Unix socket bind operation
renameMu is held in fs.(*Dirent).genericCreate by process A
2 [Process B] Invoke a read syscall on /proc/task/mounts
waiting on Lock of renameMu in fs.(*MountNamespace).FindMount
3 [Process A] Continue Unix socket bind operation
waiting on RLock of renameMu in fs.copyUp
The root cause is recursive read locking of renameMu in the bind call trace:
if a write lock is requested between the two read locks, a deadlock
occurs.
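The three-step interleaving can be reproduced outside the sentry with a plain sync.RWMutex. The following is a minimal sketch, not the sentry code: `mu` stands in for renameMu, the function name is hypothetical, and the second RLock is issued from a helper goroutine only so that the demo can time out instead of hanging (sync.RWMutex does not track which goroutine holds a read lock).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recursiveRLockStalls reports whether a second RLock is still blocked after
// `wait` when a writer has queued up behind the first read lock.
func recursiveRLockStalls(wait time.Duration) bool {
	var mu sync.RWMutex // stand-in for renameMu (assumption)

	mu.RLock() // step 1: "process A" takes the read lock

	writerDone := make(chan struct{})
	go func() { // step 2: "process B" requests the write lock and blocks
		mu.Lock()
		mu.Unlock()
		close(writerDone)
	}()
	// TryRLock (Go 1.18+) starts failing once a writer is pending.
	for mu.TryRLock() {
		mu.RUnlock()
		time.Sleep(time.Millisecond)
	}

	acquired := make(chan struct{})
	go func() { // step 3: the same call path read-locks again
		mu.RLock()
		close(acquired)
		mu.RUnlock()
	}()

	stalled := false
	select {
	case <-acquired:
	case <-time.After(wait):
		stalled = true // the second reader is queued behind the writer
	}

	// Release the first read lock so the writer and the reader can finish.
	mu.RUnlock()
	<-writerDone
	<-acquired
	return stalled
}

func main() {
	fmt.Println("second RLock stalled:", recursiveRLockStalls(200*time.Millisecond))
}
```

In the real call trace the first read lock is never released while the second one is pending, so the stall becomes permanent.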
Fixes #1397
|
|
PiperOrigin-RevId: 284606233
|
|
Threadgroups already know their TTY (if they have one), which now contains the
TTY Index, and is returned in the Processes() call.
PiperOrigin-RevId: 284263850
|
|
PiperOrigin-RevId: 284038840
|
|
PiperOrigin-RevId: 283630669
|
|
This allows writable proc and devices files to be opened with O_CREAT|O_TRUNC.
This is encountered most frequently when interacting with proc or devices files
via the command line.
e.g. $ echo 8192 1048576 4194304 > /proc/sys/net/ipv4/tcp_rmem
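For reference, a shell redirection such as the one above opens its target with O_WRONLY|O_CREAT|O_TRUNC. A minimal sketch of the same open from Go follows; a scratch file stands in for the proc file, and the function names are illustrative only.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// redirectWrite opens path the way a shell does for `echo data > path`
// (O_WRONLY|O_CREAT|O_TRUNC) and then writes data.
func redirectWrite(path, data string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// truncateDemo writes twice to a scratch file and returns the final content.
// The scratch file is a stand-in for a writable proc file (assumption).
func truncateDemo() (string, error) {
	dir, err := os.MkdirTemp("", "otrunc")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "tcp_rmem")
	if err := redirectWrite(path, "a longer first value that must not survive\n"); err != nil {
		return "", err
	}
	if err := redirectWrite(path, "8192 1048576 4194304\n"); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	got, err := truncateDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(got)
}
```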
Also adds a test to test the behavior of open(O_TRUNC), truncate, and ftruncate
on named pipes.
Fixes #1116
PiperOrigin-RevId: 282677425
|
|
PiperOrigin-RevId: 282382564
|
|
PiperOrigin-RevId: 281795269
|
|
Note that the Sentry still calls Truncate() on the file before calling Open.
A new p9 version check was added to ensure that the p9 server can handle
the OpenTruncate flag. If not, then the flag is stripped before sending.
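The version gate might look roughly like the sketch below. All names and numeric values here (`OpenFlags`, `minTruncateVersion`, `flagsForServer`, the flag bits and the version number) are hypothetical placeholders chosen for illustration, not the actual p9 definitions.

```go
package main

import "fmt"

// OpenFlags is a hypothetical stand-in for the p9 open flag type.
type OpenFlags uint32

const (
	WriteOnly    OpenFlags = 1      // placeholder value
	OpenTruncate OpenFlags = 0o1000 // placeholder value
)

// minTruncateVersion is an assumed protocol version; the real threshold is
// whatever the p9 package defines.
const minTruncateVersion uint32 = 9

// flagsForServer strips OpenTruncate when the server is too old to
// understand it, and passes the flags through unchanged otherwise.
func flagsForServer(flags OpenFlags, serverVersion uint32) OpenFlags {
	if serverVersion < minTruncateVersion {
		return flags &^ OpenTruncate
	}
	return flags
}

func main() {
	want := WriteOnly | OpenTruncate
	fmt.Printf("old server: %#o\n", flagsForServer(want, 8))
	fmt.Printf("new server: %#o\n", flagsForServer(want, 9))
}
```

Because the Sentry still calls Truncate() before Open, stripping the flag for an old server does not change the visible result.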
PiperOrigin-RevId: 281609112
|
|
PiperOrigin-RevId: 281112758
|
|
It was possible to panic the sentry by opening a cache revalidating folder with
O_TRUNC|O_CREAT.
Avoids breaking php tests.
PiperOrigin-RevId: 280533213
|
|
PiperOrigin-RevId: 280131840
|
|
The newfstatat() syscall is not supported on arm64, so we fall back
to the fstatat() syscall.
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Iea95550ea53bcf85c01f7b3b95da70ad0952177d
|
|
This patch also includes a minor change to replace syscall.Dup2
with syscall.Dup3, which was missed in a previous commit (ref a25a976).
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: I00beb9cc492e44c762ebaa3750201c63c1f7c2f3
|
|
PiperOrigin-RevId: 279365629
|
|
It was possible to panic the sentry by opening a cache revalidating folder with
O_TRUNC|O_CREAT.
PiperOrigin-RevId: 278417533
|
|
PiperOrigin-RevId: 276380008
|
|
PiperOrigin-RevId: 275139066
|
|
Linux kernels before 4.19 do not implement a feature that updates
open FDs after a file is opened for write (and is copied to the upper
layer). Already open FDs will continue to read the old file content
until they are reopened. This is especially problematic for gVisor
because it caches open files.
A flag was added to force read-only files to be reopened when the
same file is opened for write. This is only needed if using kernels
prior to 4.19.
Closes #1006
It's difficult to really test this because we never run tests
on older kernels. I'm adding a test in GKE, which uses kernels
with the overlayfs problem for 1.14 and lower.
PiperOrigin-RevId: 275115289
|
|
When any of these flags are set, all writes will trigger a subsequent fsync
call. This behavior already existed for "write-through" mounts.
O_DIRECT is treated as an alias for O_SYNC. Better support coming soon.
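A minimal sketch of the write-then-fsync behavior, assuming the flags in question are O_SYNC, O_DSYNC and O_DIRECT. The types and flag constants below are illustrative stand-ins, not the sentry's file implementation.

```go
package main

import "fmt"

// Placeholder flag bits (assumption); real code would use the Linux ABI values.
const (
	flagSync = 1 << iota
	flagDSync
	flagDirect
)

// backend counts calls so the effect of the flags is observable.
type backend struct{ writes, syncs int }

func (b *backend) Write(p []byte) (int, error) { b.writes++; return len(p), nil }
func (b *backend) Sync() error                 { b.syncs++; return nil }

// file is a hypothetical open file description carrying its open flags.
type file struct {
	flags int
	b     *backend
}

// Write forwards to the backend and follows up with Sync when any of the
// synchronous-write flags is set.
func (f *file) Write(p []byte) (int, error) {
	n, err := f.b.Write(p)
	if err != nil {
		return n, err
	}
	if f.flags&(flagSync|flagDSync|flagDirect) != 0 {
		if err := f.b.Sync(); err != nil {
			return n, err
		}
	}
	return n, nil
}

// syncsAfter performs `writes` one-byte writes and returns the Sync count.
func syncsAfter(flags, writes int) int {
	f := &file{flags: flags, b: &backend{}}
	for i := 0; i < writes; i++ {
		if _, err := f.Write([]byte("x")); err != nil {
			return -1
		}
	}
	return f.b.syncs
}

func main() {
	fmt.Println("no flags:", syncsAfter(0, 3))
	fmt.Println("O_SYNC:  ", syncsAfter(flagSync, 3))
	fmt.Println("O_DIRECT:", syncsAfter(flagDirect, 3))
}
```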
PiperOrigin-RevId: 275114392
|
|
PiperOrigin-RevId: 275114157
|
|
This proc file reports routing information to applications inside the
container.
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I498e47f8c4c185419befbb42d849d0b099ec71f3
|
|
This proc file contains statistics according to [1].
[1] https://tools.ietf.org/html/rfc2013
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I9662132085edd8a7783d356ce4237d7ac0800d94
|
|
PiperOrigin-RevId: 273421634
|
|
PiperOrigin-RevId: 272987037
|
|
PiperOrigin-RevId: 272522508
|
|
Right now, we can find more than one process with PID 1 in /proc.
$ for i in `seq 10`; do
> unshare -fp sleep 1000 &
> done
$ ls /proc
1 1 1 1 12 18 24 29 6 loadavg net sys version
1 1 1 1 16 20 26 32 cpuinfo meminfo self thread-self
1 1 1 1 17 21 28 36 filesystems mounts stat uptime
PiperOrigin-RevId: 272506593
|
|
Kernel.cpuClockTicker increments kernel.cpuClock, which tasks use as a clock to
track their CPU usage. This improves latency in the syscall path by avoiding
expensive monotonic clock calls on every syscall entry/exit.
However, this timer fires every 10ms. Thus, when all tasks are idle (i.e.,
blocked or stopped), this forces a sentry wakeup every 10ms, when we may
otherwise be able to sleep until the next app-relevant event. These wakeups
cause the sentry to utilize approximately 2% CPU when the application is
otherwise idle.
Updates to clock are not strictly necessary when the app is idle, as there are
no readers of cpuClock. This commit reduces idle CPU by disabling the timer
when tasks are completely idle, and computing its effects at the next wakeup.
Rather than disabling the timer as soon as the app goes idle, we wait until the
next tick, which provides a window for short sleeps to sleep and wakeup without
doing the (relatively) expensive work of disabling and enabling the timer.
PiperOrigin-RevId: 272265822
|
|
Reported-by: syzbot+bb5ed342be51d39b0cbb@syzkaller.appspotmail.com
PiperOrigin-RevId: 272110815
|
|
The gofer's CachingInodeOperations implementation contains an optimization for
the common open-read-close pattern when we have a host FD. In this case, the
host kernel will update the timestamp for us to a reasonably close time, so we
don't need an extra RPC to the gofer.
However, when the app explicitly sets the timestamps (via futimes or similar)
then we actually DO need to update the timestamps, because the host kernel
won't do it for us.
To fix this, a new boolean `forceSetTimestamps` was added to
CachingInodeOperations.SetMaskedAttributes. It is only set by
gofer.InodeOperations.SetTimestamps.
PiperOrigin-RevId: 272048146
|
|
How to reproduce:
$ echo "timeout 10 ls" > foo.sh
$ chmod +x foo.sh
$ ./foo.sh
(will hang here for 10 secs, and the output of ls does not show)
When the "ls" process writes to stdout, it receives a SIGTTOU signal and
hangs there until the "timeout" process times out and kills it.
The expected result is: "ls" writes its output to the tty and terminates
immediately, then the "timeout" process receives SIGCHLD and terminates.
The reason for this failure is that we missed the check for TOSTOP (if
set, background processes will receive the SIGTTOU signal when they
write).
We use drivers/tty/n_tty.c:n_tty_write() as a reference.
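The missing check reduces to a small predicate. This is a simplified sketch with a hypothetical function name: the TOSTOP value is the asm-generic one (some architectures differ), and the Linux code additionally accounts for ignored or blocked SIGTTOU and orphaned process groups, which are left out here.

```go
package main

import "fmt"

// tostop is the asm-generic value of TOSTOP in termios c_lflag.
const tostop uint32 = 0o400

// writeNeedsSIGTTOU reports whether a write to the tty should raise SIGTTOU:
// only when TOSTOP is set and the writer is not in the foreground group.
func writeNeedsSIGTTOU(lflag uint32, inForeground bool) bool {
	if lflag&tostop == 0 {
		// Without TOSTOP, background writes go straight through.
		return false
	}
	return !inForeground
}

func main() {
	fmt.Println("background, TOSTOP clear:", writeNeedsSIGTTOU(0, false))
	fmt.Println("background, TOSTOP set:  ", writeNeedsSIGTTOU(tostop, false))
	fmt.Println("foreground, TOSTOP set:  ", writeNeedsSIGTTOU(tostop, true))
}
```

In the reproduction above, "ls" runs in the background with TOSTOP clear, so the first case applies and the write should proceed.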
Fixes: #862
Reported-by: chris.zn <chris.zn@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Signed-off-by: chenglang.hy <chenglang.hy@antfin.com>
|
|
Fixes: #829
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Signed-off-by: Jielong Zhou <jielong.zjl@antfin.com>
|
|
Addresses a deadlock with the rolled back change:
https://github.com/google/gvisor/commit/b6a5b950d28e0b474fdad160b88bc15314cf9259
Creating a session from an orphaned process group was causing a lock to be
acquired twice by a single goroutine. This behavior is addressed, and a test
(OrphanRegression) has been added to pty.cc.
Implemented the following ioctls:
- TIOCSCTTY - set controlling TTY
- TIOCNOTTY - remove controlling tty, maybe signal some other processes
- TIOCGPGRP - get foreground process group. Also enables tcgetpgrp().
- TIOCSPGRP - set foreground process group. Also enables tcsetpgrp().
Next steps are to actually turn terminal-generated control characters (e.g. ^C)
into signals to the proper process groups, and to send SIGTTOU and SIGTTIN when
appropriate.
PiperOrigin-RevId: 270088599
|
|
ENOTDIR has to be returned when a component used as a directory in
pathname is not, in fact, a directory.
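The rule is easy to observe from user space on a Unix-like host. In this minimal sketch (function names are illustrative), a path lookup that walks through a regular file fails with ENOTDIR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// statBelowFile creates a regular file and then stats a path that uses that
// file as if it were a directory, returning the resulting error.
func statBelowFile() error {
	dir, err := os.MkdirTemp("", "enotdir")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "plain")
	if err := os.WriteFile(file, []byte("x"), 0o644); err != nil {
		return err
	}
	_, err = os.Stat(filepath.Join(file, "child"))
	return err
}

// failsWithENOTDIR reports whether the lookup failed with ENOTDIR.
func failsWithENOTDIR() bool {
	return errors.Is(statBelowFile(), syscall.ENOTDIR)
}

func main() {
	fmt.Println("stat through a regular file gives ENOTDIR:", failsWithENOTDIR())
}
```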
PiperOrigin-RevId: 269037893
|
|
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.
Note that this change necessitated some minor restructuring of the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elides locks and uses the underlying
fs.FileOperations directly.
PiperOrigin-RevId: 268805207
|
|
They are no-ops, so the standard rule works fine.
PiperOrigin-RevId: 268776264
|
|
PiperOrigin-RevId: 266496644
|
|
PiperOrigin-RevId: 266491264
|
|
PiperOrigin-RevId: 266229756
|
|
PiperOrigin-RevId: 266177409
|
|
This used to be the case, but regressed after a recent change.
Also made a few fixes around it and cleaned up the code a bit.
Closes #720
PiperOrigin-RevId: 265717496
|
|
For SOCK_STREAM Unix sockets, we must return ECONNRESET if the peer is
closed with unread data.
We explicitly set a flag when closing one end, to differentiate this from
a plain shutdown (where zero must be returned).
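The Linux behavior being matched can be checked with a socketpair. This is a Linux-specific sketch using the syscall package; the function name and the returned labels are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// peerOutcome sends data from a to b, lets b go away without reading it
// (by close or by shutdown), and reports what a then sees on read.
func peerOutcome(closePeer bool) (string, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	a, b := fds[0], fds[1]
	defer syscall.Close(a)

	// Leave unread data in b's receive queue.
	if _, err := syscall.Write(a, []byte("unread")); err != nil {
		syscall.Close(b)
		return "", err
	}
	if closePeer {
		syscall.Close(b)
	} else {
		defer syscall.Close(b)
		if err := syscall.Shutdown(b, syscall.SHUT_RDWR); err != nil {
			return "", err
		}
	}

	buf := make([]byte, 16)
	n, err := syscall.Read(a, buf)
	switch {
	case errors.Is(err, syscall.ECONNRESET):
		return "ECONNRESET", nil
	case err == nil && n == 0:
		return "EOF", nil
	default:
		return "", fmt.Errorf("unexpected read result: n=%d err=%v", n, err)
	}
}

func main() {
	for _, closePeer := range []bool{true, false} {
		out, err := peerOutcome(closePeer)
		fmt.Printf("closePeer=%v: %s (err=%v)\n", closePeer, out, err)
	}
}
```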
Fixes: #735
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
This is in accordance with newer parts of the standard library.
PiperOrigin-RevId: 263449916
|
|
PiperOrigin-RevId: 263203441
|
|
fsimpl is the keeper of all filesystem implementations in VFS2.
PiperOrigin-RevId: 262617869
|
|
Added benchmark tests which emulate memfs benchmarks.
Stat benchmarks
BenchmarkVFS2Ext4fsStat/1-12 10000000 145 ns/op
BenchmarkVFS2Ext4fsStat/2-12 10000000 170 ns/op
BenchmarkVFS2Ext4fsStat/3-12 10000000 202 ns/op
BenchmarkVFS2Ext4fsStat/8-12 3000000 374 ns/op
BenchmarkVFS2Ext4fsStat/64-12 500000 2159 ns/op
BenchmarkVFS2Ext4fsStat/100-12 300000 3459 ns/op
BenchmarkVFS1TmpfsStat/1-12 5000000 348 ns/op
BenchmarkVFS1TmpfsStat/2-12 3000000 487 ns/op
BenchmarkVFS1TmpfsStat/3-12 2000000 655 ns/op
BenchmarkVFS1TmpfsStat/8-12 1000000 1365 ns/op
BenchmarkVFS1TmpfsStat/64-12 200000 9565 ns/op
BenchmarkVFS1TmpfsStat/100-12 100000 15158 ns/op
BenchmarkVFS2MemfsStat/1-12 10000000 133 ns/op
BenchmarkVFS2MemfsStat/2-12 10000000 155 ns/op
BenchmarkVFS2MemfsStat/3-12 10000000 182 ns/op
BenchmarkVFS2MemfsStat/8-12 5000000 310 ns/op
BenchmarkVFS2MemfsStat/64-12 1000000 1659 ns/op
BenchmarkVFS2MemfsStat/100-12 500000 2787 ns/op
Mount Stat benchmarks
BenchmarkVFS2ExtfsMountStat/1-12 5000000 245 ns/op
BenchmarkVFS2ExtfsMountStat/2-12 5000000 266 ns/op
BenchmarkVFS2ExtfsMountStat/3-12 5000000 304 ns/op
BenchmarkVFS2ExtfsMountStat/8-12 3000000 456 ns/op
BenchmarkVFS2ExtfsMountStat/64-12 500000 2308 ns/op
BenchmarkVFS2ExtfsMountStat/100-12 300000 3482 ns/op
BenchmarkVFS1TmpfsMountStat/1-12 3000000 488 ns/op
BenchmarkVFS1TmpfsMountStat/2-12 2000000 658 ns/op
BenchmarkVFS1TmpfsMountStat/3-12 2000000 806 ns/op
BenchmarkVFS1TmpfsMountStat/8-12 1000000 1514 ns/op
BenchmarkVFS1TmpfsMountStat/64-12 100000 10037 ns/op
BenchmarkVFS1TmpfsMountStat/100-12 100000 15280 ns/op
BenchmarkVFS2MemfsMountStat/1-12 10000000 212 ns/op
BenchmarkVFS2MemfsMountStat/2-12 5000000 232 ns/op
BenchmarkVFS2MemfsMountStat/3-12 5000000 264 ns/op
BenchmarkVFS2MemfsMountStat/8-12 3000000 390 ns/op
BenchmarkVFS2MemfsMountStat/64-12 1000000 1813 ns/op
BenchmarkVFS2MemfsMountStat/100-12 500000 2812 ns/op
PiperOrigin-RevId: 262477158
|
|
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.
PiperOrigin-RevId: 262460640
|
|
PiperOrigin-RevId: 262264674
|
|
PiperOrigin-RevId: 262249166
|
|
PiperOrigin-RevId: 262242410
|