summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2019-08-02Remove kernel.mounts.Nicolas Lacasse
We can get the mount namespace from the CreateProcessArgs in all cases where we need it. This also gets rid of kernel.Destroy method, since the only thing it was doing was DecRefing the mounts. Removing the need to call kernel.SetRootMountNamespace also allowed for some more simplifications in the container fs setup code. PiperOrigin-RevId: 261357060
2019-08-01Drop reference on fs.Inode if Mount goes wrong.Nicolas Lacasse
PiperOrigin-RevId: 261203674
2019-08-01tmpfs and ramfs Dirs should drop references on children in Release().Nicolas Lacasse
This is the source of many warnings like: AtomicRefCount 0x7f5ff84e3500 owned by "fs.Inode" garbage collected with ref count of 1 (want 0) PiperOrigin-RevId: 261197093
2019-08-01Implement getsockopt(TCP_INFO).Rahat Mahmood
Export some readily-available fields for TCP_INFO and stub out the rest. PiperOrigin-RevId: 261191548
2019-07-31Basic support for 'ip route'Ian Lewis
Implements support for RTM_GETROUTE requests for netlink sockets. Fixes #507 PiperOrigin-RevId: 261051045
2019-07-31Initialize kernel.unimplementedSyscallEmitter with a sync.Once.Nicolas Lacasse
This is initialized lazily on the first unimplemented syscall. Without the sync.Once, this is racy. PiperOrigin-RevId: 260971758
2019-07-30Cache pages in CachingInodeOperations.Read when memory evictions are delayed.Jamie Liu
PiperOrigin-RevId: 260851452
2019-07-30ext: Migrate from using fileReader custom interface to using io.Reader.Ayush Ranjan
It gets rid of holding state of the io.Reader offset (which is anyways held by the vfs.FileDescriptor struct. It is also odd using a io.Reader becuase we using io.ReaderAt to interact with the device. So making a io.ReaderAt wrapper makes more sense. Most importantly, it gets rid of the complexity of extracting the file reader from a regular file implementation and then using it. Now we can just use the regular file implementation as a reader which is more intuitive. PiperOrigin-RevId: 260846927
2019-07-30ext: block map file reader implementation.Ayush Ranjan
Also adds stress tests for block map reader and intensifies extent reader tests. PiperOrigin-RevId: 260838177
2019-07-30Merge pull request #607 from DarcySail:mastergVisor bot
PiperOrigin-RevId: 260783254
2019-07-30Add feature to launch Sentry from an open host FD.Zach Koopmans
Adds feature to launch from an open host FD instead of a binary_path. The FD should point to a valid executable and most likely be statically compiled. If the executable is not statically compiled, the loader will search along the interpreter paths, which must be able to be resolved in the Sandbox's file system or start will fail. PiperOrigin-RevId: 260756825
2019-07-29Migrate from using io.ReadSeeker to io.ReaderAt.Ayush Ranjan
This provides the following benefits: - We can now use pkg/fd package which does not take ownership of the file descriptor. So it does not close the fd when garbage collected. This reduces scope of errors from unexpected garbage collection of io.File. - It enforces the offset parameter in every read call. It does not affect the fd offset nor is it affected by it. Hence reducing scope of error of using stale offsets when reading. - We do not need to serialize the usage of any global file descriptor anymore. So this drops the mutual exclusion req hence reducing complexity and congestion. PiperOrigin-RevId: 260635174
2019-07-30Combine multiple epoll events copiesHang Su
Allocate a larger memory buffer and combine multiple copies into one copy, to reduce the number of copies from kernel memory to user memory. Signed-off-by: Hang Su <darcy.sh@antfin.com>
2019-07-29ext: extent reader implementation.Ayush Ranjan
PiperOrigin-RevId: 260629559
2019-07-29ext: inode implementations.Ayush Ranjan
PiperOrigin-RevId: 260624470
2019-07-29Rate limit the unimplemented syscall event handler.Nicolas Lacasse
This introduces two new types of Emitters: 1. MultiEmitter, which will forward events to other registered Emitters, and 2. RateLimitedEmitter, which will forward events to a wrapped Emitter, subject to given rate limits. The methods in the eventchannel package itself act like a multiEmitter, but is not actually an Emitter. Now we have a DefaultEmitter, and the methods in eventchannel simply forward calls to the DefaultEmitter. The unimplemented syscall handler now uses a RateLimetedEmitter that wraps the DefaultEmitter. PiperOrigin-RevId: 260612770
2019-07-26Merge pull request #452 from zhangningdlut:chris_test_pidnsgVisor bot
PiperOrigin-RevId: 260220279
2019-07-25Automated rollback of changelist 255679453Fabricio Voznika
PiperOrigin-RevId: 260047477
2019-07-24ext: filesystem boilerplate code.Ayush Ranjan
PiperOrigin-RevId: 259865366
2019-07-24ext: Add tests for root directory inode.Ayush Ranjan
PiperOrigin-RevId: 259856442
2019-07-24ext: testing environment setup with VFS2 support.Ayush Ranjan
PiperOrigin-RevId: 259835948
2019-07-24Add support for a subnet prefix length on interface network addressesChris Kuiper
This allows the user code to add a network address with a subnet prefix length. The prefix length value is stored in the network endpoint and provided back to the user in the ProtocolAddress type. PiperOrigin-RevId: 259807693
2019-07-24Use different pidns among different containerschris.zn
The different containers in a sandbox used only one pid namespace before. This results in that a container can see the processes in another container in the same sandbox. This patch use different pid namespace for different containers. Signed-off-by: chris.zn <chris.zn@antfin.com>
2019-07-23ext: Inode creation logic.Ayush Ranjan
PiperOrigin-RevId: 259666476
2019-07-23ext: Add ext2 and ext3 tiny images.Ayush Ranjan
PiperOrigin-RevId: 259657917
2019-07-23ext: Added extent tree building logic.Ayush Ranjan
PiperOrigin-RevId: 259628657
2019-07-23Give each container a distinct MountNamespace.Nicolas Lacasse
This keeps all container filesystem completely separate from eachother (including from the root container filesystem), and allows us to get rid of the "__runsc_containers__" directory. It also simplifies container startup/teardown as we don't have to muck around in the root container's filesystem. PiperOrigin-RevId: 259613346
2019-07-22Merge pull request #571 from lubinszARM:pr_loadergVisor bot
PiperOrigin-RevId: 259427074
2019-07-22kvm: fix race between machine.Put and machine.GetAndrei Vagin
m.available.Signal() has to be called under m.mu.RLock, otherwise it can race with machine.Get: m.Get | m.Put ------------------------------------- m.mu.Lock() | Seatching available vcpu| | m.available.Signal() m.available.Wait | PiperOrigin-RevId: 259394051
2019-07-21Add ARM64 support to pkg/sentry/loaderBin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2019-07-19Merge pull request #450 from Pixep:feature/add-clock-boottime-as-monotonicgVisor bot
PiperOrigin-RevId: 258996346
2019-07-18net/tcp/setockopt: impelment setsockopt(fd, SOL_TCP, TCP_INQ)Andrei Vagin
PiperOrigin-RevId: 258859507
2019-07-18Sentry virtual filesystem, v2Jamie Liu
Major differences from the current ("v1") sentry VFS: - Path resolution is Filesystem-driven (FilesystemImpl methods call vfs.ResolvingPath methods) rather than VFS-driven (fs package owns a Dirent tree and calls fs.InodeOperations methods to populate it). This drastically improves performance, primarily by reducing overhead from inefficient synchronization and indirection. It also makes it possible to implement remote filesystem protocols that translate FS system calls into single RPCs, rather than having to make (at least) one RPC per path component, significantly reducing the latency of remote filesystems (especially during cold starts and for uncacheable shared filesystems). - Mounts are correctly represented as a separate check based on contextual state (current mount) rather than direct replacement in a fs.Dirent tree. This makes it possible to support (non-recursive) bind mounts and mount namespaces. Included in this CL is fsimpl/memfs, an incomplete in-memory filesystem that exists primarily to demonstrate intended filesystem implementation patterns and for benchmarking: BenchmarkVFS1TmpfsStat/1-6 3000000 497 ns/op BenchmarkVFS1TmpfsStat/2-6 2000000 676 ns/op BenchmarkVFS1TmpfsStat/3-6 2000000 904 ns/op BenchmarkVFS1TmpfsStat/8-6 1000000 1944 ns/op BenchmarkVFS1TmpfsStat/64-6 100000 14067 ns/op BenchmarkVFS1TmpfsStat/100-6 50000 21700 ns/op BenchmarkVFS2MemfsStat/1-6 10000000 197 ns/op BenchmarkVFS2MemfsStat/2-6 5000000 233 ns/op BenchmarkVFS2MemfsStat/3-6 5000000 268 ns/op BenchmarkVFS2MemfsStat/8-6 3000000 477 ns/op BenchmarkVFS2MemfsStat/64-6 500000 2592 ns/op BenchmarkVFS2MemfsStat/100-6 300000 4045 ns/op BenchmarkVFS1TmpfsMountStat/1-6 2000000 679 ns/op BenchmarkVFS1TmpfsMountStat/2-6 2000000 912 ns/op BenchmarkVFS1TmpfsMountStat/3-6 1000000 1113 ns/op BenchmarkVFS1TmpfsMountStat/8-6 1000000 2118 ns/op BenchmarkVFS1TmpfsMountStat/64-6 100000 14251 ns/op BenchmarkVFS1TmpfsMountStat/100-6 100000 22397 ns/op BenchmarkVFS2MemfsMountStat/1-6 5000000 317 ns/op BenchmarkVFS2MemfsMountStat/2-6 5000000 361 ns/op BenchmarkVFS2MemfsMountStat/3-6 5000000 387 ns/op BenchmarkVFS2MemfsMountStat/8-6 3000000 582 ns/op BenchmarkVFS2MemfsMountStat/64-6 500000 2699 ns/op BenchmarkVFS2MemfsMountStat/100-6 300000 4133 ns/op From this we can infer that, on this machine: - Constant cost for tmpfs stat() is ~160ns in VFS2 and ~280ns in VFS1. - Per-path-component cost is ~35ns in VFS2 and ~215ns in VFS1, a difference of about 6x. - The cost of crossing a mount boundary is about 80ns in VFS2 (MemfsMountStat/1 does approximately the same amount of work as MemfsStat/2, except that it also crosses a mount boundary). This is an inescapable cost of the separate mount lookup needed to support bind mounts and mount namespaces. PiperOrigin-RevId: 258853946
2019-07-17sys_time: Wrap comments to 80 columnsAdrien Leravat
2019-07-17Take copyMu in RevalidateMichael Pratt
copyMu is required to read child.overlay.upper. PiperOrigin-RevId: 258662209
2019-07-17Separate O_DSYNC and O_SYNC.Jamie Liu
PiperOrigin-RevId: 258657913
2019-07-17ext: disklayout: extents support.Ayush Ranjan
PiperOrigin-RevId: 258657776
2019-07-17ext: Filesystem init implementation.Ayush Ranjan
PiperOrigin-RevId: 258645957
2019-07-17Merge pull request #355 from zhuangel:mastergVisor bot
PiperOrigin-RevId: 258643966
2019-07-17Fix race in FDTable.GetFDs().Bhasker Hariharan
PiperOrigin-RevId: 258635459
2019-07-17Add AF_UNIX, SOCK_RAW sockets, which exist for some reason.Kevin Krakauer
tcpdump creates these. PiperOrigin-RevId: 258611829
2019-07-17Merge pull request #533 from kevinGC:stub-dev-ttygVisor bot
PiperOrigin-RevId: 258607547
2019-07-17Properly invalidate cache in rename and removeMichael Pratt
We were invalidating the wrong overlayEntry in rename and missing invalidation in rename and remove if lower exists. PiperOrigin-RevId: 258604685
2019-07-16Merge pull request #474 from zhuangel:proctasksgVisor bot
PiperOrigin-RevId: 258479216
2019-07-15Support /proc/net/devJianfeng Tan
This proc file reports the stats of interfaces. We could use ifconfig command to check the result. Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com> Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827 COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/ PiperOrigin-RevId: 258303936
2019-07-15kvm: wake up all waiter of vCPU.stateAndrei Vagin
Now we call FUTEX_WAKE with ^uintptr(0) of waiters, but in this case only one waiter will be waked up. If we want to wake up all of them, the number of waiters has to be set to math.MaxInt32. PiperOrigin-RevId: 258285286
2019-07-12Add IPPROTO_RAW, which allows raw sockets to write IP headers.Kevin Krakauer
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to manipulate the kernel's tables, but it doesn't actually use any of the functionality. Blegh. PiperOrigin-RevId: 257903078
2019-07-12Stub out support for TCP_MAXSEG.Bhasker Hariharan
Adds support to set/get the TCP_MAXSEG value but does not really change the segment sizes emitted by netstack or alter the MSS advertised by the endpoint. This is currently being added only to unblock iperf3 on gVisor. Plumbing this correctly requires a bit more work which will come in separate CLs. PiperOrigin-RevId: 257859112
2019-07-12Merge pull request #282 from zhangningdlut:chris_test_procgVisor bot
PiperOrigin-RevId: 257855479
2019-07-12Don't emit an event for extended attribute syscalls.Nicolas Lacasse
These are filesystem-specific, and filesystems are allowed to return ENOTSUP if they are not supported. PiperOrigin-RevId: 257813477