summaryrefslogtreecommitdiffhomepage
path: root/pkg/sentry
AgeCommit message (Collapse)Author
2018-07-18Fix lock-ordering violation in Create by logging BaseName instead of FullName.Nicolas Lacasse
Dirent.FullName takes the global renameMu, but can be called during Create, which itself takes dirent.mu and dirent.dirMu, which is a lock-order violation: Dirent.Create d.dirMu.Lock d.mu.Lock Inode.Create gofer.inodeOperations.Create gofer.NewFile Dirent.FullName d.renameMu.RLock We only use the FullName here for logging, and in this case we can get by with logging only the BaseName. A `BaseName` method was added to Dirent, which simply returns the name, taking d.parent.mu as required. In the Create pathway, we can't call d.BaseName() because taking d.parent.mu after d.mu violates the lock order. But we already know the base name of the file we just created, so that's OK. In the Open/GetFile pathway, we are free to call d.BaseName() because the other dirent locks are not held. PiperOrigin-RevId: 205112278 Change-Id: Ib45c734081aecc9b225249a65fa8093eb4995f10
2018-07-17Merge FileMem.usage in IncRefMichael Pratt
Per the doc, usage must be kept maximally merged. Beyond that, it is simply a good idea to keep fragmentation in usage to a minimum. The glibc malloc allocator allocates one page at a time, potentially causing lots of fragmentation. However, those pages are likely to have the same number of references, often making it possible to merge ranges. PiperOrigin-RevId: 204960339 Change-Id: I03a050cf771c29a4f05b36eaf75b1a09c9465e14
2018-07-16Add CPUID faulting for ptrace and KVM.Adin Scannell
PiperOrigin-RevId: 204858314 Change-Id: I8252bf8de3232a7a27af51076139b585e73276d4
2018-07-16Start allocation and reclaim scans only where they may find a matchMichael Pratt
If usageSet is heavily fragmented, findUnallocatedRange and findReclaimable can spend excessive cycles linearly scanning the set for unallocated/free pages. Improve common cases by beginning the scan only at the first page that could possibly contain an unallocated/free page. This metadata only guarantees that there is no lower unallocated/free page, but a scan may still be required (especially for multi-page allocations). That said, this heuristic can still provide significant performance improvements for certain applications. PiperOrigin-RevId: 204841833 Change-Id: Ic41ad33bf9537ecd673a6f5852ab353bf63ea1e6
2018-07-16Add EventOperations.HostFD()Neel Natu
This method allows an eventfd inside the Sentry to be registered with with the host kernel. Update comment about memory mapping host fds via CachingInodeOperations. PiperOrigin-RevId: 204784859 Change-Id: I55823321e2d84c17ae0f7efaabc6b55b852ae257
2018-07-13Allow a filesystem to control its visibility in /proc/filesystems.Neel Natu
PiperOrigin-RevId: 204508520 Change-Id: I09e5f8b6e69413370e1a0d39dbb7dc1ee0b6192d
2018-07-13Note that Mount errors do not require translationsMichael Pratt
PiperOrigin-RevId: 204490639 Change-Id: I0fe26306bae9320c6aa4f854fe0ef25eebd93233
2018-07-12Fix aio eventfd lookupMichael Pratt
We're failing to set eventFile in the outer scope. PiperOrigin-RevId: 204392995 Change-Id: Ib9b04f839599ef552d7b5951d08223e2b1d5f6ad
2018-07-12sentry: wait for restore clock instead of panicing in Timekeeper.Zhaozhong Ni
PiperOrigin-RevId: 204372296 Change-Id: If1ed9843b93039806e0c65521f30177dc8036979
2018-07-12sentry: save inet stacks in proc files.Zhaozhong Ni
PiperOrigin-RevId: 204362791 Change-Id: If85ea7442741e299f0d7cddbc3d6b415e285da81
2018-07-12Format documentationMichael Pratt
PiperOrigin-RevId: 204323728 Change-Id: I1ff9aa062ffa12583b2e38ec94c87db7a3711971
2018-07-11Move ptrace constants to abi/linux.Jamie Liu
PiperOrigin-RevId: 204188763 Change-Id: I5596ab7abb3ec9e210a7f57b3fc420e836fa43f3
2018-07-11Add MemoryManager.Pin.Jamie Liu
PiperOrigin-RevId: 204162313 Change-Id: Ib0593dde88ac33e222c12d0dca6733ef1f1035dc
2018-07-10Exit tmpfs.fileInodeOperations.Translate early if required.Start >= EOF.Jamie Liu
Otherwise required and optional can be empty or have negative length. PiperOrigin-RevId: 204007079 Change-Id: I59e472a87a8caac11ffb9a914b8d79bf0cd70995
2018-07-10netstack: tcp socket connected state S/R support.Zhaozhong Ni
PiperOrigin-RevId: 203958972 Change-Id: Ia6fe16547539296d48e2c6731edacdd96bd6e93c
2018-07-09Inherit parent in clone(CLONE_THREAD) under TaskSet.mu.Jamie Liu
PiperOrigin-RevId: 203849534 Change-Id: I4d81513bfd32e0b7fc40c8a4c194eba7abc35a83
2018-07-09Trim all whitespace between interpreter and argMichael Pratt
Multiple whitespace characters are allowed. This fixes Ubuntu's /usr/sbin/invoke-rc.d, which has trailing whitespace after the interpreter which we were treating as an arg. PiperOrigin-RevId: 203802278 Change-Id: I0a6cdb0af4b139cf8abb22fa70351fe3697a5c6b
2018-07-03Fix data race on inotify.Watch.mask.Rahat Mahmood
PiperOrigin-RevId: 203180463 Change-Id: Ief50988c1c028f81ec07a26e704d893e86985bf0
2018-07-03Fix runsc VDSO mappingMichael Pratt
80bdf8a4068de3ac4a73b6b61a0cdcfe3e3571af accidentally moved vdso into an inner scope, never assigning the vdso variable passed to the Kernel and thus skipping VDSO mappings. Fix this and remove the ability for loadVDSO to skip VDSO mappings, since tests that do so are gone. PiperOrigin-RevId: 203169135 Change-Id: Ifd8cadcbaf82f959223c501edcc4d83d05327eba
2018-07-03Handle NUL-only paths in execMichael Pratt
The path in execve(2), interpreter script, and ELF interpreter may all be no more than a NUL-byte. Handle each of those cases. PiperOrigin-RevId: 203155745 Change-Id: I1c8b1b387924b23b2cf942341dfc76c9003da959
2018-07-02Hold d.parent.mu when reading d.nameMichael Pratt
PiperOrigin-RevId: 203041657 Change-Id: I120783d91712818e600505454c9276f8d9877f37
2018-06-29Sets the restore environment for restoring a container.Justine Olshan
Updated how restoring occurs through boot.go with a separate Restore function. This prevents a new process and new mounts from being created. Added tests to ensure the container is restored. Registered checkpoint and restore commands so they can be used. Docker support for these commands is still limited. Working on #80. PiperOrigin-RevId: 202710950 Change-Id: I2b893ceaef6b9442b1ce3743bd112383cb92af0c
2018-06-29aio: Return EINVAL if the number of events is negative.Nicolas Lacasse
PiperOrigin-RevId: 202671065 Change-Id: I248b74544d47ddde9cd59d89aa6ccb7dad2b6f89
2018-06-28Hold t.mu while calling t.FSContext().Nicolas Lacasse
PiperOrigin-RevId: 202562686 Change-Id: I0f5be7cc9098e86fa31d016251c127cb91084b05
2018-06-28Check for invalid offset when submitting an AIO read/write request.Nicolas Lacasse
PiperOrigin-RevId: 202528335 Change-Id: Ic32312cf4337bcb40a7155cb2174e5cd89a280f7
2018-06-27Fix semaphore data racesFabricio Voznika
PiperOrigin-RevId: 202371908 Change-Id: I72603b1d321878cae6404987c49e64732b676331
2018-06-27Call mm.CheckIORange() when copying in IOVecs.Nicolas Lacasse
CheckIORange is analagous to Linux's access_ok() method, which is checked when copying in IOVecs in both lib/iov_iter.c:import_single_range() and lib/iov_iter.c:import_iovec() => fs/read_write.c:rw_copy_check_uvector(). gVisor copies in IOVecs via Task.SingleIOSequence() and Task.CopyInIovecs(). We were checking the address range bounds, but not whether the address is valid. To conform with linux, we should also check that the address is valid. For usual preadv/pwritev syscalls, the effect of this change is not noticeable, since we find out that the address is invalid before the syscall completes. For vectorized async-IO operations, however, this change is necessary because Linux returns EFAULT when the operation is submitted, but before it executes. Thus, we must validate the iovecs when copying them in. PiperOrigin-RevId: 202370092 Change-Id: I8759a63ccf7e6b90d90d30f78ab8935a0fcf4936
2018-06-27Ignore MADV_DONTDUMP and MADV_DODUMP.Jamie Liu
PiperOrigin-RevId: 202361912 Change-Id: I1d0ee529073954d467b870872f494cebbf8ea61a
2018-06-26Add KVM, overlay and host network to image testsFabricio Voznika
PiperOrigin-RevId: 202236006 Change-Id: I4ea964a70fc49e8b51c9da27d77301c4eadaae71
2018-06-26Change SIGCHLD to SIGKILL in ptrace stubs.Adin Scannell
If the child stubs are killed by any unmaskable signal (e.g. SIGKILL), then the parent process will similarly be killed, resulting in the death of all other stubs. The effect of this is that if the OOM killer selects and kills a stub, the effect is the same as though the OOM killer selected and killed the sentry. PiperOrigin-RevId: 202219984 Change-Id: I0b638ce7e59e0a0f4d5cde12a7d05242673049d7
2018-06-26Use the correct Context for /proc/[pid]/maps.Jamie Liu
PiperOrigin-RevId: 202180487 Change-Id: I95cce41a4842ab731a4821b387b32008bfbdcb08
2018-06-26Add Context to seqfile.SeqSource.ReadSeqFileData.Jamie Liu
PiperOrigin-RevId: 202163895 Change-Id: Ib9942fcff80c0834216f4f10780662bef5b52270
2018-06-26Automated rollback of changelist 201596247Brian Geffon
PiperOrigin-RevId: 202151720 Change-Id: I0491172c436bbb32b977f557953ba0bc41cfe299
2018-06-25Fix panic messageMichael Pratt
The arguments are backwards from the message. PiperOrigin-RevId: 202054887 Change-Id: Id5750a84ca091f8b8fbe15be8c648d4fa3e31eb2
2018-06-25Check for empty applicationAddrRange in MM.DecUsers.Jamie Liu
PiperOrigin-RevId: 202043776 Change-Id: I4373abbcf735dc1cf4bebbbbb0c7124df36e9e78
2018-06-25Don't read cwd or root without holding muMichael Pratt
PiperOrigin-RevId: 202043090 Change-Id: I3c47fb3413ca8615d50d8a0503d72fcce9b09421
2018-06-25MountSource.Root() should return a refernce on the dirent.Nicolas Lacasse
PiperOrigin-RevId: 202038397 Change-Id: I074d525f2e2d9bcd43b247b62f86f9129c101b78
2018-06-25Don't read FSContext.root without holding FSContext.muMichael Pratt
IsChrooted still has the opportunity to race with another thread entering the FSContext into a chroot, but that is unchanged (and fine, AFAIK). PiperOrigin-RevId: 202029117 Change-Id: I38bce763b3a7715fa6ae98aa200a19d51a0235f1
2018-06-22Add rpcinet support for SIOCGIFCONF.Brian Geffon
The interfaces and their addresses are already available via the stack Intefaces and InterfaceAddrs. Also add some tests as we had no tests around SIOCGIFCONF. I also added the socket_netgofer lifecycle for IOCTL tests. PiperOrigin-RevId: 201744863 Change-Id: Ie0a285a2a2f859fa0cafada13201d5941b95499a
2018-06-22Simplify some handle logic.Nicolas Lacasse
PiperOrigin-RevId: 201738936 Change-Id: Ib75136415e28e8df0c742acd6b9512d4809fe3a8
2018-06-22Handle mremap(old_size=0).Jamie Liu
PiperOrigin-RevId: 201729703 Change-Id: I486900b0c6ec59533b88da225a5829c474e35a70
2018-06-22Netstack should return EOF on closed read.Brian Geffon
The shutdown behavior where we return EAGAIN for sockets which are non-blocking is only correct for packet based sockets. SOCK_STREAM sockets should return EOF. PiperOrigin-RevId: 201703055 Change-Id: I20b25ceca7286c37766936475855959706fc5397
2018-06-21netstack: tcp socket connected state S/R support.Zhaozhong Ni
PiperOrigin-RevId: 201596247 Change-Id: Id22f47b2cdcbe14aa0d930f7807ba75f91a56724
2018-06-21Drop return from SendExternalSignalMichael Pratt
SendExternalSignal is no longer called before CreateProcess, so it can enforce this simplified precondition. StartForwarding, and after Kernel.Start. PiperOrigin-RevId: 201591170 Change-Id: Ib7022ef7895612d7d82a00942ab59fa433c4d6e9
2018-06-21Forward SIGUSR2 to the sandbox tooFabricio Voznika
SIGUSR2 was being masked out to be used as a way to dump sentry stacks. This could cause compatibility problems in cases anyone uses SIGUSR2 to communicate with the container init process. PiperOrigin-RevId: 201575374 Change-Id: I312246e828f38ad059139bb45b8addc2ed055d74
2018-06-21Implement ioctl(FIOASYNC)Ian Gudger
FIOASYNC and friends are used to send signals when a file is ready for IO. This may or may not be needed by Nginx. While Nginx does use it, it is unclear if the code that uses it has any effect. PiperOrigin-RevId: 201550828 Change-Id: I7ba05a7db4eb2dfffde11e9bd9a35b65b98d7f50
2018-06-20Remove some defers in hot paths in the filesystem code.Nicolas Lacasse
PiperOrigin-RevId: 201401727 Change-Id: Ia5589882ba58a00efb522ab372e206b7e8e62aee
2018-06-20sentry: pending signals S/R optimization.Zhaozhong Ni
Almost all of the hundreds of pending signal queues are empty upon save. PiperOrigin-RevId: 201380318 Change-Id: I40747072435299de681d646e0862efac0637e172
2018-06-19Epsocket has incorrect recv(2) behavior after SHUT_RD.Brian Geffon
After shutdown(SHUT_RD) calls to recv /w MSG_DONTWAIT or with O_NONBLOCK should result in a EAGAIN and not 0. Blocking sockets should return 0 as they would have otherwise blocked indefinitely. PiperOrigin-RevId: 201271123 Change-Id: If589b69c17fa5b9ff05bcf9e44024da9588c8876
2018-06-19Make KVM more scalable by removing CPU cap.Adin Scannell
Instead, CPUs will be created dynamically. We also allow a relatively efficient mechanism for stealing and notifying when a vCPU becomes available via unlock. Since the number of vCPUs is no longer fixed at machine creation time, we make the dirtySet packing more efficient. This has the pleasant side effect of cutting out the unsafe address space code. PiperOrigin-RevId: 201266691 Change-Id: I275c73525a4f38e3714b9ac0fd88731c26adfe66