gvisor - Container Runtime Sandbox

Age	Commit message (Collapse)	Author
2021-01-20	Remove string allocation from strings.Repeat() in /proc/[pid]/maps.	Jamie Liu
	PiperOrigin-RevId: 352894106
2020-12-15	[syzkaller] Avoid AIOContext from resurrecting after being marked dead.	Ayush Ranjan
	syzkaller reported the closing of a nil channel. This is only possible when the AIOContext was destroyed twice. Some scenarios that could lead to this: - It died and then some called aioCtx.Prepare() on it and then killed it again which could cause the double destroy. The context could have been destroyed in between the call to LookupAIOContext() and Prepare(). - aioManager was destroyed but it did not update the contexts map. So Lookup could still return a dead AIOContext and then someone could call Prepare on it and kill it again. So added a check in aioCtx.Prepare() for the context being dead. This will prevent a dead context from resurrecting. Also refactored code to destroy the aioContext consistently. Earlier we were not munmapping the aioContexts that were destroyed upon aioManager destruction. Reported-by: syzbot+ef6a588d0ce6059991d2@syzkaller.appspotmail.com PiperOrigin-RevId: 347704347
2020-12-11	Remove existing nogo exceptions.	Adin Scannell
	PiperOrigin-RevId: 347047550
2020-11-09	Initialize references with a value of 1.	Dean Deng
	This lets us avoid treating a value of 0 as one reference. All references using the refsvfs2 template must call InitRefs() before the reference is incremented/decremented, or else a panic will occur. Therefore, it should be pretty easy to identify missing InitRef calls during testing. Updates #1486. PiperOrigin-RevId: 341411151
2020-10-23	Rewrite reference leak checker without finalizers.	Dean Deng
	Our current reference leak checker uses finalizers to verify whether an object has reached zero references before it is garbage collected. There are multiple problems with this mechanism, so a rewrite is in order. With finalizers, there is no way to guarantee that a finalizer will run before the program exits. When an unreachable object with a finalizer is garbage collected, its finalizer will be added to a queue and run asynchronously. The best we can do is run garbage collection upon sandbox exit to make sure that all finalizers are enqueued. Furthermore, if there is a chain of finalized objects, e.g. A points to B points to C, garbage collection needs to run multiple times before all of the finalizers are enqueued. The first GC run will register the finalizer for A but not free it. It takes another GC run to free A, at which point B's finalizer can be registered. As a result, we need to run GC as many times as the length of the longest such chain to have a somewhat reliable leak checker. Finally, a cyclical chain of structs pointing to one another will never be garbage collected if a finalizer is set. This is a well-known issue with Go finalizers (https://github.com/golang/go/issues/7358). Using leak checking on filesystem objects that produce cycles will not work and even result in memory leaks. The new leak checker stores reference counted objects in a global map when leak check is enabled and removes them once they are destroyed. At sandbox exit, any remaining objects in the map are considered as leaked. This provides a deterministic way of detecting leaks without relying on the complexities of finalizers and garbage collection. This approach has several benefits over the former, including: - Always detects leaks of objects that should be destroyed very close to sandbox exit. The old checker very rarely detected these leaks, because it relied on garbage collection to be run in a short window of time. - Panics if we forgot to enable leak check on a ref-counted object (we will try to remove it from the map when it is destroyed, but it will never have been added). - Can store extra logging information in the map values without adding to the size of the ref count struct itself. With the size of just an int64, the ref count object remains compact, meaning frequent operations like IncRef/DecRef are more cache-efficient. - Can aggregate leak results in a single report after the sandbox exits. Instead of having warnings littered in the log, which were non-deterministically triggered by garbage collection, we can print all warning messages at once. Note that this could also be a limitation--the sandbox must exit properly for leaks to be detected. Some basic benchmarking indicates that this change does not significantly affect performance when leak checking is enabled, which is understandable since registering/unregistering is only done once for each filesystem object. Updates #1486. PiperOrigin-RevId: 338685972
2020-10-08	Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.	Jamie Liu
	cf. 2a36ab717e8f "rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ" PiperOrigin-RevId: 336186795
2020-10-07	Add staticcheck and staticstyle analyzers.	Adin Scannell
	This change also adds support to go_stateify for detecting an appropriate receiver name, avoiding a large number of false positives. PiperOrigin-RevId: 335994587
2020-10-06	Implement membarrier(2) commands other than *_SYNC_CORE.	Jamie Liu
	Updates #267 PiperOrigin-RevId: 335713923
2020-09-18	Use a tmpfs file for shared anonymous and /dev/zero mmap on VFS2.	Jamie Liu
	This is more consistent with Linux (see comment on MM.NewSharedAnonMappable()). We don't do the same thing on VFS1 for reasons documented by the updated comment. PiperOrigin-RevId: 332514849
2020-08-25	Use new reference count utility throughout gvisor.	Dean Deng
	This uses the refs_vfs2 template in vfs2 as well as objects common to vfs1 and vfs2. Note that vfs1-only refcounts are not replaced, since vfs1 will be deleted soon anyway. The following structs now use the new tool, with leak check enabled: devpts:rootInode fuse:inode kernfs:Dentry kernfs:dir kernfs:readonlyDir kernfs:StaticDirectory proc:fdDirInode proc:fdInfoDirInode proc:subtasksInode proc:taskInode proc:tasksInode vfs:FileDescription vfs:MountNamespace vfs:Filesystem sys:dir kernel:FSContext kernel:ProcessGroup kernel:Session shm:Shm mm:aioMappable mm:SpecialMappable transport:queue And the following use the template, but because they currently are not leak checked, a TODO is left instead of enabling leak check in this patch: kernel:FDTable tun:tunEndpoint Updates #1486. PiperOrigin-RevId: 328460377
2020-08-20	Consistent precondition formatting	Michael Pratt
	Our "Preconditions:" blocks are very useful to determine the input invariants, but they are bit inconsistent throughout the codebase, which makes them harder to read (particularly cases with 5+ conditions in a single paragraph). I've reformatted all of the cases to fit in simple rules: 1. Cases with a single condition are placed on a single line. 2. Cases with multiple conditions are placed in a bulleted list. This format has been added to the style guide. I've also mentioned "Postconditions:", though those are much less frequently used, and all uses already match this style. PiperOrigin-RevId: 327687465
2020-08-03	Add callbacks to support lazy loading/restoring thread states	Andrei Vagin
	PiperOrigin-RevId: 324748508
2020-08-03	Plumbing context.Context to DecRef() and Release().	Nayana Bidari
	context is passed to DecRef() and Release() which is needed for SO_LINGER implementation. PiperOrigin-RevId: 324672584
2020-07-27	Move platform.File in memmap	Andrei Vagin
	The subsequent systrap changes will need to import memmap from the platform package. PiperOrigin-RevId: 323409486
2020-05-20	Implement gap tracking in the segment set.	Reapor-Yurnero
	This change was derived from a change by: Reapor-Yurnero <reapor.yurnero@gmail.com> And has been modified by: Adin Scannell <ascannell@google.com> (The original change author is preserved for the commit.) This change implements gap tracking in the segment set by adding additional information in each node, and using that information to speed up gap finding from a linear scan to a O(log(n)) walk of the tree. This gap tracking is optional, and will default to off except for segment instances that set gapTracking equal to 1 in their const lists. PiperOrigin-RevId: 312621607
2020-04-23	Enable automated marshalling for mempolicy syscalls.	Rahat Mahmood
	PiperOrigin-RevId: 308170679
2020-04-21	Misc VFS2 fixes	Fabricio Voznika
	- Fix defer operation ordering in kernfs.Filesystem.AccessAt() - Add AT_NULL entry in proc/pid/auvx - Fix line padding in /proc/pid/maps - Fix linux_dirent serialization for getdents(2) - Remove file creation flags from vfs.FileDescription.statusFlags() Updates #1193, #1035 PiperOrigin-RevId: 307704159
2020-04-17	Merge pull request #1978 from lubinszARM:pr_signal_mm	gVisor bot
	PiperOrigin-RevId: 307078788
2020-04-09	Remove TODOs from Async IO	Fabricio Voznika
	Block and drain requests in io_destroy(2). Note the reason to create read-only mapping. PiperOrigin-RevId: 305786312
2020-04-08	Don't call platform.AddressSpace.MapFile with no permissions.	Jamie Liu
	PiperOrigin-RevId: 305598136
2020-03-27	add arch-specific feature into mm	Bin Lu
	Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-02-25	Add option to skip stuck tasks waiting for address space	Fabricio Voznika
	PiperOrigin-RevId: 297192390
2020-02-18	atomicbitops package cleanups	gVisor bot
	- Redocument memory ordering from "no ordering" to "acquire-release". (No functional change: both LOCK WHATEVER on x86, and LDAXR/STLXR loops on ARM64, already have this property.) - Remove IncUnlessZeroInt32 and DecUnlessOneInt32, which were only faster than the equivalent loops using sync/atomic before the Go compiler inlined non-unsafe.Pointer atomics many releases ago. PiperOrigin-RevId: 295811743
2020-02-14	Plumb VFS2 inside the Sentry	gVisor bot
	- Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950
2020-01-31	Internal change.	gVisor bot
	PiperOrigin-RevId: 292587459
2020-01-27	Update package locations.	Adin Scannell
	Because the abi will depend on the core types for marshalling (usermem, context, safemem, safecopy), these need to be flattened from the sentry directory. These packages contain no sentry-specific details. PiperOrigin-RevId: 291811289
2020-01-27	Standardize on tools directory.	Adin Scannell
	PiperOrigin-RevId: 291745021
2020-01-21	Rename DowngradableRWMutex to RWmutex.	Ian Gudger
	Also renames TMutex to Mutex. These custom mutexes aren't any worse than the standard library versions (same code), so having both seems redundant. PiperOrigin-RevId: 290873587
2020-01-09	New sync package.	Ian Gudger
	* Rename syncutil to sync. * Add aliases to sync types. * Replace existing usage of standard library sync package. This will make it easier to swap out synchronization primitives. For example, this will allow us to use primitives from github.com/sasha-s/go-deadlock to check for lock ordering violations. Updates #1472 PiperOrigin-RevId: 289033387
2020-01-03	Remove FIXME comments to close old bug.	Zach Koopmans
	PiperOrigin-RevId: 288075400
2019-11-21	Import and structure cleanup.	Adin Scannell
	PiperOrigin-RevId: 281795269
2019-10-16	Reorder BUILD license and load functions in gvisor.	Kevin Krakauer
	PiperOrigin-RevId: 275139066
2019-09-12	Remove go_test from go_stateify and go_marshal	Michael Pratt
	They are no-ops, so the standard rule works fine. PiperOrigin-RevId: 268776264
2019-08-16	procfs: Migrate seqfile implementations.	Ayush Ranjan
	Migrates all (except 3) seqfile implementations to the vfs.DynamicBytesSource interface. There should not be any change in functionality due to this migration itself. Please note that the following seqfile implementations have not been migrated: - /proc/filesystems in proc/filesystems.go - /proc/[pid]/mountinfo in proc/mounts.go - /proc/[pid]/mounts in proc/mounts.go This is because these depend on pending changes in /pkg/senty/vfs. PiperOrigin-RevId: 263880719
2019-06-28	Add finalizer on AtomicRefCount to check for leaks.	Ian Gudger
	PiperOrigin-RevId: 255711454
2019-06-27	Fix various spelling issues in the documentation	Michael Pratt
	Addresses obvious typos, in the documentation only. COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/443 from Pixep:fix/documentation-spelling 4d0688164eafaf0b3010e5f4824b35d1e7176d65 PiperOrigin-RevId: 255477779
2019-06-20	Implement madvise(MADV_DONTFORK)	Neel Natu
	PiperOrigin-RevId: 254253777
2019-06-13	Update canonical repository.	Adin Scannell
	This can be merged after: https://github.com/google/gvisor-website/pull/77 or https://github.com/google/gvisor-website/pull/78 PiperOrigin-RevId: 253132620
2019-06-06	"Implement" mbind(2).	Jamie Liu
	We still only advertise a single NUMA node, and ignore mempolicy accordingly, but mbind() at least now succeeds and has effects reflected by get_mempolicy(). Also fix handling of nodemasks: round sizes to unsigned long (as documented and done by Linux), and zero trailing bits when copying them out. PiperOrigin-RevId: 251950859
2019-06-05	Implement dumpability tracking and checks	Michael Pratt
	We don't actually support core dumps, but some applications want to get/set dumpability, which still has an effect in procfs. Lack of support for set-uid binaries or fs creds simplifies things a bit. As-is, processes started via CreateProcess (i.e., init and sentryctl exec) have normal dumpability. I'm a bit torn on whether sentryctl exec tasks should be dumpable, but at least since they have no parent normal UID/GID checks should protect them. PiperOrigin-RevId: 251712714
2019-06-03	gvisor: validate a new map region in the mremap syscall	Andrei Vagin
	Right now, mremap allows to remap a memory region over MaxUserAddress, this means that we can change the stub region. PiperOrigin-RevId: 251266886
2019-05-30	Add VmData field to /proc/{pid}/status	chris.zn
	VmData is the size of private data segments. It has the same meaning as in Linux. Change-Id: Iebf1ae85940a810524a6cde9c2e767d4233ddb2a PiperOrigin-RevId: 250593739
2019-05-06	Ensure all uses of MM.brk occur under MM.mappingMu in MM.Brk().	Jamie Liu
	PiperOrigin-RevId: 246921386 Change-Id: I71d8908858f45a9a33a0483470d0240eaf0fd012
2019-04-29	Change copyright notice to "The gVisor Authors"	Michael Pratt
	Based on the guidelines at https://opensource.google.com/docs/releasing/authors/. 1. $ rg -l "Google LLC" \| xargs sed -i 's/Google LLC.*/The gVisor Authors./' 2. Manual fixup of "Google Inc" references. 3. Add AUTHORS file. Authors may request to be added to this file. 4. Point netstack AUTHORS to gVisor AUTHORS. Drop CONTRIBUTORS. Fixes #209 PiperOrigin-RevId: 245823212 Change-Id: I64530b24ad021a7d683137459cafc510f5ee1de9
2019-04-29	Allow and document bug ids in gVisor codebase.	Nicolas Lacasse
	PiperOrigin-RevId: 245818639 Change-Id: I03703ef0fb9b6675955637b9fe2776204c545789
2019-04-10	Internal change	Michael Pratt
	PiperOrigin-RevId: 242978508 Change-Id: I0ea59ac5ba1dd499e87c53f2e24709371048679b
2019-04-10	Allow threads with CAP_SYS_RESOURCE to raise hard rlimits.	Kevin Krakauer
	PiperOrigin-RevId: 242919489 Change-Id: Ie3267b3bcd8a54b54bc16a6556369a19e843376f
2019-04-01	Don't expand COW-break on executable VMAs.	Jamie Liu
	PiperOrigin-RevId: 241403847 Change-Id: I4631ca05734142da6e80cdfa1a1d63ed68aa05cc
2019-03-29	Drop reference on shared anon mappable	Michael Pratt
	We call NewSharedAnonMappable simply to use it for Mappable/MappingIdentity for shared anon mmap. From MMapOpts.MappingIdentity: "If MMapOpts is used to successfully create a memory mapping, a reference is taken on MappingIdentity." mm.createVMALocked (below) takes this additional reference, so we don't need the reference returned by NewSharedAnonMappable. Holding it leaks the mappable. PiperOrigin-RevId: 241038108 Change-Id: I78ee3af78e0cc7aac4063b274b30d0e41eb5677d
2019-03-25	Call memmap.Mappable.Translate with more conservative usermem.AccessType.	Jamie Liu
	MM.insertPMAsLocked() passes vma.maxPerms to memmap.Mappable.Translate (although it unsets AccessType.Write if the vma is private). This somewhat simplifies handling of pmas, since it means only COW-break needs to replace existing pmas. However, it also means that a MAP_SHARED mapping of a file opened O_RDWR dirties the file, regardless of the mapping's permissions and whether or not the mapping is ever actually written to with I/O that ignores permissions (e.g. ptrace(PTRACE_POKEDATA)). To fix this: - Change the pma-getting path to request only the permissions that are required for the calling access. - Change memmap.Mappable.Translate to take requested permissions, and return allowed permissions. This preserves the existing behavior in the common cases where the memmap.Mappable isn't fsutil.CachingInodeOperations and doesn't care if the translated platform.File pages are written to. - Change the MM.getPMAsLocked path to support permission upgrading of pmas outside of copy-on-write. PiperOrigin-RevId: 240196979 Change-Id: Ie0147c62c1fbc409467a6fa16269a413f3d7d571