-rw-r--r-- pkg/refsvfs2/README.md | 66
1 file changed, 66 insertions, 0 deletions
diff --git a/pkg/refsvfs2/README.md b/pkg/refsvfs2/README.md
new file mode 100644
index 000000000..eca53c282
--- /dev/null
+++ b/pkg/refsvfs2/README.md
@@ -0,0 +1,66 @@
+# Reference Counting
+
+Go does not offer a reliable way to couple custom resource management with
+object lifetime. As a result, we need to manually implement reference counting
+for many objects in gVisor to make sure that resources are acquired and released
+appropriately. For example, the filesystem has many reference-counted objects
+(file descriptions, dentries, inodes, etc.), and it is important that each
+object persists while anything holds a reference on it and is destroyed once all
+references are dropped.
+
+We provide a template in `refs_template.go` that can be applied to most objects
+in need of reference counting. It contains a simple `Refs` struct that can be
+incremented and decremented, and once the reference count reaches zero, a
+destructor can be called. Note that there are some objects (e.g. `gofer.dentry`,
+`overlay.dentry`) that should not immediately be destroyed upon reaching zero
+references; in these cases, this template cannot be applied.
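As a rough illustration of the counting scheme described above, here is a minimal sketch. It is not the contents of `refs_template.go`: the method names `InitRefs` and `ReadRefs`, the `destroy` callback parameter, and the `file` type are assumptions made for the example, and misuse checks are left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Refs is a simplified, hypothetical stand-in for the template's counter.
type Refs struct {
	refCount int64
}

// InitRefs gives the creator the first reference.
func (r *Refs) InitRefs() { atomic.StoreInt64(&r.refCount, 1) }

// ReadRefs returns the current count (used here only for illustration).
func (r *Refs) ReadRefs() int64 { return atomic.LoadInt64(&r.refCount) }

// IncRef takes an additional reference.
func (r *Refs) IncRef() { atomic.AddInt64(&r.refCount, 1) }

// DecRef drops a reference and calls destroy once the count reaches zero.
func (r *Refs) DecRef(destroy func()) {
	if atomic.AddInt64(&r.refCount, -1) == 0 && destroy != nil {
		destroy()
	}
}

// file is a hypothetical reference-counted object that embeds Refs.
type file struct {
	Refs
	closed bool
}

func main() {
	f := &file{}
	f.InitRefs()
	f.IncRef()                           // a second holder appears: count is 2
	f.DecRef(func() { f.closed = true }) // count drops to 1, object stays alive
	fmt.Println(f.ReadRefs(), f.closed)  // 1 false
	f.DecRef(func() { f.closed = true }) // count drops to 0, destructor runs
	fmt.Println(f.ReadRefs(), f.closed)  // 0 true
}
```

Embedding the counter in the object keeps the accounting next to the resource it protects, which is the general shape a template-based approach produces.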
+
+# Reference Checking
+
+Unfortunately, manually keeping track of reference counts is extremely
+error-prone, and improper accounting can lead to production bugs whose root
+cause is very difficult to find.
+
+We have several ways of discovering reference count errors in gVisor. Any
+attempt to increment/decrement a `Refs` struct with a count of zero will trigger
+a sentry panic, since the object should have been destroyed and become
+unreachable. This allows us to identify missing increments or extra decrements,
+which cause the reference count to be lower than it should be: the count will
+reach zero earlier than expected, and the next increment/decrement (which
+should be valid) will result in a panic.
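The scenario above can be traced with a small sketch. The `counter` type and the `panics` helper are hypothetical and deliberately non-atomic. They only show how an extra decrement surfaces later, at an operation that is itself legitimate.

```go
package main

import "fmt"

// counter is a hypothetical, single-threaded reference count that applies
// the zero-count check described above.
type counter struct{ n int64 }

// IncRef panics if the object has no references left.
func (c *counter) IncRef() {
	if c.n <= 0 {
		panic("IncRef on object with zero references")
	}
	c.n++
}

// DecRef panics if the object has no references left.
func (c *counter) DecRef() {
	if c.n <= 0 {
		panic("DecRef on object with zero references")
	}
	c.n--
}

// panics reports whether fn panicked.
func panics(fn func()) (p bool) {
	defer func() { p = recover() != nil }()
	fn()
	return false
}

func main() {
	c := &counter{n: 1} // the creator's reference
	c.IncRef()          // second holder: count is 2
	c.DecRef()          // second holder is done: count is 1
	c.DecRef()          // BUG: extra decrement, count reaches 0 too early
	// The creator's own DecRef is legitimate, yet it is the one that panics.
	fmt.Println(panics(c.DecRef)) // true
}
```

The panic fires at the creator's call site rather than at the buggy one, which is why the check finds the error without pinpointing where it was introduced.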
+
+It is harder to identify extra increments and missing decrements, which cause
+the reference count to be higher than expected (i.e. a “reference leak”).
+Reference leaks prevent resources from being released properly and can cause
+problems that are tricky to diagnose, such as memory leaks. The following
+section discusses how we implement leak checking.
+
+## Leak Checking
+
+When leak checking is enabled, reference-counted objects are added to a global
+map when constructed and removed when destroyed. Near the very end of sandbox
+execution, once no reference-counted objects should still be reachable, we
+report everything left in the map as having leaked. Leak-checking objects
+implement the `CheckedObject` interface, which allows us to print informative
+warnings for each of the leaked objects.
+
+Leak checking is provided by `refs_template`, but objects that do not use the
+template will also need to implement `CheckedObject` and be manually
+registered/unregistered from the map in order to be checked.
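A minimal sketch of the map-based check might look like the following. The single-method form of `CheckedObject` and the function names `Register`, `Unregister`, and `DoLeakCheck` are assumptions for illustration. The README does not spell out the real interface or its entry points.

```go
package main

import (
	"fmt"
	"sync"
)

// CheckedObject is a reduced, hypothetical version of the interface: it only
// asks an object to describe itself if it is found to have leaked.
type CheckedObject interface {
	LeakMessage() string
}

var (
	liveMu sync.Mutex
	live   = map[CheckedObject]struct{}{} // every object not yet destroyed
)

// Register records obj as live; call it when the object is constructed.
func Register(obj CheckedObject) {
	liveMu.Lock()
	defer liveMu.Unlock()
	live[obj] = struct{}{}
}

// Unregister removes obj; call it when the object is destroyed.
func Unregister(obj CheckedObject) {
	liveMu.Lock()
	defer liveMu.Unlock()
	delete(live, obj)
}

// DoLeakCheck returns one message per object still registered.
func DoLeakCheck() []string {
	liveMu.Lock()
	defer liveMu.Unlock()
	var msgs []string
	for obj := range live {
		msgs = append(msgs, obj.LeakMessage())
	}
	return msgs
}

// dentry is a hypothetical checked object.
type dentry struct{ name string }

func (d *dentry) LeakMessage() string {
	return fmt.Sprintf("leaked dentry %q (%p)", d.name, d)
}

func main() {
	a, b := &dentry{name: "a"}, &dentry{name: "b"}
	Register(a)
	Register(b)
	Unregister(a) // a was destroyed properly; b never was
	for _, m := range DoLeakCheck() {
		fmt.Println(m) // reports only b
	}
}
```

The map holds a reference to every live object and every update takes a lock, which illustrates why a check like this costs both memory and time.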
+
+Note that leak checking affects performance and memory usage, so it should only
+be enabled in testing environments.
+
+## Debugging
+
+Even with the checks described above, it can be difficult to track down the
+exact source of a reference counting error. The error may occur long before it
+is discovered (for instance, a missing `IncRef` may not be discovered until a
+future `DecRef` makes the count negative). To aid in debugging, `refs_template`
+provides the `enableLogging` option to log every `IncRef`, `DecRef`, and leak
+check registration/unregistration, along with the object address and a call
+stack. This allows us to search a log for all of the changes to a particular
+object's reference count, which makes it much easier to identify the absent or
+extraneous operation(s). The reference-counted objects that do not use
+`refs_template` also provide logging, and others defined in the future should do
+so as well.
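To make the logging idea concrete, here is a hedged sketch of per-operation logging with the object address and a call stack. Only the name `enableLogging` comes from the README. The `loggedRefs` type, the `logf` sink, and the log line format are assumptions, and the real template may produce different output.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// enableLogging mirrors the option named in the README; how the real
// template wires it up is not shown here.
const enableLogging = true

// logf is a hypothetical log sink, replaceable so output can be captured.
var logf = func(msg string) { fmt.Print(msg) }

// loggedRefs is a hypothetical, non-atomic counter that logs every change.
type loggedRefs struct{ count int64 }

// logChange emits the object address, the operation, the new count, and the
// stack of the goroutine that performed the operation.
func (r *loggedRefs) logChange(op string) {
	if !enableLogging {
		return
	}
	logf(fmt.Sprintf("[refs %p] %s, count now %d\n%s", r, op, r.count, debug.Stack()))
}

func (r *loggedRefs) IncRef() { r.count++; r.logChange("IncRef") }
func (r *loggedRefs) DecRef() { r.count--; r.logChange("DecRef") }

func main() {
	r := &loggedRefs{count: 1}
	r.IncRef() // logged with the address of r and this call stack
	r.DecRef() // logged the same way
}
```

Because every entry starts with the object's address, searching the log for that address yields the full history of one object's count, and the attached stacks show which caller performed each change.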