author Maria Matejka <mq@ucw.cz> 2021-06-04 18:14:10 +0200
committer Maria Matejka <mq@ucw.cz> 2021-12-08 12:39:36 +0100
commit b6612ec792a1539a22163429080967c9c416cae6
tree c57eff1e9eec73f4562ed27df8de2ba3d6954a95 /doc/threads
parent f459deee9f494ae8f741cc3b04e0e0db94a77c53
Thread documentation: chapter 3, coroutines and locking
Diffstat (limited to 'doc/threads')
-rw-r--r-- doc/threads/03_coroutines.md | 122
1 files changed, 122 insertions, 0 deletions
diff --git a/doc/threads/03_coroutines.md b/doc/threads/03_coroutines.md
new file mode 100644
index 00000000..33b337f7
--- /dev/null
+++ b/doc/threads/03_coroutines.md
@@ -0,0 +1,122 @@
+# BIRD Journey to Threads. Chapter 3: Coroutines and Locking
+
+Parallel execution in BIRD uses an underlying mechanism of coroutines and
+locks. This chapter covers these two internal modules and their principles.
+Understanding them is necessary if you want to write any non-trivial extension
+for future versions of BIRD.
+
+## BIRD's internal logical structure
+
+The authors of the original BIRD concepts wisely chose a highly modular structure.
+We can therefore reuse this structure with minimal rework. It is roughly:
+
+1. Reconfiguration routines
+2. Protocols
+3. BFD
+4. Private routing tables
+5. Standalone routing tables
+6. Route attribute cache
+
+This order is important for locking. Most actions in BIRD are called
+top-to-bottom in this list, e.g. a reconfiguration triggers a protocol action,
+which triggers a BFD update, then a routing table update, which in turn calls the
+route attribute cache. The major locking decision for BIRD is to enforce this order.
+
+We're not sure yet where the interface list and `protocol device` should
+be. For now, they sit somewhere between 1 and 2, as the interface updates are
+synchronous. They may move to 2+5 in the future, after asynchronous interface
+updates are implemented.
+
+## Locking
+
+BIRD is split into so-called *domains*. A domain consists of data structures
+that are logically bound together. Each domain should have its own lock guarding
+access to it. The domains are divided into the categories listed above.
+
+Currently, many domains don't have their own lock. The latest changes in branch
+`alderney` assigned locks to routing tables and the route attribute cache (4, 5 and 6).
+BFD has had its own lock since it was added to BIRD, as it needs much lower
+latency than BIRD typically allows. The rest of BIRD (reconfiguration,
+protocols and CLI) shares one common lock, called `the_bird_lock`. This is going to change later.
+
+Locking and unlocking are heavily checked. BIRD always stores each thread's
+locking stack in one place for debugging and consistency checking. The
+locking stack can hold at most as many entries as there are categories. All domains must be
+locked top-to-bottom in this order and unlocked bottom-to-top. No thread is
+allowed to lock more than one domain in each category.
+
+This creates potential problems in communication between tables (recursive
+nexthop updates had to be thoroughly checked), and it also requires the last big
+change, the asynchronous export. If any data needs to be handed upwards, from a
+lower domain to a higher one, it must be passed asynchronously so that the lower
+domain is unlocked before the higher one is accessed. Data flow from top to bottom,
+on the other hand, is straightforward: just lock and call the appropriate function.
+
+## Coroutines
+
+There are three principal types of coroutines: one-shot tasks, workers
+and IO handlers. They all share one coroutine data type, but their
+synchronization mechanisms differ.
+
+### One-shot tasks
+
+The simplest coroutine type is a one-shot task: some part of BIRD requests a
+one-time piece of CPU-intensive work. This is used in the reconfiguration rework.
+When a reconfiguration is requested, BIRD starts a reconfig coroutine which first parses
+the file (which can take tens of seconds if you have a gigabyte of config files).
+Then this coroutine locks everything and applies the parsed configuration.
+
+One-shot tasks simply start when they are requested and stop when they are
+done. To cancel one prematurely, it is typically enough to set an atomic
+variable that the task checks.
+
+### Workers
+
+In many cases, a module has to wait for data to be supplied. This is used in
+the channel feed-export coroutine. When feed-export is requested, BIRD starts a
+coroutine which waits on a semaphore for exports, processes them and
+then goes back to waiting on the semaphore for more work.
+
+To cancel such a coroutine, the cancellation variable must be set first and the
+coroutine then woken up via its semaphore. The coroutine then cleans up, calls
+whatever is required next after its cleanup, and finally exits.
+
+### IO handlers
+
+BIRD needs IO. It is possible to handle almost all IO events in parallel, and
+these coroutines will take care of that. There is currently only one such
+thread: the low-latency BFD thread handling its own socket.
+
+IO coroutines can also serve as timer coroutines, as the `poll` call
+takes a timeout. In the future, there should be independent IO coroutines for
+each protocol instance to relieve IO bottlenecks. It should be noted that e.g.
+in BGP, the protocol synchronously advertises and withdraws routes directly
+from the receive handler.
+
+These coroutines sometimes have to be updated (a protocol shuts down, a timer is
+modified), so every IO coroutine needs its own *fifo* which it polls for
+reading. On any update, one byte is written to this fifo, effectively waking up the
+poll. The fifo is always checked first; if it signals a change, the poll
+is reloaded before looking at anything else.
+
+### The Main Loop
+
+Currently, BIRD executes everything (with the exception of those parts already
+moved to their own threads) in one single loop. It handles all the sockets, with a
+magic round-robin selection of which socket to read from next. This
+loop also runs all the timers and other "asynchronous" events, which exist to avoid the
+risk that some code would tamper badly with its caller's data structures.
+
+This loop should gradually lose its work as more and more routines get
+moved to their own domains and coroutines. In the end, possibly the last tasks
+left for the main loop will be signal handling and maybe basic CLI handling.
+
+Dismantling the main loop is a long-term goal. Before that, we have to make lots of
+changes, allowing more and more code to run independently. Since [route
+exports are now asynchronous](TODO), there is no longer any obstacle to adopting the
+locking order as shown here.
+
+*This chapter is the last one, at least for a while. There will be more posts on BIRD
+internals in the future; you may expect e.g. a protocol API description and maybe
+also a tutorial on how to create your own protocol. Thank you all for your support.
+It helps us make your BIRD run smooth and fast.*