July 22, 2022
Demystifying complex corners: Tari's LMDB Wrapper
Jumping into a new codebase can be daunting. Everything is new again, and anything that seems familiar feels just a little off. Getting your bearings can take time, but you’re a great developer and you always make it work.
Today I want to introduce you to a complex area of code within Tari, so that if you come across it while contributing you’ll have a better idea of what’s going on and how to manage it. I’ll give you a brief history of what this code does and why it works the way it does.
Our LMDB Wrapper
What is LMDB?
LMDB is a compact, memory-efficient database. All data is exposed via a memory map which allows for blazingly fast reads.
What is Tari using LMDB for?
The Tari base node uses LMDB as a backend store for chain-related data. Simply put, we store the blockchain in it. We were using LMDB in production, but we had been using an in-memory database for tests.
A main goal of the original design of the BlockchainBackend trait was to abstract away any database specifics so we could implement common data stores or in-memory data stores without a problem. Go figure: modelling transactions or read and write locks in memory doesn’t always map easily to transactions or locks in other databases. We had to make bug fixes twice over, and tests could pass while the production interface still failed. This resulted in a lot of bugs and time spent managing the in-memory adapters instead of on the fun problems.
Eventually the in-memory data store was also swapped for LMDB, but the remnants of the common interface remain.
How did Tari implement LMDB?
Let’s take a few steps back. We have many modules in the Tari repository, and code reuse across them is common. I mean, why wouldn’t you when you have the flexibility? As a result, we write abstractions to wrap functionality, as each of the dependent modules may want to use that functionality differently. Let’s dive into the chain_storage module. The module is designed around managing blockchain state. It is written so that the way chain state is stored can be configured to suit the needs of the module using it. You may wish to store the UTXO set in memory and the kernels in LMDB, while the merkle trees are stored in flat files, for example. To make this work, our chain_storage module has a BlockchainDatabase struct.
pub struct BlockchainDatabase<B> {
    db: Arc<RwLock<B>>,
    validators: Validators<B>,
    config: BlockchainDatabaseConfig,
    consensus_manager: ConsensusManager,
    difficulty_calculator: Arc<DifficultyCalculator>,
    disable_add_block_flag: Arc<AtomicBool>,
}
This struct is used to compose the API for storing and retrieving blockchain data. I won’t get into too much detail about its attributes but the quick rundown is:
- db: Thread-safe access to the backend of choice
- validators: Used to decide if the block going into the db is valid
- config: Configuration options (horizon for pruning etc.)
- consensus_manager: Consensus rules
- difficulty_calculator: Rules and expectations for mined blocks
- disable_add_block_flag: A flag to prevent propagated blocks from being added during block sync
The part we care about most though is <B> where B: BlockchainBackend.
BlockchainBackend is a trait that defines behaviour for database wrappers that store blockchain data. Using the trait allows us to make some guarantees, such as requiring Send and Sync (Rust’s automatically derived, methodless marker traits) to guarantee thread safety, as well as atomic transactions to ensure database integrity.
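To make that shape a little more concrete, here is a minimal sketch of the same pattern. It is not Tari's code: KeyValueBackend, MemoryBackend and Database are hypothetical stand-ins for the trait, a backend implementation and the generic wrapper, and the real trait has far more methods.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for a backend trait. `Send + Sync` as supertraits
// means every implementor must be safe to share across threads.
trait KeyValueBackend: Send + Sync {
    fn insert(&mut self, key: &str, value: u64);
    fn fetch(&self, key: &str) -> Option<u64>;
}

// Hypothetical in-memory backend, used only for this illustration.
#[derive(Default)]
struct MemoryBackend {
    data: HashMap<String, u64>,
}

impl KeyValueBackend for MemoryBackend {
    fn insert(&mut self, key: &str, value: u64) {
        self.data.insert(key.to_string(), value);
    }
    fn fetch(&self, key: &str) -> Option<u64> {
        self.data.get(key).copied()
    }
}

// Generic wrapper: it knows nothing about the backend beyond the trait.
struct Database<B> {
    db: Arc<RwLock<B>>,
}

impl<B: KeyValueBackend> Database<B> {
    fn new(backend: B) -> Self {
        Self { db: Arc::new(RwLock::new(backend)) }
    }
    fn insert(&self, key: &str, value: u64) {
        self.db.write().unwrap().insert(key, value);
    }
    fn fetch(&self, key: &str) -> Option<u64> {
        self.db.read().unwrap().fetch(key)
    }
    // Cloning the Arc gives another handle to the same backend.
    fn handle(&self) -> Self {
        Self { db: Arc::clone(&self.db) }
    }
}

fn main() {
    let db = Database::new(MemoryBackend::default());
    let writer = db.handle();
    // The handle can move to another thread because B is Send + Sync.
    thread::spawn(move || writer.insert("height", 42)).join().unwrap();
    assert_eq!(db.fetch("height"), Some(42));
}
```

Swapping MemoryBackend for another implementor would not change Database at all, which is the whole point of putting the trait bound on B.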
Here is where LMDB enters the picture again. We’ve written an LMDBDatabase struct which implements the BlockchainBackend trait. This gives us a wrapper that promises to meet all the requirements of a blockchain backend while hosting that data in LMDB. You may have already taken a peek and noticed we have an lmdb_db subfolder in our chain_storage module folder. Within the subfolder we have two similarly named files:
- base_layer/core/src/chain_storage/lmdb_db/lmdb.rs
- base_layer/core/src/chain_storage/lmdb_db/lmdb_db.rs
So what gives? What’s the separation of concerns here and who’s doing what job exactly? Great question! I’m glad you asked.
Let’s start with lmdb_db.rs.
This is where our implementation of BlockchainBackend lives. We’ve implemented all the necessary methods here, and it provides the API that is commonly used throughout different Tari clients. But it does a lot more. LMDB is a key-value store, and as such it doesn’t represent just a single database but the possibility of many databases, each an independent key-value store. It is also capable of cross-database reads and writes in a single atomic transaction. If you wanted to map the concept to something more familiar, you might think about each database as a table. Back to lmdb_db.rs, where we define all the different databases we want to store into. It creates the databases if they don’t exist and keeps a reference to each so that they are readily available. It also attempts to handle some additional transaction locking for us, but we’ll get into that a little later.
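If the "many databases, one transaction" idea feels abstract, the toy model below may help. It is plain Rust with no LMDB involved, and Env, WriteTxn and the database names are all made up for the illustration. It only shows the behaviour we rely on: several named key-value stores opened up front, and a write that either lands in all of them or in none.

```rust
use std::collections::HashMap;

// A toy model only: this is NOT the LMDB API. `Env`, `WriteTxn` and the
// database names used below are hypothetical.
type Table = HashMap<String, String>;

struct Env {
    // Each named database is an independent key-value store.
    databases: HashMap<&'static str, Table>,
}

// Writes are staged here and only reach `Env` on `commit`.
struct WriteTxn<'env> {
    env: &'env mut Env,
    staged: Vec<(&'static str, String, String)>,
}

impl Env {
    // Create every database up front and keep them readily available.
    fn open(names: &[&'static str]) -> Self {
        let databases = names.iter().map(|n| (*n, Table::new())).collect();
        Env { databases }
    }
    fn write_txn(&mut self) -> WriteTxn<'_> {
        WriteTxn { env: self, staged: Vec::new() }
    }
    fn get(&self, db: &str, key: &str) -> Option<&String> {
        self.databases.get(db)?.get(key)
    }
}

impl WriteTxn<'_> {
    fn put(&mut self, db: &'static str, key: &str, value: &str) {
        self.staged.push((db, key.to_string(), value.to_string()));
    }
    // Apply all staged writes, or none if any target database is unknown.
    fn commit(self) -> Result<(), String> {
        let WriteTxn { env, staged } = self;
        // Validate everything first so a failure leaves `env` untouched.
        for (db, _, _) in &staged {
            if !env.databases.contains_key(db) {
                return Err(format!("unknown database: {db}"));
            }
        }
        for (db, key, value) in staged {
            if let Some(table) = env.databases.get_mut(db) {
                table.insert(key, value);
            }
        }
        Ok(())
    }
}

fn main() {
    let mut env = Env::open(&["headers", "kernels"]);

    // One transaction touching two databases: both writes land together.
    let mut txn = env.write_txn();
    txn.put("headers", "1", "header-1");
    txn.put("kernels", "1", "kernel-1");
    txn.commit().unwrap();
    assert_eq!(env.get("kernels", "1").map(String::as_str), Some("kernel-1"));

    // A transaction that fails to commit leaves every database untouched.
    let mut txn = env.write_txn();
    txn.put("headers", "2", "header-2");
    txn.put("utxos", "2", "utxo-2"); // "utxos" was never opened
    assert!(txn.commit().is_err());
    assert_eq!(env.get("headers", "2"), None);
}
```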
What lmdb.rs offers
Consider lmdb.rs just a smidge closer to the actual database: something between our wrapper abstraction and the LMDB interface. It’s designed to perform very specific tasks on the databases. It provides utility functions used within the chain_storage module: de-duping inserts, counts, gets, sets, matching, and filtering results. It’s a module of functions we commonly need when we’re about to finalize a transaction. lmdb_db.rs may perform some additional transformation before passing the results we actually want to store down to the functions in lmdb.rs. Most of these functions are called from our LMDBDatabase wrapper, although that’s not guaranteed; sometimes we use them directly from other call points within the chain_storage module.
Caveats of the existing interface
Earlier I mentioned that the BlockchainDatabase has a read-write lock applied to the backend interface: db: Arc<RwLock<B>>.
This was needed on the Rust side after we experienced some unexpected behaviour from our abstraction, which turned out to be a result of the API we made public via the BlockchainBackend trait.
Take, for example, a situation where different threads are operating on the same database. Under the hood of our convenience methods we acquire read and write locks for each independent call, but that still leaves room for error:
db.add_person("Peter", 3)?;
thread::spawn(|| {
    db.add_person("Sam", 23)?;
    if db.person_exists("Peter")? {
        db.delete_person("Peter")?;
    }
});
thread::spawn(|| {
    db.add_person("Tom", 66)?;
    if db.person_exists("Peter")? {
        // ❌ There is a chance that the other thread deletes Peter between
        // checking for his existence and deleting him
        db.delete_person("Peter")?;
    }
});
This could be solved a handful of ways, but we’ll look specifically at two ways to achieve safety here. If we go back to the helper functions in lmdb.rs that perform the calls to LMDB, we notice they actually take a transaction in their signatures. This makes it easy to chain different functions together and ensure they run as a single atomic transaction. LMDB supports atomic transactions, and our interface here lets us use those features at the database level.
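Here is a rough sketch of what "helpers that take a transaction" buys us. These are not the real lmdb.rs signatures: Store, Txn and the helper names are invented, and the transaction is modelled as a private working copy that gets published on commit.

```rust
use std::collections::BTreeMap;

// Toy stand-ins, not the real lmdb.rs signatures: `Store`, `Txn` and the
// helper names are hypothetical.
#[derive(Default)]
struct Store {
    people: BTreeMap<String, u32>,
}

// A transaction works on a private copy; `commit` publishes it in one step.
// Dropping a `Txn` without committing discards its changes.
struct Txn<'a> {
    store: &'a mut Store,
    working: BTreeMap<String, u32>,
}

impl Store {
    fn begin(&mut self) -> Txn<'_> {
        let working = self.people.clone();
        Txn { store: self, working }
    }
}

impl Txn<'_> {
    fn commit(self) {
        let Txn { store, working } = self;
        store.people = working;
    }
}

// Each helper takes the transaction, so the caller decides what is grouped.
fn exists(txn: &Txn, name: &str) -> bool {
    txn.working.contains_key(name)
}

// De-duplicating insert: refuses to overwrite an existing key.
fn insert_unique(txn: &mut Txn, name: &str, age: u32) -> Result<(), String> {
    if exists(txn, name) {
        return Err(format!("duplicate key: {name}"));
    }
    txn.working.insert(name.to_string(), age);
    Ok(())
}

fn delete(txn: &mut Txn, name: &str) -> bool {
    txn.working.remove(name).is_some()
}

fn main() -> Result<(), String> {
    let mut store = Store::default();

    let mut txn = store.begin();
    insert_unique(&mut txn, "Peter", 3)?;
    insert_unique(&mut txn, "Sam", 23)?;
    // The check and the delete share one transaction, so nothing can
    // interleave between them.
    if exists(&txn, "Peter") {
        delete(&mut txn, "Peter");
    }
    txn.commit();

    assert_eq!(store.people.len(), 1);
    assert!(store.people.contains_key("Sam"));
    Ok(())
}
```

Because begin borrows the store mutably, this model only ever has one write transaction open at a time, which loosely mirrors LMDB's single-writer behaviour.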
The problem arises in aligning the outer abstraction for different backend types. The LMDBDatabase has convenience methods for us to operate on the database, but unlike the slightly lower-level lmdb.rs functions these methods do not accept a transaction as an argument. This means that if we call two different methods on the database wrapper, each method uses its own transaction, so nothing guarantees atomicity across the two calls. As a solution, we wrap the whole database wrapper in a RwLock, forcing us to acquire a lock on the entire database for the duration of our processing and ensuring nobody can perform a sneaky delete out from underneath us.
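As a small, self-contained illustration of the outer-lock approach (using a plain HashMap as a hypothetical stand-in for the backend, not Tari's types), holding one write guard across both the check and the delete closes the gap:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for a backend whose methods each run on their own.
type People = HashMap<String, u32>;

// Check-then-delete under a single write guard. The guard lives until the
// end of the function, so no other thread can touch `People` in between.
fn delete_if_exists(db: &RwLock<People>, name: &str) -> bool {
    let mut guard = db.write().unwrap();
    if guard.contains_key(name) {
        guard.remove(name);
        true
    } else {
        false
    }
}

fn main() {
    let db = Arc::new(RwLock::new(People::new()));
    db.write().unwrap().insert("Peter".to_string(), 3);

    // Two threads race to delete the same person.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let db = Arc::clone(&db);
            thread::spawn(move || delete_if_exists(&db, "Peter"))
        })
        .collect();

    let deletions = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|deleted| *deleted)
        .count();

    // Exactly one thread saw Peter and deleted him.
    assert_eq!(deletions, 1);
    assert!(db.read().unwrap().is_empty());
}
```

The cost is that everything else waits while the guard is held, which is the trade-off the rest of this section weighs.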
This means we have two locks in play when we perform any operation: a language-specific lock as well as a database-specific lock. If we wanted to tidy up our double-lock situation, we once again have two clear candidates for refactoring. We could introduce the transaction passing present in the underlying lmdb.rs interface to the outer trait. This would offer us the same level of atomicity as the lower-level lmdb.rs functions, but we end up moving the implementation details of a particular backend to the forefront of our generic. At that point it’s arguable we may not need the generic at all. If we revisit the definition of our BlockchainBackend trait, it says:
The backend must also execute transactions atomically; i.e., every operation within it must succeed, or they all fail
It lets us know that an operation we call on the backend will happen atomically, but it doesn’t promise that operations called together will happen atomically. If we as developers are using multiple methods of the LMDBDatabase together and require atomicity, this can be a red flag for us. Instead of calling multiple methods on the wrapper itself, we can create one specific method that performs this series of calls with the needed level of atomicity. Using our previous example of finding a person “Peter” and removing him if found, we can get a safer guarantee from our backend if we merge the two independent backend methods. Instead of calling person_exists and delete_person we can create a new method, delete_person_if_exists.
From this:
// The db method implementations (abridged)
fn person_exists(&mut self, name: &str) -> Result<bool, Error> {
    let txn = self.transaction()?;
    Ok(self.read_from_table(&txn, "People", name).is_some())
}
fn delete_person(&mut self, name: &str) -> Result<bool, Error> {
    let txn = self.transaction()?;
    self.delete_from_table(&txn, "People", name)
}

if db.person_exists("Peter")? {
    // ❌ There is a chance that Peter is deleted between checking for his existence and deleting him
    db.delete_person("Peter")?;
}
To this:
fn delete_person_if_exists(&mut self, name: &str) -> Result<(), Error> {
    // One transaction covers both the check and the delete
    let txn = self.transaction()?;
    if self.read_from_table(&txn, "People", name).is_some() {
        self.delete_from_table(&txn, "People", name)?;
    }
    Ok(())
}

db.delete_person_if_exists("Peter")?;
This gives us the level of atomicity we want while also keeping our generic interface free of backend-specific details. If we follow this pattern, identifying similar areas and pushing the complexity of the transactions down into the implementation, then we could safely remove our outer language-specific lock. This raises the question though: would we want to? This pattern gives us everything we want, but it also moves the danger of getting it wrong back onto the developers and reviewers. They need to know this pattern exists and that it must be rigidly adhered to, at the risk of causing race conditions or other bugs. Sticking to a common pattern in a codebase is a good idea, but sometimes you could use just a little more safety, so why not let the language make those promises for you?
Contribute to Tari!
Thanks for coming along on this journey with me. I hope it helped pull back the veil and gave you a bit of perspective on how Tari works under the hood, and on some reasons for using redundancy across language and database features, which may not always be obvious at first glance. If simplifying and improving ergonomics is your thing, why not contribute? Check out our open issues, and if you are just starting out look for our “good first issue” tag.