I’ve been exploring B+Tree implementations recently and decided to build a small storage engine in .NET as a way to understand the lower-level behavior of on-disk indexing structures. I wanted to share some of the interesting technical challenges I ran into — in case others find the internals fun to think about.
Some of the more interesting aspects were:
• Page layout decisions
Choosing the right fixed-size page format ended up being more subtle than expected. Even small choices (header size, key encoding, how much space to reserve for split operations) had major downstream effects on fragmentation and insert performance.
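To make the arithmetic concrete, here's a rough sketch in language-agnostic Python (the header fields, sizes, and reserve value are hypothetical illustrations, not the package's actual format):

```python
import struct

PAGE_SIZE = 4096
# Hypothetical header: page type (1 byte), key count (2), free-space offset (2)
HEADER = struct.Struct("<BHH")
SPLIT_RESERVE = 256  # slack kept free so an insert during a split never overflows

def usable_space(num_keys: int, entry_size: int) -> int:
    """Bytes still available for new entries after the header,
    the existing fixed-size entries, and the split reserve."""
    used = HEADER.size + num_keys * entry_size
    return PAGE_SIZE - SPLIT_RESERVE - used

def needs_split(num_keys: int, entry_size: int) -> bool:
    """Trigger a split once one more entry would no longer fit."""
    return usable_space(num_keys, entry_size) < entry_size
```

Even toy numbers show the sensitivity: with 16-byte entries, bumping the reserve from 256 to 512 bytes drops per-page capacity from 239 to 223 entries, which ripples into tree height and fragmentation.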
• Handling node splits efficiently
B+Tree splits are straightforward in memory, but on disk the cost model is very different. Ensuring minimal writes and predictable locality forced me to rethink a few “textbook” algorithms.
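For reference, the in-memory half of a leaf split is tiny; the disk cost lives entirely in how many pages the result forces you to rewrite. A sketch of the textbook step (not the package's actual code):

```python
def split_leaf(keys, values):
    """Split a full leaf in half. In a B+Tree the separator key is
    *copied* up to the parent, not moved, because leaves hold all the
    values. On disk the target cost is three writes: the new right
    page, the truncated left page, and the parent gaining the separator."""
    mid = len(keys) // 2
    left = (keys[:mid], values[:mid])
    right = (keys[mid:], values[mid:])
    return left, right, keys[mid]  # separator = first key of the right sibling
```

The on-disk rethinking comes in around this function, not inside it: choosing where the new right page lands so sibling pages stay close together, and ordering the three writes so a reader never follows a pointer to a page that isn't there yet.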
• Concurrency vs. simplicity
I experimented with optimistic vs. coarse-grained locking. Even implementing a read-optimized path required careful handling of pointer updates during splits.
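The read-optimized path is in the spirit of a seqlock: writers bump a version counter around each mutation, and lock-free readers retry if the version moved under them. A minimal Python sketch of the general pattern (illustrative only, not the package's implementation):

```python
import threading

class VersionedPage:
    """Seqlock-style optimistic reads: the version is odd while a write
    is in progress and even when the page is stable. Readers take no
    lock at all; they simply retry if the version changed mid-read."""

    def __init__(self, data):
        self._version = 0
        self._data = list(data)
        self._write_lock = threading.Lock()  # serializes writers only

    def write(self, new_data):
        with self._write_lock:
            self._version += 1          # odd: mutation in progress
            self._data = list(new_data)
            self._version += 1          # even: stable again

    def read(self):
        while True:
            before = self._version
            if before % 2:              # a writer is mid-update; retry
                continue
            snapshot = list(self._data)
            if self._version == before:  # unchanged: snapshot is consistent
                return snapshot
```

The subtlety the bullet alludes to: a split rewrites sibling pointers, so a reader that walked through several pages has to validate the version of every page on its path, not just the last one.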
• Crash-safety without a full WAL
One interesting constraint was maintaining reasonable crash-safety guarantees without embedding a full write-ahead log. Page write ordering and atomic metadata updates turn into tricky puzzles.
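One standard building block for this (shown here as a generic sketch, not necessarily what the package does): write and fsync the data pages first, then publish the new state by atomically renaming a small metadata file over the old one. A crash leaves either the old or the new metadata, never a torn mix.

```python
import json
import os
import tempfile

def commit_metadata(path, meta):
    """Atomically replace the metadata file. Precondition: every data
    page the new metadata points at has already been written and
    fsynced, so recovery can never follow a pointer into garbage."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(meta, f)
            f.flush()
            os.fsync(f.fileno())   # durable before it becomes visible
        os.replace(tmp, path)      # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

# Demo: a reader always sees one complete metadata version.
with tempfile.TemporaryDirectory() as d:
    meta_path = os.path.join(d, "meta.json")
    commit_metadata(meta_path, {"root_page": 3})
    commit_metadata(meta_path, {"root_page": 7, "page_count": 42})
    with open(meta_path) as f:
        recovered = json.load(f)
```

A fully durable version would also fsync the containing directory so the rename itself survives a crash; that's exactly the kind of ordering puzzle mentioned above.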
• Benchmarking surprises
Some operations I expected to be expensive, like sequential inserts, performed far better than random inserts, even with caching in play. A few caching heuristics ended up mattering much more than raw structure layout.
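A crude way to see why insert order dominates: replay the same keys against a tiny LRU page cache and count misses. The page-capacity and cache-size numbers below are made up purely for illustration:

```python
import random
from collections import OrderedDict

def page_misses(keys, keys_per_page=100, cache_pages=8):
    """Replay inserts against a small LRU page cache and count misses,
    a rough proxy for write amplification. Assumes key k lands on
    leaf page k // keys_per_page."""
    cache = OrderedDict()
    misses = 0
    for k in keys:
        page = k // keys_per_page
        if page in cache:
            cache.move_to_end(page)        # refresh LRU position
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used
    return misses

random.seed(1)
n = 10_000
seq_misses = page_misses(range(n))                    # touches each page once
rnd_misses = page_misses(random.sample(range(n), n))  # keeps revisiting evicted pages
```

Same keys, same final tree, yet the random order misses the cache dramatically more often, which matches the surprise above.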
If anyone wants to look deeper into the implementation details (purely from an educational/technical standpoint), the code is available on NuGet:
https://www.nuget.org/packages/BTreePlus
(Sharing only as reference material — not asking for feedback or promoting anything.)
Always happy to discuss data-structure internals or hear how others have approached similar problems.