B-Tree Implementation: Leaf Nodes

The benchmarks in the previous chapter proved that linear scan is untenable for large datasets. The solution is a B-Tree — a self-balancing tree optimized for systems that read and write fixed-size blocks. B-Trees were invented in 1970 by Rudolf Bayer and Edward McCreight specifically for disk-based storage, and nearly every production database engine uses some variant of them today.

A B-Tree is built from two kinds of nodes: leaf nodes that store actual key-value data, and internal nodes that store keys and child pointers for routing searches. We start with leaf nodes because they are simpler and immediately useful — a single leaf node is already a sorted, searchable container.

The Paradigm Shift: Pages Become Nodes

Until now, pages held rows in insertion order. Starting here, every page is a B-Tree node. The Pager does not change — it still reads and writes 4096-byte blocks — but the interpretation of those bytes changes completely.

Each page now begins with a node header instead of raw row data:

┌──────────────────────────────────────────────┐
│  Node Type (1B) │ Is Root (1B) │ Num Cells (4B) │
├──────────────────────────────────────────────┤
│  Cell 0: [Key (4B)] [Value (36B)]            │
│  Cell 1: [Key (4B)] [Value (36B)]            │
│  ...                                          │
│  Cell N-1                                      │
│  [unused space / zero padding]                │
└──────────────────────────────────────────────┘

Leaf Node Header

The header occupies 6 bytes at the start of every leaf node:

Field	Format	Size	Offset	Description
Node Type	`B` (unsigned byte)	1	0	`0` = leaf, `1` = internal
Is Root	`B` (unsigned byte)	1	1	`1` if this is the tree’s root
Num Cells	`<I` (unsigned 32-bit int)	4	2	Count of key-value cells stored

The format string is <B B I — little-endian, 6 bytes total.

Leaf Node Cells

After the 6-byte header, cells are packed contiguously. Each cell holds a key and a value:

Field	Size	Description
Key	4 bytes	Unsigned 32-bit integer, the row’s id
Value	36 bytes	Serialized `Row` (the full `ROW_SIZE` payload)
Cell total	40 bytes	`LEAF_NODE_CELL_SIZE`

Cells are stored in ascending key order. This invariant is what makes binary search possible. The maximum number of cells per leaf node:

$$\text{max_cells} = \left\lfloor \frac{\text{PAGE_SIZE} - \text{HEADER_SIZE}}{\text{CELL_SIZE}} \right\rfloor = \left\lfloor \frac{4096 - 6}{40} \right\rfloor = 102$$

So each leaf node holds up to 102 key-value pairs, with 10 bytes of unused space at the end of the page.

Constants and Node Initialization

# btree.py
import struct
from typing import Final
from constants import PAGE_SIZE
from row import Row

# --- Leaf Node Layout ---
LEAF_NODE_TYPE: Final[int] = 0
INTERNAL_NODE_TYPE: Final[int] = 1

LEAF_NODE_HEADER_FORMAT: Final[str] = "<B B I"
LEAF_NODE_HEADER_SIZE: Final[int] = struct.calcsize(LEAF_NODE_HEADER_FORMAT)  # 6

LEAF_NODE_KEY_SIZE: Final[int] = 4
LEAF_NODE_VALUE_SIZE: Final[int] = Row.ROW_SIZE  # 36
LEAF_NODE_CELL_SIZE: Final[int] = LEAF_NODE_KEY_SIZE + LEAF_NODE_VALUE_SIZE  # 40
LEAF_NODE_MAX_CELLS: Final[int] = (
    (PAGE_SIZE - LEAF_NODE_HEADER_SIZE) // LEAF_NODE_CELL_SIZE
)  # 102


def initialize_leaf_node(page: memoryview, is_root: bool = False) -> None:
    """Write a fresh leaf node header into the page buffer."""
    struct.pack_into(
        LEAF_NODE_HEADER_FORMAT,
        page,
        0,
        LEAF_NODE_TYPE,        # node_type = 0 (leaf)
        1 if is_root else 0,   # is_root flag
        0,                     # num_cells = 0
    )
    # Zero out the data section
    page[LEAF_NODE_HEADER_SIZE:] = b"\x00" * (PAGE_SIZE - LEAF_NODE_HEADER_SIZE)

We use struct.pack_into instead of struct.pack because it writes directly into the memoryview at a given offset — no intermediate bytes object, no copy.

Reading and Writing Cells

Two helper functions to access cells by index within a leaf node:

def leaf_node_cell_offset(cell_num: int) -> int:
    """Return the byte offset of cell `cell_num` within the page."""
    return LEAF_NODE_HEADER_SIZE + cell_num * LEAF_NODE_CELL_SIZE


def leaf_node_key(page: memoryview, cell_num: int) -> int:
    """Read the key from cell `cell_num`."""
    offset = leaf_node_cell_offset(cell_num)
    return struct.unpack_from("<I", page, offset)[0]


def leaf_node_value(page: memoryview, cell_num: int) -> memoryview:
    """Return a memoryview of the value (serialized Row) in cell `cell_num`."""
    offset = leaf_node_cell_offset(cell_num) + LEAF_NODE_KEY_SIZE
    return page[offset : offset + LEAF_NODE_VALUE_SIZE]


def leaf_node_num_cells(page: memoryview) -> int:
    """Read the num_cells field from the header."""
    return struct.unpack_from("<I", page, 2)[0]

These functions form the lowest-level access API for leaf nodes. Everything above — insertion, search, splitting — is built on top of them.

Summary of Leaf Node Geometry

Property	Value	Derivation
Page size	4096 bytes	`PAGE_SIZE` constant
Header size	6 bytes	`<B B I` → 1 + 1 + 4
Cell size	40 bytes	4 (key) + 36 (value)
Max cells per leaf	102	(4096 − 6) ÷ 40
Wasted bytes per page	10	4096 − 6 − (102 × 40)
Data capacity per leaf	4080 bytes	102 × 40

With the leaf node structure defined, we can proceed to the two critical operations: inserting a key-value pair in sorted order, and searching for a key using binary search.