Skip to main content
python database internals building a persistent engine from scratch

B-Tree Implementation: Leaf Nodes

5 min read Chapter 10 of 21
Summary

This section introduces the structure of leaf nodes...

This section introduces the structure of leaf nodes in a B-Tree, the foundational component for database indexing. Leaf nodes store key-value pairs without child pointers, using a binary layout for efficient access. The leaf node header contains metadata: Node Type (0 for leaf, 1 for internal), Is Root flag (boolean), and Num Cells (unsigned integer), formatted in little-endian byte order. Each cell consists of a key (4-byte integer) for binary search and a value (36-byte serialized row data) for actual storage, enabling O(log N) access times. The implementation is in Python 3.12+ with type hints, using the struct module and memoryview for low-level binary manipulation. Key functions include initialize_leaf_node to create nodes with default headers, create_cell to pack key-value pairs, and extract_cell to retrieve data without copying. The design aligns with page-based storage (4096-byte pages) and leverages fixed offsets for direct access, setting the stage for B-Tree operations.

B-Tree Implementation: Leaf Nodes

The benchmarks in the previous chapter proved that linear scan is untenable for large datasets. The solution is a B-Tree — a self-balancing tree optimized for systems that read and write fixed-size blocks. B-Trees were invented in 1970 by Rudolf Bayer and Edward McCreight specifically for disk-based storage, and nearly every production database engine uses some variant of them today.

A B-Tree is built from two kinds of nodes: leaf nodes that store actual key-value data, and internal nodes that store keys and child pointers for routing searches. We start with leaf nodes because they are simpler and immediately useful — a single leaf node is already a sorted, searchable container.

The Paradigm Shift: Pages Become Nodes

Until now, pages held rows in insertion order. Starting here, every page is a B-Tree node. The Pager does not change — it still reads and writes 4096-byte blocks — but the interpretation of those bytes changes completely.

Each page now begins with a node header instead of raw row data:

┌──────────────────────────────────────────────┐
│  Node Type (1B) │ Is Root (1B) │ Num Cells (4B) │
├──────────────────────────────────────────────┤
│  Cell 0: [Key (4B)] [Value (36B)]            │
│  Cell 1: [Key (4B)] [Value (36B)]            │
│  ...                                          │
│  Cell N-1                                      │
│  [unused space / zero padding]                │
└──────────────────────────────────────────────┘

Leaf Node Header

The header occupies 6 bytes at the start of every leaf node:

FieldFormatSizeOffsetDescription
Node TypeB (unsigned byte)100 = leaf, 1 = internal
Is RootB (unsigned byte)111 if this is the tree’s root
Num Cells<I (unsigned 32-bit int)42Count of key-value cells stored

The format string is <B B I — little-endian, 6 bytes total.

Leaf Node Cells

After the 6-byte header, cells are packed contiguously. Each cell holds a key and a value:

FieldSizeDescription
Key4 bytesUnsigned 32-bit integer, the row’s id
Value36 bytesSerialized Row (the full ROW_SIZE payload)
Cell total40 bytesLEAF_NODE_CELL_SIZE

Cells are stored in ascending key order. This invariant is what makes binary search possible. The maximum number of cells per leaf node:

$$\text{max_cells} = \left\lfloor \frac{\text{PAGE_SIZE} - \text{HEADER_SIZE}}{\text{CELL_SIZE}} \right\rfloor = \left\lfloor \frac{4096 - 6}{40} \right\rfloor = 102$$

So each leaf node holds up to 102 key-value pairs, with 10 bytes of unused space at the end of the page.

Constants and Node Initialization

# btree.py
import struct
from typing import Final
from constants import PAGE_SIZE
from row import Row

# --- Leaf Node Layout ---
LEAF_NODE_TYPE: Final[int] = 0
INTERNAL_NODE_TYPE: Final[int] = 1

LEAF_NODE_HEADER_FORMAT: Final[str] = "<B B I"
LEAF_NODE_HEADER_SIZE: Final[int] = struct.calcsize(LEAF_NODE_HEADER_FORMAT)  # 6

LEAF_NODE_KEY_SIZE: Final[int] = 4
LEAF_NODE_VALUE_SIZE: Final[int] = Row.ROW_SIZE  # 36
LEAF_NODE_CELL_SIZE: Final[int] = LEAF_NODE_KEY_SIZE + LEAF_NODE_VALUE_SIZE  # 40
LEAF_NODE_MAX_CELLS: Final[int] = (
    (PAGE_SIZE - LEAF_NODE_HEADER_SIZE) // LEAF_NODE_CELL_SIZE
)  # 102


def initialize_leaf_node(page: memoryview, is_root: bool = False) -> None:
    """Write a fresh leaf node header into the page buffer."""
    struct.pack_into(
        LEAF_NODE_HEADER_FORMAT,
        page,
        0,
        LEAF_NODE_TYPE,        # node_type = 0 (leaf)
        1 if is_root else 0,   # is_root flag
        0,                     # num_cells = 0
    )
    # Zero out the data section
    page[LEAF_NODE_HEADER_SIZE:] = b"\x00" * (PAGE_SIZE - LEAF_NODE_HEADER_SIZE)

We use struct.pack_into instead of struct.pack because it writes directly into the memoryview at a given offset — no intermediate bytes object, no copy.

Reading and Writing Cells

Two helper functions to access cells by index within a leaf node:

def leaf_node_cell_offset(cell_num: int) -> int:
    """Return the byte offset of cell `cell_num` within the page."""
    return LEAF_NODE_HEADER_SIZE + cell_num * LEAF_NODE_CELL_SIZE


def leaf_node_key(page: memoryview, cell_num: int) -> int:
    """Read the key from cell `cell_num`."""
    offset = leaf_node_cell_offset(cell_num)
    return struct.unpack_from("<I", page, offset)[0]


def leaf_node_value(page: memoryview, cell_num: int) -> memoryview:
    """Return a memoryview of the value (serialized Row) in cell `cell_num`."""
    offset = leaf_node_cell_offset(cell_num) + LEAF_NODE_KEY_SIZE
    return page[offset : offset + LEAF_NODE_VALUE_SIZE]


def leaf_node_num_cells(page: memoryview) -> int:
    """Read the num_cells field from the header."""
    return struct.unpack_from("<I", page, 2)[0]

These functions form the lowest-level access API for leaf nodes. Everything above — insertion, search, splitting — is built on top of them.

Summary of Leaf Node Geometry

PropertyValueDerivation
Page size4096 bytesPAGE_SIZE constant
Header size6 bytes<B B I → 1 + 1 + 4
Cell size40 bytes4 (key) + 36 (value)
Max cells per leaf102(4096 − 6) ÷ 40
Wasted bytes per page104096 − 6 − (102 × 40)
Data capacity per leaf4080 bytes102 × 40

With the leaf node structure defined, we can proceed to the two critical operations: inserting a key-value pair in sorted order, and searching for a key using binary search.