The Storage Engine: Pages and the Pager

We have a row format (36 bytes, fixed-width) and a page size (4096 bytes). Now we need the component that sits between the B-Tree and the operating system: the Pager. Its contract is straightforward: given a page number, return that page’s data in memory; given a modified (dirty) page, write it back to disk. Everything else in the database treats the Pager as a flat array of 4 KB slots, indexed by page number and capped at MAX_PAGES.

This chapter builds the Pager class from the ground up: opening files, reading pages, writing pages, and managing the page count. The Pager keeps every page it loads in memory until close(), but we deliberately leave out eviction and any cache-size policy for now. The goal is a correct, minimal implementation that we can layer optimizations onto later.

What the Pager Must Do

The Pager owns the database file. It is the only module that calls os.open(), os.read(), os.write(), and os.lseek(). Every other component (Table, Cursor, B-Tree, VM) interacts with data exclusively through the Pager.

Its responsibilities:

Open the database file in binary read-write mode, creating it if it does not exist.
Track the page count based on the file’s byte length.
Read a page from disk given its page number, returning a memoryview.
Write a page back to disk, checking that exactly PAGE_SIZE bytes were written.
Close the file cleanly on shutdown.

The Pager does not decide what goes inside a page. In Chapters 1–3, pages hold sequential rows. Starting in Chapter 4, they hold B-Tree nodes. The Pager does not care — it moves opaque 4 KB blocks.
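
To make "opaque" concrete, here is a minimal sketch of how a caller, not the Pager, might lay out a row inside a page buffer. The field layout (a 4-byte id plus a 32-byte name, chosen only so that it adds up to 36 bytes) and the helper names write_row and read_row are assumptions for illustration. The bytearray stands in for a buffer the Pager would hand back.

```python
import struct

PAGE_SIZE = 4096
# Hypothetical row layout: 4-byte unsigned id + 32-byte name = 36 bytes.
ROW_FORMAT = "<I32s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def write_row(page: memoryview, slot: int, row_id: int, name: bytes) -> None:
    """Pack one row into the given slot of a page buffer."""
    struct.pack_into(ROW_FORMAT, page, slot * ROW_SIZE, row_id, name)


def read_row(page: memoryview, slot: int) -> tuple[int, bytes]:
    """Unpack the row stored in the given slot."""
    row_id, raw_name = struct.unpack_from(ROW_FORMAT, page, slot * ROW_SIZE)
    # The "32s" field is padded with NUL bytes; strip them for display.
    return row_id, raw_name.rstrip(b"\x00")


# Stand-in for a page returned by the Pager: 4096 zeroed bytes.
page = memoryview(bytearray(PAGE_SIZE))
write_row(page, 2, 7, b"alice")
row = read_row(page, 2)
```

All knowledge of the row layout lives in the two helpers. The buffer itself is just bytes, which is why the same Pager can later carry B-Tree nodes without changes.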

The Pager Class

# pager.py
import os
from typing import Optional
from constants import PAGE_SIZE, MAX_PAGES

class Pager:
    """Manages fixed-size page I/O against a single database file."""

    def __init__(self, filename: str) -> None:
        self.filename: str = filename
        self.file_descriptor: Optional[int] = None
        self.file_length: int = 0
        self.num_pages: int = 0
        self.pages: list[Optional[bytearray]] = [None] * MAX_PAGES

    def open(self) -> None:
        """Open the database file in binary read/write mode.
        Creates the file if it does not exist.
        """
        flags = os.O_RDWR | os.O_CREAT
        self.file_descriptor = os.open(self.filename, flags, 0o644)
        self.file_length = os.lseek(self.file_descriptor, 0, os.SEEK_END)

        if self.file_length % PAGE_SIZE != 0:
            raise IOError(
                f"Database file is not a whole number of pages. "
                f"File length: {self.file_length}, page size: {PAGE_SIZE}"
            )
        self.num_pages = self.file_length // PAGE_SIZE

    def get_page(self, page_num: int) -> memoryview:
        """Return a memoryview of the requested page.
        Reads from disk on first access; subsequent calls return
        the cached in-memory copy.
        """
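        # Reject negative numbers too: a negative index would silently
        # wrap around to the end of the pages list.
        if not 0 <= page_num < MAX_PAGES:
            raise ValueError(
                f"Page number {page_num} is outside the valid range "
                f"0..{MAX_PAGES - 1}"
            )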

        if self.pages[page_num] is None:
            # Allocate a fresh page buffer
            page = bytearray(PAGE_SIZE)

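            # Pages that already exist on disk are read in; pages past the
            # end of the file start out zero-filled.
            if page_num < self.num_pages:
                os.lseek(self.file_descriptor, page_num * PAGE_SIZE, os.SEEK_SET)
                filled = 0
                # os.read may return fewer bytes than requested, so loop.
                while filled < PAGE_SIZE:
                    chunk = os.read(self.file_descriptor, PAGE_SIZE - filled)
                    if not chunk:
                        break  # end of file; the remainder stays zeroed
                    page[filled : filled + len(chunk)] = chunk
                    filled += len(chunk)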

            self.pages[page_num] = page

            # If we are creating a page beyond the current end, track it
            if page_num >= self.num_pages:
                self.num_pages = page_num + 1

        return memoryview(self.pages[page_num])

    def flush(self, page_num: int) -> None:
        """Write a single page to disk at the correct offset."""
        if self.pages[page_num] is None:
            raise ValueError(f"Tried to flush null page {page_num}")

        offset = page_num * PAGE_SIZE
        os.lseek(self.file_descriptor, offset, os.SEEK_SET)

        written = os.write(self.file_descriptor, self.pages[page_num])
        if written != PAGE_SIZE:
            raise IOError(
                f"Partial write on page {page_num}: "
                f"{written}/{PAGE_SIZE} bytes"
            )

    def close(self) -> None:
        """Flush all cached pages and close the file."""
        for i in range(self.num_pages):
            if self.pages[i] is not None:
                self.flush(i)
                self.pages[i] = None

        if self.file_descriptor is not None:
            os.close(self.file_descriptor)
            self.file_descriptor = None

Several things to note:

We use os.open / os.read / os.write instead of Python’s open(). The low-level POSIX calls give us direct control over file descriptors, flags, and seek positions. No internal Python buffering sits between us and the kernel.
get_page allocates lazily. The pages list starts as [None] * MAX_PAGES, which costs only one pointer-sized slot per entry. We allocate a 4 KB bytearray only when someone actually asks for that page. Because nothing is evicted yet, memory usage grows with the number of distinct pages touched, not with the maximum file size.
flush writes exactly PAGE_SIZE bytes. If the OS returns a short write (fewer bytes than requested), we raise immediately. A partial page on disk is corrupt data — there is no graceful recovery, so we fail loudly.
close flushes everything. On clean shutdown, every cached page gets written back. This is the coarse-grained durability path. The fine-grained path (WAL + fsync) comes in Chapter 6.
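
The notes above can be tried in isolation. The following is a small sketch, independent of the Pager class and assuming a POSIX system, that shows two things: a memoryview writes through to the bytearray behind it, and a page written at page_num * PAGE_SIZE with file-descriptor calls comes back intact. The file name demo.db and the variable names are placeholders.

```python
import os
import tempfile

PAGE_SIZE = 4096

# A memoryview does not copy: writes through it land in the bytearray.
cached_page = bytearray(PAGE_SIZE)
view = memoryview(cached_page)
view[0:5] = b"hello"
shares_buffer = cached_page[0:5] == b"hello"

# Write the buffer as page 1 of a scratch file, then read it back,
# using only file-descriptor calls.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.db")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    os.lseek(fd, 1 * PAGE_SIZE, os.SEEK_SET)
    written = os.write(fd, cached_page)
    # Seeking past the end and writing extends the file; page 0 is a hole
    # that reads back as zeros.
    file_length = os.lseek(fd, 0, os.SEEK_END)

    os.lseek(fd, 1 * PAGE_SIZE, os.SEEK_SET)
    on_disk = os.read(fd, PAGE_SIZE)
finally:
    os.close(fd)
    os.remove(path)
    os.rmdir(tmp_dir)
```

Because the view shares memory with the cached bytearray, a caller that modifies a page returned by get_page has already modified the copy that flush will later write. No separate "put page" call is needed.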

File Length Invariant

Notice the check in open():

if self.file_length % PAGE_SIZE != 0:
    raise IOError(...)

Our database file must always be an exact multiple of PAGE_SIZE. If it is not, something went wrong — a crash during a partial write, a file truncation, or external tampering. We refuse to open a corrupted file rather than silently misinterpreting page boundaries.

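This invariant simplifies every offset calculation in the system: for any page already on disk, page_num * PAGE_SIZE is the exact byte offset at which that page starts. Note that num_pages can temporarily run ahead of the file, because get_page counts a newly created page before it has been flushed.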
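
As a worked illustration of the arithmetic, the sketch below derives a page count from a file length and an offset from a page number. The helper names page_count and page_offset are hypothetical; the Pager shown in this chapter performs the same calculations inline.

```python
PAGE_SIZE = 4096


def page_count(file_length: int) -> int:
    """Return the number of whole pages, rejecting a ragged file length."""
    if file_length < 0 or file_length % PAGE_SIZE != 0:
        raise ValueError(f"{file_length} is not a whole number of pages")
    return file_length // PAGE_SIZE


def page_offset(page_num: int, num_pages: int) -> int:
    """Return the byte offset of a page that exists on disk."""
    if not 0 <= page_num < num_pages:
        raise ValueError(f"page {page_num} is not on disk")
    return page_num * PAGE_SIZE


num_pages = page_count(3 * PAGE_SIZE)    # a 12288-byte file holds 3 pages
last_offset = page_offset(2, num_pages)  # the last page starts at byte 8192
```

A file of 12289 bytes would be rejected by page_count, which mirrors the check in open().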

With the Pager in place, we can now build the Table abstraction — the layer that maps logical row numbers to physical page locations.
