Skip to main content
data systems mechanics invariants in distributed architectures

Columnar Storage and Encoding

6 min read Chapter 4 of 28
Summary

This section introduces columnar storage as a paradigm...

This section introduces columnar storage as a paradigm optimized for analytic (OLAP) workloads, contrasting it with row-oriented storage for transactional (OLTP) systems. Key concepts include compression techniques like Run-Length Encoding (RLE) and dictionary encoding that leverage columnar data's homogeneity, and bitmap indexes for efficient filtering of low-cardinality columns. The section explains vectorized query execution, which processes data in batches to leverage CPU SIMD instructions and reduce overhead. Materialization strategies—late versus early—are explored as critical optimizations for assembling rows from columnar data. A comprehensive comparison table details the characteristics of OLTP versus OLAP workloads, while another table summarizes encoding techniques. Through practical code examples for RLE and vectorized filtering, the section demonstrates how these concepts translate to implementation, establishing why columnar storage is foundational for modern analytical databases like Amazon Redshift and Google BigQuery.

Columnar Storage and Encoding

Columnar storage is a paradigm optimized for analytic workloads, characterized by read-mostly access patterns, complex queries, and the need for high throughput. Unlike row-oriented storage, which stores all column values for a single row contiguously, columnar storage organizes data by columns, storing all values of a single column together. This layout enables efficient compression, reduces I/O for queries that only access a subset of columns, and facilitates vectorized query execution, which processes data in batches to leverage CPU SIMD instructions.

Compression Techniques for Columnar Data

Several compression techniques are particularly effective for columnar data due to its homogeneous nature. Run-Length Encoding (RLE) is a simple yet effective method for columns with repeated values, storing each sequence of identical values as a single value and a count. Dictionary Encoding maps distinct values in a column to compact integer IDs, reducing storage size, especially for columns with low cardinality. Bitmap Indexes are used for filtering on low-cardinality columns, representing each distinct value as a bitmap where each bit corresponds to a row, enabling fast bitwise operations for filtering.

Example: Run-Length Encoding Implementation

# Example of Run-Length Encoding (RLE) for a column of integers.
# Input column (already sorted for best compression): [1,1,1,2,2,3,3,3,3]
# RLE encoded as list of (value, run_length) pairs.

input_column = [1, 1, 1, 2, 2, 3, 3, 3, 3]

def rle_encode(column):
    if not column:
        return []
    encoded = []
    current_val = column[0]
    run_length = 1
    for val in column[1:]:
        if val == current_val:
            run_length += 1
        else:
            encoded.append((current_val, run_length))
            current_val = val
            run_length = 1
    encoded.append((current_val, run_length))
    return encoded

encoded = rle_encode(input_column)
print(f"Encoded RLE: {encoded}")
# Output: [(1, 3), (2, 2), (3, 4)]

# Decoding function
def rle_decode(encoded):
    decoded = []
    for value, count in encoded:
        decoded.extend([value] * count)
    return decoded

decoded = rle_decode(encoded)
print(f"Decoded column: {decoded}")
assert decoded == input_column

Vectorized Query Execution

Vectorized query execution is a processing technique where operations are performed on batches of values (vectors) at once, rather than on single values (scalars). This approach reduces per-tuple overhead and leverages CPU SIMD (Single Instruction, Multiple Data) instructions for high throughput. Modern analytical databases and data warehouses often implement vectorized execution engines to improve performance on columnar data.

Example: Vectorized Filter Operation

# Pseudocode for a vectorized filter operation (late materialization style)
# Processes a batch (vector) of column values and a selection bitmap.

import numpy as np

# Column data for 8 rows (a vector)
age_vector = np.array([25, 42, 31, 28, 55, 19, 33, 60], dtype=np.int32)

# Vectorized comparison: produces a boolean mask
predicate_mask = age_vector > 30
print(f"Predicate mask (age > 30): {predicate_mask}")
# Output: [False, True, True, False, True, False, True, True]

# Use mask to select positions (row IDs) that pass the filter
selected_positions = np.where(predicate_mask)[0]
print(f"Selected row positions (IDs): {selected_positions}")
# Output: [1, 2, 4, 6, 7]

# Late materialization: We now have positions, not full rows.
# Subsequent operators can use these positions to fetch other column values.
# E.g., fetch 'salary' for those positions:
salary_vector = np.array([50000, 75000, 60000, 48000, 90000, 35000, 80000, 95000], dtype=np.int32)
selected_salaries = salary_vector[selected_positions]
print(f"Salaries for selected rows: {selected_salaries}")

Materialization Strategies

Materialization refers to the process of assembling a full row from columnar data by combining values from different columns. There are two primary strategies: late materialization and early materialization. Late materialization delays the assembly of full rows until the final result set is needed, reducing I/O and memory bandwidth by only materializing rows that pass all filters. Early materialization assembles rows early in the query pipeline, which can be beneficial for operations that need many columns from a row or for final result output but may increase memory usage and I/O for intermediate results.

Diagram: Late vs. Early Materialization

Diagram: Late vs. Early Materialization in a Columnar Query Pipeline

Description:
- **Query**: SELECT name, salary FROM employees WHERE age > 30 AND dept = 'Sales';
- **Late Materialization Path**:
  1. Scan column 'age' → apply predicate >30 → output bitmap B1 of qualifying row IDs.
  2. Scan column 'dept' → apply predicate = 'Sales' → output bitmap B2.
  3. Bitwise AND(B1, B2) → final bitmap F of row IDs matching both predicates.
  4. **Only now materialize**: Use bitmap F to fetch values from 'name' and 'salary' columns and assemble final result rows.
- **Early Materialization Path**:
  1. Scan and assemble full rows (age, dept, name, salary) from all columns.
  2. Apply predicates (>30 and ='Sales') on the materialized rows, filtering in memory.
  3. Project to output columns (name, salary).
- **Comparison Arrow**: Late materialization minimizes data movement (only moves column data for filtered rows). Early materialization moves all column data for all rows early.

Comparison of Storage Layouts

Workload CharacteristicTransactional (OLTP)Analytic (OLAP)
Primary OperationPoint reads/writes (by key)Aggregations, scans, joins
Data Access PatternRandom I/O (index lookups)Sequential I/O (full/partial scans)
Write FrequencyHigh (frequent inserts/updates)Low (batch loads, infrequent updates)
Read PatternSmall, selective readsLarge, bulk reads
Storage LayoutRow-oriented (optimized for whole-row access)Column-oriented (optimized for column scans)
Index TypeB-tree / LSM-tree (for point queries)Bitmap, columnar projections, zone maps
Optimization GoalLow latency, high concurrencyHigh throughput, efficient compression
Example SystemsPostgreSQL, MySQL, Oracle DBAmazon Redshift, Google BigQuery, Snowflake

Encoding Techniques

Encoding TechniqueBest ForCompression MechanismExample (Before → After)
Run-Length Encoding (RLE)Sorted columns, low cardinalityReplace consecutive identical values with (value, count)[A,A,A,B,B] → [(A,3),(B,2)]
Dictionary EncodingColumns with repeated valuesMap distinct values to compact integer IDs[‘Red’,‘Blue’,‘Red’] → [0,1,0] + Dict{0:‘Red’,1:‘Blue’}
Bit-PackingSmall-range integersStore values using minimal bits per value (e.g., 5 bits for values 0-31)-
Delta EncodingSorted, sequential values (e.g., timestamps)Store difference from previous value[1000, 1002, 1005] → [1000, +2, +3]
Frame Of ReferenceInteger columns with local clusteringStore values relative to a base (min value in block)Block values [103, 105, 104] → base=100, encoded [3,5,4]

Conclusion

Columnar storage and encoding are foundational elements of modern analytical databases and data warehouses, offering significant advantages in terms of compression, query performance, and storage efficiency. By understanding the principles behind columnar storage, vectorized query execution, and materialization strategies, developers and data analysts can better leverage these technologies to support complex analytics and business intelligence applications.

Sources

This section has drawn on internal knowledge and examples to illustrate key concepts in columnar storage and encoding. For further reading and deeper dives into specific technologies and implementations, readers are encouraged to explore the documentation and research papers related to Amazon Redshift, Google BigQuery, Snowflake, and other leading analytical databases and data warehouses.