Skip to main content

On This Page

Meta Open Sources OpenZL: A Universal Compression Framework for Structured Data

3 min read
Share

These articles are AI-generated summaries. Please check the original sources for full details.

OpenZL: A New Approach to Data Compression for Structured Data

Meta has introduced OpenZL, an open-source data compression framework designed to significantly improve compression efficiency for structured datasets. This framework addresses the limitations of traditional compression methods, which treat data as raw byte streams, by explicitly modeling data structures like schemas, columnar layouts, and repetitive patterns. OpenZL offers better compression ratios and speeds, alongside operational simplicity, making it a valuable tool for modern data infrastructures.

Core Functionality and Design

OpenZL’s core innovation lies in its structured compression approach. Unlike general-purpose compressors, OpenZL analyzes the data schema to apply a configurable sequence of reversible transforms. This allows it to leverage the inherent order and patterns within structured data, leading to more efficient compression. The framework utilizes “Compression Plans,” which are generated offline by a “trainer” component based on the data schema. These plans are embedded within the compressed data frame, enabling decompression without requiring external metadata.

Key Features:

  • Schema Modeling: OpenZL explicitly models data structures (e.g., columnar layouts, enumerations) to optimize compression.
  • Universal Decompressor: A single binary can decompress any OpenZL file, regardless of the specific transform sequence used. This simplifies deployment and maintenance.
  • Offline Plan Generation: Compression Plans are generated offline, allowing for optimized compression without impacting encoding performance.
  • Deterministic Decoding: OpenZL’s fixed execution graph ensures reproducible decoding, crucial for data archival and long-term data integrity.
  • Fallback to Zstd: For data with minimal structure (e.g., plain text), OpenZL intelligently falls back to using Zstd.

Performance and Efficiency

Internal benchmarks on structured datasets, specifically using the Silesia Compression Corpus’s “sao star” records, demonstrate OpenZL’s superior performance.

Performance Comparison (on “sao star” records):

CompressorCompressed SizeCompression RatioCompression SpeedDecompression Speed
zstd -35,531,935 B× 1.31220 MB/s850 MB/s
xz -93,516,649 B× 2.063.5 MB/s45 MB/s
OpenZL4,414,351 B× 1.64340 MB/s1200 MB/s

As shown in the table, OpenZL achieves a better compression ratio (1.64x) while preserving or improving both compression and decompression speeds compared to zstd -3.

Operational Advantages

OpenZL offers significant operational advantages, particularly for data center deployments:

  • Single Decompression Surface: A single binary can handle decompression for all OpenZL files.
  • Fleet-Wide Updates: Updates to the compression plans can be deployed fleet-wide from a single binary, without requiring changes to the decoder.
  • Version Control: Clear version control is possible across large infrastructures due to the deterministic nature of OpenZL.
  • No External Metadata: The decoder doesn’t require external metadata; it executes the embedded recipe.

Data Description and Implementation

Users can describe their data structure using the Simple Data Description Language (SDDL) or a custom parser function. The offline trainer then uses a budgeted search to generate an optimal compression Plan. OpenZL is publicly available on GitHub for developers to experiment with and contribute to.

Use Cases

OpenZL is particularly well-suited for structured data, including:

  • Time-series datasets
  • ML tensors
  • Database tables

Conclusion

Meta’s OpenZL represents a significant advancement in data compression, especially for structured datasets. By leveraging schema modeling and a universal decompressor, it offers improved compression ratios, faster speeds, and simplified operational management compared to traditional compression methods. The open-source nature of the framework encourages community contributions and wider adoption.

Reference: https://www.infoq.com/news/2025/10/openzl-structured-compression/

Continue reading

Next article

New Android Trojan 'Herodotus' Evades Anti-Fraud Systems with Human-Like Typing Behavior

Related Content