How to Build Portable, In-Database Feature Engineering Pipelines with Ibis Using Lazy Python APIs and DuckDB Execution

Ibis lets developers build portable feature engineering pipelines with a lazy, Pandas-like Python API whose expressions execute entirely inside the database. The source tutorial demonstrates this with DuckDB: it registers the data in the database and defines complex transformations without moving data into local memory.

Why This Matters

Traditional data science workflows often pull large datasets into a Python environment (such as Pandas) for feature engineering, which brings significant data transfer overhead, memory constraints, and scalability issues. This is especially problematic for modern datasets that routinely exceed available RAM. Ibis addresses this by pushing computation into the database, close to the data, so that data movement is minimized. The cost of an inefficient pipeline also grows quickly with data volume and can end up exceeding the infrastructure cost of storage and compute.

Key Insights

Lazy Evaluation: Ibis expressions are not executed when they are defined; they are compiled into SQL and run within the database only when a result is requested.
Backend Agnostic: Ibis provides a single Python API that translates to the specific SQL dialect of the connected backend, e.g., DuckDB, PostgreSQL, or BigQuery.
Window Functions: Ibis supports complex window functions for time-series analysis and other advanced feature engineering tasks.

Working Example

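# Notebook cell: the leading "!" runs pip as a shell command (IPython/Jupyter/Colab only).
!pip -q install "ibis-framework[duckdb,examples]" duckdb pyarrow pandas
import ibis
from ibis import _
print("Ibis version:", ibis.__version__)
# Open an in-memory DuckDB database; the expressions below run inside it.
con = ibis.duckdb.connect()
# Interactive mode executes an expression when it is displayed, which suits notebooks.
ibis.options.interactive = True

# Load the example penguins dataset, preferring to register it directly in DuckDB.
try:
    base_expr = ibis.examples.penguins.fetch(backend=con)
except TypeError:
    # Fallback in case this Ibis version's fetch() does not accept a backend argument.
    base_expr = ibis.examples.penguins.fetch()
# Make sure a table named "penguins" exists in this connection.
if "penguins" not in con.list_tables():
    try:
        con.create_table("penguins", base_expr, overwrite=True)
    except Exception:
        # Last resort: materialize locally, then upload the resulting DataFrame.
        con.create_table("penguins", base_expr.execute(), overwrite=True)
t = con.table("penguins")
print(t.schema())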

Practical Applications

Fraud Detection: Financial institutions can use Ibis to build real-time fraud detection pipelines that leverage in-database features, minimizing latency.
Pitfall: Relying on eager execution in Pandas and then writing results back to the database negates Ibis’ benefits and reintroduces data transfer overhead.

References:

https://www.marktechpost.com/2026/01/09/how-to-build-portable-in-database-feature-engineering-pipelines-with-ibis-using-lazy-python-apis-and-duckdb-execution/
