
colnade_dask

Dask backend adapter, construction functions, and I/O functions.

DaskBackend

Colnade backend adapter for Dask.

Expression translation produces callables of the form (df) -> Series | scalar, since Dask, like pandas, has no standalone lazy expression API. Applying a callable builds a lazy Dask task graph instead of executing immediately.
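The recursive-translation pattern can be sketched as follows. The AST node classes (Col, Lit, Add) are hypothetical stand-ins for Colnade's expression AST, and a pandas DataFrame stands in for a Dask one; the callable shape is the same either way:

```python
from dataclasses import dataclass

import pandas as pd


# Hypothetical AST nodes standing in for Colnade's expression AST.
@dataclass
class Col:
    name: str


@dataclass
class Lit:
    value: object


@dataclass
class Add:
    left: object
    right: object


def translate_expr(expr):
    """Recursively translate an AST node into a callable (df) -> Series | scalar."""
    if isinstance(expr, Col):
        return lambda df: df[expr.name]
    if isinstance(expr, Lit):
        return lambda df: expr.value
    if isinstance(expr, Add):
        lhs, rhs = translate_expr(expr.left), translate_expr(expr.right)
        # The returned callable composes its children; nothing is computed
        # until it is applied to a concrete DataFrame.
        return lambda df: lhs(df) + rhs(df)
    raise TypeError(f"unsupported node: {expr!r}")


df = pd.DataFrame({"x": [1, 2, 3]})
fn = translate_expr(Add(Col("x"), Lit(10)))
result = fn(df)  # Series [11, 12, 13]
```

With Dask the same callable would return a lazy Series backed by a task graph rather than computed values.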

translate_expr(expr)

Recursively translate a Colnade AST node to a callable (df -> result).

validate_schema(source, schema)

Validate that a Dask DataFrame matches the schema.
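A simplified sketch of dtype-level schema validation. Here the schema is a plain {column: dtype-string} mapping and the frame is an eager pandas DataFrame, both simplifications of Colnade's schema objects and Dask frames:

```python
import pandas as pd


def validate_schema(df: pd.DataFrame, schema: dict[str, str]) -> list[str]:
    """Return a list of mismatch messages; an empty list means the frame matches."""
    errors = []
    for col, expected in schema.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != expected:
            errors.append(f"{col}: expected {expected}, got {df[col].dtype}")
    for col in df.columns:
        if col not in schema:
            errors.append(f"unexpected column: {col}")
    return errors


df = pd.DataFrame(
    {"id": pd.Series([1, 2], dtype="uint64"), "score": [1.0, 2.5]}
)
errs = validate_schema(df, {"id": "uint64", "score": "float64"})  # no errors
```

Because a Dask DataFrame carries its dtypes in metadata, a check like this can run without triggering any computation.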

validate_field_constraints(source, schema)

Validate value-level constraints (Field(), @schema_check) on data.

to_arrow_batches(source, batch_size)

Convert a Dask DataFrame to an iterator of Arrow RecordBatches.

from_arrow_batches(batches, schema)

Reconstruct a Dask DataFrame from Arrow RecordBatches.

from_dict(data, schema)

Create a Dask DataFrame from a columnar dict with schema-driven dtypes.

Construction Functions

from_dict(schema, data)

Create a typed LazyFrame from a columnar dict.

Returns a LazyFrame because Dask is inherently lazy — use .collect() to materialize. The schema drives dtype coercion so plain Python values ([1, 2, 3]) are cast to the correct native types (e.g. UInt64).
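The coercion step can be sketched like this. The schema is reduced to a {column: dtype} mapping and the result is an eager pandas DataFrame, whereas the real from_dict takes a Colnade schema and returns a typed LazyFrame:

```python
import pandas as pd


def from_dict(schema: dict[str, str], data: dict[str, list]) -> pd.DataFrame:
    """Build a frame from a columnar dict, then cast each column to the
    dtype the schema declares."""
    return pd.DataFrame(data).astype(schema)


df = from_dict({"id": "uint64"}, {"id": [1, 2, 3]})
# df["id"] now has dtype uint64, not the inferred int64
```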

from_rows(schema, rows)

Create a typed LazyFrame from Row[S] instances.

Returns a LazyFrame because Dask is inherently lazy — use .collect() to materialize. The type checker verifies that rows match the schema — passing Orders.Row where Users.Row is expected is a static error.
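The row-to-column pivot that from_rows must perform before handing data to Dask can be sketched with a dataclass standing in for a Row[S] type (UserRow is hypothetical, not part of the library):

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class UserRow:
    """Hypothetical stand-in for a schema's Row type (e.g. Users.Row)."""
    id: int
    name: str


def rows_to_columns(row_type, rows):
    """Pivot typed row instances into a columnar dict."""
    cols = {f.name: [] for f in fields(row_type)}
    for row in rows:
        for name, value in asdict(row).items():
            cols[name].append(value)
    return cols


cols = rows_to_columns(UserRow, [UserRow(1, "ada"), UserRow(2, "bob")])
# → {"id": [1, 2], "name": ["ada", "bob"]}
```

The static guarantee described above lives entirely in the type checker; at runtime the rows are just pivoted into columns like this.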

I/O Functions

scan_parquet(path, schema, **kwargs)

Lazily scan a Parquet file into a typed LazyFrame backed by Dask.

scan_csv(path, schema, **kwargs)

Lazily scan a CSV file into a typed LazyFrame backed by Dask.

Applies the schema's dtype mapping to ensure correct column types.
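Pinning column types at read time can be sketched with the `dtype` mapping that pandas and dask.dataframe `read_csv` both accept; the mapping here is assumed to be derived from the schema:

```python
import io

import pandas as pd

# A dtype mapping such as a schema would produce. CSV has no type
# information, so without it "id" would be inferred as int64.
csv = io.StringIO("id,score\n1,0.5\n2,1.5\n")
df = pd.read_csv(csv, dtype={"id": "uint64", "score": "float64"})
```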

write_parquet(df, path, **kwargs)

Write a DataFrame or LazyFrame to a Parquet file.

write_csv(df, path, **kwargs)

Write a DataFrame or LazyFrame to a CSV file.