colnade.dataframe

DataFrame, LazyFrame, GroupBy, JoinedDataFrame, and untyped escape hatches.

DataFrame

DataFrame(*, _data=None, _schema=None, _backend=None)

Bases: Generic[S]

A typed, materialized DataFrame parameterized by a Schema.

Schema-preserving operations (filter, sort, limit, etc.) return DataFrame[S]. Schema-transforming operations (select, group_by followed by agg) return DataFrame[Any]; call cast_schema() on the result to bind it to a named output schema.
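The split between schema-preserving and schema-transforming operations can be pictured with a small stdlib-only model. This is a sketch for intuition, not colnade's implementation: `ToyFrame`, `Users`, and `UserNames` are hypothetical names, rows are plain dicts, and `None` stands in for the unbound `Any` state.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S")
T = TypeVar("T")


@dataclass(frozen=True)
class ToyFrame(Generic[S]):
    """Hypothetical stand-in for DataFrame[S]; not part of colnade."""

    rows: tuple[dict[str, Any], ...]
    schema: type[S] | None  # None models the unbound DataFrame[Any] state

    def filter(self, keep: Callable[[dict[str, Any]], bool]) -> ToyFrame[S]:
        # Schema-preserving: the column set is unchanged, so S is kept.
        return ToyFrame(tuple(r for r in self.rows if keep(r)), self.schema)

    def select(self, *names: str) -> ToyFrame[Any]:
        # Schema-transforming: the column set changes, so the binding is dropped.
        return ToyFrame(tuple({n: r[n] for n in names} for r in self.rows), None)

    def cast_schema(self, schema: type[T]) -> ToyFrame[T]:
        # Re-bind to a named schema after checking that its fields are present.
        wanted = set(schema.__annotations__)
        for row in self.rows:
            missing = wanted.difference(row)
            if missing:
                raise ValueError(f"missing columns: {sorted(missing)}")
        return ToyFrame(self.rows, schema)


class Users:
    id: int
    name: str
    age: int


class UserNames:
    id: int
    name: str


users = ToyFrame(
    ({"id": 1, "name": "Ada", "age": 36}, {"id": 2, "name": "Bob", "age": 17}),
    Users,
)
adults = users.filter(lambda r: r["age"] >= 18)  # still bound to Users
names = adults.select("id", "name")              # unbound after select
bound = names.cast_schema(UserNames)             # bound to UserNames
```

The point of the design is that a type checker can keep trusting `S` through any chain of preserving calls, and only needs an explicit re-binding step where the column set actually changes.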

filter(predicate)

Filter rows by a boolean expression.

sort(*columns, descending=False)

Sort rows by columns or sort expressions.

limit(n)

Limit to the first n rows.

head(n=5)

Return the first n rows (materialized only).

tail(n=5)

Return the last n rows (materialized only).

sample(n)

Return a random sample of n rows (materialized only).

unique(*columns)

Remove duplicate rows based on the given columns.

drop_nulls(*columns)

Drop rows with null values in the given columns.

with_columns(*exprs)

Add or overwrite columns. Returns DataFrame[S] (optimistic: the schema parameter is kept even though the column set may have changed).

select(*columns)

select(c1: Column[Any]) -> DataFrame[Any]
select(c1: Column[Any], c2: Column[Any]) -> DataFrame[Any]
select(
    c1: Column[Any], c2: Column[Any], c3: Column[Any]
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
    c10: Column[Any],
) -> DataFrame[Any]

Select columns. Returns DataFrame[Any] — use cast_schema() to bind.

group_by(*keys)

Group by columns for aggregation.

join(other, on, how='inner')

Join with another DataFrame on a JoinCondition.

cast_schema(schema, mapping=None, extra='drop')

Bind to a new schema via mapping resolution.
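One way to read "mapping resolution" is as a lookup from each target field to a source column. The sketch below is a guess at the general idea only: the function `resolve_columns`, the string-keyed `mapping`, the same-name fallback, and the `"forbid"` value for `extra` are all assumptions made for illustration and are not taken from colnade.

```python
from __future__ import annotations


def resolve_columns(
    available: list[str],
    target_fields: list[str],
    mapping: dict[str, str] | None = None,
    extra: str = "drop",
) -> dict[str, str]:
    """Return {target_field: source_column} for a toy schema cast.

    Assumptions (hypothetical, not colnade's rules): mapping is keyed by
    target field name, unmapped fields fall back to a same-named source
    column, and extra is either "drop" or "forbid".
    """
    mapping = mapping or {}
    resolved: dict[str, str] = {}
    for field in target_fields:
        source = mapping.get(field, field)
        if source not in available:
            raise KeyError(f"no source column for target field {field!r}")
        resolved[field] = source
    # Source columns that no target field consumed.
    leftover = [c for c in available if c not in resolved.values()]
    if leftover and extra == "forbid":
        raise ValueError(f"unexpected extra columns: {leftover}")
    return resolved


plan = resolve_columns(
    available=["id", "full_name", "age"],
    target_fields=["id", "name"],
    mapping={"name": "full_name"},
)
```

Here `plan` maps `name` to `full_name`, resolves `id` by name, and silently leaves `age` out because `extra` defaults to `"drop"`.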

lazy()

Convert to a lazy query plan.

untyped()

Drop type information — string-based escape hatch.

validate()

Validate that the data conforms to the schema.
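As a rough picture of what conformance checking involves, the sketch below compares dict rows against a class's annotations. It is a stdlib-only illustration: `validate_rows` and `Users` are hypothetical, and the real method may check different things (dtypes, nullability) and report failures differently.

```python
from __future__ import annotations

from typing import Any


class Users:
    # Hypothetical schema: plain Python types stand in for column dtypes.
    id: int
    name: str


def validate_rows(rows: list[dict[str, Any]], schema: type) -> list[str]:
    """Collect a message for every missing column or type mismatch."""
    problems: list[str] = []
    expected_types = {"int": int, "str": str}  # only the types used in this toy
    for i, row in enumerate(rows):
        for field, annotation in schema.__annotations__.items():
            # Annotations are strings under `from __future__ import annotations`.
            expected = expected_types[str(annotation)]
            if field not in row:
                problems.append(f"row {i}: missing column {field!r}")
            elif not isinstance(row[field], expected):
                actual = type(row[field]).__name__
                problems.append(
                    f"row {i}: {field!r} expected {expected.__name__}, got {actual}"
                )
    return problems


problems = validate_rows([{"id": 1, "name": "Ada"}, {"id": "2"}], Users)
```

The first row passes; the second yields one message for the wrongly typed `id` and one for the missing `name`.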

to_batches(batch_size=None)

Convert to an iterator of typed Arrow batches.

Delegates to the backend's to_arrow_batches() method, wrapping each raw pa.RecordBatch in an ArrowBatch[S] to preserve schema type information across the boundary.

from_batches(batches, schema, backend) classmethod

Create a DataFrame from an iterator of typed Arrow batches.

Unwraps each ArrowBatch[S] to its raw pa.RecordBatch and delegates to the backend's from_arrow_batches() method.
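The wrap-and-unwrap round trip described for to_batches() and from_batches() can be modelled without Arrow. In this sketch `TypedBatch`, `to_typed_batches`, and `from_typed_batches` are hypothetical names, and a list of dict rows stands in for a raw `pa.RecordBatch`; it only illustrates how a schema tag can travel alongside each raw batch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, Iterable, Iterator, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class TypedBatch(Generic[S]):
    """Hypothetical stand-in for ArrowBatch[S]: a raw batch plus its schema."""

    raw: list[dict[str, Any]]  # would be a pa.RecordBatch in the real library
    schema: type[S]


def to_typed_batches(
    rows: list[dict[str, Any]], schema: type[S], batch_size: int | None = None
) -> Iterator[TypedBatch[S]]:
    # Chunk the rows and tag every chunk with the schema it came from.
    size = batch_size or max(len(rows), 1)
    for start in range(0, len(rows), size):
        yield TypedBatch(rows[start:start + size], schema)


def from_typed_batches(batches: Iterable[TypedBatch[S]]) -> list[dict[str, Any]]:
    # Unwrap each batch back to its raw payload and concatenate.
    rows: list[dict[str, Any]] = []
    for batch in batches:
        rows.extend(batch.raw)
    return rows


class Events:
    n: int


events = [{"n": i} for i in range(5)]
batches = list(to_typed_batches(events, Events, batch_size=2))
round_tripped = from_typed_batches(batches)
```

With five rows and a batch size of two, the toy produces three batches, each carrying `Events` as its schema, and the round trip returns the original rows.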

LazyFrame

LazyFrame(*, _data=None, _schema=None, _backend=None)

Bases: Generic[S]

A typed, lazy query plan parameterized by a Schema.

Supports the same operations as DataFrame except the materialized-only ones: head(), tail(), sample(), and to_batches(). Use collect() to materialize.
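The difference between building a plan and materializing it can be shown with a toy that merely records steps until collect() runs. `ToyLazyFrame` is a hypothetical, stdlib-only model; a real backend would typically also optimize the plan, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


@dataclass(frozen=True)
class ToyLazyFrame:
    """Hypothetical lazy plan: source rows plus a tuple of deferred steps."""

    source: tuple[Row, ...]
    steps: tuple[Step, ...] = ()

    def filter(self, keep: Callable[[Row], bool]) -> ToyLazyFrame:
        # Record the step; nothing is evaluated yet.
        step: Step = lambda rows: [r for r in rows if keep(r)]
        return ToyLazyFrame(self.source, self.steps + (step,))

    def limit(self, n: int) -> ToyLazyFrame:
        step: Step = lambda rows: rows[:n]
        return ToyLazyFrame(self.source, self.steps + (step,))

    def collect(self) -> list[Row]:
        # Materialize: run every recorded step in order.
        rows = list(self.source)
        for step in self.steps:
            rows = step(rows)
        return rows


seen: list[int] = []  # records which rows the predicate actually inspected


def is_even(row: Row) -> bool:
    seen.append(row["n"])
    return row["n"] % 2 == 0


plan = ToyLazyFrame(tuple({"n": i} for i in range(6))).filter(is_even).limit(2)
inspected_before_collect = list(seen)
result = plan.collect()
```

No row is inspected while the plan is being built; the predicate only runs inside collect(), which is why operations that need actual rows are reserved for the materialized frame.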

filter(predicate)

Filter rows by a boolean expression.

sort(*columns, descending=False)

Sort rows by columns or sort expressions.

limit(n)

Limit to the first n rows.

unique(*columns)

Remove duplicate rows based on the given columns.

drop_nulls(*columns)

Drop rows with null values in the given columns.

with_columns(*exprs)

Add or overwrite columns. Returns LazyFrame[S] (optimistic: the schema parameter is kept even though the column set may have changed).

select(*columns)

select(c1: Column[Any]) -> LazyFrame[Any]
select(c1: Column[Any], c2: Column[Any]) -> LazyFrame[Any]
select(
    c1: Column[Any], c2: Column[Any], c3: Column[Any]
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
    c10: Column[Any],
) -> LazyFrame[Any]

Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.

group_by(*keys)

Group by columns for aggregation.

join(other, on, how='inner')

Join with another LazyFrame on a JoinCondition.

cast_schema(schema, mapping=None, extra='drop')

Bind to a new schema via mapping resolution.

collect()

Materialize the lazy query plan into a DataFrame.

untyped()

Drop type information — string-based escape hatch.

validate()

Validate that the data conforms to the schema.

GroupBy

GroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)

Bases: Generic[S]

GroupBy on a materialized DataFrame.

agg(*exprs)

Aggregate grouped data. Returns DataFrame[Any]; use cast_schema() to bind the result to a named schema.
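Aggregation is schema-transforming because the output columns are the group keys plus the aggregates, not the input columns. A stdlib-only sketch, with the hypothetical helper `group_sum` operating on dict rows:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def group_sum(rows: list[Row], key: str, value: str, out: str) -> list[Row]:
    """Sum `value` per distinct `key`; the result has columns {key, out}."""
    totals: dict[Any, int] = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, out: total} for k, total in totals.items()]


orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 5},
]
totals = group_sum(orders, key="customer", value="amount", out="total")
```

The input rows have columns `customer` and `amount`, while the result has `customer` and `total`, so the input schema no longer describes it and a new binding is needed.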

LazyGroupBy

LazyGroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)

Bases: Generic[S]

GroupBy on a lazy query plan.

agg(*exprs)

Aggregate grouped data. Returns LazyFrame[Any]; use cast_schema() to bind the result to a named schema.

JoinedDataFrame

JoinedDataFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)

Bases: Generic[S, S2]

A typed DataFrame resulting from a join of two schemas.

Operations accept columns from either schema S or S2. Schema-preserving operations return JoinedDataFrame[S, S2]. Schema-transforming operations (select) return DataFrame[Any] and require cast_schema() to bind.

filter(predicate)

Filter rows by a boolean expression.

sort(*columns, descending=False)

Sort rows by columns or sort expressions.

limit(n)

Limit to the first n rows.

head(n=5)

Return the first n rows (materialized only).

tail(n=5)

Return the last n rows (materialized only).

sample(n)

Return a random sample of n rows (materialized only).

unique(*columns)

Remove duplicate rows based on the given columns.

drop_nulls(*columns)

Drop rows with null values in the given columns.

with_columns(*exprs)

Add or overwrite columns.

select(*columns)

select(c1: Column[Any]) -> DataFrame[Any]
select(c1: Column[Any], c2: Column[Any]) -> DataFrame[Any]
select(
    c1: Column[Any], c2: Column[Any], c3: Column[Any]
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
    c10: Column[Any],
) -> DataFrame[Any]

Select columns. Returns DataFrame[Any] — use cast_schema() to bind.

cast_schema(schema, mapping=None, extra='drop')

Flatten join result into a single-schema DataFrame.
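Flattening a join result amounts to choosing, for each output field, which side of the join supplies it. The sketch below models that with dict rows; `inner_join`, `flatten`, and the `(side, column)` mapping format are hypothetical and chosen only for illustration, not taken from colnade.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]


def inner_join(
    left: list[Row], right: list[Row], left_key: str, right_key: str
) -> list[tuple[Row, Row]]:
    """Pair up rows whose keys match, keeping both sides separate."""
    index: dict[Any, list[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    return [
        (left_row, right_row)
        for left_row in left
        for right_row in index.get(left_row[left_key], [])
    ]


def flatten(
    pairs: list[tuple[Row, Row]], mapping: dict[str, tuple[str, str]]
) -> list[Row]:
    """Build single-schema rows; mapping is target -> ("left" | "right", column)."""
    flat: list[Row] = []
    for left_row, right_row in pairs:
        sides = {"left": left_row, "right": right_row}
        flat.append({target: sides[side][col] for target, (side, col) in mapping.items()})
    return flat


users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 1, "amount": 5},
    {"user_id": 3, "amount": 9},
]
pairs = inner_join(users, orders, "id", "user_id")
flat = flatten(pairs, {"user_name": ("left", "name"), "amount": ("right", "amount")})
```

Until the flatten step, every pair still carries both source rows, which mirrors how a joined frame accepts columns from either schema.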

lazy()

Convert to a lazy query plan.

untyped()

Drop type information — string-based escape hatch.

JoinedLazyFrame

JoinedLazyFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)

Bases: Generic[S, S2]

A typed lazy query plan resulting from a join of two schemas.

Supports the same operations as JoinedDataFrame except the materialized-only ones: head(), tail(), and sample(). Use collect() to materialize.

filter(predicate)

Filter rows by a boolean expression.

sort(*columns, descending=False)

Sort rows by columns or sort expressions.

limit(n)

Limit to the first n rows.

unique(*columns)

Remove duplicate rows based on the given columns.

drop_nulls(*columns)

Drop rows with null values in the given columns.

with_columns(*exprs)

Add or overwrite columns.

select(*columns)

select(c1: Column[Any]) -> LazyFrame[Any]
select(c1: Column[Any], c2: Column[Any]) -> LazyFrame[Any]
select(
    c1: Column[Any], c2: Column[Any], c3: Column[Any]
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
    c9: Column[Any],
    c10: Column[Any],
) -> LazyFrame[Any]

Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.

cast_schema(schema, mapping=None, extra='drop')

Flatten join result into a single-schema LazyFrame.

collect()

Materialize the lazy query plan into a JoinedDataFrame.

untyped()

Drop type information — string-based escape hatch.

UntypedDataFrame

UntypedDataFrame(*, _data=None)

A DataFrame with no schema parameter. String-based column access.

select(*columns)

Select columns by name.

filter(expr)

Filter rows.

with_columns(*exprs)

Add or overwrite columns.

sort(*columns, descending=False)

Sort rows by column names.

limit(n)

Limit to the first n rows.

head(n=5)

Return the first n rows.

tail(n=5)

Return the last n rows.

to_typed(schema)

Bind to a schema.

UntypedLazyFrame

UntypedLazyFrame(*, _data=None)

A LazyFrame with no schema parameter. String-based column access.

select(*columns)

Select columns by name.

filter(expr)

Filter rows.

with_columns(*exprs)

Add or overwrite columns.

sort(*columns, descending=False)

Sort rows by column names.

limit(n)

Limit to the first n rows.

collect()

Materialize the lazy query plan.

to_typed(schema)

Bind to a schema.