colnade.dataframe
DataFrame, LazyFrame, GroupBy, JoinedDataFrame, and untyped escape hatches.
DataFrame
DataFrame(*, _data=None, _schema=None, _backend=None)
Bases: Generic[S]
A typed, materialized DataFrame parameterized by a Schema.
Schema-preserving operations (filter, sort, limit, etc.) return
DataFrame[S]. Schema-transforming operations (select, group_by+agg)
return DataFrame[Any] and require cast_schema() to bind to a
named output schema.
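The split between schema-preserving and schema-transforming operations can be illustrated with a small, self-contained sketch. This is not colnade's implementation: `ToyFrame`, `Users`, `UserNames`, and the list-of-dicts storage are hypothetical stand-ins, and the toy `cast_schema` only checks column names. It shows how a class parameterized by a schema type can keep `S` on row-level operations and fall back to `Any` when the column set changes.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S")
T = TypeVar("T")

Row = dict  # one row as a plain dict of column name -> value


class Users:
    """Hypothetical input schema; only the annotation names are used."""

    id: int
    name: str


class UserNames:
    """Hypothetical output schema for a projection."""

    name: str


class ToyFrame(Generic[S]):
    """Toy stand-in for a schema-parameterized frame (not colnade's class)."""

    def __init__(self, rows: list[Row], schema: type[S] | None) -> None:
        self.rows = rows
        self.schema = schema

    def filter(self, predicate: Callable[[Row], bool]) -> ToyFrame[S]:
        # Rows change, columns do not: the schema parameter S is preserved.
        return ToyFrame([r for r in self.rows if predicate(r)], self.schema)

    def select(self, *columns: str) -> ToyFrame[Any]:
        # The column set changes, so the static schema is no longer known.
        return ToyFrame([{c: r[c] for c in columns} for r in self.rows], None)

    def cast_schema(self, schema: type[T]) -> ToyFrame[T]:
        # Re-bind to a named schema after checking the column names line up.
        expected = set(schema.__annotations__)
        for row in self.rows:
            if set(row) != expected:
                raise ValueError(f"columns {sorted(row)} do not match {sorted(expected)}")
        return ToyFrame(self.rows, schema)


users = ToyFrame([{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}], Users)
later = users.filter(lambda r: r["id"] > 1)          # still ToyFrame[Users]
names = users.select("name").cast_schema(UserNames)  # ToyFrame[Any], then ToyFrame[UserNames]
```

A type checker sees `later` as `ToyFrame[Users]`, while the intermediate result of `select` is `ToyFrame[Any]` until `cast_schema` names the output schema.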
filter(predicate)
Filter rows by a boolean expression.
sort(*columns, descending=False)
Sort rows by columns or sort expressions.
limit(n)
Limit to the first n rows.
head(n=5)
Return the first n rows (materialized only).
tail(n=5)
Return the last n rows (materialized only).
sample(n)
Return a random sample of n rows (materialized only).
unique(*columns)
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
Drop rows with null values in the given columns.
with_columns(*exprs)
Add or overwrite columns. Returns DataFrame[S] (optimistic: the static type stays S even if the new columns are not declared in S).
select(*columns)
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> DataFrame[Any]
Select columns. Returns DataFrame[Any] — use cast_schema() to bind.
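The fixed-arity signatures listed for select() follow the usual `typing.overload` pattern: one declared signature per supported number of arguments, backed by a single variadic implementation. The sketch below shows that pattern in isolation. `Column` here is a hypothetical stand-in class and `select_names` is an invented helper, not part of colnade.

```python
from __future__ import annotations

from typing import Any, Generic, TypeVar, overload

T = TypeVar("T")


class Column(Generic[T]):
    """Hypothetical typed column reference that carries only a name."""

    def __init__(self, name: str) -> None:
        self.name = name


# One declared signature per arity; these exist only for type checkers.
@overload
def select_names(c1: Column[Any], /) -> list[str]: ...
@overload
def select_names(c1: Column[Any], c2: Column[Any], /) -> list[str]: ...
@overload
def select_names(c1: Column[Any], c2: Column[Any], c3: Column[Any], /) -> list[str]: ...


def select_names(*columns: Column[Any]) -> list[str]:
    # Single runtime implementation that accepts any number of columns.
    return [c.name for c in columns]


picked = select_names(Column[int]("id"), Column[str]("name"))
```

At runtime only the last definition is called; the `@overload` stubs are never executed.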
group_by(*keys)
Group by columns for aggregation.
join(other, on, how='inner')
Join with another DataFrame on a JoinCondition.
cast_schema(schema, mapping=None, extra='drop')
Bind to a new schema via mapping resolution.
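The reference does not spell out how mapping resolution works, so the following is only a guess at the general idea, written as a standalone helper. Everything in it is an assumption rather than colnade behavior: `resolve_columns` is a hypothetical function, `mapping` is taken to map target field names to source column names, unmapped fields are matched by identical name, and `extra="forbid"` is an invented strict mode alongside the documented default `extra="drop"`.

```python
from __future__ import annotations


def resolve_columns(
    source_columns: list[str],
    target_fields: list[str],
    mapping: dict[str, str] | None = None,
    extra: str = "drop",
) -> dict[str, str]:
    """Return {target_field: source_column} for a toy schema cast.

    Assumed semantics (not taken from colnade): explicit mapping entries win,
    remaining fields are matched by name, and leftover source columns are
    either dropped or rejected depending on `extra`.
    """
    if extra not in ("drop", "forbid"):
        raise ValueError(f"unsupported extra mode: {extra!r}")

    mapping = mapping or {}
    resolved: dict[str, str] = {}
    for field in target_fields:
        source = mapping.get(field, field)  # fall back to same-name matching
        if source not in source_columns:
            raise KeyError(f"no source column for target field {field!r}")
        resolved[field] = source

    unused = [c for c in source_columns if c not in resolved.values()]
    if unused and extra == "forbid":
        raise ValueError(f"unexpected extra columns: {unused}")
    return resolved


plan = resolve_columns(
    ["user_id", "name", "debug"],
    ["id", "name"],
    mapping={"id": "user_id"},
)
```

Here `plan` pairs `id` with `user_id` and `name` with itself, and the unused `debug` column is silently left out under the `"drop"` mode.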
lazy()
Convert to a lazy query plan.
untyped()
Drop type information — string-based escape hatch.
validate()
Validate that the data conforms to the schema.
to_batches(batch_size=None)
Convert to an iterator of typed Arrow batches.
Delegates to the backend's to_arrow_batches() method, wrapping
each raw pa.RecordBatch in an ArrowBatch[S] to preserve
schema type information across the boundary.
from_batches(batches, schema, backend) (classmethod)
Create a DataFrame from an iterator of typed Arrow batches.
Unwraps each ArrowBatch[S] to its raw pa.RecordBatch and
delegates to the backend's from_arrow_batches() method.
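The wrap and unwrap steps described for to_batches() and from_batches() can be sketched without any Arrow dependency. In this toy version a "raw batch" is just a list of row dicts standing in for `pa.RecordBatch`, and `TypedBatch`, `Events`, and the two module-level functions are hypothetical names, not colnade's API. The point is only that the schema travels alongside each raw batch and is discarded again on the way back in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, Iterable, Iterator, TypeVar

S = TypeVar("S")


class Events:
    """Hypothetical schema marker."""

    ts: int


@dataclass(frozen=True)
class TypedBatch(Generic[S]):
    """Toy analogue of ArrowBatch[S]: a raw batch plus the schema it belongs to."""

    raw: list  # stand-in for pa.RecordBatch
    schema: type


def to_batches(rows: list[dict[str, Any]], schema: type[S], batch_size: int) -> Iterator[TypedBatch[S]]:
    # Slice the rows into raw batches and wrap each one with its schema.
    for start in range(0, len(rows), batch_size):
        yield TypedBatch(rows[start:start + batch_size], schema)


def from_batches(batches: Iterable[TypedBatch[S]]) -> list[dict[str, Any]]:
    # Unwrap each typed batch back to its raw rows and concatenate them.
    rows: list[dict[str, Any]] = []
    for batch in batches:
        rows.extend(batch.raw)
    return rows


rows = [{"ts": i} for i in range(5)]
batches = list(to_batches(rows, Events, batch_size=2))
round_trip = from_batches(batches)
```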
LazyFrame
LazyFrame(*, _data=None, _schema=None, _backend=None)
Bases: Generic[S]
A typed, lazy query plan parameterized by a Schema.
Supports the same operations as DataFrame except the materialized-only ones: head(), tail(), sample(), and to_batches(). Use collect() to materialize.
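As a rough illustration of what "lazy query plan" means here, the sketch below records operations as deferred steps and runs them only when collect() is called. `ToyLazy` is a hypothetical class over a list of dicts. A real backend would typically optimize the plan before executing it, which this toy does not attempt.

```python
from __future__ import annotations

from typing import Callable

Row = dict
Step = Callable[[list], list]


class ToyLazy:
    """Toy lazy plan: stores steps, executes nothing until collect()."""

    def __init__(self, source: list[Row], steps: tuple[Step, ...] = ()) -> None:
        self._source = source
        self._steps = steps

    def filter(self, predicate: Callable[[Row], bool]) -> ToyLazy:
        # Record the filter; the predicate is not called yet.
        step = lambda rows: [r for r in rows if predicate(r)]
        return ToyLazy(self._source, self._steps + (step,))

    def limit(self, n: int) -> ToyLazy:
        # Record the limit as another deferred step.
        step = lambda rows: rows[:n]
        return ToyLazy(self._source, self._steps + (step,))

    def collect(self) -> list[Row]:
        # Materialize: run every recorded step in order.
        rows = list(self._source)
        for step in self._steps:
            rows = step(rows)
        return rows


calls: list[int] = []


def is_even(row: Row) -> bool:
    calls.append(row["x"])  # track when the predicate actually runs
    return row["x"] % 2 == 0


plan = ToyLazy([{"x": i} for i in range(6)]).filter(is_even).limit(2)
calls_before_collect = list(calls)
result = plan.collect()
```

Building `plan` does not touch the data; `is_even` runs only inside `collect()`.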
filter(predicate)
Filter rows by a boolean expression.
sort(*columns, descending=False)
Sort rows by columns or sort expressions.
limit(n)
Limit to the first n rows.
unique(*columns)
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
Drop rows with null values in the given columns.
with_columns(*exprs)
Add or overwrite columns. Returns LazyFrame[S] (optimistic: the static type stays S even if the new columns are not declared in S).
select(*columns)
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> LazyFrame[Any]
Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.
group_by(*keys)
Group by columns for aggregation.
join(other, on, how='inner')
Join with another LazyFrame on a JoinCondition.
cast_schema(schema, mapping=None, extra='drop')
Bind to a new schema via mapping resolution.
collect()
Materialize the lazy query plan into a DataFrame.
untyped()
Drop type information — string-based escape hatch.
validate()
Validate that the data conforms to the schema.
GroupBy
GroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)
Bases: Generic[S]
GroupBy on a materialized DataFrame.
agg(*exprs)
Aggregate grouped data. Returns DataFrame[Any] — use cast_schema().
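One way to see why an aggregation result is typed as DataFrame[Any] is that its columns are the group keys plus the aggregate outputs, which generally match neither the input schema nor any schema known in advance. The sketch below shows this with plain dicts. `group_agg` and its `aggs` argument format are hypothetical and unrelated to colnade's expression API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Row = dict


def group_agg(
    rows: list[Row],
    keys: list[str],
    aggs: dict[str, tuple[str, Callable[[list], Any]]],
) -> list[Row]:
    """Group rows by `keys`, then compute {output_name: (column, function)}."""
    groups: dict[tuple, list[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    out: list[Row] = []
    for key_values, members in groups.items():
        result = dict(zip(keys, key_values))  # key columns come first
        for out_name, (column, fn) in aggs.items():
            result[out_name] = fn([m[column] for m in members])
        out.append(result)
    return out


sales = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 5},
    {"region": "eu", "amount": 7},
]
totals = group_agg(sales, ["region"], {"total": ("amount", sum), "orders": ("amount", len)})
```

The input rows have columns `region` and `amount`, while each output row has `region`, `total`, and `orders`, so the result needs a new named schema before it can be typed again.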
LazyGroupBy
LazyGroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)
Bases: Generic[S]
GroupBy on a lazy query plan.
agg(*exprs)
Aggregate grouped data. Returns LazyFrame[Any] — use cast_schema().
JoinedDataFrame
JoinedDataFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)
Bases: Generic[S, S2]
A typed DataFrame resulting from a join of two schemas.
Operations accept columns from either schema S or S2. Schema-preserving
operations return JoinedDataFrame[S, S2]. Schema-transforming operations
(select) return DataFrame[Any] and require cast_schema() to bind.
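A two-parameter generic like this can be pictured as a container of (left row, right row) pairs whose operations see both sides until the result is flattened. The sketch below is a toy under that assumption: `ToyJoined`, `inner_join`, and `flatten` are hypothetical names, and `flatten` only loosely plays the role that cast_schema() has on a join result.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S")
S2 = TypeVar("S2")
Row = dict


class ToyJoined(Generic[S, S2]):
    """Toy join result: keeps left and right rows side by side."""

    def __init__(self, pairs: list[tuple[Row, Row]]) -> None:
        self.pairs = pairs

    def filter(self, predicate: Callable[[Row, Row], bool]) -> ToyJoined[S, S2]:
        # The predicate may read columns from either side; both schemas are kept.
        return ToyJoined([(lrow, rrow) for lrow, rrow in self.pairs if predicate(lrow, rrow)])

    def flatten(self, mapping: dict[str, tuple[str, str]]) -> list[Row]:
        # mapping: output column -> ("left" or "right", source column)
        out: list[Row] = []
        for lrow, rrow in self.pairs:
            sides = {"left": lrow, "right": rrow}
            out.append({name: sides[side][col] for name, (side, col) in mapping.items()})
        return out


def inner_join(left: list[Row], right: list[Row], left_key: str, right_key: str) -> ToyJoined[Any, Any]:
    # Nested-loop inner join: keep every pair whose keys are equal.
    return ToyJoined([(lrow, rrow) for lrow in left for rrow in right if lrow[left_key] == rrow[right_key]])


users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "amount": 30}, {"user_id": 2, "amount": 5}, {"user_id": 1, "amount": 8}]

joined = inner_join(users, orders, "id", "user_id").filter(lambda u, o: o["amount"] > 6)
flat = joined.flatten({"name": ("left", "name"), "amount": ("right", "amount")})
```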
filter(predicate)
Filter rows by a boolean expression.
sort(*columns, descending=False)
Sort rows by columns or sort expressions.
limit(n)
Limit to the first n rows.
head(n=5)
Return the first n rows (materialized only).
tail(n=5)
Return the last n rows (materialized only).
sample(n)
Return a random sample of n rows (materialized only).
unique(*columns)
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
Drop rows with null values in the given columns.
with_columns(*exprs)
Add or overwrite columns.
select(*columns)
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> DataFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> DataFrame[Any]
Select columns. Returns DataFrame[Any] — use cast_schema() to bind.
cast_schema(schema, mapping=None, extra='drop')
Flatten join result into a single-schema DataFrame.
lazy()
Convert to a lazy query plan.
untyped()
Drop type information — string-based escape hatch.
JoinedLazyFrame
JoinedLazyFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)
Bases: Generic[S, S2]
A typed lazy query plan resulting from a join of two schemas.
Supports the same operations as JoinedDataFrame except the materialized-only ones: head(), tail(), and sample(). Use collect() to materialize.
filter(predicate)
Filter rows by a boolean expression.
sort(*columns, descending=False)
Sort rows by columns or sort expressions.
limit(n)
Limit to the first n rows.
unique(*columns)
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
Drop rows with null values in the given columns.
with_columns(*exprs)
Add or overwrite columns.
select(*columns)
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
) -> LazyFrame[Any]
select(
    c1: Column[Any],
    c2: Column[Any],
    c3: Column[Any],
    c4: Column[Any],
    c5: Column[Any],
    c6: Column[Any],
    c7: Column[Any],
    c8: Column[Any],
) -> LazyFrame[Any]
Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.
cast_schema(schema, mapping=None, extra='drop')
Flatten join result into a single-schema LazyFrame.
collect()
Materialize the lazy query plan into a JoinedDataFrame.
untyped()
Drop type information — string-based escape hatch.
UntypedDataFrame
UntypedDataFrame(*, _data=None)
A DataFrame with no schema parameter. String-based column access.
select(*columns)
Select columns by name.
filter(expr)
Filter rows.
with_columns(*exprs)
Add or overwrite columns.
sort(*columns, descending=False)
Sort rows by column names.
limit(n)
Limit to the first n rows.
head(n=5)
Return the first n rows.
tail(n=5)
Return the last n rows.
to_typed(schema)
Bind to a schema.
UntypedLazyFrame
UntypedLazyFrame(*, _data=None)
A LazyFrame with no schema parameter. String-based column access.
select(*columns)
Select columns by name.
filter(expr)
Filter rows.
with_columns(*exprs)
Add or overwrite columns.
sort(*columns, descending=False)
Sort rows by column names.
limit(n)
Limit to the first n rows.
collect()
Materialize the lazy query plan.
to_typed(schema)
Bind to a schema.