colnade.dataframe¶
DataFrame, LazyFrame, GroupBy, LazyGroupBy, JoinedDataFrame, JoinedLazyFrame, and concat.
DataFrame¶
DataFrame(*, _data=None, _schema=None, _backend=None)
¶
Bases: Generic[S]
A typed, materialized DataFrame parameterized by a Schema.
Schema-preserving operations (filter, sort, limit, etc.) return
DataFrame[S]. Schema-transforming operations (select, group_by+agg)
return DataFrame[Any] and require cast_schema() to bind to a
named output schema.
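The split between schema-preserving and schema-transforming operations can be sketched in plain Python typing terms. Everything below (`Frame`, the dict-of-rows storage) is a hypothetical stand-in for illustration, not the colnade implementation:

```python
from typing import Any, Generic, TypeVar

S = TypeVar("S")

class Frame(Generic[S]):
    """Hypothetical stand-in for a schema-parameterized frame."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def filter(self, keep) -> "Frame[S]":
        # Schema-preserving: the type parameter S carries through.
        return Frame([r for r in self.rows if keep(r)])

    def select(self, *names: str) -> "Frame[Any]":
        # Schema-transforming: the result is erased to Frame[Any]
        # until it is bound to a named schema again.
        return Frame([{n: r[n] for n in names} for r in self.rows])

f = Frame([{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}])
kept = f.filter(lambda r: r["id"] > 1)   # still Frame[S]
narrowed = f.select("name")              # Frame[Any], awaiting cast_schema()
```

The type checker, not the runtime, is what distinguishes the two return types; erasing to `Frame[Any]` is what forces the explicit `cast_schema()` step.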
height
property
¶
Return the number of rows.
width
property
¶
Return the number of columns.
Raises TypeError on DataFrame[Any] (schema erased).
Use cast_schema() first to bind to a named schema.
shape
property
¶
Return (rows, columns).
to_native()
¶
Return the underlying backend-native data object (e.g. pl.DataFrame).
__len__()
¶
Return the number of rows.
is_empty()
¶
Return True if the DataFrame has zero rows.
iter_rows_as(row_type)
¶
Iterate rows, constructing row_type instances via row_type(**row_dict).
Works with Schema.Row (frozen dataclass), dict, plain
dataclasses, NamedTuple, Pydantic models, or any callable
accepting **kwargs.
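The construction contract is simply `row_type(**row_dict)`, so any callable with matching keyword parameters works. A minimal illustration (the `UserRow` dataclass and the sample rows are made up):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserRow:
    id: int
    name: str

def iter_rows_as(row_dicts, row_type):
    # The documented contract: each row dict is splatted into the
    # row constructor as keyword arguments.
    for row_dict in row_dicts:
        yield row_type(**row_dict)

raw = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
users = list(iter_rows_as(raw, UserRow))  # frozen dataclass instances
plain = list(iter_rows_as(raw, dict))     # plain dicts work too
```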
item(column=None)
¶
Extract a scalar value from a single-row DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| column | Column[Any] \| None | Column to extract. If None, the frame must be exactly 1×1. | None |

Returns:

| Type | Description |
|---|---|
| Any | A plain Python scalar whose type corresponds to the column dtype. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the shape constraint is not met (1×1 when column is None, a single row otherwise). |
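The shape check behind `item()` can be modeled on a plain dict-of-columns; this is a sketch of the documented constraint, not the colnade implementation:

```python
def item(columns, column=None):
    # columns: dict mapping column name -> list of values.
    n_rows = len(next(iter(columns.values())))
    if column is None:
        # Without a named column, the frame must be exactly 1x1.
        if n_rows != 1 or len(columns) != 1:
            raise ValueError("item() without a column requires a 1x1 frame")
        return next(iter(columns.values()))[0]
    # With a named column, exactly one row is required.
    if n_rows != 1:
        raise ValueError("item(column) requires exactly one row")
    return columns[column][0]
```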
filter(predicate)
¶
Filter rows by a boolean expression.
sort(*columns, descending=False)
¶
Sort rows by columns or sort expressions.
limit(n)
¶
Limit to the first n rows.
head(n=5)
¶
Return the first n rows (materialized only).
tail(n=5)
¶
Return the last n rows (materialized only).
sample(n)
¶
Return a random sample of n rows (materialized only).
unique(*columns)
¶
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
¶
Drop rows with null values in the given columns.
with_columns(*exprs)
¶
Add or overwrite columns. Returns DataFrame[S] (optimistic).
select(*columns)
¶
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
c8: Column[Any],
) -> DataFrame[Any]
Select columns. Returns DataFrame[Any] — use cast_schema() to bind.
agg(*exprs)
¶
Aggregate all rows into a single row.
group_by(*keys)
¶
Group by columns for aggregation.
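The group-then-aggregate flow can be sketched over columnar dicts; `group_agg`, the key/value split, and the sample data are illustrative assumptions, not the colnade API:

```python
from collections import defaultdict

def group_agg(columns, key, value, agg_fn):
    # Bucket each value under its group key, then reduce each bucket.
    groups = defaultdict(list)
    for k, v in zip(columns[key], columns[value]):
        groups[k].append(v)
    return {key: list(groups), value: [agg_fn(vs) for vs in groups.values()]}

sales = {"city": ["oslo", "bergen", "oslo"], "amount": [10, 20, 5]}
totals = group_agg(sales, "city", "amount", sum)
```

Because the result has a new set of columns, the real `agg()` returns a frame with its schema erased, which is why `cast_schema()` follows it.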
join(other, on, how='inner')
¶
Join with another DataFrame on a JoinCondition.
cast_schema(schema, mapping=None, extra='drop')
¶
Bind to a new schema via mapping resolution.
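Mapping resolution can be pictured as renaming source columns onto target fields, with unmapped names matching by identity and `extra="drop"` discarding anything unclaimed. A pure-Python sketch under those assumptions (the function body is illustrative, not colnade's):

```python
def cast_schema(columns, target_fields, mapping=None, extra="drop"):
    # mapping: target field name -> source column name; unmapped
    # targets resolve by matching name. extra="drop" silently omits
    # any source column that no target field claims.
    mapping = mapping or {}
    out = {}
    for name in target_fields:
        source = mapping.get(name, name)
        if source not in columns:
            raise ValueError(f"no source column for target field {name!r}")
        out[name] = columns[source]
    return out

raw = {"uid": [1, 2], "full_name": ["ada", "grace"], "tmp": [0, 0]}
bound = cast_schema(raw, ["id", "name"],
                    mapping={"id": "uid", "name": "full_name"})
```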
lazy()
¶
Convert to a lazy query plan.
with_raw(fn)
¶
Apply a function to the raw engine DataFrame and re-wrap.
The function receives the underlying engine DataFrame (e.g.
pl.DataFrame, pd.DataFrame) and must return the same type.
The result is wrapped back into DataFrame[S] with the same
schema and backend. If validation is enabled, the result is
validated before returning.
A bounded escape hatch — like Rust's unsafe block.
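What makes the hatch "bounded" is the same-type contract on the callback. A minimal model with a dict standing in for the typed wrapper (all names here are illustrative):

```python
def with_raw(frame, fn):
    # fn receives the raw engine object and must return the same type;
    # the wrapper's schema travels along unchanged.
    raw = frame["data"]
    result = fn(raw)
    if type(result) is not type(raw):
        raise TypeError("with_raw fn must return the same engine type")
    return {"data": result, "schema": frame["schema"]}

frame = {"data": [3, 1, 2], "schema": "Sales"}
sorted_frame = with_raw(frame, sorted)  # list in, list out: allowed
```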
validate()
¶
Validate that the data conforms to the schema.
Always runs structural checks (columns, types, nullability) and
value-level constraint checks (Field() constraints,
@schema_check) regardless of the validation level toggle.
to_batches(batch_size=None)
¶
Convert to an iterator of typed Arrow batches.
Delegates to the backend's to_arrow_batches() method, wrapping
each raw pa.RecordBatch in an ArrowBatch[S] to preserve
schema type information across the boundary.
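The batching itself is plain chunking of a row stream; the real method yields typed Arrow batches rather than lists, but the iteration shape is the same. A self-contained sketch:

```python
from itertools import islice

def to_batches(rows, batch_size):
    # Yield successive fixed-size chunks; the real method wraps each
    # chunk as a typed Arrow batch instead of a plain list.
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk

batches = list(to_batches(range(5), 2))
```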
from_dict(data, schema, backend)
classmethod
¶
Create a DataFrame from a columnar dict.
The backend reads column dtypes from schema and coerces values to the correct native types. Validates if validation is enabled.
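The coercion step can be sketched with a schema modeled as a name-to-type dict; the real schema object and backend coercion are richer, so treat this only as the shape of the operation:

```python
def from_dict(data, schema):
    # schema: column name -> target Python type; each column's values
    # are coerced to the dtype the schema declares.
    return {name: [tp(v) for v in data[name]] for name, tp in schema.items()}

frame = from_dict({"n": ["1", "2"], "x": [1, 2]}, {"n": int, "x": float})
```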
from_batches(batches, schema, backend)
classmethod
¶
Create a DataFrame from an iterator of typed Arrow batches.
Unwraps each ArrowBatch[S] to its raw pa.RecordBatch and
delegates to the backend's from_arrow_batches() method.
LazyFrame¶
LazyFrame(*, _data=None, _schema=None, _backend=None)
¶
Bases: Generic[S]
A typed, lazy query plan parameterized by a Schema.
Supports the same operations as DataFrame. Use collect() to materialize.
width
property
¶
Return the number of columns.
Derivable from the schema without materializing. Raises TypeError
on LazyFrame[Any] (schema erased).
height
property
¶
Return the number of rows.
This triggers computation on lazy backends (e.g. Dask).
to_native()
¶
Return the underlying backend-native data object (e.g. pl.LazyFrame).
__len__()
¶
Return the number of rows.
to_batches(batch_size=None)
¶
Convert to an iterator of typed Arrow batches.
This triggers computation on lazy backends (e.g. Dask).
item(column=None)
¶
Extract a scalar value from a single-row LazyFrame.
This triggers computation on lazy backends.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| column | Column[Any] \| None | Column to extract. If None, the frame must be exactly 1×1. | None |

Returns:

| Type | Description |
|---|---|
| Any | A plain Python scalar whose type corresponds to the column dtype. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the shape constraint is not met (1×1 when column is None, a single row otherwise). |
filter(predicate)
¶
Filter rows by a boolean expression.
sort(*columns, descending=False)
¶
Sort rows by columns or sort expressions.
limit(n)
¶
Limit to the first n rows.
head(n=5)
¶
Return the first n rows (alias for limit).
tail(n=5)
¶
Return the last n rows.
unique(*columns)
¶
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
¶
Drop rows with null values in the given columns.
with_columns(*exprs)
¶
Add or overwrite columns. Returns LazyFrame[S] (optimistic).
select(*columns)
¶
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
c8: Column[Any],
) -> LazyFrame[Any]
Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.
agg(*exprs)
¶
Aggregate all rows into a single row.
group_by(*keys)
¶
Group by columns for aggregation.
join(other, on, how='inner')
¶
Join with another LazyFrame on a JoinCondition.
cast_schema(schema, mapping=None, extra='drop')
¶
Bind to a new schema via mapping resolution.
collect()
¶
Materialize the lazy query plan into a DataFrame.
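Conceptually, a lazy frame is source data plus a list of pending steps that `collect()` replays in order. A toy model of that deferral (the `Plan` class is a hypothetical illustration, not colnade's LazyFrame):

```python
class Plan:
    # Source rows plus pending steps; nothing runs until collect().
    def __init__(self, rows):
        self.rows, self.steps = rows, []

    def filter(self, keep):
        self.steps.append(lambda rs: [r for r in rs if keep(r)])
        return self

    def limit(self, n):
        self.steps.append(lambda rs: rs[:n])
        return self

    def collect(self):
        rows = self.rows
        for step in self.steps:
            rows = step(rows)
        return rows

result = Plan([5, 1, 9, 3]).filter(lambda r: r > 2).limit(2).collect()
```

Deferring execution like this is what lets a real backend fuse and reorder the steps before touching the data.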
with_raw(fn)
¶
Apply a function to the raw engine LazyFrame and re-wrap.
The function receives the underlying engine LazyFrame and must
return the same type. The result is wrapped back into
LazyFrame[S] with the same schema and backend.
Validation is deferred — it runs at collect() time if enabled,
not at with_raw() time.
validate()
¶
Validate that the data conforms to the schema.
Always runs structural checks and value-level constraint checks regardless of the validation level toggle.
GroupBy¶
GroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)
¶
Bases: Generic[S]
GroupBy on a materialized DataFrame.
agg(*exprs)
¶
Aggregate grouped data. Returns DataFrame[Any] — use cast_schema().
LazyGroupBy¶
LazyGroupBy(*, _data=None, _schema=None, _keys=(), _backend=None)
¶
Bases: Generic[S]
GroupBy on a lazy query plan.
agg(*exprs)
¶
Aggregate grouped data. Returns LazyFrame[Any] — use cast_schema().
JoinedDataFrame¶
JoinedDataFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)
¶
Bases: Generic[S, S2]
A transitional typed DataFrame resulting from a join of two schemas.
Operations accept columns from either schema S or S2. Available operations
are limited to filtering, sorting, and other row-level transforms. Use
cast_schema() to flatten into a DataFrame[S3] before group_by,
head/tail/sample, or passing to functions that expect a single schema.
to_native()
¶
Return the underlying backend-native data object (e.g. pl.DataFrame).
filter(predicate)
¶
Filter rows by a boolean expression.
sort(*columns, descending=False)
¶
Sort rows by columns or sort expressions.
limit(n)
¶
Limit to the first n rows.
unique(*columns)
¶
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
¶
Drop rows with null values in the given columns.
with_columns(*exprs)
¶
Add or overwrite columns.
select(*columns)
¶
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
) -> DataFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
c8: Column[Any],
) -> DataFrame[Any]
Select columns. Returns DataFrame[Any] — use cast_schema() to bind.
cast_schema(schema, mapping=None, extra='drop')
¶
Flatten join result into a single-schema DataFrame.
lazy()
¶
Convert to a lazy query plan.
JoinedLazyFrame¶
JoinedLazyFrame(*, _data=None, _schema_left=None, _schema_right=None, _backend=None)
¶
Bases: Generic[S, S2]
A transitional typed lazy query plan resulting from a join of two schemas.
Available operations are limited to filtering, sorting, and other row-level
transforms. Use cast_schema() to flatten into a LazyFrame[S3]
before group_by or passing to functions that expect a single schema.
to_native()
¶
Return the underlying backend-native data object (e.g. pl.LazyFrame).
filter(predicate)
¶
Filter rows by a boolean expression.
sort(*columns, descending=False)
¶
Sort rows by columns or sort expressions.
limit(n)
¶
Limit to the first n rows.
unique(*columns)
¶
Remove duplicate rows based on the given columns.
drop_nulls(*columns)
¶
Drop rows with null values in the given columns.
with_columns(*exprs)
¶
Add or overwrite columns.
select(*columns)
¶
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
) -> LazyFrame[Any]
select(
c1: Column[Any],
c2: Column[Any],
c3: Column[Any],
c4: Column[Any],
c5: Column[Any],
c6: Column[Any],
c7: Column[Any],
c8: Column[Any],
) -> LazyFrame[Any]
Select columns. Returns LazyFrame[Any] — use cast_schema() to bind.
cast_schema(schema, mapping=None, extra='drop')
¶
Flatten join result into a single-schema LazyFrame.
collect()
¶
Materialize the lazy query plan into a JoinedDataFrame.
concat¶
concat(*frames)
¶
Concatenate DataFrames or LazyFrames vertically (stack rows).
All inputs must share the same schema class (identity check, not
structural equality) and the same frame type (all DataFrame or all
LazyFrame). The backend is taken from the first frame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| *frames | DataFrame[S] \| LazyFrame[S] | Two or more frames to stack. All must be parameterized by the same schema class. | () |

Returns:

| Type | Description |
|---|---|
| DataFrame[S] \| LazyFrame[S] | A new frame containing the rows of the input frames, in order. |

Raises:

| Type | Description |
|---|---|
| ValueError | If fewer than 2 frames are provided, or if any frame's schema does not match the first frame's schema. |
| TypeError | If frames mix DataFrame and LazyFrame. |
| RuntimeError | If the first frame has no backend attached. |
Usage::
combined = concat(df_jan, df_feb, df_mar) # DataFrame[Sales]
combined = concat(lazy_jan, lazy_feb) # LazyFrame[Sales]
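The identity check (not structural equality) and the row stacking can be modeled in a few lines; dicts stand in for typed frames and `Sales` for a schema class, all purely for illustration:

```python
class Sales:  # stand-in schema class
    pass

def concat(*frames):
    if len(frames) < 2:
        raise ValueError("concat requires at least two frames")
    first = frames[0]["schema"]
    for f in frames[1:]:
        if f["schema"] is not first:  # identity, not structural equality
            raise ValueError("all frames must share the same schema class")
    return {"schema": first, "rows": [r for f in frames for r in f["rows"]]}

jan = {"schema": Sales, "rows": [1, 2]}
feb = {"schema": Sales, "rows": [3]}
combined = concat(jan, feb)
```

Checking `is` on the schema class means two schemas with identical fields but different class objects are still rejected, which keeps the result's type parameter unambiguous.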