Quick Start
This guide walks through Colnade's core workflow in 5 minutes.
1. Define a schema
Schemas declare the shape of your data with typed columns:
```python
import colnade as cn

class Users(cn.Schema):
    id: cn.Column[cn.UInt64]
    name: cn.Column[cn.Utf8]
    age: cn.Column[cn.UInt64]
    score: cn.Column[cn.Float64]
```
Each Column[DType] annotation creates a typed descriptor. Users.age is a Column[UInt64], so the type checker can verify both that the column exists and which dtype it holds.
2. Read typed data
```python
from colnade_polars import read_parquet

df = read_parquet("users.parquet", Users)
# df is DataFrame[Users]: the type checker knows the schema
```
The read_parquet function returns a DataFrame[Users] with the Polars backend attached. When validation is enabled, it also checks that the data matches the schema.
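A reader can carry the schema into the return type if its signature takes the schema class as `type[S]` and returns `DataFrame[S]`. The minimal sketch below shows that pattern with plain `typing` generics. `read_records` and the toy `Schema` and `DataFrame` classes are hypothetical: they do no Parquet I/O and are not Colnade's actual `read_parquet` signature.

```python
from typing import Any, Generic, TypeVar


class Schema:
    """Toy schema base class."""


S = TypeVar("S", bound=Schema)


class DataFrame(Generic[S]):
    """Toy frame: holds rows plus the schema class it was read with."""

    def __init__(self, rows: list[dict[str, Any]], schema: type[S]) -> None:
        self.rows = rows
        self.schema = schema


def read_records(rows: list[dict[str, Any]], schema: type[S]) -> DataFrame[S]:
    # Passing Users as `schema` binds S to Users, so a type checker infers
    # DataFrame[Users] for the result with no annotation at the call site.
    return DataFrame(rows, schema)


class Users(Schema):
    """Toy stand-in for the Users schema."""


df = read_records([{"id": 1, "name": "Ada"}], Users)  # DataFrame[Users]
```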
3. Transform with type safety
```python
# Filter: column references are attributes, not strings
adults = df.filter(Users.age >= 30)

# Sort with typed sort expressions
by_score = df.sort(Users.score.desc())

# Compute new values
doubled = df.with_columns((Users.score * 2).alias(Users.score))
```
All these operations return DataFrame[Users] — the schema type is preserved.
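Attribute-based column references work when comparing or sorting on a column object builds an expression instead of evaluating immediately. The minimal sketch below applies that idea to a list of dict rows. It assumes toy `Col` and `Frame` classes, which are hypothetical and not Colnade's expression API:

- `Col.__ge__` returns a row predicate.
- `desc()` returns a sort key.
- `filter` and `sort` both return `Frame[S]` with the same schema parameter, which is what lets a type checker carry the schema through.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S")
Row = dict[str, Any]


class Col:
    """Toy column reference: builds expressions instead of comparing directly."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __ge__(self, value: Any) -> Callable[[Row], bool]:
        # Users.age >= 30 becomes a predicate that is applied to each row later.
        return lambda row: row[self.name] >= value

    def desc(self) -> tuple[str, bool]:
        # A sort key: (column name, descending flag).
        return (self.name, True)


class Frame(Generic[S]):
    """Toy frame over dict rows; S is the schema type parameter."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows

    def filter(self, predicate: Callable[[Row], bool]) -> Frame[S]:
        # Rows are dropped but the column set is unchanged, so S is kept.
        return Frame([row for row in self.rows if predicate(row)])

    def sort(self, key: tuple[str, bool]) -> Frame[S]:
        name, descending = key
        return Frame(sorted(self.rows, key=lambda row: row[name], reverse=descending))


class Users:
    age = Col("age")
    score = Col("score")


df: Frame[Users] = Frame([
    {"age": 25, "score": 1.5},
    {"age": 40, "score": 3.0},
    {"age": 31, "score": 2.0},
])
adults = df.filter(Users.age >= 30)      # Frame[Users]
by_score = df.sort(Users.score.desc())   # Frame[Users]
```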
4. Select and bind to an output schema
Operations like filter and sort keep all columns, so they preserve DataFrame[Users]. But select changes which columns exist, so it returns DataFrame[Any]. Use cast_schema() to bind the result to a new schema:
```python
class UserSummary(cn.Schema):
    name: cn.Column[cn.Utf8]
    score: cn.Column[cn.Float64]

summary = df.select(Users.name, Users.score).cast_schema(UserSummary)
# summary is DataFrame[UserSummary]
```
5. Write results
What the type checker catches
If you misspell a column name:
If you pass the wrong schema type:
```python
def process_orders(df: DataFrame[Orders]) -> None: ...

process_orders(users_df)  # DataFrame[Users] ≠ DataFrame[Orders]
# ty error: Object of type `DataFrame[Users]` is not assignable to `DataFrame[Orders]`
```
Enable runtime validation
By default, runtime validation is off for zero overhead. Enable it in development to catch schema mismatches at data boundaries:
Or via environment variable:
See Validation for details on validation levels.
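An "off by default" switch can cost nothing when disabled if the check returns before touching any rows. The minimal sketch below shows a process-wide level that can be set in code or read from an environment variable. Several details are hypothetical and are not Colnade's configuration mechanism:

- the `TOY_VALIDATION` variable name;
- the two-level `LEVELS` tuple;
- the rule that an explicit call overrides the environment;
- the `set_validation`, `get_validation` and `check_columns` helpers.

```python
import os

LEVELS = ("off", "structural")
ENV_VAR = "TOY_VALIDATION"  # hypothetical variable name

_explicit_level = None  # set by set_validation(); None means "use the environment"


def set_validation(level: str) -> None:
    """Set the level for this process; in this sketch it overrides the environment."""
    global _explicit_level
    if level not in LEVELS:
        raise ValueError(f"unknown validation level: {level!r}")
    _explicit_level = level


def get_validation() -> str:
    # An explicit call wins; otherwise read the environment, defaulting to "off".
    level = _explicit_level or os.environ.get(ENV_VAR, "off")
    if level not in LEVELS:
        raise ValueError(f"unknown validation level: {level!r}")
    return level


def check_columns(rows: list[dict], expected: list[str]) -> None:
    if get_validation() == "off":
        return  # disabled: no per-row work at all
    for row in rows:
        missing = [name for name in expected if name not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
```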
Handling validation errors
When validation is enabled, schema mismatches raise SchemaError with structured information:
```python
import colnade as cn
from colnade_polars import read_parquet

cn.set_validation("structural")

try:
    df = read_parquet("data.parquet", Users)  # Users is the schema from step 1
except cn.SchemaError as e:
    print(e.missing_columns)  # e.g. ["score"]
    print(e.type_mismatches)  # e.g. {"age": ("UInt64", "Utf8")}
    print(e.null_violations)  # e.g. ["name"]
```
See SchemaError for the full list of attributes.
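A structural check can collect every problem before raising, so that a single exception reports missing columns, dtype mismatches and null violations together. The minimal sketch below shows that approach. The following are hypothetical stand-ins, not Colnade's `SchemaError` or its validation logic:

- `ToySchemaError` and `validate`;
- the dict-based layouts for data (name to dtype and values) and schema (name to dtype and nullable flag);
- recording each mismatch as an (expected, actual) tuple.

```python
from typing import Any


class ToySchemaError(Exception):
    """Toy error carrying three structured fields."""

    def __init__(self, missing_columns, type_mismatches, null_violations) -> None:
        super().__init__(
            f"{len(missing_columns)} missing, {len(type_mismatches)} mismatched, "
            f"{len(null_violations)} with nulls"
        )
        self.missing_columns = missing_columns
        self.type_mismatches = type_mismatches
        self.null_violations = null_violations


def validate(
    data: dict[str, tuple[str, list[Any]]], schema: dict[str, tuple[str, bool]]
) -> None:
    missing = [name for name in schema if name not in data]
    # Mismatches are recorded as (expected, actual) in this sketch.
    mismatches = {
        name: (expected, data[name][0])
        for name, (expected, _) in schema.items()
        if name in data and data[name][0] != expected
    }
    nulls = [
        name
        for name, (_, nullable) in schema.items()
        if name in data and not nullable and any(v is None for v in data[name][1])
    ]
    # Raise once, with every problem found, instead of stopping at the first.
    if missing or mismatches or nulls:
        raise ToySchemaError(missing, mismatches, nulls)


schema = {"name": ("Utf8", False), "age": ("UInt64", False), "score": ("Float64", False)}
data = {"name": ("Utf8", ["Ada", None]), "age": ("Utf8", ["36", "41"])}

try:
    validate(data, schema)
    error = None
except ToySchemaError as exc:
    error = exc  # keep a reference: `exc` is unbound after the except block
```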
Next steps
- User Guide — understand the architecture
- Tutorials — worked examples with real data
- API Reference — complete API documentation