Quick Start

This guide walks through Colnade's core workflow in 5 minutes.

1. Define a schema

Schemas declare the shape of your data with typed columns:

import colnade as cn

class Users(cn.Schema):
    id: cn.Column[cn.UInt64]
    name: cn.Column[cn.Utf8]
    age: cn.Column[cn.UInt64]
    score: cn.Column[cn.Float64]

Each Column[DType] annotation creates a typed descriptor. Users.age is a Column[UInt64] that the type checker can verify.

2. Read typed data

from colnade_polars import read_parquet

df = read_parquet("users.parquet", Users)
# df is DataFrame[Users] — the type checker knows the schema

The read_parquet function returns a DataFrame[Users] with the Polars backend attached. When validation is enabled, it also checks that the data matches the schema.

3. Transform with type safety

# Filter — column references are attributes, not strings
adults = df.filter(Users.age >= 30)

# Sort — with typed sort expressions
by_score = df.sort(Users.score.desc())

# Compute new values
doubled = df.with_columns((Users.score * 2).alias(Users.score))

All these operations return DataFrame[Users] — the schema type is preserved.
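Schema preservation of this kind falls out of ordinary generics. The toy sketch below uses a hypothetical Frame class over row dicts (not Colnade's DataFrame) whose filter, sort, and single-column with_column methods are all annotated as returning Frame[S] for the same type parameter S, so a static checker such as ty infers Frame[Users] for every result. Predicates and keys are plain lambdas here rather than Colnade's column expressions.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S")  # the schema parameter, like Users in DataFrame[Users]

Row = dict[str, Any]


class Users:
    """Hypothetical schema marker; only its identity matters to the type checker."""


class Frame(Generic[S]):
    """Toy row-based frame tagged with a schema type S."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows

    def filter(self, pred: Callable[[Row], bool]) -> Frame[S]:
        # Drops rows, never columns, so S carries through unchanged.
        return Frame([r for r in self.rows if pred(r)])

    def sort(self, key: Callable[[Row], Any], descending: bool = False) -> Frame[S]:
        return Frame(sorted(self.rows, key=key, reverse=descending))

    def with_column(self, name: str, fn: Callable[[Row], Any]) -> Frame[S]:
        # Only replaces an existing column, so the column set (and S) is unchanged.
        if self.rows and name not in self.rows[0]:
            raise KeyError(f"{name!r} is not an existing column")
        return Frame([{**r, name: fn(r)} for r in self.rows])


users: Frame[Users] = Frame([
    {"name": "ada", "age": 36, "score": 0.9},
    {"name": "bob", "age": 25, "score": 0.4},
    {"name": "cy", "age": 41, "score": 0.7},
])
adults = users.filter(lambda r: r["age"] >= 30)
by_score = users.sort(lambda r: r["score"], descending=True)
doubled = users.with_column("score", lambda r: r["score"] * 2)
```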

4. Select and bind to an output schema

Operations like filter and sort keep all columns, so they preserve DataFrame[Users]. But select changes which columns exist, so it returns DataFrame[Any]. Use cast_schema() to bind the result to a new schema:

class UserSummary(cn.Schema):
    name: cn.Column[cn.Utf8]
    score: cn.Column[cn.Float64]

summary = df.select(Users.name, Users.score).cast_schema(UserSummary)
# summary is DataFrame[UserSummary]
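The same generic machinery explains why select loses the schema and cast_schema restores it. In this hypothetical sketch (again a toy Frame, not Colnade's DataFrame), select is annotated as returning Frame[Any], and cast_schema(schema) returns Frame[T] for the schema class T after a runtime check. Requiring the column names to match the schema's annotations exactly is an assumption made for illustration; Colnade's own cast_schema may apply different rules.

```python
from typing import Any, Generic, TypeVar, get_type_hints

S = TypeVar("S")
T = TypeVar("T")


class UserSummary:
    """Hypothetical target schema; plain annotations stand in for cn.Column fields."""
    name: str
    score: float


class Frame(Generic[S]):
    """Toy column-oriented frame tagged with a schema type S."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = columns

    def select(self, *names: str) -> "Frame[Any]":
        # The column set changes, so no schema type can be promised statically.
        return Frame({n: self.columns[n] for n in names})

    def cast_schema(self, schema: type[T]) -> "Frame[T]":
        # Runtime check backing the new static type: names must match exactly.
        expected = set(get_type_hints(schema))
        actual = set(self.columns)
        if expected != actual:
            raise TypeError(
                f"columns {sorted(actual)} do not match {schema.__name__} {sorted(expected)}"
            )
        return Frame(self.columns)


users = Frame({"id": [1, 2], "name": ["ada", "bob"], "age": [36, 25], "score": [0.9, 0.4]})
summary: "Frame[UserSummary]" = users.select("name", "score").cast_schema(UserSummary)
print(sorted(summary.columns))  # ['name', 'score']
```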

5. Write results

from colnade_polars import write_parquet

write_parquet(summary, "summary.parquet")

What the type checker catches

If you misspell a column name:

df.filter(Users.naem > 25)
#         ^^^^^^^^^^
# ty error: Class `Users` has no attribute `naem`

If you pass the wrong schema type:

# Orders is another cn.Schema subclass
def process_orders(df: DataFrame[Orders]) -> None: ...

process_orders(df)  # df is DataFrame[Users], not DataFrame[Orders]
# ty error: Object of type `DataFrame[Users]` is not assignable to `DataFrame[Orders]`

Enable runtime validation

By default, runtime validation is off, so it adds no overhead. Enable it in development to catch schema mismatches at data boundaries:

import colnade as cn
cn.set_validation("structural")  # check columns, dtypes, nullability

Or via environment variable:

COLNADE_VALIDATE=structural pytest tests/

See Validation for details on validation levels.
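One way a global switch with an environment-variable fallback can work is sketched below. The set_validation name mirrors cn.set_validation, but everything here is hypothetical: the "off" level name, the precedence (an explicit call wins over COLNADE_VALIDATE, which wins over the default), and the validation_level and maybe_validate helpers are assumptions, so consult the Validation page for Colnade's actual behavior.

```python
import os
from typing import Callable, Optional

# Hypothetical: only the level named on this page plus an assumed "off".
_LEVELS = {"off", "structural"}
_explicit_level: Optional[str] = None


def set_validation(level: str) -> None:
    """Hypothetical stand-in for cn.set_validation."""
    global _explicit_level
    if level not in _LEVELS:
        raise ValueError(f"unknown validation level: {level!r}")
    _explicit_level = level


def validation_level() -> str:
    # Assumed precedence: explicit call > COLNADE_VALIDATE > default "off".
    if _explicit_level is not None:
        return _explicit_level
    return os.environ.get("COLNADE_VALIDATE", "off")


def maybe_validate(check: Callable[[], None]) -> bool:
    """Run check only when validation is enabled; return True if it ran."""
    if validation_level() == "off":
        return False  # the check is never called, so it costs nothing
    check()
    return True
```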

Handling validation errors

When validation is enabled, schema mismatches raise SchemaError with structured information:

import colnade as cn
from colnade_polars import read_parquet

cn.set_validation("structural")

try:
    df = read_parquet("data.parquet", Users)
except cn.SchemaError as e:
    print(e.missing_columns)   # e.g. ["score"]
    print(e.type_mismatches)   # e.g. {"age": ("UInt64", "Utf8")}
    print(e.null_violations)   # e.g. ["name"]

See SchemaError for the full list of attributes.
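To show where those three attributes could come from, here is a minimal structural check in plain Python. It compares an expected column-to-dtype mapping against the actual dtypes and per-column null counts of some loaded data, and collects every problem before raising once. SchemaError below is a hypothetical stand-in that only mirrors the attribute names shown above, and treating every column as non-nullable unless it is listed in nullable is an assumption of the sketch.

```python
class SchemaError(Exception):
    """Hypothetical stand-in mirroring the attributes shown above."""

    def __init__(self, missing_columns, type_mismatches, null_violations):
        super().__init__(
            f"missing={missing_columns} mismatched={sorted(type_mismatches)} "
            f"nulls={null_violations}"
        )
        self.missing_columns = missing_columns
        self.type_mismatches = type_mismatches  # column -> (expected, actual)
        self.null_violations = null_violations


def validate_structure(expected, actual_dtypes, null_counts, nullable=frozenset()):
    """Collect all structural problems, then raise one SchemaError."""
    missing = [c for c in expected if c not in actual_dtypes]
    mismatches = {
        c: (want, actual_dtypes[c])
        for c, want in expected.items()
        if c in actual_dtypes and actual_dtypes[c] != want
    }
    nulls = [c for c in expected if c not in nullable and null_counts.get(c, 0) > 0]
    if missing or mismatches or nulls:
        raise SchemaError(missing, mismatches, nulls)


expected = {"id": "UInt64", "name": "Utf8", "age": "UInt64", "score": "Float64"}
actual = {"id": "UInt64", "name": "Utf8", "age": "Utf8"}  # no score; age read as text
null_counts = {"name": 3}

try:
    validate_structure(expected, actual, null_counts)
except SchemaError as e:
    print(e.missing_columns)  # ['score']
    print(e.type_mismatches)  # {'age': ('UInt64', 'Utf8')}
    print(e.null_violations)  # ['name']
```

Collecting everything before raising means one failed read reports every problem at once, rather than only the first mismatch.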

Next steps