Schemas¶

Schemas are the foundation of Colnade's type safety. They declare the structure of your data as Python classes.

Defining a schema¶

from colnade import Column, Schema, UInt64, Float64, Utf8

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt64]
    score: Column[Float64]

Each annotation creates a Column descriptor on the class. After class creation:

Users.id is a Column[UInt64] instance with name="id"
Users._columns is a dict: {"id": Column, "name": Column, "age": Column, "score": Column}

Data types¶

Colnade provides types that map to backend-native types:

Category	Types
Boolean	`Bool`
Unsigned integers	`UInt8`, `UInt16`, `UInt32`, `UInt64`
Signed integers	`Int8`, `Int16`, `Int32`, `Int64`
Floating point	`Float32`, `Float64`
String / Binary	`Utf8`, `Binary`
Temporal	`Date`, `Time`, `Datetime`, `Duration`
Nested	`Struct[S]`, `List[T]`

Nullable columns¶

Use T | None to mark a column as nullable:

class Users(Schema):
    age: Column[UInt64 | None]    # nullable integer
    tags: Column[List[Utf8] | None]  # nullable list

Schema inheritance¶

Schemas support standard Python inheritance:

class BaseRecord(Schema):
    id: Column[UInt64]
    created_at: Column[Datetime]

class Users(BaseRecord):
    name: Column[Utf8]
    # Inherits id and created_at

Trait composition¶

Combine multiple schemas via multiple inheritance:

class Timestamped(Schema):
    created_at: Column[Datetime]
    updated_at: Column[Datetime]

class SoftDeletable(Schema):
    deleted_at: Column[Datetime | None]

class Users(Timestamped, SoftDeletable):
    id: Column[UInt64]
    name: Column[Utf8]
    # Has: id, name, created_at, updated_at, deleted_at

mapped_from¶

Use mapped_from to declare how columns map between schemas during cast_schema:

from colnade import mapped_from

class UserSummary(Schema):
    user_name: Column[Utf8] = mapped_from(Users.name)
    user_id: Column[UInt64] = mapped_from(Users.id)

When you call df.cast_schema(UserSummary), the user_name column is populated from Users.name and user_id from Users.id.

Nullability checking

mapped_from preserves the source column's type. Mapping a nullable column (Column[UInt64 | None]) to a non-nullable annotation (Column[UInt64]) is a type error caught by the type checker.

SchemaError¶

Schema validation raises SchemaError with structured information:

from colnade import SchemaError

try:
    df.validate()
except SchemaError as e:
    print(e.missing_columns)   # columns in schema but not in data
    print(e.type_mismatches)   # {column: (expected, actual)}
    print(e.extra_columns)     # columns in data but not in schema