colnade.schema

Schema base class, Column descriptors, and supporting utilities.

Schema

Schema

Bases: Protocol

Base class for user-defined data schemas.

Subclass this to define a typed schema::

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt8 | None]

The metaclass extracts the dtype from each Column[DType] annotation and creates Column descriptor instances, giving type checkers full visibility into column methods and operators.
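As a rough illustration of how a dtype can be recovered from a Column[DType] annotation, the sketch below uses the standard typing helpers. The Column and dtype classes are toy stand-ins defined only for this example, extract_dtypes is a hypothetical helper, and Optional[UInt8] is used in place of UInt8 | None so the snippet also runs on older Python versions. The real metaclass may work differently.

```python
from typing import Generic, Optional, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


# Toy dtype markers (stand-ins, not colnade's own classes).
class UInt64: ...
class Utf8: ...
class UInt8: ...


class Column(Generic[T]):
    """Toy stand-in for the Column annotation type."""


class Users:
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[Optional[UInt8]]


def extract_dtypes(cls):
    """Map each Column[...] annotation on cls to its dtype argument."""
    dtypes = {}
    for field, annotation in get_type_hints(cls).items():
        if get_origin(annotation) is Column:
            (dtype,) = get_args(annotation)  # Column takes exactly one parameter
            dtypes[field] = dtype
    return dtypes


dtypes = extract_dtypes(Users)
```

Here dtypes maps "id" to UInt64, "name" to Utf8, and "age" to Optional[UInt8].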

Column

Column(name, dtype, schema, _mapped_from=None)

Bases: Generic[DType]

A typed reference to a named column in a schema.

Used as the annotation type in schema definitions::

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt8 | None]

At the type level: Column[UInt8] tells the type checker this is a column holding UInt8 data, with full access to expression-building methods (.sum(), .mean(), operators, etc.).

At runtime: a Column stores its name, dtype annotation, and owning schema class. Operator overloads and methods do not compute values; each returns an expression tree (AST) node that a backend later translates.
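The idea of operators returning tree nodes rather than values can be sketched with a few dataclasses. Every name below (Node, ColumnRef, Lit, BinOp, Agg) is hypothetical and chosen for this example; colnade's actual node classes are not shown in this reference and may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any


class Node:
    """Base for toy expression nodes: operators build trees, not values."""

    def __gt__(self, other: Any) -> "BinOp":
        return BinOp(">", self, _lift(other))

    def __add__(self, other: Any) -> "BinOp":
        return BinOp("+", self, _lift(other))

    def sum(self) -> "Agg":
        return Agg("sum", self)


@dataclass(frozen=True)
class ColumnRef(Node):
    name: str


@dataclass(frozen=True)
class Lit(Node):
    value: Any


@dataclass(frozen=True)
class BinOp(Node):
    op: str
    left: Node
    right: Node


@dataclass(frozen=True)
class Agg(Node):
    func: str
    arg: Node


def _lift(value: Any) -> Node:
    """Wrap plain Python values so both operands are nodes."""
    return value if isinstance(value, Node) else Lit(value)


age = ColumnRef("age")
is_adult = age > 18        # a BinOp node, not a bool
total = (age + 1).sum()    # an Agg node wrapping a BinOp
```

A backend would then walk such a tree and emit its own query or expression objects; that translation step is outside this sketch.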

list property

Access list operations on a list column.

Returns a ListAccessor that provides list-specific methods::

Users.tags.list.len()          # ListOp node
Users.tags.list.get(0)         # ListOp node
Users.tags.list.contains("x")  # ListOp node

field(col)

Access a field within a struct column.

The col argument must be a Column descriptor from the struct's schema::

Users.address.field(Address.city)  # StructFieldAccess[Utf8]

mapped_from

mapped_from(source)

Declare a column's source for cast_schema() resolution.

Used in target schema definitions to map a column back to its source::

class UsersClean(Schema):
    user_id: Column[UInt64] = mapped_from(Users.id)
    name: Column[Utf8]
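One way to picture the resolution step is as building a target-to-source name map. The sketch below is a toy model, not colnade's implementation: MappedFrom, mapped_from_name, resolve_sources, and cast_row are hypothetical names, sources are plain strings rather than Column descriptors, and falling back to a same-named source column for unmapped fields is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MappedFrom:
    """Marker recording which source column feeds a target column."""
    source_name: str


def mapped_from_name(source_name: str) -> MappedFrom:
    # Toy analogue of mapped_from(); the real function takes a Column.
    return MappedFrom(source_name)


# Target layout: user_id is fed by "id"; name has no explicit mapping.
users_clean_spec: Dict[str, Optional[MappedFrom]] = {
    "user_id": mapped_from_name("id"),
    "name": None,
}


def resolve_sources(spec: Dict[str, Optional[MappedFrom]]) -> Dict[str, str]:
    """Return {target_name: source_name}; unmapped columns match by name (assumed)."""
    return {
        target: marker.source_name if marker is not None else target
        for target, marker in spec.items()
    }


def cast_row(row: Dict[str, Any], spec: Dict[str, Optional[MappedFrom]]) -> Dict[str, Any]:
    """Project a source row onto the target layout."""
    return {target: row[source] for target, source in resolve_sources(spec).items()}


clean = cast_row({"id": 1, "name": "Ada", "age": 36}, users_clean_spec)
```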

SchemaError

SchemaError(*, missing_columns=None, extra_columns=None, type_mismatches=None, null_violations=None)

Bases: Exception

Raised when data does not conform to the declared schema.
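To show how an exception with this keyword-only signature might be raised and inspected, here is a minimal stand-in. SchemaErrorSketch and check_columns are hypothetical; the argument shapes (lists of column names, a dict of type mismatches) and the message format are assumptions, since this reference does not specify them.

```python
class SchemaErrorSketch(Exception):
    """Toy exception mirroring the keyword-only signature shown above."""

    def __init__(self, *, missing_columns=None, extra_columns=None,
                 type_mismatches=None, null_violations=None):
        # Assumed shapes: lists of column names, and a dict for mismatches.
        self.missing_columns = list(missing_columns or [])
        self.extra_columns = list(extra_columns or [])
        self.type_mismatches = dict(type_mismatches or {})
        self.null_violations = list(null_violations or [])

        parts = []
        if self.missing_columns:
            parts.append(f"missing columns: {self.missing_columns}")
        if self.extra_columns:
            parts.append(f"extra columns: {self.extra_columns}")
        if self.type_mismatches:
            parts.append(f"type mismatches: {self.type_mismatches}")
        if self.null_violations:
            parts.append(f"null violations: {self.null_violations}")
        super().__init__("; ".join(parts) or "schema violation")


def check_columns(expected, actual):
    """Raise if the actual column names differ from the expected ones."""
    missing = [name for name in expected if name not in actual]
    extra = [name for name in actual if name not in expected]
    if missing or extra:
        raise SchemaErrorSketch(missing_columns=missing, extra_columns=extra)


try:
    check_columns(expected=["id", "name", "age"], actual=["id", "name", "email"])
except SchemaErrorSketch as exc:
    caught = exc
```

Keeping each kind of violation on its own attribute lets callers branch on the failure type without parsing the message.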

ListAccessor

ListAccessor(column)

Bases: Generic[DType]

Typed accessor for list column operations.

Provides list-specific methods (len, get, contains, sum, mean, min, max) that produce ListOp AST nodes for backend translation.

Created via the .list property on Column::

Users.tags.list.len()          # ListOp(op="len")
Users.tags.list.get(0)         # ListOp(op="get", args=(0,))
Users.tags.list.contains("x")  # ListOp(op="contains", args=("x",))

Type precision limitation: methods such as get() and sum() return ListOp[Any] because the list element type T is lost at the .list property boundary (see the comment on Column.list). With self-narrowing support, the signatures would become:

  • get(index) -> ListOp[T | None]
  • sum() -> ListOp[T]
  • contains(value: T) -> ListOp[Bool]
  • etc.

Methods with fixed return types (len() -> ListOp[UInt32], contains() -> ListOp[Bool]) are already precise.
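The node shapes noted in the comments above can be mimicked with a small accessor class. This is a sketch only: ListOp here is a toy dataclass standing in for the real node, the column field is an assumption (the comments show only op and args), and ListAccessorSketch is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class ListOp:
    """Toy stand-in for the ListOp AST node."""
    column: str                   # assumed field: the column being operated on
    op: str
    args: Tuple[Any, ...] = ()


class ListAccessorSketch:
    """Each method records the operation instead of executing it."""

    def __init__(self, column: str) -> None:
        self._column = column

    def len(self) -> ListOp:
        return ListOp(self._column, "len")

    def get(self, index: int) -> ListOp:
        return ListOp(self._column, "get", (index,))

    def contains(self, value: Any) -> ListOp:
        return ListOp(self._column, "contains", (value,))

    def sum(self) -> ListOp:
        return ListOp(self._column, "sum")


tags = ListAccessorSketch("tags")
length_op = tags.len()
first_op = tags.get(0)
has_x_op = tags.contains("x")
```

Because the accessor in this sketch knows nothing about the element type, it has no way to type the result of get() or sum() more precisely, which mirrors the limitation described above.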

SchemaMeta

SchemaMeta

Bases: type(Protocol)

Metaclass for Schema that creates Column descriptors from annotations.

At class creation time:

  1. Collects annotations from the class and all bases (MRO traversal).
  2. Creates Column descriptor objects for each non-private field.
  3. Stores column descriptors in cls._columns.
  4. Registers the schema in the internal registry.

Note: SchemaMeta inherits from type(Protocol) so that Schema subclasses are valid Protocol types for structural subtyping. type(Protocol) resolves to _ProtocolMeta, a private CPython implementation detail. If this breaks on a future Python version, replace it with type and drop Protocol compatibility.
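The reason for deriving from type(Protocol) can be seen with the standard library alone. In the sketch below, CompatibleMeta, PlainMeta, and the classes built with them are hypothetical names: a metaclass derived from type(Protocol) can be combined with a Protocol base, while one derived from plain type triggers a metaclass conflict.

```python
from typing import Protocol

# Obtain Protocol's metaclass without importing the private name directly.
ProtocolMeta = type(Protocol)


class CompatibleMeta(ProtocolMeta):
    """Derives from Protocol's own metaclass, so it can be mixed with Protocol."""


class PlainMeta(type):
    """Derives from plain type; unrelated to Protocol's metaclass."""


class WorksWithProtocol(Protocol, metaclass=CompatibleMeta):
    pass


try:
    class Conflicts(Protocol, metaclass=PlainMeta):
        pass
except TypeError:
    # Python refuses: PlainMeta is not a subclass of Protocol's metaclass.
    conflict_raised = True
else:
    conflict_raised = False
```

Using type(Protocol) instead of naming the private metaclass keeps the dependency on CPython internals to a single expression, which is the trade-off the note above describes.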