colnade.schema
Schema base class, Column descriptors, and supporting utilities.
Schema
Schema
Bases: Protocol
Base class for user-defined data schemas.
Subclass this to define a typed schema::
    class Users(Schema):
        id: Column[UInt64]
        name: Column[Utf8]
        age: Column[UInt8 | None]
The metaclass extracts the dtype from each Column[DType] annotation
and creates Column descriptor instances, giving type checkers full
visibility into column methods and operators.
Column
Column(name, dtype, schema, _mapped_from=None)
Bases: Generic[DType]
A typed reference to a named column in a schema.
Used as the annotation type in schema definitions::
    class Users(Schema):
        id: Column[UInt64]
        name: Column[Utf8]
        age: Column[UInt8 | None]
At the type level: Column[UInt8] tells the type checker this is a
column holding UInt8 data, with full access to expression-building
methods (.sum(), .mean(), operators, etc.).
At runtime: stores the column name, dtype annotation, and owning
schema class. All operator overloads and methods produce expression
tree nodes (AST) for backend translation.
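The runtime behaviour described above can be illustrated with a minimal sketch. It does not use colnade itself: `ColumnRef`, `BinOp`, and `Agg` are hypothetical stand-ins, and the real node classes and their fields may differ. It only shows the general pattern of operators and methods returning expression nodes instead of computing values.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BinOp:
    """Hypothetical AST node for a binary operation."""
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class Agg:
    """Hypothetical AST node for an aggregation."""
    op: str
    source: Any


class ColumnRef:
    """Stand-in for a column descriptor: name, dtype, owning schema."""

    def __init__(self, name, dtype, schema):
        self.name = name
        self.dtype = dtype
        self.schema = schema

    # Operators build nodes; nothing is evaluated here.
    def __gt__(self, other):
        return BinOp(">", self, other)

    def __add__(self, other):
        return BinOp("+", self, other)

    def sum(self):
        return Agg("sum", self)


age = ColumnRef("age", "UInt8", "Users")
filter_expr = age > 18   # a BinOp node, not a bool
total_expr = age.sum()   # an Agg node, not a number
```

A backend would then walk such nodes and translate them into its own query expressions.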
list
property
Access list operations on a list column.
Returns a ListAccessor that provides list-specific methods::
    Users.tags.list.len()          # ListOp node
    Users.tags.list.get(0)         # ListOp node
    Users.tags.list.contains("x")  # ListOp node
field(col)
Access a field within a struct column.
The col argument must be a Column descriptor from the struct's schema::
    Users.address.field(Address.city)  # StructFieldAccess[Utf8]
mapped_from
mapped_from(source)
Declare a column's source for cast_schema() resolution.
Used in target schema definitions to map a column back to its source::
    class UsersClean(Schema):
        user_id: Column[UInt64] = mapped_from(Users.id)
        name: Column[Utf8]
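One way to picture how such a declaration could be resolved is sketched below. This is not colnade's implementation: `SourceColumn`, `MappedFrom`, `mapped_from_sketch`, and `resolve_sources` are hypothetical names, and the fallback of matching unmapped columns by their own name is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SourceColumn:
    """Hypothetical stand-in for a column on the source schema."""
    name: str


@dataclass(frozen=True)
class MappedFrom:
    """Hypothetical marker recording which source column feeds a target."""
    source: SourceColumn


def mapped_from_sketch(source: SourceColumn) -> MappedFrom:
    return MappedFrom(source)


def resolve_sources(target_fields: Dict[str, Optional[MappedFrom]]) -> Dict[str, str]:
    """Map each target column name to the source column name it reads from.

    Assumption: a target column without a mapping reads from the source
    column that has the same name.
    """
    return {
        target: (marker.source.name if isinstance(marker, MappedFrom) else target)
        for target, marker in target_fields.items()
    }


mapping = resolve_sources({
    "user_id": mapped_from_sketch(SourceColumn("id")),
    "name": None,
})
```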
SchemaError
SchemaError(*, missing_columns=None, extra_columns=None, type_mismatches=None, null_violations=None)
Bases: Exception
Raised when data does not conform to the declared schema.
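The keyword-only signature suggests that each kind of violation is reported separately. The sketch below uses a stand-in exception with the same parameter names to show how a column check might populate it. `SchemaErrorSketch` and `check_columns` are hypothetical, and storing the arguments as same-named attributes is an assumption rather than documented behaviour.

```python
class SchemaErrorSketch(Exception):
    """Stand-in mirroring the documented keyword-only parameters."""

    def __init__(self, *, missing_columns=None, extra_columns=None,
                 type_mismatches=None, null_violations=None):
        # Assumption: arguments are kept as attributes for later inspection.
        self.missing_columns = missing_columns or []
        self.extra_columns = extra_columns or []
        self.type_mismatches = type_mismatches or {}
        self.null_violations = null_violations or []
        parts = []
        if self.missing_columns:
            parts.append(f"missing columns: {self.missing_columns}")
        if self.extra_columns:
            parts.append(f"extra columns: {self.extra_columns}")
        super().__init__("; ".join(parts) or "schema violation")


def check_columns(expected, actual):
    """Raise if the actual column names differ from the expected ones."""
    missing = [c for c in expected if c not in actual]
    extra = [c for c in actual if c not in expected]
    if missing or extra:
        raise SchemaErrorSketch(missing_columns=missing, extra_columns=extra)


caught = None
try:
    check_columns(expected=["id", "name"], actual=["id", "age"])
except SchemaErrorSketch as exc:
    caught = exc
```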
ListAccessor
ListAccessor(column)
Bases: Generic[DType]
Typed accessor for list column operations.
Provides list-specific methods (len, get, contains, sum, mean, min, max)
that produce ListOp AST nodes for backend translation.
Created via the .list property on Column::
    Users.tags.list.len()          # ListOp(op="len")
    Users.tags.list.get(0)         # ListOp(op="get", args=(0,))
    Users.tags.list.contains("x")  # ListOp(op="contains", args=("x",))
Type precision limitation: Methods like get(), sum(), etc. return
ListOp[Any] because the list element type T is not available — it is
lost at the .list property boundary (see comment on Column.list).
With self narrowing support, these would become:
- get(index) -> ListOp[T | None]
- sum() -> ListOp[T]
- contains(value: T) -> ListOp[Bool]
- etc.
Methods with fixed return types (len() -> ListOp[UInt32],
contains() -> ListOp[Bool]) are already precise.
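The accessor pattern described here can be sketched without colnade. In the example below, the `op` and `args` fields follow the comments in the usage lines above, while the `source` field, `ListAccessorSketch`, and `TagsColumn` are hypothetical and only illustrate how a `.list` property can hand out an object whose methods return nodes.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class ListOp:
    """Stand-in node; `source` is an assumed field, not documented."""
    op: str
    source: Any
    args: Tuple[Any, ...] = ()


class ListAccessorSketch:
    """Hypothetical accessor: each method returns a node, nothing is computed."""

    def __init__(self, column):
        self._column = column

    def len(self):
        return ListOp("len", self._column)

    def get(self, index):
        return ListOp("get", self._column, (index,))

    def contains(self, value):
        return ListOp("contains", self._column, (value,))


class TagsColumn:
    """Hypothetical list column exposing the accessor via a property."""
    name = "tags"

    @property
    def list(self):
        return ListAccessorSketch(self)


tags = TagsColumn()
first_tag = tags.list.get(0)
has_x = tags.list.contains("x")
```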
SchemaMeta
SchemaMeta
Bases: type(Protocol)
Metaclass for Schema that creates Column descriptors from annotations.
At class creation time:
1. Collects annotations from the class and all bases (MRO traversal).
2. Creates Column descriptor objects for each non-private field.
3. Stores column descriptors in cls._columns.
4. Registers the schema in the internal registry.
Note: Inherits from type(Protocol) so that Schema subclasses are valid
Protocol types for structural subtyping. This resolves to _ProtocolMeta,
a private CPython implementation detail. If this breaks on a future Python
version, replace with type and drop Protocol compatibility.