colnade core

Schema base class, Column descriptors, backend protocol, and supporting utilities.

Schema

Schema

Bases: Protocol

Base class for user-defined data schemas.

Subclass this to define a typed schema:

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt8 | None]

The metaclass extracts the dtype from each Column[DType] annotation and creates Column descriptor instances, giving type checkers full visibility into column methods and operators.

Column

Column(name, dtype, schema, _mapped_from=None, _field_info=None)

Bases: Generic[DType]

A typed reference to a named column in a schema.

Used as the annotation type in schema definitions:

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt8 | None]

At the type level: Column[UInt8] tells the type checker this is a column holding UInt8 data, with full access to expression-building methods (.sum(), .mean(), operators, etc.).

At runtime: stores the column name, dtype annotation, and owning schema class. All operator overloads and methods produce expression tree nodes (AST) for backend translation.
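
To make "operators produce expression tree nodes" concrete, here is a small sketch using frozen dataclasses. ColumnRef, BinOp, and Agg are hypothetical names chosen for illustration; colnade's actual node classes may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BinOp:
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class Agg:
    func: str
    operand: Any


@dataclass(frozen=True)
class ColumnRef:
    name: str

    # Operators return tree nodes instead of computing a value.
    def __gt__(self, other: Any) -> BinOp:
        return BinOp(">", self, other)

    def __add__(self, other: Any) -> BinOp:
        return BinOp("+", self, other)

    def sum(self) -> Agg:
        return Agg("sum", self)


age = ColumnRef("age")
adults = age > 18   # a BinOp node describing the comparison, not a bool
total = age.sum()   # an Agg node describing the aggregation
```

A backend can later walk such a tree and emit its own native expression, which is why no data is touched when the expression is built.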

list property

Access list operations on a list column.

Returns a ListAccessor that provides list-specific methods:

Users.tags.list.len()          # ListOp node
Users.tags.list.get(0)         # ListOp node
Users.tags.list.contains("x")  # ListOp node

field(col)

Access a field within a struct column.

The col argument must be a Column descriptor from the struct's schema:

Users.address.field(Address.city)  # StructFieldAccess[Utf8]

mapped_from

mapped_from(source)

Declare a column's source for cast_schema() resolution.

Used in target schema definitions to map a column back to its source:

class UsersClean(Schema):
    user_id: Column[UInt64] = mapped_from(Users.id)
    name: Column[Utf8]
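
One way to picture the resolution step is as a lookup from target column to source column. The sketch below is hypothetical: resolve_sources is not a colnade function, and it assumes that a column without a declared mapping is looked up under its own name, which the source does not state.

```python
from typing import Optional


def resolve_sources(target: dict[str, Optional[str]]) -> dict[str, str]:
    """Return {target column: source column} for a cast.

    `target` maps each target column name to the source name declared for it,
    or None when no mapping was declared.
    """
    # Assumption: an unmapped column reads from the source column of the same name.
    return {col: src if src is not None else col for col, src in target.items()}


# Mirrors UsersClean: user_id is mapped from Users.id, name has no mapping.
plan = resolve_sources({"user_id": "id", "name": None})
```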

SchemaError

SchemaError(*, missing_columns=None, extra_columns=None, type_mismatches=None, null_violations=None, value_violations=None)

Bases: Exception

Raised when data does not conform to the declared schema.

ListAccessor

ListAccessor(column)

Bases: Generic[DType]

Typed accessor for list column operations.

Provides list-specific methods (len, get, contains, sum, mean, min, max) that produce ListOp AST nodes for backend translation.

Created via the .list property on Column:

Users.tags.list.len()          # ListOp(op="len")
Users.tags.list.get(0)         # ListOp(op="get", args=(0,))
Users.tags.list.contains("x")  # ListOp(op="contains", args=("x",))

The element type DType flows from the .list property via self-narrowing, so get(), sum(), etc. preserve element types.
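
Self-narrowing here most likely refers to annotating self with a more specific type so that the property only applies to list columns and binds the element type. The sketch below shows that typing pattern under this assumption; ColumnSketch, AccessorSketch, ListOf, and ListOpSketch are hypothetical names, and the real ListOp nodes may carry more information.

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

DType = TypeVar("DType")
T = TypeVar("T")


class ListOf(Generic[T]):
    """Hypothetical list dtype marker with element type T."""


class Utf8:
    """Hypothetical string dtype marker."""


@dataclass(frozen=True)
class ListOpSketch:
    column: str
    op: str
    args: tuple[Any, ...] = ()


class AccessorSketch(Generic[T]):
    def __init__(self, column_name: str) -> None:
        self._column_name = column_name

    def len(self) -> ListOpSketch:
        return ListOpSketch(self._column_name, "len")

    def get(self, index: int) -> ListOpSketch:
        return ListOpSketch(self._column_name, "get", (index,))


class ColumnSketch(Generic[DType]):
    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def list(self: "ColumnSketch[ListOf[T]]") -> "AccessorSketch[T]":
        # The annotated `self` only matches list columns and binds T
        # to the element type of the returned accessor.
        return AccessorSketch(self.name)


tags: ColumnSketch[ListOf[Utf8]] = ColumnSketch("tags")
node = tags.list.get(0)
```

With this annotation a type checker infers tags.list as AccessorSketch[Utf8], and is expected to flag .list on a column whose dtype is not a list.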

SchemaMeta

SchemaMeta

Bases: type(Protocol)

Metaclass for Schema that creates Column descriptors from annotations.

At class creation time:

  1. Collects annotations from the class and all bases (MRO traversal).
  2. Creates Column descriptor objects for each non-private field.
  3. Stores column descriptors in cls._columns.
  4. Registers the schema in the internal registry.

Note: Inherits from type(Protocol) so that Schema subclasses are valid Protocol types for structural subtyping. This resolves to _ProtocolMeta, a private CPython implementation detail. If this breaks on a future Python version, replace with type and drop Protocol compatibility.

BackendProtocol

BackendProtocol

Bases: Protocol

Interface that all backend adapters must implement.

Each method takes backend-native data (Any) as source and returns backend-native data. The DataFrame/LazyFrame layer wraps results in typed frame instances.

translate_expr(expr)

Translate a Colnade expression AST to a backend-native expression.

from_dict(data, schema)

Create a backend-native data object from a columnar dict.

The backend reads column dtypes from schema and coerces values to the correct native types.
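
A toy adapter shows the shape of the two methods listed above. BackendSketch and ToyBackend are hypothetical, the "native" data is just a dict of lists, and the schema argument is simplified to a mapping from column name to Python type rather than a real Schema class.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BackendSketch(Protocol):
    def translate_expr(self, expr: Any) -> Any: ...
    def from_dict(self, data: dict[str, list[Any]], schema: Any) -> Any: ...


class ToyBackend:
    """Toy adapter whose "native" data is a plain dict of lists."""

    def translate_expr(self, expr: Any) -> Any:
        # A real adapter would walk the expression tree; this one passes it through.
        return expr

    def from_dict(self, data: dict[str, list[Any]], schema: dict[str, type]) -> dict[str, list[Any]]:
        # `schema` is a hypothetical {column name: Python type} mapping.
        return {name: [schema[name](v) for v in values] for name, values in data.items()}


backend = ToyBackend()
frame = backend.from_dict(
    {"id": ["1", "2"], "name": ["Ann", "Bo"]},
    {"id": int, "name": str},
)
```

ToyBackend never inherits from BackendSketch; it satisfies the protocol structurally, which is the point of declaring the interface as a Protocol.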

Row

Row

Bases: Generic[S]

Base class for schema Row dataclasses.

Row[S] links a row type to its schema, so a type checker can verify that from_rows(Users, rows) receives Row[Users] instances rather than Row[Orders]. ArrowBatch[S] follows the same pattern.

Each Schema subclass gets an auto-generated Row inner class:

Users.Row(id=1, name="Alice", age=30)  # type is Row[Users]
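
The static link between a row type and its schema can be sketched with a generic base class. RowSketch, UsersRow, OrdersRow, and from_rows_sketch are hypothetical names, and the signature of the real from_rows is assumed here, not taken from the source.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

S = TypeVar("S")


class RowSketch(Generic[S]):
    """Stand-in for Row[S]: ties a row type to schema S."""


class Users: ...   # stand-in schema
class Orders: ...  # stand-in schema


@dataclass
class UsersRow(RowSketch[Users]):
    id: int
    name: str


@dataclass
class OrdersRow(RowSketch[Orders]):
    order_id: int


def from_rows_sketch(schema: type[S], rows: Sequence[RowSketch[S]]) -> list[RowSketch[S]]:
    # S is bound by `schema`, so `rows` must be rows of that same schema.
    return list(rows)


users = from_rows_sketch(Users, [UsersRow(id=1, name="Alice")])
# A type checker is expected to reject the next line, since OrdersRow is RowSketch[Orders]:
# from_rows_sketch(Users, [OrdersRow(order_id=7)])
```

Nothing is enforced at runtime in this sketch; the mismatch is caught only by static analysis.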

ArrowBatch

ArrowBatch(*, _batch, _schema)

Bases: Generic[S]

A typed wrapper around a pyarrow.RecordBatch.

Preserves the schema type parameter S across Arrow serialization boundaries, so that type checkers can verify schema consistency when data moves between backends.

num_rows property

Number of rows in this batch.

schema property

The Colnade schema type.

to_pyarrow()

Return the underlying pyarrow.RecordBatch.

from_pyarrow(batch, schema) classmethod

Wrap a pyarrow.RecordBatch with schema validation.

Checks that the Arrow batch's column names match the schema. Raises SchemaError on missing columns.
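
The column-name check can be pictured without pyarrow installed. In this sketch SchemaErrorSketch and check_column_names are hypothetical stand-ins, and the names are passed as plain lists; with a real batch they would typically come from batch.schema.names.

```python
class SchemaErrorSketch(Exception):
    """Stand-in exception carrying the list of missing columns."""

    def __init__(self, *, missing_columns=None):
        self.missing_columns = missing_columns or []
        super().__init__(f"missing columns: {self.missing_columns}")


def check_column_names(batch_names, schema_names):
    """Raise if any schema column is absent from the batch."""
    present = set(batch_names)
    missing = [name for name in schema_names if name not in present]
    if missing:
        raise SchemaErrorSketch(missing_columns=missing)


# All schema columns present: returns without raising.
check_column_names(["id", "name", "age"], ["id", "name", "age"])

try:
    check_column_names(["id"], ["id", "name", "age"])
except SchemaErrorSketch as err:
    missing = err.missing_columns
```

Keeping the missing names on the exception, rather than only in the message, lets callers react programmatically.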

Field

Field(*, ge=None, gt=None, le=None, lt=None, min_length=None, max_length=None, pattern=None, unique=False, isin=None, mapped_from=None)

Declare value-level constraints (and optional mapped_from) for a column.

Returns a FieldInfo instance that SchemaMeta detects at class creation time. The return type is Any so it satisfies the Column[DType] annotation (same pattern as mapped_from()).

Usage:

class Users(Schema):
    age: Column[UInt64] = Field(ge=0, le=150)
    email: Column[Utf8] = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    id: Column[UInt64] = Field(unique=True)
    status: Column[Utf8] = Field(isin=["active", "inactive"])

FieldInfo

FieldInfo(ge=None, gt=None, le=None, lt=None, min_length=None, max_length=None, pattern=None, unique=False, isin=None, mapped_from=None) dataclass

Immutable container for column-level value constraints.

Created by Field() and attached to Column descriptors by SchemaMeta. Uses pydantic-compatible parameter names for familiarity.

has_constraints()

Return True if any value constraint is set (excluding mapped_from).
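
A frozen dataclass is enough to sketch this behavior. FieldInfoSketch is a hypothetical, reduced version with only some of the parameters; comparing each field against its default is one plausible way to implement has_constraints(), not necessarily the library's.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class FieldInfoSketch:
    ge: Optional[float] = None
    le: Optional[float] = None
    pattern: Optional[str] = None
    unique: bool = False
    isin: Optional[tuple] = None
    mapped_from: Any = None

    def has_constraints(self) -> bool:
        # Compare every field except mapped_from against its default.
        return any(
            getattr(self, f.name) != f.default
            for f in fields(self)
            if f.name != "mapped_from"
        )


bounded = FieldInfoSketch(ge=0, le=150)
mapping_only = FieldInfoSketch(mapped_from="id")
```

Here bounded.has_constraints() is true while mapping_only.has_constraints() is false, matching the "excluding mapped_from" rule.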

ValueViolation

ValueViolation(column, constraint, got_count, sample_values) dataclass

Describes a value-level constraint failure for error reporting.
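
The record's four fields suggest how a report might be assembled. The sketch below is an assumption-laden illustration: ViolationSketch and check_range are hypothetical, got_count is taken to mean the number of offending values, and the constraint label format is invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ViolationSketch:
    column: str
    constraint: str
    got_count: int
    sample_values: list[Any]


def check_range(column: str, values: list[Any], *, ge=None, le=None, max_samples: int = 3):
    """Return one record per violated bound; None values are skipped."""
    bounds = []
    if ge is not None:
        bounds.append((f"ge={ge}", lambda v: v >= ge))
    if le is not None:
        bounds.append((f"le={le}", lambda v: v <= le))
    violations = []
    for label, ok in bounds:
        bad = [v for v in values if v is not None and not ok(v)]
        if bad:
            # Keep the count of all offenders but only a few samples.
            violations.append(ViolationSketch(column, label, len(bad), bad[:max_samples]))
    return violations


result = check_range("age", [30, -1, 200, None, 151], ge=0, le=150)
```

Capping the samples keeps error messages readable on large columns while the count still conveys the scale of the problem.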

schema_check

schema_check(fn)

Decorator for declaring cross-column constraints on a Schema.

The decorated method receives the schema class and must return a boolean expression (Expr[Bool]) that should be True for valid rows:

class Events(Schema):
    start_date: Column[Date]
    end_date: Column[Date]

    @schema_check
    def dates_ordered(cls):
        return Events.start_date <= Events.end_date
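
A decorator of this kind usually needs to do two things: make the method callable on the class and mark it so it can be discovered later. The sketch below assumes that design; schema_check_sketch, collect_checks, and the _is_schema_check marker are hypothetical, and a tuple stands in for the expression a real check would return.

```python
def schema_check_sketch(fn):
    fn._is_schema_check = True  # hypothetical marker attribute
    return classmethod(fn)


def collect_checks(schema_cls):
    """Return the names of methods marked by the decorator."""
    return [
        name
        for name, attr in vars(schema_cls).items()
        if isinstance(attr, classmethod)
        and getattr(attr.__func__, "_is_schema_check", False)
    ]


class Events:
    start_date = "start_date"  # plain stand-ins for Column descriptors
    end_date = "end_date"

    @schema_check_sketch
    def dates_ordered(cls):
        # A real check returns an expression; a tuple stands in for it here.
        return ("<=", cls.start_date, cls.end_date)


checks = collect_checks(Events)
expr = Events.dates_ordered()
```

In this sketch the check body uses cls, so it keeps working if the schema class is renamed or subclassed.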

ValidationLevel

ValidationLevel

Bases: Enum

Validation level for runtime schema checks.

  • OFF — No runtime checks. Trust the type checker. Zero overhead.
  • STRUCTURAL — Check columns exist, dtypes match, nullability.
  • FULL — Structural checks plus value-level constraints from Field().

set_validation

set_validation(level)

Set the validation level.

Accepts a ValidationLevel enum, a level string ("off", "structural", "full"), or a boolean for backward compatibility (True → STRUCTURAL, False → OFF).

is_validation_enabled

is_validation_enabled()

Return whether automatic validation at data boundaries is enabled.

Returns True when the validation level is STRUCTURAL or FULL.

get_validation_level

get_validation_level()

Return the current validation level.
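
The three functions above can be sketched around a single module-level setting. Everything here is hypothetical: the names carry a Sketch suffix, the enum values are assumed to equal the level strings, and the initial level of OFF is an assumption since the source does not state a default.

```python
from enum import Enum
from typing import Union


class LevelSketch(Enum):
    OFF = "off"
    STRUCTURAL = "structural"
    FULL = "full"


_level = LevelSketch.OFF  # assumed default; the source does not state one


def set_validation_sketch(level: Union[LevelSketch, str, bool]) -> None:
    global _level
    if isinstance(level, bool):
        # Backward-compatible booleans: True -> STRUCTURAL, False -> OFF.
        _level = LevelSketch.STRUCTURAL if level else LevelSketch.OFF
    elif isinstance(level, str):
        _level = LevelSketch(level.lower())  # raises ValueError on unknown names
    else:
        _level = level


def is_validation_enabled_sketch() -> bool:
    return _level is not LevelSketch.OFF


def get_validation_level_sketch() -> LevelSketch:
    return _level


set_validation_sketch("full")
enabled = is_validation_enabled_sketch()
```

The bool branch comes first so that True and False are never mistaken for another input type.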