colnade core¶
Schema base class, Column descriptors, backend protocol, and supporting utilities.
Schema¶
Schema
¶
Bases: Protocol
Base class for user-defined data schemas.
Subclass this to define a typed schema::
    class Users(Schema):
        id: Column[UInt64]
        name: Column[Utf8]
        age: Column[UInt8 | None]
The metaclass extracts the dtype from each Column[DType] annotation
and creates Column descriptor instances, giving type checkers full
visibility into column methods and operators.
Column¶
Column(name, dtype, schema, _mapped_from=None, _field_info=None)
¶
Bases: Generic[DType]
A typed reference to a named column in a schema.
Used as the annotation type in schema definitions::
    class Users(Schema):
        id: Column[UInt64]
        name: Column[Utf8]
        age: Column[UInt8 | None]
At the type level: Column[UInt8] tells the type checker this is a
column holding UInt8 data, with full access to expression-building
methods (.sum(), .mean(), operators, etc.).
At runtime: stores the column name, dtype annotation, and owning
schema class. All operator overloads and methods produce expression
tree nodes (AST) for backend translation.
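The runtime behaviour described above, where operators and methods return expression nodes rather than values, can be illustrated with a minimal sketch. The `BinOp` and `Agg` node classes and this stripped-down `Column` are hypothetical stand-ins written for illustration; they are not colnade's actual node types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BinOp:
    """Hypothetical AST node: an operator applied to two operands."""
    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class Agg:
    """Hypothetical AST node: an aggregation over a column."""
    func: str
    source: Any


class Column:
    """Stand-in column reference that only knows its name."""

    def __init__(self, name):
        self.name = name

    # Operators return nodes instead of computing a result.
    def __gt__(self, other):
        return BinOp(">", self, other)

    def __add__(self, other):
        return BinOp("+", self, other)

    def sum(self):
        return Agg("sum", self)


age = Column("age")
expr = age > 18  # builds BinOp(">", age, 18); nothing is evaluated
```

A backend adapter would later walk such a tree and translate each node into its own native expression.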
list
property
¶
Access list operations on a list column.
Returns a ListAccessor that provides list-specific methods::
    Users.tags.list.len()          # ListOp node
    Users.tags.list.get(0)         # ListOp node
    Users.tags.list.contains("x")  # ListOp node
field(col)
¶
Access a field within a struct column.
The col argument must be a Column descriptor from the struct's schema::
    Users.address.field(Address.city)  # StructFieldAccess[Utf8]
mapped_from¶
mapped_from(source)
¶
Declare a column's source for cast_schema() resolution.
Used in target schema definitions to map a column back to its source::
    class UsersClean(Schema):
        user_id: Column[UInt64] = mapped_from(Users.id)
        name: Column[Utf8]
SchemaError¶
SchemaError(*, missing_columns=None, extra_columns=None, type_mismatches=None, null_violations=None, value_violations=None)
¶
Bases: Exception
Raised when data does not conform to the declared schema.
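As a rough illustration of how an exception with these keyword-only arguments might be raised, the sketch below defines a stand-in `SchemaError` and a hypothetical `check_columns` helper. The attribute names, the message format, and the helper itself are assumptions, not colnade's implementation.

```python
class SchemaError(Exception):
    """Stand-in with the same keyword-only constructor shape."""

    def __init__(self, *, missing_columns=None, extra_columns=None,
                 type_mismatches=None, null_violations=None,
                 value_violations=None):
        # Assumed: each category is stored as an attribute, defaulting to empty.
        self.missing_columns = missing_columns or []
        self.extra_columns = extra_columns or []
        self.type_mismatches = type_mismatches or []
        self.null_violations = null_violations or []
        self.value_violations = value_violations or []
        parts = []
        if self.missing_columns:
            parts.append(f"missing columns: {self.missing_columns}")
        if self.extra_columns:
            parts.append(f"extra columns: {self.extra_columns}")
        super().__init__("; ".join(parts) or "schema validation failed")


def check_columns(expected, actual):
    """Hypothetical helper: raise if any expected column is absent."""
    missing = [name for name in expected if name not in actual]
    if missing:
        raise SchemaError(missing_columns=missing)


try:
    check_columns(["id", "name"], ["id"])
except SchemaError as exc:
    caught = exc  # caught.missing_columns lists the absent column names
```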
ListAccessor¶
ListAccessor(column)
¶
Bases: Generic[DType]
Typed accessor for list column operations.
Provides list-specific methods (len, get, contains, sum, mean, min, max)
that produce ListOp AST nodes for backend translation.
Created via the .list property on Column::
    Users.tags.list.len()          # ListOp(op="len")
    Users.tags.list.get(0)         # ListOp(op="get", args=(0,))
    Users.tags.list.contains("x")  # ListOp(op="contains", args=("x",))
The element type DType flows from the .list property via self-narrowing,
so get(), sum(), etc. preserve element types.
SchemaMeta¶
SchemaMeta
¶
Bases: type(Protocol)
Metaclass for Schema that creates Column descriptors from annotations.
At class creation time:
1. Collects annotations from the class and all bases (MRO traversal).
2. Creates Column descriptor objects for each non-private field.
3. Stores column descriptors in cls._columns.
4. Registers the schema in the internal registry.
Note: Inherits from type(Protocol) so that Schema subclasses are valid
Protocol types for structural subtyping. This resolves to _ProtocolMeta,
a private CPython implementation detail. If this breaks on a future Python
version, replace with type and drop Protocol compatibility.
BackendProtocol¶
BackendProtocol
¶
Bases: Protocol
Interface that all backend adapters must implement.
Each method takes backend-native data (Any) as source and returns
backend-native data. The DataFrame/LazyFrame layer wraps results in typed
frame instances.
Row¶
Row
¶
Bases: Generic[S]
Base class for schema Row dataclasses.
Row[S] links a row type to its schema, enabling static verification
that from_rows(Users, rows) receives Row[Users] instances, not
Row[Orders]. Same pattern as ArrowBatch[S].
Each Schema subclass gets an auto-generated Row inner class::
    Users.Row(id=1, name="Alice", age=30)  # type is Row[Users]
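The way a type parameter can tie rows to a schema is sketched below with plain `typing` generics. The `from_rows` signature, the explicit `UsersRow` and `OrdersRow` classes, and the empty schema classes are assumptions made for illustration; colnade generates its Row classes automatically.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Type, TypeVar

S = TypeVar("S")


class Row(Generic[S]):
    """Stand-in base: the type parameter ties a row to one schema."""


class Users: ...
class Orders: ...


@dataclass(frozen=True)
class UsersRow(Row[Users]):
    id: int
    name: str
    age: Optional[int] = None


@dataclass(frozen=True)
class OrdersRow(Row[Orders]):
    order_id: int


def from_rows(schema: Type[S], rows: List[Row[S]]) -> List[Row[S]]:
    # Hypothetical signature: S is solved from `schema`, so every row
    # must be a Row of that same schema to type-check.
    return list(rows)


users = from_rows(Users, [UsersRow(id=1, name="Alice", age=30)])
# from_rows(Users, [OrdersRow(order_id=7)])  # a type checker should reject this
```

Nothing is enforced at runtime here; the mismatch is caught only by a static type checker.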
ArrowBatch¶
ArrowBatch(*, _batch, _schema)
¶
Bases: Generic[S]
A typed wrapper around a pyarrow.RecordBatch.
Preserves the schema type parameter S across Arrow serialization boundaries, so that type checkers can verify schema consistency when data moves between backends.
num_rows
property
¶
Number of rows in this batch.
schema
property
¶
The Colnade schema type.
to_pyarrow()
¶
Return the underlying pyarrow.RecordBatch.
from_pyarrow(batch, schema)
classmethod
¶
Wrap a pyarrow.RecordBatch with schema validation.
Checks that the Arrow batch's column names match the schema.
Raises SchemaError on missing columns.
Field¶
Field(*, ge=None, gt=None, le=None, lt=None, min_length=None, max_length=None, pattern=None, unique=False, isin=None, mapped_from=None)
¶
Declare value-level constraints (and optional mapped_from) for a column.
Returns a FieldInfo instance that SchemaMeta detects at class
creation time. The return type is Any so it satisfies the
Column[DType] annotation (same pattern as mapped_from()).
Usage::
    class Users(Schema):
        age: Column[UInt64] = Field(ge=0, le=150)
        email: Column[Utf8] = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
        id: Column[UInt64] = Field(unique=True)
        status: Column[Utf8] = Field(isin=["active", "inactive"])
FieldInfo¶
FieldInfo(ge=None, gt=None, le=None, lt=None, min_length=None, max_length=None, pattern=None, unique=False, isin=None, mapped_from=None)
dataclass
¶
Immutable container for column-level value constraints.
Created by Field() and attached to Column descriptors by
SchemaMeta. Uses pydantic-compatible parameter names for
familiarity.
has_constraints()
¶
Return True if any value constraint is set (excluding mapped_from).
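A frozen dataclass is one natural way to get the immutable container described above. The sketch below assumes that "set" means "differs from the field's default" and simplifies `Field()` to a keyword pass-through; both are illustrative choices, not confirmed details of colnade.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class FieldInfo:
    ge: Any = None
    gt: Any = None
    le: Any = None
    lt: Any = None
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    pattern: Optional[str] = None
    unique: bool = False
    isin: Optional[Sequence[Any]] = None
    mapped_from: Any = None

    def has_constraints(self) -> bool:
        # mapped_from is a source mapping, not a value constraint, so skip it.
        for f in fields(self):
            if f.name == "mapped_from":
                continue
            if getattr(self, f.name) != f.default:
                return True
        return False


def Field(**constraints: Any) -> Any:
    # Returning Any lets the result sit behind a Column[DType] annotation.
    return FieldInfo(**constraints)


age_info = Field(ge=0, le=150)
source_only = Field(mapped_from="id")
```

Here `age_info.has_constraints()` is true, while `source_only.has_constraints()` is false because only `mapped_from` is set.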
ValueViolation¶
ValueViolation(column, constraint, got_count, sample_values)
dataclass
¶
Describes a value-level constraint failure for error reporting.
schema_check¶
schema_check(fn)
¶
Decorator for declaring cross-column constraints on a Schema.
The decorated method receives the schema class and must return
a boolean expression (Expr[Bool]) that should be True for valid rows::
    class Events(Schema):
        start_date: Column[Date]
        end_date: Column[Date]

        @schema_check
        def dates_ordered(cls):
            return Events.start_date <= Events.end_date
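One way such a decorator could work is to mark the function and wrap it as a classmethod so that a later pass can find it. The marker attribute `_is_schema_check`, the `collect_checks` helper, and the tuple standing in for an `Expr[Bool]` are all hypothetical; this sketch does not reflect how colnade stores or evaluates checks.

```python
def schema_check(fn):
    fn._is_schema_check = True  # hypothetical marker attribute
    return classmethod(fn)      # so the method receives the schema class


def collect_checks(schema):
    """Return {name: expression} for every marked check on the class."""
    checks = {}
    for name, attr in vars(schema).items():
        func = getattr(attr, "__func__", None)  # present on classmethod objects
        if getattr(func, "_is_schema_check", False):
            checks[name] = getattr(schema, name)()
    return checks


class Events:
    start_date = "start_date"  # stand-ins for Column descriptors
    end_date = "end_date"

    @schema_check
    def dates_ordered(cls):
        # A real check would return Expr[Bool]; a tuple stands in here.
        return ("<=", cls.start_date, cls.end_date)


checks = collect_checks(Events)
```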
ValidationLevel¶
ValidationLevel
¶
Bases: Enum
Validation level for runtime schema checks.
- OFF: No runtime checks. Trust the type checker. Zero overhead.
- STRUCTURAL: Check columns exist, dtypes match, nullability.
- FULL: Structural checks plus value-level constraints from Field().
set_validation¶
set_validation(level)
¶
Set the validation level.
Accepts a ValidationLevel enum, a level string
("off", "structural", "full"), or a boolean for backward
compatibility (True → STRUCTURAL, False → OFF).
is_validation_enabled¶
is_validation_enabled()
¶
Return whether automatic validation at data boundaries is enabled.
Returns True when the validation level is STRUCTURAL or FULL.
get_validation_level¶
get_validation_level()
¶
Return the current validation level.
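The three functions above can be sketched around a single module-level setting. The enum's string values, the `OFF` default, and the `_level` variable are assumptions for illustration; only the accepted input kinds and the boolean mapping come from the descriptions above.

```python
from enum import Enum


class ValidationLevel(Enum):
    OFF = "off"
    STRUCTURAL = "structural"
    FULL = "full"


_level = ValidationLevel.OFF  # assumed default


def set_validation(level):
    global _level
    if isinstance(level, ValidationLevel):
        _level = level
    elif isinstance(level, bool):
        # Backward compatibility: True -> STRUCTURAL, False -> OFF.
        _level = ValidationLevel.STRUCTURAL if level else ValidationLevel.OFF
    elif isinstance(level, str):
        _level = ValidationLevel(level.lower())  # ValueError on unknown names
    else:
        raise TypeError(f"unsupported validation level: {level!r}")


def get_validation_level():
    return _level


def is_validation_enabled():
    return _level in (ValidationLevel.STRUCTURAL, ValidationLevel.FULL)


set_validation("full")
```

The enum check comes first so that an enum member is stored as-is, and booleans are handled before strings so the legacy `True`/`False` form keeps its documented meaning.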