Nested Types
This tutorial shows how to define schemas with struct and list columns, access struct fields, and apply list operations.
Runnable example
The complete code is in examples/nested_types.py.
Define schemas with nested types
from colnade import Column, Schema, Struct, List, UInt64, Float64, Utf8

class Address(Schema):
    city: Column[Utf8]
    zip_code: Column[Utf8]

class UserProfile(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    address: Column[Struct[Address]]
    tags: Column[List[Utf8]]
    scores: Column[List[Float64]]
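To make the shape of the data concrete, the sketch below shows one row that would fit `UserProfile`, written as plain Python values. It uses no colnade API: the sample values and the `matches_user_profile` helper are hypothetical, and the check is only a rough stand-in for real schema validation.

```python
# One row shaped like UserProfile, as plain Python values (illustrative data).
row = {
    "id": 1,
    "name": "Alice",
    "address": {"city": "New York", "zip_code": "10001"},  # Struct[Address]
    "tags": ["python", "data"],                             # List[Utf8]
    "scores": [9.5, 7.0],                                   # List[Float64]
}


def matches_user_profile(row: dict) -> bool:
    """Rough structural check (hypothetical helper, not part of colnade)."""
    address = row.get("address")
    return (
        isinstance(row.get("id"), int)
        and row["id"] >= 0  # UInt64 is unsigned
        and isinstance(row.get("name"), str)
        and isinstance(address, dict)
        and set(address) == {"city", "zip_code"}
        and all(isinstance(v, str) for v in address.values())
        and isinstance(row.get("tags"), list)
        and all(isinstance(t, str) for t in row["tags"])
        and isinstance(row.get("scores"), list)
        and all(isinstance(s, float) for s in row["scores"])
    )


print(matches_user_profile(row))
```

The struct column holds one nested record per row, while each list column holds a variable-length sequence of values of a single type.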
Struct field access
Access fields within a struct column using .field():
# `df` is assumed to be a DataFrame with the UserProfile schema

# Filter by struct field value
new_yorkers = df.filter(
    UserProfile.address.field(Address.city) == "New York"
)

# Check if a struct field is not null
df.filter(UserProfile.address.field(Address.zip_code).is_not_null())
Calling .field(Address.city) creates a StructFieldAccess node. The backend translates that node into its own expression; in Polars syntax, this is pl.col("address").struct.field("city").
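The translation step can be pictured with a toy expression node. This is a minimal sketch, not colnade's actual implementation: the `StructFieldAccess` dataclass below, its attribute names, and its `to_polars_source` method are assumptions made for illustration. The method only renders the target Polars expression as text, so the example runs without Polars installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructFieldAccess:
    """Toy stand-in for an expression node (hypothetical, not colnade's class)."""

    column_name: str  # name of the struct column, e.g. "address"
    field_name: str   # name of the field inside the struct, e.g. "city"

    def to_polars_source(self) -> str:
        # Render the Polars expression this node corresponds to, as text.
        return f'pl.col("{self.column_name}").struct.field("{self.field_name}")'


node = StructFieldAccess(column_name="address", field_name="city")
print(node.to_polars_source())
```

Keeping the node as plain data and leaving the rendering to a separate step is what would let different backends translate the same expression in their own way.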
List operations
Access list methods via the .list property:
# Count elements in each list
tag_counts = df.with_columns(
    UserProfile.tags.list.len().alias(UserProfile.tags)
)

# Check if list contains a value
python_users = df.filter(
    UserProfile.tags.list.contains("python")
)

# Get element by index (0-based)
first_tags = df.with_columns(
    UserProfile.tags.list.get(0).alias(UserProfile.tags)
)

# Aggregate list elements (numeric lists)
score_totals = df.with_columns(
    UserProfile.scores.list.sum().alias(UserProfile.scores)
)
Available list methods
| Method | Description | Return type |
|---|---|---|
| `.list.len()` | Number of elements | `ListOp[UInt32]` |
| `.list.get(i)` | Element at index | `ListOp[Any]` |
| `.list.contains(v)` | Contains value? | `ListOp[Bool]` |
| `.list.sum()` | Sum of elements | `ListOp[Any]` |
| `.list.mean()` | Mean of elements | `ListOp[Any]` |
| `.list.min()` | Minimum element | `ListOp[Any]` |
| `.list.max()` | Maximum element | `ListOp[Any]` |
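Each of these methods works row by row on the list stored in that row. The plain-Python model below sketches those semantics on sample data; it is not colnade or backend code. The `LIST_METHODS` mapping and the sample lists are hypothetical, and behaviour for empty lists, nulls, and out-of-range indexes is left out because it depends on the backend.

```python
from statistics import fmean

# Plain-Python equivalents of each list method, applied to one row's list.
LIST_METHODS = {
    "len": len,
    "get(0)": lambda xs: xs[0],
    "contains(7.0)": lambda xs: 7.0 in xs,
    "sum": sum,
    "mean": fmean,
    "min": min,
    "max": max,
}

# Sample `scores` column: one non-empty list per row (illustrative data).
scores = [[9.5, 7.0], [4.0, 6.0, 8.0]]

# Apply every method to every row, producing one output value per row.
results = {
    name: [fn(row) for row in scores] for name, fn in LIST_METHODS.items()
}
for name, values in results.items():
    print(f".list.{name}: {values}")
```

Every method returns exactly one value per input row, which is why the results can be used in `with_columns` or, for boolean results such as `contains`, in `filter`.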