Nested Types
This tutorial shows how to declare struct and list columns in a schema and how to filter and aggregate on them.
Runnable example
The complete code is in examples/nested_types.py.
Define schemas with nested types
import colnade as cn


class Address(cn.Schema):
    city: cn.Column[cn.Utf8]
    zip_code: cn.Column[cn.Utf8]


class UserProfile(cn.Schema):
    id: cn.Column[cn.UInt64]
    name: cn.Column[cn.Utf8]
    address: cn.Column[cn.Struct[Address]]
    tags: cn.Column[cn.List[cn.Utf8]]
    scores: cn.Column[cn.List[cn.Float64]]
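To make the shape of the data concrete, the sketch below mirrors the two schemas as plain Python typed dictionaries. It does not use colnade at all: the names AddressRow, UserProfileRow, and sample_rows are hypothetical and exist only for illustration, and the values are made up.

```python
from typing import List, TypedDict


class AddressRow(TypedDict):
    """Plain-Python mirror of the Address schema (illustrative only)."""
    city: str
    zip_code: str


class UserProfileRow(TypedDict):
    """Plain-Python mirror of the UserProfile schema (illustrative only)."""
    id: int
    name: str
    address: AddressRow   # struct column: one nested record per row
    tags: List[str]       # list column: variable-length list of strings
    scores: List[float]   # list column: variable-length list of floats


# Made-up sample data showing one nested record and two lists per row.
sample_rows: List[UserProfileRow] = [
    {
        "id": 1,
        "name": "Alice",
        "address": {"city": "New York", "zip_code": "10001"},
        "tags": ["python", "sql"],
        "scores": [90.0, 85.0, 95.0],
    },
    {
        "id": 2,
        "name": "Bob",
        "address": {"city": "Boston", "zip_code": "02108"},
        "tags": ["rust"],
        "scores": [70.0],
    },
]

for row in sample_rows:
    print(row["name"], row["address"]["city"], len(row["tags"]))
```

Each row carries exactly one address record, while tags and scores may differ in length from row to row. That difference is why struct columns are read by field and list columns are read through list methods.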
Struct field access
Access fields within a struct column using .field():
# Filter by struct field value
new_yorkers = df.filter(
    UserProfile.address.field(Address.city) == "New York"
)

# Check if a struct field is not null
df.filter(UserProfile.address.field(Address.zip_code).is_not_null())
.field(Address.city) creates a StructFieldAccess node. The backend translates it to pl.col("address").struct.field("city").
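The translation step described above can be pictured with a small standalone sketch. This is not colnade's implementation: the ColumnRef class, the fields on StructFieldAccess, and the to_polars_source helper are all assumptions made for illustration, and the helper only renders the target expression as text rather than building a real Polars expression.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical node: a reference to a top-level column."""
    name: str


@dataclass(frozen=True)
class StructFieldAccess:
    """Hypothetical node: access to one named field of a struct column."""
    struct: ColumnRef
    field_name: str


Node = Union[ColumnRef, StructFieldAccess]


def to_polars_source(node: Node) -> str:
    """Render a node as the text of the Polars expression it stands for."""
    if isinstance(node, ColumnRef):
        return f'pl.col("{node.name}")'
    if isinstance(node, StructFieldAccess):
        inner = to_polars_source(node.struct)
        return f'{inner}.struct.field("{node.field_name}")'
    raise TypeError(f"unsupported node: {node!r}")


# Stand-in for what UserProfile.address.field(Address.city) might build.
node = StructFieldAccess(ColumnRef("address"), "city")
print(to_polars_source(node))  # pl.col("address").struct.field("city")
```

Keeping the access as a node rather than evaluating it immediately is what lets a backend decide how to express it; the sketch shows one such rendering.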
List operations
Access list methods via the .list property:
# Check if list contains a value
python_users = df.filter(
    UserProfile.tags.list.contains("python")
)

# Compute list aggregations into new columns
class ProfileWithCounts(UserProfile):
    tag_count: cn.Column[cn.UInt32]
    first_tag: cn.Column[cn.Utf8]
    total_score: cn.Column[cn.Float64]

enriched = df.with_columns(
    UserProfile.tags.list.len().alias(ProfileWithCounts.tag_count),
    UserProfile.tags.list.get(0).alias(ProfileWithCounts.first_tag),
    UserProfile.scores.list.sum().alias(ProfileWithCounts.total_score),
).cast_schema(ProfileWithCounts)
Available list methods
| Method | Description | Return type |
|---|---|---|
| .list.len() | Number of elements | ListOp[UInt32] |
| .list.get(i) | Element at index | ListOp[DType] |
| .list.contains(v) | Contains value? | ListOp[Bool] |
| .list.sum() | Sum of elements | ListOp[DType] |
| .list.mean() | Mean of elements | ListOp[DType] |
| .list.min() | Minimum element | ListOp[DType] |
| .list.max() | Maximum element | ListOp[DType] |
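As a rough guide to what each method computes for a single row, the sketch below expresses the same operations on ordinary Python lists. The function names are hypothetical, and returning None for an out-of-range index or an empty list is an assumption of this sketch; the actual null and error behavior depends on the backend and is not specified here.

```python
from statistics import fmean
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def list_len(values: Sequence[T]) -> int:
    """Per-row analogue of .list.len()."""
    return len(values)


def list_get(values: Sequence[T], index: int) -> Optional[T]:
    """Per-row analogue of .list.get(i); None when out of range (assumed)."""
    if -len(values) <= index < len(values):
        return values[index]
    return None


def list_contains(values: Sequence[T], item: T) -> bool:
    """Per-row analogue of .list.contains(v)."""
    return item in values


def list_sum(values: Sequence[float]) -> float:
    """Per-row analogue of .list.sum()."""
    return sum(values)


def list_mean(values: Sequence[float]) -> Optional[float]:
    """Per-row analogue of .list.mean(); None for an empty list (assumed)."""
    return fmean(values) if values else None


def list_min(values: Sequence[float]) -> Optional[float]:
    """Per-row analogue of .list.min(); None for an empty list (assumed)."""
    return min(values) if values else None


def list_max(values: Sequence[float]) -> Optional[float]:
    """Per-row analogue of .list.max(); None for an empty list (assumed)."""
    return max(values) if values else None


# Made-up values for one row's tags and scores columns.
tags: List[str] = ["python", "sql"]
scores: List[float] = [90.0, 85.0, 95.0]

print(list_len(tags), list_get(tags, 0), list_contains(tags, "python"))
print(list_sum(scores), list_mean(scores), list_min(scores), list_max(scores))
```

In the enriched example earlier in this section, tag_count, first_tag, and total_score correspond to list_len, list_get with index 0, and list_sum applied to each row.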