Skip to content

Nested Types

This tutorial demonstrates working with struct and list columns.

Runnable example

The complete code is in examples/nested_types.py.

Define schemas with nested types

from colnade import Column, Schema, Struct, List, UInt64, Float64, Utf8

class Address(Schema):
    city: Column[Utf8]
    zip_code: Column[Utf8]

class UserProfile(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    address: Column[Struct[Address]]
    tags: Column[List[Utf8]]
    scores: Column[List[Float64]]

Struct field access

Access fields within a struct column using .field():

# Filter by struct field value
new_yorkers = df.filter(
    UserProfile.address.field(Address.city) == "New York"
)

# Check if a struct field is not null
df.filter(UserProfile.address.field(Address.zip_code).is_not_null())

.field(Address.city) creates a StructFieldAccess node. The backend translates it to pl.col("address").struct.field("city").

List operations

Access list methods via the .list property:

# Count elements in each list
tag_counts = df.with_columns(
    UserProfile.tags.list.len().alias(UserProfile.tags)
)

# Check if list contains a value
python_users = df.filter(
    UserProfile.tags.list.contains("python")
)

# Get element by index (0-based)
first_tags = df.with_columns(
    UserProfile.tags.list.get(0).alias(UserProfile.tags)
)

# Aggregate list elements (numeric lists)
score_totals = df.with_columns(
    UserProfile.scores.list.sum().alias(UserProfile.scores)
)

Available list methods

Method Description Return type
.list.len() Number of elements ListOp[UInt32]
.list.get(i) Element at index ListOp[Any]
.list.contains(v) Contains value? ListOp[Bool]
.list.sum() Sum of elements ListOp[Any]
.list.mean() Mean of elements ListOp[Any]
.list.min() Minimum element ListOp[Any]
.list.max() Maximum element ListOp[Any]