Joins¶

This tutorial demonstrates joining DataFrames from different schemas and flattening the result with cast_schema.

Runnable example

The complete code is in examples/joins.py.

Define schemas¶

from colnade import Column, Schema, UInt64, Float64, Utf8, mapped_from

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt64]

class Orders(Schema):
    id: Column[UInt64]
    user_id: Column[UInt64]
    amount: Column[Float64]

class UserOrders(Schema):
    user_name: Column[Utf8] = mapped_from(Users.name)
    user_id: Column[UInt64] = mapped_from(Users.id)
    amount: Column[Float64]

The UserOrders output schema uses mapped_from to disambiguate — both Users and Orders have an id column.

Perform the join¶

joined = users.join(orders, on=Users.id == Orders.user_id)

Users.id == Orders.user_id creates a JoinCondition (not a BinOp) because the columns belong to different schemas. The result is JoinedDataFrame[Users, Orders].

Work with joined data¶

You can use columns from either schema on a JoinedDataFrame:

# Filter using columns from either schema
big_orders = joined.filter(Orders.amount >= 150)
young_users = joined.filter(Users.age < 30)

Flatten to output schema¶

result = joined.cast_schema(UserOrders)

cast_schema resolves each target column:

user_name → mapped_from(Users.name) → selects name from the left (Users) side
user_id → mapped_from(Users.id) → selects id from the left side
amount → name matches Orders.amount (unambiguous)

The result is DataFrame[UserOrders] with columns ["user_name", "user_id", "amount"].

Filter then cast¶

You can chain operations on the joined data before casting:

result = (
    joined
    .filter(Orders.amount >= 150)
    .cast_schema(UserOrders)
)