Joins¶
This tutorial demonstrates joining DataFrames from different schemas and flattening the result with cast_schema.
Runnable example
The complete code is in examples/joins.py.
Define schemas¶
from colnade import Column, Schema, UInt64, Float64, Utf8, mapped_from
class Users(Schema):
id: Column[UInt64]
name: Column[Utf8]
age: Column[UInt64]
class Orders(Schema):
id: Column[UInt64]
user_id: Column[UInt64]
amount: Column[Float64]
class UserOrders(Schema):
user_name: Column[Utf8] = mapped_from(Users.name)
user_id: Column[UInt64] = mapped_from(Users.id)
amount: Column[Float64]
The UserOrders output schema uses mapped_from to disambiguate — both Users and Orders have an id column.
Perform the join¶
Users.id == Orders.user_id creates a JoinCondition (not a BinOp) because the columns belong to different schemas. The result is JoinedDataFrame[Users, Orders].
Work with joined data¶
You can use columns from either schema on a JoinedDataFrame:
# Filter using columns from either schema
big_orders = joined.filter(Orders.amount >= 150)
young_users = joined.filter(Users.age < 30)
Flatten to output schema¶
cast_schema resolves each target column:
user_name→mapped_from(Users.name)→ selectsnamefrom the left (Users) sideuser_id→mapped_from(Users.id)→ selectsidfrom the left sideamount→ name matchesOrders.amount(unambiguous)
The result is DataFrame[UserOrders] with columns ["user_name", "user_id", "amount"].
Filter then cast¶
You can chain operations on the joined data before casting: