Joins¶
Colnade supports typed joins between DataFrames with different schemas.
Join condition¶
When you compare columns from different schemas with ==, Colnade creates a JoinCondition instead of a regular BinOp:
# Same schema → BinOp[Bool] (filter expression)
Users.age == 30
# Different schemas → JoinCondition
Users.id == Orders.user_id
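Operator overloading is one way to get this kind of dispatch. The sketch below is a minimal, self-contained model of the rule, not Colnade's implementation: the `Col`, `BinOp`, and `JoinCondition` classes are hypothetical stand-ins that only record which schema owns a column.

```python
class BinOp:
    """Stand-in for a filter expression such as `age == 30`."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right


class JoinCondition:
    """Stand-in for an equality between columns of two different schemas."""
    def __init__(self, left, right):
        self.left, self.right = left, right


class Col:
    """Hypothetical column descriptor: the owning schema's name plus the column name."""
    def __init__(self, schema, name):
        self.schema, self.name = schema, name

    def __eq__(self, other):
        # A column from a different schema on the right-hand side means a join.
        if isinstance(other, Col) and other.schema != self.schema:
            return JoinCondition(self, other)
        # A literal, or a column from the same schema, is an ordinary comparison.
        return BinOp("==", self, other)

    # Overriding __eq__ removes the default hash, so restore identity hashing.
    __hash__ = object.__hash__


users_id = Col("Users", "id")
users_age = Col("Users", "age")
orders_user_id = Col("Orders", "user_id")

filter_expr = users_age == 30            # same-schema comparison
join_cond = users_id == orders_user_id   # cross-schema comparison
```

In this model the result type depends only on the operands, so the same `==` spelling works for both filters and joins.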
Performing a join¶
joined = users.join(orders, on=Users.id == Orders.user_id)
# joined is JoinedDataFrame[Users, Orders]
The how parameter controls the join type:
users.join(orders, on=Users.id == Orders.user_id, how="inner")  # default
users.join(orders, on=Users.id == Orders.user_id, how="left")
users.join(orders, on=Users.id == Orders.user_id, how="outer")
users.join(orders, on=Users.id == Orders.user_id, how="cross")
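For readers less familiar with the join types, the sketch below shows the usual relational meaning of each `how` value on plain Python dictionaries. It illustrates standard join semantics only and is not Colnade code: `join_rows` is a hypothetical helper, and backend-specific details (for example, how `on` interacts with `how="cross"`) are outside its scope.

```python
def join_rows(left, right, left_key, right_key, how="inner"):
    """Join two lists of dicts, returning (left_row, right_row) pairs.

    A side with no match is represented by None.
    """
    if how not in ("inner", "left", "outer", "cross"):
        raise ValueError(f"unsupported join type: {how!r}")
    if how == "cross":
        # Cartesian product: every left row with every right row; keys are ignored.
        return [(lrow, rrow) for lrow in left for rrow in right]

    pairs = []
    matched = set()  # indices of right rows that matched at least one left row
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[right_key] == lrow[left_key]]
        matched.update(hits)
        if hits:
            pairs.extend((lrow, right[i]) for i in hits)
        elif how != "inner":
            # "left" and "outer" keep unmatched left rows.
            pairs.append((lrow, None))
    if how == "outer":
        # "outer" also keeps unmatched right rows.
        pairs.extend((None, rrow) for i, rrow in enumerate(right) if i not in matched)
    return pairs


user_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
order_rows = [{"user_id": 1, "amount": 50.0}, {"user_id": 3, "amount": 75.0}]

inner_rows = join_rows(user_rows, order_rows, "id", "user_id", how="inner")
left_rows = join_rows(user_rows, order_rows, "id", "user_id", how="left")
outer_rows = join_rows(user_rows, order_rows, "id", "user_id", how="outer")
cross_rows = join_rows(user_rows, order_rows, "id", "user_id", how="cross")
```

With this sample data, only one user has a matching order, so each join type returns a different number of pairs.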
JoinedDataFrame¶
JoinedDataFrame[S, S2] accepts columns from either schema. All schema-preserving operations return JoinedDataFrame[S, S2]:
joined.filter(Users.age > 25)        # filter by left schema column
joined.filter(Orders.amount >= 100)  # filter by right schema column
joined.sort(Users.name)              # sort
Flattening with cast_schema¶
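Use cast_schema to flatten a JoinedDataFrame into a single-schema DataFrame. Fields declared with mapped_from take their values from the named source column; the remaining fields are matched by column name: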
class UserOrders(Schema):
    user_name: Column[Utf8] = mapped_from(Users.name)
    user_id: Column[UInt64] = mapped_from(Users.id)
    amount: Column[Float64]
result = joined.cast_schema(UserOrders)
# result is DataFrame[UserOrders]
Disambiguating shared column names¶
When both schemas have a column with the same name (e.g., both Users and Orders have id), use mapped_from to specify which source you want:
class JoinOutput(Schema):
    user_id: Column[UInt64] = mapped_from(Users.id)    # from Users
    order_id: Column[UInt64] = mapped_from(Orders.id)  # from Orders
    amount: Column[Float64]                            # unambiguous name match
Without mapped_from, a column name that is present in both schemas is ambiguous. Name matching skips such columns, so a target field that relies on one is left unresolved and cast_schema raises a SchemaError for missing columns.
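The skip-on-ambiguity rule can be modeled in a few lines. This is a simplified sketch that assumes the behavior described above, not Colnade's source: `resolve_columns`, the `(schema, column)` tuples, and the local `SchemaError` class are hypothetical stand-ins.

```python
class SchemaError(Exception):
    """Local stand-in for the library's error type."""


def resolve_columns(target, left, right):
    """Map each target column name to a (schema, column) source.

    target: dict of target column name -> (schema, column) for a
            mapped_from-style declaration, or None to use name matching.
    left, right: (schema_name, set_of_column_names) for the joined schemas.
    """
    (lname, lcols), (rname, rcols) = left, right
    resolved, missing = {}, []
    for name, declared in target.items():
        if declared is not None:
            resolved[name] = declared        # declared source, no matching needed
        elif name in lcols and name in rcols:
            missing.append(name)             # ambiguous: skipped by name matching
        elif name in lcols:
            resolved[name] = (lname, name)
        elif name in rcols:
            resolved[name] = (rname, name)
        else:
            missing.append(name)             # not present in either schema
    if missing:
        raise SchemaError(f"missing columns: {sorted(missing)}")
    return resolved


users_schema = ("Users", {"id", "name", "age"})
orders_schema = ("Orders", {"id", "user_id", "amount"})

# Disambiguated targets, mirroring JoinOutput: both id columns are declared.
ok = resolve_columns(
    {"user_id": ("Users", "id"), "order_id": ("Orders", "id"), "amount": None},
    users_schema,
    orders_schema,
)

# A bare "id" target exists in both schemas, so it is skipped and reported.
try:
    resolve_columns({"id": None}, users_schema, orders_schema)
    error = None
except SchemaError as exc:
    error = str(exc)
```

Here `amount` resolves by name because only one schema has it, while the bare `id` target fails because both do.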
Explicit mapping¶
For full control, pass an explicit mapping dict:
result = joined.cast_schema(Output, mapping={
    Output.person_name: Users.name,
    Output.total: Orders.amount,
})
Explicit mapping takes the highest precedence, overriding both mapped_from and name matching.
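The precedence order amounts to a three-step lookup. The sketch below is a hypothetical model of that order (explicit mapping, then mapped_from, then name matching); `pick_source`, its arguments, and the example column names are illustrative and not part of Colnade.

```python
def pick_source(column, explicit, declared, by_name):
    """Return (source, rule) for one target column, trying rules in precedence order."""
    rules = (("mapping", explicit), ("mapped_from", declared), ("name", by_name))
    for rule, table in rules:
        if column in table:
            return table[column], rule
    raise KeyError(f"no source for column {column!r}")


# Hypothetical lookup tables for three target columns.
explicit = {"total": "Orders.amount"}                              # mapping= argument
declared = {"total": "Orders.subtotal", "user_name": "Users.name"}  # mapped_from
by_name = {"total": "Orders.total", "age": "Users.age"}            # same-name columns

total_source = pick_source("total", explicit, declared, by_name)
name_source = pick_source("user_name", explicit, declared, by_name)
age_source = pick_source("age", explicit, declared, by_name)
```

In this example `total` has a candidate under all three rules, and the explicit mapping is the one that wins.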