Skip to content

Joins

This tutorial demonstrates joining DataFrames from different schemas and flattening the result with cast_schema.

Runnable example

The complete code is in examples/joins.py.

Define schemas

from colnade import Column, Schema, UInt64, Float64, Utf8, mapped_from

class Users(Schema):
    id: Column[UInt64]
    name: Column[Utf8]
    age: Column[UInt64]

class Orders(Schema):
    id: Column[UInt64]
    user_id: Column[UInt64]
    amount: Column[Float64]

class UserOrders(Schema):
    user_name: Column[Utf8] = mapped_from(Users.name)
    user_id: Column[UInt64] = mapped_from(Users.id)
    amount: Column[Float64]

The UserOrders output schema uses mapped_from to disambiguate — both Users and Orders have an id column.

Perform the join

joined = users.join(orders, on=Users.id == Orders.user_id)

Users.id == Orders.user_id creates a JoinCondition (not a BinOp) because the columns belong to different schemas. The result is JoinedDataFrame[Users, Orders].

Work with joined data

You can use columns from either schema on a JoinedDataFrame:

# Filter using columns from either schema
big_orders = joined.filter(Orders.amount >= 150)
young_users = joined.filter(Users.age < 30)

Flatten to output schema

result = joined.cast_schema(UserOrders)

cast_schema resolves each target column:

  • user_namemapped_from(Users.name) → selects name from the left (Users) side
  • user_idmapped_from(Users.id) → selects id from the left side
  • amount → name matches Orders.amount (unambiguous)

The result is DataFrame[UserOrders] with columns ["user_name", "user_id", "amount"].

Filter then cast

You can chain operations on the joined data before casting:

result = (
    joined
    .filter(Orders.amount >= 150)
    .cast_schema(UserOrders)
)