Expressions¶
Colnade builds expression trees from column operations. These trees are translated by backend adapters into engine-native code.
Comparisons¶
All comparisons return Expr[Bool]:
Users.age > 18 # greater than
Users.age < 65 # less than
Users.age >= 18 # greater or equal
Users.age <= 65 # less or equal
Users.age == 30 # equal
Users.age != 0 # not equal
Arithmetic¶
Arithmetic preserves the column's dtype:
Users.score * 2 # multiply
Users.score / 100 # divide
Users.age + 1 # add
Users.age - 1 # subtract
Users.age % 10 # modulo
-Users.score # negate
Reverse operators work too: 1 + Users.age produces the same tree as Users.age + 1.
Logical operators¶
Combine boolean expressions with & (and), | (or), and ~ (not):
(Users.age > 25) & (Users.score > 80) # and
Users.name.str_starts_with("A") | Users.name.str_starts_with("E") # or
~(Users.age > 65) # not
Use parentheses
Python's operator precedence requires parentheses around each comparison when using & and |.
Aggregations¶
Aggregation methods produce Agg nodes for use in group_by().agg():
Users.score.sum() # sum of values
Users.score.mean() # mean (returns Float64)
Users.score.min() # minimum
Users.score.max() # maximum
Users.id.count() # count non-null (returns UInt32)
Users.score.std() # standard deviation
Users.score.var() # variance
Users.score.first() # first value
Users.score.last() # last value
Users.name.n_unique() # unique count (returns UInt32)
Use .alias() to bind aggregation results to output columns:
class Stats(cn.Schema):
name: cn.Column[cn.Utf8]
avg_score: cn.Column[cn.Float64]
user_count: cn.Column[cn.UInt32]
df.group_by(Users.name).agg(
Users.score.mean().alias(Stats.avg_score),
Users.id.count().alias(Stats.user_count),
)
String methods¶
Available on Column[Utf8]:
Users.name.str_contains("ali") # substring search
Users.name.str_starts_with("A") # prefix check
Users.name.str_ends_with("ce") # suffix check
Users.name.str_len() # string length
Users.name.str_to_lowercase() # to lowercase
Users.name.str_to_uppercase() # to uppercase
Users.name.str_strip() # strip whitespace
Users.name.str_replace("old", "new") # replace substring
Temporal methods¶
Available on Column[Datetime]:
Events.timestamp.dt_year() # extract year
Events.timestamp.dt_month() # extract month
Events.timestamp.dt_day() # extract day
Events.timestamp.dt_hour() # extract hour
Events.timestamp.dt_minute() # extract minute
Events.timestamp.dt_second() # extract second
Events.timestamp.dt_truncate("1d") # truncate to interval
Null handling¶
Users.age.is_null() # check if null
Users.age.is_not_null() # check if not null
Users.age.fill_null(0) # replace nulls with value
Users.age.assert_non_null() # assert non-null (runtime)
NaN handling¶
Available on float columns:
Sort expressions¶
Control sort direction:
df.sort(Users.score.desc()) # descending
df.sort(Users.name.asc()) # ascending (explicit)
df.sort(Users.name) # ascending (default)
Aliasing¶
Bind an expression result to a target column:
(Users.score * 2).alias(Users.score) # overwrite score
Users.score.mean().alias(Users.score) # alias aggregation
Casting¶
Cast a column to a different type:
Conditional expressions¶
Build if/else logic with when/then/otherwise:
# Simple condition
cn.when(Users.age > 65).then("senior").otherwise("standard")
# Multi-branch (chained when)
cn.when(Users.score > 90).then("A").when(Users.score > 80).then("B").otherwise("C")
# Use in with_columns
df.with_columns(
cn.when(Users.age > 65).then(cn.lit("senior")).otherwise(cn.lit("standard")).alias(Users.tier)
)
Conditions can use any boolean expression, including combined conditions:
cn.when((Users.age > 25) & (Users.age < 65)).then("working_age").otherwise("other")
cn.when(Users.name == "Alice").then("found").otherwise("not_found")
Branch values can be column references or expressions, not just literals:
Unmatched rows default to null if .otherwise() is omitted:
Chaining .when() after .otherwise() resets the default
Calling .when().then() on a WhenThenOtherwise that already has an .otherwise() resets the default back to null. Always place .otherwise() last.
Window functions¶
Apply an aggregation within each partition and broadcast the result back to every row: