Skip to content

Expressions

Colnade builds expression trees from column operations. These trees are translated by backend adapters into engine-native code.

Comparisons

All comparisons return Expr[Bool]:

Users.age > 18       # greater than
Users.age < 65       # less than
Users.age >= 18      # greater or equal
Users.age <= 65      # less or equal
Users.age == 30      # equal
Users.age != 0       # not equal

Arithmetic

Arithmetic preserves the column's dtype:

Users.score * 2          # multiply
Users.score / 100        # divide
Users.age + 1            # add
Users.age - 1            # subtract
Users.age % 10           # modulo
-Users.score             # negate

Reverse operators work too: 1 + Users.age produces the same tree as Users.age + 1.

Logical operators

Combine boolean expressions with & (and), | (or), and ~ (not):

(Users.age > 25) & (Users.score > 80)     # and
Users.name.str_starts_with("A") | Users.name.str_starts_with("E")  # or
~(Users.age > 65)                          # not

Use parentheses

Python's operator precedence requires parentheses around each comparison when using & and |.

Aggregations

Aggregation methods produce Agg nodes for use in group_by().agg():

Users.score.sum()       # sum of values
Users.score.mean()      # mean (returns Float64)
Users.score.min()       # minimum
Users.score.max()       # maximum
Users.id.count()        # count non-null (returns UInt32)
Users.score.std()       # standard deviation
Users.score.var()       # variance
Users.score.first()     # first value
Users.score.last()      # last value
Users.name.n_unique()   # unique count (returns UInt32)

Use .alias() to bind aggregation results to output columns:

class Stats(cn.Schema):
    name: cn.Column[cn.Utf8]
    avg_score: cn.Column[cn.Float64]
    user_count: cn.Column[cn.UInt32]

df.group_by(Users.name).agg(
    Users.score.mean().alias(Stats.avg_score),
    Users.id.count().alias(Stats.user_count),
)

String methods

Available on Column[Utf8]:

Users.name.str_contains("ali")        # substring search
Users.name.str_starts_with("A")       # prefix check
Users.name.str_ends_with("ce")        # suffix check
Users.name.str_len()                  # string length
Users.name.str_to_lowercase()         # to lowercase
Users.name.str_to_uppercase()         # to uppercase
Users.name.str_strip()                # strip whitespace
Users.name.str_replace("old", "new")  # replace substring

Temporal methods

Available on Column[Datetime]:

Events.timestamp.dt_year()            # extract year
Events.timestamp.dt_month()           # extract month
Events.timestamp.dt_day()             # extract day
Events.timestamp.dt_hour()            # extract hour
Events.timestamp.dt_minute()          # extract minute
Events.timestamp.dt_second()          # extract second
Events.timestamp.dt_truncate("1d")    # truncate to interval

Null handling

Users.age.is_null()                   # check if null
Users.age.is_not_null()               # check if not null
Users.age.fill_null(0)                # replace nulls with value
Users.age.assert_non_null()           # assert non-null (runtime)

NaN handling

Available on float columns:

Users.score.is_nan()                  # check if NaN
Users.score.fill_nan(0.0)             # replace NaN with value

Sort expressions

Control sort direction:

df.sort(Users.score.desc())           # descending
df.sort(Users.name.asc())             # ascending (explicit)
df.sort(Users.name)                   # ascending (default)

Aliasing

Bind an expression result to a target column:

(Users.score * 2).alias(Users.score)           # overwrite score
Users.score.mean().alias(Users.score)            # alias aggregation

Casting

Cast a column to a different type:

Users.id.cast(Float64)                # cast UInt64 → Float64

Conditional expressions

Build if/else logic with when/then/otherwise:

# Simple condition
cn.when(Users.age > 65).then("senior").otherwise("standard")

# Multi-branch (chained when)
cn.when(Users.score > 90).then("A").when(Users.score > 80).then("B").otherwise("C")

# Use in with_columns
df.with_columns(
    cn.when(Users.age > 65).then(cn.lit("senior")).otherwise(cn.lit("standard")).alias(Users.tier)
)

Conditions can use any boolean expression, including combined conditions:

cn.when((Users.age > 25) & (Users.age < 65)).then("working_age").otherwise("other")
cn.when(Users.name == "Alice").then("found").otherwise("not_found")

Branch values can be column references or expressions, not just literals:

when(Users.age > 30).then(Users.age * 2).otherwise(Users.age).alias(Users.age)

Unmatched rows default to null if .otherwise() is omitted:

when(Users.age > 65).then("senior")  # non-seniors get null

Chaining .when() after .otherwise() resets the default

Calling .when().then() on a WhenThenOtherwise that already has an .otherwise() resets the default back to null. Always place .otherwise() last.

Window functions

Apply an aggregation within each partition and broadcast the result back to every row:

Users.score.sum().over(Users.name)    # per-name total, broadcast to each row
Users.score.mean().over(Users.name)   # per-name average