Skip to content

Conversation

@FBruzzesi
Copy link
Member

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@FBruzzesi FBruzzesi added the enhancement New feature or request label Nov 18, 2025
@FBruzzesi FBruzzesi changed the title feat: Add {Expr, Series}.any_value feat: Add {Expr,Series}.any_value Nov 18, 2025
@FBruzzesi FBruzzesi marked this pull request as ready for review November 18, 2025 22:18
@MarcoGorelli
Copy link
Member

thanks!

some initial thoughts:

  • I think the function just needs to not make any guarantees about order, I don't think we need to intentionally add randomness, just taking first for eager backends should be enough (it's likely more performant anyway, right?)
  • i'm slightly concerned that sql ignores nulls in this function, but we're suggesting to not to. if polars were to add this method, would they ignore nulls or not? perhaps we could have a mandatory ignore_nulls keyword argument, similar to how we have it in any_horizontal?

@FBruzzesi
Copy link
Member Author

Thanks @MarcoGorelli

  • I think the function just needs to not make any guarantees about order, I don't think we need to intentionally add randomness, just taking first for eager backends should be enough (it's likely more performant anyway, right?)

Wouldn't using first actually make a lot of guarantees on order for eager backends?

  • i'm slightly concerned that sql ignores nulls in this function, but we're suggesting to not to. if polars were to add this method, would they ignore nulls or not? perhaps we could have a mandatory ignore_nulls keyword argument, similar to how we have it in any_horizontal?

Sure I will add such option πŸ™πŸΌ

@MarcoGorelli
Copy link
Member

Wouldn't using first actually make a lot of guarantees on order for eager backends?

Sure, but we don't guarantee it - an in, it happens to return the first value, but it's not guaranteed (and the name any_value strongly implies there's nothing guaranteed)

Polars does something similar in joins - in the eager case it may happen to return rows in the same order when it's more efficient to do so, but it's not guaranteed (and it's documented that it's not guaranteed)

# !NOTE: ibis arbitrary returns a random non-null value
# See: https://ibis-project.org/reference/expression-generic.html#ibis.expr.types.generic.Column.arbitrary
expr = cast("ir.Column", expr)
return expr.arbitrary() if ignore_nulls else expr.first(include_null=True)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expr.first(include_null=not ignore_nulls) is not behaving as expected 🫠

Comment on lines +113 to +118
if kwargs.get("ignore_nulls"):
msg = (
"`Expr.any_value(ignore_nulls=True)` is not supported in a `over` "
"context for pandas-like backend."
)
raise NotImplementedError(msg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue if a group has all null values

Comment on lines +198 to +203
if last_node.kwargs.get("ignore_nulls"):
msg = (
"`Expr.any_value(ignore_nulls=True)` is not supported in a `group_by` "
"context for pandas-like backend"
)
raise NotImplementedError(msg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue if a group has all null values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enh]: Add Expr.any_value?

3 participants