Incorrect results for with_row_index(order_by=...) #3289

@dangotbanned

While looking at the pyarrow version for #3224, I noticed that the index from with_row_index(order_by=...) seems to be wrong when the order_by column isn't already sorted.

Possibly a regression from #3239? (cc @MarcoGorelli)
Just guessing as that's the most recent change

Repro

pandas, pyarrow and polars all produce the same result, which I think is wrong?

import narwhals as nw

data = {"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4]}

df = nw.from_dict(data, backend="polars")
df.with_row_index(order_by="c").sort("index")
┌───────────────────────────┐
|    Narwhals DataFrame     |
|---------------------------|
|shape: (3, 4)              |
|┌───────┬─────┬─────┬─────┐|
|│ index ┆ a   ┆ b   ┆ c   │|
|│ ---   ┆ --- ┆ --- ┆ --- │|
|│ i64   ┆ str ┆ i64 ┆ i64 │|
|╞═══════╪═════╪═════╪═════╡|
|│ 0     ┆ A   ┆ 3   ┆ 4   │|
|│ 1     ┆ A   ┆ 1   ┆ 9   │|
|│ 2     ┆ B   ┆ 2   ┆ 2   │|
|└───────┴─────┴─────┴─────┘|
└───────────────────────────┘

Expected

duckdb, ibis, sqlframe, dask all produce this - which is an index ordered by "c"

df.lazy("duckdb").with_row_index(order_by="c").sort("index").collect("polars")
┌───────────────────────────┐
|    Narwhals DataFrame     |
|---------------------------|
|shape: (3, 4)              |
|┌───────┬─────┬─────┬─────┐|
|│ index ┆ a   ┆ b   ┆ c   │|
|│ ---   ┆ --- ┆ --- ┆ --- │|
|│ i64   ┆ str ┆ i64 ┆ i64 │|
|╞═══════╪═════╪═════╪═════╡|
|│ 0     ┆ B   ┆ 2   ┆ 2   │|
|│ 1     ┆ A   ┆ 3   ┆ 4   │|
|│ 2     ┆ A   ┆ 1   ┆ 9   │|
|└───────┴─────┴─────┴─────┘|
└───────────────────────────┘
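
For reference, here is a small pure-Python check of the two tables above. It is only a sketch of the semantics the lazy backends appear to follow, not narwhals' implementation, and the names `argsort`, `rank` and `sorted_by_index` are made up for illustration. With this data, the index in the Repro output coincides with the argsort of "c", while the Expected output corresponds to each row's rank in "c" (the inverse permutation). That is an observation about the numbers, not a confirmed diagnosis.

```python
data = {"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4]}
c = data["c"]

# argsort: the original row positions, listed in ascending order of "c".
argsort = sorted(range(len(c)), key=c.__getitem__)

# rank: for each original row, its position once the rows are ordered by "c".
# This is the inverse permutation of argsort.
rank = [0] * len(c)
for position, row in enumerate(argsort):
    rank[row] = position


def sorted_by_index(index):
    """Attach `index` to the rows in their original order, then sort by it."""
    rows = zip(index, data["a"], data["b"], data["c"])
    return sorted(rows)


# Using the rank as the index reproduces the Expected table.
print(sorted_by_index(rank))

# Using the argsort as the index reproduces the Repro table.
print(sorted_by_index(argsort))
```

If the two coincide for other inputs as well, that may help narrow down where the eager backends diverge.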

Labels: bug: incorrect result, high priority, pandas-like, polars, pyarrow
