-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
API: date_range, timedelta_range infer unit from start/end/freq #63146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Should we only do this for |
|
I think best case would be to do it all at once (xref #63018 gets everything but the string-parsing inference for td64). But if it'll make a difference in getting a release out soon, I'll happily revert. |
|
Ah, I hadn't seen #63018. Yes, my preference would also be to do it all at once. Will try to take a look at that one then. |
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests for the inference itself? I could have missed it.
| unit : {'s', 'ms', 'us', 'ns'}, default 'ns' | ||
| unit : {'s', 'ms', 'us', 'ns', None}, default None | ||
| Specify the desired resolution of the result. | ||
| If not specified, this is inferred from the 'start', 'end', and 'freq' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to given an indication of how the inference works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did you have in mind? I can't think of an explanation that is meaningfully simpler than "read the code"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidentally started a separate chain here: #63146 (comment)
| unit : {'s', 'ms', 'us', 'ns'}, default 'ns' | ||
| unit : {'s', 'ms', 'us', 'ns', None}, default None | ||
| Specify the desired resolution of the result. | ||
| If not specified, this is inferred from the 'start', 'end', and 'freq' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
I didn't add dedicated tests bc existing tests hit all the cases, but will do so now. |
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
| unit : {'s', 'ms', 'us', 'ns'}, default 'ns' | ||
| unit : {'s', 'ms', 'us', 'ns', None}, default None | ||
| Specify the desired resolution of the result. | ||
| If not specified, this is inferred from the 'start', 'end', and 'freq' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized writing this that we also don't document Timestamp inference (at least, in the API docs). So this is perhaps a little incomplete unless that is done, but as long as there is no opposition to doing so I can open an issue for that.
| If not specified, this is inferred from the 'start', 'end', and 'freq' | |
| If not specified, this is inferred from the 'start', 'end', and 'freq' using the same inference as :class:`Timestamp` taking the highest resolution of the three that are provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's track as a separate issue indeed (your above suggestion does not make it much more clear, right now, IMO)
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual change looks good to me!
Personally, I would do less of unit="ns" in the tests (to ensure we test the default unit more as well, and it also just makes them a bit less verbose), but it that requires more updates in the tests to get them passing, that's certainly fine for a follow-up issue.
| df = DataFrame( | ||
| {"dt": date_range("2015-01-01", periods=3, tz="Europe/Brussels", unit="ns")} | ||
| ) | ||
| result = df.apply(lambda x: x) | ||
| tm.assert_frame_equal(result, df) | ||
|
|
||
| result = df.apply(lambda x: x + pd.Timedelta("1day")) | ||
| expected = DataFrame( | ||
| {"dt": date_range("2015-01-02", periods=3, tz="Europe/Brussels")} | ||
| {"dt": date_range("2015-01-02", periods=3, tz="Europe/Brussels", unit="ns")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't (yet) work without specifying the unit in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM the Timedelta("1day") above has "ns" unit, so we need it in the expected line. In order to be robust to the Timedelta PR, need it in the df line too. Can remove both if the Timedelta PR goes in.
| "B": [1.0, 2.0, 3.0], | ||
| "C": ["foo", "bar", "baz"], | ||
| "D": date_range("20130101", periods=3), | ||
| "D": date_range("20130101", periods=3, unit="ns"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "D": date_range("20130101", periods=3, unit="ns"), | |
| "D": date_range("20130101", periods=3), |
and then remove the .as_unit("ns") below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
| start = Timestamp("2025-11-25").as_unit("ms") | ||
| end = Timestamp("2025-11-26").as_unit("s") | ||
| dti = date_range(start, end, freq=off) | ||
| assert dti.unit == "us" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| start = Timestamp("2025-11-25").as_unit("ms") | |
| end = Timestamp("2025-11-26").as_unit("s") | |
| dti = date_range(start, end, freq=off) | |
| assert dti.unit == "us" | |
| start = Timestamp("2025-11-25 09:00:00").as_unit("s") | |
| end = Timestamp("2025-11-25 09:00:02").as_unit("s") | |
| dti = date_range(start, end, freq=off) | |
| assert dti.unit == "us" | |
| off = DateOffset(milleconds=2) | |
| dti = date_range(start, end, freq=off) | |
| assert dti.unit == "ms" | |
| off = DateOffset(nanoseconds=2) | |
| dti = date_range(start, end, freq=off) | |
| assert dti.unit == "ns" |
to also cover the two other cases? (unless that is already covered elsewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though i'll change the nano case to make a smaller array
I'm of mixed opinion on this. Ended up deciding that outside of tests targeted at date_range, I prefer to be a little bit more verbose in order to be robust to e.g. decisions on other PRs |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.AGENTS.md.Waiting to update
expected's until after #62801, so this won't pass yet.