[python-package] expand test coverage

## Description

#6995 added docs on how to compute test coverage for the Python package. 

This revealed that some parts of LightGBM's Python API are not covered by any unit tests.

<img width="982" height="462" alt="Image" src="https://github.com/user-attachments/assets/59aa8f4a-160b-4202-8e02-0a08735b375e" />

This issue tracks the work of adding that testing.

## Benefits of this work

* improves release confidence (including for re-packagers)
* reduces the risk of accidental regressions

## Acceptance criteria

The following are covered by unit tests.

* [ ] `basic._cint64_array_to_numpy()`, private function used in `predict()` ([4 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L504-L509))
* [ ] `basic._json_default_with_numpy()`, private function used in serialization/deserialization of `pandas` categorical mappings ([6 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L522-L529))
* [ ] `basic._InnerPredictor`: pickling / unpickling ([4 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L1082-L1086))
* [ ] `basic._InnerPredictor.predict()` called on a `list` object ([5 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L1205-L1215))
* [ ] setting `init_score` for a Dataset that is a subset of another, and where the raw data is in a file ([6 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L2059-L2067))
* [x] `Dataset.set_field()` used to clear existing attributes ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L2754-L2765)) (#7036)
* [ ] `Dataset.set_categorical_feature()` ([11 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L2915-L2933))
* [ ] 2 `if-else` branches in `Dataset._set_predictor()` that I don't understand at a glance ([4 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L2960-L2971))
* [ ] `Dataset.get_position()` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L3316))
* [ ] a possibly-unused line in `Booster.trees_to_dataframe()` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L3887))
* [ ] most of `Booster.update()` ([10 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L4094-L4112))
* [ ] `Booster.rollback_one_iter()` ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L4192-L4202))
* [ ] `Booster.eval()` adding a new validation set ([2 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L4338-L4340))
* [ ] `Booster.shuffle_models()` ([2 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/basic.py#L4460-L4487))
* [ ] `callback.early_stopping()` used with DART boosting ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/callback.py#L331-L334))
* [ ] several `if-else` branches in handling of `min_delta` in `callback.early_stopping()` ([11 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/callback.py#L358-L379))
* [ ] `engine.cv()`: adding `early_stopping()` based on the presence of `early_stopping_round` in params ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/engine.py#L813-L826))
* [ ] `plotting.plot_importance()`: setting `max_num_features` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L131))
* [ ] `plotting.plot_importance()`: providing pre-allocated `ax` and `figsize` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L134-L136))
* [ ] `plotting.plot_importance()`: setting `xlim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L148-L149))
* [ ] `plotting.plot_importance()`: setting `ylim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L154-L155))
* [ ] `plotting.plot_split_value_histogram()`: setting `xlim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L261-L262))
* [ ] `plotting.plot_split_value_histogram()`: setting `ylim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L269-L270))
* [ ] `plotting.plot_metric()`: setting `xlim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L403-L404))
* [ ] `plotting.plot_metric()`: setting `ylim` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L409-L410))
* [ ] `plotting._to_graphviz()` customizations passed through from other functions
  - [ ] "internal_count" or "data_percentage" added to nodes ([4 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L530-L533))
  - [ ] "leaf_weight", "leaf_count", or "data_percentage" added to leaf nodes ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L561-L566))
* [ ] `plotting.create_tree_digraph()`: passing a 1-row `pandas` DataFrame as `example_case` ([1 line](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/plotting.py#L720-L726))
* [ ] `sklearn.LGBMRanker`: custom objective function that takes `group` ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/sklearn.py#L231-L233))
* [ ] `sklearn.LGBMRanker`: custom metric function that takes `group` ([3 lines](https://github.com/microsoft/LightGBM/blob/f981fba730d1eaf26cfd6662155487e9b3d1109d/python-package/lightgbm/sklearn.py#L312-L314))

## Approach

Build the package and install it.

```shell
cmake -B build -S .
cmake --build build --target _lightgbm -j4
sh build-python.sh install --precompile
```

Calculate test coverage, following the docs at https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst#development-guide

Open `htmlcov/index.html` to view the code coverage summary.

Add new tests under https://github.com/microsoft/LightGBM/tree/master/tests, then re-run the tests and inspect the code coverage summary again.

Tests should only use `lightgbm`'s public API, unless that is very difficult or expensive. Any function whose name begins with a `_` is considered private.

"this code looks like it'd never be reached, it should be deleted" is also a valid way to address some of these!

**Keep PRs small**: address only 1 or 2 of the items above in a single PR.

When you believe your changes are ready for review, open a pull request. In the description, link to this issue and describe specifically which lines of `lightgbm`'s source code the new tests cover.

## Notes

Some tests are skipped if optional dependencies (like `pyarrow` or `dask`) are not installed. Install those to get more complete test coverage.

The list above is not exhaustive. It covers only the areas I think are highest-priority to address. PRs covering other uncovered code not explicitly mentioned here are also welcome!
