Description
#6995 added docs on how to compute test coverage for the Python package.
This revealed that some parts of LightGBM's Python API are not covered by any unit tests.
This issue tracks the work of adding that testing.
Benefits of this work
- improves release confidence (including for re-packagers)
- reduces the risk of accidental regressions
Acceptance criteria
The following are covered by unit tests.
- basic._cint64_array_to_numpy(), private function used in predict() (4 lines)
- basic._json_default_with_numpy(), private function used in serialization/deserialization of pandas categorical mappings (6 lines)
- basic._InnerPredictor: pickling / unpickling (4 lines)
- basic._InnerPredictor.predict() called on a list object (5 lines)
- setting init_score for a Dataset that is a subset of another, and where the raw data is in a file (6 lines)
- Dataset.set_field() used to clear existing attributes (3 lines) ([python-package] Add unit test for Dataset.set_field and no data #7036)
- Dataset.set_categorical_feature() (11 lines)
- 2 if-else branches in Dataset._set_predictor() that I don't understand at a glance (4 lines)
- Dataset.get_position() (1 line)
- a possibly-unused line in Booster.trees_to_dataframe() (1 line)
- most of Booster.update() (10 lines)
- Booster.rollback_one_iter() (3 lines)
- Booster.eval() adding a new validation set (2 lines)
- Booster.shuffle_models() (2 lines)
- callback.early_stopping() used with DART boosting (3 lines)
- several if-else branches in handling of min_delta in callback.early_stopping() (11 lines)
- engine.cv(): adding early_stopping() based on the presence of early_stopping_round in params (1 line)
- plotting.plot_importance(): setting max_num_features (1 line)
- plotting.plot_importance(): providing pre-allocated ax and figsize (1 line)
- plotting.plot_importance(): setting xlim (1 line)
- plotting.plot_importance(): setting ylim (1 line)
- plotting.plot_split_value_histogram(): setting xlim (1 line)
- plotting.plot_split_value_histogram(): setting ylim (1 line)
- plotting.plot_metric(): setting xlim (1 line)
- plotting.plot_metric(): setting ylim (1 line)
- plotting._to_graphviz() customizations passed through from other functions
- plotting.create_tree_digraph(): passing a 1-row pandas DataFrame as example_case (1 line)
- sklearn.LGBMRanker: custom objective function that takes group (3 lines)
- sklearn.LGBMRanker: custom metric function that takes group (3 lines)
Approach
Build the package and install it.

    cmake -B build -S .
    cmake --build build --target _lightgbm -j4
    sh build-python.sh install --precompile

Calculate test coverage, following the docs at https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst#development-guide
Open htmlcov/index.html to view the code coverage summary.
Add new tests under https://github.com/microsoft/LightGBM/tree/master/tests, then re-run the tests and inspect the code coverage summary.
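If you prefer a text listing over clicking through the HTML report, the sketch below pulls uncovered line numbers out of a coverage.py JSON report. It assumes you have additionally produced such a report (coverage.py can write one with `coverage json`); the docs linked above only describe the HTML report. The function name `uncovered_lines` is hypothetical, and the file paths and line numbers in `sample_report` are made-up placeholders, not real LightGBM coverage results.

```python
def uncovered_lines(report: dict, prefix: str = "") -> dict:
    """Map each file path starting with `prefix` to its sorted uncovered line numbers.

    `report` is the parsed content of a coverage.py JSON report, which stores
    per-file results under "files" with a "missing_lines" list for each file.
    """
    result = {}
    for path, info in report.get("files", {}).items():
        missing = info.get("missing_lines", [])
        if path.startswith(prefix) and missing:
            result[path] = sorted(missing)
    return result


# Made-up placeholder data in the shape of a coverage.py JSON report.
# With a real report: report = json.load(open("coverage.json"))
sample_report = {
    "files": {
        "lightgbm/basic.py": {"missing_lines": [12, 10, 11]},
        "lightgbm/plotting.py": {"missing_lines": []},
        "other/module.py": {"missing_lines": [5]},
    }
}

missing = uncovered_lines(sample_report, prefix="lightgbm/")
for path, lines in missing.items():
    print(path, lines)
```

Comparing this output before and after adding a test is one way to collect the specific line numbers to mention in a pull request description.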
Tests should only use lightgbm's public API, unless that is very difficult or expensive. Any function whose name begins with a _ is considered private.
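As a rough self-check for the rule above, the following sketch scans test source code for names that begin with `_`. It only parses the text with the standard-library `ast` module, so lightgbm does not need to be installed to run it. The helper name `find_private_uses` and the sample test source are hypothetical. Note that it also flags dunder attributes such as `__class__`, so treat its output as a list of places to look at, not a list of violations.

```python
import ast


def find_private_uses(source: str) -> list:
    """Return sorted (line number, name) pairs for underscore-prefixed names.

    Looks at attribute accesses (obj._name) and `from ... import _name`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            hits.append((node.lineno, node.attr))
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name.startswith("_"):
                    hits.append((node.lineno, alias.name))
    return sorted(hits)


# Hypothetical test source, used only to illustrate the check.
sample = '''
import lightgbm as lgb
from lightgbm.basic import _InnerPredictor

def test_something():
    ds = lgb.Dataset("train.txt")
    ds._set_predictor(None)
    ds.set_categorical_feature([0])
'''

hits = find_private_uses(sample)
for lineno, name in hits:
    print(f"line {lineno}: {name}")
```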
"this code looks like it'd never be reached, it should be deleted" is also a valid way to address some of these!
Keep PRs small... only 1 or 2 of the items above in a single PR.
When you believe your changes are ready for review, open a pull request. In the description, link to this issue and describe specifically which lines of lightgbm's source code the new tests cover.
Notes
Some tests are skipped if optional dependencies (like pyarrow or dask) are not installed. Install those to get more complete test coverage.
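A quick way to see whether any of those optional dependencies are missing before running the tests is sketched below. It uses only the standard library; the helper name `missing_optional_deps` is hypothetical, and the list checked contains just the two packages named above, not the full set of optional dependencies.

```python
import importlib.util


def missing_optional_deps(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Only the two packages mentioned in this issue; the test suite may use others.
not_installed = missing_optional_deps(["pyarrow", "dask"])
if not_installed:
    print("Not installed, some tests will be skipped:", ", ".join(not_installed))
else:
    print("pyarrow and dask are both importable.")
```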
The list above is not exhaustive; it is just the areas I think are highest-priority to address. PRs covering other uncovered code not explicitly mentioned here are also welcome!