[python-package] expand test coverage #7031

@jameslamb

Description

#6995 added docs on how to compute test coverage for the Python package.

This revealed that some parts of LightGBM's Python API are not covered by any unit tests.

This issue tracks the work of adding that testing.

Benefits of this work

  • improves release confidence (including for re-packagers)
  • reduces the risk of accidental regressions

Acceptance criteria

The following are covered by unit tests.

  • basic._cint64_array_to_numpy(), private function used in predict() (4 lines)
  • basic._json_default_with_numpy(), private function used in serialization/deserialization of pandas categorical mappings (6 lines)
  • basic._InnerPredictor: pickling / unpickling (4 lines)
  • basic._InnerPredictor.predict() called on a list object (5 lines)
  • setting init_score for a Dataset that is a subset of another, and where the raw data is in a file (6 lines)
  • Dataset.set_field() used to clear existing attributes (3 lines) ([python-package] Add unit test for Dataset.set_field and no data #7036)
  • Dataset.set_categorical_feature() (11 lines)
  • 2 if-else branches in Dataset._set_predictor() that I don't understand at a glance (4 lines)
  • Dataset.get_position() (1 line)
  • a possibly-unused line in Booster.trees_to_dataframe() (1 line)
  • most of Booster.update() (10 lines)
  • Booster.rollback_one_iter() (3 lines)
  • Booster.eval() adding a new validation set (2 lines)
  • Booster.shuffle_models() (2 lines)
  • callback.early_stopping() used with DART boosting (3 lines)
  • several if-else branches in handling of min_delta in callback.early_stopping() (11 lines)
  • engine.cv(): adding early_stopping() based on the presence of early_stopping_round in params (1 line)
  • plotting.plot_importance(): setting max_num_features (1 line)
  • plotting.plot_importance(): providing pre-allocated ax and figsize (1 line)
  • plotting.plot_importance(): setting xlim (1 line)
  • plotting.plot_importance(): setting ylim (1 line)
  • plotting.plot_split_value_histogram(): setting xlim (1 line)
  • plotting.plot_split_value_histogram(): setting ylim (1 line)
  • plotting.plot_metric(): setting xlim (1 line)
  • plotting.plot_metric(): setting ylim (1 line)
  • plotting._to_graphviz() customizations passed through from other functions
    • "internal_count" or "data_percentage" added to nodes (4 lines)
    • "leaf_weight", "leaf_count", or "data_percentage" added to leaf nodes (3 lines)
  • plotting.create_tree_digraph(): passing a 1-row pandas DataFrame as example_case (1 line)
  • sklearn.LGBMRanker: custom objective function that takes group (3 lines)
  • sklearn.LGBMRanker: custom metric function that takes group (3 lines)

Approach

Build the package and install it.

cmake -B build -S .
cmake --build build --target _lightgbm -j4
sh build-python.sh install --precompile

Calculate test coverage, following the docs at https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst#development-guide

Open htmlcov/index.html to view the code coverage summary.

Add new tests in https://github.com/microsoft/LightGBM/tree/master/tests, then re-run the tests and inspect the code coverage summary.

Tests should only use lightgbm's public API, unless that is very difficult or expensive. Any function whose name begins with a _ is considered private.

"this code looks like it'd never be reached, it should be deleted" is also a valid way to address some of these!

Keep PRs small: cover only 1 or 2 of the items above in a single PR.

When you believe your changes are ready for review, open a pull request. In the description, link to this issue and describe specifically which lines of lightgbm's source code the new tests cover.

Notes

Some tests are skipped if optional dependencies (like pyarrow or dask) are not installed. Install those to get more complete test coverage.
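
A quick way to see which optional dependencies are missing before running the tests is sketched below. It uses only the standard library. The tuple of names is an assumption for illustration: this issue mentions only pyarrow and dask, and the real test suite may rely on others.

```python
from importlib.util import find_spec

# Assumed list for illustration; this issue names only pyarrow and dask.
OPTIONAL_DEPENDENCIES = ("pyarrow", "dask")


def missing_optional_dependencies(names=OPTIONAL_DEPENDENCIES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_dependencies()
    if missing:
        print("Tests that need these packages will be skipped:", ", ".join(missing))
    else:
        print("All listed optional dependencies are installed.")
```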

The list above is not exhaustive; it covers only the areas I think are highest-priority to address. PRs covering other uncovered code not explicitly mentioned here are also welcome!
