@kmetra1910 kmetra1910 commented Apr 14, 2025

Drop-in replacement for GMM from gmr

Implements:

.from_samples(...)

.sample(...)

.predict(...)

.to_responsibilities(...)

.condition(...)

Carefully replicates gmr behavior, including:

Manual initialization when n_samples < 2

Safe handling of degenerate weights (e.g., nan or negative)

Adds warnings instead of exceptions for a better debugging experience
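For readers unfamiliar with the gmr-style API, here is a rough sketch of the sklearn calls such a wrapper presumably builds on. The method correspondences in the comments are assumptions about the mapping, not taken from the PR's actual code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])

# ~ .from_samples(X): fit a 2-component mixture to the data
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

samples, _ = gmm.sample(10)    # ~ .sample(10)
resp = gmm.predict_proba(X)    # ~ .to_responsibilities(X); rows sum to 1
labels = gmm.predict(X)        # hard cluster assignments (note: gmr's
                               # .predict does conditional regression instead)
print(samples.shape, resp.shape, labels.shape)
```

Note that `.condition(...)` has no direct sklearn counterpart, which is presumably why the wrapper implements Gaussian conditioning itself.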


This pull request includes several changes across multiple files to improve code readability and consistency. The most important changes involve adding missing imports, reformatting code for better readability, and updating string formatting.

Minor Additions:

  • Added missing author information in test files.


Copilot AI left a comment


Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

bamt/utils/gmm_wrapper.py:160

  • The assignment of given_values using indexing [0] may unintentionally drop dimensions when conditioning on multiple variables. Consider using np.array(given_values) without the [0] indexing so that the full array is preserved.
given_values = np.array(given_values)[0]
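To illustrate the concern with a hypothetical `given_values` (not taken from the PR): indexing with `[0]` keeps only the first row, which silently loses data when conditioning values are supplied for more than one point:

```python
import numpy as np

# hypothetical conditioning values for two query points (not from the PR)
given_values = [[1.0, 2.0], [3.0, 4.0]]

a = np.array(given_values)[0]  # shape (2,): the second row is silently dropped
b = np.array(given_values)     # shape (2, 2): the full array is preserved
print(a.shape, b.shape)
```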

@kmetra1910
Author

Updated based on the PR review; added comments, docstrings, and the rest.

jrzkaminski
jrzkaminski previously approved these changes Apr 18, 2025
@jrzkaminski
Collaborator

The only thing left is to remove gmr from requirements and pyproject.toml now.

Collaborator

@Roman223 Roman223 left a comment


NB: I didn't run these changes myself!

Low-Level Observations

  • The low-level implementation looks solid overall.
  • A few commented-out lines should be removed to clean things up.
  • Exception handling is clear and well-structured.
  • Computation is efficient and numerically stable — e.g., good use of np.linalg.solve instead of inverse matrix operations.
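As a side note on the `np.linalg.solve` point: solving the linear system directly avoids forming an explicit inverse, which is both cheaper and numerically safer. A generic illustration (unrelated to the PR's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)  # a well-conditioned matrix
b = rng.normal(size=4)

x_solve = np.linalg.solve(A, b)  # factorizes A once; no explicit inverse
x_inv = np.linalg.inv(A) @ b     # forms A^{-1} explicitly (less stable)

print(np.allclose(x_solve, x_inv))
```

On well-conditioned matrices the two agree; on ill-conditioned ones, `solve` typically loses less precision.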

High-Level Issues

Redundant Validations in from_samples

There's a manual check to ensure X.shape[0] >= 2, like:

if X.shape[0] < 2:
    raise ValueError("Need at least 2 samples...")

But this check already exists within sklearn.GaussianMixture through:

Snippet from fit_predict:

X = validate_data(self, X, dtype=[np.float64, np.float32], ensure_min_samples=2)

Which internally uses:

if ensure_min_samples > 0:
    n_samples = _num_samples(array)
    if n_samples < ensure_min_samples:
        raise ValueError(
            "Found array with %d sample(s) (shape=%s) while a"
            " minimum of %d is required%s."
            % (n_samples, array.shape, ensure_min_samples, context)
        )

Recommendation: These checks are redundant and can be removed or delegated to sklearn.
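A quick check (a minimal sketch, assuming a recent scikit-learn) confirms that `GaussianMixture.fit` already rejects fewer than two samples on its own:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[0.0, 1.0]])  # a single 2-D sample

raised = False
try:
    GaussianMixture(n_components=1).fit(X)
except ValueError as exc:
    raised = True  # sklearn's own validation fires before any custom check
    print("sklearn rejected it:", exc)
```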


manual_init Duplication

Instead of rewriting the initialization logic, it's more maintainable to reuse sklearn's internal _initialize method. For reference, here's the original:

def _initialize(self, X, resp):
    """Initialization of the Gaussian mixture parameters."""
    n_samples, _ = X.shape
    weights, means, covariances = None, None, None
    if resp is not None:
        weights, means, covariances = _estimate_gaussian_parameters(
            X, resp, self.reg_covar, self.covariance_type
        )
        if self.weights_init is None:
            weights /= n_samples

    self.weights_ = weights if self.weights_init is None else self.weights_init
    self.means_ = means if self.means_init is None else self.means_init

    if self.precisions_init is None:
        self.covariances_ = covariances
        self.precisions_cholesky_ = _compute_precision_cholesky(
            covariances, self.covariance_type
        )
    else:
        self.precisions_cholesky_ = _compute_precision_cholesky_from_precisions(
            self.precisions_init, self.covariance_type
        )

Recommendation: Subclass GaussianMixture and override only what you need, e.g.:

from sklearn.mixture import GaussianMixture

class CustomGMM(GaussianMixture):
    def _initialize(self, X, resp):
        super()._initialize(X, resp)
        # Add any custom behavior here

This avoids code duplication and benefits from future improvements in scikit-learn.

Testing Strategy

  • Using gmr as a reference is acceptable for transitional verification.
  • However, the long-term goal is to remove gmr from bamt, which means the testing strategy should evolve.

Suggestions:

  • Shift from comparison-based testing to behavioral testing:
    • Check that likelihood improves after training.
    • Verify cluster consistency on toy datasets.
  • Add integration tests on synthetic datasets with known structure.
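A behavioral test along these lines could look like the following sketch (an assumed test shape, not from the PR): since EM never decreases the likelihood, a longer run from the same initialization should score at least as well as a one-iteration run:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(150, 1)),
               rng.normal(3.0, 1.0, size=(150, 1))])

# identical initialization (same random_state), different iteration budgets
ll_short = GaussianMixture(n_components=2, max_iter=1,
                           random_state=0).fit(X).score(X)
ll_long = GaussianMixture(n_components=2, max_iter=100,
                          random_state=0).fit(X).score(X)

# EM is monotone in average log-likelihood, so this holds regardless of gmr
print(ll_long >= ll_short - 1e-9)
```

Such invariants keep passing after gmr is removed, unlike output-comparison tests.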

Action Items

  1. Remove commented-out lines and any redundant stability checks.
  2. Refactor by subclassing sklearn.GaussianMixture and override only required methods.
  3. Redesign tests to focus on correctness and invariants instead of output comparison with gmr.

@Roman223
Collaborator

Roman223 commented Apr 18, 2025

@jrzkaminski then it should be removed carefully -- e.g., moved into the testing dependency group.

@kmetra1910
Author

Removed redundant validation from from_samples() method and cleaned up outdated comments.
Confirmed that sklearn.GaussianMixture already handles insufficient data with its internal validation logic (as @Roman223 suggested).

All tests pass.
I suggest moving the gmr removal and the behavioral test redesign into the next PR, along with a cleaner initialization approach (as suggested by Roman) using sklearn subclassing or a controlled parameter setup.

This will allow us to fully drop dependency on gmr and transition to robust behavioral testing (log-likelihood checks, cluster validation).
