Skip to content

Conversation

@panslava
Copy link
Contributor

@panslava panslava commented Sep 3, 2025

  • Centralize informer lifecycle with InformerSet for simpler, shared wiring.
  • Harden ProviderConfigControllersManager:
    • Add NEG cleanup finalizer on start; roll back on failures.
    • Idempotent, concurrency-safe start; stop closes channel and removes finalizer using latest object.
    • Introduce startNEGController test seam; unexport controllerSet; wrap errors; add docs.
  • Add README explaining multiproject structure and flow.
  • Tests: pkg/multiproject/manager/manager_test.go covers finalizer add/rollback, idempotent start, stop behavior, and concurrent single-shot.
  • No happy-path behavior change; improves robustness and testability.

/assign @swetharepakula

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: panslava

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 3, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 3, 2025
@panslava panslava force-pushed the improve-multiproject branch 2 times, most recently from 69aec65 to 438d992 Compare September 3, 2025 13:21
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2025
@panslava panslava force-pushed the improve-multiproject branch from 72a34fc to 2eb739c Compare September 7, 2025 10:30
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2025
@panslava panslava force-pushed the improve-multiproject branch from 2eb739c to 3f6e8d2 Compare October 10, 2025 02:25
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2025
@panslava panslava force-pushed the improve-multiproject branch from 3f6e8d2 to 291e886 Compare October 20, 2025 11:47
Comment on lines +157 to +161
// Start optional informers
startInformer(i.SvcNeg, stopCh)
startInformer(i.Network, stopCh)
startInformer(i.GkeNetworkParams, stopCh)
startInformer(i.NodeTopology, stopCh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are replicating the check on whether the client is nil or not multiple times. Can we consolidate in initialization the list of informers that needs to be started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, tried to do it, but felt like overengineering. Adding a new abstraction to store with not so much gain, and makes it easier to miss and just seems even less readable to store same informers in additional list... I think we should leave it like this...

}

// hasSyncedFuncs returns a list of HasSynced functions for all non-nil informers.
func (i *InformerSet) hasSyncedFuncs() []func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we consolidate these checks perhaps at initialization?

So we save the set of functions and don't have to re-calculate every time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment... If you insist I can do it but seems a bit like overengineering, and I don't think re-calculation is too bad.... and nil checks are cheap

if err != nil {
t.Fatalf("Start() returned error: %v", err)
}
if ok := cache.WaitForCacheSync(stop, inf.CombinedHasSynced()); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand this to show that it is waiting for multiple informers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, if i correctly understood what is needed

pcLatest, err := pccm.providerConfigClient.CloudV1().ProviderConfigs().Get(context.Background(), pc.Name, metav1.GetOptions{})
if err != nil {
pcCopy := pc.DeepCopy()
pcCopy.Finalizers = append(pcCopy.Finalizers, finalizer.ProviderConfigNEGCleanupFinalizer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should optionally add this finalizer if it doesn't exist.
Is this to cover the case our cache hasn't updated to after us putting it on ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a bit overengineering, but this function is only used to get PC on deletion

Idea is -- if by some case "get" failed (idk, some transient error), we don't want to leave PC hanging and still want to try to remove finalizer, so that's the logic besides this function

// TestStart_AddsFinalizer_AndIsIdempotent verifies that starting controllers
// for a ProviderConfig adds the NEG cleanup finalizer and that repeated starts
// are idempotent (the underlying controller is launched only once).
func TestStart_AddsFinalizer_AndIsIdempotent(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a testcase for starting a controller for a PC that already has a finalizer (in the case of controller restart)

Copy link
Contributor Author

@panslava panslava Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestStart_WithExistingFinalizer_IsIdempotent


// TestStart_FailureCleansFinalizer verifies that when starting controllers
// fails, the added finalizer is rolled back and removed from the object.
func TestStart_FailureCleansFinalizer(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the case that the PC already had a finalizer to begin with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - was a bug. We were removing pre-existing finalizers on failure. Fixed + added test.

@swetharepakula swetharepakula mentioned this pull request Nov 26, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2025
Replace factory-based informer creation with centralized InformerSet that:
- Creates base SharedIndexInformers directly without factories
- Manages lifecycle and synchronization in one place
- Provides filtered views for ProviderConfig-specific controllers
- Reduces boilerplate and improves maintainability

Using informers directly without factories simplifies the logic and
eliminates potential mistakes from unnecessary factory usage, such as
cidentally creating duplicate informers or incorrect factory scoping.
- Add manager_test.go covering:
  - Start adds NEG cleanup finalizer and is idempotent
  - Start failure and GCE client creation failure roll back finalizer
  - Stop closes controller channel and removes finalizer
  - Concurrent starts for same ProviderConfig are single-shot
- Improve manager.go:
  - Introduce test seam via package var startNEGController
  - Ensure finalizer is added before start; roll back on any startup error
  - Delete finalizer using latest ProviderConfig from API to avoid stale updates
  - Wrap errors with %w and add GoDoc comments
  - Rename exported ControllerSet to unexported controllerSet
- No expected behavior change in the happy path; robustness and testability improved.
@panslava panslava force-pushed the improve-multiproject branch from 291e886 to ea200f4 Compare December 2, 2025 04:11
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025
- Fix bug: preserve pre-existing finalizer on startup failure
- Expand informerset tests to verify multiple informers sync
- Add tests for controller restart with existing finalizer
- Add comment explaining closeProvider bool in neg_test
@panslava panslava force-pushed the improve-multiproject branch from ea200f4 to ce5c037 Compare December 2, 2025 04:12
@panslava
Copy link
Contributor Author

panslava commented Dec 2, 2025

Ready for re-review @swetharepakula

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants