@LiangquanLi930
Contributor

@LiangquanLi930 LiangquanLi930 commented Nov 24, 2025

What type of PR is this?
/kind flake

What this PR does / why we need it:
fix AWSMachineTemplate autoscaler test

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

fix AWSMachineTemplate autoscaler test

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 24, 2025
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 24, 2025
@k8s-ci-robot
Contributor

Hi @LiangquanLi930. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 24, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2025
@chrischdi
Member

/test help

/retitle 🌱 [WIP] e2e: fix AWSMachineTemplate autoscaler test
/ok-to-test

@k8s-ci-robot k8s-ci-robot changed the title from "fix PR: 5711 e2e" to "🌱 [WIP] e2e: fix AWSMachineTemplate autoscaler test" Nov 24, 2025
@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 24, 2025
@chrischdi
Member

/test help

@k8s-ci-robot
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-docker
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-docker
pull-cluster-api-provider-aws-e2e-blocking
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify


@chrischdi
Member

/test pull-cluster-api-provider-aws-e2e

@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e-eks

@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@chrischdi
Member

xref: this PR tries to address the test failure coming from PR #5711 (comment)

It seems to be very flaky!

See: k8s-triage

Example: prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-cluster-api-provider-aws-e2e-eks-canary/1992633554630086656

@LiangquanLi930
Contributor Author

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5765/pull-cluster-api-provider-aws-e2e/1992881339380011008

capa-e2e: [It] [unmanaged] [functional] Workload cluster with AWS SSM Parameter as the Secret Backend should be creatable and deletable 

pass

@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e-eks

@nrb
Contributor

nrb commented Nov 24, 2025

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 24, 2025
@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e-eks

@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot k8s-ci-robot changed the title from "🌱 [WIP] e2e: fix AWSMachineTemplate autoscaler test" to "🌱 e2e: fix AWSMachineTemplate autoscaler test #5765" Nov 25, 2025
@LiangquanLi930 LiangquanLi930 changed the title from "🌱 e2e: fix AWSMachineTemplate autoscaler test #5765" to "🌱 e2e: fix AWSMachineTemplate autoscaler test" Nov 26, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 26, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 26, 2025
@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@damdo
Member

damdo commented Nov 26, 2025

@LiangquanLi930 could you rebase also? TY

@LiangquanLi930 LiangquanLi930 force-pushed the fix-e2e branch 2 times, most recently from 7e094af to ff4df47 Compare November 26, 2025 14:29
…condition

Add MachineDeployment and KubeadmControlPlane watchers to trigger
AWSMachineTemplate reconciliation, ensuring nodeInfo is populated
before cache sync completes.

Related: kubernetes-sigs#5711
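
To illustrate the idea in the commit message above, here is a minimal sketch of how an event on a MachineDeployment or KubeadmControlPlane could be mapped to a reconcile request for the AWSMachineTemplate it references. It is not the code merged in this PR: the `ObjectRef`, `Owner`, and `Request` types and the `templateRequests` function are hypothetical, simplified stand-ins, and all object names in `main` are made up. In a controller-runtime based controller, a function like this would typically be wrapped with `handler.EnqueueRequestsFromMapFunc` and registered as a watch.

```go
package main

import "fmt"

// ObjectRef is a hypothetical, simplified stand-in for a Kubernetes object reference.
type ObjectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// Owner is a hypothetical stand-in for a MachineDeployment or KubeadmControlPlane:
// an object that points at an infrastructure machine template.
type Owner struct {
	Kind              string
	Namespace         string
	Name              string
	InfrastructureRef ObjectRef
}

// Request identifies the object that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// templateRequests maps an event on an owner object to reconcile requests for
// the AWSMachineTemplate it references. Owners that reference any other kind
// produce no requests.
func templateRequests(o Owner) []Request {
	if o.InfrastructureRef.Kind != "AWSMachineTemplate" {
		return nil
	}
	// Assumption of this sketch: an empty namespace on the reference means
	// "same namespace as the owner".
	ns := o.InfrastructureRef.Namespace
	if ns == "" {
		ns = o.Namespace
	}
	return []Request{{Namespace: ns, Name: o.InfrastructureRef.Name}}
}

func main() {
	// Illustrative objects only; none of these names come from the PR.
	owners := []Owner{
		{Kind: "MachineDeployment", Namespace: "default", Name: "md-0",
			InfrastructureRef: ObjectRef{Kind: "AWSMachineTemplate", Name: "md-0-template"}},
		{Kind: "KubeadmControlPlane", Namespace: "default", Name: "cp",
			InfrastructureRef: ObjectRef{Kind: "AWSMachineTemplate", Name: "cp-template"}},
		{Kind: "MachineDeployment", Namespace: "default", Name: "other",
			InfrastructureRef: ObjectRef{Kind: "SomeOtherTemplate", Name: "ignored"}},
	}
	for _, o := range owners {
		fmt.Printf("%s/%s -> %v\n", o.Kind, o.Name, templateRequests(o))
	}
}
```

The point of the extra watches is that creating or updating an owner object triggers a reconcile of the template, rather than waiting for an event on the template itself.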
@damdo
Member

damdo commented Nov 26, 2025

/test pull-cluster-api-provider-aws-test

@damdo
Member

damdo commented Nov 26, 2025

/test pull-cluster-api-provider-aws-e2e

@nrb
Contributor

nrb commented Nov 26, 2025

/test pull-cluster-api-provider-aws-e2e

@damdo
Member

damdo commented Nov 27, 2025

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

k8s-ci-robot commented Nov 27, 2025

@LiangquanLi930: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-aws-e2e-eks ea9e47d link false /test pull-cluster-api-provider-aws-e2e-eks

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@chrischdi
Member

Analyzing the test failure at pull-cluster-api-provider-aws-e2e:

TL;DR: This failure is unrelated to this PR.

  • The failing test is untouched in this PR.
  • The test fails because the AWSCluster never gets its VPC created:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  ...
  creationTimestamp: "2025-11-27T07:56:46Z"
  name: cluster-mb82rq-rt7g5
  namespace: functional-multitenancy-nested-clusterclass-o7tz51
  ...
spec:
  ...
status:
  conditions:
  ...
  - lastTransitionTime: "2025-11-27T08:31:48Z"
    message: 'failed to describe VPC resources by name: failed to query ec2 for VPCs
      by name "cluster-mb82rq-vpc": operation error EC2: DescribeVpcs, get identity:
      get credentials: failed to refresh cached credentials, failed to refresh cached
      credentials, operation error STS: AssumeRole, https response error StatusCode:
      403, RequestID: 1567e191-f13a-4f66-b377-fcf4803fe514, api error ExpiredToken:
      The security token included in the request is expired'
    reason: VpcReconciliationFailed
    severity: Warning
    status: "False"
    type: VpcReady
  ...

The controller logs show 403 responses with:

E1127 07:56:47.262707       1 controller.go:353] "Reconciler error" err="failed to describe VPC resources by name: failed to query ec2 for VPCs by name \"cluster-mb82rq-vpc\": operation error EC2: DescribeVpcs, get identity: get credentials: failed to refresh cached credentials, failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: b73c243c-bf3c-4b66-bdce-d243fb1e1280, api error ExpiredToken: The security token included in the request is expired" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="functional-multitenancy-nested-clusterclass-o7tz51/cluster-mb82rq-rt7g5" namespace="functional-multitenancy-nested-clusterclass-o7tz51" name="cluster-mb82rq-rt7g5" reconcileID="ae20dfa2-e48f-4f84-b813-488ff785fe2a"

The test uses an AWSClusterRoleIdentity, whose duration defaults to 15 minutes.

The AWSClusterRoleIdentity was created at 2025-11-27T07:16:51Z, roughly 40 minutes before the AWSCluster was created (creationTimestamp 2025-11-27T07:56:46Z above). That is well past the 15-minute duration, so the credentials were no longer valid.


Proposal: we should merge this PR (cc @damdo @richardcase @nrb ) because it solves the perma-red test-failure in periodics.

We should also open a separate PR that fixes the expired-credentials failure by setting a higher duration for the AWSClusterRoleIdentity in test/e2e/data/infrastructure-aws/withclusterclass/kustomize_sources/nested-multitenancy-clusterclass/role.yaml

@richardcase
Member

Proposal: we should merge this PR (cc @damdo @richardcase @nrb ) because it solves the perma-red test-failure in periodics.

The proposal seems reasonable to me and thanks for this change.

/approve

/override pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

@richardcase: Overrode contexts on behalf of richardcase: pull-cluster-api-provider-aws-e2e


@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2025
Member

@damdo damdo left a comment

/lgtm

We might also want to watch MachineSets, in case there are users of MachineSets who don't use MachineDeployments.

But let's merge this for now and iterate

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 52ac2192cbb7e748010399cecea8fc54239cdfa7

@k8s-ci-robot k8s-ci-robot merged commit cf0fed0 into kubernetes-sigs:main Nov 27, 2025
19 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants