Skip to content

Conversation

@ArkaSaha30
Copy link
Contributor

@ArkaSaha30 ArkaSaha30 commented Oct 23, 2025

This PR will add AutoProvider certificate manager as a part of generating default self-signed certificates:

  • interface functions
  • reconciler logic
  • e2e tests
etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get pods
NAME                   READY   STATUS    RESTARTS   AGE
etcdcluster-sample-0   1/1     Running   0          34m
etcdcluster-sample-1   1/1     Running   0          34m
etcdcluster-sample-2   1/1     Running   0          33m


etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe pod etcdcluster-sample-0  
Name:             etcdcluster-sample-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             kind-control-plane/172.18.0.2
Start Time:       Mon, 27 Oct 2025 23:47:43 +0530
Labels:           app=etcdcluster-sample
                  apps.kubernetes.io/pod-index=0
                  controller=etcdcluster-sample
                  controller-revision-hash=etcdcluster-sample-754c4cfbcd
                  statefulset.kubernetes.io/pod-name=etcdcluster-sample-0
Annotations:      <none>
Status:           Running
IP:               10.244.0.9
IPs:
  IP:           10.244.0.9
Controlled By:  StatefulSet/etcdcluster-sample
Containers:
  etcd:
    Container ID:  containerd://faa47a05746374496ff1b3d0147a93efc4d72bc07572e68183df1df9decdcdbf
    Image:         gcr.io/etcd-development/etcd:v3.5.21
    Image ID:      gcr.io/etcd-development/etcd@sha256:fd158fbe55240e252947bbd2e8dddc217997ff43978071fac2bd202b6ad15c03
    Ports:         2379/TCP, 2380/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /usr/local/bin/etcd
    Args:
      --name=$(POD_NAME)
      --listen-peer-urls=http://0.0.0.0:2380
      --listen-client-urls=http://0.0.0.0:2379
      --initial-advertise-peer-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2380
      --advertise-client-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2379
    State:          Running
      Started:      Mon, 27 Oct 2025 23:47:52 +0530
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      etcdcluster-sample-state  ConfigMap  Optional: false
    Environment:
      POD_NAME:       etcdcluster-sample-0 (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kwksc (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  server-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-server-tls
    Optional:    false
  peer-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-peer-tls
    Optional:    false
  kube-api-access-kwksc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  34m   default-scheduler  Successfully assigned default/etcdcluster-sample-0 to kind-control-plane
  Normal  Pulling    34m   kubelet            Pulling image "gcr.io/etcd-development/etcd:v3.5.21"
  Normal  Pulled     34m   kubelet            Successfully pulled image "gcr.io/etcd-development/etcd:v3.5.21" in 8.646s (8.646s including waiting). Image size: 20653564 bytes.
  Normal  Created    34m   kubelet            Created container: etcd
  Normal  Started    34m   kubelet            Started container etcd

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get secrets
NAME                            TYPE                DATA   AGE
etcdcluster-sample-client-tls   kubernetes.io/tls   3      35m
etcdcluster-sample-peer-tls     kubernetes.io/tls   3      35m
etcdcluster-sample-server-tls   kubernetes.io/tls   3      35m

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-client-tls
Name:         etcdcluster-sample-client-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-peer-tls  
Name:         etcdcluster-sample-peer-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  774 bytes
tls.key:  365 bytes
ca.crt:   774 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-server-tls
Name:         etcdcluster-sample-server-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArkaSaha30
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 3 times, most recently from f819caf to 82651bc Compare October 27, 2025 18:17
@ArkaSaha30 ArkaSaha30 changed the title [WIP] Add AutoProvider interface functions Add AutoProvider interface functions Oct 27, 2025
@ArkaSaha30 ArkaSaha30 requested review from ahrtr and ivanvc October 27, 2025 18:49
@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

@abdurrehman107 @ballista01 @ivanvc @hakman PTAL.

Reviewing this PR is also an opportunity to get familiar with the certificate design & implementation.

Reference: https://docs.google.com/document/d/1k8os77Hqxe36nUaZ31YqBGBuXcgdozVPppJYOfwUDFA/edit?tab=t.0

@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

cc @frederiko

@ArkaSaha30 ArkaSaha30 changed the title Add AutoProvider interface functions Add AutoProvider certificate manager Oct 28, 2025
return nil, fmt.Errorf("failed to get certificate: %w", err)
}

return &interfaces.Config{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get the info from the certificate included in the secret.

Copy link
Contributor Author

@ArkaSaha30 ArkaSaha30 Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ahrtr , I explored getting the original config from the secret itself but it will be a complicated one since we are manipulating the config and passing it to transport.SelfCert() which in turn creats the certificate.
Another workaround is to retrieve the certificateConfig from the etcdCluster object via the certificateSecret.ownerRef.Name, but it will cause a circular dependency

What can be the suggested solution here?
cc @ivanvc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will be a complicated one

Where is the complication? I don't see a PoC so far, nor a reasonable explanation.

The easiest solution is to just save the config passed to EnsureCertificateSecret, and return it in GetCertificateConfig. The problem is that it isn't the final real configuration that we used to create the certificate.

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, Arka. I left some observations, comments, and suggestions.

if autoConfig.AltNames.DNSNames != nil {
getAltNames = certInterface.AltNames{
DNSNames: autoConfig.AltNames.DNSNames,
IPs: make([]net.IP, len(autoConfig.AltNames.DNSNames)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this? Is the plan to fill it later? I can't find it in the current code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are preparing the certificate config here from the user-defined config for etcd-operator. This config will be passed to the createCertificate function where based on the provider EnsureCertificateSecretwith the above mentioned config parameter will be passed to create the certificateSecret

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, this is not being used anywhere. I can see how it is filled for CertManager:

ipAddresses = make([]net.IP, len(cmCertificate.Spec.IPAddresses))

And used:

IPAddresses: strings.Fields(strings.Trim(fmt.Sprint(cfg.AltNames.IPs), "[]")),

Unless I'm not following through, as this slice is always empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But IPs are not being used, should it come as a part of the hosts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the DNS names, not the IP addresses. This slice is kept empty with a capacity of len(autoConfig.AltNames.DNSNames). @ahrtr, do we need the IPs?

Copy link
Member

@ahrtr ahrtr Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should populate the IPs with whatever the IPs passed by users (otherwise we should remove IPAddress from the user-facing config), although usually in practice IPAddresses aren't used, because POD IPs are dynamic.

Please update the createCMCertificateConfig to populate the IPs with user provided values as well. It can be resolved in a separate PR.

We can think about to refactor the user interface or implementation in future when we get more feedback from real users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should populate the IPs with whatever the IPs passed by users (otherwise we should remove IPAddress from the user-facing config), although usually in practice IPAddresses aren't used, because POD IPs are dynamic.

I think we should delete the IPAddressess property because, as you said, this will get invalidated with pod rotations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should delete the IPAddressess property because, as you said, this will get invalidated with pod rotations.

I suggest that we follow comment to proceed for now. For any user facing API change, we should get more feedback. Let's discuss it separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we follow comment to proceed for now

I mean #227 (comment)

This commit will add the initial scaffolding for AutoProvider.

Signed-off-by: ArkaSaha30 <[email protected]>
@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 2 times, most recently from c1da674 to 9d90c9e Compare October 29, 2025 18:03
@ArkaSaha30
Copy link
Contributor Author

/test pull-etcd-operator-test-e2e

@ivanvc
Copy link
Member

ivanvc commented Nov 9, 2025

@ArkaSaha30, where do we stand in this pull request? Is this ready for another review?

}
}

tlsInfo, selfCertErr := transport.SelfCert(zap.NewNop(), tmpDir, hosts, uint(validity/DefaultValidity))
Copy link

@neolit123 neolit123 Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you decide to use the fake host ports as we discussed during the meeting of 25 Nov, due to the bug in SelfCert not working for hosts without port, then remember to use net.JoinHostPort for adding a port. the function is IPv4/6 compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants