
Commit 3bbf6ef

docs: update blue green update document
Signed-off-by: Rory Z <[email protected]>
1 parent 44e9730 commit 3bbf6ef

2 files changed: 153 additions and 135 deletions

docs/en_US/tasks/configure-emqx-blueGreenUpdate.md

Lines changed: 77 additions & 70 deletions
@@ -18,31 +18,6 @@ This feature only supports `apps.emqx.io/v1beta4 EmqxEnterprise` and `apps.emqx.
1818

1919
2. During the rolling update process, only N - 1 Pods can provide services because it takes some time for new Pods to start up and become ready. This may lead to a decrease in service availability.
2020

21-
```mermaid
22-
timeline
23-
section Update start
24-
Current Cluster<br>Have Endpoint
25-
: pod-0
26-
: pod-1
27-
: pod-2
28-
section Rolling update
29-
Current Cluster<br>Have Endpoint
30-
: pod-0
31-
: pod-1
32-
Update Cluster<br>Have Endpoint
33-
: pod-2
34-
Current Cluster<br>Have Endpoint
35-
: pod-0
36-
Update Cluster<br>Have Endpoint
37-
: pod-1
38-
: pod-2
39-
section Finish Update
40-
Update Cluster<br>Have Endpoint
41-
: pod-0
42-
: pod-1
43-
: pod-2
44-
```
45-
4621
## Solution
4722

4823
To address the rolling update issues described above, EMQX Operator provides a blue-green deployment upgrade solution. When the EMQX cluster is upgraded through EMQX custom resources, EMQX Operator creates a new EMQX cluster and, once it is ready, redirects the Kubernetes Service to it. It then gradually deletes Pods from the old EMQX cluster to complete the update.
@@ -51,51 +26,67 @@ When deleting Pods from the old EMQX cluster, EMQX Operator can also take advant
5126

5227
The entire upgrade process can be roughly divided into the following steps:
5328

54-
1. Create a cluster with the same specifications.
55-
56-
2. After the new cluster is ready, redirect the service to the new cluster and remove the old cluster from the service. At this time, the new cluster starts to receive traffic, and existing connections in the old cluster are not affected.
57-
58-
3. (Only supported by EMQX Enterprise Edition) Use EMQX node evacuation function to evacuate connections on each node one by one.
59-
60-
4. Gradually scale down the old cluster to 0 nodes.
61-
29+
1. Create new Pods with the EMQX custom resource, and join the new Pods to the EMQX cluster.
30+
2. After the new Pods are ready, redirect the Service to the new Pods and remove the old Pods from the Service. At this time, the new Pods start to receive traffic, and existing connections in the old Pods are not affected.
31+
3. (Only supported by EMQX Enterprise Edition) Use the EMQX node evacuation function to evacuate connections on each node one by one.
32+
4. Gradually scale down the old Pods to 0.
6233
5. Complete the upgrade.
6334

6435
```mermaid
65-
timeline
66-
section Update start
67-
Current Cluster<br>Have Endpoint
68-
: pod-0
69-
: pod-1
70-
: pod-2
71-
section Create update cluster
72-
Current Cluster
73-
: pod-0
74-
: pod-1
75-
: pod-2
76-
Update Cluster<br>Have Endpoint
77-
: pod-0
78-
: pod-1
79-
: pod-2
80-
section Updating cluster
81-
Current Cluster
82-
: pod-0
83-
: pod-1
84-
Update Cluster<br>Have Endpoint
85-
: pod-0
86-
: pod-1
87-
: pod-2
88-
Current Cluster
89-
: pod-0
90-
Update Cluster<br>Have Endpoint
91-
: pod-0
92-
: pod-1
93-
: pod-2
94-
section Finish Update
95-
Update Cluster<br>Have Endpoint
96-
: pod-0
97-
: pod-1
98-
: pod-2
36+
stateDiagram-v2
37+
[*] --> Step1
38+
Step1: Create new pods
39+
state Step1 {
40+
[*] --> CreateNewPods
41+
CreateNewPods: Create new pods
42+
CreateNewPods --> NewPodsJoinTheCluster
43+
NewPodsJoinTheCluster: New pods join the cluster
44+
NewPodsJoinTheCluster --> WaitNewPodsReady
45+
WaitNewPodsReady: Wait for new pods to be ready
46+
WaitNewPodsReady --> LBServiceSelectNewPods
47+
LBServiceSelectNewPods: Redirect the Service to the new Pods and remove the old Pods from the Service
48+
LBServiceSelectNewPods --> [*]
49+
}
50+
Step1 --> Step2
51+
Step2: Delete old pods
52+
state HasReplNode <<choice>>
53+
state NodeEvacuationToNewPod1 <<choice>>
54+
state NodeEvacuationToNewPod2 <<choice>>
55+
state Step2 {
56+
[*] --> SelectOldestPod
57+
SelectOldestPod: Select oldest pod
58+
SelectOldestPod --> HasReplNode
59+
60+
HasReplNode --> SelectOldestReplPod: Has EMQX's replicant node pod
61+
SelectOldestReplPod: Select oldest EMQX's replicant node pod
62+
SelectOldestReplPod --> NodeEvacuationToNewPod1
63+
NodeEvacuationToNewPod1 --> NodeEvacuationToNewReplPod1: New pods have an EMQX replicant node
64+
NodeEvacuationToNewReplPod1: Node evacuation to new EMQX's replicant node pod
65+
NodeEvacuationToNewReplPod1 --> DeleteThisOldestPod1
66+
67+
NodeEvacuationToNewPod1 --> NodeEvacuationToNewCorePod1: New pods have no EMQX replicant node
68+
NodeEvacuationToNewCorePod1: Node evacuation to new EMQX's core node pod
69+
NodeEvacuationToNewCorePod1 --> DeleteThisOldestPod1
70+
71+
DeleteThisOldestPod1: Delete this oldest pod
72+
DeleteThisOldestPod1 --> HasReplNode
73+
74+
HasReplNode --> SelectOldestCorePod: Has no EMQX's replicant node pod
75+
SelectOldestCorePod: Select oldest EMQX's core node pod
76+
SelectOldestCorePod --> NodeEvacuationToNewPod2
77+
NodeEvacuationToNewPod2 --> NodeEvacuationToNewReplPod2: New pods have an EMQX replicant node
78+
NodeEvacuationToNewReplPod2: Node evacuation to new EMQX's replicant node pod
79+
NodeEvacuationToNewReplPod2 --> DeleteThisOldestPod2
80+
81+
NodeEvacuationToNewPod2 --> NodeEvacuationToNewCorePod2: New pods have no EMQX replicant node
82+
NodeEvacuationToNewCorePod2: Node evacuation to new EMQX's core node pod
83+
NodeEvacuationToNewCorePod2 --> DeleteThisOldestPod2
84+
85+
DeleteThisOldestPod2: Delete this oldest pod
86+
DeleteThisOldestPod2 --> [*]
87+
}
88+
Step2 --> Complete
89+
Complete --> [*]
9990
```
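The deletion order in the state diagram can be sketched as a small shell function. This is only an illustration of the order described above, not the Operator's implementation: `plan_evacuation` and its three arguments are hypothetical names, and pod counts stand in for real Pods (index 1 is the oldest). Node evacuation itself is only available in EMQX Enterprise Edition.

```bash
# Illustrative only: mirrors the decision order of the state diagram.
# Arguments (hypothetical):
#   $1 = number of old replicant pods
#   $2 = number of old core pods
#   $3 = number of new replicant pods
plan_evacuation() {
  local old_repl="$1"
  local old_core="$2"
  local new_repl="$3"
  local target="core"
  local i

  # Connections move to new replicant pods when any exist, otherwise to new core pods.
  if [ "$new_repl" -gt 0 ]; then
    target="replicant"
  fi

  # Old replicant pods are handled first, oldest first.
  i=1
  while [ "$i" -le "$old_repl" ]; do
    echo "evacuate old replicant $i -> new $target, then delete"
    i=$((i + 1))
  done

  # Old core pods are handled once no old replicant pods remain.
  i=1
  while [ "$i" -le "$old_core" ]; do
    echo "evacuate old core $i -> new $target, then delete"
    i=$((i + 1))
  done
}

# One old replicant pod, two old core pods, no new replicant pods.
plan_evacuation 1 2 0
```

With no new replicant pods, every line of the plan targets the new core pods, which matches the "New pods has no EMQX's replicant node" branches of the diagram.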
10091

10192
## How to update the EMQX cluster through blue-green deployment.
@@ -131,7 +122,7 @@ spec:
131122

132123
`sessEvictRate`: MQTT Session evacuation rate, only supported by EMQX Enterprise Edition (unit: count/second).
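Because `sessEvictRate` is expressed in sessions per second, a rough lower bound on the time needed to evacuate a node can be estimated by dividing the session count by the rate. The sketch below assumes a constant rate and ignores reconnection overhead; `evict_seconds` is a hypothetical helper, and the figures in the usage line are example values rather than recommendations.

```bash
# Hypothetical helper, not part of EMQX or EMQX Operator.
# Estimates the minimum number of seconds needed to evict a given number of
# sessions at a given sessEvictRate (count/second), rounding up.
evict_seconds() {
  local sessions="$1"
  local rate="$2"

  if [ "$rate" -le 0 ]; then
    echo "rate must be a positive integer" >&2
    return 1
  fi

  # Integer ceiling division: (a + b - 1) / b
  echo $(( (sessions + rate - 1) / rate ))
}

# 3000 sessions at an assumed rate of 100 sessions per second.
evict_seconds 3000 100
```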
133124

134-
Save the above content as: `emqx-update.yaml`, execute the following command to deploy EMQX:
125+
Save the above content as `emqx-update.yaml` and execute the following command to deploy EMQX. This example deploys only an EMQX core node cluster, without replicant nodes:
135126

136127
```bash
137128
$ kubectl apply -f emqx-update.yaml
@@ -231,10 +222,26 @@ Output is similar to:
231222
emqx.apps.emqx.io/emqx-ee patched
232223
```
233224

234-
- Check status.
225+
- Check the status of the EMQX core nodes
226+
227+
```bash
228+
kubectl get emqx emqx-ee -o json | jq '.status.coreNodesStatus'
229+
{
230+
"currentReplicas": 2,
231+
"currentRevision": "54fc496fb4",
232+
"readyReplicas": 4,
233+
"replicas": 2,
234+
"updateReplicas": 2,
235+
"updateRevision": "5d87d4c6bd"
236+
}
237+
```
238+
239+
In this example, the old StatefulSet is `emqx-${currentRevision}`, which has 2 ready pods. The new StatefulSet is `emqx-${updateRevision}`, which has 2 ready pods. Then the EMQX Operator will start the node evacuation process.
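As a small illustration of how the revisions map to StatefulSet names, the following sketch prints both names from the status fields. It only applies the `emqx-${revision}` naming stated above; `describe_core_status` and its argument order are hypothetical, and the values in the usage line are copied from the sample output.

```bash
# Hypothetical helper: argument order and output format are illustrative.
# Prints the StatefulSet names implied by the revisions in .status.coreNodesStatus,
# using the emqx-${revision} naming described in the text.
describe_core_status() {
  local current_revision="$1"
  local update_revision="$2"
  local current_replicas="$3"
  local update_replicas="$4"

  echo "old StatefulSet emqx-${current_revision}: ${current_replicas} pods"
  echo "new StatefulSet emqx-${update_revision}: ${update_replicas} pods"
}

# Values taken from the sample output.
describe_core_status 54fc496fb4 5d87d4c6bd 2 2
```

In practice the four values could be read from the status JSON with `jq -r`, for example `jq -r '.status.coreNodesStatus.currentRevision'`.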
240+
241+
- Check the node evacuation status.
235242

236243
```bash
237-
$ kubectl get emqx emqx-ee -o json | jq ".status.nodEvacuationsStatus"
244+
$ kubectl get emqx emqx-ee -o json | jq ".status.nodeEvacuationsStatus"
238245
239246
[
240247
{

docs/zh_CN/tasks/configure-emqx-blueGreenUpdate.md

Lines changed: 76 additions & 65 deletions
@@ -17,31 +17,6 @@
1717
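1. During a rolling update, the corresponding Service selects both new and old Pods at the same time. This may cause MQTT clients to connect to the wrong Pods, leading to frequent disconnections and reconnections.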
1818
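2. During the rolling update process, only N - 1 Pods can provide services, because new Pods need some time to start up and become ready. This may reduce service availability.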
1919

20-
```mermaid
21-
timeline
22-
section Update start
23-
Current Cluster<br>Have Endpoint
24-
: pod-0
25-
: pod-1
26-
: pod-2
27-
section Rolling update
28-
Current Cluster<br>Have Endpoint
29-
: pod-0
30-
: pod-1
31-
Update Cluster<br>Have Endpoint
32-
: pod-2
33-
Current Cluster<br>Have Endpoint
34-
: pod-0
35-
Update Cluster<br>Have Endpoint
36-
: pod-1
37-
: pod-2
38-
section Finish Update
39-
Update Cluster<br>Have Endpoint
40-
: pod-0
41-
: pod-1
42-
: pod-2
43-
```
44-
4520
## Solution
4621

4722
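To address the rolling update issues described above, EMQX Operator provides a blue-green deployment upgrade solution. When the EMQX cluster is upgraded through EMQX custom resources, EMQX Operator creates a new EMQX cluster and, once it is ready, points the Kubernetes Service to the new EMQX cluster. It then gradually deletes the Pods of the old EMQX cluster to complete the update.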
@@ -50,47 +25,67 @@ timeline
5025

5126
The entire upgrade process can be roughly divided into the following steps:
5227

53-
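1. Create a cluster with the same specifications
54-
2. After the new cluster is ready, point the Service to the new cluster and remove the old cluster from the Service. At this point, the new cluster starts to receive traffic, and existing connections in the old cluster are not affected
55-
3. (Only supported by EMQX Enterprise Edition) Use the EMQX node evacuation function to evacuate the connections on each node one by one
56-
4. Gradually scale the old cluster down to 0 nodes
28+
1. Create new Pods with the EMQX custom resource, and join the new Pods to the EMQX cluster
29+
2. After the new Pods are ready, redirect the Service to the new Pods and remove the old Pods from the Service. At this point, the new Pods start to receive traffic, and existing connections on the old Pods are not affected
30+
3. (Supported by EMQX Enterprise Edition) Use the EMQX node evacuation function to evacuate the connections on each node one by one
31+
4. Gradually scale the old Pods down to 0
5732
5. Complete the upgrade.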
5833

5934
```mermaid
60-
timeline
61-
section Update start
62-
Current Cluster<br>Have Endpoint
63-
: pod-0
64-
: pod-1
65-
: pod-2
66-
section Create update cluster
67-
Current Cluster
68-
: pod-0
69-
: pod-1
70-
: pod-2
71-
Update Cluster<br>Have Endpoint
72-
: pod-0
73-
: pod-1
74-
: pod-2
75-
section Updating cluster
76-
Current Cluster
77-
: pod-0
78-
: pod-1
79-
Update Cluster<br>Have Endpoint
80-
: pod-0
81-
: pod-1
82-
: pod-2
83-
Current Cluster
84-
: pod-0
85-
Update Cluster<br>Have Endpoint
86-
: pod-0
87-
: pod-1
88-
: pod-2
89-
section Finish Update
90-
Update Cluster<br>Have Endpoint
91-
: pod-0
92-
: pod-1
93-
: pod-2
35+
stateDiagram-v2
36+
[*] --> Step1
37+
Step1: Create new pods
38+
state Step1 {
39+
[*] --> CreateNewPods
40+
CreateNewPods: Create new pods
41+
CreateNewPods --> NewPodsJoinTheCluster
42+
NewPodsJoinTheCluster: New pods join the cluster
43+
NewPodsJoinTheCluster --> WaitNewPodsReady
44+
WaitNewPodsReady: Wait for new pods to be ready
45+
WaitNewPodsReady --> LBServiceSelectNewPods
46+
LBServiceSelectNewPods: Redirect the Service to the new Pods and remove the old Pods from the Service
47+
LBServiceSelectNewPods --> [*]
48+
}
49+
Step1 --> Step2
50+
Step2: Delete old pods
51+
state HasReplNode <<choice>>
52+
state NodeEvacuationToNewPod1 <<choice>>
53+
state NodeEvacuationToNewPod2 <<choice>>
54+
state Step2 {
55+
[*] --> SelectOldestPod
56+
SelectOldestPod: Select oldest pod
57+
SelectOldestPod --> HasReplNode
58+
59+
HasReplNode --> SelectOldestReplPod: Has EMQX's replicant node pod
60+
SelectOldestReplPod: Select oldest EMQX's replicant node pod
61+
SelectOldestReplPod --> NodeEvacuationToNewPod1
62+
NodeEvacuationToNewPod1 --> NodeEvacuationToNewReplPod1: New pods have an EMQX replicant node
63+
NodeEvacuationToNewReplPod1: Node evacuation to new EMQX's replicant node pod
64+
NodeEvacuationToNewReplPod1 --> DeleteThisOldestPod1
65+
66+
NodeEvacuationToNewPod1 --> NodeEvacuationToNewCorePod1: New pods have no EMQX replicant node
67+
NodeEvacuationToNewCorePod1: Node evacuation to new EMQX's core node pod
68+
NodeEvacuationToNewCorePod1 --> DeleteThisOldestPod1
69+
70+
DeleteThisOldestPod1: Delete this oldest pod
71+
DeleteThisOldestPod1 --> HasReplNode
72+
73+
HasReplNode --> SelectOldestCorePod: Has no EMQX's replicant node pod
74+
SelectOldestCorePod: Select oldest EMQX's core node pod
75+
SelectOldestCorePod --> NodeEvacuationToNewPod2
76+
NodeEvacuationToNewPod2 --> NodeEvacuationToNewReplPod2: New pods have an EMQX replicant node
77+
NodeEvacuationToNewReplPod2: Node evacuation to new EMQX's replicant node pod
78+
NodeEvacuationToNewReplPod2 --> DeleteThisOldestPod2
79+
80+
NodeEvacuationToNewPod2 --> NodeEvacuationToNewCorePod2: New pods have no EMQX replicant node
81+
NodeEvacuationToNewCorePod2: Node evacuation to new EMQX's core node pod
82+
NodeEvacuationToNewCorePod2 --> DeleteThisOldestPod2
83+
84+
DeleteThisOldestPod2: Delete this oldest pod
85+
DeleteThisOldestPod2 --> [*]
86+
}
87+
Step2 --> Complete
88+
Complete --> [*]
9489
```
9590

9691
## How to update the EMQX cluster through blue-green deployment
@@ -126,7 +121,7 @@ spec:
126121

127122
`sessEvictRate`: MQTT Session evacuation rate, only supported by EMQX Enterprise Edition (unit: count/second).
128123

129-
Save the above content as emqx-update.yaml and execute the following command to deploy EMQX:
124+
Save the above content as emqx-update.yaml and execute the following command to deploy EMQX. This example deploys only an EMQX core node cluster, without EMQX replicant nodes:
130125

131126
```bash
132127
$ kubectl apply -f emqx-update.yaml
@@ -226,7 +221,23 @@ mqttx bench conn -h ${IP} -p ${PORT} -c 3000
226221
emqx.apps.emqx.io/emqx-ee patched
227222
```
228223

229-
- Check the status of the blue-green upgrade
224+
- Check the status of the EMQX cluster
225+
226+
```bash
227+
kubectl get emqx emqx-ee -o json | jq '.status.coreNodesStatus'
228+
{
229+
"currentReplicas": 2,
230+
"currentRevision": "54fc496fb4",
231+
"readyReplicas": 4,
232+
"replicas": 2,
233+
"updateReplicas": 2,
234+
"updateRevision": "5d87d4c6bd"
235+
}
236+
```
237+
238+
In this example, the old StatefulSet is `emqx-${currentRevision}`, which has 2 Pods. The new StatefulSet is `emqx-${updateRevision}`, which has 2 Pods. The EMQX Operator will then start the node evacuation process.
239+
240+
- Check the node evacuation status
230241

231242
```bash
232243
$ kubectl get emqx emqx-ee -o json | jq ".status.nodeEvacuationsStatus"
