Bug 1771986
| Summary: | [MSTR-485] kube-apiserver etcd encryption is too slow on fresh installed env, always taking nearly 15mins | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> | |
| Component: | kube-apiserver | Assignee: | Lukasz Szaszkiewicz <lszaszki> | |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 4.4 | CC: | aos-bugs, mfojtik, sttts | |
| Target Milestone: | --- | |||
| Target Release: | 4.4.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1776797 (view as bug list) | Environment: | ||
| Last Closed: | 2020-05-13 21:52:48 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1776797 | |||
Verified in the following environment:
$ oc version
Client Version: v4.4.0
Server Version: 4.4.0-0.nightly-2020-01-05-221122
Kubernetes Version: v1.17.0
Prepare resources:
cat > create-resources.sh << 'EOF'   # quote EOF so $1/$2 and $(...) are written literally into the script
#!/bin/bash
N1=$1
N2=$2
cd ~/my
for i in $(seq -w "$N1" "$N2")
do
  oc new-project ke-test-proj-$i --skip-config-write
  for j in $(seq -w 01 09)
  do
    oc create secret generic mysecret-$j --from-literal abcdefg='12345^&*()' -n ke-test-proj-$i
  done
  for j in $(seq -w 1 20)
  do
    sed "s/my/my-$j/" route-test.yaml | oc create -f - -n ke-test-proj-$i
  done
done
EOF
bash create-resources.sh 1 100 &     # may take quite some time
bash create-resources.sh 101 200 &   # may take quite some time
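create-resources.sh reads a route-test.yaml template from ~/my whose content is not included in this report. A minimal sketch that fits the `sed "s/my/my-$j/"` rewrite could look like the following (the object name `my` and the target Service name are assumptions, not taken from the report):

```shell
# Hypothetical route-test.yaml for create-resources.sh; the sed in the
# script rewrites "my" to "my-$j" on each line, so every generated Route
# (and its target Service reference) gets a unique name per copy.
cat > route-test.yaml << 'EOF'
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my
spec:
  to:
    kind: Service
    name: my
EOF
grep -c 'name: my' route-test.yaml
```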
cat > watch-migration.sh << 'EOF'   # quote EOF so $OAS_Y/$KAS_Y expand when the script runs, not when it is written
#!/bin/bash
OAS_Y=
KAS_Y=
while true
do
  if [ "$OAS_Y" != "Y" ]; then
    date
    echo "Checking OAS"
    oc get po -n openshift-apiserver -l apiserver --show-labels
    oc get openshiftapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations'
    if oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*routes" | grep oauthaccesstokens | grep oauthauthorizetokens > /dev/null; then
      date
      echo "OAS-O migration completed"
      OAS_Y=Y
    fi
  fi
  if [ "$KAS_Y" != "Y" ]; then
    date
    echo "Checking KAS"
    oc get po -n openshift-kube-apiserver -l apiserver --show-labels
    oc get kubeapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations'
    if oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*secrets" | grep configmaps > /dev/null; then
      date
      echo "KAS-O migration completed"
      KAS_Y=Y
    fi
  fi
  [ "$OAS_Y" == "Y" ] && [ "$KAS_Y" == "Y" ] && break
  sleep 20
  echo "===================="
done
EOF
Then enable etcd encryption.
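Enabling encryption here is the same operation as in the reproduction steps below: adding the encryption stanza to the apiserver/cluster config. A non-interactive equivalent of `oc edit apiserver cluster` would be a merge patch (a sketch; it requires cluster-admin against a live cluster, so it cannot be run standalone):

```shell
# Merge-patch the cluster apiserver config to turn on aescbc etcd encryption;
# equivalent to adding spec.encryption.type: aescbc via oc edit apiserver cluster.
oc patch apiserver cluster --type merge -p '{"spec":{"encryption":{"type":"aescbc"}}}'
```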
Then watch migration:
bash watch-migration.sh | tee watch-migration.log
Mon Jan  6 17:08:39 CST 2020
Checking OAS
...
Checking KAS
...
Check the time gaps from watch-migration.log:
Mon Jan  6 17:13:23 CST 2020
...
"lastTransitionTime": "2020-01-06T09:13:18Z",
"message": "migrating resources to a new write key: [route.openshift.io/routes]",
...
Mon Jan  6 17:15:47 CST 2020
OAS-O migration completed
...
Mon Jan  6 17:20:56 CST 2020
...
"lastTransitionTime": "2020-01-06T09:20:43Z",
"message": "migrating resources to a new write key: [core/secrets]",
...
Mon Jan  6 17:23:19 CST 2020
KAS-O migration completed
Count the migrated resources:
$ oc get route --no-headers -A | wc -l
4007
$ oc get secret,cm --no-headers -A | wc -l
4912
For KAS, 4912 resources in 2m13s; for OAS, 4007 resources in 2m24s. That is roughly 1800 resources per minute, which is acceptably fast.
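The per-minute rates can be double-checked with shell arithmetic, using the durations taken from the log timestamps above (2m13s = 133s for KAS, 2m24s = 144s for OAS):

```shell
# Migration throughput: resources migrated * 60 / duration in seconds
kas_rate=$(( 4912 * 60 / 133 ))
oas_rate=$(( 4007 * 60 / 144 ))
echo "KAS: ${kas_rate} resources/min"   # 2215
echo "OAS: ${oas_rate} resources/min"   # 1669
```

Both figures are in the same ballpark as the rough 1800/min estimate.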
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0581
Description of problem:
kube-apiserver etcd encryption is too slow on a fresh installed env, always taking nearly 15 mins

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-13-004353

How reproducible:
Always

Steps to Reproduce:
1. Install a fresh env and log in with some users. Check the resource counts:
oc get secret -A | wc -l
1004
oc get cm -A | wc -l
282
oc get oauthaccesstoken | wc -l
58
oc get route -A | wc -l
10
2. Enable encryption:
$ export PS1='[\u \D{%F %T} \W]\$ '   # add a timestamp to the shell prompt; local time is CST (UTC+8)
[xxia 2019-11-13 17:40:48 enc-story]$ oc edit apiserver cluster   # add the lines below and save/exit
spec:
  encryption:
    type: aescbc
apiserver.config.openshift.io/cluster edited
[xxia 2019-11-13 17:42:02 my]$ oc get secret -n openshift-config-managed | grep enc   # take this prompt time as the approximate encryption start time: Time1 = 2019-11-13 17:42:02 (UTC+8)
encryption-config-openshift-apiserver        Opaque   1   30s
encryption-config-openshift-kube-apiserver   Opaque   1   29s
encryption-key-openshift-apiserver-1         Opaque   1   32s
encryption-key-openshift-kube-apiserver-1    Opaque   1   32s
3.
In terminal A, repeatedly check:
[xxia 2019-11-13 17:46:10 enc-story]$ oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 encryption-key-openshift-apiserver-1 -o yaml | grep -A 5 annotations | grep encryption.apiserver.operator.openshift.io/migrated
    encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"oauth.openshift.io","Resource":"oauthauthorizetokens"},{"Group":"route.openshift.io","Resource":"routes"},{"Group":"oauth.openshift.io","Resource":"oauthaccesstokens"}]}'
    encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:46:25Z"
[xxia 2019-11-13 17:57:58 enc-story]$ oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 encryption-key-openshift-apiserver-1 -o yaml | grep -A 5 annotations | grep encryption.apiserver.operator.openshift.io/migrated
    encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"","Resource":"configmaps"}]}'
    encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:57:33Z"
    encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"oauth.openshift.io","Resource":"oauthauthorizetokens"},{"Group":"route.openshift.io","Resource":"routes"},{"Group":"oauth.openshift.io","Resource":"oauthaccesstokens"}]}'
    encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:46:25Z"
4.
In terminal B, repeatedly check:
[xxia 2019-11-13 17:42:28 my]$ oc get po -n openshift-kube-apiserver -l apiserver --show-labels -w
NAME                                                             READY   STATUS        RESTARTS   AGE     LABELS
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running       0          6h34m   apiserver=true,app=openshift-kube-apiserver,revision=6
kube-apiserver-ip-10-0-149-168.ap-northeast-1.compute.internal   0/3     Init:0/1      0          12s     apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-ip-10-0-170-230.ap-northeast-1.compute.internal   3/3     Running       0          6h32m   apiserver=true,app=openshift-kube-apiserver,revision=6
...
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Terminating   0          6h39m   apiserver=true,app=openshift-kube-apiserver,revision=6
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Pending       0          0s      apiserver=true,app=openshift-kube-apiserver,revision=8
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Init:0/1      0          8s      apiserver=true,app=openshift-kube-apiserver,revision=8
[xxia 2019-11-13 17:48:53 my]$
[xxia 2019-11-13 17:52:55 my]$ oc get po -n openshift-kube-apiserver -l apiserver --show-labels -w
NAME                                                             READY   STATUS            RESTARTS   AGE     LABELS
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running           0          5m44s   apiserver=true,app=openshift-kube-apiserver,revision=8
...
kube-apiserver-ip-10-0-170-230.ap-northeast-1.compute.internal   3/3     Running           0          80s     apiserver=true,app=openshift-kube-apiserver,revision=9
...
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     PodInitializing   0          71s     apiserver=true,app=openshift-kube-apiserver,revision=9
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   2/3     Running           0          73s     apiserver=true,app=openshift-kube-apiserver,revision=9
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running           0          81s     apiserver=true,app=openshift-kube-apiserver,revision=9
Actual results:
3.
For KAS-O encryption, calculate the time gap:
Time1 (see above) = 2019-11-13 17:42:02 (UTC+8) = 2019-11-13T09:42:02Z
Time2 = encryption.apiserver.operator.openshift.io/migrated-timestamp = 2019-11-13T09:57:33Z
The time used is nearly 15 mins, which is too slow given the resource counts in step 1, which are not many. In step 3, for the kube-apiserver secret, the annotations only show up after repeatedly checking the yaml.
4. The KAS pods always need 3 (9 minus 6) revision rollouts before all pods finally reach Running.
Expected results:
4. etcd encryption should not be so slow on a fresh installed env with few resources.
Additional info:
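The ~15-minute gap above can be recomputed directly from the two UTC timestamps (a sketch assuming GNU date, as on the RHEL/CoreOS hosts used here):

```shell
# Duration between encryption start (Time1) and the KAS migrated-timestamp (Time2)
t1=$(date -u -d '2019-11-13T09:42:02Z' +%s)
t2=$(date -u -d '2019-11-13T09:57:33Z' +%s)
gap=$(( t2 - t1 ))
echo "$(( gap / 60 ))m$(( gap % 60 ))s"   # 15m31s
```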