Bug 1771986 - [MSTR-485] kube-apiserver etcd encryption is too slow on fresh installed env, always taking nearly 15mins
Summary: [MSTR-485] kube-apiserver etcd encryption is too slow on fresh installed env,...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.4
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.4.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 1776797
TreeView+ depends on / blocked
 
Reported: 2019-11-13 10:39 UTC by Xingxing Xia
Modified: 2020-05-13 21:52 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1776797 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:52:48 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-kube-apiserver-operator pull 670 'None' closed Bug 1771986: encryption: backup/restore support + speedup + fixes 2020-09-30 05:58:01 UTC
Github openshift cluster-openshift-apiserver-operator pull 267 'None' closed Bug 1771986: encryption: backup/restore support + speedup + fixes 2020-09-30 05:58:01 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-13 21:52:51 UTC

Description Xingxing Xia 2019-11-13 10:39:42 UTC
Description of problem:
kube-apiserver etcd encryption is too slow on fresh installed env, always taking nearly 15mins

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-13-004353

How reproducible:
Always

Steps to Reproduce:
1. Install fresh env, login with some users. Check resources counts:
oc get secret -A | wc -l
1004
oc get cm -A | wc -l
282
oc get oauthaccesstoken | wc -l  
58
oc get route -A | wc -l
10

2. Enable encryption:
$ export PS1='[\u \D{%F %T} \W]\$ ' # add timestamp in shell prompt, my local time is CST (UTC+8)
[xxia 2019-11-13 17:40:48 enc-story]$ oc edit apiserver cluster # add below and save/exit
spec:
  encryption:
    type: aescbc

apiserver.config.openshift.io/cluster edited
[xxia 2019-11-13 17:42:02 my]$ oc get secret -n openshift-config-managed | grep enc # Approximately use this time in shell prompt as the encryption starting time, let it as Time1=2019-11-13 17:42:02 (UTC+8)
encryption-config-openshift-apiserver        Opaque                                1      30s
encryption-config-openshift-kube-apiserver   Opaque                                1      29s
encryption-key-openshift-apiserver-1         Opaque                                1      32s
encryption-key-openshift-kube-apiserver-1    Opaque                                1      32s

3. In terminal A, repeatedly check:
[xxia 2019-11-13 17:46:10 enc-story]$ oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 encryption-key-openshift-apiserver-1  -o yaml | grep -A 5 annotations | grep encryption.apiserver.operator.openshift.io/migrated
      encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"oauth.openshift.io","Resource":"oauthauthorizetokens"},{"Group":"route.openshift.io","Resource":"routes"},{"Group":"oauth.openshift.io","Resource":"oauthaccesstokens"}]}'
      encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:46:25Z"

[xxia 2019-11-13 17:57:58 enc-story]$ oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 encryption-key-openshift-apiserver-1  -o yaml | grep -A 5 annotations | grep encryption.apiserver.operator.openshift.io/migrated
      encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"","Resource":"configmaps"}]}'
      encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:57:33Z"
      encryption.apiserver.operator.openshift.io/migrated-resources: '{"resources":[{"Group":"oauth.openshift.io","Resource":"oauthauthorizetokens"},{"Group":"route.openshift.io","Resource":"routes"},{"Group":"oauth.openshift.io","Resource":"oauthaccesstokens"}]}'
      encryption.apiserver.operator.openshift.io/migrated-timestamp: "2019-11-13T09:46:25Z

4. In terminal B, repeatedly check:
[xxia 2019-11-13 17:42:28 my]$ oc get po -n openshift-kube-apiserver -l apiserver --show-labels -w
NAME                                                             READY   STATUS     RESTARTS   AGE     LABELS
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running    0          6h34m   apiserver=true,app=openshift-kube-apiserver,revision=6
kube-apiserver-ip-10-0-149-168.ap-northeast-1.compute.internal   0/3     Init:0/1   0          12s     apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-ip-10-0-170-230.ap-northeast-1.compute.internal   3/3     Running    0          6h32m   apiserver=true,app=openshift-kube-apiserver,revision=6
...
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Terminating       0          6h39m   apiserver=true,app=openshift-kube-apiserver,revision=6
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Pending           0          0s      apiserver=true,app=openshift-kube-apiserver,revision=8
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     Init:0/1          0          8s      apiserver=true,app=openshift-kube-apiserver,revision=8
[xxia 2019-11-13 17:48:53 my]$
[xxia 2019-11-13 17:52:55 my]$ oc get po -n openshift-kube-apiserver -l apiserver --show-labels -w
NAME                                                             READY   STATUS    RESTARTS   AGE     LABELS
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running   0          5m44s   apiserver=true,app=openshift-kube-apiserver,revision=8
...
kube-apiserver-ip-10-0-170-230.ap-northeast-1.compute.internal   3/3     Running   0          80s     apiserver=true,app=openshift-kube-apiserver,revision=9
...
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   0/3     PodInitializing   0          71s     apiserver=true,app=openshift-kube-apiserver,revision=9
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   2/3     Running           0          73s     apiserver=true,app=openshift-kube-apiserver,revision=9
kube-apiserver-ip-10-0-137-119.ap-northeast-1.compute.internal   3/3     Running           0          81s     apiserver=true,app=openshift-kube-apiserver,revision=9

Actual results:
3. For KAS-O encryption, calculate the time gap:
Time1 (see above) = 2019-11-13 17:42:02 (UTC+8) = 2019-11-13T09:42:02Z
Time2 = encryption.apiserver.operator.openshift.io/migrated-timestamp = 2019-11-13T09:57:33Z
Time used is nearly 15 mins, too slow for the resources counts in step 1 which are not many

In step 3, for kube-apiserver secret, after repeatedly checking yaml, the annotations finally show up.

4. the KAS pods always need restarts of 3 (9 minus 6) times of revisions to complete finally Running.

Expected results:
4. Etcd encryption should be not too slow on fresh installed env with not many resources

Additional info:

Comment 5 Ke Wang 2020-01-06 10:05:06 UTC
Verified in following ENV:
$ oc version
Client Version: v4.4.0
Server Version: 4.4.0-0.nightly-2020-01-05-221122
Kubernetes Version: v1.17.0


Prepare resources:
cat > create-resources.sh << EOF
#!/bin/bash
N1=$1
N2=$2
cd ~/my
for i in `seq -w $1 $2`
do
  oc new-project ke-test-proj-$i --skip-config-write
  for j in `seq -w 01 09`
  do
    oc create secret generic mysecret-$j --from-literal abcdefg='12345^&*()' -n ke-test-proj-$i
  done
  for j in `seq -w 1 20`
  do
    sed "s/my/my-$j/" route-test.yaml | oc create -f - -n ke-test-proj-$i
  done
done
EOF

bash create-resources.sh 1 100 & # may cost quite some time
bash create-resources.sh 101 200 & # may cost quite some time


cat > watch-migration.sh << EOF
#!/bin/bash
OAS_Y=
KAS_Y=
while true
do
  if [ "$OAS_Y" != "Y" ]; then
    date
    echo "Checking OAS"
    oc get po -n openshift-apiserver -l apiserver --show-labels
    oc get openshiftapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations'

    if oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*routes" | grep oauthaccesstokens | grep oauthauthorizetokens > /dev/null; then
      date
      echo "OAS-O migration completed"
      OAS_Y=Y
    fi
  fi

  if [ "$KAS_Y" != "Y" ]; then
    date
    echo "Checking KAS"
    oc get po -n openshift-kube-apiserver -l apiserver --show-labels
    oc get kubeapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations'

    if oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*secrets" | grep configmaps > /dev/null; then
      date
      echo "KAS-O migration completed"
      KAS_Y=Y
    fi
  fi

  [ "$OAS_Y" == "Y" ] && [ "$KAS_Y" == "Y" ] && break
  sleep 20
  echo "===================="
done
EOF

Then enable etcd encryption.
Then watch migration:
bash watch-migration.sh | tee watch-migration.log

2020年 01月 06日 星期一 17:08:39 CST
Checking OAS
...
Checking KAS
...

Check the time gaps from watch-migration.log:
2020年 01月 06日 星期一 17:13:23 CST
...
    "lastTransitionTime": "2020-01-06T09:13:18Z",
    "message": "migrating resources to a new write key: [route.openshift.io/routes]",   
...
020年 01月 06日 星期一 17:15:47 CST
OAS-O migration completed

...
2020年 01月 06日 星期一 17:20:56 CST
...
    "lastTransitionTime": "2020-01-06T09:20:43Z",
    "message": "migrating resources to a new write key: [core/secrets]",
...
2020年 01月 06日 星期一 17:23:19 CST
KAS-O migration completed


Get the migrating resources number,
$ oc get route --no-headers -A | wc -l
4007

$ oc get secret,cm --no-headers -A | wc -l
4912

for KAS, 4912 resources / 2m13s, for OAS, 4007 resources / 2m24s. The speed is about 1800 resources / min, this is expected fast.

Comment 7 errata-xmlrpc 2020-05-13 21:52:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581


Note You need to log in before you can comment on or make changes to this bug.