Bug 1776797

Summary: [MSTR-485] kube-apiserver etcd encryption is too slow on fresh installed env, always taking nearly 15mins
Product: OpenShift Container Platform Reporter: Stefan Schimanski <sttts>
Component: kube-apiserver Assignee: Stefan Schimanski <sttts>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.3.0 CC: aos-bugs, lmeyer, lszaszki, mfojtik, shiywang, sttts, xxia, yuxzhu
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1771986 Environment:
Last Closed: 2020-01-23 11:14:34 UTC Type: ---
Bug Depends On: 1771986    
Bug Blocks: 1776811    

Comment 2 Xingxing Xia 2019-12-02 10:03:44 UTC
The above PRs in the cluster-kube-apiserver-operator and cluster-openshift-apiserver-operator repos have not yet landed in the latest payload, 4.3.0-0.nightly-2019-12-02-055401. Pending a new build from the ART team.
Moving to MODIFIED. Thx.

Comment 5 Xingxing Xia 2019-12-03 09:21:55 UTC
Verified in 4.3.0-0.nightly-2019-12-02-232545:
Prepare resources:
cat > create-resources.sh << 'EOF'
#!/bin/bash
# Usage: bash create-resources.sh FIRST LAST
N1=$1
N2=$2
cd ~/my   # directory containing route-test.yaml
for i in $(seq -w "$N1" "$N2")
do
  oc new-project xxia-test-proj-$i --skip-config-write
  for j in `seq -w 01 09`
  do
    oc create secret generic mysecret-$j --from-literal abcdefg='12345^&*()' -n xxia-test-proj-$i
  done
  for j in `seq -w 1 20`
  do
    sed "s/my/my-$j/" route-test.yaml | oc create -f - -n xxia-test-proj-$i
  done
done
EOF

bash create-resources.sh 1 100 & # may take quite some time
bash create-resources.sh 101 200 & # may take quite some time


cat > watch-migration.sh << 'EOF'
#!/bin/bash
OAS_Y=
KAS_Y=
while true
do
  if [ "$OAS_Y" != "Y" ]; then
    date
    echo "Checking OAS"
    oc get po -n openshift-apiserver -l apiserver --show-labels
    oc get openshiftapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations'

    if oc get secret -n openshift-config-managed encryption-key-openshift-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*routes" | grep oauthaccesstokens | grep oauthauthorizetokens > /dev/null; then
      date
      echo "OAS-O migration completed"
      OAS_Y=Y
    fi
  fi

  if [ "$KAS_Y" != "Y" ]; then
    date
    echo "Checking KAS"
    oc get po -n openshift-kube-apiserver -l apiserver --show-labels
    oc get kubeapiserver cluster -o json | jq -r '.status.conditions[] | select(.type == "EncryptionMigrationControllerProgressing")'
    oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations'

    if oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | jq -r '.metadata.annotations' | grep "migrated-resources.*secrets" | grep configmaps > /dev/null; then
      date
      echo "KAS-O migration completed"
      KAS_Y=Y
    fi
  fi

  [ "$OAS_Y" == "Y" ] && [ "$KAS_Y" == "Y" ] && break
  sleep 20
  echo "===================="
done
EOF
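
The grep chains in watch-migration.sh match substrings anywhere in the annotations map. A stricter check could parse the migrated-resources annotation (itself a JSON string) and compare resource names exactly. The following is only a sketch under assumptions: the function name migrated_all is hypothetical, it expects the Secret's JSON on stdin, and it compares only the Resource field, not the Group.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts).
# Reads a Secret as JSON on stdin; succeeds only if every resource name
# given as an argument is listed in the migrated-resources annotation.
migrated_all() {
  local key='encryption.apiserver.operator.openshift.io/migrated-resources'
  local want
  # Turn the argument list into a JSON array of strings.
  want=$(printf '%s\n' "$@" | jq -R . | jq -s .)
  # The annotation value is a JSON document stored as a string, so decode it
  # with fromjson. A missing annotation is treated as an empty resource list.
  jq -e --arg key "$key" --argjson want "$want" '
    (.metadata.annotations[$key] // "{\"resources\":[]}"
      | fromjson
      | [.resources[].Resource]) as $have
    | ($want - $have) | length == 0
  ' > /dev/null
}
```

It could replace the KAS grep chain like this:
oc get secret -n openshift-config-managed encryption-key-openshift-kube-apiserver-1 -o json | migrated_all secrets configmaps && KAS_Y=Y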

Then enable etcd encryption.
Then watch migration:
bash watch-migration.sh | tee watch-migration.log
Tue Dec  3 15:16:12 CST 2019
Checking OAS
...
Checking KAS
...
Tue Dec  3 15:23:14 CST 2019
OAS-O migration completed
...
...
Tue Dec  3 15:31:42 CST 2019
KAS-O migration completed

Check the time gaps from watch-migration.log:
Tue Dec  3 15:21:14 CST 2019
...
  "lastTransitionTime": "2019-12-03T07:20:58Z",
  "message": "migrating resources to a new write key: [route.openshift.io/routes]",
...
...
Tue Dec  3 15:23:11 CST 2019
...
  "encryption.apiserver.operator.openshift.io/migrated-resources": "{\"resources\":[{\"Group\":\"oauth.openshift.io\",\"Resource\":\"oauthaccesstokens\"},{\"Group\":\"oauth.openshift.io\",\"Resource\":\"oauthauthorizetokens\"},{\"Group\":\"route.openshift.io\",\"Resource\":\"routes\"}]}",
  "encryption.apiserver.operator.openshift.io/migrated-timestamp": "2019-12-03T07:23:10Z",


Tue Dec  3 15:28:48 CST 2019
...
  "lastTransitionTime": "2019-12-03T07:28:54Z",
  "message": "migrating resources to a new write key: [core/configmaps core/secrets]",
...
...
Tue Dec  3 15:31:40 CST 2019
...
  "encryption.apiserver.operator.openshift.io/migrated-resources": "{\"resources\":[{\"Group\":\"\",\"Resource\":\"configmaps\"},{\"Group\":\"\",\"Resource\":\"secrets\"}]}",
  "encryption.apiserver.operator.openshift.io/migrated-timestamp": "2019-12-03T07:31:35Z",

oc get route --no-headers -A | wc -l
4006
oc get secret,cm --no-headers -A | wc -l
4847
For KAS, 4847 resources were migrated in 2m41s; for OAS, 4006 resources in 2m12s (each measured from lastTransitionTime to migrated-timestamp). That is about 1800 resources/min, which is as fast as expected.
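
The rate arithmetic above can be reproduced from the logged timestamps. This is a minimal sketch: the function name rate_per_min is hypothetical, and timestamp parsing assumes GNU date (its -d option). The result is rounded down to a whole number.

```shell
#!/bin/bash
# Hypothetical helper: rate_per_min COUNT START END
# Prints resources migrated per minute (integer, rounded down).
# START and END are UTC timestamps as shown in the log, e.g. 2019-12-03T07:20:58Z.
rate_per_min() {
  local count=$1 start end secs
  start=$(date -u -d "$2" +%s)   # GNU date: parse timestamp to epoch seconds
  end=$(date -u -d "$3" +%s)
  secs=$((end - start))
  [ "$secs" -gt 0 ] || { echo "END must be after START" >&2; return 1; }
  echo $(( count * 60 / secs ))
}

rate_per_min 4006 2019-12-03T07:20:58Z 2019-12-03T07:23:10Z   # OAS: routes
rate_per_min 4847 2019-12-03T07:28:54Z 2019-12-03T07:31:35Z   # KAS: secrets + configmaps
```

Both calls use the lastTransitionTime as the start and the migrated-timestamp as the end, matching the 2m12s and 2m41s gaps.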

Comment 7 errata-xmlrpc 2020-01-23 11:14:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062