Bug 1776811

Summary: [MSTR-485] Cluster is abnormal after etcd backup/restore when the backup is taken while etcd encryption migration is in progress
Product: OpenShift Container Platform
Component: Routing
Version: 4.3.0
Target Release: 4.3.z
Status: CLOSED DEFERRED
Severity: high
Priority: high
Reporter: Stefan Schimanski <sttts>
Assignee: Andrew McDermott <amcdermo>
QA Contact: Hongan Li <hongli>
CC: ahoffer, amcdermo, aos-bugs, bbennett, geliu, lszaszki, mfojtik, sbatsche, skolicha, sttts, xxia
Flags: xxia: needinfo?; xxia: needinfo? (sbatsche); sttts: needinfo? (sttts)
Clone Of: 1775057
Bug Depends On: 1775057, 1776797
Last Closed: 2020-05-12 16:17:35 UTC

Comment 4 Xingxing Xia 2019-12-02 11:20:05 UTC
Hi, Stefan and Lukasz, saw https://github.com/openshift/enhancements/pull/131/files#diff-29a58870b4078595bb0b7d5a2a3bee18R279 :
"encryption-config ... mounted via host mount as ... in the kube-apiserver pod"
"A restore must put ... the backup in place ... before starting up kube-apiserver"

I have a question about the restore. Restoring etcd is documented at https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html ; for restoring the encryption-config, what are the right steps to host-mount it into the static pods? Should we modify /etc/kubernetes/manifests/kube-apiserver-pod.yaml on each master, or /etc/kubernetes/static-pod-resources/kube-apiserver-pod-$LATEST_REVISION/kube-apiserver-pod.yaml, or something else? Thanks.
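One way to see which manifest actually carries the encryption-config mount is to inspect a master node directly. This is a hypothetical diagnostic, not a documented restore step: it assumes shell access to a master and that /etc/kubernetes/manifests (what the kubelet runs) and /etc/kubernetes/static-pod-resources (per-revision copies) are laid out as on a standard OCP 4.x master.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: on a master node, list which kube-apiserver
# manifest files reference "encryption-config". /etc/kubernetes/manifests
# is what the kubelet actually runs; static-pod-resources holds the
# per-revision copies the installer pivots between.
MANIFEST_DIRS="/etc/kubernetes/manifests /etc/kubernetes/static-pod-resources"

for d in $MANIFEST_DIRS; do
  if [ -d "$d" ]; then
    grep -rl encryption-config "$d"
  else
    echo "skipping $d: not present (not a master node?)"
  fi
done
```

The files this prints are the ones a restore would need to put back in place before the kube-apiserver starts.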

Comment 30 Xingxing Xia 2020-01-07 15:46:13 UTC
Tried a 4.3.0-0.nightly-2020-01-06-185654 env twice: one run did not hit the above issue, the other did. For the run that hit it, I restarted the pods with: oc delete po router-default-6b44978bc4-mrslh router-default-6b44978bc4-z6st7 -n openshift-ingress . After waiting several minutes, the issue was gone.
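The pod names above are specific to that cluster. A generic sketch of the same workaround, selecting the default router pods by label instead of by name and using `oc wait` instead of waiting manually (this assumes the standard label the ingress operator puts on router pods in OCP 4.x; verify the selector against your cluster before running):

```shell
#!/usr/bin/env bash
# Sketch of the workaround from this comment: delete the default router
# pods (the Deployment recreates them) and wait until the replacements
# are Ready. Label selector is the one the ingress operator applies to
# router pods on OCP 4.x; confirm it matches your cluster first.
NS="openshift-ingress"
SELECTOR="ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default"

if command -v oc >/dev/null 2>&1; then
  oc -n "$NS" delete pods -l "$SELECTOR"
  oc -n "$NS" wait pods -l "$SELECTOR" --for=condition=Ready --timeout=300s
else
  echo "oc not found; run this against a live cluster"
fi
```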

Comment 32 ge liu 2020-01-08 10:34:15 UTC
Sam, I filed a doc bug to track this workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1788895

Comment 35 Ben Bennett 2020-05-12 16:17:35 UTC
Closing this for now because the documentation is an appropriate fix: https://bugzilla.redhat.com/show_bug.cgi?id=1788895


We can consider a backport if we work out what, if anything, we can do to fix it on the master bug.