Bug 1776811 - [MSTR-485] Cluster is abnormal after etcd backup/restore when the backup is conducted while etcd encryption migration is in progress
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Andrew McDermott
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On: 1775057 1776797
Blocks:
 
Reported: 2019-11-26 12:30 UTC by Stefan Schimanski
Modified: 2020-05-12 16:17 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1775057
Environment:
Last Closed: 2020-05-12 16:17:35 UTC
Target Upstream Version:




Links
GitHub openshift/library-go pull 606 (closed): Bug 1776811: encryption: keep last read key after migration for easier backup/restore (last updated 2020-10-06 15:08:44 UTC)

Comment 4 Xingxing Xia 2019-12-02 11:20:05 UTC
Hi Stefan and Lukasz, I saw https://github.com/openshift/enhancements/pull/131/files#diff-29a58870b4078595bb0b7d5a2a3bee18R279 :
"encryption-config ... mounted via host mount as ... in the kube-apiserver pod"
"A restore must put ... the backup in place ... before starting up kube-apiserver"

I have a question about the restore. For etcd restore, there is a doc: https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html ; for encryption-config restore, what are the right steps to do the host mount into the static pods? Should we modify /etc/kubernetes/manifests/kube-apiserver-pod.yaml on each master? Or modify /etc/kubernetes/static-pod-resources/kube-apiserver-pod-$LATEST_REVISION/kube-apiserver-pod.yaml? Or something else? Thanks.
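The comment above asks where among the static-pod revision directories the host-mounted encryption-config should be restored. As a hedged illustration only (the directory layout is taken from the paths named in the comment, and `$LATEST_REVISION` is resolved here by an assumed naming convention, not a documented restore procedure), locating the newest kube-apiserver revision directory on a master could look like:

```shell
# Sketch: find the newest kube-apiserver static-pod revision directory.
# The base path comes from the comment above; the script is guarded so it
# is a harmless no-op on machines that are not an OpenShift master.
DIR=/etc/kubernetes/static-pod-resources
if [ -d "$DIR" ]; then
  # Revision dirs are assumed to be named kube-apiserver-pod-<N>;
  # sort -V orders the numeric suffixes correctly.
  REV=$(ls -d "$DIR"/kube-apiserver-pod-* 2>/dev/null | sort -V | tail -n 1)
  echo "latest revision dir: $REV"
else
  echo "$DIR not present; not a master host"
fi
```

This only answers the "which revision directory" half of the question; whether the restored file belongs there or under /etc/kubernetes/manifests is exactly what the comment is asking.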

Comment 30 Xingxing Xia 2020-01-07 15:46:13 UTC
Tried a 4.3.0-0.nightly-2020-01-06-185654 env twice; one run did not hit the above issue, the other did. For the run that hit it, I restarted the pods with: oc delete po router-default-6b44978bc4-mrslh router-default-6b44978bc4-z6st7 -n openshift-ingress . After waiting several minutes, the issue was gone.
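The workaround in this comment deletes the router pods by their generated names, which differ per cluster. A minimal sketch that selects them by label instead (the label selector is an assumption based on the standard default IngressController deployment, not stated in this bug):

```shell
# Sketch of the restart workaround: delete the openshift-ingress router
# pods by label and wait for the replacements to become Ready.
# The label selector is assumed from the default IngressController deployment.
NS=openshift-ingress
SELECTOR=ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
if command -v oc >/dev/null 2>&1; then
  oc delete pods -n "$NS" -l "$SELECTOR"
  # The router Deployment recreates the pods; wait up to 5 minutes.
  oc wait pods -n "$NS" -l "$SELECTOR" --for=condition=Ready --timeout=300s
else
  echo "oc not found; run from a host with cluster access"
fi
```

Selecting by label avoids copy-pasting pod hash suffixes and restarts all router replicas in one command.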

Comment 32 ge liu 2020-01-08 10:34:15 UTC
Sam, I filed a doc bug to track this workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1788895

Comment 35 Ben Bennett 2020-05-12 16:17:35 UTC
Closing this for now because the documentation is an appropriate fix: https://bugzilla.redhat.com/show_bug.cgi?id=1788895


We can consider a backport if we work out what, if anything, we can do to fix it on the master bug.

