Bug 1776811

Summary: [MSTR-485] Cluster is abnormal after etcd backup/restore when the backup is taken while etcd encryption migration is in progress
Product: OpenShift Container Platform
Component: Routing
Version: 4.3.0
Target Release: 4.3.z
Status: CLOSED DEFERRED
Severity: high
Priority: high
Reporter: Stefan Schimanski <sttts>
Assignee: Andrew McDermott <amcdermo>
QA Contact: Hongan Li <hongli>
CC: ahoffer, amcdermo, aos-bugs, bbennett, geliu, lszaszki, mfojtik, sbatsche, skolicha, sttts, xxia
Flags: xxia: needinfo?; xxia: needinfo? (sbatsche); sttts: needinfo? (sttts)
Clone Of: 1775057
Bug Depends On: 1775057, 1776797
Last Closed: 2020-05-12 16:17:35 UTC

Comment 4 Xingxing Xia 2019-12-02 11:20:05 UTC
Hi, Stefan and Lukasz, saw https://github.com/openshift/enhancements/pull/131/files#diff-29a58870b4078595bb0b7d5a2a3bee18R279 :
"encryption-config ... mounted via host mount as ... in the kube-apiserver pod"
"A restore must put ... the backup in place ... before starting up kube-apiserver"

I have a question about the restore. Restoring etcd is documented at https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html ; for restoring the encryption-config, what are the right steps to host-mount it into the static pods? Should we modify /etc/kubernetes/manifests/kube-apiserver-pod.yaml on each master, or /etc/kubernetes/static-pod-resources/kube-apiserver-pod-$LATEST_REVISION/kube-apiserver-pod.yaml, or something else? Thanks.
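One way to see which manifest actually carries the encryption-config mount is to inspect a master node directly. This is a hypothetical diagnostic, not a documented restore step: it assumes shell access to a master and that /etc/kubernetes/manifests (what the kubelet runs) and /etc/kubernetes/static-pod-resources (per-revision copies) are laid out as on a standard OCP 4.x master.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: on a master node, list which kube-apiserver
# manifest files reference "encryption-config". /etc/kubernetes/manifests
# is what the kubelet actually runs; static-pod-resources holds the
# per-revision copies the installer pivots between.
MANIFEST_DIRS="/etc/kubernetes/manifests /etc/kubernetes/static-pod-resources"

for d in $MANIFEST_DIRS; do
  if [ -d "$d" ]; then
    grep -rl encryption-config "$d"
  else
    echo "skipping $d: not present (not a master node?)"
  fi
done
```

The files this prints are the ones a restore would need to put back in place before the kube-apiserver starts.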

Comment 30 Xingxing Xia 2020-01-07 15:46:13 UTC
Tried a 4.3.0-0.nightly-2020-01-06-185654 env twice: one run did not hit the above issue, the other did. For the run that hit it, I restarted the pods with: oc delete po router-default-6b44978bc4-mrslh router-default-6b44978bc4-z6st7 -n openshift-ingress . After waiting several minutes, the issue was gone.
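The pod names above are specific to that cluster. A generic sketch of the same workaround, selecting the default router pods by label instead of by name and using `oc wait` instead of waiting manually (this assumes the standard label the ingress operator puts on router pods in OCP 4.x; verify the selector against your cluster before running):

```shell
#!/usr/bin/env bash
# Sketch of the workaround from this comment: delete the default router
# pods (the Deployment recreates them) and wait until the replacements
# are Ready. Label selector is the one the ingress operator applies to
# router pods on OCP 4.x; confirm it matches your cluster first.
NS="openshift-ingress"
SELECTOR="ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default"

if command -v oc >/dev/null 2>&1; then
  oc -n "$NS" delete pods -l "$SELECTOR"
  oc -n "$NS" wait pods -l "$SELECTOR" --for=condition=Ready --timeout=300s
else
  echo "oc not found; run this against a live cluster"
fi
```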

Comment 32 ge liu 2020-01-08 10:34:15 UTC
Sam, I filed a doc bug to track this workaround: https://bugzilla.redhat.com/show_bug.cgi?id=1788895

Comment 35 Ben Bennett 2020-05-12 16:17:35 UTC
Closing this for now because the documentation is an appropriate fix: https://bugzilla.redhat.com/show_bug.cgi?id=1788895


We can consider a backport if we work out what, if anything, we can do to fix it on the master bug.