Bug 1718436

Summary: [DR] recovery broken in 4.1
Product: OpenShift Container Platform
Reporter: Sam Batschelet <sbatsche>
Component: Etcd
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED NOTABUG
QA Contact: ge liu <geliu>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: bleanhar, christoph.obexer, eparis, gblomqui, kgarriso, lmeyer, mfojtik, rsawhill, schoudha, sponnaga, vlaad, vrutkovs, xtian
Target Milestone: ---
Keywords: NeedsTestCase, OSE41z_next
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: 4.1.4
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1746176
Environment:
Last Closed: 2020-02-10 14:08:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1714457, 1715377, 1746176    

Description Sam Batschelet 2019-06-07 19:00:48 UTC
Description of problem: The disaster recovery (DR) team has fixed numerous bugs in the DR workflow. Because these fixes have not yet been released, DR is broken for various deployments, such as bare metal and vSphere.

Version-Release number of selected component (if applicable):


How reproducible: always

Steps to Reproduce:
1. Attempt the documented DR procedure on vSphere or bare metal

Actual results: DR will fail 


Expected results: DR will pass


Additional info:

Comment 1 Sam Batschelet 2019-06-07 19:12:12 UTC
This BZ covers backporting the following 6 commits, which add stability to the DR workflow and fix bugs that prevented bare-metal instances from being restored with the documented process.

- https://github.com/openshift/machine-config-operator/pull/804 "DR: use param to populate etcd name for etcd-member-recover" commit: b8994e0

- https://github.com/openshift/machine-config-operator/pull/793 "DR: add validate_environment to openshift-recovery-tools" commit: 4e0e2d1

- https://github.com/openshift/machine-config-operator/pull/788 "Reload systemd services on disk before starting kubelet" commit: 9287919

- https://github.com/openshift/machine-config-operator/pull/779 "Store stopped manifests in ASSET_DIR" commit: 8554f3a

- https://github.com/openshift/machine-config-operator/pull/779 "DR start_static_pods: reread a list of static pods when starting them" commit: dd3be74

- https://github.com/openshift/machine-config-operator/pull/780 "templates/master/00-master: resolve unpopulated var when ETCD_CONNSTR" commit: 9fb4732

Comment 3 Kirsten Garrison 2019-06-07 22:17:01 UTC
This BZ needs a target release so that it eventually gets picked up in GitHub.

Comment 4 Sudha Ponnaganti 2019-06-18 13:15:18 UTC
Is this in 4.1.2 or 4.1.3? I will update the Whiteboard field once confirmed. There is no build, so I am not sure why this is in MODIFIED status.

Comment 5 Brenton Leanhardt 2019-06-18 14:17:14 UTC
https://github.com/openshift/machine-config-operator/pull/834 merged 8 days ago. I just checked in dist-git and verified that the code is not in 4.1.2. Is there anything else the bug owner needs to do to get it into 4.1.3 at this point, given that the code is merged and the bug is in MODIFIED?

Comment 6 Sudha Ponnaganti 2019-06-18 15:13:17 UTC
No worries. We will get this into 4.1.3.

Comment 10 ge liu 2019-06-21 12:48:55 UTC
Tried to verify it, but it is blocked by Bug 1722807 - [DR]Fail to login to cluster when adding new master hosts to etcd cluster(vshpere)

Comment 11 Sudha Ponnaganti 2019-06-21 21:22:30 UTC
This is blocked by another bug, Bug 1722807.
The fix went into 4.1.3 but will be validated in 4.1.4.

Comment 17 ge liu 2019-06-28 01:56:55 UTC
@Sudha Ponnaganti, this is blocked by Bug 1722807, and I'm not sure whether Bug 1722807 must also be fixed in 4.1.4, so feel free to inform me of any update. Thanks.

Comment 18 ge liu 2019-07-02 09:36:24 UTC
Hello @Sudha Ponnaganti, Sam: 4.1.4 will be released soon, and this bug is still blocked by Bug 1722807. What QE could do is try it on another platform (such as bare metal), but the blocking bug was opened on vSphere, so that bug would remain unresolved even if we verified this one on bare metal. There is not enough time left for Bug 1722807 to be fixed and for QE to verify it. As I mentioned before in comment 10, I still suggest dropping it.

Comment 21 ge liu 2019-07-18 05:44:10 UTC
It is blocked by Bug 1722807.

Comment 28 ge liu 2019-09-19 11:23:07 UTC
Failed to verify it; it depends on Bug 1722807. Thanks.

Comment 30 Greg Blomquist 2019-11-21 15:15:01 UTC
*** Bug 1746176 has been marked as a duplicate of this bug. ***