Bug 1718436 - [DR] recovery broken in 4.1
Summary: [DR] recovery broken in 4.1
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Sam Batschelet
QA Contact: ge liu
Whiteboard: 4.1.4
Duplicates: 1746176
Depends On:
Blocks: 1714457 1715377 1746176
Reported: 2019-06-07 19:00 UTC by Sam Batschelet
Modified: 2020-02-10 14:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-02-10 14:08:45 UTC
Target Upstream Version:


Description Sam Batschelet 2019-06-07 19:00:48 UTC
Description of problem: The disaster recovery team has fixed numerous bugs in the DR workflow. Because these fixes have not yet been released, DR is broken for various deployments such as bare metal and vSphere.

Version-Release number of selected component (if applicable):

How reproducible: always

Steps to Reproduce:
1. Attempt the documented DR procedure on vSphere or bare metal

Actual results: DR will fail 

Expected results: DR will pass

Additional info:

Comment 1 Sam Batschelet 2019-06-07 19:12:12 UTC
This BZ covers backporting the following 6 commits, which add stability to the DR workflow and fix bugs that prevented bare-metal instances from being restored with the documented process.

- https://github.com/openshift/machine-config-operator/pull/804 "DR: use param to populate etcd name for etcd-member-recover" commit: b8994e0

- https://github.com/openshift/machine-config-operator/pull/793 "DR: add validate_environment to openshift-recovery-tools" commit: 4e0e2d1

- https://github.com/openshift/machine-config-operator/pull/788 "Reload systemd services on disk before starting kubelet" commit: 9287919

- https://github.com/openshift/machine-config-operator/pull/779 "Store stopped manifests in ASSET_DIR" commit: 8554f3a

- https://github.com/openshift/machine-config-operator/pull/779 "DR start_static_pods: reread a list of static pods when starting them" commit: dd3be74

- https://github.com/openshift/machine-config-operator/pull/780 "templates/master/00-master: resolve unpopulated var when ETCD_CONNSTR" commit: 9fb4732

Comment 3 Kirsten Garrison 2019-06-07 22:17:01 UTC
This BZ needs a target release for it to get picked up eventually in GitHub.

Comment 4 Sudha Ponnaganti 2019-06-18 13:15:18 UTC
Is this in 4.1.2 or 4.1.3? I will update the whiteboard field once confirmed. There is no build, so I am not sure why this is in MODIFIED status.

Comment 5 Brenton Leanhardt 2019-06-18 14:17:14 UTC
https://github.com/openshift/machine-config-operator/pull/834 merged 8 days ago. I just checked in dist-git and verified the code is not in 4.1.2. Is there anything else the bug owner needs to do to get it into 4.1.3 at this point, given that the code is merged and the bug is in MODIFIED?

Comment 6 Sudha Ponnaganti 2019-06-18 15:13:17 UTC
No worries. Will get this into 4.1.3.

Comment 10 ge liu 2019-06-21 12:48:55 UTC
Tried to verify it, but it is blocked by Bug 1722807 - [DR]Fail to login to cluster when adding new master hosts to etcd cluster(vshpere)

Comment 11 Sudha Ponnaganti 2019-06-21 21:22:30 UTC
This is blocked by another bug, Bug 1722807.
The fix went into 4.1.3 but will be validated in 4.1.4.

Comment 17 ge liu 2019-06-28 01:56:55 UTC
@Sudha Ponnaganti, this is blocked by Bug 1722807, and I'm not sure whether Bug 1722807 must also be fixed in 4.1.4. Feel free to let me know if there is any update, thanks.

Comment 18 ge liu 2019-07-02 09:36:24 UTC
Hello @Sudha Ponnaganti, Sam: 4.1.4 will be released soon, and this bug is still blocked by Bug 1722807. What QE could do is try it on another platform (such as bare metal), but the blocking bug was opened on vSphere, so it would still not be resolved even if we verified this one on bare metal. There is not enough time now for Bug 1722807 to be fixed and then verified by QE, so as I mentioned before in comment 10, I still suggest dropping it.

Comment 21 ge liu 2019-07-18 05:44:10 UTC
It is blocked by Bug 1722807.

Comment 28 ge liu 2019-09-19 11:23:07 UTC
Failed to verify it; it depends on Bug 1722807, thanks.

Comment 30 Greg Blomquist 2019-11-21 15:15:01 UTC
*** Bug 1746176 has been marked as a duplicate of this bug. ***
