Description of problem:
The disaster recovery (DR) team has fixed numerous bugs in the DR workflow. Because these fixes have not been released yet, DR is broken for various deployments, such as bare metal and vSphere.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Attempt the documented DR procedure on vSphere or bare metal.

Actual results:
DR fails.

Expected results:
DR passes.

Additional info:
This BZ covers backporting the following six commits. They add stability to the DR workflow and fix bugs that prevented bare-metal instances from being restored with the documented process:
- https://github.com/openshift/machine-config-operator/pull/804 "DR: use param to populate etcd name for etcd-member-recover" (commit b8994e0)
- https://github.com/openshift/machine-config-operator/pull/793 "DR: add validate_environment to openshift-recovery-tools" (commit 4e0e2d1)
- https://github.com/openshift/machine-config-operator/pull/788 "Reload systemd services on disk before starting kubelet" (commit 9287919)
- https://github.com/openshift/machine-config-operator/pull/779 "Store stopped manifests in ASSET_DIR" (commit 8554f3a)
- https://github.com/openshift/machine-config-operator/pull/779 "DR start_static_pods: reread a list of static pods when starting them" (commit dd3be74)
- https://github.com/openshift/machine-config-operator/pull/780 "templates/master/00-master: resolve unpopulated var when ETCD_CONNSTR" (commit 9fb4732)
This BZ needs a target release set so that it eventually gets picked up in GitHub.
Is this in 4.1.2 or 4.1.3? I will update the Whiteboard field once that is confirmed. There is no build yet, so I'm not sure why this is in MODIFIED status.
https://github.com/openshift/machine-config-operator/pull/834 merged 8 days ago. I just checked dist-git and confirmed that the code is not in 4.1.2. The code is merged and the bug is in MODIFIED. Does the bug owner need to do anything else at this point to get it into 4.1.3?
No worries. We will get this into 4.1.3.
Tried to verify this, but it is blocked by Bug 1722807 - [DR]Fail to login to cluster when adding new master hosts to etcd cluster(vshpere)
This is blocked by another bug, Bug 1722807. The fix went into 4.1.3 but will be validated in 4.1.4.
@Sudha Ponnaganti, this is blocked by Bug 1722807, and I'm not sure whether Bug 1722807 must also be fixed in 4.1.4. Please let me know if there are any updates. Thanks.
Hello @Sudha Ponnaganti and Sam, 4.1.4 will be released soon, but this bug is still blocked by Bug 1722807. QE could try to verify it on another platform, such as bare metal. However, the blocking bug was opened against vSphere, so it would remain unresolved even if we verified this bug on bare metal. There is not enough time for Bug 1722807 to be fixed and for QE to verify it. As I mentioned earlier in comment 10, I still suggest dropping it.
It is blocked by Bug 1722807.
Failed to verify this; it depends on Bug 1722807. Thanks.
*** Bug 1746176 has been marked as a duplicate of this bug. ***