Bug 1897546

Summary:	Backup taken on one master cannot be restored on other masters
Product:	OpenShift Container Platform	Reporter:	Suresh Kolichala <skolicha>
Component:	Etcd	Assignee:	Suresh Kolichala <skolicha>
Status:	CLOSED ERRATA	QA Contact:	ge liu <geliu>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	4.5	CC:	geliu, sbatsche
Target Milestone:	---
Target Release:	4.4.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1897543	Environment:
Last Closed:	2021-02-03 10:11:43 UTC	Type:	---
Bug Depends On:	1897543
Bug Blocks:

Description Suresh Kolichala 2020-11-13 11:58:40 UTC

+++ This bug was initially created as a clone of Bug #1897543 +++

+++ This bug was initially created as a clone of Bug #1897542 +++

+++ This bug was initially created as a clone of Bug #1895509 +++

Description of problem:
A recent change in 4.5 introduced a bug that prevents a backup taken on one master from being restored on any other master.

Version-Release number of selected component (if applicable):
4.5.16

How reproducible:
Always

Steps to Reproduce:
1. Take a cluster backup using cluster-backup.sh on one master.
2. Copy the backup onto another master.
3. Attempt to restore the database on the other master using the documented procedure.

Actual results:
The etcd members fail to come up on all masters.

Expected results:
All etcd members start successfully on all masters and the cluster recovers.

Additional info:

--- Additional comment from Suresh Kolichala on 2020-11-06 20:40:14 UTC ---

As a workaround, restore the backup on the same master it was taken on.

To determine which master a backup was taken on, run the following command against the backup directory:

sudo tar xvzf <backup>/static_kuberesources_*.tar.gz  *restore-etcd-pod/pod.yaml --to-stdout  2>&1 | grep 'ETCD_NAME='
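
The check above could be wrapped in a small helper, sketched here under assumptions rather than taken from the fix itself. It assumes only that the manifest text contains a token of the form ETCD_NAME=<value>, which is what the grep above matches. The function names are hypothetical, and whether the recorded name equals the node's hostname is also an assumption to confirm on a real cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of cluster-backup.sh.

# Reads manifest text on stdin and prints the first ETCD_NAME value found,
# with any surrounding quotes stripped. Prints nothing if no token is present.
etcd_name_from_manifest() {
    grep -o 'ETCD_NAME=[^[:space:]]*' | head -n 1 | cut -d= -f2- | tr -d "\"'"
}

# Succeeds only if the ETCD_NAME found on stdin equals the name given as $1.
backup_matches_host() {
    local expected="$1" name
    name="$(etcd_name_from_manifest)"
    [ -n "$name" ] && [ "$name" = "$expected" ]
}
```

To use it, pipe the output of the tar command above into etcd_name_from_manifest instead of grep, or into backup_matches_host followed by the name of the master where the restore is planned.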

Comment 3 ge liu 2020-12-23 08:31:58 UTC

Tried with 4.4.0-0.ci.test-2020-12-23-042527-ci-ln-cs764w2

Took a backup on master-0, then restored it on master-2; the restore succeeded.
# oc get pods -n openshift-etcd | grep etcd
etcd-ip-10-0-139-137.us-east-2.compute.internal                4/4     Running     0          3m34s
etcd-ip-10-0-168-222.us-east-2.compute.internal                4/4     Running     0          2m33s
etcd-ip-10-0-205-133.us-east-2.compute.internal                4/4     Running     0          3m3s
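
A check like the one above can be scripted. The following is a minimal sketch, not part of the original verification. It assumes the default `oc get pods` column layout shown above (NAME, READY, STATUS, ...), and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: reads `oc get pods` output on stdin. Succeeds when at least one
# pod whose name starts with "etcd-" is listed and every such pod has STATUS
# Running with all containers ready (READY column of the form n/n).
all_etcd_pods_ready() {
    awk '
        $1 ~ /^etcd-/ {
            seen++
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) bad++
        }
        END { exit !(seen > 0 && bad == 0) }
    '
}
```

For example, `oc get pods -n openshift-etcd | all_etcd_pods_ready && echo "etcd pods ready"` would print the message only when all three members are 4/4 Running, as in the output above.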

Comment 5 ge liu 2021-01-21 09:35:42 UTC

Verified in comment 3, but I am curious why the bug status has not been updated to 'verified', since I have already executed pre-merge verification.

Comment 8 errata-xmlrpc 2021-02-03 10:11:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.4.33
bug fix and security update), and where to find the updated files, follow the
link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0281