Bug 1895509

Summary: Backup taken on one master cannot be restored on other masters
Product: OpenShift Container Platform
Reporter: Suresh Kolichala <skolicha>
Component: Etcd
Assignee: Suresh Kolichala <skolicha>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: urgent
Priority: high
Version: 4.5
CC: geliu, sbatsche
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: Backups include a recovery YAML file that is specific to the master node on which the backup was taken.
Consequence: A backup taken on one master cannot be restored on the other masters.
Fix: Make the recovery YAML file generic so that the backup can be restored on any master.
Result: A backup taken on one master can be restored on any other master.
Clones: 1897542
Last Closed: 2021-02-24 15:31:26 UTC
Type: Bug
Bug Blocks: 1897542

Description Suresh Kolichala 2020-11-06 20:37:05 UTC
Description of problem:
A recent change in 4.5 introduced a bug that prevents a backup taken on one master from being restored on the other masters.

Version-Release number of selected component (if applicable):
4.5.16

How reproducible:
Always

Steps to Reproduce:
1. Take a backup with cluster-backup.sh on one master.
2. Copy the backup to another master.
3. Attempt to restore the database on that other master using the documented restore procedure.

Actual results:
The etcd members fail to start on all masters.

Expected results:
The etcd members start successfully on all masters and the cluster recovers.

Additional info:

Comment 1 Suresh Kolichala 2020-11-06 20:40:14 UTC
As a workaround, restore the backup on the same master it was taken from.

To determine which master a backup was taken from, run the following command against the backup directory:

sudo tar xvzf <backup>/static_kuberesources_*.tar.gz --wildcards '*restore-etcd-pod/pod.yaml' --to-stdout 2>&1 | grep 'ETCD_NAME='
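The command above can be wrapped in a small reusable function, for example to check a backup before copying it to a master. This is a minimal sketch under stated assumptions: GNU tar is available (for `--wildcards`), the backup directory holds a single `static_kuberesources_*.tar.gz`, and the archived `restore-etcd-pod/pod.yaml` contains an `ETCD_NAME=<member>` token. The function name `backup_etcd_name` is hypothetical and is not part of the OpenShift tooling.

```bash
#!/usr/bin/env bash
# Sketch of the workaround above as a reusable function.
# Assumptions (not confirmed by this report): GNU tar is installed, the
# backup directory holds one static_kuberesources_*.tar.gz, and the
# restore-etcd-pod/pod.yaml inside it has an "ETCD_NAME=<member>" token.

# backup_etcd_name DIR  (hypothetical helper name)
#   Prints the etcd member name recorded in the backup stored in DIR.
#   Returns 1 if no ETCD_NAME= token is found.
backup_etcd_name() {
  local backup_dir="$1"
  # Expand the archive glob once; only the first match is read.
  local archives=( "$backup_dir"/static_kuberesources_*.tar.gz )
  local name

  # --to-stdout streams pod.yaml instead of writing it to disk;
  # --wildcards makes GNU tar treat the quoted member name as a glob.
  # The value is cut at whitespace or ';', and surrounding quotes are removed.
  name=$(tar -xzf "${archives[0]}" --to-stdout \
           --wildcards '*restore-etcd-pod/pod.yaml' \
         | grep -o 'ETCD_NAME=[^[:space:];]*' \
         | head -n 1 \
         | cut -d= -f2- \
         | tr -d "\"'")

  if [ -z "$name" ]; then
    echo "no ETCD_NAME= entry found in $backup_dir" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

Run it with enough privileges to read the backup (the command above uses `sudo`), for example `sudo bash -c '. ./backup_etcd_name.sh && backup_etcd_name <backup>'`, where `backup_etcd_name.sh` is a hypothetical file holding the function. Then restore only on the master whose etcd member name matches the printed value.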

Comment 9 errata-xmlrpc 2021-02-24 15:31:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633