+++ This bug was initially created as a clone of Bug #1755557 +++
1. Perform DR steps (remove two masters, wait until quorum lost, physically delete machines)
2. Recover following the documented DR instructions
3. Repeat 1 but pick a new master (one that was created in step 2)
Expected: able to start a new etcd quorum on the first master with only itself as a member. Instead, running

/usr/local/bin/etcd-snapshot-restore.sh /root/assets/backup/snapshot.db etcd-member-ip-10-0-149-142.ec2.internal=https://etcd-2.ci-ln-shp2psk-d5d6b.origin-ci-int-aws.dev.rhcloud.com:2380

restored the cluster containing the previous (etcd-0, etcd-1) members, preventing further progress.
The root cause is that the etcd-snapshot-restore.sh script accepts ETCD_INITIAL_CLUSTER as an argument, but then sources /run/etcd/environment, which may also set ETCD_INITIAL_CLUSTER and silently overrides the argument. The user's intent was to start a new cluster with one member, but the restore started with 3 members, including the two that are permanently gone.
The script needs to preserve the user-supplied ETCD_INITIAL_CLUSTER even if /run/etcd/environment also sets it.
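The failure mode and the fix can be sketched in a few lines of bash. This is illustrative only: the environment-file contents and member names below are hypothetical, not taken from the actual etcd-snapshot-restore.sh.

```shell
#!/usr/bin/env bash
# Sketch of the fix: save the caller-supplied ETCD_INITIAL_CLUSTER before
# sourcing the environment file, then restore it afterwards so a stale
# value in /run/etcd/environment cannot clobber the user's intent.
set -euo pipefail

# Hypothetical single-member value passed by the user as an argument.
ETCD_INITIAL_CLUSTER="etcd-member-a=https://a.example.com:2380"

# Simulate /run/etcd/environment containing a stale three-member cluster.
env_file="$(mktemp)"
printf 'ETCD_INITIAL_CLUSTER="etcd-0=https://0:2380,etcd-1=https://1:2380,etcd-2=https://2:2380"\n' > "$env_file"

# The fix: preserve the argument across the source.
saved_initial_cluster="$ETCD_INITIAL_CLUSTER"
source "$env_file"
ETCD_INITIAL_CLUSTER="$saved_initial_cluster"

# Without the save/restore, the sourced file would have replaced the
# single-member value with the stale three-member one.
echo "$ETCD_INITIAL_CLUSTER"
rm -f "$env_file"
```

Sourcing a file executes its assignments in the current shell, so any variable set both before and inside the file keeps the file's value; the save/restore pattern makes the caller's argument win.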
Needs to be backported to all releases.
I checked the latest payload (4.1.0-0.nightly-2019-11-20-192514); the code has not been merged into it yet.
Verified with 4.1.25
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.