+++ This bug was initially created as a clone of Bug #1755557 +++

Scenario:
1. Perform the DR steps (remove two masters, wait until quorum is lost, physically delete the machines)
2. Recover by following the instructions
3. Repeat step 1, but pick a new master (one that was created in step 2)

Expected: Able to start a new etcd quorum on the first master with only itself as a member.

Actual: Running

  /usr/local/bin/etcd-snapshot-restore.sh /root/assets/backup/snapshot.db etcd-member-ip-10-0-149-142.ec2.internal=https://etcd-2.ci-ln-shp2psk-d5d6b.origin-ci-int-aws.dev.rhcloud.com:2380

restored the cluster containing the previous (etcd-0, etcd-1) members, preventing further progress.

The root cause is that the etcd-snapshot-restore script accepts ETCD_INITIAL_CLUSTER as an argument, but then sources /run/etcd/environment, which may also set ETCD_INITIAL_CLUSTER and so overrides the argument. The user's intent was to start a new cluster with one member, but the script started one with three members (including the two that are permanently gone). The script needs to preserve the ETCD_INITIAL_CLUSTER passed by the user even when it is also set in /run/etcd/environment.

This needs to be backported to all releases.
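For illustration, here is a minimal sketch of the override and one way to guard against it. The variable name ETCD_INITIAL_CLUSTER and the /run/etcd/environment path come from this report; the argument order and surrounding logic are assumptions for the sketch, not taken from the actual etcd-snapshot-restore.sh:

  #!/bin/bash
  # Hypothetical sketch of the fix; not the actual etcd-snapshot-restore.sh.
  set -euo pipefail

  SNAPSHOT_FILE="$1"
  # The single-member initial-cluster string passed by the user, e.g.
  # etcd-member-<host>=https://<peer-url>:2380
  ETCD_INITIAL_CLUSTER="$2"

  # Save the caller's value before sourcing: /run/etcd/environment may
  # also export ETCD_INITIAL_CLUSTER (still listing the old three-member
  # cluster), and sourcing it would silently overwrite the argument.
  REQUESTED_INITIAL_CLUSTER="$ETCD_INITIAL_CLUSTER"

  source /run/etcd/environment

  # Restore the caller's value so the new quorum contains only the
  # member the user asked for, not the permanently lost ones.
  ETCD_INITIAL_CLUSTER="$REQUESTED_INITIAL_CLUSTER"

  echo "Restoring ${SNAPSHOT_FILE} with ETCD_INITIAL_CLUSTER=${ETCD_INITIAL_CLUSTER}"
  # ...the real script would continue with the snapshot restore and
  # member regeneration steps, using the preserved value.

The key point is that a plain "source" assigns into the current shell, so any variable the environment file sets wins over one assigned earlier in the script unless it is explicitly saved and restored.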
This is high severity; I need this verified soon so we can get it into 4.1 and 4.2 (and enable the disruptive suites to prevent future regressions).
Verified with 4.3.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062