Bug 1826021 - [4.3] etcd-snapshot-restore.sh fails due to "Error: snapshot restore requires exactly one argument"
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Suresh Kolichala
QA Contact: ge liu
Depends On:
Blocks: 1826023
Reported: 2020-04-20 17:15 UTC by Robert Bost
Modified: 2020-08-05 10:54 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A regular expression for obtaining the etcd member name was defined incorrectly. Consequence: More than one etcd member name is returned. The workaround is to list the member name first in INITIAL_CLUSTER. Fix: Fix the greedy regular expression. Result: Exactly one etcd member name is matched.
Clone Of:
: 1826023 (view as bug list)
Last Closed: 2020-08-05 10:54:06 UTC
Target Upstream Version:

System                             ID              Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  4999481         None      None    None     2020-04-20 17:48:18 UTC
Red Hat Product Errata             RHBA-2020:3180  None      None    None     2020-08-05 10:54:34 UTC

Description Robert Bost 2020-04-20 17:15:37 UTC
Description of problem:

When trying to run etcd-snapshot-restore.sh with more than one member listed in INITIAL_CLUSTER, it is possible for the script to fail:

  [core@etcd-1 ~]$ export INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"
  [core@etcd-1 ~]$ sudo /usr/local/bin/etcd-snapshot-restore.sh /home/core/assets/backup/snapshot.db $INITIAL_CLUSTER
  Removing etcd data-dir /var/lib/etcd
  Restoring etcd member etcd-1.example.com
  etcd-1.example.com from snapshot..
  Error: snapshot restore requires exactly one argument

The workaround is to order INITIAL_CLUSTER so that the node you are executing commands on is listed *first* in INITIAL_CLUSTER.

Additional info:
I filed this under the MCO component since that's where the etcd-snapshot-restore.sh script is shipped. Not sure if this is the right decision though.

- INITIAL_CLUSTER is parsed incorrectly in the bash function linked below. The regular expression is too greedy with the "*" and captures all node names at and before the ETCD_DNS_NAME.


  For example:

    $ export INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"
    $ ETCD_DNS_NAME=etcd-1.example.com
    $ validate_etcd_name 

  I would expect only etcd-1.example.com to be listed, and so does the etcd snapshot restore command that this information is passed to.
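  The expected behavior can be sketched without a regular expression. This is a minimal illustration, not the shipped validate_etcd_name and not the fix that was merged: find_member_name is a hypothetical helper that splits the cluster string on commas and compares each name exactly, so it prints at most one name.

```shell
# Hypothetical helper (an assumption for illustration, not the shipped function):
# print the member name in an initial-cluster string that exactly equals the
# given DNS name, or return non-zero if no entry matches.
find_member_name() {
  local dns_name="$1" rest="$2" entry name
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"    # first <name>=<peer-url> entry
    name="${entry%%=*}"    # text before the first '=' is the member name
    if [ "$name" = "$dns_name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
    case "$rest" in
      *,*) rest="${rest#*,}" ;;   # drop the entry just examined
      *)   rest="" ;;             # that was the last entry
    esac
  done
  return 1
}

INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"
find_member_name "etcd-1.example.com" "$INITIAL_CLUSTER"   # prints etcd-1.example.com
```

  Because the comparison is an exact string match on each entry's name, the position of the node within INITIAL_CLUSTER does not affect the result.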

Comment 7 Sam Batschelet 2020-05-22 18:44:17 UTC
  export ETCD_DNS_NAME="etcd-0.example.com"
  export ETCD_INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"
  echo "test 1"
  echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=)[^,,\s]*(?==[^=]*${ETCD_DNS_NAME}\b)"
  echo "test 2"
  echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=,)${ETCD_DNS_NAME}(?==)"

test 1
test 2

The function takes ETCD_DNS_NAME, verifies that it is listed in ETCD_INITIAL_CLUSTER, and returns the name that matches that record. The format of each INITIAL_CLUSTER entry is <name>=<peer-url>. Given that, you can see that test 1 matches and test 2 does not. But as Suresh said, the issue is not with that function.

> Error: snapshot restore requires exactly one argument

The following example shows how the error could happen. The command `etcdctl snapshot restore` takes a single argument, which is $SNAPSHOT_FILE[1]. Notice the space in the path.

SNAPSHOT_FILE="/home/core/assets/backup dir/snapshot.db"

etcdctl snapshot restore $SNAPSHOT_FILE
Error: snapshot restore requires exactly one argument
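
The word splitting behind that example can be observed without etcdctl. In the sketch below, count_args is a hypothetical helper that only reports how many arguments it received. An unquoted expansion of a path containing a space yields two arguments, while a quoted expansion yields one.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

SNAPSHOT_FILE="/home/core/assets/backup dir/snapshot.db"

# Unquoted: the shell splits the value on the space into two words.
unquoted=$(count_args $SNAPSHOT_FILE)
# Quoted: the value is passed through as a single word.
quoted=$(count_args "$SNAPSHOT_FILE")

echo "unquoted: $unquoted"   # unquoted: 2
echo "quoted: $quoted"       # quoted: 1
```

The same splitting applies to any unquoted variable on a command line, including one that holds several newline-separated values.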

Can you confirm your ocp version?


Comment 9 Robert Bost 2020-05-26 16:08:56 UTC
I will take the same test from c#7 and set ETCD_DNS_NAME=etcd-1.example.com. The command output is shared below and demonstrates the problem that customers have run into when following our docs [1]. The multiple names output by "test 1" are passed to the snapshot restore command, resulting in the "snapshot restore requires exactly one argument" error.

Perhaps my assumption that someone would be configuring ETCD_DNS_NAME and ETCD_INITIAL_CLUSTER in this way is not acceptable. Can you please verify and I will file a documentation bug if needed?

export ETCD_DNS_NAME="etcd-1.example.com"
export ETCD_INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"
echo "test 1"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=)[^,,\s]*(?==[^=]*${ETCD_DNS_NAME}\b)"
echo "test 2"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=,)${ETCD_DNS_NAME}(?==)"

test 1
test 2

[1] https://docs.openshift.com/container-platform/4.3/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state
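
One way to catch this condition before it reaches the restore command is to check that the lookup produced exactly one name. The following is only a sketch under that assumption: require_single_name is a hypothetical guard, not part of etcd-snapshot-restore.sh, and it simply counts the non-empty lines in the lookup result.

```shell
# Hypothetical guard (an assumption, not part of the shipped script):
# succeed and echo the name only when the lookup result is exactly one
# non-empty line; otherwise report the count on stderr and fail.
require_single_name() {
  local result="$1" count
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  count=$(printf '%s\n' "$result" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one etcd member name, got $count" >&2
    return 1
  fi
  printf '%s\n' "$result"
}

# Two names, as a greedy match could return them, are rejected.
if ! require_single_name "etcd-0.example.com
etcd-1.example.com" 2>/dev/null; then
  echo "rejected ambiguous result"
fi
```

Failing early with a message that names the ambiguous lookup would be easier to diagnose than the argument-count error that surfaces later.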

Comment 10 Sam Batschelet 2020-06-20 12:41:00 UTC
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 14 Suresh Kolichala 2020-07-09 13:48:21 UTC
This is a 4.3 bug; it no longer exists in 4.4 and later. Please don't change the target release. The fix is available and being reviewed:


Comment 25 errata-xmlrpc 2020-08-05 10:54:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.3.31 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.
