Bug 1826021
| Summary: | [4.3] etcd-snapshot-restore.sh fails due to "Error: snapshot restore requires exactly one argument" | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Robert Bost <rbost> | |
| Component: | Etcd | Assignee: | Suresh Kolichala <skolicha> | |
| Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.3.z | CC: | amurdaca, sbatsche, skolicha, smilner, xtian | |
| Target Milestone: | --- | |||
| Target Release: | 4.3.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause: The regular expression used to obtain the etcd member name was defined incorrectly.
Consequence: More than one etcd member name is returned. The workaround is to list that member's name first in ETCD_INITIAL_CLUSTER.
Fix: The greedy regular expression was corrected.
Result: Exactly one etcd member name is matched.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1826023 | Environment: | |
| Last Closed: | 2020-08-05 10:54:06 UTC | Type: | Bug | |
| Bug Blocks: | 1826023 | |||
|
Description
Robert Bost
2020-04-20 17:15:37 UTC
```
export ETCD_DNS_NAME="etcd-0.example.com"
export ETCD_INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"

echo "test 1"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=)[^,,\s]*(?==[^=]*${ETCD_DNS_NAME}\b)"

echo "test 2"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=,)${ETCD_DNS_NAME}(?==)"

test 1
etcd-0.example.com
test 2
```

The function takes ETCD_DNS_NAME, verifies that it is listed in ETCD_INITIAL_CLUSTER, and returns the name that matches that record. The format for INITIAL_CLUSTER is <name>=<peer-url>. Given that, you can see that test 1 matches and test 2 does not. But as Suresh said, the issue is not with that function.

> Error: snapshot restore requires exactly one argument

The following example shows how the error could happen. The command `etcdctl snapshot restore` takes a single argument, which is $SNAPSHOT_FILE [1]. Notice the space in the path.

```
SNAPSHOT_FILE="/home/core/assets/backup dir/snapshot.db"
etcdctl snapshot restore $SNAPSHOT_FILE
Error: snapshot restore requires exactly one argument
```

Can you confirm your OCP version?

[1] https://github.com/openshift/machine-config-operator/blob/release-4.3/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml#L186

I will take the same test from c#7 and set ETCD_DNS_NAME=etcd-1.example.com. The command output is shared below and demonstrates the problem that customers have run into when following our docs [1]. The multiple lines of output in "test 1" are passed to the snapshot restore command, resulting in the "snapshot restore requires exactly one argument" error.

Perhaps my assumption that someone would configure ETCD_DNS_NAME and ETCD_INITIAL_CLUSTER in this way is not acceptable. Can you please verify? I will file a documentation bug if needed.
```
export ETCD_DNS_NAME="etcd-1.example.com"
export ETCD_INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"

echo "test 1"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=)[^,,\s]*(?==[^=]*${ETCD_DNS_NAME}\b)"

echo "test 2"
echo ${ETCD_INITIAL_CLUSTER} | grep -oP "(?<=,)${ETCD_DNS_NAME}(?==)"

test 1
etcd-0.example.com
etcd-1.example.com
test 2
etcd-1.example.com
```

References:
[1] https://docs.openshift.com/container-platform/4.3/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state

I'm adding UpcomingSprint because I was occupied with fixing bugs of higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

This is a 4.3 bug; it no longer exists in 4.4 and later. Please don't change the target release.

The fix is available and being reviewed: https://github.com/openshift/machine-config-operator/pull/1913

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.3.31 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3180
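
For reference, here is a minimal sketch of the failure mode discussed in this report, together with one possible stricter lookup. It assumes the member name reaches the `etcdctl snapshot restore` command line through an unquoted expansion, which has not been checked against the script here. `member_name_for` and `count_args` are hypothetical helpers, not functions from openshift-recovery-tools, and the lookup is not the change made in the pull request referenced in this report. The two-line value is the "test 1" output reported for ETCD_DNS_NAME=etcd-1.example.com.

```shell
#!/bin/sh
# Illustrative only: helper names are hypothetical, not from the recovery script.

ETCD_INITIAL_CLUSTER="etcd-0.example.com=https://etcd-0.example.com:2380,etcd-1.example.com=https://etcd-1.example.com:2380,etcd-2.example.com=https://etcd-2.example.com:2380"

# Print the member name whose peer-URL host equals $1 exactly.
# Entries have the form <name>=<peer-url>; IPv6 hosts are not handled.
member_name_for() {
  dns_name=$1
  found=""
  old_ifs=$IFS
  IFS=','
  for entry in $ETCD_INITIAL_CLUSTER; do
    name=${entry%%=*}      # text before the first '='
    host=${entry#*=}       # peer URL
    host=${host#*://}      # drop the scheme
    host=${host%%:*}       # drop the port
    if [ "$host" = "$dns_name" ] && [ -z "$found" ]; then
      found=$name
    fi
  done
  IFS=$old_ifs
  [ -n "$found" ] && printf '%s\n' "$found"
}

# Print how many arguments a command would receive.
count_args() { echo "$#"; }

# Two-line value copied from the "test 1" output in this report.
greedy_output="etcd-0.example.com
etcd-1.example.com"

# Unquoted expansion splits on the newline, adding one extra argument.
count_args snapshot.db --name $greedy_output    # prints 4

# Exact lookup yields a single name, so the count stays at three.
exact_name=$(member_name_for etcd-1.example.com)
count_args snapshot.db --name "$exact_name"     # prints 3
```

Splitting on commas and comparing the host exactly avoids the lookahead spanning several entries, which is what lets the earlier member name match in "test 1". Quoting the expansion on its own would stop the word splitting, but it would still pass a two-line name, so the lookup itself needs to return one value.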