Bug 1758681 - 4.3 control plane recovery failed to add first extra member back in etcd-member-recover script
Keywords:
Status: CLOSED DUPLICATE of bug 1758687
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Suresh Kolichala
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-04 20:08 UTC by Clayton Coleman
Modified: 2019-12-13 12:51 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-13 12:51:06 UTC
Target Upstream Version:
Embargoed:



Description Clayton Coleman 2019-10-04 20:08:04 UTC
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/3

    failed running "sudo -i env SETUP_ETCD_ENVIRONMENT=registry.svc.ci.openshift.org/ocp/4.3-2019-10-04-083724@sha256:acbe70b215217b35b9bb8f481bcb55d6687f36936979f1d6358920b815d5246d KUBE_CLIENT_AGENT=registry.svc.ci.openshift.org/ocp/4.3-2019-10-04-083724@sha256:2dd3dd5070897236699cbe60848536d2f9dc74824213fe1c2d87ed80fbb4744b /bin/bash -x /usr/local/bin/etcd-member-recover.sh 10.0.139.1 \"etcd-member-ip-10-0-135-17.ec2.internal\"": <nil> (exit code 2, stderr + set -o errexit
    + set -o pipefail
    + [[ 0 -ne 0 ]]
    + : registry.svc.ci.openshift.org/ocp/4.3-2019-10-04-083724@sha256:acbe70b215217b35b9bb8f481bcb55d6687f36936979f1d6358920b815d5246d
    + : registry.svc.ci.openshift.org/ocp/4.3-2019-10-04-083724@sha256:2dd3dd5070897236699cbe60848536d2f9dc74824213fe1c2d87ed80fbb4744b
    + '[' 10.0.139.1 == '' ']'
    + '[' etcd-member-ip-10-0-135-17.ec2.internal == '' ']'
    + RECOVERY_SERVER_IP=10.0.139.1
    + ETCD_NAME=etcd-member-ip-10-0-135-17.ec2.internal
    + ASSET_DIR=./assets
    + ASSET_DIR_TMP=./assets/tmp
    + CONFIG_FILE_DIR=/etc/kubernetes
    + MANIFEST_DIR=/etc/kubernetes/manifests
    + RUN_ENV=/run/etcd/environment
    + MANIFEST_STOPPED_DIR=./assets/manifests-stopped
    + ETCD_MANIFEST=/etc/kubernetes/manifests/etcd-member.yaml
    + ETCD_CONFIG=/run/etcd/environment
    + ETCDCTL=./assets/bin/etcdctl
    + ETCD_VERSION=v3.3.10
    + ETCD_DATA_DIR=/var/lib/etcd
    + ETCD_STATIC_RESOURCES=/etc/kubernetes/static-pod-resources/etcd-member
    + SHARED=/usr/local/share/openshift-recovery
    + TEMPLATE=/usr/local/share/openshift-recovery/template/etcd-generate-certs.yaml.template
    + source /usr/local/bin/openshift-recovery-tools
    ++ export ETCDCTL_API=3
    ++ ETCDCTL_API=3
    ++ ETCDCTL_WITH_TLS='./assets/bin/etcdctl --cert ./assets/backup/etcd-client.crt --key ./assets/backup/etcd-client.key --cacert ./assets/backup/etcd-ca-bundle.crt'
    + run
    + init
    + ASSET_BIN=./assets/bin
    + '[' '!' -d ./assets/bin ']'
    + echo 'Creating asset directory ./assets'
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/bin
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/tmp
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/shared
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/backup
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/templates
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/restore
    + for dir in {bin,tmp,shared,backup,templates,restore,manifests}
    + /usr/bin/mkdir -p ./assets/manifests
    + dl_etcdctl
    + GOOGLE_URL=https://storage.googleapis.com/etcd
    + DOWNLOAD_URL=https://storage.googleapis.com/etcd
    + echo 'Downloading etcdctl binary..'
    + curl -s -L https://storage.googleapis.com/etcd/v3.3.10/etcd-v3.3.10-linux-amd64.tar.gz -o ./assets/tmp/etcd-v3.3.10-linux-amd64.tar.gz
    + tar -xzf ./assets/tmp/etcd-v3.3.10-linux-amd64.tar.gz -C ./assets/shared --strip-components=1
    + mv ./assets/shared/etcdctl ./assets/bin
    + rm ./assets/shared/etcd
    + ./assets/bin/etcdctl version
    + backup_manifest
    + '[' -e ./assets/backup/etcd-member.yaml ']'
    + echo 'Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/'
    + cp /etc/kubernetes/manifests/etcd-member.yaml ./assets/backup/
    ++ grep -oP '(?<=discovery-srv=).*[^"]' ./assets/backup/etcd-member.yaml
    + DISCOVERY_DOMAIN=ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + '[' -z ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com ']'
    + validate_environment
    + '[' -f /run/etcd/environment ']'
    ++ dig +noall +answer SRV _etcd-server-ssl._tcp.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    ++ grep -oP '(?<=2380 ).*[^\.]'
    ++ xargs
    + SRV_A_RECORD='etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com etcd-2.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com'
    ++ ip -o addr
    ++ grep -oP '(?<=inet )(\d{1,3}\.?){4}'
    + HOST_IPS='127.0.0.1
    10.0.135.17
    10.129.0.1'
    + '[' -z 'etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com etcd-2.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com' ']'
    + '[' -z '127.0.0.1
    10.0.135.17
    10.129.0.1' ']'
    + for a in ${SRV_A_RECORD[@]}
    + echo 'checking against etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com'
    + for i in ${HOST_IPS[@]}
    ++ dig +short etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + DIG_IP=10.0.142.1
    + '[' -z 10.0.142.1 ']'
    + '[' 10.0.142.1 == 127.0.0.1 ']'
    + for i in ${HOST_IPS[@]}
    ++ dig +short etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + DIG_IP=10.0.142.1
    + '[' -z 10.0.142.1 ']'
    + '[' 10.0.142.1 == 10.0.135.17 ']'
    + for i in ${HOST_IPS[@]}
    ++ dig +short etcd-0.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + DIG_IP=10.0.142.1
    + '[' -z 10.0.142.1 ']'
    + '[' 10.0.142.1 == 10.129.0.1 ']'
    + for a in ${SRV_A_RECORD[@]}
    + echo 'checking against etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com'
    + for i in ${HOST_IPS[@]}
    ++ dig +short etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + DIG_IP=10.0.135.17
    + '[' -z 10.0.135.17 ']'
    + '[' 10.0.135.17 == 127.0.0.1 ']'
    + for i in ${HOST_IPS[@]}
    ++ dig +short etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    + DIG_IP=10.0.135.17
    + '[' -z 10.0.135.17 ']'
    + '[' 10.0.135.17 == 10.0.135.17 ']'
    + echo 'dns name is etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com'
    + cat
    + return 0
    + source /run/etcd/environment
    ++ ETCD_IPV4_ADDRESS=10.0.135.17
    ++ ETCD_DNS_NAME=etcd-1.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com
    ++ ETCD_WILDCARD_DNS_NAME='*.ci-op-17sfbx7t-2770b.origin-ci-int-aws.dev.rhcloud.com'
    + backup_etcd_conf
    + '[' -e ./assets/backup/etcd.conf ']'
    + echo 'Backing up /etc/etcd/etcd.conf to ./assets/backup/'
    + cp /etc/etcd/etcd.conf ./assets/backup/
    + backup_etcd_client_certs
    + echo 'Trying to backup etcd client certs..'
    + '[' -f ./assets/backup/etcd-ca-bundle.crt ']'
    + STATIC_DIRS=($(ls -d "${CONFIG_FILE_DIR}"/static-pod-resources/kube-apiserver-pod-[0-9]*))
    ++ ls -d /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7
    + '[' 0 -ne 0 ']'
    + for APISERVER_POD_DIR in "${STATIC_DIRS[@]}"
    + SECRET_DIR=/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client
    + CONFIGMAP_DIR=/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/configmaps/etcd-serving-ca
    + '[' -f /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/configmaps/etcd-serving-ca/ca-bundle.crt ']'
    + '[' -f /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client/tls.crt ']'
    + '[' -f /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client/tls.key ']'
    + echo 'etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7 backing up to ./assets/backup/'
    + cp /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/configmaps/etcd-serving-ca/ca-bundle.crt ./assets/backup/etcd-ca-bundle.crt
    + cp /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client/tls.crt ./assets/backup/etcd-client.crt
    + cp /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client/tls.key ./assets/backup/etcd-client.key
    + return 0
    + stop_etcd
    + echo 'Stopping etcd..'
    + '[' '!' -d ./assets/manifests-stopped ']'
    + mkdir ./assets/manifests-stopped
    + '[' -e /etc/kubernetes/manifests/etcd-member.yaml ']'
    + mv /etc/kubernetes/manifests/etcd-member.yaml ./assets/manifests-stopped
    + for name in {etcd-member,etcd-metric}
    ++ crictl pods -name etcd-member --state Ready -q
    + '[' '!' -z efa3f0ab9193d15f33f55a67a914c491354773f6c17d20ff9e3e9c1d74648ba6 ']'
    + echo 'Waiting for etcd-member to stop'
    + sleep 10
    ++ crictl pods -name etcd-member --state Ready -q
    + '[' '!' -z '' ']'
    + for name in {etcd-member,etcd-metric}
    ++ crictl pods -name etcd-metric --state Ready -q
    + '[' '!' -z '' ']'
    + backup_data_dir
    + '[' -f ./assets/backup/etcd/member/snap/db ']'
    + '[' '!' -f /var/lib/etcd/member/snap/db ']'
    + echo 'Local etcd snapshot file not found, backup skipped..'
    + backup_certs
    ++ ls '/etc/kubernetes/static-pod-resources/etcd-member/system:etcd-*'
    ++ wc -l
    + COUNT=0
    + true
    ++ ls './assets/backup/system:etcd-*'
    ++ wc -l
    + BACKUP_COUNT=0
    + true
    + '[' 0 -gt 1 ']'
    + '[' 0 -eq 0 ']'
    + echo 'etcd TLS certificates not found, backup skipped..'
    + remove_certs
    ++ ls '/etc/kubernetes/static-pod-resources/etcd-member/system:etcd-*'
    ++ wc -l
    + COUNT=0
    )

It is important to understand whether this was a flake in the test or a genuine failure in the recovery script.

Comment 3 Michal Fojtik 2019-12-13 12:51:06 UTC

*** This bug has been marked as a duplicate of bug 1758687 ***

