Description of problem:
Attempted to back up etcd on an s390x KVM cluster, following the docs at https://docs.openshift.com/container-platform/4.1/backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd. Running etcd-snapshot-backup.sh fails with an exec format error.

Steps to Reproduce:
1. ssh into a master node on the cluster.
2. Run `sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db`
3. Get the error:
```
Downloading etcdctl binary..
/usr/local/bin/openshift-recovery-tools: line 26: ./assets/bin/etcdctl: cannot execute binary file: Exec format error
```

Expected results:
A saved snapshot of etcd.
The MCO generates the snapshot script, which downloads the etcdctl binary from upstream. It hardcodes the amd64 (x86_64) version and is not arch-specific: https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml#L28
Unfortunately for s390x, there is no etcdctl binary available on storage.googleapis.com; only x86_64 and ppc64le versions are available.
s390x needs to be added as a supported release architecture so that the etcd and etcdctl binaries are published for it, similar to ppc64le. I will work on getting that added upstream.
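If the script keeps downloading release tarballs at all, it would at least need to map the machine architecture to the GOARCH name used in the tarball filenames instead of hardcoding amd64. A minimal sketch (the `map_arch` helper name is hypothetical, and it assumes upstream actually publishes a tarball for each architecture, which is exactly what is missing for s390x today):

```shell
#!/bin/bash
# Hypothetical helper: translate `uname -m` output into the GOARCH suffix
# used in etcd release tarball names (etcd-${VERSION}-linux-${ARCH}.tar.gz).
map_arch() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        # ppc64le and s390x already match the GOARCH spelling
        *)       echo "$1" ;;
    esac
}

map_arch "$(uname -m)"
```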
This is an MCO bug - we even ship etcd with the MCO, so the fix should be something like:

```diff
diff --git a/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml b/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
index bd399c07..8e86db51 100644
--- a/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
+++ b/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
@@ -21,15 +21,14 @@ contents:
     # download and test etcdctl from upstream release assets
     dl_etcdctl() {
-    GOOGLE_URL=https://storage.googleapis.com/etcd
-    DOWNLOAD_URL=${GOOGLE_URL}
-
-    echo "Downloading etcdctl binary.."
-    curl -s -L ${DOWNLOAD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -o $ASSET_DIR/tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
-      && tar -xzf $ASSET_DIR/tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C $ASSET_DIR/shared --strip-components=1 \
-      && mv $ASSET_DIR/shared/etcdctl $ASSET_DIR/bin \
-      && rm $ASSET_DIR/shared/etcd \
-      && $ASSET_DIR/bin/etcdctl version
+    local etcdimg="{{.Images.etcdKey}}"
+    podman pull "${etcdimg}"
+    local etcdctr=$(podman create ${etcdimg})
+    local etcdmnt=$(podman mount "${etcdctr}")
+    cp ${etcdmnt}/bin/etcdctl $ASSET_DIR/bin
+    umount "${etcdmnt}"
+    podman rm "${etcdctr}"
+    podman rmi "${etcdimg}"
     }

 #backup etcd client certs
```
Really of course we want to run the *containerized* etcd tools, and in fact the whole recovery stuff should be containerized.
I agree with Colin; the DR scripts are currently being ported to Go and will be part of the cluster-etcd-operator image. The reason we have not grabbed etcdctl from the etcd image is the extra step involved in authenticating podman with the pull secret. But this bug and air-gapped clusters are good reasons to take that approach.
Colin's diff works, except that the `podman rmi ${etcdimg}` is not allowed to delete the image, probably because it is still in use by the deployment. Otherwise, I was able to create the backup. Thanks, Colin!
> The reason we have not grabbed etcdctl from the etcd image has been the extra step involved to authenticate podman with PullSecret.

That's just `podman --authfile=/var/lib/kubelet/config.json`, right?

(And yes, it is confusing that podman and kubelet have separate default auth files.)
(In reply to Colin Walters from comment #9)
> > The reason we have not grabbed etcdctl from the etcd image has been the extra step involved to authenticate podman with PullSecret.
>
> That's just `podman --authfile=/var/lib/kubelet/config.json` right?
>
> (And yes...it is confusing that podman and kubelet have separate default auth files)

That's one of the related questions I had: in the PR I didn't pull the image because it was already present. Is that a correct assumption, or are there cases where the image might need to be pulled?
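Putting the authfile point and the podman-based extraction together, a minimal sketch of what a fixed `dl_etcdctl` could look like. The `ETCD_IMAGE` pullspec below is a placeholder (the real script substitutes the MCO-rendered `{{.Images.etcdKey}}` image), and `DRY_RUN=1` just prints the commands so the sketch is safe to run where podman is unavailable:

```shell
#!/bin/bash
# Sketch: copy etcdctl out of the etcd container image with podman instead
# of downloading an amd64-only tarball. Pullspec and paths are assumptions.
set -euo pipefail

ASSET_DIR="${ASSET_DIR:-./assets}"
AUTHFILE="${AUTHFILE:-/var/lib/kubelet/config.json}"      # kubelet pull secret
ETCD_IMAGE="${ETCD_IMAGE:-quay.io/example/etcd:latest}"   # hypothetical pullspec

dl_etcdctl() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the commands the real function would run.
        cat <<EOF
podman pull --authfile ${AUTHFILE} ${ETCD_IMAGE}
ctr=\$(podman create ${ETCD_IMAGE})
mnt=\$(podman mount \$ctr)
cp \${mnt}/bin/etcdctl ${ASSET_DIR}/bin/
podman umount \$ctr
podman rm \$ctr
EOF
        return 0
    fi
    mkdir -p "${ASSET_DIR}/bin"
    podman pull --authfile "${AUTHFILE}" "${ETCD_IMAGE}"
    local ctr mnt
    # Create a stopped container and mount its filesystem to copy the binary out.
    ctr=$(podman create "${ETCD_IMAGE}")
    mnt=$(podman mount "${ctr}")
    cp "${mnt}/bin/etcdctl" "${ASSET_DIR}/bin/"
    podman umount "${ctr}"
    podman rm "${ctr}"
    "${ASSET_DIR}/bin/etcdctl" version
}

# Default to a dry run so the sketch does nothing destructive by itself.
DRY_RUN="${DRY_RUN:-1}" dl_etcdctl
```

Because the binary comes from the release image already present on the node, this works the same on amd64, ppc64le, and s390x, and needs no network access on air-gapped clusters if the image is already pulled.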
*** Bug 1808496 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581
*** Bug 1809576 has been marked as a duplicate of this bug. ***