Bug 1780396

Summary: Etcd-snapshot-backup exec format error
Product: OpenShift Container Platform
Reporter: Tom Dale <tdale>
Component: Multi-Arch
Assignee: Prashanth Sundararaman <psundara>
Status: CLOSED ERRATA
QA Contact: Barry Donahue <bdonahue>
Severity: low
Docs Contact:
Priority: low
Version: 4.4
CC: aghadge, alklein, cfillekes, chanphil, christian.lapolt, dbenoit, erich, hannsj_uhl, Holger.Wolf, krmoser, nbziouec, psundara, rcgingra, sbatsche, skolicha, tdale, walters
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1797035 1797038
Environment:
Last Closed: 2020-05-04 11:18:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1765215, 1797035    

Description Tom Dale 2019-12-05 21:03:16 UTC
Description of problem:
Attempted to back up etcd on an s390x KVM cluster, following the docs at https://docs.openshift.com/container-platform/4.1/backup_and_restore/backing-up-etcd.html#backing-up-etcd-data_backup-etcd
Running etcd-snapshot-backup.sh fails with an "Exec format error".


Steps to Reproduce:
1. SSH into a master node on the cluster.
2. Run `sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db`
3. Get the error:
```
Downloading etcdctl binary..
/usr/local/bin/openshift-recovery-tools: line 26: ./assets/bin/etcdctl: cannot execute binary file: Exec format error
```

Expected results:
A saved snapshot of etcd
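
An "Exec format error" usually means the binary was built for a different architecture than the node. A minimal sketch, not part of the original report, for confirming that without the `file` utility by reading the ELF header directly. The function names `elf_machine` and `elf_machine_name` are hypothetical; the offsets and machine numbers come from the ELF header layout, and only a few architectures are mapped.

```shell
# Print the ELF e_machine value (decimal) for a file, or fail if it is not ELF.
elf_machine() {
  local file="$1" magic data
  magic="$(od -An -tx1 -N4 "${file}" | tr -d ' \n')"
  if [ "${magic}" != "7f454c46" ]; then
    echo "not an ELF file: ${file}" >&2
    return 1
  fi
  # EI_DATA (byte 5 of e_ident): 1 = little-endian, 2 = big-endian.
  data="$(od -An -tu1 -j5 -N1 "${file}" | tr -d ' \n')"
  # e_machine is the 2-byte field at offset 18.
  set -- $(od -An -tu1 -j18 -N2 "${file}")
  if [ "${data}" = "1" ]; then
    echo $(( $1 + 256 * $2 ))
  else
    echo $(( $1 * 256 + $2 ))
  fi
}

# Translate a few e_machine values to `uname -m` style names.
# Simplification: 22 (EM_S390) and 21 (EM_PPC64) are reported as the
# 64-bit names relevant to this bug.
elf_machine_name() {
  case "$1" in
    62)  echo "x86_64" ;;
    22)  echo "s390x" ;;
    21)  echo "ppc64le" ;;
    183) echo "aarch64" ;;
    *)   echo "unknown($1)" ;;
  esac
}

# Example (path taken from the error message above):
#   elf_machine_name "$(elf_machine ./assets/bin/etcdctl)"; uname -m
```

If the two names printed by the example differ, the downloaded etcdctl does not match the node.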

Comment 1 Prashanth Sundararaman 2020-01-16 19:53:09 UTC
The MCO generates the snapshot script, which downloads the etcdctl binary from upstream. It always downloads the x86 (amd64) build rather than one matching the node's architecture: https://github.com/openshift/machine-config-operator/blob/master/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml#L28
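
For illustration only, a sketch of what an architecture-aware download URL could look like. The helper names `etcd_release_arch` and `etcd_tarball_url` and the version `v3.3.17` are hypothetical; the URL shape is the one the script uses for amd64. It also assumes upstream publishes a tarball for the node's architecture, which is not guaranteed.

```shell
# Map `uname -m` output to the architecture suffix used in the
# upstream etcd release tarball names.
etcd_release_arch() {
  case "$1" in
    x86_64)  echo "amd64" ;;
    aarch64) echo "arm64" ;;
    ppc64le) echo "ppc64le" ;;
    s390x)   echo "s390x" ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the tarball URL in the same shape the recovery script uses.
etcd_tarball_url() {
  local version="$1" machine="$2" arch
  arch="$(etcd_release_arch "${machine}")" || return 1
  echo "https://storage.googleapis.com/etcd/${version}/etcd-${version}-linux-${arch}.tar.gz"
}

# Hypothetical version string, for demonstration only.
etcd_tarball_url "v3.3.17" "$(uname -m)" \
  || echo "no upstream tarball name known for $(uname -m)"
```

The mapping alone only helps where a matching upstream asset actually exists.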

Comment 2 Prashanth Sundararaman 2020-01-16 20:15:55 UTC
Unfortunately, no s390x etcdctl binary is available on storage.googleapis.com; only x86 and ppc64le versions are available.

Comment 3 Prashanth Sundararaman 2020-01-16 20:57:05 UTC
s390x needs to be added as a supported release target so that the etcd and etcdctl binaries are released for it, as they are for ppc64le. I will work on getting that added upstream.

Comment 4 Colin Walters 2020-01-17 00:11:08 UTC
This is an MCO bug - we even ship etcd with the MCO, so the fix should be... hm, something like:

diff --git a/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml b/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
index bd399c07..8e86db51 100644
--- a/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
+++ b/templates/master/00-master/_base/files/usr-local-bin-openshift-recovery-tools-sh.yaml
@@ -21,15 +21,14 @@ contents:
 
     # download and test etcdctl from upstream release assets
     dl_etcdctl() {
-      GOOGLE_URL=https://storage.googleapis.com/etcd
-      DOWNLOAD_URL=${GOOGLE_URL}
-
-      echo "Downloading etcdctl binary.."
-      curl -s -L ${DOWNLOAD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -o $ASSET_DIR/tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz \
-        && tar -xzf $ASSET_DIR/tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C $ASSET_DIR/shared --strip-components=1 \
-        && mv $ASSET_DIR/shared/etcdctl $ASSET_DIR/bin \
-        && rm $ASSET_DIR/shared/etcd \
-        && $ASSET_DIR/bin/etcdctl version
+      local etcdimg="{{.Images.etcdKey}}"
+      podman pull "${etcdimg}"
+      local etcdctr=$(podman create ${etcdimg})
+      local etcdmnt=$(podman mount "${etcdctr}")
+      cp ${etcdmnt}/bin/etcdctl $ASSET_DIR/bin
+      umount "${etcdmnt}"
+      podman rm "${etcdctr}"
+      podman rmi "${etcdimg}"
     }
 
     #backup etcd client certs

Comment 5 Colin Walters 2020-01-17 00:11:41 UTC
Really of course we want to run the *containerized* etcd tools, and in fact the whole recovery stuff should be containerized.

Comment 6 Sam Batschelet 2020-01-17 20:59:55 UTC
I agree with Colin. The DR scripts are currently being ported to Go and will be part of the cluster-etcd-operator image. The reason we have not grabbed etcdctl from the etcd image is the extra step needed to authenticate podman with the PullSecret. But this bug and air-gapped clusters are good reasons to take that approach.

Comment 7 Prashanth Sundararaman 2020-01-27 18:41:40 UTC
Colin's diff works, except that `podman rmi ${etcdimg}` does not allow the image to be deleted, probably because it's part of the deployment. Otherwise, I was able to create the backup. Thanks, Colin!
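
A minimal sketch of the same extraction with the `podman rmi` step left out, so that only the scratch container is removed and the image stays in local storage. This is not the merged fix. The function name `extract_etcdctl` is hypothetical, the `/bin/etcdctl` path inside the image is taken from the diff in Comment 4, and `podman mount` is assumed to be run as root.

```shell
# Copy etcdctl out of an etcd image that is already in local storage.
extract_etcdctl() {
  local etcdimg="$1"   # image reference (filled in by the MCO template in the diff)
  local destdir="$2"   # destination directory, e.g. "$ASSET_DIR/bin"
  local etcdctr etcdmnt rc

  etcdctr="$(podman create "${etcdimg}")" || return 1
  if etcdmnt="$(podman mount "${etcdctr}")"; then
    cp "${etcdmnt}/bin/etcdctl" "${destdir}/"
    rc=$?
    podman umount "${etcdctr}" >/dev/null
  else
    rc=1
  fi
  # Remove only the scratch container. The image is kept, since
  # `podman rmi` was refused in the test above.
  podman rm "${etcdctr}" >/dev/null
  return "${rc}"
}

# Example (hypothetical variable holding the image reference):
#   extract_etcdctl "${ETCD_IMAGE}" "${ASSET_DIR}/bin"
```

The container is removed even when the copy fails, so a failed run does not leave a stale container behind.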

Comment 9 Colin Walters 2020-01-29 22:58:52 UTC
> The reason we have not grabbed etcdctl from the etcd image has been the extra step involved to authenticate podman with PullSecret.

That's just `podman --authfile=/var/lib/kubelet/config.json` right?

(And yes...it is confusing that podman and kubelet have separate default auth files)

Comment 10 Prashanth Sundararaman 2020-01-29 23:44:18 UTC
(In reply to Colin Walters from comment #9)
> > The reason we have not grabbed etcdctl from the etcd image has been the extra step involved to authenticate podman with PullSecret.
> 
> That's just `podman --authfile=/var/lib/kubelet/config.json` right?
> 
> (And yes...it is confusing that podman and kubelet have separate default
> auth files)

That's one of the related questions I had. In the PR I didn't pull the image because it was already present. Is that a correct assumption, or are there cases where the image might need to be pulled?
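
One way to avoid depending on the answer is to pull only when the image is missing, passing the kubelet's auth file to `podman pull`. A hedged sketch: the function name `ensure_image` is hypothetical, the default auth file path is the one mentioned in Comment 9, and whether the image is always present on a master node remains the open question above.

```shell
# Make sure the image is in local storage, pulling it with the
# kubelet's registry credentials only when it is missing.
ensure_image() {
  local img="$1"
  local authfile="${2:-/var/lib/kubelet/config.json}"
  if podman image exists "${img}"; then
    return 0
  fi
  podman pull --authfile "${authfile}" "${img}"
}

# Example (hypothetical variable holding the image reference):
#   ensure_image "${ETCD_IMAGE}"
```

On an air-gapped cluster the pull goes to whatever registry the image reference points at, so the reference has to be one the node can reach.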

Comment 12 Suresh Kolichala 2020-03-10 17:44:44 UTC
*** Bug 1808496 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-05-04 11:18:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

Comment 15 Suresh Kolichala 2020-05-19 12:44:59 UTC
*** Bug 1809576 has been marked as a duplicate of this bug. ***