Bug 1991508

Summary: ppc64le and s390x CI jobs are failing with exec format errors
Product: OpenShift Container Platform Reporter: Stephen Benjamin <stbenjam>
Component: Multi-Arch Assignee: Deep Mistry <dmistry>
Status: CLOSED ERRATA QA Contact: Jeremy Poulin <jpoulin>
Severity: high Docs Contact:
Priority: high    
Version: 4.9 CC: brad.williams, danili, dmistry, sippy
Target Milestone: --- Flags: dmistry: needinfo+
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
job=periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-installer-remote-libvirt-ppc64le=all job=periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-installer-remote-libvirt-s390x=all
Last Closed: 2021-10-18 17:45:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Stephen Benjamin 2021-08-09 10:13:58 UTC
periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-compact-remote-libvirt-ppc64le
periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-installer-remote-libvirt-s390x

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-compact-remote-libvirt-ppc64le

Error:
INFO[2021-08-09T04:56:54Z] standard_init_linux.go:219: exec user process caused: exec format error
INFO[2021-08-09T04:58:17Z] Imported release 4.9.0-0.nightly-2021-08-07-175228 created at 2021-08-07 17:54:17 +0000 UTC with 141 images to tag release:latest
INFO[2021-08-09T04:58:17Z] Ran for 1m36s
ERRO[2021-08-09T04:58:17Z] Some steps failed:
ERRO[2021-08-09T04:58:17Z]
  * could not run steps: step [release:s390x-latest] failed: failed to get CLI image: unable to find the 'cli' image in the provided release image: the pod ci-op-53q5qixg/release-images-s390x-latest-cli failed after 9s (failed containers: release): ContainerFailed one or more containers exited

Comment 1 Deep Mistry 2021-08-09 13:18:52 UTC
Can you point to a run of the periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-e2e-compact-remote-libvirt-ppc64le job that failed with a similar error?

Can you also confirm whether the failing ppc64le job is https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-installer-remote-libvirt-ppc64le ?

Comment 3 brad.williams 2021-08-09 15:38:39 UTC
The *s390x* job that is successful (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiar[…]4.9-ocp-installer-remote-libvirt-s390x/1424421723737952256) is pulling *BOTH* the s390x and ppc64le images...


INFO[2021-08-08T17:26:22Z] Resolved release latest to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-07-175228 
INFO[2021-08-08T17:26:22Z] Resolved release ppc64le-initial to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-155716 
INFO[2021-08-08T17:26:22Z] Resolved release ppc64le-latest to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-172251 
INFO[2021-08-08T17:26:22Z] Resolved release s390x-initial to registry.ci.openshift.org/ocp-s390x/release-s390x:4.9.0-0.nightly-s390x-2021-08-07-155712 
INFO[2021-08-08T17:26:22Z] Resolved release s390x-latest to registry.ci.openshift.org/ocp-s390x/release-s390x:4.9.0-0.nightly-s390x-2021-08-07-172256 
INFO[2021-08-08T17:26:22Z] Resolved release initial to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-06-170119 


The *s390x* jobs that are failing (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiar[…]4.9-ocp-installer-remote-libvirt-s390x/1424595440749252608) are only pulling the *ppc64le* images...

INFO[2021-08-09T04:56:41Z] Resolved release initial to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-06-170119 
INFO[2021-08-09T04:56:41Z] Resolved release latest to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-07-175228 
INFO[2021-08-09T04:56:41Z] Resolved release ppc64le-initial to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-155716 
INFO[2021-08-09T04:56:41Z] Resolved release ppc64le-latest to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-172251 

I checked the release-controller logic and it doesn't appear to have crossed the streams (s390x <-> ppc64le) anywhere.

I have also verified that the *ppc64le* failures are caused by the same issue, except they are pulling the *s390x* images.
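
The difference between the two runs can be checked mechanically. Below is a minimal Python sketch, assuming the ci-operator log is available as text and that a job for a given architecture is expected to resolve both `<arch>-initial` and `<arch>-latest`. The function names are hypothetical and are not part of ci-operator or the release-controller; the sample lines are the ones quoted above from the failing s390x run.

```python
import re

# Matches ci-operator lines of the form quoted in this report:
#   INFO[<timestamp>] Resolved release <name> to <pull spec>
RESOLVED = re.compile(r"Resolved release (\S+) to (\S+)")


def resolved_releases(log_text):
    """Return {release name: pull spec} for every 'Resolved release' line."""
    return dict(RESOLVED.findall(log_text))


def missing_arch_releases(log_text, arch):
    """Return the '<arch>-initial' / '<arch>-latest' names that were NOT resolved.

    Assumption: a job for `arch` should resolve both of these names.
    """
    resolved = resolved_releases(log_text)
    expected = [f"{arch}-initial", f"{arch}-latest"]
    return [name for name in expected if name not in resolved]


# Log lines from the failing s390x run quoted above.
FAILING_S390X_RUN = """\
INFO[2021-08-09T04:56:41Z] Resolved release initial to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-06-170119
INFO[2021-08-09T04:56:41Z] Resolved release latest to registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-07-175228
INFO[2021-08-09T04:56:41Z] Resolved release ppc64le-initial to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-155716
INFO[2021-08-09T04:56:41Z] Resolved release ppc64le-latest to registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.9.0-0.nightly-ppc64le-2021-08-07-172251
"""

if __name__ == "__main__":
    # For the s390x job, both s390x release names are absent from the log.
    print(missing_arch_releases(FAILING_S390X_RUN, "s390x"))
```

Running the same check with `"ppc64le"` against a failing ppc64le log would be one way to confirm the mirrored failure described above.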

Comment 5 Deep Mistry 2021-08-10 13:17:12 UTC
*** Bug 1991629 has been marked as a duplicate of this bug. ***

Comment 7 Dan Li 2021-08-10 14:21:44 UTC
Setting "Blocker-" after chat with Deep

Comment 8 Dan Li 2021-08-10 16:38:20 UTC
Hi Deep, do you think this bug will reach "ON_QA" by the end of this sprint (August 14th)? If not, we might want to add the "reviewed-in-sprint" flag.

Comment 12 errata-xmlrpc 2021-10-18 17:45:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759