Bug 1882767 - No s390x build for community-operator-index:v4.6
Summary: No s390x build for community-operator-index:v4.6
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.6
Hardware: s390x
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Dylan Orzel
QA Contact: Barry Donahue
URL:
Whiteboard: multi-arch
Depends On:
Blocks: ocp-46-z-tracker
 
Reported: 2020-09-25 15:50 UTC by Tom Dale
Modified: 2020-12-16 15:24 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-16 15:24:58 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log file from one running community-operators pod (707.63 KB, text/plain)
2020-09-25 17:40 UTC, Cheryl A Fillekes
no flags

Description Tom Dale 2020-09-25 15:50:01 UTC
Description of problem:
OpenShift server version 4.6.0-0.nightly-s390x-2020-09-25-054206 uses the image registry.redhat.io/redhat/community-operator-index:v4.6. Because only an x86 build of this image exists, the community operator pods fail with an exec format error.

I used `skopeo inspect docker://registry.redhat.io/redhat/community-operator-index:v4.6` to verify that there is no s390x build (nor a ppc64le build) of this image.
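A minimal sketch of the same architecture check, assuming Python 3 and the JSON that `skopeo inspect --raw` prints for a manifest list (Docker schema 2) or OCI image index, where each entry under `manifests` carries a `platform.architecture` field. The script name `check_arches.py` and its helper functions are hypothetical, not part of skopeo or OpenShift.

```python
from __future__ import annotations

import json
import sys


def manifest_architectures(raw_manifest: str) -> set[str]:
    """Return the architectures listed in a manifest list / OCI image index.

    Assumes the JSON printed by `skopeo inspect --raw`. A single-arch image
    has no "manifests" array, so an empty set is returned in that case.
    """
    doc = json.loads(raw_manifest)
    return {
        entry.get("platform", {}).get("architecture", "")
        for entry in doc.get("manifests", [])
    } - {""}


def missing_architectures(raw_manifest: str, required: list[str]) -> list[str]:
    """Return the required architectures that have no entry in the manifest."""
    present = manifest_architectures(raw_manifest)
    return [arch for arch in required if arch not in present]


if __name__ == "__main__":
    # Hypothetical usage:
    #   skopeo inspect --raw docker://IMAGE | python3 check_arches.py s390x ppc64le
    missing = missing_architectures(sys.stdin.read(), sys.argv[1:])
    if missing:
        print("missing architectures: " + ", ".join(missing))
        sys.exit(1)
    print("all requested architectures present")
```

Note that a single-architecture image has no `manifests` array, so this sketch reports every requested architecture as missing; plain `skopeo inspect` (without `--raw`) shows that image's `Architecture` field instead.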

Actual results:
All "community-operators-*" pods are failing in the "openshift-marketplace" namespace.

Expected results:
`oc get pods -n openshift-marketplace` should list every pod as "Running".

Comment 1 Cheryl A Fillekes 2020-09-25 17:40:04 UTC
Created attachment 1716673 [details]
log file from one running community-operators pod

Comment 2 Cheryl A Fillekes 2020-09-25 17:42:30 UTC
[root@ospamgrs3 ~]# oc get pods -n openshift-marketplace
NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-56648cfd98-scwr9    1/1     Running            0          9h
certified-operators-gnh6c               1/1     Running            0          4h7m
community-operators-5c686               0/1     CrashLoopBackOff   117        9h
community-operators-77df6c68b7-jbdgb    1/1     Running            0          9h
community-operators-ddgwg               0/1     CrashLoopBackOff   116        9h
marketplace-operator-784d9f5896-64bt7   1/1     Running            0          9h
redhat-marketplace-5765ff97c-gfp59      1/1     Running            0          9h
redhat-marketplace-5lfs9                1/1     Running            0          3h50m
redhat-operators-58b4d5c978-qmqwd       1/1     Running            0          9h
redhat-operators-vrfln                  1/1     Running            0          9h
[root@ospamgrs3 ~]# oc logs community-operators-ddgwg -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-5c686 -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-77df6c68b7-jbdgb -n openshift-marketplace > community-operators-77df6c68b7-jbdgb.log

(log attached)

Comment 3 Dan Li 2020-10-02 22:12:18 UTC
The error below is also observed on ppc64le as of October 2, 2020:

openshift-marketplace                              community-operators-2dnhq                                    0/1     CrashLoopBackOff   7          16m
openshift-marketplace                              community-operators-qrhpp                                    0/1     CrashLoopBackOff   11         37m

This error was observed on both a 4.6 install and a 4.4.27 install.

Comment 4 Dylan Orzel 2020-10-08 21:45:25 UTC
This appears to be fixed for 4.6 on s390x as of the latest nightlies (tested here with 4.6.0-0.nightly-s390x-2020-10-08-182421):

[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pods -n openshift-marketplace
NAME                                 READY   STATUS    RESTARTS   AGE
certified-operators-gf8c5            1/1     Running   0          45m
community-operators-vnwxt            1/1     Running   0          45m
marketplace-operator-8dd9598-9jcz8   1/1     Running   0          50m
redhat-marketplace-cvq7k             1/1     Running   0          45m
redhat-operators-vl724               1/1     Running   0          45m

[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pod community-operators-vnwxt -n openshift-marketplace -o jsonpath={.spec.containers[0].image}
registry.redhat.io/redhat/community-operator-index:latest


My understanding is that ppc64le should be fixed as well, since the manifest list for this image now works correctly.

Comment 5 Dan Li 2020-10-14 18:11:46 UTC
According to the latest z-stream testing, this error still occurs on the 4.4.28 nightly for ppc64le. It does not occur in the 4.5.z or 4.6.0-RC nightlies.

Link to the test results here: https://docs.google.com/spreadsheets/d/1PuW0zyBg7moLIiXq8tQ0cFyz427NGx8R-cStv5Mt7ok/edit#gid=1433717023

Comment 7 Dan Li 2020-12-02 15:12:41 UTC
Adding "UpcomingSprint" because the team will not get to this bug during this sprint.

Comment 8 Dylan Orzel 2020-12-15 19:35:24 UTC
After looking into the 4.4.28 ppc64le community operator errors, I don't think they are related to this bug. There are no image pull errors or exec format errors that would indicate a missing ppc64le build; in fact, there are no errors in the logs at all. The only apparent problem is this message from the cluster events:

"Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s"

I see the same error intermittently with other operator pods, which leads me to believe it is resource related. This is typical of my ppc64le cluster, which reports quite high API server latency. Perhaps something similar is happening in CI?

Since this is still resolved for 4.5 and 4.6, and the 4.4.28 failure (ppc64le only) does not appear to be an actual bug, I think this can be closed.

