Bug 1882767

Summary:

No s390x build for community-operator-index:v4.6

Product:

OpenShift Container Platform

Reporter:

Tom Dale <tdale>

Component:

Multi-Arch

Assignee:

Dylan Orzel <dorzel>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Barry Donahue <bdonahue>

Severity:

medium

Priority:

medium

Version:

4.6

CC:

cfillekes, chanphil, christian.lapolt, danili, dorzel, Holger.Wolf, krmoser, nbziouec, tdale

Target Release:

4.7.0

Hardware:

s390x

OS:

Linux

Whiteboard:

multi-arch

Last Closed:

2020-12-16 15:24:58 UTC

Type:

Bug

Bug Blocks:

1881153

Attachments:

log file from one running community-operators pod (flags: none)

Description Tom Dale 2020-09-25 15:50:01 UTC

Description of problem:
OpenShift server version 4.6.0-0.nightly-s390x-2020-09-25-054206 uses the image registry.redhat.io/redhat/community-operator-index:v4.6. This results in an exec format error for the community operators, because only an x86 build of the image exists.

I used `skopeo inspect docker://registry.redhat.io/redhat/community-operator-index:v4.6` to verify that there is no s390x build (nor a ppc64le build) for this image.
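The same check can be scripted. The following is a minimal sketch, not part of the original report, assuming the raw manifest was first saved with `skopeo inspect --raw docker://registry.redhat.io/redhat/community-operator-index:v4.6 > manifest.json`. A multi-arch image's raw manifest is a Docker manifest list or OCI image index whose `manifests[].platform.architecture` fields name each build; a single-architecture image has no `manifests` array. The helper names `manifest_architectures` and `missing_architectures` are hypothetical.

```python
from __future__ import annotations

import json


def manifest_architectures(raw_manifest: str) -> set[str]:
    """Return the architectures listed in a manifest list / OCI index.

    A single-image manifest has no "manifests" array, so it yields an
    empty set (its architecture is recorded in the image config instead).
    """
    doc = json.loads(raw_manifest)
    archs = set()
    for entry in doc.get("manifests", []):
        arch = entry.get("platform", {}).get("architecture")
        if arch:
            archs.add(arch)
    return archs


def missing_architectures(raw_manifest: str,
                          wanted: tuple[str, ...] = ("s390x", "ppc64le")) -> list[str]:
    """Return, sorted, the wanted architectures that have no build listed."""
    present = manifest_architectures(raw_manifest)
    return sorted(a for a in wanted if a not in present)
```

For example, `missing_architectures(open("manifest.json").read())` returns an empty list only when both s390x and ppc64le entries are present in the manifest list. A single-image manifest is treated conservatively as providing none of the wanted architectures.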

Actual results:
All "community-operators-*" pods are failing in the "openshift-marketplace" namespace.

Expected results:
`oc get pods -n openshift-marketplace` should show all pods as "Running".

Comment 1 Cheryl A Fillekes 2020-09-25 17:40:04 UTC

Created attachment 1716673
log file from one running community-operators pod

Comment 2 Cheryl A Fillekes 2020-09-25 17:42:30 UTC

[root@ospamgrs3 ~]# oc get pods -n openshift-marketplace
NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-56648cfd98-scwr9    1/1     Running            0          9h
certified-operators-gnh6c               1/1     Running            0          4h7m
community-operators-5c686               0/1     CrashLoopBackOff   117        9h
community-operators-77df6c68b7-jbdgb    1/1     Running            0          9h
community-operators-ddgwg               0/1     CrashLoopBackOff   116        9h
marketplace-operator-784d9f5896-64bt7   1/1     Running            0          9h
redhat-marketplace-5765ff97c-gfp59      1/1     Running            0          9h
redhat-marketplace-5lfs9                1/1     Running            0          3h50m
redhat-operators-58b4d5c978-qmqwd       1/1     Running            0          9h
redhat-operators-vrfln                  1/1     Running            0          9h
[root@ospamgrs3 ~]# oc logs community-operators-ddgwg -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-5c686 -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-77df6c68b7-jbdgb -n openshift-marketplace > community-operators-77df6c68b7-jbdgb.log

(log attached)
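The expected result (every pod in `openshift-marketplace` Running and fully ready) can be checked mechanically against output like the listing above. A minimal sketch, assuming the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); the helper name `unhealthy_pods` is hypothetical.

```python
from __future__ import annotations


def unhealthy_pods(table: str) -> list[tuple[str, str]]:
    """Return (name, status) for pods that are not Running or not fully ready.

    `table` is the plain-text output of `oc get pods`, header row included.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    bad = []
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            bad.append((name, status))
    return bad
```

The table can be captured with `subprocess.run(["oc", "get", "pods", "-n", "openshift-marketplace"], capture_output=True, text=True).stdout`. Checking the READY column as well as STATUS catches containers that are up but failing their readiness checks.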

Comment 3 Dan Li 2020-10-02 22:12:18 UTC

The error below is also observed on ppc64le as of October 2, 2020:

openshift-marketplace                              community-operators-2dnhq                                    0/1     CrashLoopBackOff   7          16m
openshift-marketplace                              community-operators-qrhpp                                    0/1     CrashLoopBackOff   11         37m

This error was observed on a 4.6 install as well as on a 4.4.27 install.

Comment 4 Dylan Orzel 2020-10-08 21:45:25 UTC

This appears to be fixed for 4.6 on s390x as of the latest nightlies (tested here with 4.6.0-0.nightly-s390x-2020-10-08-182421):

[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pods -n openshift-marketplace
NAME                                 READY   STATUS    RESTARTS   AGE
certified-operators-gf8c5            1/1     Running   0          45m
community-operators-vnwxt            1/1     Running   0          45m
marketplace-operator-8dd9598-9jcz8   1/1     Running   0          50m
redhat-marketplace-cvq7k             1/1     Running   0          45m
redhat-operators-vl724               1/1     Running   0          45m

[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pod community-operators-vnwxt -n openshift-marketplace -o jsonpath={.spec.containers[0].image}
registry.redhat.io/redhat/community-operator-index:latest


My understanding is that ppc64le should be fixed as well, since the manifest list for this image is now working correctly.

Comment 5 Dan Li 2020-10-14 18:11:46 UTC

According to the latest z-stream testing, this error still occurs on the 4.4.28 nightly for ppc64le. It is not observed in the 4.5.z and 4.6.0-RC nightlies.

Link to the test results here: https://docs.google.com/spreadsheets/d/1PuW0zyBg7moLIiXq8tQ0cFyz427NGx8R-cStv5Mt7ok/edit#gid=1433717023

Comment 7 Dan Li 2020-12-02 15:12:41 UTC

Adding "UpcomingSprint" because the team will not get to this bug during this sprint.

Comment 8 Dylan Orzel 2020-12-15 19:35:24 UTC

After looking into the 4.4.28 ppc64le community-operators errors, they do not seem to be related to this bug. There are no image pull errors or exec format errors that would indicate a missing ppc64le build. In fact, there are no errors in the logs at all. The only apparent problem is this message from the cluster events:

"Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s"

I'm getting the same error intermittently with other operator pods, which leads me to believe it is resource-related. This is typical of my ppc64le cluster, which reports quite high API server latency. Perhaps something similar is happening in CI?

Since this remains resolved for 4.5 and 4.6, and the 4.4.28 failure (ppc64le only) does not appear to be an actual bug, I think this can be closed.