Bug 1882767
| Summary: | No s390x build for community-operator-index:v4.6 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Tom Dale <tdale> |
| Component: | Multi-Arch | Assignee: | Dylan Orzel <dorzel> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.6 | CC: | cfillekes, chanphil, christian.lapolt, danili, dorzel, Holger.Wolf, krmoser, nbziouec, tdale |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | multi-arch | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-16 15:24:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1881153 | | |
| Attachments: | | | |
Description
Tom Dale
2020-09-25 15:50:01 UTC
Created attachment 1716673
log file from one running community-operators pod
[root@ospamgrs3 ~]# oc get pods -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
certified-operators-56648cfd98-scwr9 1/1 Running 0 9h
certified-operators-gnh6c 1/1 Running 0 4h7m
community-operators-5c686 0/1 CrashLoopBackOff 117 9h
community-operators-77df6c68b7-jbdgb 1/1 Running 0 9h
community-operators-ddgwg 0/1 CrashLoopBackOff 116 9h
marketplace-operator-784d9f5896-64bt7 1/1 Running 0 9h
redhat-marketplace-5765ff97c-gfp59 1/1 Running 0 9h
redhat-marketplace-5lfs9 1/1 Running 0 3h50m
redhat-operators-58b4d5c978-qmqwd 1/1 Running 0 9h
redhat-operators-vrfln 1/1 Running 0 9h
[root@ospamgrs3 ~]# oc logs community-operators-ddgwg -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-5c686 -n openshift-marketplace
standard_init_linux.go:219: exec user process caused: exec format error
[root@ospamgrs3 ~]# oc logs community-operators-77df6c68b7-jbdgb -n openshift-marketplace > community-operators-77df6c68b7-jbdgb.log
(log attached)

This error below is also observed on ppc64le as of October 2nd, 2020
openshift-marketplace community-operators-2dnhq 0/1 CrashLoopBackOff 7 16m
openshift-marketplace community-operators-qrhpp 0/1 CrashLoopBackOff 11 37m
This error was discovered on 4.6 install as well as 4.4.27 install.

This looks to be fixed for 4.6 s390x as of the latest nightlies (4.6.0-0.nightly-s390x-2020-10-08-182421 here)
[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pods -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
certified-operators-gf8c5 1/1 Running 0 45m
community-operators-vnwxt 1/1 Running 0 45m
marketplace-operator-8dd9598-9jcz8 1/1 Running 0 50m
redhat-marketplace-cvq7k 1/1 Running 0 45m
redhat-operators-vl724 1/1 Running 0 45m
[dorzel@rock-kvmlp-1 ocp4-workdir]$ oc get pod community-operators-vnwxt -n openshift-marketplace -o jsonpath={.spec.containers[0].image}
registry.redhat.io/redhat/community-operator-index:latest
It is my understanding that ppc64le should be fixed as well, since the manifest list is now working correctly for this image.
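For context, a manifest list is what lets one tag such as `community-operator-index:latest` resolve to a different image per CPU architecture. If a tag instead points at a single-architecture image, a node of another architecture can still pull it and then fail at start-up, which is one common cause of "exec format error". The following is a minimal Python sketch of checking a manifest list for a given architecture. It assumes the raw manifest list JSON has already been fetched by some other tool; the function names and the sample document are hypothetical, and only the `manifests[].platform.architecture` layout comes from the Docker/OCI manifest list format.

```python
import json
from typing import List


def architectures_in_manifest_list(raw_json: str) -> List[str]:
    """Return the sorted architectures listed in a manifest list document."""
    doc = json.loads(raw_json)
    # A plain single-architecture image manifest has no "manifests" array,
    # so it yields an empty result here.
    entries = doc.get("manifests", [])
    archs = {entry.get("platform", {}).get("architecture") for entry in entries}
    archs.discard(None)  # ignore entries without platform information
    return sorted(archs)


def has_architecture(raw_json: str, arch: str) -> bool:
    """Return True if the manifest list carries an entry for `arch`."""
    return arch in architectures_in_manifest_list(raw_json)


# Hypothetical sample data for illustration only; digests are placeholders,
# not values taken from the real community-operator-index image.
SAMPLE_MANIFEST_LIST = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:placeholder-amd64",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:placeholder-s390x",
         "platform": {"architecture": "s390x", "os": "linux"}},
        {"digest": "sha256:placeholder-ppc64le",
         "platform": {"architecture": "ppc64le", "os": "linux"}},
    ],
})

if __name__ == "__main__":
    print(architectures_in_manifest_list(SAMPLE_MANIFEST_LIST))
    print(has_architecture(SAMPLE_MANIFEST_LIST, "s390x"))
```

A check like this only says whether an entry exists for the architecture; it does not verify that the referenced image actually contains binaries built for it.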
According to the latest z-stream testing, this error still occurs on 4.4.28 nightly for ppc64le. This error is not discovered in the 4.5.z and 4.6.0-RC nightlies. Link to the test results here: https://docs.google.com/spreadsheets/d/1PuW0zyBg7moLIiXq8tQ0cFyz427NGx8R-cStv5Mt7ok/edit#gid=1433717023

Adding "UpcomingSprint" as team will not get to this bug during this sprint

Looking into the 4.4.28 ppc64le community operator errors, it does not seem those are related to this bug. There are no image pull errors or exec format errors that would indicate a missing build for ppc64le. In fact, there are no errors in the logs at all. The only thing that seems to be wrong there is this message from cluster events:
"Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s"
I'm getting the same error intermittently with other operator pods which leads me to believe it is resource related. This is typical of my ppc64le cluster, and it is saying that the API server has quite a high latency. Maybe this is similar to what's going on in CI?

Since this is still resolved for 4.5/4.6, and the 4.4.28 (ppc64le only) seems to not actually be a bug, I think this can be closed.
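The triage reasoning in this thread (an exec format error points to a wrong-architecture image, while a liveness probe timeout with otherwise clean logs points elsewhere, possibly at resource pressure) can be sketched as a small classifier. This is an illustrative Python sketch only: the function name and the category labels are hypothetical, and the matched strings are simply the messages quoted in this bug.

```python
def classify_symptom(message: str) -> str:
    """Map a log or event message to a rough triage category.

    The categories are hypothetical labels for illustration, not an
    official OpenShift taxonomy.
    """
    text = message.lower()
    if "exec format error" in text:
        # The container binary was built for a different CPU architecture.
        return "architecture-mismatch"
    if "liveness probe failed" in text and "timeout" in text:
        # The process started, but did not answer the probe in time.
        return "probe-timeout"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "standard_init_linux.go:219: exec user process caused: exec format error",
        'Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s',
    ]
    for sample in samples:
        print(classify_symptom(sample), "<-", sample)
```

Only the first category would indicate a missing s390x or ppc64le build of the index image; the second matches the 4.4.28 ppc64le observation that was judged unrelated to this bug.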