Bug 1905489
| Summary: | Latest OCP 4.7 on Z builds fail to complete installation as SRO operator does not install | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | krmoser |
| Component: | Special Resource Operator | Assignee: | Andy McCrae <amccrae> |
| Status: | CLOSED NOTABUG | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.7 | CC: | aos-bugs, bbreard, chanphil, christian.lapolt, danili, Holger.Wolf, imcleod, jligon, miabbott, nstielau, rdossant, wvoesch |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-15 14:13:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1903544 | | |
Description
krmoser
2020-12-08 13:32:27 UTC
One of the issues is that the operator hasn't been built multi-arch:

$ oc get pods -A | grep Crash
openshift-sro   special-resource-controller-manager-57f8bff587-lxzhl   0/2   CrashLoopBackOff   14   33m

$ oc logs special-resource-controller-manager-57f8bff587-lxzhl -n openshift-sro --all-containers
standard_init_linux.go:219: exec user process caused: exec format error
standard_init_linux.go:219: exec user process caused: exec format error

This was fixed by Andy McCrae in PR https://github.com/openshift/special-resource-operator/pull/6, but it seems that fix hasn't made its way into the installer image yet. There are other issues with the operator that affect all architectures and are currently being discussed and fixed. Confirmed with the creator that this issue is a blocker+.

@Andy, since you've already been working in this space, I'm going to assign it to you for a first look. Maybe it should be under the Multi-Arch component?

This should be fixed - I'll double check the builds, but we fixed this with:
https://github.com/openshift/special-resource-operator/commit/5f2cb4aff31207dcb82f4e0b9df5bc6700e99165#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557

I'll do some tests just to make sure, but the issue was that the SRO is new and wasn't yet set up to be built multi-arch (it had x86_64/linux hardcoded). That has been fixed and backported, so newer builds should include the fix.

I tried today and it was not fixed yet.
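As an aside, the "exec format error" symptom can be confirmed without waiting for the pod to crash by checking which architectures an image was published for. The following is a rough sketch, assuming skopeo and jq are available on the workstation; the pullspec is a placeholder, not a value taken from this bug:

$ IMAGE=<special-resource-operator pullspec from the release payload>   # placeholder, substitute the real pullspec
$ skopeo inspect --raw docker://$IMAGE | jq '[.manifests[].platform.architecture]'
# For a multi-arch manifest list this prints the architecture list; a plain
# single-arch image has no .manifests field at all, which on its own is a
# strong hint the build is x86_64-only.
$ oc image info --filter-by-os=linux/s390x $IMAGE
# Should fail if the manifest list carries no s390x entry.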
Folks,

1. We tested with the OCP 4.7.0-0.nightly-s390x-2020-12-08-141200 build and it seems that this issue is fixed.

2. The "special-resource-operator" cluster operator is no longer listed when the "oc get co" command is issued, and the cluster AVAILABLE status does become True with the successful installation of the OCP cluster. Here is an example:

NAME      VERSION                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         3m20s   Cluster version is 4.7.0-0.nightly-s390x-2020-12-08-141200

NAME                                          STATUS   ROLES    AGE   VERSION
master-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
master-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
master-2.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
worker-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   20m   v1.19.2+ad738ba
worker-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   20m   v1.19.2+ad738ba

NAME                                       VERSION                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      11m
baremetal                                  4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
cloud-credential                           4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
cluster-autoscaler                         4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
config-operator                            4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
console                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      16m
csi-snapshot-controller                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
dns                                        4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
etcd                                       4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
image-registry                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      21m
ingress                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
insights                                   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      6m10s
kube-apiserver                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
kube-controller-manager                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
kube-scheduler                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      24m
kube-storage-version-migrator              4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      19m
machine-api                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
machine-approver                           4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
machine-config                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
marketplace                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
monitoring                                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
network                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
node-tuning                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
openshift-apiserver                        4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      21m
openshift-controller-manager               4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
openshift-samples                          4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      20m
operator-lifecycle-manager                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      11m
service-ca                                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
storage                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m

3. Just to confirm, the "special-resource-operator" no longer being listed in the "oc get co" command output is as expected?

Thank you,
Kyle

I still see the operator on 4.7.0-0.nightly-s390x-2020-12-08-174134:

$ oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp-s390x/release-s390x:4.7.0-0.nightly-s390x-2020-12-08-174134 | grep special
special-resource-operator   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:db19bf6617957ae94c17426b545ff867f5966d817d191ae5ffcdb1c2ab48890c

$ oc get pods -A | grep -i Crash
openshift-sro   special-resource-controller-manager-79c4bdb869-rxwl8   0/2   CrashLoopBackOff   8   10m

$ oc logs special-resource-controller-manager-79c4bdb869-rxwl8 -n openshift-sro --all-containers
standard_init_linux.go:219: exec user process caused: exec format error
standard_init_linux.go:219: exec user process caused: exec format error

So it is not yet solved for all installer images.

As far as I know, yes, the expected outcome is for this operator to be removed for the time being.

I'll follow up on this. The issue is that the operator pulls in 2 images:

gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
quay.io/openshift-psap/special-resource-operator:conditions

Neither of these is built for additional architectures, so they fail with 'exec format error' - they aren't built as part of the OCP release. It looks like the operator will be removed from the release, so we should be fine for 4.7, but I'll follow up to ensure this gets resolved if it will be included in 4.8+.

Folks,

As a follow-up to comments 6 and 7, our continued OCP 4.7 build testing with the 2 latest public mirror OCP 4.7 builds indicates that this issue is not yet resolved.

1. OCP 4.7 builds 4.7.0-0.nightly-s390x-2020-12-08-174134 and 4.7.0-0.nightly-s390x-2020-12-09-160115 both fail with the previously mentioned "the cluster operator special-resource-operator has not yet successfully rolled out" issue.

Thank you,
Kyle

Hi Andy, following up on this bug. I remember we discussed that this bug is being resolved. Do you know if that is fixed in the latest build?
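When an install stalls on a message like "the cluster operator special-resource-operator has not yet successfully rolled out", the blocking operator can usually be identified from the ClusterVersion conditions and the cluster operator list. This is a generic sketch rather than a command sequence taken from this bug:

$ oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}{"\n"}'
# Names the operator(s) the installer is still waiting on.
$ oc get co | awk 'NR==1 || $3!="True" || $4!="False" || $5!="False"'
# Shows only operators that are not Available=True / Progressing=False / Degraded=False.
$ oc describe co special-resource-operator
# Condition messages here usually point at the failing deployment or pod.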
Sorry for the delay - it has been resolved. There was an issue with removing the SRO from the releases for multi-arch, but since the 4.7.0-0.nightly-s390x-2020-12-09-183623 build the SRO has not been included (and it won't be included in 4.7 for any architecture). I'll close this bug out - there is still some work to do on the SRO (across all architectures). We may not see any further issues, but if we do (in 4.8+) we can address those in a new bz.

Andy
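As a closing note, the payload check used earlier in this bug can confirm the removal on any later nightly. The release tag below is an illustrative placeholder, not a value from this bug:

$ RELEASE=registry.svc.ci.openshift.org/ocp-s390x/release-s390x:<nightly-tag>   # placeholder tag
$ oc adm release info --pullspecs $RELEASE | grep -i special-resource
# No output is expected once the SRO has been dropped from the payload.
$ oc get co special-resource-operator
# On a cluster installed from such a build this should report NotFound.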