Description of problem:

From the 4.7.0-0.nightly-s390x-2020-12-04-114524 OCP 4.7 build to the current 4.7.0-0.nightly-s390x-2020-12-07-232930 build (and most likely beyond), OCP 4.7 installations do not complete because the special-resource-operator (SRO) does not install. This issue has been seen with OCP 4.7 on Z under both z/VM and KVM hypervisors.

Version-Release number of selected component (if applicable):
OCP 4.7

How reproducible:
Easily reproducible with any OCP 4.7 on Z public mirror build from 4.7.0-0.nightly-s390x-2020-12-04-114524 through 4.7.0-0.nightly-s390x-2020-12-07-232930.

Steps to Reproduce:
1. Attempt to install any of the public mirror builds from 4.7.0-0.nightly-s390x-2020-12-04-114524 through 4.7.0-0.nightly-s390x-2020-12-07-232930.

Actual results:
The OCP 4.7 installation never completes.

Expected results:
The OCP 4.7 installation completes successfully.

Additional info:
Here is the output from the "oc get clusterversion", "oc get nodes", and "oc get co" commands more than 2 hours after the install started. An OCP 4.7 install usually completes in 20-25 minutes or less, but here the install never completes because the "special-resource-operator" does not appear to even start installing.

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          128m    Unable to apply 4.7.0-0.nightly-s390x-2020-12-07-232930: the cluster operator special-resource-operator has not yet successfully rolled out

NAME                                          STATUS   ROLES    AGE    VERSION
master-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   127m   v1.19.2+ad738ba
master-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   127m   v1.19.2+ad738ba
master-2.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   127m   v1.19.2+ad738ba
worker-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   119m   v1.19.2+ad738ba
worker-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   119m   v1.19.2+ad738ba

NAME                                       VERSION                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      110m
baremetal                                  4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      126m
cloud-credential                           4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      126m
cluster-autoscaler                         4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
config-operator                            4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      126m
console                                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      115m
csi-snapshot-controller                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
dns                                        4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
etcd                                       4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      124m
image-registry                             4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      119m
ingress                                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      118m
insights                                   4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      105m
kube-apiserver                             4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      122m
kube-controller-manager                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      124m
kube-scheduler                             4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      124m
kube-storage-version-migrator              4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      118m
machine-api                                4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
machine-approver                           4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
machine-config                             4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
marketplace                                4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      124m
monitoring                                 4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      117m
network                                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      117m
node-tuning                                4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
openshift-apiserver                        4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      119m
openshift-controller-manager               4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      123m
openshift-samples                          4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      118m
operator-lifecycle-manager                 4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      125m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      119m
service-ca                                 4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      126m
special-resource-operator
storage                                    4.7.0-0.nightly-s390x-2020-12-07-232930   True        False         False      126m
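For reference, a minimal sketch of the status checks quoted above. The commands are the ones named in the description; only the kubeconfig path is an assumption and should be adjusted to the actual install directory.

$ export KUBECONFIG=<install-dir>/auth/kubeconfig   # <install-dir> is a placeholder for the installer's asset directory
$ oc get clusterversion
$ oc get nodes
$ oc get co
$ oc get co special-resource-operator -o yaml   # inspect the stuck operator directly, if the ClusterOperator object exists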
One of the issues is that the operator hasn't been built multi-arch:

$ oc get pods -A | grep Crash
openshift-sro   special-resource-controller-manager-57f8bff587-lxzhl   0/2   CrashLoopBackOff   14   33m

$ oc logs special-resource-controller-manager-57f8bff587-lxzhl -n openshift-sro --all-containers
standard_init_linux.go:219: exec user process caused: exec format error
standard_init_linux.go:219: exec user process caused: exec format error

This was fixed by Andy McCrae in PR https://github.com/openshift/special-resource-operator/pull/6, but it seems the fix hasn't made its way into the installer image yet. There are other issues with the operator that affect all architectures and are currently being discussed and fixed.
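A quick way to confirm that the "exec format error" above is an architecture mismatch, rather than a crash inside the operator itself, is to inspect the image the release payload points at. This is a sketch: <release-pullspec> and <sro-image-pullspec> are placeholders, and the exact fields printed by `oc image info` can vary by client version.

$ oc adm release info --pullspecs <release-pullspec> | grep special-resource-operator
$ oc image info <sro-image-pullspec> | grep -i arch
# On an affected build the reported architecture is expected to be amd64 rather than s390x.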
Confirmed with the creator that this issue is a blocker+
@Andy since you've already been playing in this space, I'm going to assign it to you for first looks. Maybe it should be under the Multi-Arch component?
This should be fixed - I'll double-check the builds, but we fixed this with: https://github.com/openshift/special-resource-operator/commit/5f2cb4aff31207dcb82f4e0b9df5bc6700e99165#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557

I'll run some tests just to make sure, but the issue was that the SRO is new and wasn't yet set up to be built multi-arch (it had x86_64/linux hardcoded). That has been fixed and backported, so newer builds should include the fix.
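For context on what that kind of fix addresses: the failure mode is a manager binary compiled only for x86_64 being run on s390x. A minimal sketch, assuming a conventional Go operator build; the actual Makefile targets and paths in the SRO repo are not verified here.

# Hardcoded target platform - the binary is built for x86_64 regardless of the build host:
$ GOOS=linux GOARCH=amd64 go build -o bin/manager main.go
# Parameterized target platform - TARGETARCH is typically injected by multi-arch build tooling
# (e.g. docker buildx), so an s390x build produces an s390x binary:
$ GOOS=linux GOARCH=${TARGETARCH:-amd64} go build -o bin/manager main.go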
I tried today and it was not fixed yet.
Folks,

1. We tested with the OCP 4.7.0-0.nightly-s390x-2020-12-08-141200 build and it appears that this issue is fixed.

2. The "special-resource-operator" cluster operator is no longer listed when the "oc get co" command is issued, and the cluster AVAILABLE status does become True with the successful installation of the OCP cluster. Here is an example:

NAME      VERSION                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         3m20s   Cluster version is 4.7.0-0.nightly-s390x-2020-12-08-141200

NAME                                          STATUS   ROLES    AGE   VERSION
master-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
master-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
master-2.pok-96.ocptest.pok.stglabs.ibm.com   Ready    master   28m   v1.19.2+ad738ba
worker-0.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   20m   v1.19.2+ad738ba
worker-1.pok-96.ocptest.pok.stglabs.ibm.com   Ready    worker   20m   v1.19.2+ad738ba

NAME                                       VERSION                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      11m
baremetal                                  4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
cloud-credential                           4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
cluster-autoscaler                         4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
config-operator                            4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
console                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      16m
csi-snapshot-controller                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
dns                                        4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
etcd                                       4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
image-registry                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      21m
ingress                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
insights                                   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      6m10s
kube-apiserver                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
kube-controller-manager                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
kube-scheduler                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      24m
kube-storage-version-migrator              4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      19m
machine-api                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
machine-approver                           4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
machine-config                             4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
marketplace                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
monitoring                                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
network                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      18m
node-tuning                                4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
openshift-apiserver                        4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      21m
openshift-controller-manager               4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      25m
openshift-samples                          4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      20m
operator-lifecycle-manager                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      26m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      11m
service-ca                                 4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m
storage                                    4.7.0-0.nightly-s390x-2020-12-08-141200   True        False         False      27m

3. Just to confirm, is the "special-resource-operator" no longer being listed in the "oc get co" output the expected behavior?

Thank you,
Kyle
I still see the operator on 4.7.0-0.nightly-s390x-2020-12-08-174134:

$ oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp-s390x/release-s390x:4.7.0-0.nightly-s390x-2020-12-08-174134 | grep special
special-resource-operator   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:db19bf6617957ae94c17426b545ff867f5966d817d191ae5ffcdb1c2ab48890c

$ oc get pods -A | grep -i Crash
openshift-sro   special-resource-controller-manager-79c4bdb869-rxwl8   0/2   CrashLoopBackOff   8   10m

$ oc logs special-resource-controller-manager-79c4bdb869-rxwl8 -n openshift-sro --all-containers
standard_init_linux.go:219: exec user process caused: exec format error
standard_init_linux.go:219: exec user process caused: exec format error

So this is not yet solved for all installer images. As far as I know, yes, the expected outcome is for this operator to be removed for the time being.
I'll follow up on this. The issue is that the operator pulls in 2 images:

gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
quay.io/openshift-psap/special-resource-operator:conditions

Neither of these is built for additional architectures, so they fail with 'exec format error'; they aren't built as part of the OCP release. It looks like the SRO will be removed from the release, so we should be fine for 4.7, but I'll follow up to ensure this gets resolved if it is included in 4.8+.
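One way to double-check that (a sketch; it assumes skopeo and jq are available and that the two images are publicly pullable) is to look at whether each image is a manifest list and which architectures it provides:

$ skopeo inspect --raw docker://gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0 | jq '.mediaType, [.manifests[]?.platform.architecture]'
$ skopeo inspect --raw docker://quay.io/openshift-psap/special-resource-operator:conditions | jq '.mediaType, [.manifests[]?.platform.architecture]'
# A single-architecture image returns a plain image manifest (no .manifests array), while a
# multi-arch image returns a manifest list whose platforms would need to include s390x.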
Folks,

As a follow-up to comments 6 and 7, our continued testing with the 2 latest public mirror OCP 4.7 builds indicates that this issue is not yet resolved.

1. OCP 4.7 builds 4.7.0-0.nightly-s390x-2020-12-08-174134 and 4.7.0-0.nightly-s390x-2020-12-09-160115 both fail with the same "the cluster operator special-resource-operator has not yet successfully rolled out" issue described above.

Thank you,
Kyle
Hi Andy, following up on this bug. I remember we discussed that this bug is being resolved. Do you know if that is fixed in the latest build?
Sorry for the delay - this has been resolved. There was an issue with removing the SRO from the multi-arch releases, but since the 4.7.0-0.nightly-s390x-2020-12-09-183623 build the SRO is no longer included (and won't be included in 4.7 for any architecture).

I'll close this bug out. There is still some work to do on the SRO (across all architectures); we may not see any further issues, but if we do (in 4.8+) we can address them in a new bz.

Andy
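For anyone verifying this on their side, the checks below mirror the ones used earlier in this bug (the registry and pullspec follow the CI example quoted above and may need adjusting for your environment); on a fixed build neither command should show the SRO:

$ oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp-s390x/release-s390x:4.7.0-0.nightly-s390x-2020-12-09-183623 | grep special
$ oc get pods -n openshift-sro
# Expected: the grep returns nothing, and the openshift-sro namespace has no pods (or does not exist at all).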