On the latest commit on s390x, the cluster samples operator reports that it is available and has finished progressing. However, the OpenShift installer does not detect that the operator has finished updating, and eventually times out:

```
DEBUG Built from commit 6ed04f65b0f6a1e11f10afe658465ba8195ac459
INFO Waiting up to 30m0s for the cluster at https://api.test.example.com:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Cluster operator openshift-samples is still updating
```

The offending commit is https://github.com/openshift/cluster-samples-operator/pull/187, which we do need overall but which apparently isn't quite right.
```
[root@ocp-z-dev-2-9 ocp4-workdir]# oc logs cluster-samples-operator-66dcb6fddf-npdlc
time="2019-12-05T00:55:29Z" level=info msg="Go Version: go1.11.13"
time="2019-12-05T00:55:29Z" level=info msg="Go OS/Arch: linux/s390x"
time="2019-12-05T00:55:29Z" level=info msg="template client &v1.TemplateV1Client{restClient:(*rest.RESTClient)(0xc0003d0300)}"
time="2019-12-05T00:55:29Z" level=info msg="image client &v1.ImageV1Client{restClient:(*rest.RESTClient)(0xc0003d03c0)}"
time="2019-12-05T00:55:29Z" level=info msg="creating default Config"
time="2019-12-05T00:55:32Z" level=info msg="got already exists error on create default"
time="2019-12-05T00:55:32Z" level=info msg="waiting for informer caches to sync"
time="2019-12-05T00:55:32Z" level=info msg="started events processor"
time="2019-12-05T00:55:32Z" level=info msg="processing secret watch event while in Managed state; deletion event: false"
time="2019-12-05T00:55:32Z" level=info msg="creation/update of credential in openshift namespace recognized"
time="2019-12-05T00:55:32Z" level=info msg="processing secret watch event while in Managed state; deletion event: false"
time="2019-12-05T00:55:32Z" level=info msg="Copying secret pull-secret from the openshift-config namespace into the operator's namespace"
time="2019-12-05T00:55:32Z" level=info msg="management state set to managed"
time="2019-12-05T00:55:32Z" level=info msg="Spec is valid because this operator has not processed a config yet"
time="2019-12-05T00:55:32Z" level=info msg="samples are not installed on non-x86 architectures"
time="2019-12-05T01:05:32Z" level=info msg="processing secret watch event while in Managed state; deletion event: false"
time="2019-12-05T01:05:32Z" level=info msg="Copying secret pull-secret from the openshift-config namespace into the operator's namespace"
time="2019-12-05T01:05:32Z" level=info msg="processing secret watch event while in Managed state; deletion event: false"
time="2019-12-05T01:05:32Z" level=info msg="creation/update of credential in openshift namespace recognized"
time="2019-12-05T01:05:32Z" level=info msg="management state set to managed"
time="2019-12-05T01:05:32Z" level=info msg="Spec is valid because this operator has not processed a config yet"
time="2019-12-05T01:05:32Z" level=info msg="samples are not installed on non-x86 architectures"
```
Can you provide the output of `oc get clusteroperator/openshift-samples -o yaml`?
The YAML was provided in Slack:

```
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-12-05T00:22:00Z"
  generation: 1
  name: openshift-samples
  resourceVersion: "10094"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-samples
  uid: 40e8ab8d-16f5-11ea-868b-0200000c2211
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-12-05T00:22:00Z"
    reason: NonX86Platform
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-12-05T00:22:00Z"
    reason: NonX86Platform
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-12-05T00:22:03Z"
    reason: NonX86Platform
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: samples.operator.openshift.io
    name: cluster
    resource: configs
  - group: ""
    name: openshift-cluster-samples-operator
    resource: namespaces
  - group: ""
    name: openshift
    resource: namespaces
```

https://coreos.slack.com/files/UFHEG5WQ3/FRCGZA431/untitled, as part of this discussion: https://coreos.slack.com/archives/CFFJUNP6C/p1575522976131700

I'm not seeing anything obviously wrong with it, so possibly this is a CVO problem? Or did the samples operator status update to healthy after the failure?
I bet the issue is the missing version. Compare the versions from this random, successful CI job [1]:

```
versions:
- name: operator
  version: 0.0.1-2019-12-05-035621
```

The docs are in [2].

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2753/pull-ci-openshift-installer-master-e2e-aws/8850/artifacts/e2e-aws/must-gather/registry-svc-ci-openshift-org-ci-op-tytlx80s-stable-sha256-c31e3068a603b8d8add473dbaaa5b933323a23dc862cb266855248e7cba5ac99/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
[2]: https://github.com/openshift/cluster-version-operator/blame/98d173e9f8679a7db87877cbdb177bc309dda6a2/docs/user/reconciliation.md#L120
And here is the samples operator saying "yes, expect me to set an 'operator' version" [1].

[1]: https://github.com/openshift/cluster-samples-operator/blob/c8d02cb18cf94dd774c9391292ae1fd27ba32346/manifests/07-clusteroperator.yaml#L7-L9
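To illustrate why a missing `versions` entry can block the install even when every condition looks healthy, here is a simplified Python sketch of the kind of completion check the CVO performs on a ClusterOperator. It assumes the rule described in the linked reconciliation docs: each version the operator's manifest declares must show up in `status.versions`, `Available` must be `True`, and `Degraded` must not be `True`. The function names and the version string `0.0.1-example` are hypothetical; the real CVO is written in Go and its exact logic may differ.

```python
# Simplified, hypothetical model of the CVO "is this ClusterOperator done?"
# check, assuming the rule from the reconciliation docs. Not the CVO's code.


def condition_status(status, cond_type):
    """Return the status string ("True"/"False"/...) of a condition, or None."""
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def operator_done(name, expected_versions, status):
    """Return (done, reason) for a ClusterOperator status dict.

    expected_versions: the versions declared in the operator's manifest.
    status: the live ClusterOperator status.
    """
    reported = {v.get("name"): v.get("version") for v in status.get("versions") or []}
    for want in expected_versions:
        if reported.get(want["name"]) != want["version"]:
            # A missing or stale version entry means "not finished updating",
            # no matter how healthy the conditions look.
            return False, f"Cluster operator {name} is still updating"
    if condition_status(status, "Available") != "True":
        return False, f"Cluster operator {name} is not available"
    if condition_status(status, "Degraded") == "True":
        return False, f"Cluster operator {name} is degraded"
    return True, None


# Hypothetical expected version; the real value comes from the release payload.
EXPECTED = [{"name": "operator", "version": "0.0.1-example"}]

# Conditions as reported on the s390x cluster: healthy, but no "versions".
S390X_STATUS = {
    "conditions": [
        {"type": "Progressing", "status": "False", "reason": "NonX86Platform"},
        {"type": "Degraded", "status": "False", "reason": "NonX86Platform"},
        {"type": "Available", "status": "True", "reason": "NonX86Platform"},
    ],
}

if __name__ == "__main__":
    print(operator_done("openshift-samples", EXPECTED, S390X_STATUS))
    # prints: (False, 'Cluster operator openshift-samples is still updating')
```

Fed the s390x conditions (all healthy, but no `versions`), the sketch reports the operator as still updating, which mirrors the installer's `Cluster operator openshift-samples is still updating` message. Checking versions before conditions is what makes a "healthy but versionless" operator look stuck rather than done.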
Yeah, that would do it. Thanks, Trevor. Hopefully Gabe can fix this in the morning.
The code that sets the version might be [1, 2]. I'm not sure where the multi-arch handling is guarding against reaching it.

[1]: https://github.com/openshift/cluster-samples-operator/blob/9d88c47dc607029e6ea48256697fea837dd0df40/pkg/operatorstatus/operatorstatus.go#L177
[2]: https://github.com/openshift/cluster-samples-operator/blob/9d88c47dc607029e6ea48256697fea837dd0df40/pkg/operatorstatus/operatorstatus.go#L203-L207
Ah, the guard is [1], but [2] does not set a version.

[1]: https://github.com/openshift/cluster-samples-operator/blob/c8d02cb18cf94dd774c9391292ae1fd27ba32346/pkg/operatorstatus/operatorstatus.go#L90-L93
[2]: https://github.com/openshift/cluster-samples-operator/blob/c8d02cb18cf94dd774c9391292ae1fd27ba32346/pkg/operatorstatus/operatorstatus.go#L66-L82
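Assuming the diagnosis here is right, the fix amounts to having the non-x86 short-circuit report the operator version alongside its conditions. The sketch below is hypothetical Python, not the samples operator's Go code: `non_x86_status` and its parameters are illustrative names, and the version string is assumed to be supplied however the real operator obtains its release version.

```python
# Hypothetical sketch (not the operator's Go code) of the ClusterOperator
# status the non-x86 short-circuit should produce.

NON_X86_REASON = "NonX86Platform"


def non_x86_status(operator_version, transition_time):
    """Build status for a platform where samples are intentionally skipped.

    The conditions match what the s390x cluster reported; the versions entry
    is the piece the short-circuit path was missing.
    """
    conditions = [
        {
            "type": cond_type,
            "status": value,
            "reason": NON_X86_REASON,
            "lastTransitionTime": transition_time,
        }
        for cond_type, value in (
            ("Progressing", "False"),
            ("Degraded", "False"),
            ("Available", "True"),
        )
    ]
    return {
        "conditions": conditions,
        # Without this entry, the CVO never sees the operator reach the
        # version declared in its manifest.
        "versions": [{"name": "operator", "version": operator_version}],
    }
```

For example, `non_x86_status("0.0.1-example", "2019-12-05T00:22:00Z")["versions"]` returns `[{'name': 'operator', 'version': '0.0.1-example'}]`. Building the versions list in the same place as the conditions is one way to keep a special-case path from reporting health without also reporting its version.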
Please provide the must-gather output, which contains the logs for the samples operator. This code _should_ be setting the operator version, but if it were failing to do so, we would see errors in the log.
I don't need the must-gather... I believe I know why the version is not getting set in our special-case path for s390. I should have a PR up soon.
Hi Gabe, so per #comment 14 and #comment 15, should this bug be moved to 4.5 and set to assigned status?
No, Wei Sun, we should mark this as verified. What we did in 4.4 was to not attempt to install x86 samples on s390/ppc, where they were doomed to fail. However, as part of this, the samples operator was originally failing to set the version it was at, and thus the install complained. The PR attached to this bug addressed that.

#Comment 14 and #Comment 15 speak to the next step, which is installing samples on s390/ppc that reference images that work on those platforms. Specifically:

1) https://issues.redhat.com/browse/DEVEXP-465 and https://github.com/openshift/cluster-samples-operator/pull/225 will result in samples getting installed.
2) https://issues.redhat.com/browse/MULTIARCH-149 is the work on the multi-arch side to enable testing of those samples in CI, to verify that those imagestreams/images and templates from the non-OpenShift teams are functional.

We will merge 1) once 2) is ready.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581