Note: Filed this on GitHub (see links) but opening here too for internal tracking, as it's blocking ARO from moving to 4.9.

We have the following override in our `ClusterVersion`:

  - group: imageregistry.operator.openshift.io
    kind: Config
    name: cluster
    namespace: ""
    unmanaged: true

This is causing cluster provisioning to fail, because when the operator encounters this manifest...

  $ cat 0000_30_config-operator_01_operator.cr.yaml
  apiVersion: operator.openshift.io/v1
  kind: Config
  metadata:
    name: cluster
    annotations:
      include.release.openshift.io/ibm-cloud-managed: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
      release.openshift.io/create-only: "true"
  spec:
    managementState: Managed

... the getOverrideForManifest function [1] improperly matches it to the "imageregistry.operator.openshift.io" override above, because it disregards the Group in its comparison ("imageregistry.operator.openshift.io" != "operator.openshift.io"). As a result, the cluster-config-operator has no custom resource to act on, and this blocks the cluster-version-operator from ever completing:

  $ oc get clusterversion
  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
  version             False       True          3h18m   Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator

[1] https://github.com/openshift/cluster-version-operator/blob/4c3a08036da8a96175b7c0445de83b58d0ea5515/pkg/cvo/sync_worker.go#L1060-L1071
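The faulty comparison can be sketched as follows. This is a minimal reconstruction, not the actual CVO code; the type and function names here are simplified assumptions for illustration:

```go
package main

import "fmt"

// Manifest and ComponentOverride are simplified stand-ins for the
// CVO's real types (assumption: only the fields relevant to the
// override lookup are modeled here).
type Manifest struct {
	Group, Kind, Name, Namespace string
}

type ComponentOverride struct {
	Group, Kind, Name, Namespace string
	Unmanaged                    bool
}

// buggyMatch mirrors the pre-fix getOverrideForManifest behavior:
// it compares Kind, Name, and Namespace but ignores Group, so an
// override for imageregistry.operator.openshift.io/Config also
// matches the unrelated operator.openshift.io/Config manifest.
func buggyMatch(o ComponentOverride, m Manifest) bool {
	return o.Kind == m.Kind && o.Name == m.Name && o.Namespace == m.Namespace
}

// fixedMatch additionally compares the API group, which is the
// behavior needed to keep the two Config resources distinct.
func fixedMatch(o ComponentOverride, m Manifest) bool {
	return o.Group == m.Group && buggyMatch(o, m)
}

func main() {
	override := ComponentOverride{
		Group: "imageregistry.operator.openshift.io",
		Kind:  "Config", Name: "cluster", Unmanaged: true,
	}
	manifest := Manifest{
		Group: "operator.openshift.io",
		Kind:  "Config", Name: "cluster",
	}
	fmt.Println(buggyMatch(override, manifest)) // true: manifest wrongly treated as unmanaged
	fmt.Println(fixedMatch(override, manifest)) // false: manifest correctly applied
}
```

With the group-ignoring comparison, the operator.openshift.io Config manifest is skipped as "unmanaged," which is why the cluster-config-operator never receives its custom resource.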
> $ oc get clusterversion
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version             False       True          3h18m   Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator

The error looks like a status during initializing or updating, right?

> This is causing cluster provisioning to fail

I think "cluster provisioning" here means installing a cluster? I'm not quite clear on how to install a cluster with overrides set in the ClusterVersion. I don't think it can mean updating a cluster, because setting overrides in the ClusterVersion will block the update.

Hi Matthew Barnes, currently I have no idea how QE can reproduce the issue. Could you give more detailed steps on how to provision such a cluster with spec.overrides set in the ClusterVersion?
> Currently I have no idea how QE can reproduce the issue. Could you give more detailed steps on how to provision such a cluster with spec.overrides set in the ClusterVersion?

The example is for an Azure Red Hat OpenShift (ARO) cluster, which embeds a forked openshift-installer in our custom Azure Resource Provider code, but I think this should be reproducible with the vanilla installer.

I believe the "CVOIgnore" asset in the installer is the entry point. The cluster-version-operator overrides are specified in a "manifests/cvo-overrides.yaml" file:

https://github.com/openshift/installer/blob/f3f56e279b729663e3184a06e38bf27d42d58279/pkg/asset/ignition/bootstrap/cvoignore.go#L21

First run:

  $ bin/openshift-install create manifests --dir assets

and then add the override from comment #0 to "assets/manifests/cvo-overrides.yaml", so the manifest spec would look something like:

  spec:
    channel: stable-4.9
    clusterID: $CLUSTERID
    overrides:
    - group: imageregistry.operator.openshift.io
      kind: Config
      name: cluster
      namespace: ""
      unmanaged: true

Then create the cluster as per usual. During install, once the bootstrap phase is complete, obtain a .kubeconfig and verify this resource is missing:

  $ oc get config.operator cluster

The openshift-config-operator logs will also be filled with this message:

  ConfigOperatorController reconciliation failed: configs.operator.openshift.io "cluster" not found

This will indefinitely block the cluster-version-operator from reaching 100%.
The PR has landed in the oldest available 4.10 nightly build, so I cannot reproduce it on 4.10 now. Instead, with the steps above, I reproduced the bug on 4.9.7.

1. Add overrides to manifests/cvo-overrides.yaml before triggering an installation:

  spec:
    channel: stable-4.9
    clusterID: 9a263f40-6865-475d-919c-705fc7f49f57
    overrides:
    - kind: Config
      group: imageregistry.operator.openshift.io
      name: cluster
      namespace: ""
      unmanaged: true

2. Trigger the installation with the above manifest and check that it fails:

  level=info msg=Waiting up to 40m0s for the cluster at https://api.jliu49.qe.devcluster.openshift.com:6443 to initialize...
  ...
  level=debug msg=Still waiting for the cluster to initialize: Working towards 4.9.7: 733 of 735 done (99% complete), waiting on config-operator

  # ./oc get clusterversion
  NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
  version             False       True          53m     Working towards 4.9.7: 733 of 735 done (99% complete), waiting on config-operator

  # ./oc get config cluster
  Error from server (NotFound): configs.operator.openshift.io "cluster" not found
Verified on 4.10.0-0.nightly-2021-11-14-184249.

1. Add overrides to manifests/cvo-overrides.yaml before triggering an installation:

  spec:
    channel: stable-4.10
    clusterID: 52b6a00c-aae7-422f-9673-5b5629fd23d6
    overrides:
    - group: imageregistry.operator.openshift.io
      kind: Config
      name: cluster
      namespace: ''
      unmanaged: true

2. Trigger the installation with the above manifest and check that it succeeds:

  # ./oc get clusterversion
  NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
  version   4.10.0-0.nightly-2021-11-14-184249   True        False         73m     Cluster version is 4.10.0-0.nightly-2021-11-14-184249

  # ./oc get clusterversion -o json | jq .items[].spec
  {
    "channel": "stable-4.10",
    "clusterID": "52b6a00c-aae7-422f-9673-5b5629fd23d6",
    "overrides": [
      {
        "group": "imageregistry.operator.openshift.io",
        "kind": "Config",
        "name": "cluster",
        "namespace": "",
        "unmanaged": true
      }
    ]
  }

  # ./oc get config cluster
  NAME      AGE
  cluster   96m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056