Bug 2022570

Summary: getOverrideForManifest does not check manifest.GVK.Group
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Cluster Version Operator
Assignee: Matthew Barnes <mbarnes>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Docs Contact:
Priority: high
Version: 4.10
CC: aos-bugs, dramseur, jiajliu, jokerman, nmalik, rogbas, wking
Target Milestone: ---
Keywords: ServiceDeliveryBlocker
Target Release: 4.9.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: ARO
Doc Type: If docs needed, set a value
Last Closed: 2021-11-29 10:53:41 UTC
Bug Depends On: 2022509

Description OpenShift BugZilla Robot 2021-11-12 02:35:26 UTC
+++ This bug was initially created as a clone of Bug #2022509 +++

Note: Filed this on GitHub (see links) but opening here too for internal tracking, as it's blocking ARO from moving to 4.9.

We have the following override in our `ClusterVersion`:

    - group: imageregistry.operator.openshift.io
      kind: Config
      name: cluster
      namespace: ""
      unmanaged: true

This is causing cluster provisioning to fail, because when the operator encounters this manifest...

$ cat 0000_30_config-operator_01_operator.cr.yaml
apiVersion: operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
spec:
  managementState: Managed

... the getOverrideForManifest function [1] is improperly matching it to the above "imageregistry.operator.openshift.io" override because it disregards the Group in its comparison ("imageregistry.operator.openshift.io" != "operator.openshift.io").
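To illustrate the mismatch described above, here is a minimal Go sketch of an override lookup that includes the Group in the comparison. The `override` and `manifest` types and the `findOverride` helper are hypothetical stand-ins invented for this example; they are not the operator's actual types or the code at [1], and the real fix may look different.

```go
package main

import "fmt"

// override mirrors the fields of the ClusterVersion override entry quoted
// above. Hypothetical type for illustration only.
type override struct {
	Group, Kind, Namespace, Name string
	Unmanaged                    bool
}

// manifest is a trimmed-down, hypothetical stand-in for a release manifest,
// carrying only the fields needed for the comparison.
type manifest struct {
	Group, Kind, Namespace, Name string
}

// findOverride returns the first override whose Group, Kind, Namespace and
// Name all equal the manifest's. Leaving Group out of this comparison is the
// behavior the bug describes.
func findOverride(overrides []override, m manifest) (override, bool) {
	for _, o := range overrides {
		if o.Group == m.Group && o.Kind == m.Kind &&
			o.Namespace == m.Namespace && o.Name == m.Name {
			return o, true
		}
	}
	return override{}, false
}

func main() {
	overrides := []override{{
		Group:     "imageregistry.operator.openshift.io",
		Kind:      "Config",
		Name:      "cluster",
		Unmanaged: true,
	}}

	// The config-operator manifest from this report: same Kind and Name,
	// different Group, so it should not match.
	configOperatorCR := manifest{Group: "operator.openshift.io", Kind: "Config", Name: "cluster"}
	_, matched := findOverride(overrides, configOperatorCR)
	fmt.Println("config-operator CR overridden:", matched)

	// A manifest in the overridden group (illustrative) should match.
	imageRegistryCR := manifest{Group: "imageregistry.operator.openshift.io", Kind: "Config", Name: "cluster"}
	_, matched = findOverride(overrides, imageRegistryCR)
	fmt.Println("imageregistry CR overridden:", matched)
}
```

With the Group included, only the manifest in the "imageregistry.operator.openshift.io" group is treated as unmanaged, and the "operator.openshift.io" Config is still applied.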

As a result, the cluster-config-operator has no custom resource to act on and it blocks the cluster-version-operator from ever completing:

$ oc get clusterversion
version             False       True          3h18m   Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator

[1] https://github.com/openshift/cluster-version-operator/blob/4c3a08036da8a96175b7c0445de83b58d0ea5515/pkg/cvo/sync_worker.go#L1060-L1071

Comment 1 liujia 2021-11-22 06:10:24 UTC
Built a release image with openshift/cluster-version-operator#690 and checked with registry.build01.ci.openshift.org/ci-ln-btc11jk/release:latest.

1. Add overrides in manifests/cvo-overrides.yaml before triggering an installation.
    channel: stable-4.9
    clusterID: 62ed702c-99f0-4d08-a298-d7f7ab6ce15b
    overrides:
    - kind: Config
      group: imageregistry.operator.openshift.io
      name: cluster
      namespace: ""
      unmanaged: true

2. Trigger the installation with the above manifest and check that the installation succeeds.
# ./oc get clusterversion
NAME      VERSION                                               AVAILABLE   PROGRESSING   SINCE   STATUS
version   0.0.1-0.test-2021-11-22-034619-ci-ln-btc11jk-latest   True        False         75m     Cluster version is 0.0.1-0.test-2021-11-22-034619-ci-ln-btc11jk-latest

# ./oc get clusterversion -o json|jq .items[].spec
{
  "channel": "stable-4.9",
  "clusterID": "62ed702c-99f0-4d08-a298-d7f7ab6ce15b",
  "overrides": [
    {
      "group": "imageregistry.operator.openshift.io",
      "kind": "Config",
      "name": "cluster",
      "namespace": "",
      "unmanaged": true
    }
  ]
}

# ./oc get config cluster
cluster   102m

Comment 5 liujia 2021-11-25 04:13:37 UTC
# ./oc adm release info registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-11-24-185059 --commits|grep cluster-version
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       3f8522a6535648099b955f150e31b100bc6b23ef

# git log --oneline 3f8522|grep '#690'
3f8522a6 Merge pull request #690 from openshift-cherrypick-robot/cherry-pick-689-to-release-4.9

The PR is included in 4.9.0-0.nightly-2021-11-24-185059. The bug was verified pre-merge (comment #1), but the bot did not move it to "verified" automatically, so I am changing the status manually.

Comment 7 errata-xmlrpc 2021-11-29 10:53:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.9 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.