Bug 2022570 - getOverrideForManifest does not check manifest.GVK.Group
Summary: getOverrideForManifest does not check manifest.GVK.Group
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.9.z
Assignee: Matthew Barnes
QA Contact: liujia
Whiteboard: ARO
Depends On: 2022509
Reported: 2021-11-12 02:35 UTC by OpenShift BugZilla Robot
Modified: 2021-11-30 14:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-11-29 10:53:41 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 690 | 0 | None | open | [release-4.9] Bug 2022570: cvo: Compare manifest group in getOverrideForManifest | 2021-11-15 13:32:14 UTC
Red Hat Product Errata | RHBA-2021:4834 | 0 | None | None | None | 2021-11-29 10:54:14 UTC

Description OpenShift BugZilla Robot 2021-11-12 02:35:26 UTC
+++ This bug was initially created as a clone of Bug #2022509 +++

Note: Filed this on GitHub (see links) but opening here too for internal tracking, as it's blocking ARO from moving to 4.9.

We have the following override in our `ClusterVersion`:

    - group: imageregistry.operator.openshift.io
      kind: Config
      name: cluster
      namespace: ""
      unmanaged: true

This is causing cluster provisioning to fail, because when the operator encounters this manifest...

$ cat 0000_30_config-operator_01_operator.cr.yaml
apiVersion: operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
spec:
  managementState: Managed

... the getOverrideForManifest function [1] is improperly matching it to the above "imageregistry.operator.openshift.io" override because it disregards the Group in its comparison ("imageregistry.operator.openshift.io" != "operator.openshift.io").
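
To illustrate the comparison described above, here is a minimal, self-contained sketch in Go. It is not the operator's actual code: the `override` and `manifest` types and the `findOverride` helper are hypothetical stand-ins, and the namespace check is simplified to strict equality. The point it shows is that including the group in the match keeps an "imageregistry.operator.openshift.io" override from applying to an "operator.openshift.io" manifest.

```go
package main

import "fmt"

// override is a hypothetical stand-in for one ClusterVersion
// spec.overrides entry; only the fields shown in this report are kept.
type override struct {
	Group, Kind, Namespace, Name string
	Unmanaged                    bool
}

// manifest is a hypothetical stand-in holding only the identifying
// fields of a release manifest.
type manifest struct {
	Group, Kind, Namespace, Name string
}

// findOverride returns the first override whose group, kind, namespace
// and name all equal the manifest's. Leaving the group out of this
// comparison is what produces the false match described in this bug.
func findOverride(overrides []override, m manifest) (override, bool) {
	for _, o := range overrides {
		if o.Group == m.Group && o.Kind == m.Kind &&
			o.Namespace == m.Namespace && o.Name == m.Name {
			return o, true
		}
	}
	return override{}, false
}

func main() {
	overrides := []override{{
		Group:     "imageregistry.operator.openshift.io",
		Kind:      "Config",
		Name:      "cluster",
		Unmanaged: true,
	}}

	// Same kind and name, different group: must not match.
	configOperator := manifest{Group: "operator.openshift.io", Kind: "Config", Name: "cluster"}
	// The resource the override was actually written for: must match.
	imageRegistry := manifest{Group: "imageregistry.operator.openshift.io", Kind: "Config", Name: "cluster"}

	_, ok := findOverride(overrides, configOperator)
	fmt.Println("config-operator manifest overridden:", ok)
	_, ok = findOverride(overrides, imageRegistry)
	fmt.Println("image registry manifest overridden:", ok)
}
```

With the group check in place, only the image registry `Config` is treated as unmanaged, so the config-operator custom resource is still created.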

As a result, the cluster-config-operator has no custom resource to act on and it blocks the cluster-version-operator from ever completing:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          3h18m   Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator

[1] https://github.com/openshift/cluster-version-operator/blob/4c3a08036da8a96175b7c0445de83b58d0ea5515/pkg/cvo/sync_worker.go#L1060-L1071

Comment 1 liujia 2021-11-22 06:10:24 UTC
Built a release image with openshift/cluster-version-operator#690, and checked with registry.build01.ci.openshift.org/ci-ln-btc11jk/release:latest.

1. Add overrides in manifests/cvo-overrides.yaml before triggering an installation.
    channel: stable-4.9
    clusterID: 62ed702c-99f0-4d08-a298-d7f7ab6ce15b
    overrides:
    - kind: Config
      group: imageregistry.operator.openshift.io
      name: cluster
      namespace: ""
      unmanaged: true

2. Triggered the installation with the above manifest and checked that the installation succeeded.
# ./oc get clusterversion
NAME      VERSION                                               AVAILABLE   PROGRESSING   SINCE   STATUS
version   0.0.1-0.test-2021-11-22-034619-ci-ln-btc11jk-latest   True        False         75m     Cluster version is 0.0.1-0.test-2021-11-22-034619-ci-ln-btc11jk-latest

# ./oc get clusterversion -o json|jq .items[].spec
{
  "channel": "stable-4.9",
  "clusterID": "62ed702c-99f0-4d08-a298-d7f7ab6ce15b",
  "overrides": [
    {
      "group": "imageregistry.operator.openshift.io",
      "kind": "Config",
      "name": "cluster",
      "namespace": "",
      "unmanaged": true
    }
  ]
}

# ./oc get config cluster
NAME      AGE
cluster   102m

Comment 5 liujia 2021-11-25 04:13:37 UTC
# ./oc adm release info registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-11-24-185059 --commits|grep cluster-version
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       3f8522a6535648099b955f150e31b100bc6b23ef

# git log --oneline 3f8522|grep '#690'
3f8522a6 Merge pull request #690 from openshift-cherrypick-robot/cherry-pick-689-to-release-4.9

The PR was included in 4.9.0-0.nightly-2021-11-24-185059. The bug was already verified pre-merge (comment#1), but the bot did not move it to "verified" automatically, so I am changing the status manually.

Comment 7 errata-xmlrpc 2021-11-29 10:53:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.9 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4834