Bug 2022509
Summary: | getOverrideForManifest does not check manifest.GVK.Group | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Matthew Barnes <mbarnes>
Component: | Cluster Version Operator | Assignee: | Matthew Barnes <mbarnes>
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu>
Severity: | high | Priority: | high
Version: | 4.10 | CC: | aos-bugs, dramseur, jiajliu, nmalik, wking
Target Milestone: | --- | Keywords: | ServiceDeliveryBlocker
Target Release: | 4.10.0 | Doc Type: | Bug Fix
Hardware: | Unspecified | OS: | Unspecified
Doc Text: |
Cause: The cluster-version operator (CVO) previously ignored spec.overrides[].group when deciding whether to override a manifest.
Consequence: An overrides entry could match multiple resources that differed only by group, overriding more resources than the admin intended. An overrides entry with an invalid group was also still considered a match, so admins might have been using invalid group values without noticing.
Fix: The CVO now requires the group to match when applying configured overrides.
Result: The CVO no longer matches multiple manifests with a single override; it matches only the manifest with the correct group. Admins who had been using an invalid group will have to update it to the correct group for their override to continue to match. (A sketch of the corrected matching logic follows the table below.)
|
Story Points: | --- | Last Closed: | 2022-03-10 16:26:48 UTC
Type: | Bug | Bug Blocks: | 2022570
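Functionally, the fix amounts to adding the API group to the override match predicate. Below is a minimal Go sketch of the corrected matching referenced in the Doc Text; the type and field names are illustrative assumptions, not the actual cluster-version-operator source:

```go
package main

import "fmt"

// ComponentOverride mirrors a ClusterVersion spec.overrides[] entry
// (hypothetical shape for illustration, not the actual CVO types).
type ComponentOverride struct {
	Group     string
	Kind      string
	Namespace string
	Name      string
	Unmanaged bool
}

// manifestKey holds the identity fields the CVO compares against.
type manifestKey struct {
	Group     string
	Kind      string
	Namespace string
	Name      string
}

// getOverrideForManifest returns the first override that matches the
// manifest. Before the fix, the Group comparison was missing, so an
// override could match manifests that differed only by API group.
func getOverrideForManifest(overrides []ComponentOverride, m manifestKey) (ComponentOverride, bool) {
	for _, o := range overrides {
		if o.Group == m.Group && // the check this bug added
			o.Kind == m.Kind &&
			o.Namespace == m.Namespace &&
			o.Name == m.Name {
			return o, true
		}
	}
	return ComponentOverride{}, false
}

func main() {
	overrides := []ComponentOverride{{
		Group:     "imageregistry.operator.openshift.io",
		Kind:      "Config",
		Name:      "cluster",
		Unmanaged: true,
	}}

	// Same kind/name, different group: no longer a match.
	_, ok := getOverrideForManifest(overrides, manifestKey{
		Group: "operator.openshift.io", Kind: "Config", Name: "cluster",
	})
	fmt.Println(ok) // false
}
```

With the pre-fix predicate (no Group comparison), the lookup above would have returned true, overriding a Config in a different group than the admin intended.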
Description
Matthew Barnes 2021-11-11 20:17:23 UTC
> $ oc get clusterversion
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version             False       True          3h18m   Working towards 4.9.7: 725 of 735 done (98% complete), waiting on config-operator

The error looks like a status during initializing or updating, right?

> This is causing cluster provisioning to fail

I think "cluster provisioning" here means installing a cluster? I'm not quite clear on how to install a cluster with overrides set in the ClusterVersion. I don't think it would mean updating a cluster, because setting overrides in the ClusterVersion will block updates of the cluster.

Hi Matthew Barnes, currently I have no idea how QE can reproduce the issue. Could you help give more detailed steps on how to provision such a cluster with spec.overrides set in the ClusterVersion?

> Currently I have no idea how QE can reproduce the issue. Could you help give more detailed steps on how to provision such a cluster with spec.overrides set in the ClusterVersion?

The example is for an Azure Red Hat OpenShift (ARO) cluster, which embeds a forked openshift-installer in our custom Azure Resource Provider code. But I think this should be reproducible with the vanilla installer.

The "CVOIgnore" asset in the installer is, I believe, the entry point. The cluster-version-operator overrides are specified in a "manifests/cvo-overrides.yaml" file:

https://github.com/openshift/installer/blob/f3f56e279b729663e3184a06e38bf27d42d58279/pkg/asset/ignition/bootstrap/cvoignore.go#L21

First run "bin/openshift-install create manifests --dir assets" and then add the override from comment #0 to "assets/manifests/cvo-overrides.yaml", so the manifest spec would look something like:

```
spec:
  channel: stable-4.9
  clusterID: $CLUSTERID
  overrides:
  - group: imageregistry.operator.openshift.io
    kind: Config
    name: cluster
    namespace: ""
    unmanaged: true
```

Then create the cluster as per usual. During the install, once the bootstrap phase is complete, obtain a .kubeconfig and verify this resource is missing:

```
$ oc get config.operator cluster
```

Also, the openshift-config-operator logs will be filled with this message:

```
ConfigOperatorController reconciliation failed: configs.operator.openshift.io "cluster" not found
```

This will indefinitely block the cluster-version-operator from reaching 100%.
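For automated setups, the same override can be appended to the generated manifest programmatically. The following is a minimal Go sketch assuming the "assets" layout above; it uses gopkg.in/yaml.v3, and the override struct is a hypothetical shape for illustration, not an installer type:

```go
package main

import (
	"os"

	"gopkg.in/yaml.v3"
)

// override mirrors one spec.overrides[] entry. This is a hypothetical
// shape for illustration; the installer defines its own types.
type override struct {
	Group     string `yaml:"group"`
	Kind      string `yaml:"kind"`
	Name      string `yaml:"name"`
	Namespace string `yaml:"namespace"`
	Unmanaged bool   `yaml:"unmanaged"`
}

func main() {
	path := "assets/manifests/cvo-overrides.yaml"
	raw, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}

	// Decode generically so unrelated fields in the manifest survive.
	var doc map[string]interface{}
	if err := yaml.Unmarshal(raw, &doc); err != nil {
		panic(err)
	}
	spec, ok := doc["spec"].(map[string]interface{})
	if !ok {
		panic("manifest has no spec mapping")
	}

	// Append the image-registry override from the repro steps; note
	// that group is required for the entry to match after the fix.
	existing, _ := spec["overrides"].([]interface{})
	spec["overrides"] = append(existing, override{
		Group:     "imageregistry.operator.openshift.io",
		Kind:      "Config",
		Name:      "cluster",
		Namespace: "",
		Unmanaged: true,
	})

	out, err := yaml.Marshal(doc)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile(path, out, 0o644); err != nil {
		panic(err)
	}
}
```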
The PR has landed in the oldest available 4.10 nightly build, so I cannot reproduce it on 4.10 now. Instead, following the steps above, I reproduced the bug on v4.9.7.

1. Add the overrides to manifests/cvo-overrides.yaml before triggering an installation:

```
spec:
  channel: stable-4.9
  clusterID: 9a263f40-6865-475d-919c-705fc7f49f57
  overrides:
  - kind: Config
    group: imageregistry.operator.openshift.io
    name: cluster
    namespace: ""
    unmanaged: true
```

2. Trigger the installation with the above manifest and check that the installation fails:

```
level=info msg=Waiting up to 40m0s for the cluster at https://api.jliu49.qe.devcluster.openshift.com:6443 to initialize...
...
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.9.7: 733 of 735 done (99% complete), waiting on config-operator

# ./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          53m     Working towards 4.9.7: 733 of 735 done (99% complete), waiting on config-operator

# ./oc get config cluster
Error from server (NotFound): configs.operator.openshift.io "cluster" not found
```

Verified on 4.10.0-0.nightly-2021-11-14-184249.

1. Add the overrides to manifests/cvo-overrides.yaml before triggering an installation:

```
spec:
  channel: stable-4.10
  clusterID: 52b6a00c-aae7-422f-9673-5b5629fd23d6
  overrides:
  - group: imageregistry.operator.openshift.io
    kind: Config
    name: cluster
    namespace: ''
    unmanaged: true
```

2. Trigger the installation with the above manifest and check that the installation succeeds:

```
# ./oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-14-184249   True        False         73m     Cluster version is 4.10.0-0.nightly-2021-11-14-184249

# ./oc get clusterversion -o json | jq .items[].spec
{
  "channel": "stable-4.10",
  "clusterID": "52b6a00c-aae7-422f-9673-5b5629fd23d6",
  "overrides": [
    {
      "group": "imageregistry.operator.openshift.io",
      "kind": "Config",
      "name": "cluster",
      "namespace": "",
      "unmanaged": true
    }
  ]
}

# ./oc get config cluster
NAME      AGE
cluster   96m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056