Bug 1741475

Summary: | OLM doesn't create new role for CLO when upgrading CLO from 4.1 to 4.2. | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang>
Component: | OLM | Assignee: | Evan Cordell <ecordell>
OLM sub component: | OLM | QA Contact: | Qiaoling Tang <qitang>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | bandrade, chuo, jfan, scolange
Version: | 4.2.0 | |
Target Milestone: | --- | |
Target Release: | 4.2.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-16 06:36:02 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Qiaoling Tang
2019-08-15 09:17:00 UTC
OLM failed to create "prometheusrules" in the existing apiGroup "monitoring.coreos.com". Requirement status:

```
Message: namespaced rule:{"verbs":["*"],"apiGroups":["monitoring.coreos.com"],"resources":["servicemonitors","prometheusrules"]}
Status:  NotSatisfied
Version: v1beta1
Group:   rbac.authorization.k8s.io
```

Based on my understanding, the root cause is that OLM failed to create new resources in an already existing API group. From my observation, the ServiceAccount, Role, RoleBinding, and Secret resources created by OLM all carry labels and ownerReferences indicating which CSV owns them, e.g.:

```yaml
labels:
  olm.owner: clusterlogging.4.1.12-201908130938
  olm.owner.kind: ClusterServiceVersion
  olm.owner.namespace: openshift-logging
name: clusterlogging.4.1.12-201908130938-gp5h9
namespace: openshift-logging
ownerReferences:
- apiVersion: operators.coreos.com/v1alpha1
  blockOwnerDeletion: false
  controller: false
  kind: ClusterServiceVersion
  name: clusterlogging.4.1.12-201908130938
  uid: aa96d9d5-bf26-11e9-ba3f-0a1f60e86372
```

When upgrading to another version, OLM would create new resources or update existing resources as needed. So my guess is that the problem here is: when upgrading CLO from 4.1 to 4.2, the permissions in the role are changed, but OLM doesn't create a new role, so the upgrade fails.

I tried adding the correct permissions to the role by hand, and the upgrade could then go on. Since I did not update the labels and ownerReferences, the resources related to CSV clusterlogging.4.1.12-201908130938 were all deleted after the upgrade to 4.2 succeeded.

Could you share the InstallPlans that are generated in the namespace? That will help debug this.

I wrote an additional e2e test to verify this here: https://github.com/operator-framework/operator-lifecycle-manager/pull/998 (our CI is currently blocked on another issue that will be resolved soon, and these should pass)
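As an aside for readers: the unsatisfied namespaced rule quoted in the requirement status above corresponds to a Role rule roughly like the one sketched below. The name is an assumption for illustration only; in practice OLM generates this role from the CSV's install strategy permissions rather than it being created by hand.

```yaml
# Sketch only: a Role carrying the rule from the NotSatisfied requirement above.
# The metadata.name is hypothetical; OLM normally derives the real role from the
# CSV's permissions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: clusterlogging-monitoring-example   # hypothetical name
  namespace: openshift-logging
rules:
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  - prometheusrules
  verbs:
  - "*"
```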
Created attachment 1606813 [details]
install plan
The new InstallPlan is in the Failed state:

```yaml
conditions:
- lastTransitionTime: "2019-08-22T00:57:11Z"
  lastUpdateTime: "2019-08-22T00:57:11Z"
  message: 'error missing existing CRD version(s) in new CRD: clusterloggings.logging.openshift.io:
    not allowing CRD (clusterloggings.logging.openshift.io) update with unincluded
    version {v1 true true nil nil []}'
  reason: InstallComponentFailed
  status: "False"
  type: Installed
phase: Failed
```

From the error, it looks like CLO removed a CRD apiversion in an update. Please see our WIP docs on CRD versioning rules: https://github.com/operator-framework/operator-lifecycle-manager/blob/61d66d74ca8e76cef7692d1cc4cbac7da7b3a87a/Documentation/design/dependency-resolution.md (these will be merged shortly).

The e2e test that I linked above passes with no issues, and tests both amplification and attenuation of permissions between operator upgrades.

Verified with the latest nightly build; the CLO could upgrade successfully.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-22-201424   True        False         2m33s   Cluster version is 4.2.0-0.nightly-2019-08-22-201424

$ oc exec -n openshift-operator-lifecycle-manager olm-operator-69bc98c6ff-kz9bg -- olm --version
OLM version: 0.11.0
git commit: 55d504a1de95e8820d0dcc02b14f6c8d15edff4f

$ oc get csv
NAME                            DISPLAY                  VERSION   REPLACES                                     PHASE
clusterlogging.v4.2.0           Cluster Logging          4.2.0     clusterlogging.4.1.12-201908130938           Succeeded
elasticsearch-operator.v4.2.0   Elasticsearch Operator   4.2.0     elasticsearch-operator.4.1.12-201908130938   Succeeded

$ oc get role
NAME                          AGE
clusterlogging.v4.2.0-wjnw8   36s
log-collector-privileged      4m56s
sharing-config-reader         5m2s

$ oc get rolebindings
NAME                                                         AGE
clusterlogging.v4.2.0-wjnw8-cluster-logging-operator-k4j47   45s
log-collector-privileged-binding                             5m5s
openshift-logging-sharing-config-reader-binding              5m10s
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
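For context on the CRD versioning error reported above: OLM refuses a CRD update that drops a version still present on the cluster, so the updated CRD has to keep listing it. The manifest below is a minimal illustrative sketch under that assumption, not the actual cluster-logging CRD shipped by the operator.

```yaml
# Illustrative sketch, not the real clusterloggings CRD: when OLM applies an
# updated CRD, every version already served on the cluster (here v1, from the
# error message above) must still appear in spec.versions, or the InstallPlan fails.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusterloggings.logging.openshift.io
spec:
  group: logging.openshift.io
  names:
    kind: ClusterLogging
    listKind: ClusterLoggingList
    plural: clusterloggings
    singular: clusterlogging
  scope: Namespaced
  versions:
  - name: v1        # existing version; must remain included in the update
    served: true
    storage: true
```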