Description of problem:
Failed to upgrade cluster-logging-operator from 4.1 to 4.2; OLM doesn't create a new role for CLO:

```
$ oc get csv
NAME                                 DISPLAY                  VERSION               REPLACES                                     PHASE
clusterlogging.4.1.12-201908130938   Cluster Logging          4.1.12-201908130938                                                Replacing
clusterlogging.v4.2.0                Cluster Logging          4.2.0                 clusterlogging.4.1.12-201908130938           Pending
elasticsearch-operator.v4.2.0        Elasticsearch Operator   4.2.0                 elasticsearch-operator.4.1.12-201908130938   Succeeded
```

Status in clusterlogging.v4.2.0:

```
Status:
  Certs Last Updated:  <nil>
  Certs Rotate At:     <nil>
  Conditions:
    Last Transition Time:  2019-08-15T06:40:33Z
    Last Update Time:      2019-08-15T06:40:33Z
    Message:               requirements not yet checked
    Phase:                 Pending
    Reason:                RequirementsUnknown
    Last Transition Time:  2019-08-15T06:40:33Z
    Last Update Time:      2019-08-15T06:40:33Z
    Message:               one or more requirements couldn't be found
    Phase:                 Pending
    Reason:                RequirementsNotMet
    Last Transition Time:  2019-08-15T06:40:33Z
    Last Update Time:      2019-08-15T06:40:33Z
    Message:               one or more requirements couldn't be found
    Phase:                 Pending
    Reason:                RequirementsNotMet
  Requirement Status:
    Message:  namespaced rule:{"verbs":["*"],"apiGroups":["monitoring.coreos.com"],"resources":["servicemonitors","prometheusrules"]}
    Status:   NotSatisfied
    Version:  v1beta1
    Group:    rbac.authorization.k8s.io
    Group:
    Kind:     ServiceAccount
    Message:  Policy rule not satisfied for service account
    Name:     cluster-logging-operator
    Status:   PresentNotSatisfied
    Version:  v1
Events:
  Type    Reason               Age   From                        Message
  ----    ------               ----  ----                        -------
  Normal  RequirementsUnknown  26m   operator-lifecycle-manager  requirements not yet checked
  Normal  RequirementsNotMet   26m   operator-lifecycle-manager  one or more requirements couldn't be found
```

```
$ oc get role
NAME                                       AGE
clusterlogging.4.1.12-201908130938-gp5h9   38m
log-collector-privileged                   37m
sharing-config-reader                      38m

$ oc get role clusterlogging.4.1.12-201908130938-gp5h9 -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2019-08-15T06:34:01Z"
  labels:
    olm.owner: clusterlogging.4.1.12-201908130938
    olm.owner.kind: ClusterServiceVersion
    olm.owner.namespace: openshift-logging
  name: clusterlogging.4.1.12-201908130938-gp5h9
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: clusterlogging.4.1.12-201908130938
    uid: aa96d9d5-bf26-11e9-ba3f-0a1f60e86372
  resourceVersion: "101322"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles/clusterlogging.4.1.12-201908130938-gp5h9
  uid: ab303e8e-bf26-11e9-ba3f-0a1f60e86372
rules:
- apiGroups:
  - logging.openshift.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  - serviceaccounts
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - route.openshift.io
  resources:
  - routes
  - routes/custom-host
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - '*'
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - roles
  - rolebindings
  verbs:
  - '*'
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - '*'
```

Version-Release number of selected component (if applicable):

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-14-235244   True        False         4h23m   Cluster version is 4.2.0-0.nightly-2019-08-14-235244
```

How reproducible:
Always

Steps to Reproduce:
1. Create a new opsrc with registryNamespace set to `aosqe42`, which contains the bundle files for both 4.1 and 4.2.
2. Subscribe to CLO 4.1 on the `preview` channel; the operator is deployed successfully.
3. Manually change the subscription channel to `4.2` to subscribe to the 4.2 CLO.
4. Check the status of the CSV; clusterlogging.v4.2.0 stays in the Pending phase.

Actual results:
Upgrading CLO from 4.1 to 4.2 failed.

Expected results:
The upgrade succeeds.

Additional info:
The CSV files for CLO:
https://raw.githubusercontent.com/openshift/cluster-logging-operator/release-4.2/manifests/4.2/cluster-logging.v4.2.0.clusterserviceversion.yaml
https://raw.githubusercontent.com/openshift/cluster-logging-operator/release-4.1/manifests/4.1/cluster-logging.v4.1.0.clusterserviceversion.yaml
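For reference, the namespaced rule that the 4.2 CSV requires can be reconstructed from the "namespaced rule" requirement message in the CSV status above; the existing 4.1 role only grants `servicemonitors`, not `prometheusrules`:

```yaml
# Sketch of the rule clusterlogging.v4.2.0 requires, reconstructed from the
# requirement message; the 4.1 role satisfies servicemonitors only.
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  - prometheusrules
  verbs:
  - '*'
```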
OLM failed to grant "prometheusrules" in the existing apiGroup "monitoring.coreos.com":

```
Requirement Status:
  Message:  namespaced rule:{"verbs":["*"],"apiGroups":["monitoring.coreos.com"],"resources":["servicemonitors","prometheusrules"]}
  Status:   NotSatisfied
  Version:  v1beta1
  Group:    rbac.authorization.k8s.io
```

Based on my understanding, the root cause is that OLM failed to add new resources to a rule for an already existing API group.
From my observation, the SA, role, rolebindings, and secrets created by OLM all carry labels and ownerReferences indicating which CSV owns them, e.g.:

```
labels:
  olm.owner: clusterlogging.4.1.12-201908130938
  olm.owner.kind: ClusterServiceVersion
  olm.owner.namespace: openshift-logging
name: clusterlogging.4.1.12-201908130938-gp5h9
namespace: openshift-logging
ownerReferences:
- apiVersion: operators.coreos.com/v1alpha1
  blockOwnerDeletion: false
  controller: false
  kind: ClusterServiceVersion
  name: clusterlogging.4.1.12-201908130938
  uid: aa96d9d5-bf26-11e9-ba3f-0a1f60e86372
```

When upgrading to another version, OLM creates new resources or updates existing ones as needed. So my guess is: when upgrading CLO from 4.1 to 4.2, the permissions in the role change, but OLM doesn't create a new role, and the upgrade fails. I tried adding the missing permissions to the existing role, and the upgrade could then proceed. Since I didn't update the labels and ownerReferences, the resources related to CSV clusterlogging.4.1.12-201908130938 were all deleted after the upgrade to 4.2 succeeded.
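The requirement check behind the NotSatisfied status can be illustrated with a small sketch. This is a simplified model, not OLM's actual code: every resource in the CSV's required rule must be covered by some rule of the existing role, so the 4.1 role (which grants only `servicemonitors` in `monitoring.coreos.com`) fails the 4.2 requirement as soon as `prometheusrules` is added:

```python
# Simplified model (not OLM's real implementation) of checking whether an
# existing Role's rules satisfy a CSV's namespaced permission requirement.

def rule_satisfied(required, existing_rules):
    """True if every resource in `required` is covered by some existing rule."""
    for resource in required["resources"]:
        covered = any(
            set(required["apiGroups"]) <= set(rule["apiGroups"])
            and (resource in rule["resources"] or "*" in rule["resources"])
            and (set(required["verbs"]) <= set(rule["verbs"]) or "*" in rule["verbs"])
            for rule in existing_rules
        )
        if not covered:
            return False
    return True

# The 4.1 role only grants servicemonitors in monitoring.coreos.com ...
role_41_rules = [
    {"apiGroups": ["monitoring.coreos.com"],
     "resources": ["servicemonitors"],
     "verbs": ["*"]},
]

# ... but the 4.2 CSV also requires prometheusrules.
required_42 = {
    "apiGroups": ["monitoring.coreos.com"],
    "resources": ["servicemonitors", "prometheusrules"],
    "verbs": ["*"],
}

print(rule_satisfied(required_42, role_41_rules))  # False -> NotSatisfied
```

Since OLM reuses the 4.1-owned role instead of creating a new one, this check keeps failing until the role's rules are amended (which is what the manual workaround above did).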
Could you share the InstallPlans that were generated in the namespace? That will help debug this. I wrote an additional e2e test to verify this here: https://github.com/operator-framework/operator-lifecycle-manager/pull/998 (our CI is currently blocked on another issue that will be resolved soon, and these should pass)
Created attachment 1606813 [details] install plan
The new InstallPlan is in the Failed state:

```
conditions:
- lastTransitionTime: "2019-08-22T00:57:11Z"
  lastUpdateTime: "2019-08-22T00:57:11Z"
  message: 'error missing existing CRD version(s) in new CRD: clusterloggings.logging.openshift.io:
    not allowing CRD (clusterloggings.logging.openshift.io) update with unincluded
    version {v1 true true nil nil []}'
  reason: InstallComponentFailed
  status: "False"
  type: Installed
phase: Failed
```

From the error, it looks like CLO removed a CRD apiversion in an update. Please see our WIP docs on CRD versioning rules: https://github.com/operator-framework/operator-lifecycle-manager/blob/61d66d74ca8e76cef7692d1cc4cbac7da7b3a87a/Documentation/design/dependency-resolution.md (these will be merged shortly). The e2e test that I linked above passes with no issues, and tests both amplification and attenuation of permissions between operator upgrades.
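To sketch what the versioning rule in that error implies (hypothetical fragment, assuming the apiextensions v1beta1 schema in use at the time): the new clusterloggings CRD would have to keep the already-installed `v1` version in its versions list rather than drop it:

```yaml
# Hypothetical sketch: the updated CRD must still include the existing
# v1 version (served/storage per the "{v1 true true ...}" in the error),
# since OLM rejects CRD updates that omit an installed version.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusterloggings.logging.openshift.io
spec:
  group: logging.openshift.io
  versions:
  - name: v1
    served: true
    storage: true
```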
Verified with the latest nightly build; CLO upgrades successfully.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-22-201424   True        False         2m33s   Cluster version is 4.2.0-0.nightly-2019-08-22-201424

$ oc exec -n openshift-operator-lifecycle-manager olm-operator-69bc98c6ff-kz9bg -- olm --version
OLM version: 0.11.0
git commit: 55d504a1de95e8820d0dcc02b14f6c8d15edff4f

$ oc get csv
NAME                            DISPLAY                  VERSION   REPLACES                                     PHASE
clusterlogging.v4.2.0           Cluster Logging          4.2.0     clusterlogging.4.1.12-201908130938           Succeeded
elasticsearch-operator.v4.2.0   Elasticsearch Operator   4.2.0     elasticsearch-operator.4.1.12-201908130938   Succeeded

$ oc get role
NAME                          AGE
clusterlogging.v4.2.0-wjnw8   36s
log-collector-privileged      4m56s
sharing-config-reader         5m2s

$ oc get rolebindings
NAME                                                         AGE
clusterlogging.v4.2.0-wjnw8-cluster-logging-operator-k4j47   45s
log-collector-privileged-binding                             5m5s
openshift-logging-sharing-config-reader-binding              5m10s
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922