Description of problem:
With the OpenShift 4.4.11 release, the AMQ Online CSV fails to become ready. The issue did not occur on OpenShift 4.4.10, against which AMQ Online 1.4.4 was verified. The issue also occurs on OpenShift 4.5.1.

Version-Release number of selected component (if applicable):
OpenShift 4.4.11, 4.5.1

How reproducible:
100%

Steps to Reproduce:
1. Create the subscription:

oc apply -n openshift-operators -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-online
spec:
  channel: stable
  name: amq-online
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: amq-online.1.4.4
  installPlanApproval: Manual
EOF

2. Manually approve the subscription's InstallPlan in the OpenShift Console.
3. Observe the CSV's status:

oc edit csv amq-online.1.4.4 -n openshift-operators

4. The CSV status sticks at "Pending" rather than "Succeeded":

  - dependents:
    - group: rbac.authorization.k8s.io
      kind: PolicyRule
      message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":["iot.enmasse.io"],"resources":["iotprojects"]}
      status: NotSatisfied
      version: v1beta1
    group: ""
    kind: ServiceAccount
    message: Policy rule not satisfied for service account
    name: iot-tenant-service
    status: PresentNotSatisfied

Actual results:
CSV status sticks at "Pending".

Expected results:
CSV status should be reported as "Succeeded".

Additional info:
The issue is the same if the subscription is created in another OpenShift namespace. The issue also affects AMQ Online's most recent release, 1.5.0.
This relates to the second aspect of this case: https://access.redhat.com/support/cases/#/case/02697817
Created attachment 1700854: CSV - 4.4.10 - working
Created attachment 1700855: CSV - 4.4.11 - failing
I've attached two dumps of the CSV showing the difference when AMQ Online 1.4.4 is installed on OCP 4.4.10 (working) and OCP 4.4.11 (failing - sticks in Pending state).
Hey team, can anyone please take a look at this case? The client is not happy.
To quickly help the customer, it should be enough to manually create a ClusterRole and ClusterRoleBinding that grant {"verbs":["get","list","watch"],"apiGroups":["iot.enmasse.io"],"resources":["iotprojects"]} to the iot-tenant-service service account. To debug this further, it would help to see the InstallPlan for the 4.4.11 install.
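A minimal sketch of that workaround, assuming the operator was installed into openshift-operators as in the reproducer. The ClusterRole and ClusterRoleBinding names are hypothetical; the rule itself is copied from the unsatisfied requirement in the CSV status. The manifests shipped in the AMQ Online install bundle remain the authoritative source.

```yaml
# Hypothetical names; the rule matches the NotSatisfied PolicyRule in the CSV status.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: iot-tenant-service-iotprojects-read   # hypothetical name
rules:
  - apiGroups: ["iot.enmasse.io"]
    resources: ["iotprojects"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: iot-tenant-service-iotprojects-read   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: iot-tenant-service-iotprojects-read
subjects:
  - kind: ServiceAccount
    name: iot-tenant-service
    namespace: openshift-operators   # assumption: namespace the CSV was installed into
```

After applying it with `oc apply -f <file>`, `oc auth can-i list iotprojects.iot.enmasse.io --as=system:serviceaccount:openshift-operators:iot-tenant-service` should answer "yes". A ClusterRoleBinding rather than a namespaced RoleBinding is used because the CSV reports the missing permission as a cluster rule.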
Created attachment 1701035: amq-online-1.4.4-installplan on OpenShift 4.4.11
Evan, I can confirm that manually applying the ClusterRole/ClusterRoleBinding for the affected ServiceAccount from AMQ Online's install bundle works around the issue for AMQ Online 1.4.4. The same approach works for AMQ Online 1.5.0, which had two ServiceAccounts exhibiting the PresentNotSatisfied symptom. Can I provide any more artefacts to help you establish the root cause?
Any progress on the root cause? The workaround has been given to the client, but AMQ Online is still broken for all other users who install it from OLM.
I've confirmed that this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1855088

Here's the excerpt from `clusterPermissions` on AMQ's CSV:

{
  "serviceAccountName": "iot-protocol-adapter",
  "rules": [
    {
      "verbs": ["get", "list", "watch"],
      "apiGroups": ["iot.enmasse.io"],
      "resources": ["iotprojects"]
    }
  ]
},
{
  "serviceAccountName": "iot-tenant-service",
  "rules": [
    {
      "verbs": ["get", "list", "watch"],
      "apiGroups": ["iot.enmasse.io"],
      "resources": ["iotprojects"]
    }
  ]
}

This triggers the linked bug because both ClusterRoles have exactly the same set of permissions. Any cluster that has hit this issue can be fixed by manually creating the missing ClusterRoles.

We can work around this issue in the AMQ manifests by adding an extra, dummy permission rule to any ClusterRoles with duplicated rules (perhaps access to a specific resource that the service account should already have access to?). The fix in OLM will come via 1855088.

*** This bug has been marked as a duplicate of bug 1855088 ***
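A sketch of the dummy-rule workaround suggested in the previous comment, written as the YAML form of the `clusterPermissions` excerpt (under `spec.install.spec` in the CSV). The extra rule, the configmap resource, and the name `iot-tenant-service-dummy` are hypothetical placeholders; the only goal is that the two service accounts no longer have identical rule sets, which is the condition that triggers bug 1855088. A real manifest change should pick a resource the service account already legitimately uses.

```yaml
# Sketch of spec.install.spec.clusterPermissions in the AMQ Online CSV.
clusterPermissions:
  - serviceAccountName: iot-protocol-adapter
    rules:
      - apiGroups: ["iot.enmasse.io"]
        resources: ["iotprojects"]
        verbs: ["get", "list", "watch"]
  - serviceAccountName: iot-tenant-service
    rules:
      - apiGroups: ["iot.enmasse.io"]
        resources: ["iotprojects"]
        verbs: ["get", "list", "watch"]
      # Hypothetical dummy rule: makes this rule set differ from
      # iot-protocol-adapter's. resourceNames limits it to a single object.
      - apiGroups: [""]
        resources: ["configmaps"]
        resourceNames: ["iot-tenant-service-dummy"]   # hypothetical name
        verbs: ["get"]
```

Restricting the dummy rule with `resourceNames` keeps the extra grant as narrow as possible while still making the rule sets distinct.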
Thanks Evan for the explanation.