Bug 1856413 - AMQ Online CSV sticks at Pending since OpenShift 4.4.11
Summary: AMQ Online CSV sticks at Pending since OpenShift 4.4.11
Keywords:
Status: CLOSED DUPLICATE of bug 1855088
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-13 14:51 UTC by Keith Wall
Modified: 2020-07-17 07:34 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-16 15:21:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
CSV - 4.4.10 -working (87.36 KB, text/plain)
2020-07-13 15:00 UTC, Keith Wall
CSV - 4.4.11 -failing (86.56 KB, text/plain)
2020-07-13 15:01 UTC, Keith Wall
amq-online-1.4.4-installplan on OpenShift 4.4.11 (129.46 KB, text/plain)
2020-07-14 13:50 UTC, Keith Wall

Description Keith Wall 2020-07-13 14:51:22 UTC
Description of problem:

With OpenShift 4.4.11, the AMQ Online CSV fails to become ready. The issue did not occur on OpenShift 4.4.10, against which AMQ Online 1.4.4 was verified.

The issue also occurs on OpenShift 4.5.1.


Version-Release number of selected component (if applicable):

4.4.11
4.5.1


How reproducible:

100%


Steps to Reproduce:

1. oc apply -n openshift-operators -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-online
spec:
  channel: stable
  name: amq-online
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: amq-online.1.4.4
  installPlanApproval: Manual
EOF
2. Manually approve the Subscription's InstallPlan in the OpenShift Console.
3. Observe the CSV's status: oc get csv amq-online.1.4.4 -n openshift-operators -o yaml
4. CSV Status sticks at "Pending" rather than "Succeeded"; its status.requirementStatus contains the following entry:

- dependents:
  - group: rbac.authorization.k8s.io
    kind: PolicyRule
    message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":["iot.enmasse.io"],"resources":["iotprojects"]}
    status: NotSatisfied
    version: v1beta1
  group: ""
  kind: ServiceAccount
  message: Policy rule not satisfied for service account
  name: iot-tenant-service
  status: PresentNotSatisfied


Actual results:

CSV Status sticks at "Pending"

Expected results:

CSV Status ought to be reported as "Succeeded"

Additional info:

The issue also occurs when the Subscription is created in a different OpenShift namespace.

The issue also affects AMQ Online's most recent release, 1.5.0.

Comment 1 Keith Wall 2020-07-13 14:54:06 UTC
This relates to the second aspect of this case: https://access.redhat.com/support/cases/#/case/02697817

Comment 2 Keith Wall 2020-07-13 15:00:32 UTC
Created attachment 1700854
CSV - 4.4.10 -working

Comment 3 Keith Wall 2020-07-13 15:01:22 UTC
Created attachment 1700855
CSV - 4.4.11 -failing

Comment 4 Keith Wall 2020-07-13 15:02:23 UTC
I've attached two dumps of the CSV showing the difference between AMQ Online 1.4.4 installed on OCP 4.4.10 (working) and on OCP 4.4.11 (failing; sticks in the Pending state).

Comment 6 Raif Ahmed 2020-07-14 08:32:34 UTC
Hey team,

Could someone please take a look at the case? The client is not happy.

Comment 7 Evan Cordell 2020-07-14 12:57:54 UTC
To quickly help the customer, it should be enough to manually create a ClusterRole/ClusterRoleBinding that grants {"verbs":["get","list","watch"],"apiGroups":["iot.enmasse.io"],"resources":["iotprojects"]} to the iot-tenant-service service account.
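A minimal sketch of this workaround as manifests that could be applied with `oc apply -f`. The rule is copied from the NotSatisfied requirement in the CSV status. The object name `iot-tenant-service-iotprojects` is hypothetical, and the ServiceAccount namespace assumes the operator was installed into `openshift-operators` as in the reproduction steps. Because the requirement is a cluster rule, the sketch binds the ClusterRole with a ClusterRoleBinding rather than a namespaced RoleBinding.

```yaml
# Grants the permission OLM reports as NotSatisfied for iot-tenant-service.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: iot-tenant-service-iotprojects   # hypothetical name
rules:
  - apiGroups: ["iot.enmasse.io"]
    resources: ["iotprojects"]
    verbs: ["get", "list", "watch"]
---
# Binds the ClusterRole above to the affected ServiceAccount cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: iot-tenant-service-iotprojects   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: iot-tenant-service-iotprojects
subjects:
  - kind: ServiceAccount
    name: iot-tenant-service
    namespace: openshift-operators       # assumption: the Subscription's namespace
```

For AMQ Online 1.5.0, the same pattern would need repeating for each ServiceAccount reported as PresentNotSatisfied.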

To debug this further, it would help to see the InstallPlan for the 4.4.11 install.

Comment 8 Keith Wall 2020-07-14 13:50:22 UTC
Created attachment 1701035
amq-online-1.4.4-installplan on OpenShift 4.4.11

Comment 9 Keith Wall 2020-07-14 15:11:24 UTC
Evan, I can confirm that manually applying the ClusterRole/ClusterRoleBinding for the affected ServiceAccount from AMQ Online's install bundle works around the issue for AMQ Online 1.4.4. The same approach works for AMQ Online 1.5.0, which had two ServiceAccounts exhibiting the PresentNotSatisfied symptom.
Can I provide any more artefacts to help you establish the root cause?

Comment 10 Keith Wall 2020-07-15 07:41:31 UTC
Any progress on the root cause? The workaround has been given to the client, but AMQ Online remains broken for all other users who install it through OLM.

Comment 11 Evan Cordell 2020-07-16 15:21:28 UTC
I've confirmed that this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1855088


Here's the relevant excerpt from `clusterPermissions` in AMQ Online's CSV:

          {
            "serviceAccountName": "iot-protocol-adapter",
            "rules": [
              {
                "verbs": [
                  "get",
                  "list",
                  "watch"
                ],
                "apiGroups": [
                  "iot.enmasse.io"
                ],
                "resources": [
                  "iotprojects"
                ]
              }
            ]
          },
          {
            "serviceAccountName": "iot-tenant-service",
            "rules": [
              {
                "verbs": [
                  "get",
                  "list",
                  "watch"
                ],
                "apiGroups": [
                  "iot.enmasse.io"
                ],
                "resources": [
                  "iotprojects"
                ]
              }
            ]
          }

This triggers the linked bug because both ClusterRoles have the same set of permissions.


Any cluster that has hit this issue can be fixed by manually creating the missing ClusterRoles.

We can work around this issue in the AMQ Online manifests by adding an extra, dummy permission rule to any ClusterRole with duplicated rules (perhaps access to a specific resource that the service account should already have access to).
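A sketch of that manifest-side workaround, shown in the YAML form of the CSV's `spec.install.spec.clusterPermissions`. The extra rule granting `get` on `configmaps` is a hypothetical placeholder; in practice it should be a permission the service account already needs, so that its only effect is to make the two rule sets no longer identical.

```yaml
# Excerpt of a ClusterServiceVersion; other fields omitted.
spec:
  install:
    strategy: deployment
    spec:
      clusterPermissions:
        - serviceAccountName: iot-protocol-adapter
          rules:
            - apiGroups: ["iot.enmasse.io"]
              resources: ["iotprojects"]
              verbs: ["get", "list", "watch"]
        - serviceAccountName: iot-tenant-service
          rules:
            - apiGroups: ["iot.enmasse.io"]
              resources: ["iotprojects"]
              verbs: ["get", "list", "watch"]
            # Hypothetical extra rule: it only makes this rule set differ
            # from iot-protocol-adapter's. Choose a permission the service
            # account already requires.
            - apiGroups: [""]
              resources: ["configmaps"]
              verbs: ["get"]
```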

The fix in OLM will come via bug 1855088.

*** This bug has been marked as a duplicate of bug 1855088 ***

Comment 12 Keith Wall 2020-07-17 07:34:22 UTC
Thanks Evan for the explanation.

