Bug 1981832 - OLM fails with 'ResolutionFailed' found multiple channel heads
Summary: OLM fails with 'ResolutionFailed' found multiple channel heads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Kevin Rizza
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1983010 1986248
Depends On:
Blocks: 1982294
 
Reported: 2021-07-13 13:56 UTC by kliberti
Modified: 2023-09-18 00:28 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1982294
Environment:
Last Closed: 2021-10-18 17:39:52 UTC
Target Upstream Version:
Embargoed:


Attachments
Warning logs (46.14 KB, text/plain), 2021-07-13 13:56 UTC, kliberti


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 2154 | 0 | None | closed | Deprecated inner channel entries cause resolution error. | 2021-07-20 14:19:38 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:53:09 UTC

Description kliberti 2021-07-13 13:56:25 UTC
Created attachment 1801144 [details]
Warning logs

Description of problem:

When installing the AMQ Streams operator via OperatorHub, the install hangs and never completes. The following error is logged:
```
I0710 01:13:39.895527       1 event.go:282] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"openshift-operators", UID:"e0ddaf02-3c94-4c2a-b274-4664d9a75ed1", APIVersion:"v1", ResourceVersion:"1941", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' found multiple channel heads: [amqstreams.v1.7.2 amqstreams.v1.6.2], please check the `replaces`/`skipRange` fields of the operator bundles
```
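
For context, a channel head is the entry in a channel's upgrade graph that no other entry replaces. The following is a minimal Python sketch of that idea, not OLM's actual implementation: the `channel_heads` helper and both example `replaces` maps are hypothetical, and `skipRange` is ignored for brevity.

```python
def channel_heads(replaces):
    """Return the entries that no other entry in the channel replaces.

    `replaces` maps each bundle name to the bundle it replaces (or None).
    """
    replaced = {old for old in replaces.values() if old is not None}
    return sorted(name for name in replaces if name not in replaced)


# A well-formed channel (hypothetical): a single chain, so exactly one head.
linear = {
    "amqstreams.v1.7.2": "amqstreams.v1.6.2",
    "amqstreams.v1.6.2": None,
}

# A broken channel (hypothetical): nothing replaces v1.6.2, so both are heads.
broken = {
    "amqstreams.v1.7.2": None,
    "amqstreams.v1.6.2": None,
}

print(channel_heads(linear))  # ['amqstreams.v1.7.2']
print(channel_heads(broken))  # ['amqstreams.v1.6.2', 'amqstreams.v1.7.2']
```

A resolver that expects exactly one head per channel would reject the second graph, which matches the shape of the error above.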

How reproducible:
100%

Steps to Reproduce:
1. Install AMQ Streams v1.6.2 via OperatorHub using the `amq-streams-1.6.x` channel
2. Upgrade to AMQ Streams v1.7.2 by switching the channel to `stable`, `amq-streams-1.7.x`, or `amq-streams-1.x`.

Actual results:

The AMQ Streams installation hangs and never completes

Expected results:

The AMQ Streams installation completes, installing AMQ Streams v1.7.2

Additional info:

Looks similar/potentially related to the following tickets [1] [2].


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1969902#c7
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1942522

Comment 1 Kevin Rizza 2021-07-13 19:08:18 UTC

*** This bug has been marked as a duplicate of bug 1969902 ***

Comment 2 Kevin Rizza 2021-07-13 20:03:56 UTC
After reviewing this, I actually believe that this issue and the bz I marked as a duplicate are unrelated. This issue appears to be due to the deprecated property, which wasn't being handled correctly by OLM on cluster. There's already a fix upstream for that issue: https://github.com/operator-framework/operator-lifecycle-manager/pull/2154 that still needs to be pulled down.

Reopening this bz to track that change making its way downstream, and marking the target release as 4.9.0
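
As an illustration only: one way a deprecated entry in the middle of a channel could lead to a "multiple channel heads" error is if it is dropped before the heads are computed, which cuts the `replaces` chain in two. The Python sketch below models that assumption. It is not the code from the upstream PR, and the bundle names `op.v1` to `op.v3`, the `deprecated` set, and all three helper functions are hypothetical.

```python
def heads(replaces):
    """Entries that no other entry in the channel replaces."""
    replaced = {old for old in replaces.values() if old is not None}
    return sorted(name for name in replaces if name not in replaced)


def drop_deprecated_first(replaces, deprecated):
    """Naive handling: remove deprecated entries, then look for heads."""
    kept = {n: old for n, old in replaces.items() if n not in deprecated}
    return heads(kept)


def drop_deprecated_last(replaces, deprecated):
    """Find heads on the full graph, then exclude deprecated ones."""
    return [n for n in heads(replaces) if n not in deprecated]


# Hypothetical channel: op.v3 -> op.v2 -> op.v1, with op.v2 deprecated.
channel = {"op.v3": "op.v2", "op.v2": "op.v1", "op.v1": None}
deprecated = {"op.v2"}

print(drop_deprecated_first(channel, deprecated))  # ['op.v1', 'op.v3']
print(drop_deprecated_last(channel, deprecated))   # ['op.v3']
```

In this toy model, removing the inner entry too early leaves `op.v1` with nothing replacing it, so it looks like a second head.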

Comment 3 kliberti 2021-07-14 13:27:15 UTC
This fix will still be backported to OCP 4.7, correct?

Comment 4 Kevin Rizza 2021-07-14 16:03:59 UTC
Yes, it just needs to make its way back via the ocp backporting process.

This has also made its way into master, marking this as modified.

Comment 6 kuiwang 2021-07-15 03:29:43 UTC
Hi Kevin,

   Is the PR in downstream for this bug https://github.com/openshift/operator-framework-olm/pull/116?

Thanks

Comment 7 kuiwang 2021-07-16 00:58:31 UTC
Changing the status to ASSIGNED to confirm the PR, because I cannot find a downstream PR that fixes this issue.

Comment 8 Ben Luddy 2021-07-16 13:08:41 UTC
*** Bug 1983010 has been marked as a duplicate of this bug. ***

Comment 9 kliberti 2021-07-16 14:35:56 UTC
The number of our customers, users, and engineers blocked by this issue is increasing every day; therefore, I am raising the priority to urgent.

Comment 12 Jian Zhang 2021-07-21 03:23:16 UTC
1, Install an OCP cluster that contains the fixed PR.
[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-20-125820   True        False         31m     Cluster version is 4.9.0-0.nightly-2021-07-20-125820

[cloud-user@preserve-olm-env jian]$ oc -n openshift-operator-lifecycle-manager exec deploy/catalog-operator  -- olm --version
OLM version: 0.18.3
git commit: 1dc76f08ed05a635458420ffa979aebbe59a3890

2, Subscribe to AMQ Streams on the `amq-streams-1.6.x` channel
[cloud-user@preserve-olm-env jian]$ oc get sub -A
NAMESPACE   NAME          PACKAGE       SOURCE            CHANNEL
default     amq-streams   amq-streams   qe-app-registry   amq-streams-1.6.x
[cloud-user@preserve-olm-env jian]$ oc get ip -n default 
NAME            CSV                 APPROVAL    APPROVED
install-v2k67   amqstreams.v1.6.3   Automatic   true

[cloud-user@preserve-olm-env jian]$ oc get sub amq-streams -n default -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2021-07-21T02:54:44Z"
  generation: 1
  labels:
    operators.coreos.com/amq-streams.default: ""
  name: amq-streams
  namespace: default
  resourceVersion: "48479"
  uid: f8c2a486-6802-47e3-9179-af2d45895db2
spec:
  channel: amq-streams-1.6.x
  installPlanApproval: Automatic
  name: amq-streams
  source: qe-app-registry
  sourceNamespace: openshift-marketplace
  startingCSV: amqstreams.v1.6.3


3, Upgrade to AMQ Streams v1.7.2 by switching the channel to `stable`.
[cloud-user@preserve-olm-env jian]$ oc get ip -n default
NAME            CSV                 APPROVAL    APPROVED
install-hfsp2   amqstreams.v1.7.0   Automatic   true
install-v2k67   amqstreams.v1.6.3   Automatic   true
[cloud-user@preserve-olm-env jian]$ oc get csv -n default
NAME                DISPLAY                             VERSION   REPLACES            PHASE
amqstreams.v1.6.3   Red Hat Integration - AMQ Streams   1.6.3     amqstreams.v1.6.2   Pending
amqstreams.v1.7.0   Red Hat Integration - AMQ Streams   1.7.0     amqstreams.v1.6.3   Pending

[cloud-user@preserve-olm-env jian]$ oc logs catalog-operator-7db49d957f-8tv4p  | grep "multiple channel heads"

No "multiple channel heads" errors are logged now, so this looks good; marking as verified. Note that the CSVs are Pending because this AMQ Streams release still uses the v1beta1 CRD API, which is not supported in OCP 4.9 (Kubernetes 1.22). That is unrelated to this bug:

E0721 02:55:13.408530       1 queueinformer_operator.go:290] sync {"update" "default/install-v2k67"} failed: the server could not find the requested resource
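
The `grep` check above can also be done programmatically when the head names are needed. Below is a minimal Python sketch that extracts them from catalog-operator log text; the `find_multiple_heads` helper is hypothetical, and the sample line is an abbreviated copy of the error quoted in the description.

```python
import re

# Matches the bracketed list that follows the error text.
HEADS_RE = re.compile(r"found multiple channel heads: \[([^\]]*)\]")


def find_multiple_heads(log_text):
    """Return one list of bundle names per occurrence of the error."""
    return [m.group(1).split() for m in HEADS_RE.finditer(log_text)]


# Abbreviated sample line; the "..." stands for the elided event fields.
sample = (
    "I0710 01:13:39.895527       1 event.go:282] ... reason: 'ResolutionFailed' "
    "found multiple channel heads: [amqstreams.v1.7.2 amqstreams.v1.6.2], "
    "please check the `replaces`/`skipRange` fields of the operator bundles\n"
)

print(find_multiple_heads(sample))
# [['amqstreams.v1.7.2', 'amqstreams.v1.6.2']]
print(find_multiple_heads("no errors here\n"))
# []
```

An empty result corresponds to the empty `grep` output shown in this comment.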

Comment 13 Thomas Jungbauer 2021-07-22 11:33:19 UTC
Is it possible that this issue is blocking any other installation of operators? 

We just saw the same error around AMQ as described above. 
When we tried to install Kiali, the Kiali operator just stayed in status == Unknown. When you click into the operator it says "Unknown failure" and nothing else: no event is emitted and no pod is created.

In the catalog-operator pods we see the AMQ error plus:

"an error was encountered during reconciliation" error="Operation cannot be fulfilled on subscriptions.operators.coreos.com \"kiali-ossm\": the object has been modified; please apply your changes to the latest version and try again[...]

I reproduced that in my lab environment version = 4.7.19:

1. install AMQ 1.6
2. try to upgrade AMQ to 1.7.2
   --> the upgrade will never happen because of above
3. install Kiali
   --> Kiali will never be installed
4. uninstall AMQ and Kiali
5. install Kiali again 
   --> Kiali gets immediately installed

Comment 15 Raffael Mendes 2021-07-23 13:18:50 UTC
I think Thomas's comment is spot-on: this is preventing other operators from being installed.

I tested installing the AMQ Streams operator and then the ACS operator. The ACS operator never installed.

After removing the AMQ Streams operator, everything worked fine.

Comment 16 Jian Zhang 2021-07-26 03:40:04 UTC
Hi Thomas and Raffael,

Could you provide more details? If AMQ Streams failed to install and, as a result, no other operator can be installed in the same namespace, that is expected.

> "an error was encountered during reconciliation" error="Operation cannot be fulfilled on subscriptions.operators.coreos.com \"kiali-ossm\": the object has been modified; please apply your changes to the latest version and try again[...]

@Kevin, @Ben

I know this warning comes from a Kubernetes mechanism, but can we mute it? It is really confusing for users, thanks!

Comment 17 Thomas Jungbauer 2021-07-26 04:11:20 UTC
Hi,

Yes, that's true, it affects only the namespace where AMQ Streams is installed. But if AMQ Streams is installed under "openshift-operators", it blocks any other deployments there, such as those required for Service Mesh.

For example, I have a customer who has been using AMQ for years and would now like to add Service Mesh, which is not possible.

br
thomas
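
The namespace-wide effect discussed in this thread can be pictured with a toy model. The Python sketch below assumes, as a simplification, that all subscriptions in a namespace are resolved as one set and that one failure blocks the whole set; the `installable` helper and the `other-ns` namespace are hypothetical, and this is not OLM's resolver.

```python
from collections import defaultdict


def installable(subscriptions, failing):
    """Toy model: a namespace's subscriptions are resolved as one set.

    `subscriptions` is a list of (namespace, package) pairs and `failing`
    is the set of packages whose resolution fails. If any package in a
    namespace fails, nothing in that namespace is installed.
    """
    by_ns = defaultdict(list)
    for ns, pkg in subscriptions:
        by_ns[ns].append(pkg)
    ok = []
    for ns, pkgs in by_ns.items():
        if not any(p in failing for p in pkgs):
            ok.extend((ns, p) for p in pkgs)
    return sorted(ok)


subs = [
    ("openshift-operators", "amq-streams"),
    ("openshift-operators", "kiali-ossm"),
    ("other-ns", "kiali-ossm"),  # hypothetical second namespace
]

print(installable(subs, failing={"amq-streams"}))
# [('other-ns', 'kiali-ossm')]
```

Under this model, Kiali installs only where it does not share a namespace with the failing AMQ Streams subscription, which is consistent with the behavior reported above.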

Comment 18 kliberti 2021-07-27 14:35:38 UTC
> Is the PR in downstream for this bug https://github.com/openshift/operator-framework-olm/pull/116?

> Change to Assign to confirm the PR because I do not find the PR information in downstream to fix the issue.


Can someone provide the information Kui (kuiwang) requested so we can move this fix forward?


Kevin (krizza), do you know the answer to Kui's question or know someone who does? 


We are really eager to move this issue forward, as more and more users are hitting it every day and it is starting to block the testing and release of our next product release.

Comment 21 Alexandre Kieling 2021-08-20 16:37:43 UTC
Is there any documentation about this issue available for customers?

Comment 22 kliberti 2021-08-20 17:11:15 UTC
This issue has been addressed and backported to OCP 4.7.24. 

It's worth noting that this issue was not the cause of the AMQ Streams upgrade failure; it was a symptom of a broken upgrade graph in the production OCP 4.7 index. The AMQ Streams upgrade graph has been fixed in the production OCP 4.7 index, and the upgrade now works properly.

This ticket can be closed

Comment 26 errata-xmlrpc 2021-10-18 17:39:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 28 Per da Silva 2022-04-07 12:57:21 UTC
*** Bug 1986248 has been marked as a duplicate of this bug. ***

Comment 29 Red Hat Bugzilla 2023-09-18 00:28:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

