Bug 1899359 - Pipelines Operator will not install when other operators fail to update
Summary: Pipelines Operator will not install when other operators fail to update
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jon Jackson
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-18 23:55 UTC by cshepher
Modified: 2024-06-13 23:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-08 22:29:17 UTC
Target Upstream Version:
Embargoed:


Attachments
Output of `oc run --generator=run-pod/v1 grpcurl-query` (122.87 KB, text/plain)
2020-11-18 23:55 UTC, cshepher
inspect of OLM (4.98 MB, application/gzip)
2020-11-18 23:59 UTC, cshepher
Catalog Operator logs (8.15 MB, text/plain)
2020-11-19 00:00 UTC, cshepher

Description cshepher 2020-11-18 23:55:45 UTC
Created attachment 1730735 [details]
Output of `oc run --generator=run-pod/v1 grpcurl-query`

Description of problem:
The cluster was upgraded from OCP 4.4.18 to 4.5.15. The Pipelines operator does not fully install: the Subscription is created, but no InstallPlan or CSV is associated with it. The console, however, shows it as installed. All the operators in the namespace are set to Automatic approval. The Catalog operator logs show Pipelines trying to reconcile, followed by two operators failing to update:

time="2020-11-12T07:50:53Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/subscriptions/openshift-pipelines-operator-rh
E1112 07:50:57.953343       1 queueinformer_operator.go:290] sync {"update" "openshift-operators"} failed: error calculating generation changes due to new bundle: maistra.io/v1/ServiceMeshMemberRoll (servicemeshmemberrolls) already provided by servicemeshoperator.v1.1.9
E1112 07:50:58.133050       1 queueinformer_operator.go:290] sync "openshift-operators" failed: error calculating generation changes due to new bundle: monitoring.kiali.io/v1alpha1/MonitoringDashboard (monitoringdashboards) already provided by kiali-operator.v1.12.15
E1112 07:51:00.323794       1 queueinformer_operator.go:290] sync {"update" "openshift-operators"} failed: error calculating generation changes due to new bundle: monitoring.kiali.io/v1alpha1/MonitoringDashboard (monitoringdashboards) already provided by kiali-operator.v1.12.15
E1112 07:51:00.533827       1 queueinformer_operator.go:290] sync "openshift-operators" failed: error calculating generation changes due to new bundle: maistra.io/v1/ServiceMeshControlPlane (servicemeshcontrolplanes) already provided by servicemeshoperator.v1.1.9

It looked as if the Kiali and Service Mesh failures might be blocking Pipelines. After we removed Kiali and Service Mesh, Pipelines installed correctly.
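
For anyone triaging a similar case, the sync failures in the Catalog operator log can be grouped by the CSV that already provides the conflicting API. The following is only a minimal sketch: it assumes the log lines have exactly the "already provided by" format shown above, and the name `summarize_conflicts` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the conflict message seen in the Catalog operator log above.
# Assumption: the message format is stable across the lines being scanned.
CONFLICT_RE = re.compile(
    r"error calculating generation changes due to new bundle: "
    r"(?P<api>\S+) \((?P<plural>[^)]+)\) already provided by (?P<csv>\S+)"
)


def summarize_conflicts(log_text):
    """Return {csv_name: sorted list of conflicting APIs} for the given log text."""
    conflicts = defaultdict(set)
    for line in log_text.splitlines():
        match = CONFLICT_RE.search(line)
        if match:
            conflicts[match.group("csv")].add(match.group("api"))
    # Convert the sets to sorted lists so the result is stable and easy to read.
    return {csv: sorted(apis) for csv, apis in conflicts.items()}
```

Running this over the attached Catalog operator log should list the CSVs to look at first; lines that do not contain the conflict message are ignored.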

Version-Release number of selected component (if applicable):
Pipelines v 1.1.2
Kiali v 1.12.15
Service Mesh v 1.1.9

How reproducible:
Very reproducible on the customer's system. I could not reproduce it, but I was using newer versions of the operators in my 4.5.15 quicklab.

Steps to Reproduce:
1. Install older Service Mesh and Kiali operators from Operator Hub.
2. If/when they try to update themselves and fail, try to install Pipelines.

Actual results:
Pipelines is stuck with no CSV or InstallPlan.
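
A rough way to spot this state across a namespace is to scan the Subscription objects for ones that have neither an installed CSV nor an InstallPlan reference. The sketch below assumes the input is the parsed JSON list of Subscriptions for the namespace (for example, the result of `json.load` on saved `oc get ... -o json` output) and that the `status.installedCSV` and `status.installPlanRef` fields are populated the usual way; the function name `find_stuck_subscriptions` is hypothetical.

```python
def find_stuck_subscriptions(subscription_list):
    """Return names of Subscriptions with no installed CSV and no InstallPlan.

    subscription_list is assumed to be a dict shaped like a Kubernetes List
    object: {"items": [{"metadata": {...}, "status": {...}}, ...]}.
    """
    stuck = []
    for sub in subscription_list.get("items", []):
        # status may be missing entirely on a Subscription that never resolved.
        status = sub.get("status") or {}
        if not status.get("installedCSV") and not status.get("installPlanRef"):
            stuck.append(sub["metadata"]["name"])
    return stuck
```

A Subscription that shows up here while the console reports the operator as installed matches the symptom described in this report.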

Expected results:
Pipelines to install normally.

Additional info:
We also checked that Pipelines wasn't trying to use the Install Plans from other operators in the namespace by removing them; it still did not generate its own.

Comment 1 cshepher 2020-11-18 23:59:30 UTC
Created attachment 1730736 [details]
inspect of OLM

Comment 2 cshepher 2020-11-19 00:00:38 UTC
Created attachment 1730737 [details]
Catalog Operator logs

Comment 3 cshepher 2020-11-19 00:01:29 UTC
Comment on attachment 1730735 [details]
Output of `oc run --generator=run-pod/v1 grpcurl-query`

Output from `oc run --generator=run-pod/v1 grpcurl-query -n openshift-marketplace --rm=true --restart=Never --attach=true --image=docker.io/fullstorydev/grpcurl -- -plaintext redhat-operators.openshift-marketplace.svc:50051 api.Registry/ListBundles`.

Comment 5 Kevin Rizza 2021-01-14 12:43:00 UTC
Circling back around on this one:

The issue described is expected behavior. OLM always aggregates the subscriptions in a namespace and attempts to install them as a set. If one operator is failing to install in a namespace, OLM has no guarantee that installing some but not all of those operators won't cause a conflict or dependency problem in that namespace, so it does not attempt any installs or upgrades. To resolve the problem, every failing subscription in that namespace must be fixed before the installation or upgrade can proceed.

From OLM's perspective, that is expected behavior and not a bug.
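
To make the all-or-nothing behavior described above concrete, here is a toy model of it. This is not OLM's actual resolver, only an illustration: the `Subscription` class, its `resolvable` flag, and `plan_namespace` are all hypothetical stand-ins for the real resolution logic.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    name: str
    # Hypothetical flag standing in for "OLM can resolve this subscription".
    resolvable: bool


def plan_namespace(subscriptions):
    """Treat all subscriptions in a namespace as one set.

    Returns (to_install, blockers). If any subscription cannot be resolved,
    nothing is installed and the unresolved ones are reported as blockers.
    """
    blockers = [s.name for s in subscriptions if not s.resolvable]
    if blockers:
        return [], blockers
    return [s.name for s in subscriptions], []
```

In this model, a resolvable Pipelines subscription that shares a namespace with two unresolvable ones yields an empty install list, and it only gets planned once the blockers are fixed or removed, which matches what the reporter observed.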

This bug has been left open because, in addition to that, there appears to be a UI issue: the operator is marked as being in a succeeded state when in reality its upgrade is prevented from proceeding. My assumption is that this is either a bug or a needed improvement in the console, which should aggregate that status up to the UI. As a result, I'm going to reassign this bug to the console to further triage the succeeded-status problem.

Comment 6 Jakub Hadvig 2021-01-15 11:23:39 UTC
Jon, could you please check whether this is actually a bug or an RFE?

Comment 9 Jon Jackson 2021-02-25 15:51:56 UTC
Did some research to reproduce this. If I'm interpreting this correctly, we don't want the OperatorHub card to show the "Installed" badge if the Operator is stuck in a failed installation state. I think we will need some design input to decide what we should show instead. I believe there is already a story to revamp the OperatorHub badges, and it may cover this, but I'm not sure. Will follow up with UX next sprint to see if we can figure out where to go with this.

Comment 10 Jon Jackson 2021-03-08 22:28:43 UTC
Followed up with Tony Wu and Peter Kreuser. We are going to track this in an RFE to improve the visibility of Operator installation status on the OperatorHub page. See https://issues.redhat.com/browse/RFE-1691

