Created attachment 1730735 [details]
Output of `oc run --generator=run-pod/v1 grpcurl-query`

Description of problem:
OCP 4.4.18 > 4.5.15. Pipelines operator does not fully install. The subscription is created, but there is no Install Plan or CSV associated with it. In the console however, it shows as installed. All the operators in the namespace are set for Automatic Approval.

Catalog operator logs show Pipelines trying to reconcile, and then two operators failing to update:

time="2020-11-12T07:50:53Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=/apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/subscriptions/openshift-pipelines-operator-rh
E1112 07:50:57.953343 1 queueinformer_operator.go:290] sync {"update" "openshift-operators"} failed: error calculating generation changes due to new bundle: maistra.io/v1/ServiceMeshMemberRoll (servicemeshmemberrolls) already provided by servicemeshoperator.v1.1.9
E1112 07:50:58.133050 1 queueinformer_operator.go:290] sync "openshift-operators" failed: error calculating generation changes due to new bundle: monitoring.kiali.io/v1alpha1/MonitoringDashboard (monitoringdashboards) already provided by kiali-operator.v1.12.15
E1112 07:51:00.323794 1 queueinformer_operator.go:290] sync {"update" "openshift-operators"} failed: error calculating generation changes due to new bundle: monitoring.kiali.io/v1alpha1/MonitoringDashboard (monitoringdashboards) already provided by kiali-operator.v1.12.15
E1112 07:51:00.533827 1 queueinformer_operator.go:290] sync "openshift-operators" failed: error calculating generation changes due to new bundle: maistra.io/v1/ServiceMeshControlPlane (servicemeshcontrolplanes) already provided by servicemeshoperator.v1.1.9

It looked as if the Kiali and Service Mesh failures may be blocking Pipelines. We removed Kiali and Service Mesh, and Pipelines finally installed correctly.
Version-Release number of selected component (if applicable):
Pipelines v 1.1.2
Kiali v 1.12.15
Service Mesh v 1.1.9

How reproducible:
Very, on the customer's system. I could not reproduce it, but I was using newer versions of the operators in my 4.5.15 quicklab.

Steps to Reproduce:
1. Install older Service Mesh and Kiali operators from Operator Hub.
2. If/when they try to update themselves and fail, try to install Pipelines.

Actual results:
Pipelines is stuck with no CSV or Install Plan.

Expected results:
Pipelines to install normally.

Additional info:
We also checked that Pipelines wasn't trying to use the Install Plans from other operators in the namespace by removing them; it still did not generate its own.
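For anyone triaging a similar case, the "already provided by" errors quoted above can be pulled out of a catalog operator log with a short script. This is only a minimal sketch: the regular expression is derived from the four error lines in this report, the function name `find_api_conflicts` is made up for illustration, and other OLM versions may word the message differently.

```python
import re

# Pattern based only on the error lines quoted in this report; the exact
# wording in other OLM versions is an assumption.
CONFLICT_RE = re.compile(
    r"error calculating generation changes due to new bundle: "
    r"(?P<api>\S+) \((?P<plural>\w+)\) already provided by (?P<csv>\S+)"
)


def find_api_conflicts(log_text):
    """Return a sorted list of unique (api, providing_csv) pairs in the log."""
    conflicts = set()
    for match in CONFLICT_RE.finditer(log_text):
        conflicts.add((match.group("api"), match.group("csv")))
    return sorted(conflicts)


if __name__ == "__main__":
    # One of the lines from the catalog operator log in this report.
    sample = (
        'E1112 07:50:58.133050 1 queueinformer_operator.go:290] sync '
        '"openshift-operators" failed: error calculating generation changes '
        "due to new bundle: monitoring.kiali.io/v1alpha1/MonitoringDashboard "
        "(monitoringdashboards) already provided by kiali-operator.v1.12.15"
    )
    for api, csv in find_api_conflicts(sample):
        print(f"{api} is already provided by {csv}")
```

Running it over the full log from the attachment would list each conflicting API once, together with the CSV that already provides it.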
Created attachment 1730736 [details] inspect of OLM
Created attachment 1730737 [details] Catalog Operator logs
Comment on attachment 1730735 [details]
Output of `oc run --generator=run-pod/v1 grpcurl-query`

Output from `oc run --generator=run-pod/v1 grpcurl-query -n openshift-marketplace --rm=true --restart=Never --attach=true --image=docker.io/fullstorydev/grpcurl -- -plaintext redhat-operators.openshift-marketplace.svc:50051 api.Registry/ListBundles`.
Circling back around on this one:

The issue described is expected behavior. OLM always aggregates subscriptions in a namespace and attempts to install them as a set. If one operator is failing to install in a namespace, OLM has no guarantees that installing some and not all of those operators won't cause a conflict or dependency problem in that namespace, so it does not attempt any upgrades. In order to resolve that problem, fixing any failing subscription on that namespace is required before the installation or upgrade can proceed. From OLM's perspective, that is expected behavior and not a bug.

This has been left open and not closed because in addition to that, it seems as though there is a UI issue where the upgraded operator is marked as in a succeeded state when in reality an upgrade is prevented from proceeding. My assumption is this is either a bug or improvement needed in the console in order to aggregate that status up to the UI. As a result, I'm going to reassign this bug to the console to further triage that succeeded status problem.
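To illustrate the all-or-nothing behavior described above, here is a toy model in Python. It is not OLM's resolver code: the `Subscription` class, its `resolvable` flag, and `resolve_namespace` are hypothetical names invented for this sketch, and the Service Mesh and Kiali subscription names are placeholders modeled on the CSV names in the logs.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    name: str
    # Hypothetical flag: True if this subscription resolves without errors.
    resolvable: bool


def resolve_namespace(subscriptions):
    """Toy all-or-nothing resolution for one namespace.

    Returns (to_install, blockers). If any subscription fails to resolve,
    nothing in the namespace is installed and the blockers are reported.
    """
    blockers = [s.name for s in subscriptions if not s.resolvable]
    if blockers:
        return [], blockers
    return [s.name for s in subscriptions], []


if __name__ == "__main__":
    namespace = [
        Subscription("servicemeshoperator", resolvable=False),  # placeholder name
        Subscription("kiali-operator", resolvable=False),       # placeholder name
        Subscription("openshift-pipelines-operator-rh", resolvable=True),
    ]
    print(resolve_namespace(namespace))

    # After removing the failing subscriptions, the remaining one proceeds.
    remaining = [s for s in namespace if s.resolvable]
    print(resolve_namespace(remaining))
```

In this model the healthy Pipelines subscription is held back as long as any other subscription in the same namespace fails, which matches what was observed here: Pipelines installed only after Kiali and Service Mesh were removed.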
Jon, could you please confirm whether this is actually a bug or an RFE?
Did some research to reproduce this. If I'm interpreting this correctly, we don't want the OperatorHub card to show the "Installed" badge if the Operator is stuck in a failed installation state. I think we will need some design input to decide what we should show instead. I believe there is already a story to revamp the OperatorHub badges, and it may cover this, but I'm not sure. Will follow up with UX next sprint to see if we can figure out where to go with this.
Followed up with Tony Wu and Peter Kreuser. We are going to track this in an RFE to improve the visibility of Operator installation status on the OperatorHub page. See https://issues.redhat.com/browse/RFE-1691