Bug 1784024

Summary: OLM installs both community and redhat dependencies
Product: OpenShift Container Platform Reporter: Chris Suszynski <ksuszyns>
Component: OLM Assignee: Nick Hale <nhale>
OLM sub component: OLM QA Contact: Bruno Andrade <bandrade>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: bandrade, jiazha, scolange, scuppett
Version: 4.2.z   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The application of a newly -- non-deterministically -- resolved set of dependencies was triggered when previously resolved InstallPlans no longer contained an equivalent set of manifests. Consequence: When more than one valid set of dependencies for an operator existed, an equivalent but distinct resolution could sometimes be applied over an existing one. Fix: Add a generation field to the status of the InstallPlan API and increment it upon every resolution; only apply the InstallPlan with the newest status generation. Result: Only one set of dependencies for an operator exists on the cluster at a given time.
Story Points: ---
Clone Of:
: 1805976 (view as bug list) Environment:
Last Closed: 2020-07-13 17:12:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1805976    
Attachments:
Description Flags
Screenshot showing installed both of the dependencies
none
Logs of OLM operator
none
A configmap for serverless-operator in namespace openshift-marketplace
none
A catalogsource for serverless-operator in namespace openshift-marketplace
none
A exact source yaml used to provision the cluster
none
A two install plans in openshift-operators ns
none
A cluster service versions in openshift-operators ns
none
A list of subscriptions in openshift-operators ns
none
A catalog-operator logs
none

Description Chris Suszynski 2019-12-16 14:19:37 UTC
Created attachment 1645599 [details]
Screenshot showing installed both of the dependencies

Description of problem:

I ran into this strange behavior. We were testing Serverless operator images built from Brew by adding an additional installation source. The Serverless operator requires the Service Mesh operator and its dependencies: Jaeger, Kiali, and Elasticsearch. I ended up with duplicate installations of Kiali, Jaeger, and Service Mesh, which causes breakage. I think duplicate operators should never be installed, no matter what.


Version-Release number of selected component (if applicable): OCP 4.2.10


How reproducible:


Steps to Reproduce:
1. Create a clean OCP cluster
2. Enable internal registry
3. Deploy operator release candidate images into that registry
4. Add catalog source to point to that registry
5. Subscribe to an operator

Actual results:

Both community and Red Hat dependencies of a given operator were installed.

Expected results:

Only one copy of each dependency gets installed, with the preference decided by the catalog source publisher. In this case: Red Hat.


Additional info:

$ oc get subscription --all-namespaces
NAMESPACE             NAME                                                                PACKAGE                  SOURCE                CHANNEL
openshift-operators   elasticsearch-operator-4.2-redhat-operators-openshift-marketplace   elasticsearch-operator   redhat-operators      4.2
openshift-operators   jaeger-product-stable-redhat-operators-openshift-marketplace        jaeger-product           redhat-operators      stable
openshift-operators   jaeger-stable-community-operators-openshift-marketplace             jaeger                   community-operators   stable
openshift-operators   kiali-ossm-stable-redhat-operators-openshift-marketplace            kiali-ossm               redhat-operators      stable
openshift-operators   kiali-stable-community-operators-openshift-marketplace              kiali                    community-operators   stable
openshift-operators   maistraoperator-1.0-community-operators-openshift-marketplace       maistraoperator          community-operators   1.0
openshift-operators   serverless-operator                                                 serverless-operator      serverless-operator   techpreview
openshift-operators   servicemeshoperator-1.0-redhat-operators-openshift-marketplace      servicemeshoperator      redhat-operators      1.0

Comment 1 Chris Suszynski 2019-12-16 14:20:55 UTC
Created attachment 1645600 [details]
Logs of OLM operator

Comment 2 Chris Suszynski 2019-12-16 14:31:01 UTC
Created attachment 1645603 [details]
A configmap for serverless-operator in namespace openshift-marketplace

Comment 3 Chris Suszynski 2019-12-16 14:32:31 UTC
Created attachment 1645605 [details]
A catalogsource for serverless-operator in namespace openshift-marketplace

Comment 4 Evan Cordell 2019-12-16 14:44:15 UTC
Could you please share the full contents of the subscriptions in the namespace, as well as the ClusterServiceVersions?

The logs of the catalog-operator pod would be helpful as well.

Comment 5 Chris Suszynski 2019-12-16 14:53:25 UTC
I tried to redo the same thing on another cluster and the outcome was different. It installed only community operators:

$ oc get subscription --all-namespaces
NAMESPACE             NAME                                                            PACKAGE               SOURCE                CHANNEL
openshift-operators   jaeger-stable-community-operators-openshift-marketplace         jaeger                community-operators   stable
openshift-operators   kiali-stable-community-operators-openshift-marketplace          kiali                 community-operators   stable
openshift-operators   maistraoperator-1.0-community-operators-openshift-marketplace   maistraoperator       community-operators   1.0
openshift-operators   serverless-operator                                             serverless-operator   serverless-operator   techpreview

Comment 6 Chris Suszynski 2019-12-16 14:57:05 UTC
Created attachment 1645614 [details]
A exact source yaml used to provision the cluster

Comment 7 Chris Suszynski 2019-12-16 15:05:15 UTC
Created attachment 1645615 [details]
A two install plans in openshift-operators ns

Comment 8 Chris Suszynski 2019-12-16 15:06:55 UTC
Created attachment 1645616 [details]
A cluster service versions in openshift-operators ns

Comment 9 Chris Suszynski 2019-12-16 15:09:18 UTC
Created attachment 1645619 [details]
A list of subscriptions in openshift-operators ns

Comment 10 Chris Suszynski 2019-12-16 15:13:58 UTC
Created attachment 1645621 [details]
A catalog-operator logs

Comment 11 Stephen Cuppett 2019-12-17 12:06:41 UTC
Setting target release to 4.4 to perform investigation on the active development branch (will be re-set/cloned where fixes & backports, if any, are required).

Comment 12 Evan Cordell 2019-12-19 19:53:49 UTC
It looks like this is a bug that can be hit when dependencies exist in multiple catalogs.

What happens:

 - Dependencies are resolved once, and an installplan is generated
 - Before the installplan is applied to the cluster and the operators are actually installed, dependencies are resolved again. This can be triggered by a number of events in the cluster
 - Because there are multiple ways to satisfy the dependencies, a different set may be resolved. A new installplan is created with a different set of operators.
 - OLM checks for "in-progress" installations by comparing the newly resolved set against the set recorded in existing installplans. A duplicate installplan is only created when the sets differ, and here they do, so OLM thinks it has found a "new" update.
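
The sequence above can be modelled in a few lines. This is only a sketch of the described behavior, not OLM's code: the `CANDIDATES` data, the function names, and the use of a random choice to stand in for non-deterministic resolution are all assumptions made for illustration.

```python
# Hypothetical model of the sequence described above; this is not OLM's code.
import random

# Assumed example data: each requirement has a provider in two catalogs.
CANDIDATES = {
    "jaeger": ["jaeger-product/redhat-operators", "jaeger/community-operators"],
    "kiali": ["kiali-ossm/redhat-operators", "kiali/community-operators"],
}


def resolve(required, rng):
    """Pick one provider per requirement; with several valid providers the pick is arbitrary."""
    return frozenset(rng.choice(CANDIDATES[name]) for name in required)


def is_in_progress(resolved, install_plans):
    """A resolution counts as in progress only if some plan holds exactly the same set."""
    return resolved in install_plans


def reconcile(required, install_plans, rng):
    """Resolve again and record a plan whenever the result looks new."""
    resolved = resolve(required, rng)
    if not is_in_progress(resolved, install_plans):
        install_plans.append(resolved)
    return install_plans
```

Calling `reconcile` repeatedly with the same requirements can leave more than one plan in `install_plans`, each a valid but different answer, which matches the duplicate installplans seen in this report.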

Ownership invariants are enforced at the ClusterServiceVersion layer. Even when multiple installplans and multiple dependencies are resolved and created, only one of them "wins" and actually runs.

Cleaning up after this case manually can be done by deleting any CSVs in the Failed state after resolution, any Subscriptions corresponding to the failed CSVs, and the installplan that lost.
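
As a rough aid for that manual cleanup, the sketch below picks out what to delete from the `items` lists returned by `oc get csv -o json` and `oc get subscription -o json` in the affected namespace. It assumes the CSV phase is read from `status.phase` and that a Subscription refers to its CSV through `status.currentCSV` or `status.installedCSV`; `plan_cleanup` is a hypothetical helper name. It does not identify the installplan that lost, which still has to be found by hand.

```python
# Hypothetical helper; it only selects names and deletes nothing.
def plan_cleanup(csvs, subscriptions):
    """Return (names of Failed CSVs, names of Subscriptions that point at them)."""
    failed = {
        csv["metadata"]["name"]
        for csv in csvs
        if csv.get("status", {}).get("phase") == "Failed"
    }
    stale_subs = [
        sub["metadata"]["name"]
        for sub in subscriptions
        # Assumed fields: a Subscription names its CSV in currentCSV/installedCSV.
        if sub.get("status", {}).get("currentCSV") in failed
        or sub.get("status", {}).get("installedCSV") in failed
    ]
    return sorted(failed), sorted(stale_subs)
```

Review the returned names before deleting anything with `oc delete`.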

There are two things we need to do to resolve this permanently:

- We have a bug fix ready that will prevent multiple installplans from being created from the same input set of subscriptions. This will prevent the immediate issue. This is being held until we have a reproducer test, which so far has been elusive.
- We will work on a feature to globally order dependencies, so that OLM always resolves the same set every time (given the same set of subscriptions and catalogs). 
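
To illustrate the second item, here is a minimal sketch of what a global ordering could look like. The ordering rule (catalog priority, then catalog name, then package name) and the name `resolve_deterministic` are assumptions for the example; the actual feature may order candidates differently.

```python
# Hypothetical sketch of deterministic resolution; the ordering rule is an assumption.
def resolve_deterministic(required, candidates, catalog_priority):
    """Pick one (package, catalog) per requirement using a fixed total order.

    candidates maps a requirement name to a list of (package, catalog) tuples.
    catalog_priority maps a catalog name to a rank; lower ranks win.
    """
    def sort_key(candidate):
        package, catalog = candidate
        # Unknown catalogs rank after all known ones; names break any ties.
        rank = catalog_priority.get(catalog, len(catalog_priority))
        return (rank, catalog, package)

    return {name: min(candidates[name], key=sort_key) for name in sorted(required)}
```

Because the key is a total order over the candidates, the result depends only on the set of subscriptions and catalogs, not on the order in which they were listed.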

And longer term, we are looking at other ways to specify and resolve dependencies that do not leave room for interpretation.
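
For reference, the Doc Text field of this bug describes the fix that eventually shipped as adding a generation to the InstallPlan status, incrementing it on every resolution, and applying only the InstallPlan with the newest generation. The sketch below models that idea with plain Python objects; the class and method names are hypothetical and do not reflect the real API types.

```python
# Hypothetical model of the generation idea from Doc Text; not the real API types.
from dataclasses import dataclass, field


@dataclass
class InstallPlan:
    name: str
    operators: frozenset
    generation: int = 0  # stands in for the generation kept in the InstallPlan status


@dataclass
class Namespace:
    plans: list = field(default_factory=list)
    last_generation: int = 0

    def record_resolution(self, name, operators):
        """Every resolution gets the next generation number."""
        self.last_generation += 1
        plan = InstallPlan(name, frozenset(operators), self.last_generation)
        self.plans.append(plan)
        return plan

    def plan_to_apply(self):
        """Only the plan with the newest generation is applied; older ones are skipped."""
        return max(self.plans, key=lambda plan: plan.generation, default=None)
```

With this rule a second, different resolution supersedes the first instead of being applied alongside it.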

Comment 19 Bruno Andrade 2020-03-12 11:21:45 UTC
Installed the ServiceMesh Operator from the redhat-operators source, and only operators from this source were installed. Marking as VERIFIED.

OCP Cluster Version: 4.5.0-0.nightly-2020-03-12-003015

oc get operatorsource -n openshift-marketplace -o jsonpath='{range .items[*].metadata}{.name}{"\n"}'
certified-operators
community-operators
redhat-marketplace
redhat-operators


oc get subscription --all-namespaces                                                                                                                          
NAMESPACE             NAME                                                                PACKAGE                  SOURCE             CHANNEL
openshift-operators   elasticsearch-operator-4.3-redhat-operators-openshift-marketplace   elasticsearch-operator   redhat-operators   4.3
openshift-operators   jaeger-product-stable-redhat-operators-openshift-marketplace        jaeger-product           redhat-operators   stable
openshift-operators   kiali-ossm-stable-redhat-operators-openshift-marketplace            kiali-ossm               redhat-operators   stable
openshift-operators   servicemeshoperator                                                 servicemeshoperator      redhat-operators   1.0

Comment 21 errata-xmlrpc 2020-07-13 17:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409