Bug 1784024 - OLM installs both community and redhat dependencies
Summary: OLM installs both community and redhat dependencies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.2.z
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Nick Hale
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On:
Blocks: 1805976
 
Reported: 2019-12-16 14:19 UTC by Chris Suszynski
Modified: 2020-07-13 17:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The application of a newly -- non-deterministically -- resolved set of dependencies was triggered when previously resolved InstallPlans no longer contained an equivalent set of manifests.
Consequence: When more than one valid set of dependencies for an operator existed, an equivalent but distinct resolution could sometimes be applied over an existing one.
Fix: Add a generation field to the status of the InstallPlan API and increment it upon every resolution; only apply the InstallPlan with the newest status generation.
Result: Only one set of dependencies for an operator exists on the cluster at a given time.
Clone Of:
Clones: 1805976
Environment:
Last Closed: 2020-07-13 17:12:48 UTC
Target Upstream Version:


Attachments
Screenshot showing installed both of the dependencies (243.45 KB, image/png)
2019-12-16 14:19 UTC, Chris Suszynski
Logs of OLM operator (159.52 KB, text/plain)
2019-12-16 14:20 UTC, Chris Suszynski
A configmap for serverless-operator in namespace openshift-marketplace (130.05 KB, application/yaml)
2019-12-16 14:31 UTC, Chris Suszynski
A catalogsource for serverless-operator in namespace openshift-marketplace (1.38 KB, application/yaml)
2019-12-16 14:32 UTC, Chris Suszynski
A exact source yaml used to provision the cluster (66.67 KB, application/yaml)
2019-12-16 14:57 UTC, Chris Suszynski
A two install plans in openshift-operators ns (601.20 KB, application/yaml)
2019-12-16 15:05 UTC, Chris Suszynski
A cluster service versions in openshift-operators ns (617.78 KB, application/yaml)
2019-12-16 15:06 UTC, Chris Suszynski
A list of subscriptions in openshift-operators ns (22.62 KB, application/yaml)
2019-12-16 15:09 UTC, Chris Suszynski
A catalog-operator logs (118.27 KB, text/plain)
2019-12-16 15:13 UTC, Chris Suszynski


Links
System                  ID                                                       Priority  Status  Summary                                                         Last Updated
Github                  operator-framework operator-lifecycle-manager pull 1316  None      closed  Bug 1784024: Use generations to prevent duplicate InstallPlans  2020-08-17 22:32:56 UTC
Red Hat Product Errata  RHBA-2020:2409                                           None      None    None                                                            2020-07-13 17:13:19 UTC

Description Chris Suszynski 2019-12-16 14:19:37 UTC
Created attachment 1645599 [details]
Screenshot showing installed both of the dependencies

Description of problem:

I ran into this strange behavior. We were testing serverless operator images built from Brew by adding an additional installation source. The serverless operator requires the service mesh operator and its dependencies: Jaeger, Kiali, and Elasticsearch. I ended up with duplicate Kiali, Jaeger, and Service Mesh operators, which causes breakage. I think duplicate operators should never be installed, no matter what.


Version-Release number of selected component (if applicable): OCP 4.2.10


How reproducible:


Steps to Reproduce:
1. Create a clean OCP cluster
2. Enable internal registry
3. Deploy operator release candidate images into that registry
4. Add catalog source to point to that registry
5. Subscribe to an operator

Actual results:

Both community and red hat dependencies of a given operator were installed.

Expected results:

Only one of each dependency gets installed, with the preference decided by the catalog source publisher (in this case, Red Hat).


Additional info:

$ oc get subscription --all-namespaces
NAMESPACE             NAME                                                                PACKAGE                  SOURCE                CHANNEL
openshift-operators   elasticsearch-operator-4.2-redhat-operators-openshift-marketplace   elasticsearch-operator   redhat-operators      4.2
openshift-operators   jaeger-product-stable-redhat-operators-openshift-marketplace        jaeger-product           redhat-operators      stable
openshift-operators   jaeger-stable-community-operators-openshift-marketplace             jaeger                   community-operators   stable
openshift-operators   kiali-ossm-stable-redhat-operators-openshift-marketplace            kiali-ossm               redhat-operators      stable
openshift-operators   kiali-stable-community-operators-openshift-marketplace              kiali                    community-operators   stable
openshift-operators   maistraoperator-1.0-community-operators-openshift-marketplace       maistraoperator          community-operators   1.0
openshift-operators   serverless-operator                                                 serverless-operator      serverless-operator   techpreview
openshift-operators   servicemeshoperator-1.0-redhat-operators-openshift-marketplace      servicemeshoperator      redhat-operators      1.0

Comment 1 Chris Suszynski 2019-12-16 14:20:55 UTC
Created attachment 1645600 [details]
Logs of OLM operator

Comment 2 Chris Suszynski 2019-12-16 14:31:01 UTC
Created attachment 1645603 [details]
A configmap for serverless-operator in namespace openshift-marketplace

Comment 3 Chris Suszynski 2019-12-16 14:32:31 UTC
Created attachment 1645605 [details]
A catalogsource for serverless-operator in namespace openshift-marketplace

Comment 4 Evan Cordell 2019-12-16 14:44:15 UTC
Could you please share the full contents of the subscriptions in the namespace, as well as the ClusterServiceVersions?

The logs of the catalog-operator pod would be helpful as well.

Comment 5 Chris Suszynski 2019-12-16 14:53:25 UTC
I tried to redo the same thing on another cluster and the outcome was different. It installed only community operators:

$ oc get subscription --all-namespaces
NAMESPACE             NAME                                                            PACKAGE               SOURCE                CHANNEL
openshift-operators   jaeger-stable-community-operators-openshift-marketplace         jaeger                community-operators   stable
openshift-operators   kiali-stable-community-operators-openshift-marketplace          kiali                 community-operators   stable
openshift-operators   maistraoperator-1.0-community-operators-openshift-marketplace   maistraoperator       community-operators   1.0
openshift-operators   serverless-operator                                             serverless-operator   serverless-operator   techpreview

Comment 6 Chris Suszynski 2019-12-16 14:57:05 UTC
Created attachment 1645614 [details]
A exact source yaml used to provision the cluster

Comment 7 Chris Suszynski 2019-12-16 15:05:15 UTC
Created attachment 1645615 [details]
A two install plans in openshift-operators ns

Comment 8 Chris Suszynski 2019-12-16 15:06:55 UTC
Created attachment 1645616 [details]
A cluster service versions in openshift-operators ns

Comment 9 Chris Suszynski 2019-12-16 15:09:18 UTC
Created attachment 1645619 [details]
A list of subscriptions in openshift-operators ns

Comment 10 Chris Suszynski 2019-12-16 15:13:58 UTC
Created attachment 1645621 [details]
A catalog-operator logs

Comment 11 Stephen Cuppett 2019-12-17 12:06:41 UTC
Setting target release to 4.4 to perform investigation on the active development branch (will be re-set/cloned where fixes & backports, if any, are required).

Comment 12 Evan Cordell 2019-12-19 19:53:49 UTC
It looks like this is a bug that can be hit when dependencies exist in multiple catalogs.

What happens:

 - Dependencies are resolved once, and an installplan is generated
 - Before the installplan is applied to the cluster and the operators are actually installed, dependencies are resolved again. This can be triggered by a number of events in the cluster
 - Because there are multiple ways to satisfy the dependencies, a different set may be resolved. A new installplan is created with a different set of operators.
 - OLM checks for "in-progress" installations by looking at the resolved set in the installplan. Duplicated installplans only occur if the set differs, so OLM thinks it has found a "new" update.
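
The "in-progress" check described in the steps above can be sketched roughly as follows. This is only an illustration, not OLM's actual code: `resolvedSetKey` and `isInProgress` are hypothetical names, and the package names are taken from the subscription listing in the description. It shows why a second resolution that picks a different but equally valid set is not recognized as a duplicate.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolvedSetKey builds an order-independent key for a resolved set of packages.
// Hypothetical helper for illustration; OLM's real comparison may differ.
func resolvedSetKey(packages []string) string {
	sorted := append([]string(nil), packages...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// isInProgress mimics the check described above: a resolution counts as
// already in progress only if an existing installplan holds the same set.
func isInProgress(existingPlans [][]string, resolved []string) bool {
	key := resolvedSetKey(resolved)
	for _, plan := range existingPlans {
		if resolvedSetKey(plan) == key {
			return true
		}
	}
	return false
}

func main() {
	first := []string{"servicemeshoperator", "kiali-ossm", "jaeger-product", "elasticsearch-operator"}
	second := []string{"maistraoperator", "kiali", "jaeger"}
	existing := [][]string{first}

	// Same set again: recognized, no new installplan.
	fmt.Println(isInProgress(existing, first))
	// Different but also valid set: treated as a "new" update.
	fmt.Println(isInProgress(existing, second))
}
```

With a check like this, the second set slips through and a second installplan is created alongside the first.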

Ownership invariants are enforced at the ClusterServiceVersion layer. Even when multiple installplans and multiple dependencies are resolved and created, only one of them "wins" and actually runs.

Cleaning up after this case manually can be done by deleting any CSVs in the Failed state after resolution, any Subscriptions corresponding to the failed CSVs, and the installplan that lost.

There are two things we need to do to resolve this permanently:

- We have a bug fix ready that will prevent multiple installplans from being created from the same input set of subscriptions. This will prevent the immediate issue. This is being held until we have a reproducer test, which so far has been elusive.
- We will work on a feature to globally order dependencies, so that OLM always resolves the same set every time (given the same set of subscriptions and catalogs). 
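
A minimal sketch of what a global ordering could look like, assuming candidates are simply sorted by catalog name and then package name. The `Candidate` type and `pickDeterministic` function are hypothetical, and the lexical order is arbitrary; a real ordering might instead prefer a publisher, as the reporter expected.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical (catalog, package) pair that can satisfy a dependency.
type Candidate struct {
	Catalog string
	Package string
}

// pickDeterministic returns the same candidate regardless of input order by
// sorting a copy of the slice on (Catalog, Package) and taking the first entry.
func pickDeterministic(candidates []Candidate) (Candidate, bool) {
	if len(candidates) == 0 {
		return Candidate{}, false
	}
	sorted := append([]Candidate(nil), candidates...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Catalog != sorted[j].Catalog {
			return sorted[i].Catalog < sorted[j].Catalog
		}
		return sorted[i].Package < sorted[j].Package
	})
	return sorted[0], true
}

func main() {
	a := []Candidate{
		{Catalog: "redhat-operators", Package: "kiali-ossm"},
		{Catalog: "community-operators", Package: "kiali"},
	}
	// Same candidates, discovered in the opposite order.
	b := []Candidate{a[1], a[0]}

	ca, _ := pickDeterministic(a)
	cb, _ := pickDeterministic(b)
	fmt.Println(ca == cb) // both resolutions agree on one candidate
}
```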

And longer term, we are looking at other ways to specify and resolve dependencies that do not leave room for interpretation.
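
For reference, the Doc Text on this bug and the linked pull request title describe the fix as a generation counter on the InstallPlan status. The sketch below only illustrates that idea under simplified assumptions: the `InstallPlan` struct, `Resolver`, and `newest` are hypothetical stand-ins, not the real API types or the code from the pull request.

```go
package main

import "fmt"

// InstallPlan is a pared-down, hypothetical stand-in for the real API type.
type InstallPlan struct {
	Name       string
	Generation int // incremented on every resolution
	Packages   []string
}

// Resolver hands out monotonically increasing generations (illustrative only).
type Resolver struct{ generation int }

// NewPlan records the result of one resolution with the next generation.
func (r *Resolver) NewPlan(name string, packages []string) InstallPlan {
	r.generation++
	return InstallPlan{Name: name, Generation: r.generation, Packages: packages}
}

// newest returns the plan with the highest generation; older plans are skipped.
func newest(plans []InstallPlan) (InstallPlan, bool) {
	if len(plans) == 0 {
		return InstallPlan{}, false
	}
	latest := plans[0]
	for _, p := range plans[1:] {
		if p.Generation > latest.Generation {
			latest = p
		}
	}
	return latest, true
}

func main() {
	r := &Resolver{}
	plans := []InstallPlan{
		r.NewPlan("install-a", []string{"jaeger", "kiali", "maistraoperator"}),
		r.NewPlan("install-b", []string{"jaeger-product", "kiali-ossm", "servicemeshoperator"}),
	}
	if p, ok := newest(plans); ok {
		fmt.Printf("apply %s (generation %d)\n", p.Name, p.Generation)
	}
}
```

Because only the plan with the newest generation is applied, two resolutions that disagree can no longer both install their dependencies.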

Comment 19 Bruno Andrade 2020-03-12 11:21:45 UTC
Installed ServiceMesh Operator from redhat-operators source and only operators from this source are installed. Marking as VERIFIED

OCP Cluster Version: 4.5.0-0.nightly-2020-03-12-003015

oc get operatorsource -n openshift-marketplace -o jsonpath='{range .items[*].metadata}{.name}{"\n"}'
certified-operators
community-operators
redhat-marketplace
redhat-operators


oc get subscription --all-namespaces                                                                                                                          
NAMESPACE             NAME                                                                PACKAGE                  SOURCE             CHANNEL
openshift-operators   elasticsearch-operator-4.3-redhat-operators-openshift-marketplace   elasticsearch-operator   redhat-operators   4.3
openshift-operators   jaeger-product-stable-redhat-operators-openshift-marketplace        jaeger-product           redhat-operators   stable
openshift-operators   kiali-ossm-stable-redhat-operators-openshift-marketplace            kiali-ossm               redhat-operators   stable
openshift-operators   servicemeshoperator                                                 servicemeshoperator      redhat-operators   1.0

Comment 21 errata-xmlrpc 2020-07-13 17:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

