Bug 1818788 - Operator update is failing due to missing replace field in Operator CSV
Summary: Operator update is failing due to missing replace field in Operator CSV
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.z
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 4.5.0
Assignee: Alexander Greene
QA Contact: Jian Zhang
Duplicates: 1821175 1829955
Depends On:
Blocks: 1827821 1828007
Reported: 2020-03-30 11:36 UTC by Simon Reber
Modified: 2020-07-13 17:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When an operator being upgraded provides a required API whose GVK has not changed since the previous version of the operator, and the operator that depends on that API uses a skipRange instead of the Spec.Replaces field, OLM fails to generate the "upgraded CSV" with the correct replaces field. Specifically, OLM would:
1. Add the new operator to the generation, marking the APIs it provides as "present".
2. Remove the old operator from the generation, marking the APIs it provides as "absent", even though the new version of the operator provides them.
3. Attempt to resolve the "missing" APIs, overwriting the new version of the operator with a copy that does not have its Spec.Replaces field set.
Consequence: Certain operators would fail to upgrade to new versions.
Fix: OLM was updated to remove the old operator from the current generation before adding the new operator to the generation.
Result: The upgrade succeeds as expected.
Clone Of:
Last Closed: 2020-07-13 17:24:11 UTC
Target Upstream Version:
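
The ordering problem described in the Doc Text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OLM's actual resolver code: the `Operator` and `Generation` classes, the `resolve_missing` and `upgrade` helpers, and the API string `example.com/v1/Busybox` are all hypothetical names made up for illustration. The operator names are borrowed from the verification in Comment 15.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Operator:
    name: str
    provides: FrozenSet[str]
    replaces: Optional[str] = None


class Generation:
    """Toy stand-in for a resolver generation (hypothetical, not OLM's type)."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.present_apis: Set[str] = set()

    def add(self, op: Operator) -> None:
        self.operators[op.name] = op
        self.present_apis |= op.provides  # mark provided APIs as present

    def remove(self, op: Operator) -> None:
        self.operators.pop(op.name, None)
        # Marks the APIs as absent even if another operator in the
        # generation provides the same GVK.
        self.present_apis -= op.provides


def resolve_missing(gen: Generation, required: Set[str], catalog: List[Operator]) -> None:
    """Pull a provider from the catalog for every required API that looks absent."""
    for api in sorted(required - gen.present_apis):
        for candidate in catalog:
            if api in candidate.provides:
                gen.add(candidate)  # overwrites any entry with the same name
                break


def upgrade(old: Operator, new: Operator, required: Set[str],
            catalog: List[Operator], remove_first: bool) -> Generation:
    gen = Generation()
    gen.add(old)
    if remove_first:      # order described under "Fix"
        gen.remove(old)
        gen.add(new)
    else:                 # order described under "Cause"
        gen.add(new)
        gen.remove(old)
    resolve_missing(gen, required, catalog)
    return gen


API = "example.com/v1/Busybox"  # hypothetical GVK, unchanged across versions
OLD = Operator("busybox-dependency.v1.0.0", frozenset({API}))
NEW = Operator("busybox-dependency.v2.0.0", frozenset({API}), replaces=OLD.name)
CATALOG = [Operator(NEW.name, frozenset({API}))]  # catalog copy without replaces

buggy = upgrade(OLD, NEW, {API}, CATALOG, remove_first=False)
fixed = upgrade(OLD, NEW, {API}, CATALOG, remove_first=True)
print(buggy.operators[NEW.name].replaces)  # None
print(fixed.operators[NEW.name].replaces)  # busybox-dependency.v1.0.0
```

In this model, adding the new operator before removing the old one leaves the shared API marked absent, so the resolution step overwrites the upgraded entry with a copy that has no replaces value. Removing the old operator first avoids that.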


System | ID | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1483 | None | closed | Bug 1818788: Fix Operator Generation code | 2020-09-21 14:49:57 UTC
Red Hat Product Errata | RHBA-2020:2409 | None | None | None | 2020-07-13 17:24:33 UTC

Description Simon Reber 2020-03-30 11:36:48 UTC
Description of problem:

Updating to OpenShift Container Platform 4.3.8 also triggered an update of `elasticsearch-operator`. That update failed and got stuck because the CSV was not rolled over correctly, leaving the CSV for the older version active alongside the CSV for the new version.

This caused an ownership conflict that could only be resolved by manually removing the CSV of the old `elasticsearch` operator version.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.3.8

How reproducible:

 - N/A

Steps to Reproduce:
1. N/A

Actual results:

The update of `elasticsearch-operator` was stuck, which prevented additional operators from being installed.

Expected results:

The update should succeed, and a single failing operator should not affect operator installation and updates as a whole.

Additional info:

Comment 6 Evan Cordell 2020-04-06 13:47:05 UTC
*** Bug 1821175 has been marked as a duplicate of this bug. ***

Comment 8 Xiangjing Li 2020-04-09 06:28:34 UTC
That is exactly the case we encountered when installing our ACM (open multicluster management) product on OpenShift.

There is an ACM installer operator, `multiclusterhub-operator.v0.0.1`, which requires a community operator, `multicluster-operators-subscription`.

After we upgraded multicluster-operators-subscription from v0.1.4 to v0.1.5, the automatic upgrade to v0.1.5 got stuck in the ACM environment where v0.1.4 was installed.

As shown below, the old version v0.1.4 is still active and v0.1.5 is not replacing v0.1.4 as expected. That caused the "conflicting CRD owner in namespace" error.

$ oc get csv --all-namespaces 
NAMESPACE                              NAME                                         DISPLAY                              VERSION             REPLACES   PHASE
open-cluster-management                multicluster-operators-subscription.v0.1.4   Multicluster Subscription Operator   0.1.4                          Succeeded
open-cluster-management                multicluster-operators-subscription.v0.1.5   Multicluster Subscription Operator   0.1.5                          Failed
open-cluster-management                multiclusterhub-operator.v0.0.1              Multiclusterhub Operator             0.0.1                          Succeeded

By comparison, the subscription operator is upgraded to v0.1.5 automatically and correctly if v0.1.4 was installed via the OpenShift OperatorHub GUI. As shown below, there is only one active CSV:

$ oc get csv -n openshift-operators
NAME                                         DISPLAY                              VERSION   REPLACES                                     PHASE
multicluster-operators-subscription.v0.1.5   Multicluster Subscription Operator   0.1.5     multicluster-operators-subscription.v0.1.4   Succeeded

It seems the issue happens when a dependent operator is upgraded.

From what I have observed, the upgrade does not get stuck if the operator was installed directly from OperatorHub, that is, when the operator is not installed as a dependency.
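
The symptom shown in the first listing above (two CSVs of the same operator coexisting, with neither replacing the other) can be detected programmatically. The following is a minimal sketch, assuming the input is the parsed JSON of `oc get csv --all-namespaces -o json`. The function name `find_stuck_upgrades` is hypothetical, and deriving the package name from the CSV name prefix before `.v` is a heuristic assumption, not something OLM guarantees.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_stuck_upgrades(csv_list: dict) -> Dict[Tuple[str, str], List[str]]:
    """Return {(namespace, package): [CSV names]} for packages that have
    several CSVs in one namespace where none of them replaces another."""
    groups = defaultdict(list)
    for item in csv_list.get("items", []):
        meta = item["metadata"]
        # Heuristic (assumption): package name is the CSV name up to ".v"
        package = meta["name"].split(".v", 1)[0]
        groups[(meta["namespace"], package)].append(item)

    stuck = {}
    for key, items in groups.items():
        if len(items) < 2:
            continue
        names = {i["metadata"]["name"] for i in items}
        replaced = {i.get("spec", {}).get("replaces") for i in items}
        # A healthy upgrade in progress has the new CSV pointing at the old one.
        if not (names & replaced):
            stuck[key] = sorted(names)
    return stuck
```

A CSV list like the one in the `open-cluster-management` namespace above would be reported for `multicluster-operators-subscription`, while the single v0.1.5 CSV in `openshift-operators` would not.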

Comment 15 Jian Zhang 2020-04-26 09:18:18 UTC
Cluster version is 4.5.0-0.nightly-2020-04-25-170442

mac:~ jianzhang$ oc exec catalog-operator-57f779987b-lpwf6 -- olm --version
OLM version: 0.14.2
git commit: 280a2a64115aa0388c11c5472188cd3169e05661

1. Installed a CatalogSource pointing to a catalog image that contains only the 1.0.0 versions of the operators.
mac:~ jianzhang$ oc create -f cs-1818788.yaml 
catalogsource.operators.coreos.com/agreene-operators created

mac:~ jianzhang$ cat cs-1818788.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: agreene-operators
  namespace: openshift-marketplace
spec:
  displayName: Agreene Operators
  image: quay.io/agreene/busybox-dependencies:old
  sourceType: grpc

mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
agreene-operators     Agreene Operators     grpc               16s
certified-operators   Certified Operators   grpc   Red Hat     43m
community-operators   Community Operators   grpc   Red Hat     43m
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     43m
redhat-operators      Red Hat Operators     grpc   Red Hat     43m
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
agreene-operators-8cc6n                 1/1     Running   0          33s
certified-operators-6bb9c54bc-pdfhq     1/1     Running   0          43m
community-operators-7c8bb898b5-t2kmd    1/1     Running   0          43m
marketplace-operator-684575bdb9-t929c   1/1     Running   0          44m
redhat-marketplace-6c598b5785-fmj2c     1/1     Running   0          43m
redhat-operators-5c4dd844cf-488tt       1/1     Running   0          43m
mac:~ jianzhang$ oc get packagemanifest|grep busy
busybox                                      Agreene Operators     50s
busybox-dependency                           Agreene Operators     50s

2. Created an OperatorGroup and a Subscription.

mac:~ jianzhang$ oc get og -n openshift-marketplace
test-og   32s
mac:~ jianzhang$ cat og.yaml 
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-og
  namespace: openshift-marketplace
spec:
  targetNamespaces:
  - openshift-marketplace

mac:~ jianzhang$ oc create -f sub-1818788.yaml 
subscription.operators.coreos.com/busybox created

mac:~ jianzhang$ cat sub-1818788.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: busybox
  namespace: openshift-marketplace
spec:
  channel: "alpha"
  installPlanApproval: Automatic
  name: busybox
  source: agreene-operators
  sourceNamespace: openshift-marketplace
  startingCSV: busybox.v1.0.0

mac:~ jianzhang$ oc get sub
NAME                                                               PACKAGE              SOURCE              CHANNEL
busybox                                                            busybox              agreene-operators   alpha
busybox-dependency-alpha-agreene-operators-openshift-marketplace   busybox-dependency   agreene-operators   alpha

mac:~ jianzhang$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
busybox-dependency.v1.0.0   busybox-dependency   1.0.0                Succeeded
busybox.v1.0.0              busybox              1.0.0                Succeeded

3. Updated the CatalogSource image from quay.io/agreene/busybox-dependencies:old to quay.io/agreene/busybox-dependencies:new (which contains the 2.0.0 versions).

mac:~ jianzhang$ oc edit catalogsource agreene-operators
catalogsource.operators.coreos.com/agreene-operators edited

mac:~ jianzhang$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES                    PHASE
busybox-dependency.v2.0.0   busybox-dependency   2.0.0     busybox-dependency.v1.0.0   Succeeded
busybox.v2.0.0              busybox              2.0.0     busybox.v1.0.0              Succeeded
mac:~ jianzhang$ oc get pods
NAME                                    READY   STATUS        RESTARTS   AGE
agreene-operators-nhmth                 1/1     Running       0          49s
busybox-7cb989cfcd-85hg5                1/1     Running       0          30s
busybox-844489d56c-br4k9                1/1     Terminating   0          4m36s
busybox-dependency-5b9958fd8f-85jkb     1/1     Terminating   0          4m36s
busybox-dependency-6fb84679cd-dnnkm     1/1     Running       0          28s

The operator has been upgraded to version 2.0.0 successfully. LGTM, marking as verified.
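
Checking the REPLACES column of the `oc get csv` listings above by script needs some care, because the column can be empty and DISPLAY values can contain spaces, so splitting on whitespace misaligns the fields. The sketch below slices each row at the header's column offsets instead. It assumes the fixed-width layout seen in the listings above, and the helper names `parse_oc_table` and `csvs_without_replaces` are hypothetical.

```python
import re
from typing import Dict, List


def parse_oc_table(table: str) -> List[Dict[str, str]]:
    """Parse fixed-width tabular output by slicing rows at header offsets."""
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    # Each header word marks the start offset of its column.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else len(line)
            row[name] = line[start:end].strip()  # empty cell -> ""
        rows.append(row)
    return rows


def csvs_without_replaces(rows: List[Dict[str, str]]) -> List[str]:
    """Names of CSVs whose REPLACES column is empty."""
    return [row["NAME"] for row in rows if not row.get("REPLACES")]
```

Applied to the listing taken after the catalog update, `csvs_without_replaces` would return an empty list, since both 2.0.0 CSVs name their 1.0.0 predecessors. An empty REPLACES value is also normal for a freshly installed operator, so the result only indicates a problem in the context of an upgrade.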

Comment 16 Evan Cordell 2020-04-30 16:42:08 UTC
*** Bug 1829955 has been marked as a duplicate of this bug. ***

Comment 17 Xiangjing Li 2020-05-29 23:48:48 UTC
Is there an OLM fix pack released that we can apply on OpenShift v4.3 to resolve the operator upgrade failure? Or will the fix only ship with the new OpenShift version (v4.5?)

Comment 19 errata-xmlrpc 2020-07-13 17:24:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

