Bug 1827821 - Operator update is failing due to missing replace field in Operator CSV
Summary: Operator update is failing due to missing replace field in Operator CSV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Alexander Greene
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1828007
Depends On: 1818788
Blocks: 1827822
Reported: 2020-04-24 21:19 UTC by Alexander Greene
Modified: 2023-09-07 22:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When an operator being upgraded provides a required API whose GVK has not changed since its previous version, and the operator that depends on that API uses a skipRange instead of the Spec.Replaces field, OLM fails to generate the "upgraded CSV" with the correct replaces field. Specifically, OLM would:
1. Add the new operator to the generation, marking the APIs it provides as "present".
2. Remove the old operator from the generation, marking the APIs it provides as "absent", even though the new version of the operator still provides them.
3. Attempt to resolve the "missing" APIs, overwriting the new version of the operator with a copy that does not have its Spec.Replaces field set.
Consequence: Certain operators would fail to upgrade to new versions.
Fix: OLM was updated to remove the old operator from the current generation before adding the new operator to the generation.
Result: The upgrade succeeds as expected.
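
The ordering problem described in the Doc Text can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Operator` and `Generation` classes, the `upgrade` and `simulate` functions, and the API name `example.com/v1/Dependency` are hypothetical and are not taken from the OLM code base. The operator names are borrowed from the verification comment further down in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for a CSV: a name, provided APIs, and replaces."""
    name: str
    provided_apis: FrozenSet[str]
    replaces: Optional[str] = None


@dataclass
class Generation:
    """Hypothetical stand-in for a resolver generation."""
    operators: Dict[str, Operator] = field(default_factory=dict)
    present_apis: Set[str] = field(default_factory=set)

    def add(self, op: Operator) -> None:
        # Adding an operator marks every API it provides as present.
        self.operators[op.name] = op
        self.present_apis |= op.provided_apis

    def remove(self, op: Operator) -> None:
        # Removing an operator marks its APIs as absent, even if another
        # operator in the generation still provides them.
        self.operators.pop(op.name, None)
        self.present_apis -= op.provided_apis


def upgrade(gen, old, new, catalog_copy, required, remove_old_first):
    """Swap old for new, then resolve any required API that looks missing."""
    if remove_old_first:
        gen.remove(old)   # order described in the Fix
        gen.add(new)
    else:
        gen.add(new)      # order described in the Cause
        gen.remove(old)
    if not required <= gen.present_apis:
        # Resolving the "missing" API pulls in a fresh copy that has no
        # replaces value and overwrites the entry for the new operator.
        gen.add(catalog_copy)
    return gen.operators[new.name]


def simulate(remove_old_first: bool) -> Optional[str]:
    """Return the replaces value of the upgraded operator after the swap."""
    api = frozenset({"example.com/v1/Dependency"})  # hypothetical GVK
    old = Operator("busybox-dependency.v1.0.0", api)
    new = Operator("busybox-dependency.v2.0.0", api, replaces=old.name)
    catalog_copy = Operator(new.name, api)  # same operator, replaces unset
    gen = Generation()
    gen.add(old)
    return upgrade(gen, old, new, catalog_copy, set(api), remove_old_first).replaces
```

In this model, `simulate(remove_old_first=False)` ends with the replaces value lost, while `simulate(remove_old_first=True)` keeps it, which mirrors the Cause and Fix above in a simplified form.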
Clone Of:
Environment:
Last Closed: 2020-05-18 13:35:02 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 1484 | 0 | None | closed | Bug 1827821: Generation bug 4.4 Backport | 2021-02-19 14:17:36 UTC
Red Hat Product Errata | RHBA-2020:2133 | 0 | None | None | None | 2020-05-18 13:35:23 UTC

Description Alexander Greene 2020-04-24 21:19:51 UTC
This bug was initially created as a copy of Bug #1818788

I am copying this bug because: 



Description of problem:

Updating to OpenShift Container Platform 4.3.8 also triggered an update of `elasticsearch-operator`. That update failed and got stuck because the CSV was not rolled over correctly, so the CSV for the older version remained active alongside the CSV for the new version.

This caused an ownership conflict that could only be resolved by manually removing the CSV of the old `elasticsearch` operator version.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.3.8


How reproducible:

 - N/A


Steps to Reproduce:
1. N/A

Actual results:

The update of `elasticsearch-operator` was stuck, which prevented additional operators from being installed.

Expected results:

The update works, and a single failing operator does not block operator installation and update capabilities as a whole.

Additional info:

Comment 2 Alexander Greene 2020-04-26 15:51:43 UTC
*** Bug 1828007 has been marked as a duplicate of this bug. ***

Comment 6 Jian Zhang 2020-05-09 03:54:50 UTC
Cluster version is 4.4.0-0.nightly-2020-05-08-202645

mac:~ jianzhang$ oc -n openshift-operator-lifecycle-manager exec catalog-operator-6dfff7dbcc-9qhp8 -- olm --version
OLM version: 0.14.2
git commit: f8ef76c241abfeeb45fa680599b9c683ec3173cf

1, installed a CatalogSource pointing to a catalog image that contained only the 1.0.0 versions of the operators
mac:~ jianzhang$ oc create -f cs-1818788.yaml 
catalogsource.operators.coreos.com/agreene-operators created

mac:~ jianzhang$ cat cs-1818788.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: agreene-operators
  namespace: openshift-marketplace
spec:
  displayName: Agreene Operators
  image: quay.io/agreene/busybox-dependencies:old
  sourceType: grpc

mac:~ jianzhang$ oc project openshift-marketplace
Now using project "openshift-marketplace" on server "https://api.ci-ln-49c5tdb-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443".
mac:~ jianzhang$ oc get catalogsource
NAME                  DISPLAY               TYPE   PUBLISHER   AGE
agreene-operators     Agreene Operators     grpc               25s
certified-operators   Certified Operators   grpc   Red Hat     18m
community-operators   Community Operators   grpc   Red Hat     18m
redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     18m
redhat-operators      Red Hat Operators     grpc   Red Hat     18m
mac:~ jianzhang$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
agreene-operators-698zv                1/1     Running   0          30s
certified-operators-6f864f4557-b2k4l   1/1     Running   0          18m
community-operators-585dcd69dc-c2gcp   1/1     Running   0          18m
marketplace-operator-fc5546ffb-9gbrf   1/1     Running   0          18m
redhat-marketplace-7b995fdb4d-r6lf6    1/1     Running   0          18m
redhat-operators-8496cc7d5b-cklzb      1/1     Running   0          18m

mac:~ jianzhang$ oc get packagemanifest|grep busy
busybox-dependency                           Agreene Operators     61s
busybox                                      Agreene Operators     61s

2, created an OperatorGroup and a subscription 

mac:~ jianzhang$ oc create -f og.yaml 
operatorgroup.operators.coreos.com/test-og created
mac:~ jianzhang$ oc get og
NAME      AGE
test-og   4s
mac:~ jianzhang$ cat og.yaml 
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-og
  namespace: openshift-marketplace
spec:
  targetNamespaces:
  - openshift-marketplace

mac:~ jianzhang$ oc create -f sub-1818788.yaml 
subscription.operators.coreos.com/busybox created
mac:~ jianzhang$ cat sub-1818788.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: busybox
  namespace: openshift-marketplace
spec:
  channel: "alpha"
  installPlanApproval: Automatic
  name: busybox
  source: agreene-operators
  sourceNamespace: openshift-marketplace
  startingCSV: busybox.v1.0.0

mac:~ jianzhang$ oc get sub
NAME                                                               PACKAGE              SOURCE              CHANNEL
busybox                                                            busybox              agreene-operators   alpha
busybox-dependency-alpha-agreene-operators-openshift-marketplace   busybox-dependency   agreene-operators   alpha
mac:~ jianzhang$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES   PHASE
busybox-dependency.v1.0.0   busybox-dependency   1.0.0                Succeeded
busybox.v1.0.0              busybox              1.0.0                Succeeded

3, updated the CatalogSource image from quay.io/agreene/busybox-dependencies:old to the new one, quay.io/agreene/busybox-dependencies:new (which contains the 2.0.0 versions)

mac:~ jianzhang$ oc edit catalogsource agreene-operators
catalogsource.operators.coreos.com/agreene-operators edited

mac:~ jianzhang$ oc get csv
NAME                        DISPLAY              VERSION   REPLACES                    PHASE
busybox-dependency.v2.0.0   busybox-dependency   2.0.0     busybox-dependency.v1.0.0   Succeeded
busybox.v2.0.0              busybox              2.0.0     busybox.v1.0.0              Succeeded

mac:~ jianzhang$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
agreene-operators-ngpbh                1/1     Running   0          2m4s
busybox-8598cc9bcb-x9pv5               1/1     Running   0          105s
busybox-dependency-76f7c74648-t5rbc    1/1     Running   0          101s
...

Both operators have been upgraded to version 2.0.0 successfully, and each new CSV has its REPLACES field set. LGTM, marking as verified.
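
The REPLACES check done by eye above could also be scripted. The following is a minimal sketch, assuming the text passed in is the default column-aligned table printed by `oc get csv`. The function names `parse_table` and `csvs_missing_replaces` are hypothetical helpers, not part of any `oc` or OLM tooling.

```python
import re
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a column-aligned table by slicing rows at the header offsets."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # Record each column name together with the offset where it starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            # Slicing by offset keeps empty cells (such as a blank REPLACES)
            # from shifting the later columns to the left.
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def csvs_missing_replaces(text: str) -> List[str]:
    """Return the names of CSVs whose REPLACES column is empty."""
    return [row["NAME"] for row in parse_table(text) if not row.get("REPLACES")]
```

An empty REPLACES value is expected for a fresh 1.0.0 install, so this check is only meaningful when run against the CSV list after an upgrade.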

Comment 8 errata-xmlrpc 2020-05-18 13:35:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2133

