Bug 1818788
Summary: | Operator update is failing due to missing replace field in Operator CSV | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Simon Reber <sreber> |
Component: | OLM | Assignee: | Alexander Greene <agreene> |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | agreene, ecordell, gvillani, kuiwang, rcernich, robertodocampo, stwalter, xiangli |
Version: | 4.3.z | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: If an operator is being upgraded that provides a required API whose GVK has not changed since the previous version of the operator and the operator that depends on the API uses a skipRange instead of the Spec.Replaces field, OLM fails to generate the "upgraded CSV" with the correct replaces field. Specifically, OLM would:
1. Add the new operator to the generation, marking the APIs it provides as "present".
2. Remove the old operator from the generation, marking the APIs it provides as "absent", despite being provided by the new version of the operator.
3. Attempt to resolve the "missing" APIs, overwriting the new version of the operator with a copy that does not have its Spec.Replaces field set.
Consequence: Certain operators would fail to upgrade to new versions.
Fix: OLM was updated to remove the old operator from the current generation before adding the new operator to the generation.
Result: The upgrade will succeed as expected.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-13 17:24:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1827821, 1828007 |
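The ordering problem summarized in the Doc Text field can be illustrated with a small simulation. This is only a sketch of the bookkeeping described there, not OLM's actual code: the `Operator` and `Generation` classes, the `resolve_missing` and `run_upgrade` helpers, and the API label `example.com/v1/BusyboxDependency` are all hypothetical names made up for illustration. The operator names are borrowed from the verification steps later in this report.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Operator:
    name: str
    provided_apis: FrozenSet[str]
    replaces: Optional[str] = None  # stands in for Spec.Replaces


@dataclass
class Generation:
    """Hypothetical model of a resolver generation."""
    operators: Dict[str, Operator] = field(default_factory=dict)
    present_apis: Set[str] = field(default_factory=set)

    def add(self, op: Operator) -> None:
        # Adding an operator marks every API it provides as present.
        self.operators[op.name] = op
        self.present_apis |= op.provided_apis

    def remove(self, op: Operator) -> None:
        # Removing an operator marks its APIs as absent, even if another
        # operator in the generation still provides them.
        self.operators.pop(op.name, None)
        self.present_apis -= op.provided_apis


def resolve_missing(gen: Generation, required: Set[str],
                    catalog: List[Operator]) -> None:
    # Any required API that looks absent is satisfied with a fresh copy
    # from the catalog, which carries no replaces value.
    for api in sorted(required - gen.present_apis):
        for candidate in catalog:
            if api in candidate.provided_apis:
                gen.add(candidate)
                break


def run_upgrade(remove_old_first: bool) -> Optional[str]:
    api = "example.com/v1/BusyboxDependency"  # hypothetical GVK label
    old = Operator("busybox-dependency.v1.0.0", frozenset({api}))
    new_in_catalog = Operator("busybox-dependency.v2.0.0", frozenset({api}))
    # The "upgraded CSV": the catalog entry with replaces filled in.
    upgraded = replace(new_in_catalog, replaces=old.name)

    gen = Generation()
    gen.add(old)
    if remove_old_first:   # order after the fix
        gen.remove(old)
        gen.add(upgraded)
    else:                  # order before the fix
        gen.add(upgraded)
        gen.remove(old)
    resolve_missing(gen, {api}, [new_in_catalog])
    return gen.operators[upgraded.name].replaces


if __name__ == "__main__":
    print("add then remove:", run_upgrade(remove_old_first=False))
    print("remove then add:", run_upgrade(remove_old_first=True))
```

In this model, adding before removing leaves the required API marked absent, so the resolver re-adds the catalog copy and the replaces value is lost. Removing first leaves nothing to resolve, and the replaces value survives.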
Description
Simon Reber
2020-03-30 11:36:48 UTC
*** Bug 1821175 has been marked as a duplicate of this bug. ***

That is exactly the case we encountered when installing our ACM (open multicluster management) product in OpenShift. There is an ACM installer operator `multiclusterhub-operator.v0.0.1`, and it requires a community operator, `multicluster-operators-subscription`. After we upgraded multicluster-operators-subscription from v0.1.4 to v0.1.5, the automatic upgrade to v0.1.5 got stuck in the ACM environment where v0.1.4 is installed. As shown below, the old version v0.1.4 is still active, and v0.1.5 is not replacing v0.1.4 as expected. That caused the "conflicting CRD owner in namespace" error.

    $ oc get csv --all-namespaces
    NAMESPACE                NAME                                         DISPLAY                              VERSION   REPLACES   PHASE
    open-cluster-management  multicluster-operators-subscription.v0.1.4   Multicluster Subscription Operator   0.1.4                Succeeded
    open-cluster-management  multicluster-operators-subscription.v0.1.5   Multicluster Subscription Operator   0.1.5                Failed
    open-cluster-management  multiclusterhub-operator.v0.0.1              Multiclusterhub Operator             0.0.1                Succeeded

By comparison, the subscription operator is automatically upgraded to v0.1.5 correctly if we installed v0.1.4 via the OpenShift OperatorHub GUI. As shown below, there is only one active CSV:

    $ oc get csv -n openshift-operators
    NAME                                         DISPLAY                              VERSION   REPLACES                                     PHASE
    multicluster-operators-subscription.v0.1.5   Multicluster Subscription Operator   0.1.5     multicluster-operators-subscription.v0.1.4   Succeeded

It seems the issue happens when a dependent operator is upgraded. From what I have observed, the upgrade is not stuck if the operator was installed directly from OperatorHub, meaning the operator is not a dependent one.
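The symptom in the comment above (two CSVs of the same package active in one namespace, with neither listed as replacing the other) can be checked mechanically. The following is a minimal sketch, assuming the CSV data has already been collected as `(namespace, name, replaces)` tuples and that CSV names follow the `<package>.v<version>` pattern seen in the listings. The helper names `package_of` and `find_unreplaced` are hypothetical and not part of any OLM or `oc` tooling.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

CsvRow = Tuple[str, str, Optional[str]]  # (namespace, CSV name, replaces)


def package_of(csv_name: str) -> str:
    # Assumption: names look like "<package>.v<version>".
    return csv_name.rsplit(".v", 1)[0]


def find_unreplaced(rows: List[CsvRow]) -> List[Tuple[str, str]]:
    """Return (namespace, package) pairs that have several CSVs,
    none of which replaces another CSV in the same group."""
    groups: Dict[Tuple[str, str], List[Tuple[str, Optional[str]]]] = defaultdict(list)
    for namespace, name, replaces in rows:
        groups[(namespace, package_of(name))].append((name, replaces))

    suspicious = []
    for key, csvs in groups.items():
        names = {name for name, _ in csvs}
        replaced = {rep for _, rep in csvs if rep}
        if len(csvs) > 1 and not (replaced & names):
            suspicious.append(key)
    return suspicious


# Data transcribed from the two listings in the comment above.
stuck_rows = [
    ("open-cluster-management", "multicluster-operators-subscription.v0.1.4", None),
    ("open-cluster-management", "multicluster-operators-subscription.v0.1.5", None),
    ("open-cluster-management", "multiclusterhub-operator.v0.0.1", None),
]
healthy_rows = [
    ("openshift-operators", "multicluster-operators-subscription.v0.1.5",
     "multicluster-operators-subscription.v0.1.4"),
]

if __name__ == "__main__":
    print(find_unreplaced(stuck_rows))
    print(find_unreplaced(healthy_rows))
```

A package that is mid-upgrade normally also shows two CSVs for a short time, but the newer one names the older in its replaces value, so this check does not flag it.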
Cluster version is 4.5.0-0.nightly-2020-04-25-170442

    mac:~ jianzhang$ oc exec catalog-operator-57f779987b-lpwf6 -- olm --version
    OLM version: 0.14.2
    git commit: 280a2a64115aa0388c11c5472188cd3169e05661

Steps:

1, installed a catsrc that pointed to the catalog image that only contained the 1.0.0 versions of the operator

    mac:~ jianzhang$ oc create -f cs-1818788.yaml
    catalogsource.operators.coreos.com/agreene-operators created
    mac:~ jianzhang$ cat cs-1818788.yaml
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: agreene-operators
      namespace: openshift-marketplace
    spec:
      displayName: Agreene Operators
      image: quay.io/agreene/busybox-dependencies:old
      sourceType: grpc
    mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
    NAME                  DISPLAY               TYPE   PUBLISHER   AGE
    agreene-operators     Agreene Operators     grpc               16s
    certified-operators   Certified Operators   grpc   Red Hat     43m
    community-operators   Community Operators   grpc   Red Hat     43m
    redhat-marketplace    Red Hat Marketplace   grpc   Red Hat     43m
    redhat-operators      Red Hat Operators     grpc   Red Hat     43m
    mac:~ jianzhang$ oc get pods -n openshift-marketplace
    NAME                                    READY   STATUS    RESTARTS   AGE
    agreene-operators-8cc6n                 1/1     Running   0          33s
    certified-operators-6bb9c54bc-pdfhq     1/1     Running   0          43m
    community-operators-7c8bb898b5-t2kmd    1/1     Running   0          43m
    marketplace-operator-684575bdb9-t929c   1/1     Running   0          44m
    redhat-marketplace-6c598b5785-fmj2c     1/1     Running   0          43m
    redhat-operators-5c4dd844cf-488tt       1/1     Running   0          43m
    mac:~ jianzhang$ oc get packagemanifest|grep busy
    busybox              Agreene Operators   50s
    busybox-dependency   Agreene Operators   50s

2, created an OperatorGroup and a subscription

    mac:~ jianzhang$ oc get og -n openshift-marketplace
    NAME      AGE
    test-og   32s
    mac:~ jianzhang$ cat og.yaml
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: test-og
      namespace: openshift-marketplace
    spec:
      targetNamespaces:
      - openshift-marketplace
    mac:~ jianzhang$ oc create -f sub-1818788.yaml
    subscription.operators.coreos.com/busybox created
    mac:~ jianzhang$ cat sub-1818788.yaml
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: busybox
      namespace: openshift-marketplace
    spec:
      channel: "alpha"
      installPlanApproval: Automatic
      name: busybox
      source: agreene-operators
      sourceNamespace: openshift-marketplace
      startingCSV: busybox.v1.0.0
    mac:~ jianzhang$ oc get sub
    NAME                                                               PACKAGE              SOURCE              CHANNEL
    busybox                                                            busybox              agreene-operators   alpha
    busybox-dependency-alpha-agreene-operators-openshift-marketplace   busybox-dependency   agreene-operators   alpha
    mac:~ jianzhang$ oc get csv
    NAME                        DISPLAY              VERSION   REPLACES   PHASE
    busybox-dependency.v1.0.0   busybox-dependency   1.0.0                Succeeded
    busybox.v1.0.0              busybox              1.0.0                Succeeded

3, Update this CatalogSource image (quay.io/agreene/busybox-dependencies:old) to the new one: quay.io/agreene/busybox-dependencies:new (contains 2.0.0 version)

    mac:~ jianzhang$ oc edit catalogsource agreene-operators
    catalogsource.operators.coreos.com/agreene-operators edited
    mac:~ jianzhang$ oc get csv
    NAME                        DISPLAY              VERSION   REPLACES                    PHASE
    busybox-dependency.v2.0.0   busybox-dependency   2.0.0     busybox-dependency.v1.0.0   Succeeded
    busybox.v2.0.0              busybox              2.0.0     busybox.v1.0.0              Succeeded
    mac:~ jianzhang$ oc get pods
    NAME                                  READY   STATUS        RESTARTS   AGE
    agreene-operators-nhmth               1/1     Running       0          49s
    busybox-7cb989cfcd-85hg5              1/1     Running       0          30s
    busybox-844489d56c-br4k9              1/1     Terminating   0          4m36s
    busybox-dependency-5b9958fd8f-85jkb   1/1     Terminating   0          4m36s
    busybox-dependency-6fb84679cd-dnnkm   1/1     Running       0          28s

The operator has been upgraded to "2.0.0" version successfully. LGTM, verify it.

*** Bug 1829955 has been marked as a duplicate of this bug. ***

Is there a OLM fix pack released so that we can patch it on openShift v4.3 to resolve operator upgrade failure? Or will the fix be shipped along with the new openShift version (v4.5?)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409