Bug 2056697 - odf-csi-addons-operator subscription failed while using custom catalog source
Summary: odf-csi-addons-operator subscription failed while using custom catalog source
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Nitin Goyal
QA Contact: Rachael
URL:
Whiteboard:
Duplicates: 2066211
Depends On:
Blocks: 2091594 2093205
 
Reported: 2022-02-21 21:53 UTC by Yuli Persky
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version: 4.11.0-89
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2093205
Environment:
Last Closed: 2022-08-24 13:48:34 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage odf-operator pull 221 0 None Merged bundle: Add csi-addons to the dependencies.yaml 2022-06-03 09:24:39 UTC
Github red-hat-storage odf-operator pull 224 0 None open Bug 2056697:[release-4.11] bundle: Add csi-addons to the dependencies.yaml 2022-06-03 09:21:58 UTC

Internal Links: 2091594

Description Yuli Persky 2022-02-21 21:53:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The Storage System is stuck in status "Condition: Progressing" while deploying a cluster with a minimal deployment.


Version of all relevant components (if applicable):

(yulienv) [ypersky@ypersky ocs-ci]$ oc get csv -A
NAMESPACE                              NAME                   DISPLAY                       VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver          Package Server                0.19.0               Succeeded
openshift-storage                      mcg-operator.v4.10.0   NooBaa Operator               4.10.0               Succeeded
openshift-storage                      ocs-operator.v4.10.0   OpenShift Container Storage   4.10.0               Succeeded
openshift-storage                      odf-operator.v4.10.0   OpenShift Data Foundation     4.10.0               Succeeded
(yulienv) [ypersky@ypersky ocs-ci]$ 


ODF build: 4.10.0.161

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes! Deployment is stuck.

Is there any workaround available to the best of your knowledge?

No 


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3


Is this issue reproducible?

Yes, I saw it a number of times on various deployments.


Can this issue reproduce from the UI?

Yes 


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an OCP cluster with minimal resources (8 CPUs per worker node).
2. Start deploying ODF 4.10.0.160/161
3.


Actual results:

After creating the storage system, it stays stuck in Status Condition: Progressing indefinitely. The deployment is stuck with this warning:

 Warning  ReconcileFailed  22m   StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0


Expected results:

Storage System should reach status Ready. 

Additional info:

(yulienv) [ypersky@ypersky ocs-ci]$ oc describe storagesystem -A
Name:         ocs-storagecluster-storagesystem
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>
API Version:  odf.openshift.io/v1alpha1
Kind:         StorageSystem
Metadata:
  Creation Timestamp:  2022-02-21T21:30:03Z
  Finalizers:
    storagesystem.odf.openshift.io
  Generation:  1
  Managed Fields:
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:kind:
        f:name:
        f:namespace:
    Manager:      Mozilla
    Operation:    Update
    Time:         2022-02-21T21:30:03Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"storagesystem.odf.openshift.io":
    Manager:      manager
    Operation:    Update
    Time:         2022-02-21T21:30:03Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2022-02-21T21:30:03Z
  Resource Version:  49832
  UID:               62d24b32-c074-45c9-81f1-91722ca3da02
Spec:
  Kind:       storagecluster.ocs.openshift.io/v1
  Name:       ocs-storagecluster
  Namespace:  openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Reconcile is in progress
    Reason:                Reconciling
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Reconcile is in progress
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-02-21T21:51:55Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
    Reason:                NotReady
    Status:                False
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-02-21T21:30:03Z
    Last Transition Time:  2022-02-21T21:30:03Z
    Message:               Initializing StorageSystem
    Reason:                Init
    Status:                Unknown
    Type:                  VendorSystemPresent
Events:
  Type     Reason           Age   From                      Message
  ----     ------           ----  ----                      -------
  Warning  ReconcileFailed  22m   StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
(yulienv) [ypersky@ypersky ocs-ci]$

Comment 2 Yuli Persky 2022-02-21 22:03:45 UTC
Please note that the must-gather logs are available here:

rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/OCS/ocs-qe-bugs/bz-2056697/

Comment 4 Nitin Goyal 2022-02-22 03:27:59 UTC
odf-operator creates the odf-csi-addons-operator subscription, and it is not aware of the odf-catalogsource catalog; it only knows about redhat-operators. So whenever you create a catalogsource, name it redhat-operators. It will then work without any issues.



$ oc get catalogsources.operators.coreos.com -n openshift-marketplace
NAME                  DISPLAY                     TYPE   PUBLISHER   AGE
certified-operators   Certified Operators         grpc   Red Hat     6h19m
community-operators   Community Operators         grpc   Red Hat     6h19m
odf-catalogsource     OpenShift Data Foundation   grpc   Red Hat     5h59m
redhat-marketplace    Red Hat Marketplace         grpc   Red Hat     6h19m



$ oc get sub odf-csi-addons-operator -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-csi-addons-operator
  namespace: openshift-storage
spec:
  channel: stable-4.10
  installPlanApproval: Automatic
  name: odf-csi-addons-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: odf-csi-addons-operator.v4.10.0
status:
  conditions:
  - lastTransitionTime: "2022-02-21T21:26:08Z"
    message: targeted catalogsource openshift-marketplace/redhat-operators missing
    reason: UnhealthyCatalogSourceFound
    status: "True"
    type: CatalogSourcesUnhealthy
  - message: 'constraints not satisfiable: subscription odf-csi-addons-operator exists,
      no operators found from catalog redhat-operators in namespace openshift-marketplace
      referenced by subscription odf-csi-addons-operator'
    reason: ConstraintsNotSatisfiable
    status: "True"
    type: ResolutionFailed
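
The mismatch shown in the two outputs above (the Subscription targets redhat-operators, but no catalog of that name is listed) can be checked mechanically. The following is only a minimal sketch: `catalog_source_present` is a hypothetical helper, not part of `oc` or odf-operator, and the catalog names are copied by hand from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or odf-operator): succeed only if the
# catalog named in $1 appears among the remaining arguments.
catalog_source_present() {
  local wanted="$1"
  shift
  local name
  for name in "$@"; do
    [ "$name" = "$wanted" ] && return 0
  done
  return 1
}

# Catalog names copied from the `oc get catalogsources` output above.
catalogs="certified-operators community-operators odf-catalogsource redhat-marketplace"

# Word splitting of $catalogs is intentional: one argument per catalog name.
if catalog_source_present redhat-operators $catalogs; then
  echo "redhat-operators: present"
else
  echo "redhat-operators: missing"
fi
```

On a live cluster the inputs could presumably be read with `oc get sub odf-csi-addons-operator -n openshift-storage -o jsonpath='{.spec.source}'` and `oc get catalogsources.operators.coreos.com -n openshift-marketplace -o jsonpath='{.items[*].metadata.name}'` instead of being typed in.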

Comment 5 Mudit Agarwal 2022-02-22 03:56:19 UTC
Please try as suggested by Nitin.

Comment 6 Anna Sandler 2022-02-22 18:41:46 UTC
Before running oc create -f /<PATH>/catalog-source.yaml, we also run this command:

oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge

Maybe that is what caused the problem?

Comment 7 Nitin Goyal 2022-02-23 02:53:12 UTC
It should not. As you can see in the output above, there was no redhat-operators catalog. This means the command was run on the setup, but the user did not then create the redhat-operators catalog as expected.

Comment 8 Yuli Persky 2022-02-23 07:56:26 UTC
@Nitin,

I've deployed another OCP 4.10 cluster and then created a catalog source with this content:

======================================

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    ocs-operator-internal: 'true'
  name: odf-catalogsource
  namespace: openshift-marketplace
spec:
  displayName: OpenShift Data Foundation
  icon:
    base64data: PHN2ZyBpZD0iTGF5ZXJfMSIgZGF0YS1uYW1lPSJMYXllciAxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAxOTIgMTQ1Ij48ZGVmcz48c3R5bGU+LmNscy0xe2ZpbGw6I2UwMDt9PC9zdHlsZT48L2RlZnM+PHRpdGxlPlJlZEhhdC1Mb2dvLUhhdC1Db2xvcjwvdGl0bGU+PHBhdGggZD0iTTE1Ny43Nyw2Mi42MWExNCwxNCwwLDAsMSwuMzEsMy40MmMwLDE0Ljg4LTE4LjEsMTcuNDYtMzAuNjEsMTcuNDZDNzguODMsODMuNDksNDIuNTMsNTMuMjYsNDIuNTMsNDRhNi40Myw2LjQzLDAsMCwxLC4yMi0xLjk0bC0zLjY2LDkuMDZhMTguNDUsMTguNDUsMCwwLDAtMS41MSw3LjMzYzAsMTguMTEsNDEsNDUuNDgsODcuNzQsNDUuNDgsMjAuNjksMCwzNi40My03Ljc2LDM2LjQzLTIxLjc3LDAtMS4wOCwwLTEuOTQtMS43My0xMC4xM1oiLz48cGF0aCBjbGFzcz0iY2xzLTEiIGQ9Ik0xMjcuNDcsODMuNDljMTIuNTEsMCwzMC42MS0yLjU4LDMwLjYxLTE3LjQ2YTE0LDE0LDAsMCwwLS4zMS0zLjQybC03LjQ1LTMyLjM2Yy0xLjcyLTcuMTItMy4yMy0xMC4zNS0xNS43My0xNi42QzEyNC44OSw4LjY5LDEwMy43Ni41LDk3LjUxLjUsOTEuNjkuNSw5MCw4LDgzLjA2LDhjLTYuNjgsMC0xMS42NC01LjYtMTcuODktNS42LTYsMC05LjkxLDQuMDktMTIuOTMsMTIuNSwwLDAtOC40MSwyMy43Mi05LjQ5LDI3LjE2QTYuNDMsNi40MywwLDAsMCw0Mi41Myw0NGMwLDkuMjIsMzYuMywzOS40NSw4NC45NCwzOS40NU0xNjAsNzIuMDdjMS43Myw4LjE5LDEuNzMsOS4wNSwxLjczLDEwLjEzLDAsMTQtMTUuNzQsMjEuNzctMzYuNDMsMjEuNzdDNzguNTQsMTA0LDM3LjU4LDc2LjYsMzcuNTgsNTguNDlhMTguNDUsMTguNDUsMCwwLDEsMS41MS03LjMzQzIyLjI3LDUyLC41LDU1LC41LDc0LjIyYzAsMzEuNDgsNzQuNTksNzAuMjgsMTMzLjY1LDcwLjI4LDQ1LjI4LDAsNTYuNy0yMC40OCw1Ni43LTM2LjY1LDAtMTIuNzItMTEtMjcuMTYtMzAuODMtMzUuNzgiLz48L3N2Zz4=
    mediatype: image/svg+xml
  image: quay.io/rhceph-dev/ocs-registry:4.10.0-161
  priority: 100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m
(yulienv) [ypersky@ypersky ocs-ci

======================================


(yulienv) [ypersky@ypersky ocs-ci]$ oc get catalogsources.operators.coreos.com -n openshift-marketplace
NAME                  DISPLAY                     TYPE   PUBLISHER   AGE
certified-operators   Certified Operators         grpc   Red Hat     11h
community-operators   Community Operators         grpc   Red Hat     11h
odf-catalogsource     OpenShift Data Foundation   grpc   Red Hat     7m29s
redhat-marketplace    Red Hat Marketplace         grpc   Red Hat     11h
redhat-operators      Red Hat Operators           grpc   Red Hat     11h
(yulienv) [ypersky@ypersky ocs-ci]$ 


And the StorageSystem is still stuck in Progressing:

Events:
  Type     Reason           Age    From                      Message
  ----     ------           ----   ----                      -------
  Warning  ReconcileFailed  5m6s   StorageSystem controller  CSV is not successfully installed; InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
  Warning  ReconcileFailed  4m55s  StorageSystem controller  InstallPlan not found for CSV odf-csi-addons-operator.v4.10.0
(yulienv) [ypersky@ypersky ocs-ci]$ 


It looks like the problem is persistent.

You are welcome to check this cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10227/

Comment 10 Yuli Persky 2022-02-23 10:05:18 UTC
@Nitin, 

I will try those exact steps and let you know.

Comment 11 Yuli Persky 2022-02-24 11:19:12 UTC
@Nitin,

I've performed the following steps: 

1) Deployed another OCP cluster
2) After a successful OCP deployment, ran this command:

oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge

3) Created a catalog source with the content in comment#9

Result: Storage System status is StorageSystem
However, the following event appears: 


  Warning  ReconcileFailed  39s   StorageSystem controller  StorageCluster.ocs.openshift.io "ocs-storagecluster" not found


The cluster details are : https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10253/

(yulienv) [ypersky@ypersky ocs-ci]$ oc describe storagesystem -A
Name:         ocs-storagecluster-storagesystem
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>
API Version:  odf.openshift.io/v1alpha1
Kind:         StorageSystem
Metadata:
  Creation Timestamp:  2022-02-24T09:50:19Z
  Finalizers:
    storagesystem.odf.openshift.io
  Generation:  1
  Managed Fields:
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:kind:
        f:name:
        f:namespace:
    Manager:      Mozilla
    Operation:    Update
    Time:         2022-02-24T09:50:19Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"storagesystem.odf.openshift.io":
    Manager:      manager
    Operation:    Update
    Time:         2022-02-24T09:50:19Z
    API Version:  odf.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2022-02-24T09:50:19Z
  Resource Version:  511934
  UID:               a7edcf27-4e4b-4035-8426-f3d0eb007c8e
Spec:
  Kind:       storagecluster.ocs.openshift.io/v1
  Name:       ocs-storagecluster
  Namespace:  openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Reason:                Ready
    Status:                True
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-02-24T09:50:19Z
    Last Transition Time:  2022-02-24T09:50:19Z
    Reason:                Found
    Status:                True
    Type:                  VendorSystemPresent
Events:
  Type     Reason           Age    From                      Message
  ----     ------           ----   ----                      -------
  Warning  ReconcileFailed  3m57s  StorageSystem controller  StorageCluster.ocs.openshift.io "ocs-storagecluster" not found
(yulienv) [ypersky@ypersky ocs-ci]$ 


Please advise.

Comment 13 Yuli Persky 2022-02-24 11:49:08 UTC
The must-gather logs are available here (if needed in the future):

rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/OCS/ocs-qe-bugs/bz-2056697/logs-20220224-115534/

Comment 14 Mudit Agarwal 2022-02-28 09:06:40 UTC
Nitin has mentioned in #comment12 that the final status is good, so I am not sure why this BZ was reopened. Please provide a justification.
If it was reopened because of the failure event, that was just an intermittent error which the operator rectified automatically.

Comment 20 Liutauras 2022-05-20 08:14:47 UTC
Same here: odf-csi-addons-operator could not be installed in a disconnected cluster.
```
# oc get subscriptions
NAME                                                                   PACKAGE                   SOURCE                  CHANNEL
mcg-operator-stable-4.10-redhat-operator-index-openshift-marketplace   mcg-operator              redhat-operator-index   stable-4.10
ocs-operator-stable-4.10-redhat-operator-index-openshift-marketplace   ocs-operator              redhat-operator-index   stable-4.10
odf-csi-addons-operator                                                odf-csi-addons-operator   redhat-operators        stable-4.10
odf-operator                                                           odf-operator              redhat-operator-index   stable-4.10

```

`redhat-operator-index` is the name of the CatalogSource that points to the private registry. All operators were installed from it except odf-csi-addons-operator.
The `redhat-operators` source is disabled in my cluster.
I can install that operator in a "connected" cluster.
I fixed the problem with:
```
oc patch subscription/odf-csi-addons-operator --type merge -p '{"spec":{"source":"redhat-operator-index"}}'
```
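
The same workaround can be parameterized for clusters whose mirrored catalog has a different name. This is only a sketch: `subscription_source_patch` is a hypothetical helper, and the script prints the `oc patch` command for review instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON merge patch that points a Subscription
# at the catalog source named in $1.
subscription_source_patch() {
  printf '{"spec":{"source":"%s"}}' "$1"
}

# Catalog name taken from the comment above; adjust it to the local catalog.
catalog="redhat-operator-index"

# Print the command rather than executing it, so it can be checked first.
echo "oc patch subscription/odf-csi-addons-operator --type merge -p '$(subscription_source_patch "$catalog")'"
```

As written, the printed command relies on the current project being the one that holds the subscription (openshift-storage in the output from comment 4); adding `-n openshift-storage` would make that explicit.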

Comment 24 Chen 2022-05-26 14:31:27 UTC
Hi Nitin,

Thank you for your reply.

Yes, I fully understand that we can either change the subscription source or rename the CatalogSource. But customers in disconnected environments will most likely hit this problem every time, because opm and oc-mirror, which are used to create the disconnected CatalogSource, generate a CatalogSource CR with the default name "redhat-operator-index", not "redhat-operators". That is why I suggested that we either add a warning or note to the disconnected ODF docs, or restructure the ODF CSV so that the subscription source does not have to be the hard-coded name "redhat-operators". Otherwise, customers who follow our official docs to set up a disconnected OperatorHub will hit this problem.
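
For illustration, the renaming option mentioned above could look roughly like the following. This is a sketch under assumptions: `render_catalogsource` is a hypothetical helper, the image reference is a placeholder rather than a real mirror location, and the manifest fields are the subset already shown in the CatalogSource from comment 8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a CatalogSource manifest with the given name
# ($1) and index image ($2). Fields mirror the CatalogSource in comment 8.
render_catalogsource() {
  local name="$1"
  local image="$2"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ${name}
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: ${image}
EOF
}

# The image below is a placeholder, not a real registry path.
render_catalogsource redhat-operators registry.example.com/olm/redhat-operator-index:v4.10
```

The rendered manifest could then be piped to `oc apply -f -`, presumably after the default redhat-operators source has been disabled with the `oc patch operatorhub.config.openshift.io/cluster` command quoted earlier in this bug.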

Comment 26 Nitin Goyal 2022-05-30 08:02:17 UTC
*** Bug 2066211 has been marked as a duplicate of this bug. ***

Comment 41 errata-xmlrpc 2022-08-24 13:48:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

