2078802 – Consumer add-on status is ready when storagecluster is in error after provider cluster is uninstalled

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	odf-managed-service
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Ohad
QA Contact:	Neha Berry
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-04-26 09:27 UTC by suchita
Modified:	2024-07-11 10:26 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-07-11 10:26:45 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	SDA-5896	0	None	None	None	2022-06-08 14:36:37 UTC

Internal Links: 2078947

Description suchita 2022-04-26 09:27:15 UTC

Description of problem:
When the provider is uninstalled before the consumer, the consumer storage cluster state changes to Error and the ocs-osd-deployer CSV shows the Installing status.
However, the consumer add-on still shows `ready` status:

$ rosa list add-on -c sgatfane-c1-am | grep ocs-consumer-qe
ocs-consumer-qe             Red Hat OpenShift Data Foundation Managed Service Consumer (QE)       ready


Version-Release number of selected component (if applicable):
OPENSHIFT_VERSION : 4.10.9
========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.1                   OCS OSD Deployer              2.0.1             ocs-osd-deployer.v2.0.0                   Succeeded
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded

How reproducible:
1/1

Steps to Reproduce:
1. Create an appliance model provider cluster using the `rosa create service ...` command.
2. Create a consumer cluster with a consumer add-on installed on it.
3. Ensure that the provider and consumer are connected and the add-on is in the ready state.
4. Uninstall the provider service using `rosa delete service --id= `.

Actual results:
The consumer add-on shows `ready` status.

Expected results:
The consumer add-on should show an appropriate error/failed status.
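
The mismatch between actual and expected results can be checked mechanically. The following is a minimal sketch, not part of the product: the function names are hypothetical, the column positions are taken from the command output pasted in this report, and the rule "add-on `ready` plus storagecluster `Error` is a mismatch" is an assumption derived from the expected results above.

```shell
# Hypothetical helpers; column positions are assumed from the output in this report.

# Print the state (last column) of the named add-on from
# `rosa list add-on` output supplied on stdin.
addon_state_from_listing() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Print the PHASE (third column) of the named storagecluster from
# `oc get storagecluster` output supplied on stdin.
storagecluster_phase_from_listing() {
    awk -v name="$1" '$1 == name { print $3 }'
}

# Return non-zero when the add-on reports "ready" while the
# storagecluster phase is "Error" (the situation this bug describes).
addon_state_consistent() {
    addon_state="$1"
    sc_phase="$2"
    if [ "$addon_state" = "ready" ] && [ "$sc_phase" = "Error" ]; then
        echo "MISMATCH: add-on is '$addon_state' but storagecluster is '$sc_phase'"
        return 1
    fi
    echo "OK: add-on is '$addon_state', storagecluster is '$sc_phase'"
    return 0
}

# Example wiring (cluster and add-on names from this report; the
# openshift-storage namespace is an assumption):
#   state=$(rosa list add-on -c sgatfane-c1-am | addon_state_from_listing ocs-consumer-qe)
#   phase=$(oc get storagecluster -n openshift-storage | storagecluster_phase_from_listing ocs-storagecluster)
#   addon_state_consistent "$state" "$phase"
```

The parsing is kept separate from the comparison so that the check can be run against saved output as well as against a live cluster.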

Additional info:


Mon Apr 25 15:38:34 UTC 2022
--------------
========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.1                   OCS OSD Deployer              2.0.1             ocs-osd-deployer.v2.0.0                   Installing
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.408-c2256a2   Route Monitor Operator        0.1.408-c2256a2   route-monitor-operator.v0.1.406-54ff884   Succeeded
--------------
=======PODS ======
NAME                                               READY   STATUS    RESTARTS     AGE   IP             NODE                           NOMINATED NODE   READINESS GATES
alertmanager-managed-ocs-alertmanager-0            2/2     Running   0            9h    10.128.2.48    ip-10-0-128-13.ec2.internal    <none>           <none>
alertmanager-managed-ocs-alertmanager-1            2/2     Running   0            9h    10.128.2.50    ip-10-0-128-13.ec2.internal    <none>           <none>
alertmanager-managed-ocs-alertmanager-2            2/2     Running   0            9h    10.128.2.51    ip-10-0-128-13.ec2.internal    <none>           <none>
csi-addons-controller-manager-6849d8f79d-t6z4j     2/2     Running   0            9h    10.129.2.8     ip-10-0-158-82.ec2.internal    <none>           <none>
csi-cephfsplugin-dk6n9                             3/3     Running   0            9h    10.0.128.13    ip-10-0-128-13.ec2.internal    <none>           <none>
csi-cephfsplugin-fbmlc                             3/3     Running   3            9h    10.0.166.153   ip-10-0-166-153.ec2.internal   <none>           <none>
csi-cephfsplugin-provisioner-7ccffbd5d5-hdz4t      6/6     Running   0            9h    10.129.2.14    ip-10-0-158-82.ec2.internal    <none>           <none>
csi-cephfsplugin-provisioner-7ccffbd5d5-zdxr5      6/6     Running   0            9h    10.128.2.57    ip-10-0-128-13.ec2.internal    <none>           <none>
csi-cephfsplugin-sh9hx                             3/3     Running   0            9h    10.0.158.82    ip-10-0-158-82.ec2.internal    <none>           <none>
csi-rbdplugin-75h2m                                4/4     Running   0            9h    10.0.128.13    ip-10-0-128-13.ec2.internal    <none>           <none>
csi-rbdplugin-9lbnf                                4/4     Running   4            9h    10.0.166.153   ip-10-0-166-153.ec2.internal   <none>           <none>
csi-rbdplugin-h9tsx                                4/4     Running   0            9h    10.0.158.82    ip-10-0-158-82.ec2.internal    <none>           <none>
csi-rbdplugin-provisioner-6455fd4867-2k5lm         7/7     Running   0            9h    10.129.2.22    ip-10-0-158-82.ec2.internal    <none>           <none>
csi-rbdplugin-provisioner-6455fd4867-wbdwq         7/7     Running   0            9h    10.128.2.58    ip-10-0-128-13.ec2.internal    <none>           <none>
ocs-metrics-exporter-b654d74b5-gskmc               1/1     Running   0            9h    10.128.2.54    ip-10-0-128-13.ec2.internal    <none>           <none>
ocs-operator-7dfcf95b4d-s8lpn                      1/1     Running   0            9h    10.128.2.55    ip-10-0-128-13.ec2.internal    <none>           <none>
ocs-osd-controller-manager-7bd447f6d7-j8lkb        2/3     Running   0            9h    10.128.2.43    ip-10-0-128-13.ec2.internal    <none>           <none>
odf-console-6d676ff745-mm5rv                       1/1     Running   0            9h    10.128.2.44    ip-10-0-128-13.ec2.internal    <none>           <none>
odf-operator-controller-manager-54c94476f4-2jds7   2/2     Running   0            9h    10.128.2.38    ip-10-0-128-13.ec2.internal    <none>           <none>
prometheus-managed-ocs-prometheus-0                2/2     Running   1 (9h ago)   9h    10.128.2.45    ip-10-0-128-13.ec2.internal    <none>           <none>
prometheus-operator-6b8cbc545f-jkzq6               1/1     Running   0            9h    10.128.2.37    ip-10-0-128-13.ec2.internal    <none>           <none>
rook-ceph-operator-7cd868ddfc-7j7qn                1/1     Running   0            9h    10.128.2.53    ip-10-0-128-13.ec2.internal    <none>           <none>
rook-ceph-tools-56b46d6f99-ppgkj                   1/1     Running   0            9h    10.0.128.13    ip-10-0-128-13.ec2.internal    <none>           <none>
--------------
======= machine ==========
NAMESPACE               NAME                                           PHASE     TYPE         REGION      ZONE         AGE   NODE                           PROVIDERID                              STATE
openshift-machine-api   sgatfane-c1-am-lw9bp-infra-us-east-1a-h85c7    Running   r5.xlarge    us-east-1   us-east-1a   9h    ip-10-0-136-17.ec2.internal    aws:///us-east-1a/i-00de9ba0043b14865   running
openshift-machine-api   sgatfane-c1-am-lw9bp-infra-us-east-1b-pdsp5    Running   r5.xlarge    us-east-1   us-east-1b   9h    ip-10-0-149-242.ec2.internal   aws:///us-east-1b/i-0bb38329f21344aa6   running
openshift-machine-api   sgatfane-c1-am-lw9bp-infra-us-east-1c-gt992    Running   r5.xlarge    us-east-1   us-east-1c   9h    ip-10-0-167-255.ec2.internal   aws:///us-east-1c/i-0d8f21d242181e971   running
openshift-machine-api   sgatfane-c1-am-lw9bp-master-0                  Running   m5.2xlarge   us-east-1   us-east-1a   9h    ip-10-0-142-95.ec2.internal    aws:///us-east-1a/i-027db5a2f8a0ce506   running
openshift-machine-api   sgatfane-c1-am-lw9bp-master-1                  Running   m5.2xlarge   us-east-1   us-east-1b   9h    ip-10-0-155-32.ec2.internal    aws:///us-east-1b/i-05c0083b8549c46d9   running
openshift-machine-api   sgatfane-c1-am-lw9bp-master-2                  Running   m5.2xlarge   us-east-1   us-east-1c   9h    ip-10-0-164-196.ec2.internal   aws:///us-east-1c/i-038065dd3f8ba89d2   running
openshift-machine-api   sgatfane-c1-am-lw9bp-worker-us-east-1a-6js9n   Running   m5.2xlarge   us-east-1   us-east-1a   9h    ip-10-0-128-13.ec2.internal    aws:///us-east-1a/i-0147c6e5e10dac14b   running
openshift-machine-api   sgatfane-c1-am-lw9bp-worker-us-east-1b-rhgrp   Running   m5.2xlarge   us-east-1   us-east-1b   9h    ip-10-0-158-82.ec2.internal    aws:///us-east-1b/i-049cfbdfe3446cda3   running
openshift-machine-api   sgatfane-c1-am-lw9bp-worker-us-east-1c-4tmnm   Running   m5.2xlarge   us-east-1   us-east-1c   9h    ip-10-0-166-153.ec2.internal   aws:///us-east-1c/i-0be9dd1ff69bf6f0c   running
--------------
======= PVC ==========
--------------
======= storagecluster ==========
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   9h    Error   true       2022-04-25T06:29:59Z   
--------------
======= cephcluster ==========
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE       MESSAGE                                     HEALTH       EXTERNAL
ocs-storagecluster-cephcluster                                9h    Connected   Failed to configure external ceph cluster   HEALTH_ERR   true
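
In a listing like the CSV table above, the one operator that is not in the Succeeded phase is easy to miss. A minimal sketch of a filter for saved `oc get csv` output follows; the function name is hypothetical, and it assumes that PHASE is always the last column, as it is in the output in this report.

```shell
# Hypothetical helper: reads `oc get csv` output on stdin and prints
# "<csv name>: <phase>" for every CSV whose PHASE is not "Succeeded".
# Assumes PHASE is the last column; the header row and blank lines are skipped.
non_succeeded_csvs() {
    awk 'NF > 0 && $1 != "NAME" && $NF != "Succeeded" { print $1 ": " $NF }'
}

# Example wiring:
#   oc get csv | non_succeeded_csvs
```

Reading the last field rather than a fixed column number keeps the filter working for rows where the REPLACES column is empty.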

Comment 1 Sahina Bose 2022-04-26 10:31:46 UTC

Any alerts from the consumer? Were there any pods using storage from the provider?

Comment 2 suchita 2022-04-26 11:13:16 UTC

(In reply to Sahina Bose from comment #1)
> Any alerts from the consumer? Were there any pods using storage from the provider?

No SendGrid alert was received. PagerDuty was configured for this cluster, so I did not notice any alert for this.

Comment 13 Dhruv Bindra 2023-01-23 11:52:50 UTC

Re-test this issue with the latest build

Comment 16 suchita 2023-05-15 17:42:28 UTC

I tried to reproduce this scenario for verification, but the exact scenario is not reproducible. As per comment #6, even force-deleting the openshift-storage project and force-deleting the cluster using the ocm API command does not delete the provider cluster. The provider cluster is stuck in the uninstalling state.


$ rosa list cluster
ID                                NAME            STATE         TOPOLOGY
23nk3phngdv3i1m8alik7niqttfde635  sgatfane-mp13   uninstalling  Classic (STS)
23nk4lkjovhv8qm23v4fpommvofk3src  sgatfane-cmm13  ready         Classic (STS) 

Force-deleting the openshift-storage namespace disabled access to the cluster ('Unable to connect to the server: Service Unavailable'), but the provider cluster is stuck in the uninstalling state.
The consumer add-on status remains in the ready state.

Consumer state when provider stuck in the uninstalling state:
$ oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.12                     NooBaa Operator               4.10.12           mcg-operator.v4.10.11                     Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.10.9                      OpenShift Container Storage   4.10.9            ocs-operator.v4.10.8                      Succeeded
ocs-osd-deployer.v2.0.13                  OCS OSD Deployer              2.0.13            ocs-osd-deployer.v2.0.12                  Installing
odf-csi-addons-operator.v4.10.9           CSI Addons                    4.10.9            odf-csi-addons-operator.v4.10.8           Succeeded
odf-operator.v4.10.9                      OpenShift Data Foundation     4.10.9            odf-operator.v4.10.8                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.500-6152b76   Route Monitor Operator        0.1.500-6152b76   route-monitor-operator.v0.1.498-e33e391   Succeeded
$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE       MESSAGE                                     HEALTH       EXTERNAL
ocs-storagecluster-cephcluster                                12h   Connected   Failed to configure external ceph cluster   HEALTH_ERR   true

$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   14h   Error   true       2023-05-15T03:21:54Z 

As per multiple comments in this BZ, these states are expected.

The provider is uninstalled when the consumer is uninstalled.

Hence marking this BZ as verified.

Comment 18 Ohad 2024-07-11 10:26:45 UTC

The ODF Managed Service project has been sunset and is now considered obsolete.
