Bug 2068402

Summary: Re-installation of provider-qe add-on stuck in installing state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: suchita <sgatfane>
Component: odf-managed-service
Assignee: Nobody <nobody>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: low
Docs Contact:
Priority: low
Version: 4.10
CC: aeyal, dbindra, nberry, ocs-bugs, odf-bz-bot
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-02-06 10:40:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description suchita 2022-03-25 07:53:17 UTC
Description of problem:

After uninstalling the provider QE add-on, a new add-on installation was initiated; the installation got stuck in the Installing state and failed to create any resources in the openshift-storage namespace.

Version-Release number of selected component (if applicable):
OCP 4.0.24

oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-206"

oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.406-54ff884   Route Monitor Operator        0.1.406-54ff884   route-monitor-operator.v0.1.404-e29b74b   Succeeded
oc describe csv ocs-osd-deployer.v2.0.0|grep -i image
    Mediatype:   image/svg+xml
                Image:  gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-1
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.0-1





How reproducible:
2/2

Steps to Reproduce:
1. Create a provider-consumer setup
2. Uninstall the consumer add-on
3. Uninstall the provider add-on
4. Ensure the Install button is available and no installation state is shown on the detail page of the provider-qe add-on
5. Reinstall the add-on

Actual results:
The openshift-storage namespace exists, however no pods are running in that namespace.
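A minimal way to confirm this state (standard oc commands; a sketch, not part of the original report):

# in this bug: the namespace is present, but no operator/workload pods come up after the reinstall
oc get ns openshift-storage
oc get pods -n openshift-storage
oc get csv -n openshift-storage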

Expected results:
ODF should install successfully



Additional info:
Discussion on this before raising BZ in engg room: https://chat.google.com/room/AAAASHA9vWs/gtIYwCL0fn0
Below are QE observations while reproducing this BZ:
1. Uninstalled the add-on from the OCM UI and it succeeded.
2. The UI now gives the option to install again. Tried to install, but nothing starts.
Observation: the namespace deletion from step 1 was still stuck, as the namespace status below shows.
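The namespace status below was presumably captured with something along these lines (the exact command is an assumption; it is not recorded in the BZ):

oc get namespace openshift-storage -o yaml
# or only the termination conditions:
oc get namespace openshift-storage -o jsonpath='{.status.conditions}'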
----------------------------------------------------------
status:
  conditions:
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: 'Some resources are remaining: configmaps. has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: 'Some content in the namespace has finalizers remaining: ceph.rook.io/disaster-protection
      in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
Output nearly 30 minutes after the reinstallation:
oc get secret -n openshift-storage
No resources were found in openshift-storage namespace.
oc get cm -n openshift-storage
NAME                      DATA   AGE
rook-ceph-mon-endpoints   4      5h57m
----------------------------------------------------
oc get cm -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["10.0.164.129:6789","10.0.133.3:6789","10.0.210.56:6789"]}]'
    data: a=10.0.164.129:6789,b=10.0.133.3:6789,c=10.0.210.56:6789
    mapping: '{"node":{"a":{"Name":"ip-10-0-164-129.us-east-2.compute.internal","Hostname":"ip-10-0-164-129.us-east-2.compute.internal","Address":"10.0.164.129"},"b":{"Name":"ip-10-0-133-3.us-east-2.compute.internal","Hostname":"ip-10-0-133-3.us-east-2.compute.internal","Address":"10.0.133.3"},"c":{"Name":"ip-10-0-210-56.us-east-2.compute.internal","Hostname":"ip-10-0-210-56.us-east-2.compute.internal","Address":"10.0.210.56"}}}'
    maxMonId: "2"
  kind: ConfigMap
  metadata:
    creationTimestamp: "2022-03-24T07:03:53Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2022-03-24T11:46:57Z"
    finalizers:
    - ceph.rook.io/disaster-protection
    name: rook-ceph-mon-endpoints
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 344a4a26-be67-47b7-8060-de8de9102d18
    resourceVersion: "363649"
    uid: 28f9a725-39b3-467b-a1aa-24f1a72ba0e7
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
------------------------------------------------------
The CephCluster owns this ConfigMap; the CephCluster got deleted, but the ConfigMap did not, as its ownerReferences show:
 ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 344a4a26-be67-47b7-8060-de8de9102d18
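A common manual unblock in this situation (a sketch only, using standard oc finalizer handling; this was not done as part of this BZ, and it assumes the mon endpoint data is safe to discard once the CephCluster itself is gone) is to clear the stuck finalizer so the namespace can finish terminating:

# remove the ceph.rook.io/disaster-protection finalizer from the leftover ConfigMap
oc patch configmap rook-ceph-mon-endpoints -n openshift-storage \
  --type merge -p '{"metadata":{"finalizers":null}}'
# the namespace should then leave the Terminating phase
oc get namespace openshift-storage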

------------------------------------------------------

Initial analysis from Engineering: 
[Dhruv Bindra]
There are two issues here:
1. Namespace deletion is blocked by a finalizer on a ConfigMap owned by the CephCluster (the CephCluster was deleted but the ConfigMap was not).
2. The namespace was not deleted even though the add-on status showed it as uninstalled.
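For issue 1, a generic way to confirm which objects are still holding a Terminating namespace open (standard oc commands, not taken from the engineering discussion) would be:

# list every namespaced resource type and print any instances still present in the stuck namespace
oc api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 oc get --show-kind --ignore-not-found -n openshift-storage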

Comment 1 Ohad 2022-03-25 12:10:21 UTC
This should have a very low priority, as it will have no impact on customers.
The provider add-on will never be installed on an existing cluster; provider clusters are delivered as a package of a newly provisioned cluster plus the add-on.

Based on the above, setting the priority and severity to Low.

Comment 9 suchita 2023-02-06 10:40:52 UTC
As per the current release features, the provider is installed as a service, not as an add-on. Installing and reinstalling the provider add-on individually is therefore not a reproducible scenario with current releases, and this bug is no longer applicable. Closing this BZ as NOTABUG.