Bug 2068402
| Summary: | Re-installation of provider-qe add-on stuck in installing state | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | suchita <sgatfane> |
| Component: | odf-managed-service | Assignee: | Nobody <nobody> |
| Status: | CLOSED NOTABUG | QA Contact: | Neha Berry <nberry> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 4.10 | CC: | aeyal, dbindra, nberry, ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-02-06 10:40:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This should have a very low priority, as it has no impact on customers. The provider add-on will never be installed on an existing cluster; provider clusters are delivered as a package of a newly provisioned cluster plus the add-on. Based on the above, setting the priority and severity to Low.

As per the current release features, the provider is installed as a service, not as an add-on, so individually installing and reinstalling the provider add-on is not a reproducible scenario with current releases. This bug is therefore no longer valid or applicable. Closing this BZ as NOTABUG.
Description of problem:

After uninstallation of the provider QE add-on, when add-on installation is initiated again, the installation gets stuck in the installing state and fails to install any resources in the openshift-storage namespace.

Version-Release number of selected component (if applicable):

OCP 4.0.24

```
oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-206"

oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.406-54ff884   Route Monitor Operator        0.1.406-54ff884   route-monitor-operator.v0.1.404-e29b74b   Succeeded

oc describe csv ocs-osd-deployer.v2.0.0 | grep -i image
  Mediatype:  image/svg+xml
  Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
  Image:      quay.io/osd-addons/ocs-osd-deployer:2.0.0-1
  Image:      quay.io/osd-addons/ocs-osd-deployer:2.0.0-1
```

How reproducible:
2/2

Steps to Reproduce:
1. Create a provider-consumer setup
2. Uninstall the consumer add-on
3. Uninstall the provider add-on
4. Ensure the Install button is available and no state is shown on the add-on detail page of the provider-qe add-on
5. Reinstall the add-on again

Actual results:
The openshift-storage namespace exists, however no pods are running in that namespace.

Expected results:
ODF should install successfully.

Additional info:

Discussion on this before raising the BZ in the engg room: https://chat.google.com/room/AAAASHA9vWs/gtIYwCL0fn0

Below are the QE observations during this BZ reproducer:

1. Uninstalled the add-on from the OCM UI and it succeeded.
2. The UI now gives the option to install again. Tried to install, but nothing starts.

Observation: it is seen that the namespace deletion was still stuck from #1.

----------------------------------------------------------
```
status:
  conditions:
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: 'Some resources are remaining: configmaps. has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2022-03-24T12:02:00Z"
    message: 'Some content in the namespace has finalizers remaining: ceph.rook.io/disaster-protection in 1 resource instances'
    reason: SomeFinalizersRemain
    status: "True"
    type: NamespaceFinalizersRemaining
  phase: Terminating
```

Output after nearly 30 min of reinstallation:

```
oc get secret -n openshift-storage
No resources were found in openshift-storage namespace.
```
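As a general aside (these commands are not from the captured triage output above; they are a standard way to inspect a namespace stuck in Terminating, using the openshift-storage namespace from this report):

```
# Show why the namespace deletion is stuck (the same status conditions
# shown in the block above).
oc get namespace openshift-storage -o jsonpath='{.status.conditions}'

# List every namespaced resource type that still has instances in the
# namespace; whatever shows up is what is blocking the deletion.
oc api-resources --verbs=list --namespaced -o name \
  | xargs -n1 oc get --show-kind --ignore-not-found -n openshift-storage
```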
```
➜ 203 oc get cm -n openshift-storage
NAME                      DATA   AGE
rook-ceph-mon-endpoints   4      5h57m
```

----------------------------------------------------
```
oc get cm -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["10.0.164.129:6789","10.0.133.3:6789","10.0.210.56:6789"]}]'
    data: a=10.0.164.129:6789,b=10.0.133.3:6789,c=10.0.210.56:6789
    mapping: '{"node":{"a":{"Name":"ip-10-0-164-129.us-east-2.compute.internal","Hostname":"ip-10-0-164-129.us-east-2.compute.internal","Address":"10.0.164.129"},"b":{"Name":"ip-10-0-133-3.us-east-2.compute.internal","Hostname":"ip-10-0-133-3.us-east-2.compute.internal","Address":"10.0.133.3"},"c":{"Name":"ip-10-0-210-56.us-east-2.compute.internal","Hostname":"ip-10-0-210-56.us-east-2.compute.internal","Address":"10.0.210.56"}}}'
    maxMonId: "2"
  kind: ConfigMap
  metadata:
    creationTimestamp: "2022-03-24T07:03:53Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2022-03-24T11:46:57Z"
    finalizers:
    - ceph.rook.io/disaster-protection
    name: rook-ceph-mon-endpoints
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 344a4a26-be67-47b7-8060-de8de9102d18
    resourceVersion: "363649"
    uid: 28f9a725-39b3-467b-a1aa-24f1a72ba0e7
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

------------------------------------------------------

The CephCluster owns this ConfigMap; the CephCluster got deleted, but the ConfigMap didn't get deleted:

```
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 344a4a26-be67-47b7-8060-de8de9102d18
```

------------------------------------------------------

Initial analysis from Engineering [Dhruv Bindra]: two issues over here.

1. Namespace deletion is blocked by a finalizer on a ConfigMap that is owned by the CephCluster (the CephCluster was deleted, but the ConfigMap wasn't).
2. The namespace was not deleted even though the status of the add-on was uninstalled.
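For reference only, a possible manual unblock (not something applied or recommended in this BZ): removing the ceph.rook.io/disaster-protection finalizer from the leftover ConfigMap would let the namespace deletion finish, at the cost of discarding the mon-endpoint data that finalizer is meant to protect. A sketch:

```
# Clear the blocking finalizer on the leftover ConfigMap so the stuck
# namespace deletion can complete. This drops the disaster-protection
# guard on the mon endpoints, so it is only reasonable once the
# CephCluster is really gone.
oc patch configmap rook-ceph-mon-endpoints -n openshift-storage \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```

This only clears the symptom; the underlying problem is the one described in the engineering analysis above.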