Bug 1973603

Summary: OCS doesn't delete the pvc when a node is deleted from the UI, and the PV is stuck in "Terminating"
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Udi Kalifon <ukalifon>
Component: unclassified
Assignee: Mudit Agarwal <muagarwa>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: aos-bugs, bniver, gmeno, jrivera, jsafrane, muagarwa, ocs-bugs, odf-bz-bot, prsurve, sostapov, tmuthami
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-12 05:26:55 UTC
Type: Bug

Description Udi Kalifon 2021-06-18 09:53:58 UTC
Description of problem:
I installed a cluster (3 masters + 3 workers) with the assisted installer and included the LSO and OCS operators. I then logged into the OpenShift console (GUI) and browsed to the Bare Metal Hosts page. I marked one of the workers as unschedulable, then de-provisioned it, then deleted it completely. I then added a new worker in place of the deleted one, using the day-2 flow of the assisted installer.

When looking at the PVs and PVCs, I see that the PVC for the deleted worker's disk has still not been deleted, and the PV for that disk is stuck in the "Terminating" state.


Version-Release number of selected component (if applicable):
OCP 4.8.0-fc7


How reproducible:
100%


Steps to Reproduce:
1. Install a cluster of 3 masters and 3 workers with the assisted installer, and select OpenShift Container Storage (OCS) as well.
2. Log in to the OpenShift console UI
3. Go to Compute -> Nodes
4. From the kebab menu of one of the workers, set the node as unschedulable
5. Go to Compute -> Bare Metal Hosts
6. From the kebab menu, select to deprovision the machine
7. After deprovisioning, delete the bare metal host
8. Go back to the cloud and find the day2 cluster for your cluster
9. Add a new worker to the cluster from the Add hosts tab. The new worker should also have a suitable disk for OCS.
10. After the new worker is added, wait for LSO to create a PV for its disk, and see that OCS is claiming this PV
11. Check if the old PV and PVC are still up
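
For step 11, this is roughly how I check it (a sketch; "ocs-deviceset" is the usual PVC name prefix and the PV name is just a placeholder):

oc get pv -o wide                                        # the old local PV shows up as "Terminating"
oc get pvc -n openshift-storage | grep ocs-deviceset     # the PVC for the deleted worker's disk is still listed
oc describe pv <old-local-pv-name>                       # shows the claimRef and finalizers holding it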


Actual results:
The old PV and PVC are never deleted. It seems like OCS is still holding on to the PV, so it cannot finish terminating.


Additional info:
I tried to get the osd-removal log, but apparently it doesn't exist in this flow:

oc logs -l job-name=ocs-osd-removal-job -n openshift-storage
No resources found in openshift-storage namespace.
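
As far as I understand, this job is not created automatically; it only exists after someone processes the ocs-osd-removal template. A quick sanity check (sketch):

oc get jobs -n openshift-storage                         # check whether an osd-removal job was ever created
oc get template ocs-osd-removal -n openshift-storage     # the template the job would normally be created from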

Comment 1 Jan Safranek 2021-06-18 14:25:58 UTC
Which PV is Terminating, and why? Is it because there is still a PVC bound to it? That PVC should be deleted by OCS.

In addition, deleting PVs and PVCs is quite dangerous; what if the deleted node comes back (e.g. it was only down for maintenance)? With the assisted installer it's probably OK to delete them, since the user explicitly deleted the node, but it should not be done in the generic scenario where a node simply disappears from the cluster.

Comment 3 Jose A. Rivera 2021-06-18 16:52:49 UTC
More information is required before anything meaningful can be assessed. Please collect full must-gather output for both OCP and OCS.
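
For reference, roughly the following (the OCS must-gather image tag below is an assumption and should match the installed OCS version):

oc adm must-gather                                                                 # OCP must-gather
oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.8      # OCS must-gather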

To Jan's point: Yes, PVs typically block if they are bound to a PVC that's currently in use by a Pod. Given what I understand of the situation, you decommissioned a node that had a Ceph OSD Pod running on it, using an LSO volume. If for some reason you did not follow the full OSD removal process, then Kubernetes may still think the Pod is around, so it does not delete the PVC, the PVC is never freed, and deletion of the PV stays blocked. Another possibility is that, if the node was not removed gracefully from the cluster, the CSI driver may think it cannot unmount the PV and so reports it as Terminating until the unmount eventually succeeds (which it won't!). This raises a few immediate questions:

* Does step 10 as you described it mean that you were able to add a new OSD using the local volume on the new node, and that Ceph is reporting HEALTH_OK?
* Does "ceph osd tree" still show the old OSD? If so, is it up or down?
* Is the old OSD Pod still present when you do "oc get pods"? If not, do the CSI provisioner logs have anything useful to say?
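
A rough sketch of how to check these, assuming the default openshift-storage namespace and the standard Rook/CSI labels (adjust as needed):

TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
oc -n openshift-storage rsh $TOOLS_POD ceph status
oc -n openshift-storage rsh $TOOLS_POD ceph osd tree
oc get pods -n openshift-storage -o wide | grep rook-ceph-osd      # is the old OSD Pod still listed, and on which node?
oc logs -n openshift-storage -l app=csi-rbdplugin-provisioner -c csi-provisioner --tail=100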

Finally, the following steps are vague and seem to omit a lot of detail:

8. Go back to the cloud and find the day2 cluster for your cluster
9. Add a new worker to the cluster from the Add hosts tab. The new worker should also have a suitable disk for OCS.
10. After the new worker is added, wait for LSO to create a PV for its disk, and see that OCS is claiming this PV

Please describe the full process you used, referencing any specific documentation you followed if needed.

Comment 4 Mudit Agarwal 2021-06-21 13:27:32 UTC
Not a 4.8 blocker, please re-target if required.

Comment 6 Udi Kalifon 2021-06-21 16:05:14 UTC
Output of "ceph osd tree":

sh-4.4# ceph osd tree
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF 
-1       0.58589 root default                                
-5       0.19530     host worker-0-0                         
 1   hdd 0.19530         osd.1         down        0 1.00000 
-3       0.19530     host worker-0-1                         
 0   hdd 0.19530         osd.0           up  1.00000 1.00000 
-7       0.19530     host worker-0-2                         
 2   hdd 0.19530         osd.2           up  1.00000 1.00000 

In "ceph status" I see HEALTH_WARN. This is because this time, OCS didn't take the new LSO block as it did in the previous time when I reported the bug. I will try to find out why and will update the bug.

Comment 7 Udi Kalifon 2021-06-21 16:24:44 UTC
After editing the StorageCluster CR and changing the count to 4 and then to 3, I see this:

sh-4.4# ceph osd tree
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF 
-1       0.78119 root default                                
-5       0.19530     host worker-0-0                         
 1   hdd 0.19530         osd.1         down        0 1.00000 
-3       0.19530     host worker-0-1                         
 0   hdd 0.19530         osd.0           up  1.00000 1.00000 
-7       0.19530     host worker-0-2                         
 2   hdd 0.19530         osd.2           up  1.00000 1.00000 
-9       0.19530     host worker-0-3                         
 3   hdd 0.19530         osd.3           up  1.00000 1.00000 

Ceph status still shows HEALTH_WARN (I waited just a few minutes), and the PV of the old node's disk is still stuck in "Terminating". How do I properly release it permanently?
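
As far as I can tell, the documented way is to purge the dead OSD via the removal template first and only then delete the local PV. A rough sketch, assuming OSD id 1 and placeholder resource names (the template parameter is FAILED_OSD_IDS on recent OCS releases, FAILED_OSD_ID on older ones):

oc scale deployment rook-ceph-osd-1 -n openshift-storage --replicas=0
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=1 | oc create -n openshift-storage -f -
oc logs -l job-name=ocs-osd-removal-job -n openshift-storage
oc delete pv <old-local-pv-name>        # only after the removal job completes and the PV shows as Released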

Comment 9 Mudit Agarwal 2021-10-12 05:26:55 UTC
No update for a long time, please reopen if the problem still exists.