Bug 1870061
Summary: | [RHEL][IBM] OCS un-install should make the devices raw | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | akgunjal <akgunjal> | |
Component: | ocs-operator | Assignee: | Raghavendra Talur <rtalur> | |
Status: | CLOSED ERRATA | QA Contact: | Anna Sandler <asandler> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.4 | CC: | ebenahar, jarrpa, madam, muagarwa, nberry, ocs-bugs, rtalur, sabose, sapillai, sostapov, tdesala | |
Target Milestone: | --- | Keywords: | AutomationBackLog | |
Target Release: | OCS 4.6.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1887468 | Environment: | |
Last Closed: | 2020-12-17 06:23:47 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1885648 | |||
Bug Blocks: |
Description
akgunjal@in.ibm.com
2020-08-19 09:44:31 UTC
Hi,
Were the uninstall steps from the official 4.4 docs followed to clean up the cluster? We have explicit steps for wiping the disk and deleting the dataDir on the worker nodes.

Doc link - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.4/html-single/deploying_openshift_container_storage/index#assembly_uninstalling-openshift-container-storage_aws-vmware

Steps for cleaning up on the node side:
7. Clean up the storage operator artifacts on each node.
8. Delete the local volume created during the deployment and for each of the local volumes listed in step 4.
9. Wipe the disks for each of the local volumes listed in step 4 so that they can be reused.

P.S: With 4.5, some of these steps are automated with StorageCluster deletion, provided one uses the correct label for the cleanup policy. But the feature is available only from OCS 4.5 onwards.

Hi,
We use the uninstall doc link today and remove the path and wipe the disks as given in the workaround of this issue. I was asking whether these steps can be automated when OCS is uninstalled. If it's supported in 4.5 then we are fine. Point me to the doc where it's supported in the 4.5 version. The removal of the local volume path and the wipe of the disk need to be automated.

In OCS 4.6 such a flow should be handled automatically; hence, proposing as a blocker.

I don't think this qualifies as a blocker. Best I can tell, this is not part of any MVP epic that was accepted during the planning phase. Just because it "should" work that way is not a reason to block the release. It is entirely possible that this will not be achievable. Indeed, handling anything with the local devices other than the data on them is likely outside the scope of OCS.

@Elad, as explained by Jose, this is something we should not associate with the OCS uninstall feature, and it can be taken up separately. I agree that we have to proceed manually in such setups, but this is kind of an exception and should be treated in that manner.
Looks like we need more time for this and it can't be done in the 4.6 timeframe. @Talur, do you want to add anything here?

> @Talur, do you want to add anything here?

There are two requirements in the title.
1. Make the devices RAW: DONE
2. Removal of the local volume paths
As Jose also mentioned, this is probably outside the scope of the OCS components. We tested it recently and the paths are still left behind after uninstall, even in the case where LSO LocalVolumeSets are used. Basically, if the install requires manual steps, then the uninstall would require them too.
(Install steps for local-storage - https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.5/html-single/deploying_openshift_container_storage_using_amazon_web_services/index#creating-openshift-container-storage-cluster-on-amazon-ec2_local-storage)

@talur: My understanding is that the devices are now made RAW upon uninstall, so the devices are zapped. But the local volume paths on the nodes are not removed. These paths were not created manually before the install of OCS. They are created as part of the OCS install, and since OCS needs them to be empty, it needs to remove the directories under /mnt/local-storage, as they were created automatically. Maybe the OCS uninstall can clean up any contents in those paths, as they contain OCS-related data.

2. Removal of the local volume paths
Is this something LSO should handle then? On removal of PV or uninstall of LSO?

(In reply to Sahina Bose from comment #11)
> 2. Removal of the local volume paths
> Is this something LSO should handle then? On removal of PV or uninstall of
> LSO?

IMO, yes, it is LSO which creates it and can delete it. Sahina, who can help with that?

The local volume paths referred to here are just symlinks that LocalVolume/LocalVolumeSet creates. The provisioner (sig-storage-local-storage-provisioner) picks up these symlinks and provisions PVs out of them.
I'm also of the opinion that these symlinks (/mnt/local-storage/<storageclassname>/<symlink>) should be deleted by LSO and not by OCS.

OCS is not directly controlling the localVolumeSet/localVolume and thus does not delete the localVolumeSet/localVolume on its own deletion. So if OCS decides to delete the symlinks (/mnt/local-storage/<storageclassname>/<symlink>) or the entire storageclass directory (/mnt/local-storage/<storageclassname>), then these will get created again because the localVolume/localVolumeSet daemons are still running.

(In reply to Santosh Pillai from comment #13)
> The local volume paths referred to here are just symlinks that
> LocalVolume/LocalVolumeSet creates. The provisioner
> (sig-storage-local-storage-provisioner) picks up these symlinks and
> provisions PVs out of them.
>
> I'm also of the opinion that these symlinks
> (/mnt/local-storage/<storageclassname>/<symlink>) should be deleted by LSO
> and not by OCS.
>
> OCS is not directly controlling the localVolumeSet/localVolume and thus does
> not delete the localVolumeSet/localVolume on its own deletion. So if OCS
> decides to delete the symlinks
> (/mnt/local-storage/<storageclassname>/<symlink>) or the entire storageclass
> directory (/mnt/local-storage/<storageclassname>), then these will get
> created again because the localVolume/localVolumeSet daemons are still
> running.

Based on this information, I suggest that we split this bug into two.
1. Clean up the disks and make them RAW. The component remains the same (ocs-operator) and it is fixed.
2. File a new bug on LSO asking for the local volume paths to be removed.
I will do this tomorrow after waiting for a day to see if there are any objections.

No complaints on my part!

I have created a new bug to track the LSO changes required to remove the symlinks under /mnt/local-storage: https://bugzilla.redhat.com/show_bug.cgi?id=1887468
Renaming this bug to track only the cleanup of the disks.
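
To make the /mnt/local-storage/<storageclassname>/<symlink> layout discussed in the comments concrete, here is a minimal Python sketch that lists the symlinks under a local-storage root, grouped by storage class directory, together with the path each one points to. The function name `list_local_volume_symlinks` is hypothetical, and the default root only mirrors the path pattern quoted in the comments. It is an inspection aid under those assumptions, not what LSO or OCS actually runs.

```python
import os
from pathlib import Path


def list_local_volume_symlinks(root="/mnt/local-storage"):
    """Map each storage class directory under root to its (symlink, target) pairs.

    The default root is an assumption taken from the paths quoted in this bug.
    """
    result = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to report if the root does not exist on this node.
        return result
    for class_dir in sorted(root_path.iterdir()):
        if not class_dir.is_dir():
            continue
        links = []
        for entry in sorted(class_dir.iterdir()):
            # Only symlinks are of interest; regular files are skipped.
            if entry.is_symlink():
                links.append((entry.name, os.readlink(entry)))
        result[class_dir.name] = links
    return result
```

As noted in the comments, removing these symlinks by hand while the localVolume/localVolumeSet daemons are still running would likely be undone, so a listing like this is useful mainly for checking what is left behind after an uninstall.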
PR to clean up the disks was merged in rook in 4.5: https://github.com/rook/rook/pull/5545
PR to make cleanup the default was merged in ocs-operator with the first build of 4.6: https://github.com/openshift/ocs-operator/pull/731

Hi talur,
IIUC, wipefs as part of storagecluster deletion is the actual fix for this bug and is independent of platform. But this BZ was raised in OCS 4.4 on IBM, hence I wanted to confirm whether this BZ needs to be verified on IBM as well, or whether any platform will do.
@akgunjal.com, at least for the IBM platform, can we request you to also verify from your end, if possible?

Moving to verified. This was tested on AWS+LSO, and the devices are becoming raw after uninstall.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5605
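
A check that devices are raw after uninstall could be sketched as follows, assuming the node's block devices are inspected with `lsblk --json -o NAME,FSTYPE` from util-linux. Nobody in this report says that command was used for verification; it is an assumption made for illustration. The helper name `devices_with_signatures` is hypothetical, and the sample JSON is made up to match the shape of lsblk's output, not taken from any cluster in this bug.

```python
import json


def devices_with_signatures(lsblk_json):
    """Return names of devices or partitions that still report a filesystem type."""
    found = []

    def walk(nodes):
        for node in nodes:
            # lsblk reports null (None in Python) when no signature is detected.
            if node.get("fstype"):
                found.append(node["name"])
            walk(node.get("children", []))

    walk(json.loads(lsblk_json).get("blockdevices", []))
    return found


# Illustrative sample shaped like `lsblk --json -o NAME,FSTYPE` output.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "fstype": null,
   "children": [{"name": "sda1", "fstype": "xfs"}]},
  {"name": "sdb", "fstype": null}
]}
"""

print(devices_with_signatures(SAMPLE))
```

An empty result for the disks that backed the storage cluster would be consistent with the devices having been wiped. The sketch only looks at the filesystem type column, so it is a first-pass check and not a full equivalent of the verification done for this bug.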