Description of problem:

rook-ceph requires clean disks for the Ceph cluster to come up successfully. The OCS integration into the Assisted Installer for bare metal uses LSO to create the PVs for Ceph OSDs. If those disks are not clean, the cluster is not created. It would be very useful to have LSO provide an option to wipe the disks before creating the PVs.
Would it make more sense to have uninstallation of the previous cluster clean the volumes better when releasing the PVs? Perhaps with a configurable "cleaner", either wipefs or dd. In addition, dd can be very slow (on big disks) - do you really want to wait for hours / days for it to complete?
Better cleaning during deprovision would help, but it won't reliably resolve this issue. We can't assume that a cluster or a host was healthy enough to voluntarily clean up during deprovision. "dd" is one option with the downsides you've raised, but the primary request is to enable a ceph cluster to come up successfully. If there's a better way to clean the disks such that they can be used in a new ceph cluster, that would be great. Perhaps the ceph community has a way to handle this.
> host was healthy enough to voluntarily clean up during deprovision.

If the host has a random filesystem on its volumes, Kubernetes policy is not to overwrite it - it may hold important data. I understand that users need some option to use the disks regardless of what is on them; still, IMO the default should be safe, and LocalVolumeSet in particular should not clean the volumes unless some sort of force is applied. This needs a solid design & API.
(In reply to Michael Hrivnak from comment #2)
> Better cleaning during deprovision would help, but it won't reliably resolve
> this issue. We can't assume that a cluster or a host was healthy enough to
> voluntarily clean up during deprovision.

Right. This is especially the case in bare metal setups where customers might try to reuse hosts/disks and set up their cluster afresh.

> "dd" is one option with the downsides you've raised, but the primary request
> is to enable a ceph cluster to come up successfully. If there's a better way
> to clean the disks such that they can be used in a new ceph cluster, that
> would be great. Perhaps the ceph community has a way to handle this.

I checked with Sebastien, and a "wipefs --all --force" should be good enough. dd need not be run over the entire disk - just a few MB should do.

cc @seb
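For illustration only, a minimal sketch of the kind of wipe being suggested here; /dev/sdX is a placeholder for a matched device, and the exact commands/flags are not something agreed on in this bug:

>>>
# Remove filesystem, RAID and partition-table signatures from the device.
wipefs --all --force /dev/sdX

# Optionally also zero just the first few MB instead of dd'ing the whole disk.
dd if=/dev/zero of=/dev/sdX bs=1M count=10
<<<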
We were discussing adding some `force: true` option to both LocalVolume & LocalVolumeSet, which would run wipefs on matching volumes. Would that be enough to satisfy this RFE? And is it future-proof enough? It may be enough for the assisted installer MVP / POC, but we don't want to paint ourselves into a corner if you need something better in the future. The API will need to be supported forever.
Some points that came up while discussing this:

1. The primary use case for this is the Bare Metal Assisted Installer.
2. The default value will be false.
3. The value will be set to false for all LocalVolumeSets created with the OCP console UI.

Questions:

1. What if the filters defined in multiple LocalVolumeSet definitions match the same disk? Which one will be applied? This will be problematic if one LocalVolumeSet definition wants the disk to be cleaned and the other does not.

Rohan and Santosh, I would appreciate your input on this.

Jan, please hold off on making any changes for now. I would like to hear from Rohan and Santosh as well.
(In reply to Michael Hrivnak from comment #2)
> Better cleaning during deprovision would help, but it won't reliably resolve
> this issue. We can't assume that a cluster or a host was healthy enough to
> voluntarily clean up during deprovision.
>
> "dd" is one option with the downsides you've raised, but the primary request
> is to enable a ceph cluster to come up successfully. If there's a better way
> to clean the disks such that they can be used in a new ceph cluster, that
> would be great. Perhaps the ceph community has a way to handle this.

https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/ :

This subcommand is used to zap lvs, partitions or raw devices that have been used by ceph OSDs so that they may be reused. If given a path to a logical volume it must be in the format of vg/lv. Any file systems present on the given lv or partition will be removed and all data will be purged.
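For reference, a hedged example of how ceph-volume zap could be invoked on a reused device; the device and vg/lv names below are placeholders, not values from this bug:

>>>
# Zap a raw device that previously hosted a Ceph OSD so it can be reused.
# --destroy additionally removes LVM metadata and partitions.
ceph-volume lvm zap /dev/sdX --destroy

# Zap a logical volume; the path must be given as vg/lv.
ceph-volume lvm zap ceph-vg/osd-block-lv
<<<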
One point: the need to erase reused disks is not limited to bare metal - redeployment of virtual machines can create the same issue. Second point: a better error message in case the volume is not clean, for whatever reason, would still help, i.e.

>>>
2021-05-11 09:41:09.630086 I | cephosd: discovering hardware
2021-05-11 09:41:09.630093 D | exec: Running command: lsblk /mnt/ocs-deviceset-ha-storage-2-data-0r2dv8 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2021-05-11 09:41:09.632602 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-ha-storage-2-data-0r2dv8
failed to get device info for "/mnt/ocs-deviceset-ha-storage-2-data-0r2dv8": exit status 2
<<<
> What if the filters defined in multiple LocalVolumeSet definitions match the same disk? Which one will be applied? This will be problematic if one LocalVolumeSet definition wants the disk to be cleaned and the other does not.

If there are multiple LocalVolumeSets that match the disk, there won't be any guarantee as to which one picks it up, but once it is picked up by one, it won't be influenced by the others.
In today's meeting between OCP storage (LSO) and OCS we agreed that we can add some `force: true` field to the LocalVolume object - the user (assisted installer) explicitly specifies which devices should be managed by LSO. Adding the field to LocalVolumeSet is error prone; from time to time we get a bug where LSO used a wrong device (BIOS partition, /dev/rb0, ...) and we really do not want to overwrite those.
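A rough sketch of how the proposed field might look on a LocalVolume, applied via a shell heredoc. Note that `force: true` and its placement are only the proposal discussed in this bug, not an existing LSO API field, and the name, namespace and device path are placeholders:

>>>
cat <<EOF | oc apply -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: assisted-installer-local-disks   # placeholder name
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      # Hypothetical flag from this discussion: wipe signatures on the
      # explicitly listed devices before creating PVs. Not a current API field.
      force: true
      devicePaths:
        - /dev/sdX                       # placeholder device path
EOF
<<<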
*** Bug 1872691 has been marked as a duplicate of this bug. ***
I remember we agreed that wiping disks would be problematic for LocalVolumeSet, as it might pick wrong disks (BIOS partitions, etc.). (Cancelling needinfo.)
I turned this BZ into an RFE on our Jira board. Please check that it's accurate and talk to our PM (Duncan) about prioritization. https://issues.redhat.com/browse/RFE-2033