Bug 1947257 - RFE: Provide an option for LSO to wipe disks before creating PVs
Summary: RFE: Provide an option for LSO to wipe disks before creating PVs
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: aos-storage-staff@redhat.com
QA Contact: Wei Duan
URL:
Whiteboard:
Duplicates: 1872691
Depends On:
Blocks: 1945043
 
Reported: 2021-04-08 05:33 UTC by N Balachandran
Modified: 2024-10-01 17:52 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-30 09:43:18 UTC
Target Upstream Version:
Embargoed:



Description N Balachandran 2021-04-08 05:33:57 UTC
Description of problem:

rook-ceph requires clean disks for the ceph cluster to come up successfully.

The OCS integration with the Assisted Installer for bare metal uses LSO to create the PVs for the Ceph OSDs. If those disks are not clean, the cluster is not created.

It would be very useful to have LSO provide an option to wipe the disks before creating the PVs.
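For context, a disk counts as "not clean" when it still carries filesystem, LVM, or Ceph signatures from a previous deployment. A minimal manual check (assuming /dev/sdb as a purely illustrative device path) looks like this:

# list any leftover filesystem/RAID/LVM signatures; without options, wipefs only reports, it does not erase
wipefs /dev/sdb
# show any filesystems or partitions still visible on the device
lsblk -f /dev/sdb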



Comment 1 Jan Safranek 2021-04-16 15:08:29 UTC
Would it make more sense to have un-installation of the previous cluster clean the volumes better when releasing the PVs? Perhaps with a configurable "cleaner", either wipefs or dd.
In addition, dd can be very slow on big disks - do you really want to wait hours or days for it to complete?
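For reference, the two cleaners mentioned above differ mainly in scope and runtime; a rough sketch, with /dev/sdb standing in for a released device:

# wipefs: erase only the known signature blocks - typically finishes in seconds
wipefs --all --force /dev/sdb
# dd: overwrite the entire device - thorough, but can take hours or days on large disks
dd if=/dev/zero of=/dev/sdb bs=1M status=progress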

Comment 2 Michael Hrivnak 2021-04-19 13:23:51 UTC
Better cleaning during deprovision would help, but it won't reliably resolve this issue. We can't assume that a cluster or a host was healthy enough to voluntarily clean up during deprovision.

"dd" is one option with the downsides you've raised, but the primary request is to enable a ceph cluster to come up successfully. If there's a better way to clean the disks such that they can be used in a new ceph cluster, that would be great. Perhaps the ceph community has a way to handle this.

Comment 3 Jan Safranek 2021-04-20 13:11:18 UTC
> host was healthy enough to voluntarily clean up during deprovision.

If the host has a random filesystem on its volumes, Kubernetes policy is not to overwrite them - they may hold important data. I understand that users need some option to use the disks regardless of what is on them; still, IMO the default should be safe, and especially LocalVolumeSet should not clean the volumes unless some sort of force is applied. This needs a solid design & API.

Comment 4 N Balachandran 2021-04-20 13:17:04 UTC
(In reply to Michael Hrivnak from comment #2)
> Better cleaning during deprovision would help, but it won't reliably resolve
> this issue. We can't assume that a cluster or a host was healthy enough to
> voluntarily clean up during deprovision.

Right. This is especially the case in bare-metal setups where customers might try to reuse hosts/disks and set up their cluster afresh.

> 
> "dd" is one option with the downsides you've raised, but the primary request
> is to enable a ceph cluster to come up successfully. If there's a better way
> to clean the disks such that they can be used in a new ceph cluster, that
> would be great. Perhaps the ceph community has a way to handle this.

I checked with Sebastien and a "wipefs --all --force" should be good enough. The dd need not be run on the entire disk - just the first few MB should do.

cc @seb
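A sketch of the lightweight wipe described above, with /dev/sdb again as a placeholder device; zeroing 10 MB at the start of the disk is an arbitrary choice consistent with the "few MB" suggestion:

# remove all known signatures (filesystem, LVM, Ceph) from the device
wipefs --all --force /dev/sdb
# additionally zero the first few MB of the device
dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync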

Comment 5 Jan Safranek 2021-05-04 14:37:58 UTC
We were discussing adding some `force: true` option to both LocalVolume & LocalVolumeSet, which would run wipefs on matching volumes. Would it be enough to satisfy this RFE? And is it future-proof enough? It may be enough for the assisted installer MVP / POC, but we don't want to paint ourselves into a corner if you need something better in the future. The API will need to be supported forever.

Comment 6 N Balachandran 2021-05-06 15:16:30 UTC
Some points that came up while discussing this:
1. The primary use case for this is the Bare Metal Assisted Installer.
2. The default value will be false.
3. The value will be set to false for all LocalVolumeSets created with the OCP console UI.

Questions:
1. What if the filters defined in multiple LocalVolumeSet definitions match the same disk? Which one will be applied? This will be problematic if one LocalVolumeSet definition wants the disk to be cleaned and the other does not.


Rohan and Santosh, I would appreciate your input on this.

Jan, please hold off on making any changes for now. I would like to hear from Rohan and Santosh as well.

Comment 7 Yaniv Kaul 2021-05-06 15:39:12 UTC
(In reply to Michael Hrivnak from comment #2)
> Better cleaning during deprovision would help, but it won't reliably resolve
> this issue. We can't assume that a cluster or a host was healthy enough to
> voluntarily clean up during deprovision.
> 
> "dd" is one option with the downsides you've raised, but the primary request
> is to enable a ceph cluster to come up successfully. If there's a better way
> to clean the disks such that they can be used in a new ceph cluster, that
> would be great. Perhaps the ceph community has a way to handle this.

https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/ :
This subcommand is used to zap lvs, partitions or raw devices that have been used by ceph OSDs so that they may be reused. If given a path to a logical volume it must be in the format of vg/lv. Any file systems present on the given lv or partition will be removed and all data will be purged.
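For completeness, typical invocations of that subcommand look like the following; the device and vg/lv names are placeholders:

# zap a raw device that previously hosted an OSD
ceph-volume lvm zap /dev/sdb
# zap a logical volume (given as vg/lv); --destroy also removes the underlying vg/lv or partitions
ceph-volume lvm zap --destroy osd-vg/osd-lv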

Comment 8 Marc Schindler 2021-05-12 15:02:14 UTC
One point on erasing reused disks: this is not only a bare-metal issue, because redeployment of virtual machines can create the same problem.

Second point: a better error message when the volume is not clean, for whatever reason, would still help, e.g.:

>>>
2021-05-11 09:41:09.630086 I | cephosd: discovering hardware
2021-05-11 09:41:09.630093 D | exec: Running command: lsblk /mnt/ocs-deviceset-ha-storage-2-data-0r2dv8 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2021-05-11 09:41:09.632602 D | exec: Running command: sgdisk --print /mnt/ocs-deviceset-ha-storage-2-data-0r2dv8 
failed to get device info for "/mnt/ocs-deviceset-ha-storage-2-data-0r2dv8": exit status 2
<<<

Comment 9 Rohan CJ 2021-05-14 06:10:52 UTC
> What if the filters defined in multiple LocalVolumeSet definitions match the same disk? Which one will be applied? This will be problematic if one LocalVolumeSet definition wants the disk to be cleaned and the other does not.

If there are multiple lvsets that match the disk, there won't be any guarantees as to which one picks it, but once it is picked up by one, it won't be influenced by the others.

Comment 10 Jan Safranek 2021-05-14 16:51:59 UTC
In today's meeting between OCP storage (LSO) and OCS we agreed that we can add a `force: true` field to the LocalVolume object - the user (assisted installer) explicitly lists which devices should be managed by LSO. Adding the field to LocalVolumeSet is error-prone; from time to time we get a bug where LSO used a wrong device (BIOS partition, /dev/rb0, ...) and we really do not want to overwrite these.
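To make the proposal concrete, a hypothetical LocalVolume manifest with the discussed field might look like the sketch below. The `force` field (and its placement) is only the option under discussion in this RFE, not an existing LSO API, and the names and device path are placeholders:

cat <<EOF | oc apply -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: assisted-installer-disks
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: local-block
      volumeMode: Block
      force: true          # hypothetical field from this discussion; wipe signatures before creating PVs
      devicePaths:
        - /dev/sdb
EOF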

Comment 11 Jan Safranek 2021-06-02 15:22:55 UTC
*** Bug 1872691 has been marked as a duplicate of this bug. ***

Comment 12 Santosh Pillai 2021-06-14 03:05:26 UTC
I remember we agreed that wiping disks would be problematic for LocalVolumeSet as it might pick the wrong disks (BIOS partition, etc.). (Cancelling needinfo.)

Comment 13 Jan Safranek 2021-07-30 09:43:18 UTC
I turned this BZ into an RFE on our Jira board. Please check that it's accurate and talk to our PM (Duncan) about prioritization.
https://issues.redhat.com/browse/RFE-2033

