rhel-osp-director: [RFE] Missing an optional argument to force zapping (cleaning) the disks of Ceph nodes. At the moment, we have to use the procedure described below before deploying the overcloud: https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT It would be great to have an optional argument that automates this procedure. Thanks.
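For reference, the documented manual procedure amounts to zapping each OSD disk and writing a fresh GPT label on first boot. A minimal sketch of that idea, assuming the OSD disks are known in advance (the /dev/sdb and /dev/sdc names below are purely illustrative, not part of the documented script):

  #!/bin/bash
  # Illustrative only: adjust the disk list to match the actual Ceph OSD layout.
  for device in /dev/sdb /dev/sdc; do
      sgdisk -Z ${device}   # zap any existing GPT and MBR data structures
      sgdisk -og ${device}  # write a fresh, empty GPT label
  done
  partprobe               # ask the kernel to re-read the partition tables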
*** Bug 1257307 has been marked as a duplicate of this bug. ***
*** Bug 1312190 has been marked as a duplicate of this bug. ***
*** Bug 1391199 has been marked as a duplicate of this bug. ***
The wipe-disk script provided in the documentation [1] does not fully clean a disk that was previously an LVM physical volume (PV). After such a disk is cleaned, the kernel needs to re-scan it to pick up the updated partition information before it can be added to Ceph. [1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
Our team encountered this issue when we repurposed a server that had an LVM partition left over from its previous duty. Here is the wipe_disk code we developed that works for us:

  wipe_disk:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        if [[ $HOSTNAME =~ "cephstorage" ]]; then
          {
            # LVM partitions are always in use by the kernel. Destroy all of the
            # LVM components here so the disks are not in use and sgdisk and
            # partprobe can do their thing

            # Destroy all the logical volumes
            lvs --noheadings -o lv_name | awk '{print $1}' | while read lv; do
              cmd="lvremove -f $lv"
              echo $cmd
              $cmd
            done

            # Destroy all the volume groups
            vgs --noheadings -o vg_name | awk '{print $1}' | while read lvg; do
              cmd="vgremove -f $lvg"
              echo $cmd
              $cmd
            done

            # Destroy all the physical volumes
            pvs --noheadings -o pv_name | awk '{print $1}' | while read pv; do
              cmd="pvremove -ff $pv"
              echo $cmd
              $cmd
            done

            lsblk -dno NAME,TYPE | \
            while read disk type; do
              # Skip if the device type isn't "disk" or if it's mounted
              [ "${type}" == "disk" ] || continue
              device="/dev/${disk}"
              if grep -q ^${device}[1-9] /proc/mounts; then
                echo "Skipping ${device} because it's mounted"
                continue
              fi
              echo "Partitioning disk: ${disk}"
              sgdisk -og ${device}
              echo
            done

            partprobe
            parted -lm
          } > /root/wipe-disk.txt 2>&1
        fi
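For completeness, a sketch of how such a SoftwareConfig is typically wired into node first boot: the config is wrapped in an OS::Heat::MultipartMime resource and the template is registered as OS::TripleO::NodeUserData in an environment file. The file paths and resource names other than wipe_disk are illustrative, not taken from the comment above.

  heat_template_version: 2014-10-16

  description: >
    First-boot template that wipes leftover LVM/partition data on Ceph nodes.

  resources:
    userdata:
      type: OS::Heat::MultipartMime
      properties:
        parts:
        - config: {get_resource: wipe_disk}

    wipe_disk:
      type: OS::Heat::SoftwareConfig
      properties:
        config: |
          #!/bin/bash
          # ... wipe-disk script from the previous comment ...

  outputs:
    OS::stack_id:
      value: {get_resource: userdata}

Registered via an environment file passed to the overcloud deploy command (path is a placeholder):

  resource_registry:
    OS::TripleO::NodeUserData: /home/stack/templates/firstboot/wipe-disk.yaml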
David has proposed a fix upstream for this: https://review.openstack.org/#/c/420992/
*** Bug 1252158 has been marked as a duplicate of this bug. ***
Update: Changes in Ironic will affect this BZ. See bug 1418040 [1] for details on how we are testing this. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1418040
The upstream change has merged and will be in the OSP 11 GA, possibly RC1: https://review.openstack.org/#/c/420992
Summary: A disk can be in a variety of states that may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases a user could use a first-boot script to erase the disk and set the GPT label required by Ceph. A new default setting in Ironic erases the disks when a node is set to available, and a change in puppet-ceph gives the disk a GPT label if one is not already present.
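For operators wanting to rely on the new behaviour, a hedged sketch of the workflow: automated cleaning is controlled by the clean_nodes option in undercloud.conf (the value shown is an assumption for illustration), and cleaning runs when a node is cycled back to available. The node UUID below is a placeholder.

  # Assumed setting in undercloud.conf, applied by re-running 'openstack undercloud install':
  #   [DEFAULT]
  #   clean_nodes = true
  #
  # Cycle the node so Ironic erases its disks before it becomes available again.
  openstack baremetal node manage <node-uuid>
  openstack baremetal node provide <node-uuid>   # cleaning runs during this transition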
New default Ironic behaviour verified: https://bugzilla.redhat.com/show_bug.cgi?id=1418040
Shall we close this as a duplicate of bug 1418040?
Alternatively, we could wait for the change in https://review.openstack.org/#/c/420992 to arrive in our package and set the fixed-in flag.
*** This bug has been marked as a duplicate of bug 1418040 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days