Bug 1377867

Summary: Prepare OSDs with GPT label to help ensure Ceph deployment will succeed
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: John Fulton <johfulto>
Status: CLOSED DUPLICATE
QA Contact: Yogev Rabl <yrabl>
Severity: high
Docs Contact: Don Domingo <ddomingo>
Priority: high
Version: 11.0 (Ocata)
CC: alan_bishop, dbecker, dcritch, ddomingo, gfidente, jliberma, johfulto, jomurphy, mburns, mcornea, morazi, racedoro, rhel-osp-director-maint, rybrown, scohen, yrabl
Target Milestone: ---
Keywords: Triaged
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
A disk can be in a variety of states which may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases, a user could run a first-boot script to erase the disk and set a GPT label required by Ceph. With this release, a new default setting in Ironic will erase the disks when a node is set to available and a change in puppet-ceph will give the disk a GPT label if there is no GPT label on the disk.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-10 18:55:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1418040, 1432309    
Bug Blocks: 1387433, 1399824    

Description Alexander Chuzhoy 2016-09-20 20:33:14 UTC
rhel-osp-director:   [RFE] Missing an optional argument to force zapping (cleaning) disks of ceph nodes.

At the moment, we have to use the procedure linked below before deploying the overcloud:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT

It would be great to have an optional argument that automates this procedure.
Thanks.
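
For context, the linked procedure boils down to wiping each OSD disk and giving it a fresh GPT label from a first-boot script before the overcloud deploy. A minimal sketch of that idea (the hostname match and device list are examples, not the exact script from the documentation):

  #!/bin/bash
  # Run only on Ceph storage nodes (hostname match is an example)
  if [[ $HOSTNAME =~ "cephstorage" ]]; then
    for disk in /dev/sdb /dev/sdc; do  # example OSD disks
      sgdisk -Z "$disk"                # zap existing MBR/GPT data structures
      sgdisk -og "$disk"               # write a fresh, empty GPT label
    done
    partprobe                          # ask the kernel to re-read partition tables
  fi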

Comment 2 John Fulton 2016-09-23 21:35:32 UTC
*** Bug 1257307 has been marked as a duplicate of this bug. ***

Comment 3 John Fulton 2016-10-18 12:08:40 UTC
*** Bug 1312190 has been marked as a duplicate of this bug. ***

Comment 4 Giulio Fidente 2016-11-23 13:56:36 UTC
*** Bug 1391199 has been marked as a duplicate of this bug. ***

Comment 5 John Fulton 2017-01-06 22:33:24 UTC
The wipe-disk script provided in the documentation [1] does not fully clean the disk if the disk was an LVM physical volume (PV). After such a disk is cleaned, the kernel needs to re-scan it to pick up the updated partition information before the disk can be added to Ceph.

[1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
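
In practice, one way to do the fuller clean and force that re-scan (a sketch only; /dev/sdb is an example device, and this assumes any LVM volumes on it have already been removed or deactivated):

  wipefs -a /dev/sdb   # clear leftover LVM/filesystem signatures
  sgdisk -Z /dev/sdb   # zap any remaining MBR/GPT structures
  sgdisk -og /dev/sdb  # create a fresh GPT label
  partprobe /dev/sdb   # force the kernel to re-read the partition table
  udevadm settle       # wait for udev to finish processing the events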

Comment 6 Alan Bishop 2017-01-09 15:22:31 UTC
Our team encountered this issue when we repurposed a server that had an LVM partition left over from its previous duty. Here is the wipe_disk code we developed that works for us:

  wipe_disk:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        if [[ $HOSTNAME =~ "cephstorage" ]]; then
        {
          # LVM partitions are always in use by the kernel.  Destroy all of the
          # LVM components here so the disks are not in use and sgdisk and
          # partprobe can do their thing

          # Destroy all the logical volumes (lv_path yields the full
          # /dev/vg/lv path, which lvremove accepts unambiguously)
          lvs --noheadings -o lv_path | awk '{print $1}' | while read lv;
          do
              cmd="lvremove -f $lv"
              echo $cmd
              $cmd
          done

          # Destroy all the volume groups
          vgs --noheadings -o vg_name | awk '{print $1}' | while read lvg;
          do
              cmd="vgremove -f $lvg"
              echo $cmd
              $cmd
          done

          # Destroy all the physical volumes
          pvs --noheadings -o pv_name | awk '{print $1}' | while read pv;
          do
              cmd="pvremove -ff $pv"
              echo $cmd
              $cmd
          done

          lsblk -dno NAME,TYPE | \
          while read disk type; do
            # Skip if the device type isn't "disk" or if it's mounted
            [ "${type}" == "disk" ] || continue
            device="/dev/${disk}"
            if grep -q ^${device}[1-9] /proc/mounts; then
              echo "Skipping ${device} because it's mounted"
              continue
            fi
            echo "Partitioning disk: ${disk}"
            sgdisk -og ${device}
            echo
          done
          partprobe
          parted -lm
        } > /root/wipe-disk.txt 2>&1
        fi
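
For completeness, a SoftwareConfig like the one above is normally wrapped in a first-boot template and wired in through the resource_registry. A minimal sketch, assuming the config is saved in /home/stack/templates/firstboot/wipe-disk.yaml (the path is an example):

  heat_template_version: 2014-10-16

  resources:
    userdata:
      type: OS::Heat::MultipartMime
      properties:
        parts:
        - config: {get_resource: wipe_disk}

    wipe_disk:
      type: OS::Heat::SoftwareConfig
      # properties/config as in the snippet above

  outputs:
    OS::stack_id:
      value: {get_resource: userdata}

The template is then registered in an environment file passed to the deploy command:

  resource_registry:
    OS::TripleO::NodeUserData: /home/stack/templates/firstboot/wipe-disk.yaml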

Comment 7 John Fulton 2017-01-17 13:08:42 UTC
David has proposed a fix upstream for this: 

 https://review.openstack.org/#/c/420992/
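
The gist of that change (an illustrative sketch only, not the literal puppet-ceph patch) is to label the OSD disk with GPT only when no GPT label is already present, along these lines:

  DISK=/dev/sdb  # example OSD disk
  if [ "$(blkid -p -o value -s PTTYPE "$DISK")" != "gpt" ]; then
    sgdisk -og "$DISK"  # give the disk a fresh GPT label so it can be prepared as an OSD
  fi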

Comment 8 John Fulton 2017-01-17 16:32:40 UTC
*** Bug 1252158 has been marked as a duplicate of this bug. ***

Comment 9 John Fulton 2017-01-31 16:57:05 UTC
Update: Changes in Ironic will affect this BZ. See bug 1418040 [1] for details and for how we are testing this.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1418040
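
For reference, the Ironic behaviour in question is automated node cleaning with metadata erase. A sketch of the relevant settings (the values shown are examples; the actual defaults being shipped are tracked in bug 1418040):

  # undercloud.conf
  clean_nodes = true

  # /etc/ironic/ironic.conf on the undercloud
  [conductor]
  automated_clean = true

  [deploy]
  erase_devices_priority = 0            # skip the slow full-disk shred
  erase_devices_metadata_priority = 10  # wipe partition tables/labels only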

Comment 13 John Fulton 2017-02-10 14:34:14 UTC
The upstream change has merged and will be in the OSP 11 GA, possibly as early as RC1:

 https://review.openstack.org/#/c/420992

Comment 14 John Fulton 2017-02-10 16:40:53 UTC
Summary: A disk can be in a variety of states which may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases, a user could use a first-boot script to erase the disk and set a GPT label required by Ceph. A new default setting in Ironic will erase the disks when a node is set to available and a change in puppet-ceph will give the disk a GPT label if there is no GPT label on the disk.

Comment 15 John Fulton 2017-02-10 16:45:32 UTC
New default Ironic behaviour verified: https://bugzilla.redhat.com/show_bug.cgi?id=1418040
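
One way to exercise that behaviour (a sketch; <node> is a placeholder): cycle the node through cleaning and check that its disks come back without their old partition labels once it reaches "available":

  openstack baremetal node manage <node>
  openstack baremetal node provide <node>  # automated cleaning runs before the node becomes "available"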

Comment 16 Giulio Fidente 2017-02-15 16:06:42 UTC
Shall we close this as a duplicate of bug 1418040?

Comment 17 John Fulton 2017-02-15 17:02:53 UTC
Alternatively, we could wait for the change in https://review.openstack.org/#/c/420992 to arrive in our package and set the fixed-in flag.

Comment 18 John Fulton 2017-03-10 18:55:24 UTC

*** This bug has been marked as a duplicate of bug 1418040 ***

Comment 19 Red Hat Bugzilla 2023-09-14 03:31:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days