Bug 1377867

Summary: Prepare OSDs with GPT label to help ensure Ceph deployment will succeed
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: John Fulton <johfulto>
Status: CLOSED DUPLICATE
QA Contact: Yogev Rabl <yrabl>
Severity: high
Docs Contact: Don Domingo <ddomingo>
Priority: high
Version: 11.0 (Ocata)
CC: alan_bishop, dbecker, dcritch, ddomingo, gfidente, jliberma, johfulto, jomurphy, mburns, mcornea, morazi, racedoro, rhel-osp-director-maint, rybrown, scohen, yrabl
Target Milestone: ---
Keywords: Triaged
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
A disk can be in a variety of states which may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases, a user could run a first-boot script to erase the disk and set a GPT label required by Ceph. With this release, a new default setting in Ironic will erase the disks when a node is set to available and a change in puppet-ceph will give the disk a GPT label if there is no GPT label on the disk.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-10 18:55:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1418040, 1432309    
Bug Blocks: 1387433, 1399824    

Description Alexander Chuzhoy 2016-09-20 20:33:14 UTC
rhel-osp-director:   [RFE] Missing an optional argument to force zapping (cleaning) disks of ceph nodes.

At the moment, we have to use the procedure linked below before deploying the overcloud:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT

It would be great to have an optional argument that automates this procedure.
Thanks.
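
For context, the linked procedure boils down to wiping each OSD disk and giving it a fresh GPT label from a first-boot script before the overcloud deploy. A minimal sketch of that idea (the hostname match and device list are examples, not the exact script from the documentation):

  #!/bin/bash
  # Run only on Ceph storage nodes (hostname match is an example)
  if [[ $HOSTNAME =~ "cephstorage" ]]; then
    for disk in /dev/sdb /dev/sdc; do  # example OSD disks
      sgdisk -Z "$disk"                # zap existing MBR/GPT data structures
      sgdisk -og "$disk"               # write a fresh, empty GPT label
    done
    partprobe                          # ask the kernel to re-read partition tables
  fi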

Comment 2 John Fulton 2016-09-23 21:35:32 UTC
*** Bug 1257307 has been marked as a duplicate of this bug. ***

Comment 3 John Fulton 2016-10-18 12:08:40 UTC
*** Bug 1312190 has been marked as a duplicate of this bug. ***

Comment 4 Giulio Fidente 2016-11-23 13:56:36 UTC
*** Bug 1391199 has been marked as a duplicate of this bug. ***

Comment 5 John Fulton 2017-01-06 22:33:24 UTC
The wipe-disk script provided in the documentation [1] does not fully clean the disk if the disk was an LVM physical volume (PV). After such a disk is cleaned, the kernel needs to re-scan it to pick up the updated partition information before the disk can be added to Ceph.

[1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
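
In practice, one way to do the fuller clean and force that re-scan (a sketch only; /dev/sdb is an example device, and this assumes any LVM volumes on it have already been removed or deactivated):

  wipefs -a /dev/sdb   # clear leftover LVM/filesystem signatures
  sgdisk -Z /dev/sdb   # zap any remaining MBR/GPT structures
  sgdisk -og /dev/sdb  # create a fresh GPT label
  partprobe /dev/sdb   # force the kernel to re-read the partition table
  udevadm settle       # wait for udev to finish processing the events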

Comment 6 Alan Bishop 2017-01-09 15:22:31 UTC
Our team encountered this issue when we repurposed a server that had an LVM partition left over from its previous duty. Here is the wipe_disk code we developed that works for us:

  wipe_disk:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        if [[ $HOSTNAME =~ "cephstorage" ]]; then
        {
          # LVM partitions are always in use by the kernel.  Destroy all of the
          # LVM components here so the disks are not in use and sgdisk and
          # partprobe can do their thing

          # Destroy all the logical volumes (lv_path yields the full
          # /dev/vg/lv path, which lvremove accepts unambiguously)
          lvs --noheadings -o lv_path | awk '{print $1}' | while read lv;
          do
              cmd="lvremove -f $lv"
              echo $cmd
              $cmd
          done

          # Destroy all the volume groups
          vgs --noheadings -o vg_name | awk '{print $1}' | while read lvg;
          do
              cmd="vgremove -f $lvg"
              echo $cmd
              $cmd
          done

          # Destroy all the physical volumes
          pvs --noheadings -o pv_name | awk '{print $1}' | while read pv;
          do
              cmd="pvremove -ff $pv"
              echo $cmd
              $cmd
          done

          lsblk -dno NAME,TYPE | \
          while read disk type; do
            # Skip if the device type isn't "disk" or if it's mounted
            [ "${type}" == "disk" ] || continue
            device="/dev/${disk}"
            if grep -q ^${device}[1-9] /proc/mounts; then
              echo "Skipping ${device} because it's mounted"
              continue
            fi
            echo "Partitioning disk: ${disk}"
            sgdisk -og ${device}
            echo
          done
          partprobe
          parted -lm
        } > /root/wipe-disk.txt 2>&1
        fi
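
For completeness, a SoftwareConfig like the one above is normally wrapped in a first-boot template and wired in through the resource_registry. A minimal sketch, assuming the config is saved in /home/stack/templates/firstboot/wipe-disk.yaml (the path is an example):

  heat_template_version: 2014-10-16

  resources:
    userdata:
      type: OS::Heat::MultipartMime
      properties:
        parts:
        - config: {get_resource: wipe_disk}

    wipe_disk:
      type: OS::Heat::SoftwareConfig
      # properties/config as in the snippet above

  outputs:
    OS::stack_id:
      value: {get_resource: userdata}

The template is then registered in an environment file passed to the deploy command:

  resource_registry:
    OS::TripleO::NodeUserData: /home/stack/templates/firstboot/wipe-disk.yaml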

Comment 7 John Fulton 2017-01-17 13:08:42 UTC
David has proposed a fix upstream for this: 

 https://review.openstack.org/#/c/420992/
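
The gist of that change (an illustrative sketch only, not the literal puppet-ceph patch) is to label the OSD disk with GPT only when no GPT label is already present, along these lines:

  DISK=/dev/sdb  # example OSD disk
  if [ "$(blkid -p -o value -s PTTYPE "$DISK")" != "gpt" ]; then
    sgdisk -og "$DISK"  # give the disk a fresh GPT label so it can be prepared as an OSD
  fi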

Comment 8 John Fulton 2017-01-17 16:32:40 UTC
*** Bug 1252158 has been marked as a duplicate of this bug. ***

Comment 9 John Fulton 2017-01-31 16:57:05 UTC
Update: Changes in Ironic will affect this BZ. See bug 1418040 [1] for details and for how we are testing this.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1418040
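
For reference, the Ironic behaviour in question is automated node cleaning with metadata erase. A sketch of the relevant settings (the values shown are examples; the actual defaults being shipped are tracked in bug 1418040):

  # undercloud.conf
  clean_nodes = true

  # /etc/ironic/ironic.conf on the undercloud
  [conductor]
  automated_clean = true

  [deploy]
  erase_devices_priority = 0            # skip the slow full-disk shred
  erase_devices_metadata_priority = 10  # wipe partition tables/labels only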

Comment 13 John Fulton 2017-02-10 14:34:14 UTC
The upstream change has merged and will be in the OSP 11 GA, possibly as early as RC1:

 https://review.openstack.org/#/c/420992

Comment 14 John Fulton 2017-02-10 16:40:53 UTC
Summary: A disk can be in a variety of states which may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases, a user could use a first-boot script to erase the disk and set a GPT label required by Ceph. A new default setting in Ironic will erase the disks when a node is set to available and a change in puppet-ceph will give the disk a GPT label if there is no GPT label on the disk.

Comment 15 John Fulton 2017-02-10 16:45:32 UTC
New default Ironic behaviour verified: https://bugzilla.redhat.com/show_bug.cgi?id=1418040
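
One way to exercise that behaviour (a sketch; <node> is a placeholder): cycle the node through cleaning and check that its disks come back without their old partition labels once it reaches "available":

  openstack baremetal node manage <node>
  openstack baremetal node provide <node>  # automated cleaning runs before the node becomes "available"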

Comment 16 Giulio Fidente 2017-02-15 16:06:42 UTC
Shall we close this as a duplicate of bug 1418040?

Comment 17 John Fulton 2017-02-15 17:02:53 UTC
Alternatively, we could wait for the change in https://review.openstack.org/#/c/420992 to arrive in our package and set the fixed-in flag.

Comment 18 John Fulton 2017-03-10 18:55:24 UTC

*** This bug has been marked as a duplicate of bug 1418040 ***

Comment 19 Red Hat Bugzilla 2023-09-14 03:31:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days