rhel-osp-director: [RFE] Missing an optional argument to force zapping (cleaning) the disks of Ceph nodes. At the moment, we have to use the procedure described below before deploying the overcloud: https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT It would be great to have an optional argument that automates this procedure. Thanks.
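For reference, the documented manual procedure amounts to zapping each OSD disk and writing a fresh GPT label on first boot. A minimal sketch of that idea, assuming the OSD disks are known in advance (the /dev/sdb and /dev/sdc names below are purely illustrative, not part of the documented script):

  #!/bin/bash
  # Illustrative only: adjust the disk list to match the actual Ceph OSD layout.
  for device in /dev/sdb /dev/sdc; do
      sgdisk -Z ${device}   # zap any existing GPT and MBR data structures
      sgdisk -og ${device}  # write a fresh, empty GPT label
  done
  partprobe               # ask the kernel to re-read the partition tables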
*** Bug 1257307 has been marked as a duplicate of this bug. ***
*** Bug 1312190 has been marked as a duplicate of this bug. ***
*** Bug 1391199 has been marked as a duplicate of this bug. ***
The wipe-disk script provided in the documentation [1] does not fully clean a disk that was previously an LVM physical volume (PV). After such a disk is cleaned, the kernel needs to re-scan it to pick up the updated partition information before it can be added to Ceph. [1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
Our team encountered this issue when we repurposed a server that had an LVM partition left over from its previous duty. Here is the wipe_disk code we developed that works for us:

  wipe_disk:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        if [[ $HOSTNAME =~ "cephstorage" ]]; then
          {
            # LVM partitions are always in use by the kernel. Destroy all of the
            # LVM components here so the disks are not in use and sgdisk and
            # partprobe can do their thing

            # Destroy all the logical volumes
            lvs --noheadings -o lv_name | awk '{print $1}' | while read lv; do
              cmd="lvremove -f $lv"
              echo $cmd
              $cmd
            done

            # Destroy all the volume groups
            vgs --noheadings -o vg_name | awk '{print $1}' | while read lvg; do
              cmd="vgremove -f $lvg"
              echo $cmd
              $cmd
            done

            # Destroy all the physical volumes
            pvs --noheadings -o pv_name | awk '{print $1}' | while read pv; do
              cmd="pvremove -ff $pv"
              echo $cmd
              $cmd
            done

            lsblk -dno NAME,TYPE | \
            while read disk type; do
              # Skip if the device type isn't "disk" or if it's mounted
              [ "${type}" == "disk" ] || continue
              device="/dev/${disk}"
              if grep -q ^${device}[1-9] /proc/mounts; then
                echo "Skipping ${device} because it's mounted"
                continue
              fi
              echo "Partitioning disk: ${disk}"
              sgdisk -og ${device}
              echo
            done

            partprobe
            parted -lm
          } > /root/wipe-disk.txt 2>&1
        fi
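For completeness, a sketch of how such a SoftwareConfig is typically wired into node first boot: the config is wrapped in an OS::Heat::MultipartMime resource and the template is registered as OS::TripleO::NodeUserData in an environment file. The file paths and resource names other than wipe_disk are illustrative, not taken from the comment above.

  heat_template_version: 2014-10-16

  description: >
    First-boot template that wipes leftover LVM/partition data on Ceph nodes.

  resources:
    userdata:
      type: OS::Heat::MultipartMime
      properties:
        parts:
        - config: {get_resource: wipe_disk}

    wipe_disk:
      type: OS::Heat::SoftwareConfig
      properties:
        config: |
          #!/bin/bash
          # ... wipe-disk script from the previous comment ...

  outputs:
    OS::stack_id:
      value: {get_resource: userdata}

Registered via an environment file passed to the overcloud deploy command (path is a placeholder):

  resource_registry:
    OS::TripleO::NodeUserData: /home/stack/templates/firstboot/wipe-disk.yaml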
David has proposed a fix upstream for this: https://review.openstack.org/#/c/420992/
*** Bug 1252158 has been marked as a duplicate of this bug. ***
Update: Changes in Ironic will affect this BZ. See bug 1418040 [1] for details on how we are testing this. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1418040
The upstream change has merged and will be in the OSP 11 GA, possibly RC1: https://review.openstack.org/#/c/420992
Summary: A disk can be in a variety of states that may cause director to fail when attempting to make the disk a Ceph OSD. In previous releases a user could use a first-boot script to erase the disk and set the GPT label required by Ceph. A new default setting in Ironic erases the disks when a node is set to available, and a change in puppet-ceph gives the disk a GPT label if one is not already present.
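For operators wanting to rely on the new behaviour, a hedged sketch of the workflow: automated cleaning is controlled by the clean_nodes option in undercloud.conf (the value shown is an assumption for illustration), and cleaning runs when a node is cycled back to available. The node UUID below is a placeholder.

  # Assumed setting in undercloud.conf, applied by re-running 'openstack undercloud install':
  #   [DEFAULT]
  #   clean_nodes = true
  #
  # Cycle the node so Ironic erases its disks before it becomes available again.
  openstack baremetal node manage <node-uuid>
  openstack baremetal node provide <node-uuid>   # cleaning runs during this transition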
New default Ironic behaviour verified: https://bugzilla.redhat.com/show_bug.cgi?id=1418040
Shall we close this as a duplicate of bug 1418040?
Alternatively, we could wait for the change in https://review.openstack.org/#/c/420992 to arrive in our package and set the fixed-in flag.
*** This bug has been marked as a duplicate of bug 1418040 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days