Bug 1251718

Summary: [Docs][Ceph] Document the need to ensure that OSD disks have a GPT label
Product: Red Hat OpenStack Reporter: jliberma <jliberma>
Component: documentationAssignee: RHOS Documentation Team <rhos-docs>
Status: CLOSED NOTABUG QA Contact: RHOS Documentation Team <rhos-docs>
Severity: unspecified Docs Contact: Derek <dcadzow>
Priority: urgent    
Version: 7.0 (Kilo)CC: dmacpher, hbrock, jefbrown, jomurphy, lbopf, mburns, mcornea, morazi, racedoro, rhel-osp-director-maint, sasha, seb, srevivo
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Story Points: ---
Clone Of:
: 1391199 (view as bug list) Environment:
Last Closed: 2016-12-08 05:47:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1391199    

Description jliberma@redhat.com 2015-08-09 06:03:28 UTC
Description of problem:

heat overcloud stack create fails when hiera customization is used and non-GPT (dos) disk labels exist on the target OSD disks

Version-Release number of selected component (if applicable):

python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. deploy undercloud
2. customize the Ceph OSDs in ceph.yaml to include disks with a non-GPT (dos) disk label:
ceph::profile::params::osds:
    '/dev/sdc':
        journal: '/dev/sdb1'
    '/dev/sdd':
        journal: '/dev/sdb2'
3. deploy the overcloud, including the customized template location:
openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server 10.16.255.2 --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Actual results:

On undercloud:
[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep resource_status_reason
| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |



On ceph server:
[root@overcloud-cephstorage-3 heat-admin]# journalctl -u os-collect-config | grep fail
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: Found invalid GPT and valid MBR; converting MBR to GPT format.
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: Non-GPT disk; not saving changes. Use -g to override.
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--largest-new=1', '--change-name=1:ceph data', '--partition-guid=1:fb52fa2a-5437-4f98-b011-b4ec40b58257', '--typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be', '--', '/dev/sdc']' returned non-zero exit status 3
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: executed successfully
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
set -ex
if ! test -b /dev/sdc ; then
  mkdir -p /dev/sdc
fi
ceph-disk prepare  /dev/sdc /tmp/journalc
 returned 1 instead of one of [0]
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies

[root@overcloud-cephstorage-3 heat-admin]# fdisk -l | grep dos
Disk label type: dos
Disk label type: dos


Expected results:

Puppet re-labels the disks, or reports a descriptive failure in the heat resource_status_reason.

Additional info:

WORKAROUND: convert the disk labels to GPT and rerun the deploy:
[root@overcloud-cephstorage-3 heat-admin]# for i in sd{b..m}; do parted -s /dev/$i mklabel gpt; done
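
A more defensive variant (a sketch, not from the original report) checks each disk's current label first and only relabels disks that are not already GPT, so reruns leave healthy disks alone; the sd{b..m} range is specific to this environment:

# Sketch only: relabel just the disks that do not already carry a GPT label.
for i in sd{b..m}; do
    label=$(parted -s "/dev/$i" print 2>/dev/null | awk '/^Partition Table:/ {print $3}')
    if [ "$label" != "gpt" ]; then
        parted -s "/dev/$i" mklabel gpt
    fi
done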

Comment 6 Dan Prince 2015-12-07 19:04:17 UTC
So I think the solution here might involve two parts:

1) Update puppet-ceph so that it supports creating the required disk labels for us. I think this probably belongs in puppet-ceph... but if they aren't keen on it, perhaps puppet-tripleo could contain this resource. Note: I'm thinking the parameter for the resource would be the device name. Perhaps something like ceph::create_labels?

2) We wire in our discovery data so that the Puppet resources for #1 above get created automatically. This might be something like:

create_resources('ceph::create_labels', hiera('sd_devices'), {})

Where we create a hiera entry for all of the available sd_devices above dynamically (somehow) from the introspection data.

If the introspection data isn't useful/or easily obtainable then just blindly looping over all the devices in puppet would also work...
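
As a purely hypothetical illustration of that wiring (the hieradata path and the sd_devices key here are invented for the sketch), the per-node hiera entry could be generated with something like:

# Hypothetical sketch: build a hiera hash of candidate devices on a node,
# suitable as input to the create_resources() call above.
hiera_file=/etc/puppet/hieradata/sd_devices.yaml
echo "sd_devices:" > "$hiera_file"
for dev in $(lsblk -dno NAME | grep '^sd'); do
    echo "  '/dev/$dev': {}" >> "$hiera_file"
done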

Comment 7 Mike Burns 2016-04-07 20:47:27 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 seb 2016-08-23 09:20:19 UTC
Agree with Dan: puppet-ceph should detect the partition type; if it is not GPT, zap the current partition (provided the partition name does not contain 'ceph') and create a GPT table.
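
Expressed as shell rather than Puppet (a rough sketch of the logic described above, not an excerpt from puppet-ceph), the check for a single device would be roughly:

# Sketch of the proposed logic for one device: detect the partition table
# type; if it is not GPT, zap and relabel, but only when no existing
# partition name contains 'ceph'.
dev=/dev/sdc
table=$(parted -s "$dev" print 2>/dev/null | awk '/^Partition Table:/ {print $3}')
if [ "$table" != "gpt" ]; then
    if ! parted -s "$dev" print 2>/dev/null | grep -q ceph; then
        sgdisk --zap-all --clear --mbrtogpt -g -- "$dev"
    fi
fi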

Comment 11 seb 2016-08-25 08:39:51 UTC
We can write a bit of documentation, then address this issue in puppet-ceph later.
I can help with the doc.

Comment 12 jomurphy 2016-10-05 16:05:10 UTC
*** Bug 1253395 has been marked as a duplicate of this bug. ***

Comment 13 Lucy Bopf 2016-11-07 06:32:09 UTC
Moving to 'NEW' to be triaged as resources allow.

Seb, can you summarize what we need to update in the documentation? I'll update the bug summary to match.

Comment 14 seb 2016-11-08 11:21:18 UTC
Lucy, sure.

"Prior to deploy any Ceph nodes, we need to make sure that OSD disks have a GPT label. This can be tested via the following command, ie for /dev/vdb disk:

$ sudo parted /dev/vdb print

Valid output:
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

Invalid output:
Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

The valid output is the expected result; if instead you see the invalid output, you can convert the disk label with the following command:

$ sudo sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb
"

Is it enough?

Comment 15 Lucy Bopf 2016-11-09 01:27:32 UTC
Thanks, Seb. I think that's sufficient for now. I've updated the summary accordingly.

Comment 16 Dan Macpherson 2016-12-08 05:36:22 UTC
I think we can close this BZ. As of OSP 8, I have included a script in the Ceph Guide that formats all OSD disks (except the root disk) to GPT immediately after the provisioning process:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
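
For readers without access to that guide, the general shape of such a script (a sketch assuming the root filesystem sits directly on a partition, with no LVM or multipath in between) is:

#!/bin/bash
# Sketch only: convert every whole disk except the root disk to GPT.
root_disk=$(lsblk -no PKNAME "$(findmnt -no SOURCE /)")
for dev in $(lsblk -dno NAME,TYPE | awk '$2 == "disk" {print $1}'); do
    [ "$dev" = "$root_disk" ] && continue
    sgdisk --zap-all --clear --mbrtogpt -g -- "/dev/$dev"
done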

@lbopf, what do you think? IMO, we can close this BZ, but it's up to you.

Comment 17 Lucy Bopf 2016-12-08 05:47:28 UTC
Thanks for reviewing, Dan. I agree that the section you've referenced covers the request outlined in this bug.

I see we also have a note about OSD disks requiring GPT labels as far back as OSP 7, which is the version against which this bug is raised:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Advanced-Scenario_3_Using_the_CLI_to_Create_an_Advanced_Overcloud_with_Ceph_Nodes.html#sect-Advanced-Configuring_Ceph_Storage

Moving to CLOSED.