Description of problem:

Heat overcloud stack creation fails when hiera customization is used and non-GPT disk labels exist on the target OSD disks.

Version-Release number of selected component (if applicable):

python-rdomanager-oscplugin-0.0.8-44.el7ost.noarch

How reproducible:

Steps to Reproduce:

1. Deploy the undercloud.

2. Customize the Ceph OSDs in ceph.yaml to include a disk with a non-GPT (dos) disk label:

ceph::profile::params::osds:
  '/dev/sdc':
    journal: '/dev/sdb1'
  '/dev/sdd':
    journal: '/dev/sdb2'

3. Deploy the overcloud with templates including the customized template location:

openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server 10.16.255.2 --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Actual results:

On the undercloud:

[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep resource_status_reason
| resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |

On the Ceph storage node (ANSI escape sequences stripped for readability):

[root@overcloud-cephstorage-3 heat-admin]# journalctl -u os-collect-config | grep fail
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: ***************************************************************
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: Found invalid GPT and valid MBR; converting MBR to GPT format.
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: ***************************************************************
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns:
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: Non-GPT disk; not saving changes. Use -g to override.
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--largest-new=1', '--change-name=1:ceph data', '--partition-guid=1:fb52fa2a-5437-4f98-b011-b4ec40b58257', '--typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be', '--', '/dev/sdc']' returned non-zero exit status 3
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: + test -b /dev/sde1
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sde]/Exec[ceph-osd-activate-/dev/sde]/returns: executed successfully
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + test -b /dev/sdd
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdd]/Exec[ceph-osd-activate-/dev/sdd]/returns: + tes
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: rue # comment to satisfy puppet syntax requirements
set -ex
if ! test -b /dev/sdc ; then
 mkdir -p /dev/sdc
fi
ceph-disk prepare /dev/sdc /tmp/journalc
 returned 1 instead of one of [0]
Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
set -ex
if ! test -b /dev/sdc ; then
 mkdir -p /dev/sdc
fi
ceph-disk prepare /dev/sdc /tmp/journalc
 returned 1 instead of one of [0]
Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies
", "deploy_status_code": 6}
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Aug 09 00:27:45 overcloud-cephstorage-3.localdomain os-collect-config[4803]: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies

The affected disks still carry a DOS disk label:

[root@overcloud-cephstorage-3 heat-admin]# fdisk -l | grep dos
Disk label type: dos
Disk label type: dos

Expected results:

Puppet either relabels the disks or reports a descriptive failure in the heat resource_status_reason.

Additional info:

WORKAROUND: convert the disk labels to GPT and rerun the deploy:

[root@overcloud-cephstorage-3 heat-admin]# for i in sd{b..m}; do parted -s /dev/$i mklabel gpt; done
So I think the solution here might involve two parts:

1) Update puppet-ceph so that it supports creating the required disk labels for us. This probably belongs in puppet-ceph... but if they aren't keen on it, puppet-tripleo could contain the resource instead. Note: I'm thinking the parameter for the resource would be the device name. Perhaps something like ceph::create_labels?

2) Wire in our discovery data so that the puppet resources from #1 get created automatically. This might be something like:

create_resources(ceph::create_labels, hiera('sd_devices'), {})

where we create a hiera entry for all of the available sd_devices dynamically (somehow) from the introspection data. If the introspection data isn't useful or easily obtainable, then just blindly looping over all the devices in puppet would also work...
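Roughly what I have in mind, purely as a sketch: the ceph::create_labels name, its interface, and the sd_devices hiera key are all placeholders (nothing like this exists in puppet-ceph or puppet-tripleo today), and the relabel command just reuses the manual workaround above.

# Hypothetical defined type; $name is assumed to be the block device path,
# e.g. '/dev/sdc'.
define ceph::create_labels {
  exec { "create-gpt-label-${name}":
    # Same relabel command as the manual workaround; only runs when the
    # device does not already carry a GPT partition table.
    command  => "parted -s ${name} mklabel gpt",
    unless   => "parted -s ${name} print | grep -q 'Partition Table: gpt'",
    path     => ['/usr/sbin', '/usr/bin', '/sbin', '/bin'],
    provider => shell,
  }
}

# Possible wiring from discovery data: assumes hiera exposes an 'sd_devices'
# hash keyed by device path (e.g. {'/dev/sdc' => {}, '/dev/sdd' => {}}),
# generated somehow from the introspection data.
create_resources('ceph::create_labels', hiera('sd_devices', {}))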
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Agree with Dan. puppet-ceph should detect the partition table type and, if it is not GPT, zap the existing partition table (provided no partition name contains 'ceph') and create a GPT table.
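Something along these lines, as an illustration only: the resource title and the guard commands are assumptions about a possible puppet-ceph change, not its current code. The only thing taken from the logs above is that ceph::osd is titled with the device path ($name) and that the prepare step is Exec["ceph-osd-prepare-${name}"].

# Relabel the disk to GPT before the prepare step, but only when it does not
# already carry a GPT table and none of its existing partitions mention
# 'ceph'. With an array, 'unless' lets the exec run only if every listed
# command returns non-zero.
exec { "ceph-osd-relabel-${name}":
  command  => "sgdisk --zap-all --clear --mbrtogpt -g -- ${name}",
  unless   => [
    "parted -s ${name} print | grep -q 'Partition Table: gpt'",
    "parted -s ${name} print | grep -qi 'ceph'",
  ],
  path     => ['/usr/sbin', '/usr/bin', '/sbin', '/bin'],
  provider => shell,
  before   => Exec["ceph-osd-prepare-${name}"],
}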
We can write a bit of documentation, then address this issue in puppet-ceph later. I can help with the doc.
*** Bug 1253395 has been marked as a duplicate of this bug. ***
Moving to 'NEW' to be triaged as resources allow. Seb, can you summarize what we need to update in the documentation? I'll update the bug summary to match.
Lucy, sure:

"Prior to deploying any Ceph nodes, we need to make sure that the OSD disks have a GPT label. This can be checked with the following command, e.g. for the /dev/vdb disk:

$ sudo parted /dev/vdb print

Valid output:

Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

Invalid output:

Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdd: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

The valid output is the expected result; if the disk does not have a GPT label, you can relabel it with the following command:

$ sudo sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb"

Is that enough?
Thanks, Seb. I think that's sufficient for now. I've updated the summary accordingly.
I think we can close this BZ. Since OSP 8, I have included a script in the Ceph Guide that formats all OSD disks (except the root disk) to GPT immediately after the provisioning process:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT

@lbopf, what do you think? IMO we can close this BZ, but it's up to you.
Thanks for reviewing, Dan. I agree that the section you've referenced covers the request outlined in this bug. I see we also have a note about OSD disks requiring GPT labels as far back as OSP 7, which is the version against which this bug is raised:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Advanced-Scenario_3_Using_the_CLI_to_Create_an_Advanced_Overcloud_with_Ceph_Nodes.html#sect-Advanced-Configuring_Ceph_Storage

Moving to CLOSED.