Red Hat Bugzilla – Bug 1251636
Ceph post-deploy config with hiera customization fails if existing logical volumes are present on target OSD disks
Last modified: 2016-09-21 11:26:15 EDT
Description of problem:
When specifying a custom disk layout for Ceph OSDs in the Heat templates, the Heat post-deploy step fails if there are existing logical volumes on the OSD disks.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Deploy undercloud
2. Customize hiera for OSDs, including a disk with existing LVM info (see the example sketch after these steps)
[stack@rhos0 ~]$ grep -B 1 -A 1 dev templates/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml
3. Deploy overcloud
openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server 10.16.255.2 --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
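For reference, a hieradata entry of the kind grep'd for in step 2 might look like the following. This is only a hypothetical sketch assuming the ceph::profile::params::osds hiera key consumed by puppet-ceph; the device names are placeholders, not the reporter's actual file:

# hypothetical example; /dev/sdc stands for a disk that already carries LVM metadata
ceph::profile::params::osds:
  '/dev/sdc':
    journal: ''
  '/dev/sdd':
    journal: ''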
[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep status
resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |
Expected results:
Existing partitions and LVM data are removed from the disks.
Error on the Ceph storage node:
[root@overcloud-cephstorage-0 heat-admin]# journalctl -u os-collect-config | grep -i fail | tail -n 3
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies
[root@overcloud-cephstorage-0 heat-admin]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sdc3  myvg lvm2 a--  929.81g    0
[root@overcloud-cephstorage-0 heat-admin]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  myvg   1   1   0 wz--n- 929.81g    0
[root@overcloud-cephstorage-0 heat-admin]# lvs
  LV      VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  rootvol myvg -wi-a----- 929.81g
This sounds like another cleanup issue. Could we try to somehow detect the dirty state in the introspection ramdisk and then fail fast during deployment if it is detected? Similar to BZ 1252158, we might consider using Ironic clean_nodes to trigger a cleanup.
Or... perhaps there is a way to do inline cleanup via puppet-ceph? (I'm not aware of one.) Similar to BZ 1251718, we might consider driving a puppet-ceph cleanup resource via introspection data.
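For the Ironic route, a minimal sketch of what enabling node cleaning on the undercloud could look like; the clean_nodes option name is an assumption for this release's ironic.conf (newer releases rename it automated_clean), so treat it as illustrative only:

# /etc/ironic/ironic.conf on the undercloud (option name assumed for this release)
[conductor]
clean_nodes = true

The openstack-ironic-conductor service would then need a restart for the setting to take effect.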
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Can't we implement a purge function in puppet-ceph to solve this?
We can write a doc describing the manual cleanup for now, then address this in puppet-ceph later with a purge function.
We should not use LVM for Ceph OSD data.
This looks like a leftover from a previous attempt to me; we can hardly expect puppet-ceph to remove LVs, VGs and PVs...
So this must be cleaned up prior to deploying; I don't even know how or why LVM got configured on these drives in the first place.
To me, the deployer should make sure this LVM configuration is wiped before starting to deploy any Ceph.
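As a stop-gap, something along these lines run on the node before (re)deploying Ceph should clear the leftover state; this is only a sketch reusing the VG/PV names from the pvs/vgs output above and must be adjusted to the real layout:

[root@overcloud-cephstorage-0 heat-admin]# lvremove -f myvg          # removes all LVs in myvg, incl. rootvol
[root@overcloud-cephstorage-0 heat-admin]# vgremove -f myvg
[root@overcloud-cephstorage-0 heat-admin]# pvremove /dev/sdc3
[root@overcloud-cephstorage-0 heat-admin]# sgdisk --zap-all /dev/sdc # wipe the leftover partition table as well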
I'm tempted to close this one as "won't fix". Will try to discuss this further with the team.