Red Hat Bugzilla – Bug 1251636
Ceph post-deploy config with hiera customization fails if existing logical volumes are present on target OSD disks
Last modified: 2016-09-21 11:26:15 EDT
Description of problem:
When specifying a custom disk layout for Ceph OSDs in the Heat templates, the Heat post-deploy step fails if there are existing logical volumes on the OSD disks.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Deploy undercloud
2. Customize hiera for OSDs, including a disk with existing LVM info (see the example sketch after these steps)
[stack@rhos0 ~]$ grep -B 1 -A 1 dev templates/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml
3. Deploy overcloud
openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server 10.16.255.2 --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
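For reference, a hieradata entry of the kind grep'd for in step 2 might look like the following. This is only a hypothetical sketch assuming the ceph::profile::params::osds hiera key consumed by puppet-ceph; the device names are placeholders, not the reporter's actual file:

# hypothetical example; /dev/sdc stands for a disk that already carries LVM metadata
ceph::profile::params::osds:
  '/dev/sdc':
    journal: ''
  '/dev/sdd':
    journal: ''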
[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep status
resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |
Expected results:
Existing partitions and LVM data are removed from the disks.
Error on the Ceph storage node:
[root@overcloud-cephstorage-0 heat-admin]# journalctl -u os-collect-config | grep -i fail | tail -n 3
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies
[root@overcloud-cephstorage-0 heat-admin]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sdc3  myvg lvm2 a--  929.81g    0
[root@overcloud-cephstorage-0 heat-admin]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  myvg   1   1   0 wz--n- 929.81g    0
[root@overcloud-cephstorage-0 heat-admin]# lvs
  LV      VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  rootvol myvg -wi-a----- 929.81g
This sounds like another cleanup issue. Could we try to somehow detect the dirty state in the introspection ramdisk and then fail fast during deployment if it is detected? Similar to BZ 1252158, we might consider using Ironic clean_nodes to trigger a cleanup.
Or... perhaps there is a way to do inline cleanup via puppet-ceph? (I'm not aware of one.) Similar to BZ 1251718, we might consider driving a puppet-ceph cleanup resource via introspection data.
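For the Ironic route, a minimal sketch of what enabling node cleaning on the undercloud could look like; the clean_nodes option name is an assumption for this release's ironic.conf (newer releases rename it automated_clean), so treat it as illustrative only:

# /etc/ironic/ironic.conf on the undercloud (option name assumed for this release)
[conductor]
clean_nodes = true

The openstack-ironic-conductor service would then need a restart for the setting to take effect.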
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Can't we implement a purge function in puppet-ceph to solve this?
We can write a doc describing the manual cleanup for now, then address this in puppet-ceph later with a purge function.
We should not use LVM for Ceph OSD data.
This looks like a leftover from a previous attempt to me; we can hardly expect puppet-ceph to remove LVs, VGs and PVs...
So this must be cleaned up prior to deploying; I don't even know how or why LVM got configured on these drives in the first place.
To me, the deployer should make sure this LVM configuration is wiped before starting to deploy any Ceph.
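As a stop-gap, something along these lines run on the node before (re)deploying Ceph should clear the leftover state; this is only a sketch reusing the VG/PV names from the pvs/vgs output above and must be adjusted to the real layout:

[root@overcloud-cephstorage-0 heat-admin]# lvremove -f myvg          # removes all LVs in myvg, incl. rootvol
[root@overcloud-cephstorage-0 heat-admin]# vgremove -f myvg
[root@overcloud-cephstorage-0 heat-admin]# pvremove /dev/sdc3
[root@overcloud-cephstorage-0 heat-admin]# sgdisk --zap-all /dev/sdc # wipe the leftover partition table as well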
I'm tempted to close this one as "won't fix". Will try to discuss this further with the team.