Bug 1251636 - Ceph Post deploy config fails hiera customization if existing logical volumes on target OSD disks
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: leseb
QA Contact: Yogev Rabl
Depends On:
Reported: 2015-08-08 01:08 EDT by jliberma@redhat.com
Modified: 2016-09-21 11:26 EDT
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-09-21 11:26:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description jliberma@redhat.com 2015-08-08 01:08:01 EDT
Description of problem:

When specifying a custom disk layout for Ceph OSDs in the Heat templates, the Heat post-deploy step fails if there are existing logical volumes on the OSD disks.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy undercloud

2. Customize the hieradata for the OSDs, including a disk with existing LVM metadata

[stack@rhos0 ~]$ grep -B 1 -A 1 dev templates/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml 
       journal: '/tmp/journalb'   
       journal: '/tmp/journalc'    
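For reference, the grep above only shows the journal lines in context. The full OSD hieradata follows the puppet-ceph `ceph::profile::params::osds` hash format; a minimal sketch, where the device names are hypothetical (only `/dev/sdc` is confirmed by the logs below):

```yaml
# Hypothetical hieradata sketch -- device names are illustrative.
ceph::profile::params::osds:
  '/dev/sdb':
    journal: '/tmp/journalb'
  '/dev/sdc':
    journal: '/tmp/journalc'
```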

3. Deploy overcloud

openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Actual results:

[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep status

resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |

Expected results:
Existing partitions and LVM data are removed from the disks
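Done manually, that cleanup is roughly the following. This is a hedged sketch, not part of the deployment tooling: the `wipe_disk` function name is made up, `/dev/sdc` is taken from the failure logs below, and the commands are the standard LVM2 (`pvs`/`vgremove`/`pvremove`), util-linux (`wipefs`), and gdisk (`sgdisk`) tools. It is destructive to the target disk.

```shell
#!/bin/bash
# Destructive pre-deploy cleanup sketch for one OSD disk.
# The function name and device are illustrative, not from the tooling.
wipe_disk() {
    local disk=$1
    # Remove any VGs/PVs living on the disk first (skipped without LVM tools).
    if command -v pvs >/dev/null; then
        local pv vg
        for pv in $(pvs --noheadings -o pv_name | grep "${disk}"); do
            vg=$(pvs --noheadings -o vg_name "$pv" | tr -d ' ')
            [ -n "$vg" ] && vgremove -ff "$vg"
            pvremove -ff "$pv"
        done
    fi
    # Erase remaining filesystem/LVM signatures and the partition table.
    command -v wipefs >/dev/null && wipefs --all "${disk}"
    command -v sgdisk >/dev/null && sgdisk --zap-all "${disk}"
    return 0
}

# Example (destructive!): wipe the disk that held the leftover "myvg" VG.
# wipe_disk /dev/sdc
```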

Additional info:

Error on ceph storage node:

[root@overcloud-cephstorage-0 heat-admin]# journalctl -u os-collect-config  | grep -i fail | tail -n 3
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies

[root@overcloud-cephstorage-0 heat-admin]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sdc3  myvg lvm2 a--  929.81g    0 

[root@overcloud-cephstorage-0 heat-admin]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  myvg   1   1   0 wz--n- 929.81g    0 

[root@overcloud-cephstorage-0 heat-admin]# lvs
  LV      VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  rootvol myvg -wi-a----- 929.81g
Comment 6 Dan Prince 2015-12-07 15:47:24 EST
This sounds like another cleanup issue. We could try to detect the dirty state in the introspection ramdisk and fail fast during deployment if it is detected. Similar to BZ 1252158, we might consider using Ironic clean_nodes to trigger a cleanup.

Or... perhaps there is a way to do inline cleanup via puppet-ceph? (I'm not aware of one.) Similar to BZ 1251718, we might consider driving a puppet-ceph cleanup resource via introspection data.
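For the Ironic cleaning route mentioned above, the undercloud side would look roughly like this. A sketch, assuming the `clean_nodes` option in undercloud.conf, which enables Ironic automated node cleaning so disk metadata is erased when nodes are recycled between deployments:

```ini
# undercloud.conf (sketch): enable Ironic automated node cleaning so
# leftover partition/LVM metadata is wiped before nodes are redeployed.
[DEFAULT]
clean_nodes = true
```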
Comment 7 Mike Burns 2016-04-07 16:47:27 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 10 seb 2016-08-23 11:49:00 EDT
Can't we implement a purge function in puppet-ceph to solve this?
Comment 11 seb 2016-08-25 04:40:39 EDT
We can write a doc for that, then address this in puppet-ceph later with a purge function.
Comment 12 seb 2016-08-31 10:00:09 EDT
We should not use LVM for Ceph OSD data.
This looks like a leftover from a previous attempt to me; we can hardly expect puppet-ceph to remove LVs, VGs, and PVs...
So this must be cleaned up prior to deploying. I don't even know how or why LVM got configured on these drives in the first place.

To me, the deployer should make sure this LVM configuration is wiped before starting any Ceph deployment.
I'm tempted to close this one as "won't fix". I will try to discuss this further with the team.
