Bug 1251636 - Ceph Post deploy config fails hiera customization if existing logical volumes on target OSD disks
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assigned To: leseb
QA Contact: Yogev Rabl
Depends On:
Reported: 2015-08-08 01:08 EDT by jliberma@redhat.com
Modified: 2016-09-21 11:26 EDT
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-09-21 11:26:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description jliberma@redhat.com 2015-08-08 01:08:01 EDT
Description of problem:

When specifying a custom disk layout for Ceph OSDs in the Heat templates, the Heat post-deploy step fails if there are existing logical volumes on the OSD disks.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy undercloud

2. Customize the hieradata for the OSDs, including a disk with existing LVM metadata

[stack@rhos0 ~]$ grep -B 1 -A 1 dev templates/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml 
       journal: '/tmp/journalb'   
       journal: '/tmp/journalc'    
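For reference, the grep above only shows the journal lines in context. The full OSD hieradata follows the puppet-ceph `ceph::profile::params::osds` hash format; a minimal sketch, where the device names are hypothetical (only `/dev/sdc` is confirmed by the logs below):

```yaml
# Hypothetical hieradata sketch -- device names are illustrative.
ceph::profile::params::osds:
  '/dev/sdb':
    journal: '/tmp/journalb'
  '/dev/sdc':
    journal: '/tmp/journalc'
```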

3. Deploy overcloud

openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates /home/stack/templates/openstack-tripleo-heat-templates/ -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Actual results:

[stack@rhos0 ~]$ heat resource-show overcloud CephStorageNodesPostDeployment | grep status

resource_status_reason | ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"" |

Expected results:
Existing partitions and LVM data are removed from the disks
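Done manually, that cleanup is roughly the following. This is a hedged sketch, not part of the deployment tooling: the `wipe_disk` function name is made up, `/dev/sdc` is taken from the failure logs below, and the commands are the standard LVM2 (`pvs`/`vgremove`/`pvremove`), util-linux (`wipefs`), and gdisk (`sgdisk`) tools. It is destructive to the target disk.

```shell
#!/bin/bash
# Destructive pre-deploy cleanup sketch for one OSD disk.
# The function name and device are illustrative, not from the tooling.
wipe_disk() {
    local disk=$1
    # Remove any VGs/PVs living on the disk first (skipped without LVM tools).
    if command -v pvs >/dev/null; then
        local pv vg
        for pv in $(pvs --noheadings -o pv_name | grep "${disk}"); do
            vg=$(pvs --noheadings -o vg_name "$pv" | tr -d ' ')
            [ -n "$vg" ] && vgremove -ff "$vg"
            pvremove -ff "$pv"
        done
    fi
    # Erase remaining filesystem/LVM signatures and the partition table.
    command -v wipefs >/dev/null && wipefs --all "${disk}"
    command -v sgdisk >/dev/null && sgdisk --zap-all "${disk}"
    return 0
}

# Example (destructive!): wipe the disk that held the leftover "myvg" VG.
# wipe_disk /dev/sdc
```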

Additional info:

Error on ceph storage node:

[root@overcloud-cephstorage-0 heat-admin]# journalctl -u os-collect-config  | grep -i fail | tail -n 3
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Dependency Exec[ceph-osd-prepare-/dev/sdc] has failures: true
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-prepare-/dev/sdc]/returns: change from notrun to 0 failed: /bin/true # comment to satisfy puppet syntax requirements
Aug 08 00:48:32 overcloud-cephstorage-0.localdomain os-collect-config[4911]: Warning: /Stage[main]/Ceph::Osds/Ceph::Osd[/dev/sdc]/Exec[ceph-osd-activate-/dev/sdc]: Skipping because of failed dependencies

[root@overcloud-cephstorage-0 heat-admin]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sdc3  myvg lvm2 a--  929.81g    0 

[root@overcloud-cephstorage-0 heat-admin]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  myvg   1   1   0 wz--n- 929.81g    0 

[root@overcloud-cephstorage-0 heat-admin]# lvs
  LV      VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  rootvol myvg -wi-a----- 929.81g
Comment 6 Dan Prince 2015-12-07 15:47:24 EST
This sounds like another cleanup issue. We could try to detect the dirty state in the introspection ramdisk and fail fast during deployment if it is detected. Similar to BZ 1252158, we might consider using Ironic clean_nodes to trigger a cleanup.

Or... perhaps there is a way to do inline cleanup via puppet-ceph? (I'm not aware of one.) Similar to BZ 1251718, we might consider driving a puppet-ceph cleanup resource via introspection data.
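For the Ironic cleaning route mentioned above, the undercloud side would look roughly like this. A sketch, assuming the `clean_nodes` option in undercloud.conf, which enables Ironic automated node cleaning so disk metadata is erased when nodes are recycled between deployments:

```ini
# undercloud.conf (sketch): enable Ironic automated node cleaning so
# leftover partition/LVM metadata is wiped before nodes are redeployed.
[DEFAULT]
clean_nodes = true
```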
Comment 7 Mike Burns 2016-04-07 16:47:27 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 10 seb 2016-08-23 11:49:00 EDT
Can't we implement a purge function in puppet-ceph to solve this?
Comment 11 seb 2016-08-25 04:40:39 EDT
We can write a doc for that, then address this in puppet-ceph later with a purge function.
Comment 12 seb 2016-08-31 10:00:09 EDT
We should not use LVM for Ceph OSD data.
This looks like a leftover from a previous attempt to me; we can hardly expect puppet-ceph to remove LVs, VGs, and PVs...
So this must be cleaned up prior to deploying. I don't even know how or why LVM got configured on these drives in the first place.

To me, the deployer should make sure this LVM configuration is wiped before starting any Ceph deployment.
I'm tempted to close this one as "won't fix". I will try to discuss this further with the team.
