Description of problem:

When deploying Ceph by Director using osd_scenario lvm and BlueStore, if you configure the OSD/disk devices under lvm_volumes then they are not counted as OSDs and the deployment fails as follows:

(undercloud) [stack@director deployment]$ ./deploy.sh
Wed May 29 14:07:16 AEST 2019
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 3cb68871-ceda-4603-b664-9fd63618bd07
Waiting for messages on queue 'tripleo' with no timeout.
Creating Swift container to store the plan
Creating plan from template files in: /tmp/tripleoclient-no1D_g/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.create_deployment_plan. Execution ID: 8c5fe859-b307-4be9-8f3c-9367680eb370
Plan created.
Processing templates in the directory /tmp/tripleoclient-no1D_g/tripleo-heat-templates
Invoking workflow (tripleo.derive_params.v1.derive_parameters) specified in plan-environment file
Started Mistral Workflow tripleo.derive_params.v1.derive_parameters. Execution ID: 55b451c5-8df4-43c8-ac87-0cff347383e3
Workflow execution is failed: Role 'ComputeCeph': No Ceph OSDs found in the overcloud definition ('ceph::profile::params::osds').

real    6m17.781s
user    0m4.440s
sys     0m0.482s
Wed May 29 14:13:34 AEST 2019
(undercloud) [stack@director deployment]$


Version-Release number of selected component (if applicable):
RHOSP 13 z6


How reproducible:
Every time (100%).


Steps to Reproduce:
1. Deploy Ceph by Director with a config file similar to the following:

parameter_defaults:
  CephAnsiblePlaybookVerbosity: 1
  # CephPoolDefaultSize: 1
  CephConfigOverrides:
    mon_max_pg_per_osd: 500
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    dmcrypt: false
    #
    lvm_volumes:
      - data: /dev/sdb
      - data: /dev/sdc
      - data: /dev/sdd
      - data: /dev/sde
        crush_device_class: ssd
      - data: /dev/sdf
        crush_device_class: ssd
      - data: /dev/sdg
        crush_device_class: ssd


Actual results:
Fails with:

Workflow execution is failed: Role 'ComputeCeph': No Ceph OSDs found in the overcloud definition ('ceph::profile::params::osds').


Expected results:
Should successfully deploy.


Additional info:
Workaround for now is to configure at least one device under "devices":

  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    dmcrypt: false
    devices:
      - /dev/sdb
    #
    lvm_volumes:
      # - data: /dev/sdb
      - data: /dev/sdc
      - data: /dev/sdd
      - data: /dev/sde
        crush_device_class: ssd
      - data: /dev/sdf
        crush_device_class: ssd
      - data: /dev/sdg
        crush_device_class: ssd

I believe the problem is here:
------------------------------
ONLY COUNTS "devices"

/usr/share/openstack-tripleo-common/workbooks/derive_params_formulas.yaml

  get_num_osds:
    publish:
      num_osds: <% $.heat_resource_tree.parameters.get('CephAnsibleDisksConfig', {}).get('default', {}).get('devices', []).count() %>
    on-success:
      - get_memory_mb: <% $.num_osds %>
      # If there's no CephAnsibleDisksConfig then look for OSD configuration in hiera data
      - get_num_osds_from_hiera: <% not $.num_osds %>
nsatsia is right. The following only uses the devices list because it was written for a pre-ceph-volume world:

https://github.com/openstack/tripleo-common/blob/master/workbooks/derive_params_formulas.yaml#L650

With ceph-ansible 3.2 it's also valid to pass lvm_volumes, so the above Mistral workbook needs to be able to work with both. Perhaps starting with the following and then accounting for exceptions:

  num_osds = count(devices) + count(lvm_volumes)
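A minimal sketch of that approach, assuming the existing get_num_osds task is kept and only the published expression is extended to also count lvm_volumes (untested, for illustration only):

  get_num_osds:
    publish:
      # Sum the lengths of both lists; either may be absent and defaults to []
      num_osds: <% $.heat_resource_tree.parameters.get('CephAnsibleDisksConfig', {}).get('default', {}).get('devices', []).count() + $.heat_resource_tree.parameters.get('CephAnsibleDisksConfig', {}).get('default', {}).get('lvm_volumes', []).count() %>
    on-success:
      - get_memory_mb: <% $.num_osds %>
      # If there's no CephAnsibleDisksConfig then look for OSD configuration in hiera data
      - get_num_osds_from_hiera: <% not $.num_osds %>

Any exceptions to the simple "one lvm_volumes entry equals one OSD" assumption would still need to be handled on top of this, as noted above.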
WORKAROUND:

Until this is fixed, if you're going to use lvm_volumes instead of devices, then deploy without the -p option as described in [1] and instead derive the reserved_host_memory and cpu_allocation_ratio values manually and pass them in an env file [2].

[1] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html-single/deployment_guide/index#running-the-deploy-command-rhhi
[2] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html-single/deployment_guide/index#changing-nova-reserved-memory-and-cpu-allocation-manually
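For illustration, a rough sketch of what such an env file might look like, assuming the HCI role is named ComputeCeph as in the failure output above and that role-specific overrides are used; the parameter keys for your role name and the numeric values are placeholders that must come from your own manual derivation per [2]:

parameter_defaults:
  ComputeCephParameters:
    NovaReservedHostMemory: 181000      # placeholder value; derive manually per [2]
  ComputeCephExtraConfig:
    nova::cpu_allocation_ratio: 8.2     # placeholder value; derive manually per [2]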
(In reply to John Fulton from comment #2)
> WORKAROUND:
>
> Until this is fixed, if you're going to use lvm_volumes instead of devices,
> then deploy without the -p option as described in [1] and instead derive the
> reserved_host_memory and cpu_allocation_ratio values manually and pass them
> in an env file [2].
>
> [1] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html-single/deployment_guide/index#running-the-deploy-command-rhhi
> [2] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_cloud/13/html-single/deployment_guide/index#changing-nova-reserved-memory-and-cpu-allocation-manually

Thanks John. I didn't realise it was triggered by using the "-p" option. Anyway, as I'm only deploying the data block via LVM, moving one disk to devices builds the OSDs the same way anyway.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2624