+++ This bug was initially created as a clone of Bug #1552759 +++

Description of problem:

Deployment of HCI enabled OpenStack Platform 12 fails when using Nova Scheduler Hints.

(undercloud) [stack@director ~]$ ./deploy-now-hci.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 79e884c9-05cb-474b-9f30-6292a70cdba4
Waiting for messages on queue '51484ca6-4916-4d91-acfc-57145bf63494' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: e71ebf09-9c80-44a9-82d7-64538c6291eb
Plan updated.
Processing templates in the directory /tmp/tripleoclient-koTgrN/tripleo-heat-templates
Invoking workflow (tripleo.derive_params.v1.derive_parameters) specified in plan-environment file
Started Mistral Workflow tripleo.derive_params.v1.derive_parameters. Execution ID: acea8816-4e38-466a-a8e8-cb223eca0ac4
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]

It doesn't matter whether I use Compute or ComputeHCI roles. As soon as OS::TripleO::Services::CephOSD is added to the role deployment fails with the error above.

Version-Release number of selected component (if applicable):

[root@director stack]# rpm -qa | grep -i tripleo
python-tripleoclient-7.3.3-7.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-10.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-22.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
openstack-tripleo-common-7.6.3-10.el7ost.noarch

How reproducible:

Every time when deploying OS::TripleO::Services::CephOSD on Compute node.

Steps to Reproduce:

1. Generate roles_data.yaml file:

[stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/hci/roles_data.yaml Controller Compute

2. Add OS::TripleO::Services::CephOSD service to the Compute role.

3. Use scheduler hints file to control node placement:

[stack@director templates]$ cat scheduler_hints_env.yaml
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'overcloud-controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'overcloud-compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'overcloud-ceph-%index%'

4. Run the deployment including customized roles_data.yaml and scheduler_hints_env.yaml

5. Observe error:

Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]

Actual results:

Deployment fails.

Expected results:

Deployment uses scheduler hints instead of flavors/profiles and finishes successfully.

Additional info:

--- Additional comment from Alex Schultz on 2018-03-07 16:14:24 EST ---

Can you please provide a sosreport from the undercloud? Thanks.

--- Additional comment from Rafal Szmigiel on 2018-03-07 17:59:06 EST ---

Hey Alex,

It will take a while because I have to revert the environment to the previous state. In the meantime, I'm not 100% sure, but I think I found it.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._derive_parameters_per_role | grep -B4 TODO
# Getting introspection data workflow, which will take care of
# 1) profile and flavor based mapping
# 2) Nova placement api based mapping
# Currently we have implemented profile and flavor based mapping
# TODO-Nova placement api based mapping is pending, we will enchance it later.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._get_role_info | grep -A8 -E 'check_features:$'
check_features:
  on-success: build_feature_dict
  publish:
    # TODO: Need to update this logic for ODL integration.
    # The role supports the DPDK feature if the NeutronDatapathType parameter is present.
    dpdk: <% $.role_services.any($.get('parameters', []).contains('NeutronDatapathType')) %>
    # The role supports the HCI feature if it includes both NovaCompute and CephOSD services.
    hci: <% $.role_services.any($.get('type', '').endsWith('::NovaCompute')) and $.role_services.any($.get('type', '').endsWith('::CephOSD')) %>

--- Additional comment from Rafal Szmigiel on 2018-03-07 18:21:38 EST ---

Uploaded to dropbox.redhat.com (sosreport-director.lab.rhpoc.net-20180307180957.tar.xz).

Thanks in advance,
Rafal

--- Additional comment from Saravanan KR on 2018-03-28 05:52:04 EDT ---

This deployment runs the derive parameters workflow because the "-p" option is passed to the deploy command. To use this feature, the nodes and flavors must be tagged with a matching profile, and the Overcloud<RoleName>Flavor parameters must provide the matching flavor name to use. In this error, no flavor is mentioned in the parameters, so the flavor defaults to 'baremetal' and the workflow fails. Ensure the correct flavor name is provided.

--- Additional comment from Rafal Szmigiel on 2018-03-28 06:37:21 EDT ---

Hey Saravanan,

This deployment uses SchedulerHints, therefore no flavors other than baremetal should be used. Please check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Assign_Specific_Node_IDs for more details.

Rafał

--- Additional comment from Saravanan KR on 2018-03-28 06:47:30 EDT ---

(In reply to Rafal Szmigiel from comment #5)
> Hey Saravanan,
>
> This deployment uses SchedulerHints therefore no flavors other than
> baremetal should be used. Please check
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/
> html/advanced_overcloud_customization/sect-controlling_node_placement#sect-
> Assign_Specific_Node_IDs for more details.

The derive parameters workflow supports only role tagging and does NOT support SchedulerHints yet. Support was planned earlier, but the work has not started yet.

There are two options from here: either use derive parameters with role tagging, OR use scheduler hints and provide the parameters manually, without the -p option.

I have added Alan Bishop and Jagan, who were working on the current version of derived parameters.

--- Additional comment from Rafal Szmigiel on 2018-03-28 06:51:28 EDT ---

Thanks for the clarification and for looping in Alan and Jagan.

Best Regards,
Rafal

--- Additional comment from Alan Bishop on 2018-03-28 07:56:49 EDT ---

Just to clarify Saravanan's comment, the Derived Parameters workflow relies on role tagging, but this is not incompatible with SchedulerHints. It's OK to continue to specify SchedulerHints, but you also need the nodes for which you want parameters derived (i.e. HCI) to be tagged with a role/profile. This is necessary for the Derived Parameters workflow to identify the nodes so that it can determine their hardware characteristics.

This should provide a workaround until we can fix the workflow so that it can use just the SchedulerHints.

--- Additional comment from Alan Bishop on 2018-05-03 12:33:28 EDT ---

Patch merged upstream, and I've begun upstream backports to stable/queens and stable/pike.
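
To make the analysis in this thread concrete, the check_features expressions quoted from _get_role_info, and the flavor-to-profile step that produced the error, can be paraphrased in Python. This is only a minimal sketch and not the tripleo-common implementation. The function names role_features and profile_for_flavor are hypothetical, and reading the profile from a 'capabilities:profile' flavor property is an assumption about how the workflow maps a flavor to tagged nodes.

```python
from typing import Dict, List


def role_features(role_services: List[dict]) -> Dict[str, bool]:
    """Paraphrase of the dpdk/hci expressions quoted from check_features."""
    # DPDK: any service in the role exposes a NeutronDatapathType parameter.
    dpdk = any(
        "NeutronDatapathType" in svc.get("parameters", [])
        for svc in role_services
    )
    # HCI: the role contains both a NovaCompute and a CephOSD service.
    has_compute = any(
        svc.get("type", "").endswith("::NovaCompute") for svc in role_services
    )
    has_osd = any(
        svc.get("type", "").endswith("::CephOSD") for svc in role_services
    )
    return {"dpdk": dpdk, "hci": has_compute and has_osd}


def profile_for_flavor(flavor_name: str, extra_specs: Dict[str, str]) -> str:
    """Hypothetical lookup of the profile a flavor is tagged with.

    Assumption: the profile is stored under 'capabilities:profile'.
    """
    profile = extra_specs.get("capabilities:profile")
    if not profile:
        raise ValueError(
            "Unable to determine profile for flavor (flavor name: %s)"
            % flavor_name
        )
    return profile


if __name__ == "__main__":
    # The Compute role from the steps to reproduce, with CephOSD added.
    compute_services = [
        {"type": "OS::TripleO::Services::NovaCompute"},
        {"type": "OS::TripleO::Services::CephOSD"},
    ]
    print(role_features(compute_services))

    # An untagged 'baremetal' flavor has no profile to look up.
    try:
        profile_for_flavor("baremetal", {})
    except ValueError as exc:
        print(exc)
```

Under these assumptions, adding CephOSD to the Compute role is what makes the role count as HCI, and an untagged 'baremetal' flavor then has no profile to resolve. Tagging the flavor and nodes with a profile, as in the workaround described above, lets the lookup succeed while SchedulerHints stay in place.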
Patch has merged on upstream stable/queens.
Verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2574