Bug 1552759
| Summary: | Deployment fails with HCI enabled and SchedulerHints | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Rafal Szmigiel <rszmigie> | |
| Component: | openstack-tripleo-common | Assignee: | Alan Bishop <abishop> | |
| Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 12.0 (Pike) | CC: | abishop, aschultz, emacchi, gfidente, jamsmith, jpalanis, mburns, owalsh, rszmigie, skramaja, slinaber, yrabl | |
| Target Milestone: | z3 | Keywords: | Triaged, ZStream | |
| Target Release: | 12.0 (Pike) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-tripleo-common-7.6.9-5.el7ost | Doc Type: | Bug Fix | |
| Doc Text: |
The Derived Parameters workflow now supports the use of SchedulerHints parameters to identify overcloud nodes.
Previously, the workflow could not use SchedulerHints to identify overcloud nodes associated with the corresponding TripleO overcloud role. This caused the overcloud deployment to fail. SchedulerHints support prevents these failures.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1575623 (view as bug list) | Environment: | ||
| Last Closed: | 2018-08-20 12:59:02 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1575623 | |||
| Bug Blocks: | ||||
Can you please provide a sosreport from the undercloud? Thanks.

Hey Alex,
It will take a while because I have to revert the environment to the previous state.
In the meantime not 100% sure but I think I found it.
(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._derive_parameters_per_role | grep -B4 TODO
# Getting introspection data workflow, which will take care of
# 1) profile and flavor based mapping
# 2) Nova placement api based mapping
# Currently we have implemented profile and flavor based mapping
# TODO-Nova placement api based mapping is pending, we will enchance it later.
(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._get_role_info | grep -A8 -E 'check_features:$'
check_features:
on-success: build_feature_dict
publish:
# TODO: Need to update this logic for ODL integration.
# The role supports the DPDK feature if the NeutronDatapathType parameter is present.
dpdk: <% $.role_services.any($.get('parameters', []).contains('NeutronDatapathType')) %>
# The role supports the HCI feature if it includes both NovaCompute and CephOSD services.
hci: <% $.role_services.any($.get('type', '').endsWith('::NovaCompute')) and $.role_services.any($.get('type', '').endsWith('::CephOSD')) %>
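The two YAQL expressions published by check_features can be read as plain predicates over the role's service list. The following is a minimal Python sketch of that logic, not the workflow's own code; the function name `check_features` and the sample `compute_with_osd` list are illustrative assumptions, and the service entries are assumed to be dictionaries with `type` and `parameters` keys, as the expressions imply.

```python
def check_features(role_services):
    """Return the dpdk/hci flags for a role, following the YAQL shown above."""
    def has_service(suffix):
        # True if any service type in the role ends with the given suffix.
        return any(s.get('type', '').endswith(suffix) for s in role_services)

    # DPDK: some service exposes the NeutronDatapathType parameter.
    dpdk = any('NeutronDatapathType' in s.get('parameters', [])
               for s in role_services)
    # HCI: the role carries both NovaCompute and CephOSD services.
    hci = has_service('::NovaCompute') and has_service('::CephOSD')
    return {'dpdk': dpdk, 'hci': hci}


# Hypothetical service list for a Compute role with CephOSD added.
compute_with_osd = [
    {'type': 'OS::TripleO::Services::NovaCompute', 'parameters': []},
    {'type': 'OS::TripleO::Services::CephOSD', 'parameters': []},
]
print(check_features(compute_with_osd))
```

Under this reading, adding OS::TripleO::Services::CephOSD to a role that already has NovaCompute flips the hci flag to true, which would explain why the failure shows up for both the Compute and ComputeHCI roles.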
Uploaded to dropbox.redhat.com (sosreport-director.lab.rhpoc.net-20180307180957.tar.xz).

Thanks in advance,
Rafal

This deployment is using the derive parameters workflow by using the "-p" option in the deploy command. In order to use this feature, the nodes and flavors should be tagged with matching profile. And Overcloud<RoleName>Flavor parameters should provide the matching flavor name to use. In this error, there are not flavor mentioned in the parameters, which defaults to 'baremetal' and it is failing. Ensure the correct flavor name is provided.

Hey Saravanan,

This deployment uses SchedulerHints therefore no flavors other than baremetal should be used. Please check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Assign_Specific_Node_IDs for more details.

Rafał

(In reply to Rafal Szmigiel from comment #5)
> Hey Saravanan,
>
> This deployment uses SchedulerHints therefore no flavors other than
> baremetal should be used. Please check
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Assign_Specific_Node_IDs
> for more details.

Derive parameters workflow supports only the role tagging and does NOT support SchedulerHints yet. Though it was earlier planned to support, but work has not started yet. Two options from here - Either you could use derive parameters with role-tagging OR use scheduler hints by providing the parameters manually without -p option. I have added Alan Bishop and Jagan who were working on the current version of derived parameters.

Thanks for the clarification and looping Alan and Jagan.

Best Regards,
Rafal

Just to clarify Saravanan's comment, the Derived Parameters workflow relies on role tagging, but this is not incompatible with SchedulerHints.

It's OK to continue to specify SchedulerHints, but you also need the nodes for which you want parameters derived (i.e. HCI) to be tagged with a role/profile. This is necessary for the Derived Parameters workflow to identify the nodes so that it can determine their hardware characteristics. This should provide a workaround until we can fix the workflow so that it can use just the SchedulerHints.

Patch merged upstream, and I've begun upstream backports to stable/queens and stable/pike.

Verified, deployed successfully with scheduler hints

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2331
Description of problem:
Deployment of HCI enabled OpenStack Platform 12 fails when using Nova Scheduler Hints.

(undercloud) [stack@director ~]$ ./deploy-now-hci.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 79e884c9-05cb-474b-9f30-6292a70cdba4
Waiting for messages on queue '51484ca6-4916-4d91-acfc-57145bf63494' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: e71ebf09-9c80-44a9-82d7-64538c6291eb
Plan updated.
Processing templates in the directory /tmp/tripleoclient-koTgrN/tripleo-heat-templates
Invoking workflow (tripleo.derive_params.v1.derive_parameters) specified in plan-environment file
Started Mistral Workflow tripleo.derive_params.v1.derive_parameters. Execution ID: acea8816-4e38-466a-a8e8-cb223eca0ac4
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]

It doesn't matter whether I use Compute or ComputeHCI roles. As soon as OS::TripleO::Services::CephOSD is added to the role deployment fails with the error above.

Version-Release number of selected component (if applicable):
[root@director stack]# rpm -qa | grep -i tripleo
python-tripleoclient-7.3.3-7.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-10.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-22.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
openstack-tripleo-common-7.6.3-10.el7ost.noarch

How reproducible:
Every time when deploying OS::TripleO::Services::CephOSD on Compute node.

Steps to Reproduce:
1. Generate roles_data.yaml file:
[stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/hci/roles_data.yaml Controller Compute
2. Add OS::TripleO::Services::CephOSD service to the Compute role.
3. Use scheduler hints file to control node placement:
[stack@director templates]$ cat scheduler_hints_env.yaml
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'overcloud-controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'overcloud-compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'overcloud-ceph-%index%'
4. Run the deployment including customized roles_data.yaml and scheduler_hints_env.yaml
5. Observe error:
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]

Actual results:
Deployment fails.

Expected results:
Deployment uses scheduler hints instead of flavor/profiles and finish successfully.

Additional info:
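For illustration only, the expected behaviour (identifying a role's nodes from its SchedulerHints rather than from a flavor profile) can be sketched in Python as follows. This is not the merged patch, whose contents are not shown in this report. The helper names, the node names, and the capabilities strings are hypothetical. The sketch assumes each node carries a `node:<name>` capability matching the `'capabilities:node'` hint, with `%index%` replaced by the node's position in the role.

```python
def parse_capabilities(caps):
    """Turn 'node:overcloud-compute-0,boot_option:local' into a dict."""
    result = {}
    for item in caps.split(','):
        if ':' in item:
            key, value = item.split(':', 1)
            result[key.strip()] = value.strip()
    return result


def nodes_for_role(hint_pattern, nodes, count):
    """Map each role index to the node whose 'node' capability matches.

    hint_pattern: value of the 'capabilities:node' hint, e.g.
                  'overcloud-compute-%index%'.
    nodes:        dict of node name -> capabilities string (hypothetical).
    count:        number of nodes deployed for the role.
    Returns a list with one entry per index; None where no node matches.
    """
    by_tag = {parse_capabilities(caps).get('node'): name
              for name, caps in nodes.items()}
    return [by_tag.get(hint_pattern.replace('%index%', str(i)))
            for i in range(count)]


# Hypothetical inventory tagged for use with the hints in step 3.
inventory = {
    'node-a': 'node:overcloud-compute-0,boot_option:local',
    'node-b': 'node:overcloud-compute-1,boot_option:local',
    'node-c': 'node:overcloud-controller-0,boot_option:local',
}
print(nodes_for_role('overcloud-compute-%index%', inventory, 2))
```

With a mapping like this, a workflow would not need the role's flavor to carry a profile, so the default baremetal flavor would no longer be a blocker.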