Bug 1552759 - Deployment fails with HCI enabled and SchedulerHints
Summary: Deployment fails with HCI enabled and SchedulerHints
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z3
Target Release: 12.0 (Pike)
Assignee: Alan Bishop
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On: 1575623
Blocks:
Reported: 2018-03-07 16:44 UTC by Rafal Szmigiel
Modified: 2018-08-20 12:59 UTC
CC List: 12 users

Fixed In Version: openstack-tripleo-common-7.6.9-5.el7ost
Doc Type: Bug Fix
Doc Text:
The Derived Parameters workflow now supports the use of SchedulerHints parameters to identify overcloud nodes. Previously, the workflow could not use SchedulerHints to identify the overcloud nodes associated with the corresponding TripleO overcloud role. This caused the overcloud deployment to fail. SchedulerHints support prevents these failures.
Clone Of:
Clones: 1575623
Environment:
Last Closed: 2018-08-20 12:59:02 UTC




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:2331 | None | None | None | 2018-08-20 12:59:53 UTC
OpenStack gerrit | 558313 | None | None | None | 2018-04-03 11:59:42 UTC
Launchpad | 1760659 | None | None | None | 2018-04-02 17:28:54 UTC

Description Rafal Szmigiel 2018-03-07 16:44:13 UTC
Description of problem:

Deployment of HCI-enabled OpenStack Platform 12 fails when using Nova Scheduler Hints.

(undercloud) [stack@director ~]$ ./deploy-now-hci.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 79e884c9-05cb-474b-9f30-6292a70cdba4
Waiting for messages on queue '51484ca6-4916-4d91-acfc-57145bf63494' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: e71ebf09-9c80-44a9-82d7-64538c6291eb
Plan updated.
Processing templates in the directory /tmp/tripleoclient-koTgrN/tripleo-heat-templates
Invoking workflow (tripleo.derive_params.v1.derive_parameters) specified in plan-environment file
Started Mistral Workflow tripleo.derive_params.v1.derive_parameters. Execution ID: acea8816-4e38-466a-a8e8-cb223eca0ac4
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]


It doesn't matter whether I use the Compute or the ComputeHCI role. As soon as OS::TripleO::Services::CephOSD is added to the role, the deployment fails with the error above.


Version-Release number of selected component (if applicable):
[root@director stack]# rpm -qa | grep -i tripleo
python-tripleoclient-7.3.3-7.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-10.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-22.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
openstack-tripleo-common-7.6.3-10.el7ost.noarch

How reproducible:
Every time OS::TripleO::Services::CephOSD is deployed on a Compute node.


Steps to Reproduce:

1. Generate roles_data.yaml file:
[stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/hci/roles_data.yaml Controller Compute

2. Add OS::TripleO::Services::CephOSD service to the Compute role.

3. Use scheduler hints file to control node placement:
[stack@director templates]$ cat scheduler_hints_env.yaml 
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'overcloud-controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'overcloud-compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'overcloud-ceph-%index%'

4. Run the deployment, including the customized roles_data.yaml and scheduler_hints_env.yaml.

5. Observe error:
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]
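In step 3, the `%index%` placeholder in each `<Role>SchedulerHints` value is replaced by the node's index within its role, so compute node N is pinned to the Ironic node whose `node` capability is `overcloud-compute-N`. A minimal Python sketch of that expansion follows; `expand_hints` is a hypothetical helper for illustration, not a function from tripleo-common or Heat.

```python
from typing import Dict, List


def expand_hints(hints: Dict[str, str], count: int) -> List[Dict[str, str]]:
    """Return one concrete hint dict per node index 0..count-1.

    Hypothetical helper: substitutes the literal '%index%' placeholder,
    as used in scheduler_hints_env.yaml above.
    """
    return [
        {key: value.replace("%index%", str(i)) for key, value in hints.items()}
        for i in range(count)
    ]


compute_hints = {"capabilities:node": "overcloud-compute-%index%"}
for per_node in expand_hints(compute_hints, 2):
    print(per_node)
# {'capabilities:node': 'overcloud-compute-0'}
# {'capabilities:node': 'overcloud-compute-1'}
```

Placement is therefore driven entirely by the per-node `node` capability; no role-specific flavor or profile is involved, which is why the role flavor stays at the default `baremetal`.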

Actual results:
Deployment fails.

Expected results:
Deployment uses scheduler hints instead of flavors/profiles and finishes successfully.

Additional info:

Comment 1 Alex Schultz 2018-03-07 21:14:24 UTC
Can you please provide a sosreport from the undercloud? Thanks.

Comment 2 Rafal Szmigiel 2018-03-07 22:59:06 UTC
Hey Alex,

It will take a while because I have to revert the environment to the previous state.

In the meantime, I'm not 100% sure, but I think I found it.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._derive_parameters_per_role | grep -B4 TODO
    # Getting introspection data workflow, which will take care of
    # 1) profile and flavor based mapping
    # 2) Nova placement api based mapping
    # Currently we have implemented profile and flavor based mapping
    # TODO-Nova placement api based mapping is pending, we will enchance it later.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._get_role_info | grep -A8 -E 'check_features:$'
    check_features:
      on-success: build_feature_dict
      publish:
        # TODO: Need to update this logic for ODL integration.
        # The role supports the DPDK feature if the NeutronDatapathType parameter is present.
        dpdk: <% $.role_services.any($.get('parameters', []).contains('NeutronDatapathType')) %>

        # The role supports the HCI feature if it includes both NovaCompute and CephOSD services.
        hci: <% $.role_services.any($.get('type', '').endsWith('::NovaCompute')) and $.role_services.any($.get('type', '').endsWith('::CephOSD')) %>
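The two YAQL expressions in `check_features` can be read as plain Python. The sketch below assumes each entry of `role_services` is a dict with optional `type` and `parameters` keys, mirroring the `$.get(...)` calls; the function is illustrative, not the workflow's own code.

```python
from typing import Any, Dict, List


def check_features(role_services: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Python reading of the dpdk/hci YAQL expressions in _get_role_info."""
    # DPDK: any service in the role declares the NeutronDatapathType parameter.
    dpdk = any("NeutronDatapathType" in svc.get("parameters", []) for svc in role_services)
    # HCI: the role has both a ...::NovaCompute and a ...::CephOSD service.
    hci = any(svc.get("type", "").endswith("::NovaCompute") for svc in role_services) and any(
        svc.get("type", "").endswith("::CephOSD") for svc in role_services
    )
    return {"dpdk": dpdk, "hci": hci}


compute_with_ceph = [
    {"type": "OS::TripleO::Services::NovaCompute"},
    {"type": "OS::TripleO::Services::CephOSD"},
]
print(check_features(compute_with_ceph))  # {'dpdk': False, 'hci': True}
```

Because the check looks only at service types, any role that combines NovaCompute and CephOSD is treated as HCI, whatever the role is called. This matches the report that both Compute and ComputeHCI fail once CephOSD is added.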

Comment 3 Rafal Szmigiel 2018-03-07 23:21:38 UTC
Uploaded to dropbox.redhat.com (sosreport-director.lab.rhpoc.net-20180307180957.tar.xz).

Thanks in advance,

Rafal

Comment 4 Saravanan KR 2018-03-28 09:52:04 UTC
This deployment uses the derive parameters workflow, enabled by the "-p" option of the deploy command. To use this feature, the nodes and flavors should be tagged with a matching profile, and the Overcloud<RoleName>Flavor parameters should provide the matching flavor name. In this case, no flavor is specified in the parameters, so it defaults to 'baremetal', and the workflow fails. Ensure the correct flavor name is provided.
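The failure Comment 4 describes comes down to a flavor-to-profile lookup. In TripleO profile tagging, a flavor carries its profile in the `capabilities:profile` extra spec, so an untagged flavor such as the default `baremetal` yields no profile. The sketch below is a simplified model under that assumption: the error text is copied from the log above, while `profile_for_flavor`, `ProfileLookupError`, and the extra-spec dicts are hypothetical.

```python
from typing import Dict


class ProfileLookupError(Exception):
    """Hypothetical error raised when a flavor has no profile tag."""


def profile_for_flavor(flavor_name: str, extra_specs: Dict[str, str]) -> str:
    # Profile tagging is assumed to live in the 'capabilities:profile' extra spec.
    profile = extra_specs.get("capabilities:profile")
    if not profile:
        raise ProfileLookupError(
            "Unable to determine profile for flavor (flavor name: %s)" % flavor_name
        )
    return profile


print(profile_for_flavor("compute", {"capabilities:profile": "compute"}))  # compute
try:
    # Assumed: the default flavor has no profile extra spec.
    profile_for_flavor("baremetal", {})
except ProfileLookupError as exc:
    print(exc)  # Unable to determine profile for flavor (flavor name: baremetal)
```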

Comment 5 Rafal Szmigiel 2018-03-28 10:37:21 UTC
Hey Saravanan,

This deployment uses SchedulerHints, so no flavors other than baremetal should be used. Please check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Assign_Specific_Node_IDs for more details.

Rafał

Comment 6 Saravanan KR 2018-03-28 10:47:30 UTC
(In reply to Rafal Szmigiel from comment #5)
> Hey Saravanan,
> 
> This deployment uses SchedulerHints therefore no flavors other than
> baremetal should be used. Please check
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/
> html/advanced_overcloud_customization/sect-controlling_node_placement#sect-
> Assign_Specific_Node_IDs for more details.

The derive parameters workflow supports only role tagging and does NOT support SchedulerHints yet. Support was planned earlier, but work has not started. There are two options from here: either use derive parameters with role tagging, OR use scheduler hints and provide the parameters manually without the -p option. I have added Alan Bishop and Jagan, who worked on the current version of derived parameters.

Comment 7 Rafal Szmigiel 2018-03-28 10:51:28 UTC
Thanks for the clarification and looping Alan and Jagan.

Best Regards,

Rafal

Comment 8 Alan Bishop 2018-03-28 11:56:49 UTC
Just to clarify Saravanan's comment, the Derived Parameters workflow relies on role tagging, but this is not incompatible with SchedulerHints. It's OK to continue to specify SchedulerHints, but you also need the nodes for which you want parameters derived (i.e. HCI) to be tagged with a role/profile. This is necessary for the Derived Parameters workflow to identify the nodes so that it can determine their hardware characteristics. This should provide a workaround until we can fix the workflow so that it can use just the SchedulerHints.
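Comment 8's workaround keeps the SchedulerHints and additionally tags the HCI nodes with a profile, so a node can be identified either way. The Python sketch below illustrates both selection strategies over Ironic-style capabilities strings (comma-separated `key:value` pairs, like the `node` capability used in step 3). The function names, the input dict shape, the sample node data, and the regex treatment of `%index%` are all assumptions for illustration; this is not the code from the upstream fix.

```python
import re
from typing import Dict, List, Optional


def parse_capabilities(caps: str) -> Dict[str, str]:
    """Parse an Ironic-style 'key1:val1,key2:val2' capabilities string."""
    parsed: Dict[str, str] = {}
    for item in caps.split(","):
        if ":" in item:
            key, value = item.split(":", 1)
            parsed[key.strip()] = value.strip()
    return parsed


def select_role_nodes(
    nodes: Dict[str, str],
    profile: Optional[str] = None,
    node_hint: Optional[str] = None,
) -> List[str]:
    """Return names of nodes tagged with `profile` or matching `node_hint`.

    Hypothetical helper: `nodes` maps a node name to its capabilities string;
    `node_hint` is a 'capabilities:node' template such as 'overcloud-compute-%index%'.
    """
    pattern = None
    if node_hint:
        # Treat '%index%' as one or more digits; everything else is matched literally.
        pattern = re.compile(
            r"\d+".join(re.escape(part) for part in node_hint.split("%index%"))
        )
    matched = []
    for name, caps in nodes.items():
        parsed = parse_capabilities(caps)
        if profile is not None and parsed.get("profile") == profile:
            matched.append(name)
        elif pattern is not None and pattern.fullmatch(parsed.get("node", "")):
            matched.append(name)
    return matched


# Illustrative node data, not taken from the reporter's environment.
nodes = {
    "node-a": "node:overcloud-compute-0,boot_option:local",
    "node-b": "node:overcloud-compute-1,profile:compute,boot_option:local",
    "node-c": "node:overcloud-controller-0,boot_option:local",
}
print(select_role_nodes(nodes, profile="compute"))  # ['node-b']
print(select_role_nodes(nodes, node_hint="overcloud-compute-%index%"))  # ['node-a', 'node-b']
```

With profile tagging alone, only `node-b` is found; with the hint alone, both compute nodes are found. Using `fullmatch` keeps the compute template from also matching a node named, for example, `overcloud-compute-0-spare`.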

Comment 9 Alan Bishop 2018-05-03 16:33:28 UTC
Patch merged upstream, and I've begun upstream backports to stable/queens and stable/pike.

Comment 14 Yogev Rabl 2018-07-30 17:03:36 UTC
Verified, deployed successfully with scheduler hints

Comment 16 errata-xmlrpc 2018-08-20 12:59:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331

