Bug 1575623 - Deployment fails with HCI enabled and SchedulerHints
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 13.0 (Queens)
Assigned To: Alan Bishop
QA Contact: Yogev Rabl
Keywords: Rebase, Triaged, ZStream
Depends On:
Blocks: 1552759
Reported: 2018-05-07 08:59 EDT by Alan Bishop
Modified: 2018-08-29 12:36 EDT

See Also:
Fixed In Version: openstack-tripleo-common-8.6.3-2.el7ost
Doc Type: Bug Fix
Doc Text:
The Derived Parameters workflow now supports the use of SchedulerHints to identify overcloud nodes. Previously, the workflow could not use SchedulerHints to identify the overcloud nodes associated with the corresponding TripleO overcloud role, which caused the overcloud deployment to fail. SchedulerHints support prevents these failures.
Story Points: ---
Clone Of: 1552759
Environment:
Last Closed: 2018-08-29 12:35:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1760659 | None | None | None | 2018-05-07 08:59 EDT
OpenStack gerrit | 557318 | None | stable/queens: MERGED | cinder: Reduce scope of the lock for image volume cache (I547fb4bcdd4783225b8ca96d157c61ca3bcf4ef4) | 2018-07-18 21:46 EDT
OpenStack gerrit | 558313 | None | master: MERGED | tripleo-common: Use scheduler hints in derived_parameters workflow (I7eff355620aecaca49e77112ba491a5f3ce2eed6) | 2018-07-18 21:46 EDT
OpenStack gerrit | 566110 | None | stable/queens: MERGED | tripleo-common: Use scheduler hints in derived_parameters workflow (I7eff355620aecaca49e77112ba491a5f3ce2eed6) | 2018-07-18 21:46 EDT
Red Hat Product Errata | RHBA-2018:2574 | None | None | None | 2018-08-29 12:36 EDT

Description Alan Bishop 2018-05-07 08:59:09 EDT
+++ This bug was initially created as a clone of Bug #1552759 +++

Description of problem:

Deployment of HCI-enabled OpenStack Platform 12 fails when using Nova scheduler hints.

(undercloud) [stack@director ~]$ ./deploy-now-hci.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 79e884c9-05cb-474b-9f30-6292a70cdba4
Waiting for messages on queue '51484ca6-4916-4d91-acfc-57145bf63494' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: e71ebf09-9c80-44a9-82d7-64538c6291eb
Plan updated.
Processing templates in the directory /tmp/tripleoclient-koTgrN/tripleo-heat-templates
Invoking workflow (tripleo.derive_params.v1.derive_parameters) specified in plan-environment file
Started Mistral Workflow tripleo.derive_params.v1.derive_parameters. Execution ID: acea8816-4e38-466a-a8e8-cb223eca0ac4
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]


It doesn't matter whether I use the Compute or ComputeHCI role. As soon as OS::TripleO::Services::CephOSD is added to the role, the deployment fails with the error above.


Version-Release number of selected component (if applicable):
[root@director stack]# rpm -qa | grep -i tripleo
python-tripleoclient-7.3.3-7.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch
puppet-tripleo-7.4.3-11.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-10.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-22.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
openstack-tripleo-common-7.6.3-10.el7ost.noarch

How reproducible:
Every time OS::TripleO::Services::CephOSD is deployed on a Compute node.


Steps to Reproduce:

1. Generate roles_data.yaml file:
[stack@director templates]$ openstack overcloud roles generate -o /home/stack/templates/hci/roles_data.yaml Controller Compute

2. Add OS::TripleO::Services::CephOSD service to the Compute role.

3. Use scheduler hints file to control node placement:
[stack@director templates]$ cat scheduler_hints_env.yaml 
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'overcloud-controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'overcloud-compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'overcloud-ceph-%index%'

4. Run the deployment including customized roles_data.yaml and scheduler_hints_env.yaml

5. Observe error:
Workflow execution is failed: [{u'status': u'SUCCESS', u'message': u'', u'role_name': u'Controller'}, {u'status': u'FAILED', u'message': u'Unable to determine profile for flavor (flavor name: baremetal)', u'role_name': u'Compute'}]

Actual results:
Deployment fails.

Expected results:
Deployment uses scheduler hints instead of flavors/profiles and finishes successfully.

Additional info:

--- Additional comment from Alex Schultz on 2018-03-07 16:14:24 EST ---

Can you please provide a sosreport from the undercloud? Thanks.

--- Additional comment from Rafal Szmigiel on 2018-03-07 17:59:06 EST ---

Hey Alex,

It will take a while because I have to revert the environment to the previous state.

In the meantime, I'm not 100% sure, but I think I found it.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._derive_parameters_per_role | grep -B4 TODO
    # Getting introspection data workflow, which will take care of
    # 1) profile and flavor based mapping
    # 2) Nova placement api based mapping
    # Currently we have implemented profile and flavor based mapping
    # TODO-Nova placement api based mapping is pending, we will enchance it later.

(undercloud) [stack@director ~]$ mistral workflow-get-definition tripleo.derive_params.v1._get_role_info | grep -A8 -E 'check_features:$'
    check_features:
      on-success: build_feature_dict
      publish:
        # TODO: Need to update this logic for ODL integration.
        # The role supports the DPDK feature if the NeutronDatapathType parameter is present.
        dpdk: <% $.role_services.any($.get('parameters', []).contains('NeutronDatapathType')) %>

        # The role supports the HCI feature if it includes both NovaCompute and CephOSD services.
        hci: <% $.role_services.any($.get('type', '').endsWith('::NovaCompute')) and $.role_services.any($.get('type', '').endsWith('::CephOSD')) %>
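
For readers less familiar with YAQL, the two feature checks above can be restated as a minimal Python sketch. This is not the tripleo-common code; the function name role_features and the sample service dictionaries are assumptions made for illustration, and only the matching rules are taken from the workflow definition shown above.

```python
def role_features(role_services):
    """Restate the two YAQL expressions for a list of service dicts.

    Each service is assumed to be a dict with an optional 'type' string
    and an optional 'parameters' collection (hypothetical shape).
    """
    # DPDK: any service exposes the NeutronDatapathType parameter.
    dpdk = any("NeutronDatapathType" in svc.get("parameters", [])
               for svc in role_services)
    # HCI: the role carries both a NovaCompute and a CephOSD service.
    has_compute = any(svc.get("type", "").endswith("::NovaCompute")
                      for svc in role_services)
    has_ceph_osd = any(svc.get("type", "").endswith("::CephOSD")
                       for svc in role_services)
    return {"dpdk": dpdk, "hci": has_compute and has_ceph_osd}


# Hypothetical sample: a Compute role with CephOSD added, as in step 2.
compute_with_osd = [
    {"type": "OS::TripleO::Services::NovaCompute", "parameters": []},
    {"type": "OS::TripleO::Services::CephOSD", "parameters": []},
]
features = role_features(compute_with_osd)
print(features)
```

Under these assumptions, adding CephOSD to any role that already has NovaCompute flips the hci flag, which is consistent with the failure appearing for both the Compute and ComputeHCI roles.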

--- Additional comment from Rafal Szmigiel on 2018-03-07 18:21:38 EST ---

Uploaded to dropbox.redhat.com (sosreport-director.lab.rhpoc.net-20180307180957.tar.xz).

Thanks in advance,

Rafal

--- Additional comment from Saravanan KR on 2018-03-28 05:52:04 EDT ---

This deployment uses the derive parameters workflow, which is enabled by the "-p" option in the deploy command. To use this feature, the nodes and flavors must be tagged with a matching profile, and the Overcloud<RoleName>Flavor parameters must provide the matching flavor name. In this error, no flavor is specified in the parameters, so it defaults to 'baremetal' and the workflow fails. Ensure the correct flavor name is provided.
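
The lookup chain described in this comment can be sketched roughly as follows. This is an illustration only, not the workflow's implementation: the helper names, the plain dictionaries standing in for plan parameters and flavor properties, and the sample flavors are all assumptions. Only the Overcloud<RoleName>Flavor naming, the 'baremetal' default, the capabilities:profile flavor property, and the error text come from this report or standard TripleO profile tagging.

```python
def flavor_for_role(role_name, parameters, default="baremetal"):
    """Return the flavor named by Overcloud<RoleName>Flavor, else the default."""
    return parameters.get("Overcloud%sFlavor" % role_name, default)


def profile_for_flavor(flavor_name, flavors):
    """Return the profile tagged on a flavor, or fail like the workflow did.

    'flavors' is a hypothetical mapping of flavor name to its properties.
    """
    profile = flavors.get(flavor_name, {}).get("capabilities:profile")
    if not profile:
        raise LookupError(
            "Unable to determine profile for flavor (flavor name: %s)"
            % flavor_name)
    return profile


# Hypothetical data: 'baremetal' carries no profile, 'compute' does.
flavors = {
    "baremetal": {},
    "compute": {"capabilities:profile": "compute"},
}

# With scheduler hints only, no OvercloudComputeFlavor is set.
flavor = flavor_for_role("Compute", {})
try:
    profile_for_flavor(flavor, flavors)
except LookupError as exc:
    print(exc)
```

With no OvercloudComputeFlavor parameter, the sketch falls back to baremetal, finds no profile on it, and raises the same message seen in the deployment output.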

--- Additional comment from Rafal Szmigiel on 2018-03-28 06:37:21 EDT ---

Hey Saravanan,

This deployment uses SchedulerHints therefore no flavors other than baremetal should be used. Please check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Assign_Specific_Node_IDs for more details.

Rafał

--- Additional comment from Saravanan KR on 2018-03-28 06:47:30 EDT ---

(In reply to Rafal Szmigiel from comment #5)
> Hey Saravanan,
> 
> This deployment uses SchedulerHints therefore no flavors other than
> baremetal should be used. Please check
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/
> html/advanced_overcloud_customization/sect-controlling_node_placement#sect-
> Assign_Specific_Node_IDs for more details.

The derive parameters workflow supports only role tagging and does NOT support SchedulerHints yet. Support was planned earlier, but work has not started. There are two options from here: either use derive parameters with role tagging, or use scheduler hints and provide the parameters manually without the -p option. I have added Alan Bishop and Jagan, who were working on the current version of derived parameters.

--- Additional comment from Rafal Szmigiel on 2018-03-28 06:51:28 EDT ---

Thanks for the clarification and for looping in Alan and Jagan.

Best Regards,

Rafal

--- Additional comment from Alan Bishop on 2018-03-28 07:56:49 EDT ---

Just to clarify Saravanan's comment, the Derived Parameters workflow relies on role tagging, but this is not incompatible with SchedulerHints. It's OK to continue to specify SchedulerHints, but you also need the nodes for which you want parameters derived (i.e. HCI) to be tagged with a role/profile. This is necessary for the Derived Parameters workflow to identify the nodes so that it can determine their hardware characteristics. This should provide a workaround until we can fix the workflow so that it can use just the SchedulerHints.
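
To make the idea of identifying nodes from SchedulerHints alone more concrete, here is a rough Python sketch. It is not the merged patch: the function names and the sample node list are hypothetical. It assumes each node exposes a capabilities string in the usual "key:value,key:value" form and that the hint value uses the %index% placeholder shown in scheduler_hints_env.yaml.

```python
import re


def hint_to_regex(hint):
    """Turn 'overcloud-compute-%index%' into a regex matching any index."""
    parts = hint.split("%index%")
    # Escape the literal parts and allow one or more digits per placeholder.
    return re.compile(r"\d+".join(re.escape(part) for part in parts))


def parse_capabilities(text):
    """Parse 'node:overcloud-compute-0,boot_option:local' into a dict."""
    return dict(item.split(":", 1) for item in text.split(",") if ":" in item)


def nodes_for_hint(hint, nodes):
    """Return names of nodes whose 'node' capability matches the hint."""
    pattern = hint_to_regex(hint)
    matched = []
    for node in nodes:
        caps = parse_capabilities(node.get("capabilities", ""))
        if pattern.fullmatch(caps.get("node", "")):
            matched.append(node["name"])
    return matched


# Hypothetical inventory tagged for scheduler-hint placement.
nodes = [
    {"name": "node-a", "capabilities": "node:overcloud-compute-0,boot_option:local"},
    {"name": "node-b", "capabilities": "node:overcloud-controller-0,boot_option:local"},
    {"name": "node-c", "capabilities": "node:overcloud-compute-1,boot_option:local"},
]
compute_nodes = nodes_for_hint("overcloud-compute-%index%", nodes)
print(compute_nodes)
```

Matching on the node capability this way would let a workflow find a role's hardware without requiring a profile-tagged flavor, which is the gap described in this comment.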

--- Additional comment from Alan Bishop on 2018-05-03 12:33:28 EDT ---

Patch merged upstream, and I've begun upstream backports to stable/queens and stable/pike.
Comment 2 Alan Bishop 2018-05-07 09:01:57 EDT
Patch has merged on upstream stable/queens.
Comment 12 Yogev Rabl 2018-08-21 21:54:08 EDT
Verified
Comment 14 errata-xmlrpc 2018-08-29 12:35:54 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574
