Created attachment 1319943
roles_data

Description of problem:
OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker fails on a composable roles deployment with:

ERROR: Property error: : resources.Compute<nested_stack>.resources[0].properties: : Unknown Property NovaComputeSchedulerHints

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170821194253.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with composable roles
2. Adjust the roles_data file for OSP12
3. Upgrade to OSP12

Actual results:
major-upgrade-composable-steps-docker fails with:

u'message': u"Failed to run action [action_ex_id=0022936d-54f4-4d5b-a7bf-f6a9b231f8dc, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 240}']\n ERROR: Property error: : resources.Compute<nested_stack>.resources[0].properties: : Unknown Property NovaComputeSchedulerHints", u'status': u'FAILED'}

Expected results:
major-upgrade-composable-steps-docker completes without errors.

Additional info:
The roles_data.yaml file is attached.
I was able to move past this error by assigning the following param to the Compute role in the custom roles data:

  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'

but then it failed with a new error:

2017-08-30 09:27:44Z [overcloud]: UPDATE_FAILED resources.Controller: resources[2]: BadRequest: resources.Controller: No valid host was found. No valid host found for resize (HTTP 400) (Request-ID: req-1e3a1b62-af8f-4efb-8e33-a136d536535e)

Stack overcloud UPDATE_FAILED

overcloud.Compute.1.Compute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 862aab51-d3ab-46a1-a321-b2e6c11e5053
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.Compute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
overcloud.Compute.0.Compute:
  resource_type: OS::TripleO::ComputeServer
  physical_resource_id: 4b319bd6-701a-4269-a833-8834e018daa0
  status: CREATE_FAILED
  status_reason: |
    ResourceInError: resources.Compute: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
overcloud.Controller.1.Controller:
  resource_type: OS::TripleO::Server
  physical_resource_id: 9953be5a-64a4-4b9d-b690-48857a6d628a
  status: UPDATE_FAILED
  status_reason: |
    BadRequest: resources.Controller: No valid host was found. No valid host found for resize (HTTP 400) (Request-ID: req-c8d74b27-bda8-46a8-9008-4af0545ecb39)
overcloud.Controller.0.Controller:
  resource_type: OS::TripleO::Server
  physical_resource_id: 4980e829-bdec-44f5-99c2-39962650bd14
  status: UPDATE_FAILED
  status_reason: |
    BadRequest: resources.Controller: No valid host was found. No valid host found for resize (HTTP 400) (Request-ID: req-6cdb6bf3-61e0-4a08-af8c-b1138a9c8c5f)
overcloud.Controller.2.Controller:
  resource_type: OS::TripleO::Server
  physical_resource_id: c4fc238f-40bd-431a-9a36-9f1e675b6254
  status: UPDATE_FAILED
  status_reason: |
    BadRequest: resources.Controller: No valid host was found. No valid host found for resize (HTTP 400) (Request-ID: req-1e3a1b62-af8f-4efb-8e33-a136d536535e)
Heat Stack update failed.
Heat Stack update failed.
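The roles_data change described above (adding a deprecated_param_* key to one role) can also be scripted. The following is a minimal sketch, assuming roles_data.yaml has already been loaded (for example with a YAML parser) into a list of dicts that each carry a `name` key. The helper `add_deprecated_params` and the `DEPRECATED_PARAMS` mapping are hypothetical; the mapping holds only the one flag quoted in this comment, not the full set a real upgrade may need.

```python
from typing import Dict, List

# Hypothetical mapping: role name -> deprecated_param_* keys to add.
# Only the flag quoted in this report is listed here.
DEPRECATED_PARAMS: Dict[str, Dict[str, str]] = {
    "Compute": {"deprecated_param_scheduler_hints": "NovaComputeSchedulerHints"},
}


def add_deprecated_params(roles: List[dict],
                          mapping: Dict[str, Dict[str, str]]) -> List[dict]:
    """Return a copy of the role list with the mapped keys merged in.

    Keys a role already sets are left untouched, so custom values survive.
    The input list and its dicts are not modified.
    """
    result = []
    for role in roles:
        updated = dict(role)  # shallow copy of this role entry
        for key, value in mapping.get(role.get("name"), {}).items():
            updated.setdefault(key, value)
        result.append(updated)
    return result


if __name__ == "__main__":
    roles = [{"name": "Controller"}, {"name": "Compute"}]
    for role in add_deprecated_params(roles, DEPRECATED_PARAMS):
        print(role)
```

`setdefault` is used so that a role which already defines one of these keys keeps its own value instead of being overwritten.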
I tried assigning the deprecated_param_flavor option to the Controller and Compute roles in the custom roles data file:

  deprecated_param_flavor: 'OvercloudControlFlavor'
  deprecated_param_flavor: 'OvercloudComputeFlavor'

but I ended up with duplicate compute instances in ERROR state:

(undercloud) [stack@undercloud-0 ~]$ nova list
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 20bdaf46-3960-45b9-9781-40f54777ce8f | compute-0    | ERROR  | -          | NOSTATE     |                        |
| 64b7ccc9-2090-4a0f-a647-7c783b68ee0f | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 9ed4466e-bf6c-4e24-87d8-9f29860c86f1 | compute-0    | ERROR  | -          | NOSTATE     |                        |
| 23ac5849-bd8c-4af2-9db3-145bc6e27f23 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| e269062a-7239-4367-9d56-532849f6e18d | compute-1    | ERROR  | -          | NOSTATE     |                        |
| f8960e82-fd62-460a-88af-42193d7e7fe1 | compute-1    | ERROR  | -          | NOSTATE     |                        |
| 4980e829-bdec-44f5-99c2-39962650bd14 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
| 9953be5a-64a4-4b9d-b690-48857a6d628a | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| c4fc238f-40bd-431a-9a36-9f1e675b6254 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| 0a368bdf-479c-4385-b825-460aa0e943a9 | database-0   | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f36656e8-b60b-4a34-b8c8-24f091fe3cc2 | database-1   | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
| a00f242e-7fc7-42de-ab3b-80a43418a3d7 | database-2   | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e8a234ad-15c1-48ee-8eb6-61fc97aa4fc4 | messaging-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 22cad3e7-d066-4336-9693-f83dbb0c091e | messaging-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 384061cf-a195-452b-859a-3276651de8a6 | messaging-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 594d2ea4-5e26-4d6e-9952-35f375e70f87 | networker-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 97204f57-396d-4388-8a2c-a2d116aaef0f | networker-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
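Because the ERROR instances share their names with ACTIVE nodes, it can help to see at a glance which names are duplicated before acting on any UUID. This is a minimal sketch, assuming the plain-text `nova list` table above as input (a few of its rows are copied into `SAMPLE`). `parse_nova_list` and `duplicate_names` are hypothetical local helpers, not part of novaclient or tripleoclient, and parsing the ASCII table is more fragile than querying the Compute API directly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A subset of the `nova list` output from this comment.
SAMPLE = """\
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 20bdaf46-3960-45b9-9781-40f54777ce8f | compute-0    | ERROR  | -          | NOSTATE     |                        |
| 64b7ccc9-2090-4a0f-a647-7c783b68ee0f | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| 9ed4466e-bf6c-4e24-87d8-9f29860c86f1 | compute-0    | ERROR  | -          | NOSTATE     |                        |
| 23ac5849-bd8c-4af2-9db3-145bc6e27f23 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| e269062a-7239-4367-9d56-532849f6e18d | compute-1    | ERROR  | -          | NOSTATE     |                        |
| f8960e82-fd62-460a-88af-42193d7e7fe1 | compute-1    | ERROR  | -          | NOSTATE     |                        |
| 4980e829-bdec-44f5-99c2-39962650bd14 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
"""


def parse_nova_list(table: str) -> List[Tuple[str, str, str]]:
    """Return (id, name, status) tuples from `nova list` table text."""
    rows = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+---+" borders, blank lines and warnings
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 3 or cells[0] == "ID":
            continue  # skip the header row and malformed lines
        rows.append((cells[0], cells[1], cells[2]))
    return rows


def duplicate_names(rows: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each name that appears more than once to its (id, status) pairs."""
    by_name: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for server_id, name, status in rows:
        by_name[name].append((server_id, status))
    return {name: entries for name, entries in by_name.items() if len(entries) > 1}


if __name__ == "__main__":
    dups = duplicate_names(parse_nova_list(SAMPLE))
    for name, entries in sorted(dups.items()):
        print(f"{name}: {len(entries)} instances")
        for server_id, status in entries:
            print(f"  {server_id} {status}")
```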
Trying to delete one of the nova instances in ERROR state by UUID ends up deleting all the instances with the same name (including the ACTIVE one running workloads):

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud 20bdaf46-3960-45b9-9781-40f54777ce8f
Deleting the following nodes from stack overcloud:
- 20bdaf46-3960-45b9-9781-40f54777ce8f
Started Mistral Workflow tripleo.scale.v1.delete_node. Execution ID: bb5518fe-4f00-491a-9420-8d40731aae0c
('The read operation timed out',)

(undercloud) [stack@undercloud-0 ~]$ nova list
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 23ac5849-bd8c-4af2-9db3-145bc6e27f23 | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| 44b3774b-9542-49d6-b780-1f8a170536d0 | compute-1    | BUILD  | scheduling | NOSTATE     |                        |
| f8960e82-fd62-460a-88af-42193d7e7fe1 | compute-1    | ERROR  | -          | NOSTATE     |                        |
| 4980e829-bdec-44f5-99c2-39962650bd14 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.23 |
| 9953be5a-64a4-4b9d-b690-48857a6d628a | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| c4fc238f-40bd-431a-9a36-9f1e675b6254 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.24 |
| 0a368bdf-479c-4385-b825-460aa0e943a9 | database-0   | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| f36656e8-b60b-4a34-b8c8-24f091fe3cc2 | database-1   | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
| a00f242e-7fc7-42de-ab3b-80a43418a3d7 | database-2   | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| e8a234ad-15c1-48ee-8eb6-61fc97aa4fc4 | messaging-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 22cad3e7-d066-4336-9693-f83dbb0c091e | messaging-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 384061cf-a195-452b-859a-3276651de8a6 | messaging-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 594d2ea4-5e26-4d6e-9952-35f375e70f87 | networker-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| 97204f57-396d-4388-8a2c-a2d116aaef0f | networker-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
Can we try again and run the upgrade with all the 'new' flags in roles_data? (This seems at least related to BZ 1486311.) If it passes, great; otherwise we probably need to reach out to someone from DFG:DF (or whoever added the Mistral support and the deprecated param flags in roles_data.yaml, https://github.com/openstack/tripleo-heat-templates/blob/master/roles_data.yaml#L152-L168) to help out too.
(In reply to marios from comment #4)
> can we try again and run the upgrade with all the 'new' flags in roles_data
> (this seems at least related to BZ 1486311 ). If it passes great, otherwise
> we need to reach out to someone from DFG:DF possibly (or whoever it is that
> added the mistral support for and those
> https://github.com/openstack/tripleo-heat-templates/blob/master/roles_data.yaml#L152-L168
> deprecated param flags in the roles_data.yaml ) to help out too.

After adding the 'new' flags before starting the upgrade, the major-upgrade-composable-steps-docker step completed fine. We still need to find out what these flags represent and how they could affect environments that use custom values for them.

FWIW, these are the flags I applied for each role: https://review.gerrithub.io/#/c/376753/1/tasks/convert_roles_data.yaml