Created attachment 1338117
mistral logs

Description of problem:
OSP11 -> OSP12 upgrade: major-upgrade-composable-steps-docker.yaml fails on deployment with custom roles, node placement and predictable IPs with resources.WorkflowTasks_Step2_Execution: ERROR:

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: ab643ce7-bba7-454a-b669-ab496733380c
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

I suspect this might be related to the tripleo-ansible-inventory not containing any node:

(undercloud) [stack@undercloud-0 ~]$ tripleo-ansible-inventory --list
{"undercloud": {"hosts": ["localhost"], "vars": {"username": "admin", "overcloud_keystone_url": "http://172.16.18.25:5000/v2.0", "project_name": "admin", "undercloud_service_list": ["openstack-nova-compute", "openstack-heat-engine", "openstack-ironic-conductor", "openstack-swift-container", "openstack-swift-object", "openstack-mistral-engine"], "overcloud_horizon_url": "http://172.16.18.25:80/dashboard", "os_auth_token": "gAAAAABZ4HNhSSO1IQAxJVBOOPCQrjw55v4t0w3fJGRJZ8V4A_ESxMKt8TFJftvFCv45HdYgsGVoQz4IHHw-N2-qq5ycPfjMnRjL3pMiinNNL2TJ0c40y-7XJc7r2yMKpODFf-gx09C4WxcPcSPp0zqrM0p7ak5rT-Xn4roaLmR8F5AnXT5vxco", "overcloud_admin_password": "XMvaBqZBErUptjwVCfpjWHBQn", "auth_url": "https://192.168.0.2:13000/", "ansible_connection": "local", "cacert": null, "undercloud_swift_url": "https://192.168.0.2:13808/v1/AUTH_0f4654c661a1416eb5a5877609a016bb", "plan": "overcloud"}}}

while nova list:

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------------+--------+------------+-------------+-----------------------+
| ID                                   | Name        | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------+--------+------------+-------------+-----------------------+
| a6b930c2-20b0-452c-b18e-e9a3a88d1444 | comp-r00-00 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| e67288af-6467-4da8-8f2d-b977ae5b2cf8 | comp-r01-01 | ACTIVE | -          | Running     | ctlplane=192.168.0.21 |
| ed5bb150-f171-45ff-90d9-aa3ac3eedfac | ctrl-r00-00 | ACTIVE | -          | Running     | ctlplane=192.168.0.29 |
| eaff2859-3deb-4278-baad-d9df47e5e5f5 | ctrl-r01-01 | ACTIVE | -          | Running     | ctlplane=192.168.0.23 |
| 40085d53-cbca-420d-91f8-033664dc4144 | ctrl-r02-02 | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
| 13a95d87-cd78-4413-97bc-8856bbdf7ccd | db-r00-00   | ACTIVE | -          | Running     | ctlplane=192.168.0.22 |
| b58ff94e-b3fb-409a-97a8-df276a0beebf | db-r01-01   | ACTIVE | -          | Running     | ctlplane=192.168.0.15 |
| 8245d1ec-e054-452a-99e7-bd70fe4b2116 | db-r02-02   | ACTIVE | -          | Running     | ctlplane=192.168.0.27 |
| 125e9157-8cc7-488e-a48b-e3100ebaa672 | msg-r00-00  | ACTIVE | -          | Running     | ctlplane=192.168.0.14 |
| b290ca88-0920-4df0-af14-517e1563d272 | msg-r01-01  | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| b8152572-9ae1-4236-9cd1-b08374e8ba19 | msg-r02-02  | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
| d02a43c0-f3f3-4d0f-ac84-02c8252f7ad6 | net-r00-00  | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
| f88aa624-7a8c-459f-bbaa-b6cf656add67 | net-r01-01  | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
| 68136719-1c0e-4481-add3-a6db7e132e80 | stor-r00-00 | ACTIVE | -          | Running     | ctlplane=192.168.0.26 |
| 23d8339d-0b50-4c59-9cd9-77224b304355 | stor-r01-01 | ACTIVE | -          | Running     | ctlplane=192.168.0.18 |
| 4ae838fa-42ab-4ac4-bcaa-46e88635a6cc | stor-r02-02 | ACTIVE | -          | Running     | ctlplane=192.168.0.30 |
+--------------------------------------+-------------+--------+------------+-------------+-----------------------+

Version-Release number of selected component (if applicable):
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo
openstack-tripleo-common-7.6.2-0.20171007061449.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170914203705.2094778.el7ost.noarch
openstack-tripleo-validations-7.4.1-0.20171007010758.2e43f1a.el7ost.noarch
puppet-tripleo-7.4.2-0.20171007035632.195db7c.el7ost.noarch
openstack-tripleo-heat-templates-7.0.2-0.20171007062243.el7ost.noarch
openstack-tripleo-common-containers-7.6.2-0.20171007061449.el7ost.noarch
python-tripleoclient-7.3.2-0.20171006220926.456a1f3.el7ost.noarch
openstack-tripleo-image-elements-7.0.0-0.20170914203419.526772d.el7ost.noarch
openstack-tripleo-ui-7.4.2-0.20171007034723.15a76df.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 env with ceph nodes, custom roles, predictable IPs and node placement
2. Upgrade to OSP12

Actual results:
major-upgrade-composable-steps-docker.yaml fails with:

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: ab643ce7-bba7-454a-b669-ab496733380c
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Expected results:
There is no error.

Additional info:
This error typically points to an issue while running the ceph-ansible playbook, but I couldn't find any ceph-ansible workflow log in the mistral logs:

ls /var/log/mistral/
api.log engine.log executor.log mistral-db-manage.log

Moreover, there's no failed workflow execution:

openstack workflow execution list | grep -i error

Attaching mistral logs.
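The empty-inventory observation above can be checked mechanically. Below is a minimal sketch, assuming only the JSON shape printed by `tripleo-ansible-inventory --list` in this report (top-level groups, each with a `hosts` list). The function name `overcloud_hosts` is made up for illustration and is not part of tripleo-validations.

```python
import json


def overcloud_hosts(inventory_json):
    """Return {group: hosts} for every non-undercloud group that lists hosts.

    Assumption: groups are either dicts with a "hosts" key or plain host
    lists, as in Ansible dynamic inventory output.
    """
    inventory = json.loads(inventory_json)
    found = {}
    for group, body in inventory.items():
        # Skip the undercloud itself and the Ansible "_meta" hostvars block.
        if group in ("undercloud", "_meta"):
            continue
        hosts = body.get("hosts", []) if isinstance(body, dict) else list(body)
        if hosts:
            found[group] = hosts
    return found


# Trimmed version of the inventory shown in this report: undercloud only.
sample = '{"undercloud": {"hosts": ["localhost"], "vars": {"plan": "overcloud"}}}'
print(overcloud_hosts(sample))  # prints {}
```

An empty result while `nova list` shows ACTIVE servers is the mismatch described above.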
At the failure time we can see in the mistral engine log:

2017-10-12 19:26:16.294 7167 INFO mistral.engine.engine_server [req-85201fba-753b-40ee-b944-1c1bc013c975 73e4634836184739b4ebf267ce91d7a1 0f4654c661a1416eb5a5877609a016bb - default default] Received RPC request 'on_action_complete'[action_ex_id=4cd994fa-5079-408f-a508-83ec0b94c8bb, result=Result [data={}, error=Timeout for heat deployment 'create_admin', cancel=False]]
2017-10-12 19:26:16.308 7167 INFO workflow_trace [req-85201fba-753b-40ee-b944-1c1bc013c975 73e4634836184739b4ebf267ce91d7a1 0f4654c661a1416eb5a5877609a016bb - default default] Action 'tripleo.deployment.config' (4cd994fa-5079-408f-a508-83ec0b94c8bb)(task=deploy_config) [RUNNING -> ERROR, error = Timeout for heat deployment 'create_admin']
2017-10-12 19:26:16.364 7167 INFO workflow_trace [req-85201fba-753b-40ee-b944-1c1bc013c975 73e4634836184739b4ebf267ce91d7a1 0f4654c661a1416eb5a5877609a016bb - default default] Task 'deploy_config' (919cbf05-1e55-4332-bd24-cb60272acf2f) [RUNNING -> ERROR, msg=Timeout for heat deployment 'create_admin'] (execution_id=a7a675f2-083e-4c5e-a269-80d9077fdb5d)
2017-10-12 19:26:16.424 7167 ERROR mistral.engine.task_handler [req-85201fba-753b-40ee-b944-1c1bc013c975 73e4634836184739b4ebf267ce91d7a1 0f4654c661a1416eb5a5877609a016bb - default default] Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function "#property#deploy_stderr", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 63, in run_task
    task.run()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 315, in run
    self._run_new()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 341, in _run_new
    self._schedule_actions()
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 399, in _schedule_actions
    input_dict = self._get_action_input()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 438, in _get_action_input
    ctx_view
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 79, in _evaluate_item
    return evaluate(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 71, in evaluate
    return evaluator.evaluate(expression, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 100, in evaluate
    cls).evaluate(trim_expr, data_context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 54, in evaluate
    ", data=%s]" % (expression, str(e), data_context)
It looks like this is bug 1485189. In order to inject the undercloud cert into the overcloud nodes I added the CAMap while running major-upgrade-composable-steps-docker.yaml, but since we specifically disable the upgrade for the compute role, the compute nodes didn't get updated and failed. We need to document that, before starting the upgrade, all the overcloud nodes must be able to reach the SSL-enabled public endpoint of the undercloud.

[stack@undercloud-0 ~]$ cat composable_docker_upgrade.sh
source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/
openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/ceph-ansible/ceph-ansible.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
-e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e /home/stack/undercloud_ssl_camap.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/ceph-ansible-env.yaml \
-e /home/stack/docker-osp12.yaml \

[stack@undercloud-0 ~]$ cat /home/stack/undercloud_ssl_camap.yaml
parameter_defaults: CAMap: undercloud-injected-ca: content: | -----BEGIN CERTIFICATE----- MIIDgjCCAmqgAwIBAgIQF3W8bCloSPexX20s6D8WETANBgkqhkiG9w0BAQsFADBQ MSAwHgYDVQQDDBdMb2NhbCBTaWduaW5nIEF1dGhvcml0eTEsMCoGA1UEAwwjMTc3 NWJjNmMtMjk2ODQ4ZjctYjE1ZjZkMmMtZTgzZjEyM2QwHhcNMTcxMDEyMjA0NTA2 WhcNMTgxMDEyMjAzNzEzWjAWMRQwEgYDVQQDEwsxOTIuMTY4LjAuMjCCASIwDQYJ KoZIhvcNAQEBBQADggEPADCCAQoCggEBANfez7op+DXfiF+D5DYsF3v4XS0a7fnF ViPPoxdRtKwkkoZA0odfi7KaIeSNY0tl29JkMN4CG5JYp3X92XZLQqdE7VGNSMNw
uO1iNBFXdRcntnHRVsaFQHUFXsUHuO6o2nhtbb4Z0UEt5kVs6ZiWAlXZ6L48CNRy uYNxbBcN5GJqgq9yT3Pzaxw2Sio+ktRBa+kxntUVkoUce5lTbIxrNnpNGgf09nCQ xErahMOkkHPcMEufdR+MwzqMvCO6wxNZbr1B4rFMGqmlGpyC2Xz8wEskzJcc2DKZ D3+M5US/W6IYBy9UniwMCiXby6T6T8WDMvG4zChDrf0+DHiOlvfXqI0CAwEAAaOB kTCBjjAZBgNVHREBAQAEDzANggsxOTIuMTY4LjAuMjAgBgNVHSUBAQAEFjAUBggr BgEFBQcDAgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAgBgNVHQ4BAQAEFgQUTksv MhPDFk7p5Dus6Ms/n7Jet8cwHwYDVR0jBBgwFoAUiJCFENu3ArLEDG3mdcqK66Vw N9IwDQYJKoZIhvcNAQELBQADggEBABKIv1MdxdIuFNM/kAkoKuKgjFINfJLRS4jF 89TFFkb1hc3rqCumB8rJ3gviWceVFpIV3wpAw9AK0qNGh8yQeZuBE4q6cuEUI0FB Op5WQWuSZLK4vLJSAIfkZ1PSeC7iAv+tLFN4tw/USWpUfQ3vVgnud3NIyk4/d6Ld MGfaL/WrEOD+el/WhoaLShiM2z8NJRY4mJEfr000KUR4rgQwnfYBhTc5nd0Ga2Yt rbF57x0sjbEK92nRq1SUoq20l29FN6ldIMcyNPwlxUGFe/0L4Kz6BvA8yl2RD7jJ H1XNmQYO0FRk/Ve1bDtaaGKP46JHsPogXX602a+Kd41L2L21F0k= -----END CERTIFICATE----- Bag Attributes localKeyID: E0 50 59 77 C1 D1 DB 03 A1 E8 BE 11 F6 BF BA D0 4C D9 81 65 friendlyName: Local Signing Authority subject=/CN=Local Signing Authority/CN=1775bc6c-296848f7-b15f6d2c-e83f123d issuer=/CN=Local Signing Authority/CN=1775bc6c-296848f7-b15f6d2c-e83f123d -----BEGIN CERTIFICATE----- MIIDjTCCAnWgAwIBAgIQF3W8bCloSPexX20s6D8SPTANBgkqhkiG9w0BAQsFADBQ MSAwHgYDVQQDDBdMb2NhbCBTaWduaW5nIEF1dGhvcml0eTEsMCoGA1UEAwwjMTc3 NWJjNmMtMjk2ODQ4ZjctYjE1ZjZkMmMtZTgzZjEyM2QwHhcNMTcxMDEyMjAzNzEz WhcNMTgxMDEyMjAzNzEzWjBQMSAwHgYDVQQDDBdMb2NhbCBTaWduaW5nIEF1dGhv cml0eTEsMCoGA1UEAwwjMTc3NWJjNmMtMjk2ODQ4ZjctYjE1ZjZkMmMtZTgzZjEy M2QwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC+Pk2lPC74aNRN2B8l sVVqgEEunDiXHXGeh7b352bflOjcS6WrtO2VoKtg9tpPQUzn+IwDdXiY0H+hhu9E hKIMSpOnXzywHGuGOyQnsjggZWTjv/7JSEfIfid4ZWSCd27urUyFZrk4zzqqHmb0 eGCE4t4m3MdWRO/ByGTANG2bBujwZ3YlLhhwN6nzhtX11uqtOPUAIjBzxr+RBz+/ GqiR7A65B64MHlRiB6W3ZI9VpBRy+B8hhC33E7zhVpys66DOvAQB4N1kfKAjmumC 00NzxsbpkKw2gprafr3QFZe6CqtxoM33/7nKc75/7WRhG6U+b49FMbAzTxYF8qXi sYppAgMBAAGjYzBhMA8GA1UdEwEB/wQFMAMBAQEwHQYDVR0OBBYEFIiQhRDbtwKy 
xAxt5nXKiuulcDfSMB8GA1UdIwQYMBaAFIiQhRDbtwKyxAxt5nXKiuulcDfSMA4G A1UdDwEB/wQEAwIBhjANBgkqhkiG9w0BAQsFAAOCAQEAcQymAtjiQD5ysYJZMxkx kT8D7/ljrdWPQ5ZM5tjemeO53ub6KghyCUD64x77i65ko+7O9uvgRsftWbn7raS9 580ESJroV9DEHgQKuRfyVHcEAd+HAsqYky8+9d9+1JSegfy38OPqrFGmOiHx/JQ1 MA+rDSX6ZoMIW8yB7HSUDtnwNlHNjGWht4ITnTk7YUa+PmnrWM4NKRJPOIYAx0V/ zAP0gObjoLyo1tnxmxY5Ap0fAPMSD675gmWHmlma8vTuoBHK56xZSonTqrfets7a BEW4JwzWkFrPc6lNWqHZl8u6Mk8ZDFQPJyrcE/S+i/08ZiKmU8mfheOPz6S81RQz ag== -----END CERTIFICATE----- -----BEGIN PRIVATE KEY----- MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDX3s+6Kfg134hf g+Q2LBd7+F0tGu35xVYjz6MXUbSsJJKGQNKHX4uymiHkjWNLZdvSZDDeAhuSWKd1 /dl2S0KnRO1RjUjDcLjtYjQRV3UXJ7Zx0VbGhUB1BV7FB7juqNp4bW2+GdFBLeZF bOmYlgJV2ei+PAjUcrmDcWwXDeRiaoKvck9z82scNkoqPpLUQWvpMZ7VFZKFHHuZ U2yMazZ6TRoH9PZwkMRK2oTDpJBz3DBLn3UfjMM6jLwjusMTWW69QeKxTBqppRqc gtl8/MBLJMyXHNgymQ9/jOVEv1uiGAcvVJ4sDAol28uk+k/FgzLxuMwoQ639Pgx4 jpb316iNAgMBAAECggEAIauBHvpY2p5I+QzrVX+/EfkFH3np0GF1NBS4zXRTB96U dBg8Ph0q/uqHisx6xlHW3ZP/A0G05ziotgCoIIlQliJsGaI9zS4RygTdNi879iad kFckf8Zc7WOvCnBP6fmwScRXr6T7PH1aQ03fiYssRhO8958JiYlzYgsY3uetzaOh 9NrNbPgqMIYZXhNuQmxwTSx/5Carbv/JvLUCndr4Mau2HfoIceu05hcsYooFF3UV PS2GpkXA3kVngvBlSZJqOIsjpjYWrXl6wu1OCI8F6AMJ8JtAB1OrOeoLXpESOR7E 0ea45RURedGsf6P41q+mA3H/5hdCG5UZsRtD2VGciQKBgQDsismSRcvh509CZSo5 rZMmFVUcFRFyKzUYuhajnLfAi8YVPkrhsR/DV4FivFcXUIeJiljyKEKaU9SeIvbA /wx2dLQ1QZIydDA+gntRUg8uflRsz7Hid++NNtvAWMtBiDSu2V8q7LqZ0Q9i1Cvy 7R8m5IbwUKpZxBzre6abDr9+owKBgQDpoLVN1xyWdnXiWYU9AVkBN3doCW5fnuUD ffKfF1S+MVjklsVrJp52j/jAagQafBUSfOLXOBfxZzAvBjpCpMMVKQOUdXFbeOvQ HtYSKeZYX4xbFF+S8OpCSCjFvTXZCC6g/yrsnZWx39BolRbgz5AoOUWcIL+kMVXj xm+/sKifDwKBgHXtJ1suQtwH9sLSLr/8oizNW9YZRs5Vbi46sAi3nAB5brKukKR+ Kqi3moDABudPtZLDj16C5dmMy6ZfJgfH3826lxEp9JoExPyVDqfXMkxqnOp2jWer hZkwbVQysHqmTiWRp1l+FfWTfYk24AZHY01/hyqN/K+uDwDzb3dEXgHjAoGBAKWC 2g8MT091GuzBmPfwJXsMLYbB77TEX+BKcQEuSTX4xc4j1jakBG1gb8z5DnEo6NDR Mu9f6O53uRYHZmziRuaNyOB7F1TDZORrhCMYFf0Tq962n0L9dCiC8IeuFSDtgANE 
4scAmRWLxxzgSnX39lvYvyztsncDEKMuaOq3n64XAoGBANQGnBNBpAZ8XR5XHToT rRcSAD4bpCu47bja8DTK9HcPXe1vbofzY0l9caBpWjQXFEjBHKRkQYrlDHwcUfCn WPgU6629buAGKNzNScQw15QaNBa38R/wNZHveabgDMEHEsOHk5Rewxz7NuJh+OFT te/bU7wLMjTwwHGLMaTAO+vj -----END PRIVATE KEY-----
Maybe we could create a pre-upgrade validation that checks the overcloud nodes' connectivity to the undercloud public endpoint when it is SSL enabled?
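As a starting point, here is a rough sketch of the check such a validation could run on each overcloud node. It is only an illustration under assumptions: the function names `split_endpoint` and `check_endpoint` are hypothetical, the endpoint URL would come from the undercloud's `auth_url` (https://192.168.0.2:13000/ in this environment), and the wiring into tripleo-validations is not shown.

```python
import socket
import ssl
from urllib.parse import urlsplit


def split_endpoint(url):
    """Split an endpoint URL into (host, port, uses_tls)."""
    parts = urlsplit(url)
    uses_tls = parts.scheme == "https"
    port = parts.port or (443 if uses_tls else 80)
    return parts.hostname, port, uses_tls


def check_endpoint(url, cafile=None, timeout=5.0):
    """Return (ok, message) after trying to reach the endpoint.

    For https URLs the server certificate is verified against `cafile`
    (or the system trust store when cafile is None), so a node that is
    missing the undercloud CA fails here instead of during the upgrade.
    """
    host, port, uses_tls = split_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if not uses_tls:
                return True, "TCP connection succeeded (endpoint is not SSL enabled)"
            context = ssl.create_default_context(cafile=cafile)
            with context.wrap_socket(sock, server_hostname=host):
                return True, "TLS handshake and certificate verification succeeded"
    except ssl.SSLError as exc:
        # Covers untrusted CA, hostname mismatch and other handshake errors.
        return False, "TLS failure: %s" % exc
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, "connection failure: %s" % exc
```

On a node, `check_endpoint("https://192.168.0.2:13000/")` would be expected to fail with a TLS error until the CA from the CAMap is in the node's trust store.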
There was some discussion about whether this is not reporting an error because of these issues related to puppet apply exit codes:

https://bugs.launchpad.net/tripleo/+bug/1723163
https://review.openstack.org/#/c/511509/
https://etherpad.openstack.org/p/tripleo-puppet-error-recap

It would be good to re-test when these fixes land to confirm that an error is then correctly raised.
Hi, this is a duplicate of bug 1485189, unless I got this wrong, so I'm closing this one and raising awareness of the needed upgrade path in that bz.

*** This bug has been marked as a duplicate of bug 1485189 ***