I have run into the following error when trying to perform an update to OSP 12. It appears very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1507888; however, I am not running TLS everywhere.

Run upgrade script:

time openstack overcloud deploy --templates --stack chrisp \
  --ntp-server 10.9.71.7 \
  --control-flavor control --control-scale 3 \
  --compute-flavor compute --compute-scale 2 \
  --ceph-storage-flavor ceph-storage --ceph-storage-scale 3 \
  -e templates/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e templates/storage-environment.yaml \
  -e /home/stack/templates/overcloud_images.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml

Deployment fails:

AllNodesPostUpgradeSteps-z54w4r3gybsb.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:44Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp-AllNodesPostUpgradeSteps-z54w4r3gybsb]: UPDATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:45Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp.AllNodesPostUpgradeSteps]: UPDATE_FAILED resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:46Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp]: UPDATE_FAILED resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:46Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:47Z [chrisp]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR

Stack chrisp UPDATE_FAILED

chrisp.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 1cb39f7a-ca7b-4f35-ab53-e16017d93996
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack update failed.
Heat Stack update failed.

Failed Mistral Workloads

(undercloud) [stack@chrisp-undercloud ~]$ openstack workflow execution list | grep -i error
| f322eeab-f3b7-4ed0-bd9a-60debb5a93f0 | 226e5aa1-0de1-4563-b9cd-367260b876b3 | tripleo.validations.v1.copy_ssh_key | | <none> | ERROR | Failure caused by error i... | 2018-01-09 21:25:20 | 2018-01-09 21:30:30 |
| db01b2be-7833-467e-a7ac-9d5f7308422f | 93dcee89-3c59-416b-b58a-ca01b9b9a672 | tripleo.validations.v1.run_groups | | <none> | ERROR | None | 2018-01-09 21:25:22 | 2018-01-09 21:25:37 |
| 65c807bf-5731-4f7b-8381-fe7d7ecd906d | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 69377751-330f-49ea-8121-844231149e98 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 9565f3a4-ede3-4950-88bf-bb0a2f2ffa09 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| a41aca48-84de-4f5a-8f9e-966b20ed93eb | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| c10d8f45-4017-484f-a733-7c84756fc42d | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| d11f89b9-35b2-42e7-8151-b9192156115c | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| e89898e2-1751-424f-9c25-5f64523445bb | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| f36417ba-b58c-4bfd-b545-6eab77045ac5 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server | sub-workflow execution | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 34bfc551-121e-49a5-b735-9b6aa7ee7184 | c23dbaec-e71a-42b8-81f2-e2f3dfb745d1 | tripleo.validations.v1.run_validation | sub-workflow execution | 9ccacb26-e321-4523-acb5-77e5a994b6bb | ERROR | None | 2018-01-09 21:25:25 | 2018-01-09 21:25:34 |
| 7b32a515-8e40-4f72-a2a9-30450bfb093a | c23dbaec-e71a-42b8-81f2-e2f3dfb745d1 | tripleo.validations.v1.run_validation | sub-workflow execution | 9ccacb26-e321-4523-acb5-77e5a994b6bb | ERROR | None | 2018-01-09 21

Mistral Logs from Undercloud:

2018-01-10 17:42:20,838 p=24835 u=mistral | TASK [ceph-mon : wait for monitor socket to exist] *****************************
2018-01-10 17:42:21,384 p=24835 u=mistral | FAILED - RETRYING: wait for monitor socket to exist (5 retries left).
2018-01-10 17:42:36,766 p=24835 u=mistral | FAILED - RETRYING: wait for monitor socket to exist (4 retries left).
2018-01-10 17:42:52,146 p=24835 u=mistral | FAILED - RETRYING: wait for monitor socket to exist (3 retries left).
2018-01-10 17:43:07,520 p=24835 u=mistral | FAILED - RETRYING: wait for monitor socket to exist (2 retries left).
2018-01-10 17:43:22,879 p=24835 u=mistral | FAILED - RETRYING: wait for monitor socket to exist (1 retries left).
2018-01-10 17:43:38,275 p=24835 u=mistral | fatal: [172.16.0.115]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["docker", "exec", "ceph-mon-chrisp-controller-2", "stat", "/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok"], "delta": "0:00:00.076018", "end": "2018-01-10 22:43:40.387215", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-01-10 22:43:40.311197", "stderr": "stat: cannot stat '/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok': No such file or directory", "stderr_lines": ["stat: cannot stat '/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok': No such file or directory"], "stdout": "", "stdout_lines": []}
2018-01-10 17:43:38,276 p=24835 u=mistral | PLAY RECAP *********************************************************************
2018-01-10 17:43:38,276 p=24835 u=mistral | 172.16.0.104 : ok=4 changed=0 unreachable=0 failed=0
2018-01-10 17:43:38,277 p=24835 u=mistral | 172.16.0.106 : ok=4 changed=0 unreachable=0 failed=0
2018-01-10 17:43:38,277 p=24835 u=mistral | 172.16.0.107 : ok=4 changed=0 unreachable=0 failed=0
2018-01-10 17:43:38,277 p=24835 u=mistral | 172.16.0.109 : ok=4 changed=0 unreachable=0 failed=0
2018-01-10 17:43:38,277 p=24835 u=mistral | 172.16.0.112 : ok=4 changed=0 unreachable=0 failed=0
2018-01-10 17:43:38,277 p=24835 u=mistral | 172.16.0.115 : ok=43 changed=4 unreachable=0 failed=1
2018-01-10 17:43:38,277 p=24835 u=mistral | localhost : ok=0 changed=0 unreachable=0 failed=0

SSH to 172.16.0.115. Missing socket file in /var/run/ceph/, however ceph status looks fine.

(undercloud) [stack@chrisp-undercloud ~]$ ssh heat-admin@172.16.0.115
Last login: Wed Jan 10 22:49:54 2018 from 172.16.0.11
[heat-admin@chrisp-controller-2 ~]$ sudo ceph -s
    cluster 86f961a6-f561-11e7-8e3f-fa163e50c7f3
     health HEALTH_OK
     monmap e1: 3 mons at {chrisp-controller-0=172.16.3.13:6789/0,chrisp-controller-1=172.16.3.20:6789/0,chrisp-controller-2=172.16.3.22:6789/0}
            election epoch 14, quorum 0,1,2 chrisp-controller-0,chrisp-controller-1,chrisp-controller-2
     osdmap e27: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v175: 176 pgs, 8 pools, 0 bytes data, 0 objects
            100 MB used, 284 GB / 284 GB avail
                 176 active+clean

Docker container for rhceph2 is running:

[root@chrisp-controller-2 mistral]# docker ps
CONTAINER ID   IMAGE                                                    COMMAND            CREATED          STATUS          PORTS   NAMES
583b7a82c746   registry.access.redhat.com/rhceph/rhceph-2-rhel7:2.4-4   "/entrypoint.sh"   38 minutes ago   Up 38 minutes           ceph-mon-chrisp-controller-2

(undercloud) [stack@chrisp-undercloud ~]$ for i in `openstack workflow execution list|awk '/ERROR/ {print $2}'`; do echo $i; for j in $(openstack task execution list $i|awk '/ERROR/ {print $2}'); do openstack task execution result show $j|sed 's/\\n/\n/g'|grep -i error; done; done
f322eeab-f3b7-4ed0-bd9a-60debb5a93f0
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
"result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
db01b2be-7833-467e-a7ac-9d5f7308422f
Message: An unhandled exception occurred while running the lookup plugin 'stack_resources'. Error was a <class 'heatclient.exc.HTTPNotFound'>, original message: ERROR: The Stack (overcloud) could not be found.
65c807bf-5731-4f7b-8381-fe7d7ecd906d
69377751-330f-49ea-8121-844231149e98
9565f3a4-ede3-4950-88bf-bb0a2f2ffa09
a41aca48-84de-4f5a-8f9e-966b20ed93eb
c10d8f45-4017-484f-a733-7c84756fc42d
d11f89b9-35b2-42e7-8151-b9192156115c
e89898e2-1751-424f-9c25-5f64523445bb
f36417ba-b58c-4bfd-b545-6eab77045ac5
34bfc551-121e-49a5-b735-9b6aa7ee7184
Message: An unhandled exception occurred while running the lookup plugin 'stack_resources'. Error was a <class 'heatclient.exc.HTTPNotFound'>, original message: ERROR: The Stack (overcloud) could not be found.
7b32a515-8e40-4f72-a2a9-30450bfb093a

There is no asok file in /var/run/ceph at all.
This error looks very similar to the one reported in bug 1519842. Would it be possible to check whether the workaround mentioned there works in your environment? Also, could you check whether there is any asok file in /var/run/ceph inside the monitor container, e.g.:

[root@controller-0 ~]# docker exec -it ceph-mon-controller-0 ls /var/run/ceph
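
If it helps, the same check could be scripted on each controller. The following is only a minimal sketch, not something taken from the deployment tooling: check_mon_asok is a hypothetical helper name, and the container name (ceph-mon-<short hostname>) and socket path (/var/run/ceph/ceph-mon.<fqdn>.asok) are assumed from the failed Ansible task quoted in this report.

```shell
#!/bin/sh
# Sketch only: reports whether the ceph-mon admin socket exists inside the
# monitor container. check_mon_asok is a hypothetical helper name.
# Assumed naming, taken from the failed Ansible task in this report:
#   container: ceph-mon-<short hostname>
#   socket:    /var/run/ceph/ceph-mon.<fqdn>.asok
check_mon_asok() {
    short_host="$1"   # e.g. chrisp-controller-2
    fqdn="$2"         # e.g. chrisp-controller-2.localdomain
    container="ceph-mon-${short_host}"
    sock="/var/run/ceph/ceph-mon.${fqdn}.asok"

    # Same probe that the "wait for monitor socket to exist" task runs.
    if docker exec "$container" stat "$sock" >/dev/null 2>&1; then
        echo "present: $sock"
    else
        echo "missing: $sock"
        # List the directory to help spot a socket under a different name.
        docker exec "$container" ls -l /var/run/ceph 2>&1
        return 1
    fi
}
```

On the affected node this would be called as `check_mon_asok chrisp-controller-2 chrisp-controller-2.localdomain`, from a shell with enough privileges to run docker. A socket listed under a different name than the one the task waits for would be worth noting here.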
Unfortunately I have deleted this environment due to a lack of testing resources.
*** This bug has been marked as a duplicate of bug 1519842 ***