Bug 1533283 - OSP 11 to OSP 12 Fails with Ceph Error in Mistral Logs - stat: cannot stat '/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok'
Keywords:
Status: CLOSED DUPLICATE of bug 1519842
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: James Slagle
QA Contact: Arik Chernetsky
Reported: 2018-01-10 23:59 UTC by Chris Paquin
Modified: 2018-01-23 17:25 UTC (History)

Last Closed: 2018-01-23 17:25:32 UTC

Description Chris Paquin 2018-01-10 23:59:40 UTC
I ran into the following error while upgrading from OSP 11 to OSP 12. It appears very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1507888; however, I am not running TLS everywhere.


Run upgrade script:

time openstack overcloud deploy --templates --stack chrisp \
     --ntp-server 10.9.71.7 \
     --control-flavor control --control-scale 3 \
     --compute-flavor compute --compute-scale 2 \
     --ceph-storage-flavor ceph-storage --ceph-storage-scale 3 \
     -e templates/network-environment.yaml \
     -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
     -e templates/storage-environment.yaml \
     -e /home/stack/templates/overcloud_images.yaml  \
     -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml



Deployment fails:


AllNodesPostUpgradeSteps-z54w4r3gybsb.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:44Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp-AllNodesPostUpgradeSteps-z54w4r3gybsb]: UPDATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:45Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp.AllNodesPostUpgradeSteps]: UPDATE_FAILED  resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:46Z [chrisp-AllNodesDeploySteps-yvt3kgjzwecp]: UPDATE_FAILED  resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:46Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR
2018-01-10 22:43:47Z [chrisp]: UPDATE_FAILED  resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack chrisp UPDATE_FAILED 

chrisp.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 1cb39f7a-ca7b-4f35-ab53-e16017d93996
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack update failed.
Heat Stack update failed.

Failed Mistral workflow executions:

(undercloud) [stack@chrisp-undercloud ~]$ openstack workflow execution list | grep -i error
| f322eeab-f3b7-4ed0-bd9a-60debb5a93f0 | 226e5aa1-0de1-4563-b9cd-367260b876b3 | tripleo.validations.v1.copy_ssh_key                                    |                                                                                                                                                                                                                                   | <none>                               | ERROR   | Failure caused by error i... | 2018-01-09 21:25:20 | 2018-01-09 21:30:30 |
| db01b2be-7833-467e-a7ac-9d5f7308422f | 93dcee89-3c59-416b-b58a-ca01b9b9a672 | tripleo.validations.v1.run_groups                                      |                                                                                                                                                                                                                                   | <none>                               | ERROR   | None                         | 2018-01-09 21:25:22 | 2018-01-09 21:25:37 |
| 65c807bf-5731-4f7b-8381-fe7d7ecd906d | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 69377751-330f-49ea-8121-844231149e98 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 9565f3a4-ede3-4950-88bf-bb0a2f2ffa09 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| a41aca48-84de-4f5a-8f9e-966b20ed93eb | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| c10d8f45-4017-484f-a733-7c84756fc42d | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:28 |
| d11f89b9-35b2-42e7-8151-b9192156115c | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| e89898e2-1751-424f-9c25-5f64523445bb | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| f36417ba-b58c-4bfd-b545-6eab77045ac5 | c3add141-be70-402f-bfb4-2b79760b7fe9 | tripleo.deployment.v1.deploy_on_server                                 | sub-workflow execution                                                                                                                                                                                                            | 28480dfa-9a6c-4059-bfe8-c87b1baf05cc | ERROR   | Failed to run task [error... | 2018-01-09 21:25:24 | 2018-01-09 21:30:29 |
| 34bfc551-121e-49a5-b735-9b6aa7ee7184 | c23dbaec-e71a-42b8-81f2-e2f3dfb745d1 | tripleo.validations.v1.run_validation                                  | sub-workflow execution                                                                                                                                                                                                            | 9ccacb26-e321-4523-acb5-77e5a994b6bb | ERROR   | None                         | 2018-01-09 21:25:25 | 2018-01-09 21:25:34 |
| 7b32a515-8e40-4f72-a2a9-30450bfb093a | c23dbaec-e71a-42b8-81f2-e2f3dfb745d1 | tripleo.validations.v1.run_validation                                  | sub-workflow execution                                                                                                                                                                                                            | 9ccacb26-e321-4523-acb5-77e5a994b6bb | ERROR   | None                         | 2018-01-09 21
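
A listing like the one above is easier to read when the ERROR rows are grouped by workflow name. The following is a minimal sketch, not part of the original report. It assumes the column order shown in the output above (workflow name in the third column, state in the sixth) and reads the table text on stdin. The function name count_errors_by_workflow is hypothetical.

```shell
# Count ERROR executions per workflow name from the table text on stdin.
# Assumes the column layout of the listing above; with "|" as the field
# separator, the workflow name is field 4 and the state is field 7.
count_errors_by_workflow() {
    awk -F'|' '
        $7 ~ /ERROR/ {
            name = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim column padding
            count[name]++
        }
        END {
            for (name in count) print count[name], name
        }
    ' | sort -k1,1nr -k2
}

# Possible usage on the undercloud:
#   openstack workflow execution list | count_errors_by_workflow
```

The output has one line per workflow, with the highest count first.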

Mistral Logs from Undercloud:

2018-01-10 17:42:20,838 p=24835 u=mistral |  TASK [ceph-mon : wait for monitor socket to exist] *****************************
2018-01-10 17:42:21,384 p=24835 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (5 retries left).
2018-01-10 17:42:36,766 p=24835 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (4 retries left).
2018-01-10 17:42:52,146 p=24835 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (3 retries left).
2018-01-10 17:43:07,520 p=24835 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (2 retries left).
2018-01-10 17:43:22,879 p=24835 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (1 retries left).
2018-01-10 17:43:38,275 p=24835 u=mistral |  fatal: [172.16.0.115]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["docker", "exec", "ceph-mon-chrisp-controller-2", "stat", "/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok"], "delta": "0:00:00.076018", "end": "2018-01-10 22:43:40.387215", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-01-10 22:43:40.311197", "stderr": "stat: cannot stat '/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok': No such file or directory", "stderr_lines": ["stat: cannot stat '/var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok': No such file or directory"], "stdout": "", "stdout_lines": []}
2018-01-10 17:43:38,276 p=24835 u=mistral |  PLAY RECAP *********************************************************************
2018-01-10 17:43:38,276 p=24835 u=mistral |  172.16.0.104               : ok=4    changed=0    unreachable=0    failed=0   
2018-01-10 17:43:38,277 p=24835 u=mistral |  172.16.0.106               : ok=4    changed=0    unreachable=0    failed=0   
2018-01-10 17:43:38,277 p=24835 u=mistral |  172.16.0.107               : ok=4    changed=0    unreachable=0    failed=0   
2018-01-10 17:43:38,277 p=24835 u=mistral |  172.16.0.109               : ok=4    changed=0    unreachable=0    failed=0   
2018-01-10 17:43:38,277 p=24835 u=mistral |  172.16.0.112               : ok=4    changed=0    unreachable=0    failed=0   
2018-01-10 17:43:38,277 p=24835 u=mistral |  172.16.0.115               : ok=43   changed=4    unreachable=0    failed=1   
2018-01-10 17:43:38,277 p=24835 u=mistral |  localhost                  : ok=0    changed=0    unreachable=0    failed=0   
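
The PLAY RECAP above shows which host to look at: only 172.16.0.115 reports failed=1. As a minimal sketch (not from the original report), the same information can be pulled out of a longer log with awk. The function name failed_hosts is hypothetical, and the log text is assumed to arrive on stdin in the format shown above.

```shell
# Print hosts whose PLAY RECAP line reports failed or unreachable tasks.
# Works with or without the "date time p=... u=... |" log prefix, because
# the host is taken as the field immediately before the lone ":" field.
failed_hosts() {
    awk '
        /unreachable=/ && /failed=/ {
            host = ""; failed = 0; unreachable = 0
            for (i = 1; i <= NF; i++) {
                if ($i == ":" && i > 1) {
                    host = $(i - 1)
                } else if ($i ~ /^failed=/) {
                    split($i, kv, "="); failed = kv[2] + 0
                } else if ($i ~ /^unreachable=/) {
                    split($i, kv, "="); unreachable = kv[2] + 0
                }
            }
            if (host != "" && (failed > 0 || unreachable > 0)) print host
        }
    '
}
```

The "fatal:" task line is skipped because it contains neither "unreachable=" nor "failed=".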

I connected over SSH to 172.16.0.115.

The socket file is missing from /var/run/ceph/; however, ceph status looks fine.

(undercloud) [stack@chrisp-undercloud ~]$ ssh heat-admin@172.16.0.115
Last login: Wed Jan 10 22:49:54 2018 from 172.16.0.11
[heat-admin@chrisp-controller-2 ~]$ sudo ceph -s
    cluster 86f961a6-f561-11e7-8e3f-fa163e50c7f3
     health HEALTH_OK
     monmap e1: 3 mons at {chrisp-controller-0=172.16.3.13:6789/0,chrisp-controller-1=172.16.3.20:6789/0,chrisp-controller-2=172.16.3.22:6789/0}
            election epoch 14, quorum 0,1,2 chrisp-controller-0,chrisp-controller-1,chrisp-controller-2
     osdmap e27: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds,recovery_deletes
      pgmap v175: 176 pgs, 8 pools, 0 bytes data, 0 objects
            100 MB used, 284 GB / 284 GB avail
                 176 active+clean

The Docker container for rhceph2 is running:
[root@chrisp-controller-2 mistral]# docker ps
CONTAINER ID        IMAGE                                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
583b7a82c746        registry.access.redhat.com/rhceph/rhceph-2-rhel7:2.4-4          "/entrypoint.sh"         38 minutes ago      Up 38 minutes                           ceph-mon-chrisp-controller-2
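
With the container up, the check that the failing "wait for monitor socket to exist" task performs can be repeated by hand. The following is a minimal sketch of such a retry loop, not the ceph-ansible implementation. The helper name retry_check is hypothetical. The attempt count (5) and the container and socket names come from the failed task in the Mistral log, and the 15-second delay is only an approximation read off the log timestamps.

```shell
# Retry a check command until it succeeds or the attempts run out.
# Usage: retry_check <attempts> <delay_seconds> <command> [args...]
retry_check() {
    rc_attempts=$1
    rc_delay=$2
    shift 2
    rc_i=1
    while [ "$rc_i" -le "$rc_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$rc_i" -lt "$rc_attempts" ]; then
            sleep "$rc_delay"
        fi
        rc_i=$((rc_i + 1))
    done
    return 1
}

# Possible usage on the controller (names taken from the log in this report):
#   retry_check 5 15 docker exec ceph-mon-chrisp-controller-2 \
#       stat /var/run/ceph/ceph-mon.chrisp-controller-2.localdomain.asok
```

The function returns 0 as soon as the command succeeds and 1 if every attempt fails.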

(undercloud) [stack@chrisp-undercloud ~]$ for i in `openstack workflow execution list|awk '/ERROR/ {print $2}'`; do echo $i; for j in $(openstack task execution list $i|awk '/ERROR/ {print $2}'); do openstack task execution result show $j|sed 's/\\n/\n/g'|grep -i error; done; done
f322eeab-f3b7-4ed0-bd9a-60debb5a93f0
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
        "result": "Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function \"#property#deploy_stderr\", data={}]
db01b2be-7833-467e-a7ac-9d5f7308422f
Message: An unhandled exception occurred while running the lookup plugin 'stack_resources'. Error was a <class 'heatclient.exc.HTTPNotFound'>, original message: ERROR: The Stack (overcloud) could not be found.
65c807bf-5731-4f7b-8381-fe7d7ecd906d
69377751-330f-49ea-8121-844231149e98
9565f3a4-ede3-4950-88bf-bb0a2f2ffa09
a41aca48-84de-4f5a-8f9e-966b20ed93eb
c10d8f45-4017-484f-a733-7c84756fc42d
d11f89b9-35b2-42e7-8151-b9192156115c
e89898e2-1751-424f-9c25-5f64523445bb
f36417ba-b58c-4bfd-b545-6eab77045ac5
34bfc551-121e-49a5-b735-9b6aa7ee7184
Message: An unhandled exception occurred while running the lookup plugin 'stack_resources'. Error was a <class 'heatclient.exc.HTTPNotFound'>, original message: ERROR: The Stack (overcloud) could not be found.
7b32a515-8e40-4f72-a2a9-30450bfb093a


There is no asok file in /var/run/ceph at all.

Comment 1 Marius Cornea 2018-01-11 22:25:58 UTC
This error looks very similar to the one reported in bug 1519842. Would it be possible to check whether the workaround mentioned there works in your environment?

Also, could you check whether there is any asok file in /var/run/ceph inside the monitor container, e.g.:

[root@controller-0 ~]# docker exec -it ceph-mon-controller-0 ls /var/run/ceph

Comment 2 Chris Paquin 2018-01-11 22:32:52 UTC
Unfortunately I have deleted this environment due to a lack of testing resources.

Comment 3 Giulio Fidente 2018-01-23 17:25:32 UTC

*** This bug has been marked as a duplicate of bug 1519842 ***

