Bug 1486916 - OSP12 w/ceph: WorkflowTasks_Step2_Execution went to CREATE_FAILED
Status: CLOSED DUPLICATE of bug 1485189
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
Docs Contact: Derek
 
Reported: 2017-08-30 19:07 UTC by Michele Baldessari
Modified: 2017-08-31 15:26 UTC (History)

Last Closed: 2017-08-31 15:26:53 UTC

Description Michele Baldessari 2017-08-30 19:07:49 UTC
In a containerized OSP12 environment with 3 Ceph nodes, deployed with:
(undercloud) [stack@undercloud-0 ~]$ more overcloud_deploy.sh
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/rhos12.yaml \
--log-file overcloud_deployment_0.log
 
Heat errors with:
| AllNodesDeploySteps | 6862b171-251e-477e-90e5-ccf6118b17d3 | OS::TripleO::PostDeploySteps | CREATE_FAILED   | 2017-08-30T17:23:30Z | overcloud  |
| WorkflowTasks_Step2_Execution             | b86cde36-f3e4-4733-9b10-6aeb29b6009e | OS::Mistral::ExternalResource  | CREATE_FAILED   | 2017-08-30T17:34:11Z | overcloud-AllNodesDeploySteps-yqgleyshtdkp|
                                         
    [wf_ex_id=c4dc8a30-a78f-40ba-9fe3-2f3cb429141d, idx=4]: Failed to run task [error=Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function "#property#deploy_stderr", data={}], wf=tripleo.deployment.v1.deploy_on_server, task=send_message]:
Traceback (most recent call last):       
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 62, in run_task
    task.run()                           
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 306, in run
    self._run_new()                      
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)            
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 332, in _run_new
    self._schedule_actions()             
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 390, in _schedule_actions
    input_dict = self._get_action_input()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
    return f(*args, **kwargs)            
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 429, in _get_action_input
    ctx_view                             
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context) 
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)     
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)     
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context) 
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 89, in _evaluate_item
    return evaluate_recursively(item, context)     
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context) 
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 79, in _evaluate_item
    return evaluate(item, context)       
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 71, in evaluate
    return evaluator.evaluate(expression, context) 
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(deploy_config).result.deploy_stderr, error=Unknown function "#property#deploy_stderr", data={}]
                                         
Note that no deployment failed on the overcloud nodes themselves (the following check prints nothing):
(undercloud) [stack@undercloud-0 ~]$ for i in 17 14 19 9 15; do ssh 192.168.24.$i "sudo grep -ir status_code /var/lib/heat-config/deployed"; done |grep -v -e 0$ -e 0,$
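
The filter at the end of that pipeline can be sketched in Python for anyone who wants to run the same check on collected output. This is only an illustration: the function name `failed_status_lines` and the sample lines are hypothetical, and the exact layout of the files under /var/lib/heat-config/deployed is assumed, not taken from this report.

```python
import re


def failed_status_lines(grep_output):
    """Keep only lines that do NOT end in "0" or "0,".

    Mirrors `grep -v -e 0$ -e 0,$` from the command above.
    Empty lines are dropped as well, for convenience.
    """
    ends_in_zero = re.compile(r"0,?$")
    return [
        line
        for line in grep_output.splitlines()
        if line and not ends_in_zero.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical sample of what the recursive grep might print.
    sample = (
        '"deploy_status_code": 0,\n'
        '"deploy_status_code": 2,\n'
        '"deploy_status_code": 0\n'
    )
    for line in failed_status_lines(sample):
        print(line)
```

Like the shell version, this filter also hides any status code that merely ends in 0 (for example 10), so an empty result is a strong hint rather than proof that every deployment returned 0.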

Comment 2 Alexander Chuzhoy 2017-08-30 21:47:04 UTC
Reproducing.
Environment:
puppet-ceph-2.4.0-0.20170816093514.3cc04b6.el7ost.noarch
instack-undercloud-7.2.1-0.20170821194210.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170712001959.0333c73.el7ost.noarch
ceph-ansible-3.0.0-0.1.rc3.el7cp.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170821194253.el7ost.noarch

Comment 3 Alexander Chuzhoy 2017-08-30 22:37:13 UTC
Was able to deploy successfully without ceph.

Comment 6 Giulio Fidente 2017-08-31 10:47:28 UTC
(In reply to Michele Baldessari from comment #5)
> Env with the problem (sorry forgot it last night):
> ssh -l root sealusa9.mobius.lab.eng.rdu2.redhat.com (qum10net)
> ssh undercloud-0

Thanks Michele. The root problem seems to be a timeout during the create_admin step from the enable_ssh_admin workbook (workbooks/access.yaml).

The ceph-ansible workbook calls enable_ssh_admin to enable SSH access on the nodes.

The error from mistral engine.log is:

create_admin_via_nova [task_ex_id=79ff5171-e9fb-4e8e-a859-da156d2e6c01] -> Failure caused by error in tasks: create_admin

Later seen also as:

Task 'deploy_config' (6132ba20-2716-44c7-bcd1-86017c00dda8) [RUNNING -> ERROR, msg=Timeout for heat deployment 'create_admin'] (execution_id=1f73711d-0655-40e4-8255-51ac803b0546)

Finally, the send_message task (which runs on completion, not only on success) fails because the deploy_config task timed out and therefore produced no deploy_stderr. In the journal on overcloud-cephstorage-1, the Ansible tasks described in enable_ssh_admin run successfully, but the node is then unable to signal completion back to Heat because the SSL certificate cannot be verified.

The signal URL is generated in tripleo-common/actions/deployment.py, so I think this is, in the end, just a duplicate of BZ #1485189.

Adding Jirka on CC to the BZ in case amendments are needed.
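
The secondary send_message failure described in this comment can be modelled in plain Python. This is an analogy only, not Mistral or YAQL code: the function names are hypothetical, and whether the timed-out task's result was null or an empty mapping is not stated in this bug, so the tolerant variant handles both.

```python
def build_message_strict(task_result):
    """Analogue of `task(deploy_config).result.deploy_stderr`.

    Raises if the result carries no deploy_stderr, which is what
    masks the original timeout with a second error.
    """
    return {"stderr": task_result["deploy_stderr"]}


def build_message_tolerant(task_result):
    """Fall back to an empty string when the result is missing or partial."""
    result = task_result or {}
    return {"stderr": result.get("deploy_stderr", "")}


if __name__ == "__main__":
    timed_out_result = {}  # assumption: nothing was published before the timeout
    print(build_message_tolerant(timed_out_result))
    try:
        build_message_strict(timed_out_result)
    except KeyError as exc:
        print("strict lookup failed:", exc)
```

The point of the sketch is that a reporting step which runs on every completion has to tolerate an absent result; otherwise its own error is what surfaces in Heat instead of the underlying timeout.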

Comment 7 Jiri Stransky 2017-08-31 12:49:27 UTC
I logged into the environment and I agree that this looks like a duplicate of bug 1485189. Snippet from controller-0 logs:

srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:20,632] (heat-config) [INFO] Return code 0
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:20,632] (heat-config) [INFO]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: PLAY [localhost] ***************************************************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [Gathering Facts] *********************************************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: ok: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [create user tripleo-admin] ***********************************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: changed: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [grant admin rights to user tripleo-admin] ********************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: changed: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [ensure .ssh dir exists for user tripleo-admin] ***************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: changed: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [ensure authorized_keys file exists for user tripleo-admin] ***************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: changed: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: TASK [authorize TripleO Mistral key for user tripleo-admin] ********************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: changed: [localhost]
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: PLAY RECAP *********************************************************************
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: localhost                  : ok=6    changed=5    unreachable=0    failed=0
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:20,632] (heat-config) [INFO] Completed /var/lib/heat-config/heat-config-ansible/e6eb930e-ddfc-46b6-af62-b16d95ba77e2_playbook.yaml
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:20,637] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/ansible
srp 30 17:39:20 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:20,637] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/e6eb930e-ddfc-46b6-af62-b16d95ba77e2.json < /var/lib/heat-config/deployed/e6eb930e-ddfc-46b6-af62-b16d95ba77e2.notify.json
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:21,259] (heat-config) [INFO]
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:21,259] (heat-config) [ERROR] Error running heat-config-notify. [1]
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: [2017-08-30 17:39:21,259] (heat-config) [ERROR] [2017-08-30 17:39:21,213] (heat-config-notify) [DEBUG] Signaling to https://192.168.24.2:13808/v1/AUTH_ca2606af7a76428481381fa61d412ff4/create_admin-11ca0855-8027-4b75-8f4c-bbe4590f6cb4/89fe7599-d5c2-4179-8fa9-3746b7f20d92?temp_url_sig=5809968ba493c10bfd8f6ada9a891cc19b55d46b&temp_url_expires=1504132725 via PUT
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: Traceback (most recent call last):
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/bin/heat-config-notify", line 163, in <module>
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: sys.exit(main(sys.argv, sys.stdin))
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/bin/heat-config-notify", line 110, in main
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: headers={'content-type': 'application/json'})
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/lib/python2.7/site-packages/requests/api.py", line 123, in put
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: return request('put', url, data=data, **kwargs)
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/lib/python2.7/site-packages/requests/api.py", line 56, in request
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: return session.request(method=method, url=url, **kwargs)
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: resp = self.send(prep, **send_kwargs)
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 596, in send
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: r = adapter.send(request, **kwargs)
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: raise SSLError(e, request=request)
srp 30 17:39:21 overcloud-controller-0 os-collect-config[3078]: requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
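
To confirm from a node whether the signal endpoint's certificate verifies against a given trust store, a check along the following lines could be used. It is a minimal sketch using only the Python 3 standard library; the helper names `endpoint_from_url` and `certificate_verifies` are hypothetical, and it is not how heat-config-notify itself performs the request.

```python
import socket
import ssl
from urllib.parse import urlsplit


def endpoint_from_url(url):
    """Extract (host, port) from a URL, defaulting to 443 for https."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def certificate_verifies(url, cafile=None, timeout=5.0):
    """Return True if the TLS handshake verifies, False on a verification error.

    cafile=None uses the system default trust store. Connection errors
    (refused, timed out) are deliberately not caught here.
    """
    host, port = endpoint_from_url(url)
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False


if __name__ == "__main__":
    # Host and port taken from the signal URL in the log above.
    print(certificate_verifies("https://192.168.24.2:13808/"))
```

A False result from the overcloud node would be consistent with the CERTIFICATE_VERIFY_FAILED error above, i.e. the node does not trust the CA that signed the undercloud's public endpoint certificate.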

Comment 8 Giulio Fidente 2017-08-31 15:26:53 UTC

*** This bug has been marked as a duplicate of bug 1485189 ***

