Bug 1619263 - OSP-13 : Stack creation failed due to ceph-ansible error 'dict object' has no attribute 'slurp_client_keys'
Summary: OSP-13 : Stack creation failed due to ceph-ansible error 'dict object' has no...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.*
Assignee: Sébastien Han
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-20 13:15 UTC by karan singh
Modified: 2022-03-13 16:04 UTC (History)
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 10:27:59 UTC
Embargoed:




Links:
Red Hat Issue Tracker RHCEPH-3730 (last updated 2022-03-13 16:04:24 UTC)

Description karan singh 2018-08-20 13:15:52 UTC
Description of problem:

I am trying to deploy OSP-13 and RHCS 3 in an HCI environment. At the very last stage of deployment, the openstack stack deploy failed.

After troubleshooting, it looks like stack creation failed because of a ceph-ansible error:

2018-08-20 08:29:10,611 p=27101 u=mistral |  TASK [ceph-client : get client cephx keys] *************************************
2018-08-20 08:29:10,611 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.083)       0:13:53.981 *********
2018-08-20 08:29:10,729 p=27101 u=mistral |  fatal: [192.168.120.20]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}
2018-08-20 08:29:10,730 p=27101 u=mistral |  fatal: [192.168.120.13]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}
2018-08-20 08:29:10,732 p=27101 u=mistral |  fatal: [192.168.120.19]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}


More Logs below

openstack stack deploy logs
---------------------------

2018-08-20 12:06:21Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.4]: SIGNAL_IN_PROGRESS  Signal: deployment f376b56b-4a7b-4542-8254-6ec8fbb4b6fa succeeded
2018-08-20 12:06:21Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.2]: SIGNAL_IN_PROGRESS  Signal: deployment 6df7dd1d-ba64-40dc-b4a8-62778b3543b4 succeeded
2018-08-20 12:06:22Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.4]: CREATE_COMPLETE  state changed
2018-08-20 12:06:22Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.2]: CREATE_COMPLETE  state changed
2018-08-20 12:06:25Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.1]: SIGNAL_IN_PROGRESS  Signal: deployment b22c508e-3429-4186-bf5e-749d99e4259d succeeded
2018-08-20 12:06:25Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.1]: CREATE_COMPLETE  state changed
2018-08-20 12:06:30Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.0]: SIGNAL_IN_PROGRESS  Signal: deployment 04197e90-4fbf-4cfc-b1f0-9d89e9c6d290 succeeded
2018-08-20 12:06:31Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.0]: CREATE_COMPLETE  state changed
2018-08-20 12:06:41Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.3]: SIGNAL_IN_PROGRESS  Signal: deployment f11e36f1-781a-4ce1-bf42-b7e7faf32b99 succeeded
2018-08-20 12:06:42Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1.3]: CREATE_COMPLETE  state changed
2018-08-20 12:06:42Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-08-20 12:06:43Z [overcloud.AllNodesDeploySteps.ComputeHCIDeployment_Step1]: CREATE_COMPLETE  state changed
2018-08-20 12:14:05Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0]: SIGNAL_IN_PROGRESS  Signal: deployment c3c0bdd9-d5a9-41eb-b5a2-291ae2a2cab2 succeeded
2018-08-20 12:14:06Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0]: CREATE_COMPLETE  state changed
2018-08-20 12:14:06Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-08-20 12:14:06Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_COMPLETE  state changed
Heat Stack create failed.
Heat Stack create failed.
2018-08-20 12:14:07Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_IN_PROGRESS  state changed
2018-08-20 12:14:08Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
2018-08-20 12:14:09Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
2018-08-20 12:29:17Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-20 12:29:17Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-20 12:29:17Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-20 12:29:18Z [overcloud]: CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 5a971867-190a-4ab4-8eb7-2e0c50d836b9
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
[root@refarch-r220-02 tmp]#



/var/log/mistral/ceph-install-workflow.log
-------------------------------------------

2018-08-20 08:29:10,365 p=27101 u=mistral |  TASK [ceph-client : list existing pool(s)] *************************************
2018-08-20 08:29:10,365 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.169)       0:13:53.735 *********
2018-08-20 08:29:10,444 p=27101 u=mistral |  TASK [ceph-client : create ceph pool(s)] ***************************************
2018-08-20 08:29:10,445 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.079)       0:13:53.814 *********
2018-08-20 08:29:10,527 p=27101 u=mistral |  TASK [ceph-client : kill a dummy container that created pool(s)/key(s)] ********
2018-08-20 08:29:10,528 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.082)       0:13:53.897 *********
2018-08-20 08:29:10,574 p=27101 u=mistral |  skipping: [192.168.120.13] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-08-20 08:29:10,587 p=27101 u=mistral |  skipping: [192.168.120.20] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-08-20 08:29:10,598 p=27101 u=mistral |  skipping: [192.168.120.19] => {"changed": false, "skip_reason": "Conditional result was False"}
2018-08-20 08:29:10,611 p=27101 u=mistral |  TASK [ceph-client : get client cephx keys] *************************************
2018-08-20 08:29:10,611 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.083)       0:13:53.981 *********
2018-08-20 08:29:10,729 p=27101 u=mistral |  fatal: [192.168.120.20]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}
2018-08-20 08:29:10,730 p=27101 u=mistral |  fatal: [192.168.120.13]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}
2018-08-20 08:29:10,732 p=27101 u=mistral |  fatal: [192.168.120.19]: FAILED! => {"msg": "'dict object' has no attribute 'slurp_client_keys'"}
2018-08-20 08:29:10,734 p=27101 u=mistral |  PLAY RECAP *********************************************************************
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.10             : ok=121  changed=20   unreachable=0    failed=0
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.13             : ok=104  changed=13   unreachable=0    failed=1
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.19             : ok=104  changed=13   unreachable=0    failed=1
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.20             : ok=104  changed=13   unreachable=0    failed=1
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.7              : ok=114  changed=13   unreachable=0    failed=1
2018-08-20 08:29:10,734 p=27101 u=mistral |  192.168.120.8              : ok=67   changed=12   unreachable=0    failed=1
2018-08-20 08:29:10,735 p=27101 u=mistral |  INSTALLER STATUS ***************************************************************
2018-08-20 08:29:10,763 p=27101 u=mistral |  Install Ceph Monitor        : Complete (0:01:20)
2018-08-20 08:29:10,764 p=27101 u=mistral |  Install Ceph Manager        : Complete (0:00:29)
2018-08-20 08:29:10,764 p=27101 u=mistral |  Install Ceph OSD            : Complete (0:11:02)
2018-08-20 08:29:10,764 p=27101 u=mistral |  Install Ceph Client         : In Progress (0:00:41)
2018-08-20 08:29:10,764 p=27101 u=mistral |  	This phase can be restarted by running: roles/ceph-client/tasks/main.yml
2018-08-20 08:29:10,764 p=27101 u=mistral |  Monday 20 August 2018  08:29:10 -0400 (0:00:00.153)       0:13:54.134 *********
2018-08-20 08:29:10,764 p=27101 u=mistral |  ===============================================================================

Version-Release number of selected component (if applicable):

[root@refarch-r220-02 tmp]# rpm -qa | egrep -i "openstack|ceph"
openstack-nova-placement-api-17.0.3-0.20180420001141.el7ost.noarch
puppet-openstacklib-12.4.0-0.20180329042555.4b30e6f.el7ost.noarch
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch
openstack-glance-16.0.1-2.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-23.el7ost.noarch
openstack-heat-api-10.0.1-0.20180411125640.el7ost.noarch
python2-openstackclient-3.14.1-1.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch
openstack-mistral-api-6.0.2-1.el7ost.noarch
openstack-zaqar-6.0.1-1.el7ost.noarch
openstack-swift-container-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-ironic-api-10.1.2-4.el7ost.noarch
openstack-tripleo-ui-8.3.1-3.el7ost.noarch
python-openstackclient-lang-3.14.1-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180420001141.el7ost.noarch
openstack-swift-object-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-engine-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-common-10.1.2-4.el7ost.noarch
openstack-mistral-executor-6.0.2-1.el7ost.noarch
puppet-ceph-2.5.0-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
openstack-selinux-0.8.14-12.el7ost.noarch
python2-openstacksdk-0.11.3-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180420001141.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180420001141.el7ost.noarch
openstack-keystone-13.0.1-0.20180420194847.7bd6454.el7ost.noarch
openstack-nova-common-17.0.3-0.20180420001141.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011364.0ec54fd.el7ost.noarch
openstack-heat-api-cfn-10.0.1-0.20180411125640.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch
openstack-mistral-engine-6.0.2-1.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180413042250.2634296.el7ost.noarch
openstack-swift-account-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-tripleo-common-8.6.1-23.el7ost.noarch
openstack-swift-proxy-2.17.1-0.20180314165245.caeeb54.el7ost.noarch
openstack-ironic-conductor-10.1.2-4.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180420001141.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
openstack-heat-common-10.0.1-0.20180411125640.el7ost.noarch
openstack-mistral-common-6.0.2-1.el7ost.noarch
[root@refarch-r220-02 tmp]#

How reproducible:

100% of the time

Steps to Reproduce:
1. 
2.
3.

Actual results:

Stack creation failed at the very last stage of deployment.

Expected results:

HCI stack creation should be successful.


Additional info:

Comment 1 John Fulton 2018-08-23 20:02:40 UTC
I reproduced this using ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch and rhceph/rhceph-3-rhel7:3-12 (fa3b551f0952). Full ceph-ansible output:

 http://sprunge.us/oawDbF

So why does it fail on this task:

https://github.com/ceph/ceph-ansible/blob/v3.1.0rc21/roles/ceph-client/tasks/create_users_keys.yml#L62-L72
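
For context, the failing pattern in that tasks file looks roughly like the following. This is a paraphrased YAML sketch reconstructed from the task titles in the log and the error message, not the exact upstream code; the path, delegate group, and conditions are placeholders:

# Paraphrased sketch of roles/ceph-client/tasks/create_users_keys.yml
# (task names taken from the log; details are illustrative)
- name: slurp client cephx key(s)
  slurp:
    src: "/etc/ceph/{{ item.name }}.keyring"    # path is illustrative
  with_items: "{{ keys }}"
  register: slurp_client_keys                   # only registered on the host(s)
  delegate_to: "{{ groups['mons'][0] }}"        # where this actually ran

- name: get client cephx keys
  set_fact:
    client_keys: "{{ hostvars[groups['mons'][0]]['slurp_client_keys'] }}"
  # If the slurp task was skipped, or the host it delegates to failed earlier,
  # hostvars for that host has no 'slurp_client_keys' entry, and this fails with:
  #   "'dict object' has no attribute 'slurp_client_keys'"

So the attribute error is a symptom: the real question is why the earlier slurp task, or the host it runs on, did not complete.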

I observed during the deployment that all 60 OSDs were created, but none of them were brought up or marked in [1]. On investigating, I saw the OSDs were flapping.

Re-running ceph-ansible fails because of missing containers [2] before it can even reach the "slurp client cephx key(s)" task.

Maybe the root cause is that the OSD containers never properly came up?

[1]
[root@controller-0 ~]# ceph -s
  cluster:
    id:     4ad54812-a703-11e8-916e-2047478ccfaa
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 60 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     
 
[root@controller-0 ~]# 


[2]

TASK [ceph-docker-common : inspect ceph osd container] ***************************************************
Thursday 23 August 2018  15:46:42 -0400 (0:00:00.160)       0:01:53.832 ******* 
fatal: [192.168.120.17]: FAILED! => {"changed": false, "cmd": ["docker", "inspect", "c7fcbec7e947", "97f2c
38eeabc", "e656eb8666b7"], "delta": "0:00:00.036323", "end": "2018-08-23 19:46:43.454781", "msg": "non-zer
o return code", "rc": 1, "start": "2018-08-23 19:46:43.418458", "stderr": "Error: No such object: c7fcbec7
e947", "stderr_lines": ["Error: No such object: c7fcbec7e947"], "stdout": "[]", "stdout_lines": ["[]"]}
...
fatal: [192.168.120.6]: FAILED! => {"changed": false, "cmd": ["docker", "inspect", "eab7f7f02601", "76089d
529f46", "d19b177dd469"], "delta": "0:00:00.036426", "end": "2018-08-23 19:46:43.693686", "msg": "non-zero return code", "rc": 1, "start": "2018-08-23 19:46:43.657260", "stderr": "Error: No such object: eab7f7f02
601", "stderr_lines": ["Error: No such object: eab7f7f02601"], "stdout": "[]", "stdout_lines": ["[]"]}

PLAY RECAP ***********************************************************************************************
192.168.120.11             : ok=112  changed=12   unreachable=0    failed=0   
192.168.120.15             : ok=33   changed=0    unreachable=0    failed=1   
192.168.120.16             : ok=33   changed=0    unreachable=0    failed=1   
192.168.120.17             : ok=36   changed=0    unreachable=0    failed=1   
192.168.120.6              : ok=33   changed=0    unreachable=0    failed=1   
192.168.120.9              : ok=33   changed=0    unreachable=0    failed=1

Comment 5 Sébastien Han 2018-08-24 11:08:00 UTC
The error:

2018-08-24 11:05:36.950684 7fcca1e67d80 -1 unable to find any IP address in networks '172.17.4.0/24' interfaces ''

Wrong config in ceph.conf: the network does not exist on the box.

[root@osd-compute-4 ~]# ip a | grep 172.17.4
[root@osd-compute-4 ~]# ip a | grep 172.17.3
    inet 172.17.3.225/24 brd 172.17.3.255 scope global vlan170

Only the 172.17.3 network is present.
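
For illustration, the corresponding ceph-ansible network settings would look something like this. These are hypothetical group_vars values inferred from the error message (in this deployment they are generated by TripleO), assuming 172.17.4.0/24 is the StorageMgmt/cluster network and 172.17.3.0/24 the Storage/public network:

# Hypothetical ceph-ansible group_vars matching the error above
public_network: "172.17.3.0/24"   # present on the box (vlan170)
cluster_network: "172.17.4.0/24"  # no interface on the OSD host carries an
                                  # address in this range, so the daemon
                                  # cannot find an IP to bind to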

Closing this, feel free to re-open if you have any more concerns.

Comment 6 John Fulton 2018-08-24 14:24:36 UTC
Root cause: network misconfiguration; the OSD nodes were not connected to the StorageMgmt network.

The network-environment.yaml file was updated with something like the following to fix it:

 right:  OS::TripleO::ComputeHCI::Ports::StorageMgmtPort: network/ports/storage_mgmt_from_pool.yaml
 wrong:  OS::TripleO::ComputeHCI::Ports::StorageMgmtPort: network/ports/noop.yaml


Full diff below

(undercloud) [stack@refarch-r220-02 templates]$ diff -u network-environment.yaml~ network-environment.yaml
--- network-environment.yaml~   2018-08-20 06:22:41.350436109 -0400
+++ network-environment.yaml    2018-08-24 08:39:46.249534353 -0400
@@ -5,13 +5,13 @@
   OS::TripleO::Controller::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/external_from_pool.yaml
   OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
   OS::TripleO::Controller::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
-  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
+  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
   OS::TripleO::Controller::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant_from_pool.yaml
 
   OS::TripleO::ComputeHCI::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
   OS::TripleO::ComputeHCI::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
   OS::TripleO::ComputeHCI::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
-  OS::TripleO::ComputeHCI::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
+  OS::TripleO::ComputeHCI::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
   OS::TripleO::ComputeHCI::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant_from_pool.yaml
 
 parameter_defaults:
(undercloud) [stack@refarch-r220-02 templates]$

Comment 7 karan singh 2018-08-29 10:27:13 UTC
John / Seb: Thanks for your help so far on this BZ, but unfortunately the overcloud deployment never completed successfully.

I am still getting errors related to the ceph_install tasks (however, there are no errors in the ceph mistral logs, and ceph -s is fine too). I have created a new BZ for that: 1623417.

Comment 8 karan singh 2018-08-29 10:27:59 UTC
should we clo

