Bug 1479514 - rhosp-director: OSP12 deployment with ceph fails. OS::Mistral::ExternalResource WorkflowTasks_Step2_Execution: ERROR
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Giulio Fidente
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-08 17:01 UTC by Alexander Chuzhoy
Modified: 2018-02-05 19:10 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170805163046.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 21:49:38 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1709464 | 0 | None | None | None | 2017-08-08 18:57:17 UTC
OpenStack gerrit | 491886 | 0 | None | MERGED | Fix cidr get_attr in custom networks | 2021-01-06 02:21:08 UTC
Red Hat Product Errata | RHEA-2017:3462 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-16 01:43:25 UTC

Description Alexander Chuzhoy 2017-08-08 17:01:47 UTC
rhosp-director: OSP12 deployment with ceph fails. OS::Mistral::ExternalResource   WorkflowTasks_Step2_Execution: ERROR

Environment:
python-cephfs-10.2.7-28.el7cp.x86_64
ceph-mon-10.2.7-28.el7cp.x86_64
ceph-mds-10.2.7-28.el7cp.x86_64
libcephfs1-10.2.7-28.el7cp.x86_64
puppet-ceph-2.3.1-0.20170805094345.868e6d6.el7ost.noarch
ceph-common-10.2.7-28.el7cp.x86_64
ceph-osd-10.2.7-28.el7cp.x86_64
ceph-selinux-10.2.7-28.el7cp.x86_64
ceph-base-10.2.7-28.el7cp.x86_64
ceph-radosgw-10.2.7-28.el7cp.x86_64
openstack-tripleo-heat-templates-7.0.0-0.20170805163045.el7ost.noarch
instack-undercloud-7.2.1-0.20170729010705.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch


Steps to reproduce:
Attempt to deploy overcloud with ceph:


Deployment command:
openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/workaround_params.yaml \
-e /home/stack/virt/docker-images.yaml \
--log-file overcloud_deployment_74.log



(undercloud) [stack@undercloud-0 ~]$ cat virt/internal.yaml 
parameter_defaults:
    CinderEnableIscsiBackend: false
    CinderEnableRbdBackend: true
    CinderEnableNfsBackend: false
    NovaEnableRbdBackend: true
    GlanceBackend: rbd
    CinderRbdPoolName: "volumes"
    NovaRbdPoolName: "vms"
    GlanceRbdPoolName: "images"
    ExtraConfig:
      ceph::profile::params::osds:
       '/dev/vdb':
           journal: ''



Result:

The deployment fails with:
2017-08-08 16:30:19Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2017-08-08 16:30:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-08-08 16:30:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-08-08 16:30:20Z [overcloud]: CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED 

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 54b4ff63-c7f0-4328-ae22-b8d128a9350b
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack create failed.
Heat Stack create failed.
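
The nested CREATE_FAILED events above all point back to the same innermost resource. As a rough illustration, the Python sketch below picks the most deeply nested failing resource out of event lines in this format. The regular expression and the function name deepest_failure are assumptions made for this example; they are not part of any OpenStack client.

```python
import re

# Matches Heat event lines of the form shown in this report, e.g.
# 2017-08-08 16:30:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  reason...
EVENT_RE = re.compile(
    r"^(?P<time>\S+ \S+) \[(?P<resource>[^\]]+)\]: (?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def deepest_failure(log_text):
    """Return (resource_path, reason) for the most deeply nested *_FAILED event."""
    failures = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            failures.append((match.group("resource"), match.group("reason")))
    if not failures:
        return None
    # More dots in the resource path means a more deeply nested stack resource.
    return max(failures, key=lambda item: item[0].count("."))


# Event lines copied from the deployment output in this report.
SAMPLE = """\
2017-08-08 16:30:19Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2017-08-08 16:30:20Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-08-08 16:30:20Z [overcloud]: CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
"""

if __name__ == "__main__":
    print(deepest_failure(SAMPLE))
```

For this output the sketch returns the WorkflowTasks_Step2_Execution resource. Its status_reason is only "ERROR", so the underlying cause has to be found elsewhere, for example in the Mistral workflow logs.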



(undercloud) [stack@undercloud-0 ~]$ heat resource-list -n5 overcloud|grep -v COMPLE

+----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| resource_name                                | physical_resource_id                                                                                                                                                                 | resource_type                                                                                                                    | resource_status | updated_time         | stack_name                                                                                                                                               |
+----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| AllNodesDeploySteps                          | 2bc918bb-85c8-4baa-8abc-d27aea12773e                                                                                                                                                 | OS::TripleO::PostDeploySteps                                                                                                     | CREATE_FAILED   | 2017-08-08T16:07:03Z | overcloud                                                                                                                                                |
| WorkflowTasks_Step2_Execution                | 54b4ff63-c7f0-4328-ae22-b8d128a9350b                                                                                                                                                 | OS::Mistral::ExternalResource                                                                                                    | CREATE_FAILED   | 2017-08-08T16:20:29Z | overcloud-AllNodesDeploySteps-y2vh76uaidg6                                                                                                               |
+----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
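
The grep -v filter above can also be done on parsed rows, which avoids dealing with the very wide columns. The following Python sketch assumes the table text has already been captured into a string; parse_table and not_complete are hypothetical helper names. The sample table is narrowed to three columns for readability, and its last row is a hypothetical CREATE_COMPLETE entry added only to show the filter working.

```python
def parse_table(text):
    """Parse a '|'-delimited CLI table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def not_complete(rows):
    """Keep only the rows whose status is not a *_COMPLETE state."""
    return [row for row in rows if "COMPLETE" not in row["resource_status"]]


# First two rows are taken from this report (columns trimmed);
# the third row is hypothetical and exists only to exercise the filter.
SAMPLE_TABLE = """\
+-------------------------------+-------------------------------+-----------------+
| resource_name                 | resource_type                 | resource_status |
+-------------------------------+-------------------------------+-----------------+
| AllNodesDeploySteps           | OS::TripleO::PostDeploySteps  | CREATE_FAILED   |
| WorkflowTasks_Step2_Execution | OS::Mistral::ExternalResource | CREATE_FAILED   |
| HypotheticalCompleteResource  | OS::Heat::None                | CREATE_COMPLETE |
+-------------------------------+-------------------------------+-----------------+
"""

if __name__ == "__main__":
    for row in not_complete(parse_table(SAMPLE_TABLE)):
        print(row["resource_name"], row["resource_type"], row["resource_status"])
```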



Repeating errors for docker systemd unit on controller:
Aug 08 16:53:35 controller-1 dockerd-current[19397]: time="2017-08-08T12:53:35.844098021-04:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-mon-controller-1 returned error: No such container: ceph-mon-controller-1"
Aug 08 16:53:35 controller-1 dockerd-current[19397]: time="2017-08-08T12:53:35.844126280-04:00" level=error msg="Handler for DELETE /v1.24/containers/ceph-mon-controller-1 returned error: No such container: ceph-mon-controller-1"
Aug 08 16:53:35 controller-1 dockerd-current[19397]: time="2017-08-08T12:53:35.881262714-04:00" level=error msg="Handler for POST /v1.24/containers/ceph-mon-controller-1/stop?t=10 returned error: No such container: ceph-mon-controller-1"
Aug 08 16:53:35 controller-1 dockerd-current[19397]: time="2017-08-08T12:53:35.881288330-04:00" level=error msg="Handler for POST /v1.24/containers/ceph-mon-controller-1/stop returned error: No such container: ceph-mon-controller-1"






[root@controller-1 ~]# journalctl -u os-collect-config|grep -i error
Aug 08 16:15:15 controller-1 os-collect-config[3436]: [2017-08-08 12:15:15,588] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:15:39 controller-1 os-collect-config[3436]: [2017-08-08 12:15:39,646] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:16:16 controller-1 os-collect-config[3436]: [2017-08-08 12:16:16,645] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:16:53 controller-1 os-collect-config[3436]: [2017-08-08 12:16:53,457] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:17:28 controller-1 os-collect-config[3436]: [2017-08-08 12:17:28,663] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:19:07 controller-1 os-collect-config[3436]: [2017-08-08 12:19:07,712] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:19:44 controller-1 os-collect-config[3436]: [2017-08-08 12:19:44,951] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:20:23 controller-1 os-collect-config[3436]: [2017-08-08 12:20:23,095] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:20:59 controller-1 os-collect-config[3436]: [2017-08-08 12:20:59,600] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:22:07 controller-1 os-collect-config[3436]: [2017-08-08 12:22:07,795] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:26:01 controller-1 os-collect-config[3436]: [2017-08-08 16:26:01,407] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:28:09 controller-1 os-collect-config[3436]: [2017-08-08 16:28:09,076] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
Aug 08 16:28:53 controller-1 os-collect-config[3436]: [2017-08-08 16:28:53,027] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None
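
All 13 matches above carry the same message and differ only in their timestamps. A small Python sketch for collapsing such repeated journal lines into counts is shown below. The prefix pattern is an assumption based on the line format seen in this report, and summarize is a hypothetical helper name.

```python
import re
from collections import Counter

# Strips the journal prefix ("Aug 08 16:15:15 host unit[pid]: ") and an optional
# bracketed application timestamp, so identical messages compare equal.
PREFIX_RE = re.compile(r"^\w{3} \d{2} [\d:]{8} \S+ \S+\[\d+\]: (?:\[[^\]]+\] )?")


def summarize(lines):
    """Count occurrences of each message after removing timestamps and host info."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[PREFIX_RE.sub("", line)] += 1
    return counts


# Three of the lines from the journalctl output in this report.
SAMPLE = [
    "Aug 08 16:15:15 controller-1 os-collect-config[3436]: [2017-08-08 12:15:15,588] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None",
    "Aug 08 16:15:39 controller-1 os-collect-config[3436]: [2017-08-08 12:15:39,646] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None",
    "Aug 08 16:26:01 controller-1 os-collect-config[3436]: [2017-08-08 16:26:01,407] (heat-config) [ERROR] Skipping group os-apply-config with no hook script None",
]

if __name__ == "__main__":
    for message, count in summarize(SAMPLE).most_common():
        print(count, message)
```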

Comment 6 Alexander Chuzhoy 2017-08-18 15:32:53 UTC
Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170805163048.el7ost.noarch

The reported issue doesn't reproduce.

Comment 7 Artem Hrechanychenko 2017-09-12 14:39:21 UTC
Reproduced:
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: fb6f06d3-0c1e-4409-abcb-1d370612534e
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Images are from puddle 2017-09-08.3.


From the Mistral log /var/log/mistral/ceph-install-workflow.log:

2017-09-12 08:25:14,405 p=6542 u=mistral |  failed: [192.168.24.12] (item=[{u'mon_cap': u'allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create', u'mds_cap': u'allow *', u'name': u'client.manila', u'mode': u'0644', u'key': u'AQBNxLdZAAAAABAAXLOTnMwE/KF1CcnCrBVB8Q==', u'osd_cap': u'allow rw'}, {'_ansible_parsed': True, 'stderr_lines': [u'2017-09-12 12:25:14.364532 7fbf70c65700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory', u'2017-09-12 12:25:14.368477 7fbf70c65700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication', u'2017-09-12 12:25:14.368504 7fbf70c65700  0 librados: client.admin authentication error (95) Operation not supported', u'Error connecting to cluster: Error'], '_ansible_item_result': True, u'end': u'2017-09-12 12:25:14.378680', '_ansible_no_log': False, u'stdout': u'', u'cmd': [u'ceph', u'--cluster', u'ceph', u'auth', u'get', u'client.manila'], u'rc': 1, 'item': {u'mon_cap': u'allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create', u'mds_cap': u'allow *', u'name': u'client.manila', u'mode': u'0644', u'key': u'AQBNxLdZAAAAABAAXLOTnMwE/KF1CcnCrBVB8Q==', u'osd_cap': u'allow rw'}, u'delta': u'0:00:00.116920', u'stderr': u'2017-09-12 12:25:14.364532 7fbf70c65700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n2017-09-12 12:25:14.368477 7fbf70c65700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication\n2017-09-12 12:25:14.368504 7fbf70c65700  0 librados: client.admin authentication error (95) Operation not supported\nError connecting to cluster: Error', u'changed': False, u'invocation': {u'module_args': {u'warn': True, 
u'executable': None, u'_uses_shell': False, u'_raw_params': u'ceph --cluster ceph auth get client.manila', u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-09-12 12:25:14.261760', 'failed': False}]) => {"changed": false, "cmd": ["ceph", "--cluster", "ceph", "auth", "import", "-i", "/etc/ceph/ceph.client.manila.keyring"], "delta": "0:00:00.122784", "end": "2017-09-12 12:25:15.281245", "failed": true, "item": [{"key": "AQBNxLdZAAAAABAAXLOTnMwE/KF1CcnCrBVB8Q==", "mds_cap": "allow *", "mode": "0644", "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create", "name": "client.manila", "osd_cap": "allow rw"}, {"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "auth", "get", "client.manila"], "delta": "0:00:00.116920", "end": "2017-09-12 12:25:14.378680", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "ceph --cluster ceph auth get client.manila", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": {"key": "AQBNxLdZAAAAABAAXLOTnMwE/KF1CcnCrBVB8Q==", "mds_cap": "allow *", "mode": "0644", "mon_cap": "allow r, allow command auth del, allow command auth caps, allow command auth get, allow command auth get-or-create", "name": "client.manila", "osd_cap": "allow rw"}, "rc": 1, "start": "2017-09-12 12:25:14.261760", "stderr": "2017-09-12 12:25:14.364532 7fbf70c65700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n2017-09-12 12:25:14.368477 7fbf70c65700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication\n2017-09-12 12:25:14.368504 7fbf70c65700  0 librados: client.admin authentication error (95) 
Operation not supported\nError connecting to cluster: Error", "stderr_lines": ["2017-09-12 12:25:14.364532 7fbf70c65700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory", "2017-09-12 12:25:14.368477 7fbf70c65700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication", "2017-09-12 12:25:14.368504 7fbf70c65700  0 librados: client.admin authentication error (95) Operation not supported", "Error connecting to cluster: Error"], "stdout": "", "stdout_lines": []}], "rc": 1, "start": "2017-09-12 12:25:15.158461", "stderr": "2017-09-12 12:25:15.268335 7f4ff6dc9700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory\n2017-09-12 12:25:15.270008 7f4ff6dc9700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication\n2017-09-12 12:25:15.270015 7f4ff6dc9700  0 librados: client.admin authentication error (95) Operation not supported\nError connecting to cluster: Error", "stderr_lines": ["2017-09-12 12:25:15.268335 7f4ff6dc9700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory", "2017-09-12 12:25:15.270008 7f4ff6dc9700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication", "2017-09-12 12:25:15.270015 7f4ff6dc9700  0 librados: client.admin authentication error (95) Operation not supported", "Error connecting to cluster: Error"], "stdout": "", "stdout_lines": []}
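
The failed task result above is hard to read as a single line. The part after "=> " is JSON, so it can be parsed to pull out the command, return code, and stderr. The Python sketch below shows one way to do that. It assumes the wrapped log line has first been joined back into one line, and the helper names parse_failed_item and missing_keyring are hypothetical. The sample is an abbreviated version of the log entry, with the item argument elided and only some of the result keys kept.

```python
import json
import re

# Host name inside "failed: [host]" as printed by Ansible for a failed loop item.
HOST_RE = re.compile(r"failed: \[([^\]]+)\]")


def parse_failed_item(line):
    """Return (host, result_dict) for a 'failed: [host] (item=...) => {json}' line."""
    host = HOST_RE.search(line)
    # Heuristic: treat everything after the last " => " as the JSON result.
    _, sep, payload = line.rpartition(" => ")
    if host is None or not sep:
        return None
    return host.group(1), json.loads(payload)


def missing_keyring(result):
    """True if the ceph client reported that it could not find any keyring."""
    return any(
        "unable to find a keyring" in text for text in result.get("stderr_lines", [])
    )


# Abbreviated from the log entry in this report; "(item=[...])" is elided here.
SAMPLE = (
    'p=6542 u=mistral |  failed: [192.168.24.12] (item=[...]) => '
    '{"changed": false, "cmd": ["ceph", "--cluster", "ceph", "auth", "import", '
    '"-i", "/etc/ceph/ceph.client.manila.keyring"], "failed": true, "rc": 1, '
    '"stderr_lines": ["auth: unable to find a keyring on '
    "/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,"
    '/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory", '
    '"Error connecting to cluster: Error"]}'
)

if __name__ == "__main__":
    host, result = parse_failed_item(SAMPLE)
    print(host, " ".join(result["cmd"]), "rc =", result["rc"])
    print("missing admin keyring:", missing_keyring(result))
```

Read this way, the entry shows "ceph auth import" failing with rc 1 on 192.168.24.12 because no client.admin keyring was found under /etc/ceph.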

Comment 8 John Fulton 2017-09-12 17:48:42 UTC
I think this is fixed by the following ceph-ansible commit:

https://github.com/ceph/ceph-ansible/commit/a57f61efd9548bac17a8dbaaba47c14fba15a02c

You also need an updated version of THT (tripleo-heat-templates) that defaults copy_admin_key to false.

Perhaps we can verify that this fixes the issue?

Comment 9 Alexander Chuzhoy 2017-09-12 19:14:33 UTC
The original issue is resolved.
The issue reported in comment #7 is a different bug: https://bugzilla.redhat.com/show_bug.cgi?id=1469426

Comment 10 Alexander Chuzhoy 2017-09-12 19:14:56 UTC
Switching back to VERIFIED.

Comment 15 errata-xmlrpc 2017-12-13 21:49:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

