Bug 1586155 - [mixed versions] compat installation overcloud deployment failed WorkflowTasks_Step2_Execution
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Target Milestone: z3
Target Release: 12.0 (Pike)
Assignee: Giulio Fidente
QA Contact: Ronnie Rasouli
Keywords: Triaged, ZStream
Depends On:
Blocks: 1593774
Reported: 2018-06-05 15:22 UTC by Ronnie Rasouli
Modified: 2018-08-20 13:02 UTC

OpenStack Director 13 can now successfully deploy an overcloud together with Ceph, using OpenStack 12 templates.

Prior to this update, Ceph deployment would fail during overcloud deployment step 2 because OpenStack Director failed to set the correct version of Ceph. Now OpenStack Director 12 templates always deploy the Ceph Jewel release.
Clone Of:
: 1593774
Last Closed: 2018-08-20 13:01:30 UTC

Attachments
ceph workflow log (1.32 MB, text/plain)
2018-06-05 15:22 UTC, Ronnie Rasouli

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2331 None None None 2018-08-20 13:02 UTC
OpenStack gerrit 575789 None stable/pike: NEW tripleo-heat-templates: Set Ceph pools rule_name to replicated_rule (I275c1ca53ea79eea607cbbb58aa21cae6d6be80b) 2018-06-18 12:03 UTC
Launchpad 1776252 None None None 2018-06-11 16:55 UTC

Description Ronnie Rasouli 2018-06-05 15:22:18 UTC
Created attachment 1447906
ceph workflow log

Description of problem:

The overcloud deployment failed during a compat installation of RHOS12:

CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED

  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: f1ac1f2e-33cc-4d1f-b10f-2c1109faf499
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Looking at /var/log/mistral/ceph-install-workflow.log:

2018-06-05 05:41:03,455 p=4017 u=mistral |  failed: [ ->] (item=[{u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'volumes'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-1', u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'volumes', u'size'], u'end': u'2018-06-05 09:41:00.754895', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'', 'ansible_host': u''}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start': u'2018-06-05 09:41:00.514376', u'delta': u'0:00:00.240519', 'item': {u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'volumes'", '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"], "delta": "0:00:00.258824", "end": "2018-06-05 09:41:03.366452", "item": [{"name": "volumes", "pg_num": 32, "rule_name": ""}, {"_ansible_delegated_vars": {"ansible_delegated_host": "", "ansible_host": ""}, "_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "get", "volumes", "size"], "delta": "0:00:00.240519", "end": "2018-06-05 09:41:00.754895", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller-1 ceph --cluster ceph 
osd pool get volumes size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:00.514376", "stderr": "Error ENOENT: unrecognized pool 'volumes'", "stderr_lines": ["Error ENOENT: unrecognized pool 'volumes'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:03.107628", "stderr": "Error ENOENT: specified ruleset replicated_rule doesn't exist", "stderr_lines": ["Error ENOENT: specified ruleset replicated_rule doesn't exist"], "stdout": "", "stdout_lines": []}
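The failing command and its error are easy to miss in a log line this long. As a rough sketch only (not part of the original report), the JSON result after the last " => " can be pulled out with a few lines of Python. SAMPLE_LINE is an abbreviated, hand-trimmed stand-in for the line above, and parse_failed_task is a hypothetical helper name. The sketch assumes the task result sits on a single line, so a wrapped log line would have to be rejoined first.

```python
import json

# Abbreviated stand-in for one "failed:" line of ceph-install-workflow.log.
# Only the fields used below are kept; the item=[...] part is elided.
SAMPLE_LINE = (
    '2018-06-05 05:41:03,455 p=4017 u=mistral |  failed: [ ->] (item=[...]) => '
    '{"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", '
    '"--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", '
    '"replicated_rule", "1"], "rc": 2, '
    '"stderr": "Error ENOENT: specified ruleset replicated_rule doesn\'t exist"}'
)


def parse_failed_task(line):
    """Return (command, rc, stderr) for an Ansible 'failed:' line, else None."""
    if "failed:" not in line:
        return None
    # The task result is the JSON object after the last ' => ' separator.
    _, sep, payload = line.rpartition(" => ")
    if not sep:
        return None
    try:
        result = json.loads(payload)
    except ValueError:
        # Truncated or wrapped line: the payload is not complete JSON.
        return None
    cmd = result.get("cmd", "")
    if isinstance(cmd, list):
        cmd = " ".join(cmd)
    return cmd, result.get("rc"), result.get("stderr", "")


if __name__ == "__main__":
    print(parse_failed_task(SAMPLE_LINE))
```

On the sample, this surfaces the "osd pool create volumes 32 32 replicated_rule 1" command together with the "specified ruleset replicated_rule doesn't exist" error, which is the actual cause of the Step 2 failure.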

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:
1. Install the undercloud
2. Prepare the THT for the compat package
3. Deploy RHOS12 containers with Ceph

Actual results:
Overcloud deployment fails (CREATE_FAILED at WorkflowTasks_Step2_Execution)

Expected results:
Successful deployment

Additional info:

Comment 2 Ronnie Rasouli 2018-06-10 07:20:04 UTC
Although ceph-ansible is installed, the issue persists with log errors similar to those in the attachment.

Comment 3 Ronnie Rasouli 2018-06-10 07:20:50 UTC
rpm -qa | grep ceph

Comment 4 Ronnie Rasouli 2018-06-10 11:14:14 UTC
Noticed the following errors in audit.log:

type=AVC msg=audit(1528619736.855:639): avc:  denied  { read } for  pid=13752 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file
type=AVC msg=audit(1528619739.102:640): avc:  denied  { read } for  pid=13861 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file

Comment 12 Giulio Fidente 2018-06-11 16:49:56 UTC
It looks like, to deploy Jewel successfully with the ceph-ansible 3.1 branch, we need to explicitly set the right value for "rule_name" in openstack_pools.

Leaving it empty works for Jewel when using ceph-ansible 3.0 and for Luminous when using ceph-ansible 3.1, but not for Jewel when deploying via ceph-ansible 3.1.

Thanks Guillaume for clarifying the issue!
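
For reference, the combinations described in this comment can be written down as a small lookup. This is only an illustrative sketch: EMPTY_RULE_NAME_WORKS and empty_rule_name_works are hypothetical names, not part of ceph-ansible or tripleo-heat-templates, and any combination not mentioned in this bug is deliberately reported as unknown (None).

```python
# Whether leaving "rule_name" empty in openstack_pools works, as reported in
# this bug. Keys are (ceph_release, ceph_ansible_branch). Combinations that
# the bug does not mention are intentionally absent.
EMPTY_RULE_NAME_WORKS = {
    ("jewel", "3.0"): True,
    ("luminous", "3.1"): True,
    ("jewel", "3.1"): False,  # the failing case in this report
}


def empty_rule_name_works(ceph_release, ceph_ansible_branch):
    """Return True/False for combinations named in the bug, None otherwise."""
    key = (ceph_release.lower(), ceph_ansible_branch)
    return EMPTY_RULE_NAME_WORKS.get(key)


if __name__ == "__main__":
    for release, branch in [("Jewel", "3.0"), ("Luminous", "3.1"), ("Jewel", "3.1")]:
        print(release, branch, empty_rule_name_works(release, branch))
```

Only the Jewel with ceph-ansible 3.1 case needs an explicit "rule_name"; the right value to set is not spelled out in this comment, so it is left out of the sketch.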

Comment 25 Ronnie Rasouli 2018-07-19 11:44:31 UTC
Retested with the compat installation job using the latest puddle; all tests have passed.

Comment 33 errata-xmlrpc 2018-08-20 13:01:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

