Bug 1586155 - [mixed versions] compat installation overcloud deployment failed WorkflowTasks_Step2_Execution
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 12.0 (Pike)
Assignee: Giulio Fidente
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Keywords: Triaged, ZStream
Depends On:
Blocks: 1593774
Reported: 2018-06-05 15:22 UTC by Ronnie Rasouli
Modified: 2018-08-20 13:02 UTC

Doc Text:
OpenStack Director 13 can now successfully deploy an overcloud together with Ceph, using OpenStack 12 templates.

Prior to this update, Ceph deployment would fail during overcloud deployment step 2 because OpenStack Director failed to set the correct version of Ceph. Now OpenStack Director 12 templates always deploy the Ceph Jewel release.
Clone Of:
Clones: 1593774
Last Closed: 2018-08-20 13:01:30 UTC


Attachments
ceph workflow log (1.32 MB, text/plain)
2018-06-05 15:22 UTC, Ronnie Rasouli


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:2331 | None | None | None | 2018-08-20 13:02 UTC
OpenStack gerrit | 575789 | None | stable/pike: NEW | tripleo-heat-templates: Set Ceph pools rule_name to replicated_rule (I275c1ca53ea79eea607cbbb58aa21cae6d6be80b) | 2018-06-18 12:03 UTC
Launchpad | 1776252 | None | None | None | 2018-06-11 16:55 UTC

Description Ronnie Rasouli 2018-06-05 15:22:18 UTC
Created attachment 1447906
ceph workflow log

Description of problem:

The overcloud deployment failed during a compat installation of RHOS 12:

CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: f1ac1f2e-33cc-4d1f-b10f-2c1109faf499
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Looking at /var/log/mistral/ceph-install-workflow.log


2018-06-05 05:41:03,455 p=4017 u=mistral |  failed: [192.168.24.15 -> 192.168.24.19] (item=[{u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'volumes'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-1', u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'volumes', u'size'], u'end': u'2018-06-05 09:41:00.754895', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'192.168.24.19', 'ansible_host': u'192.168.24.19'}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start': u'2018-06-05 09:41:00.514376', u'delta': u'0:00:00.240519', 'item': {u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'volumes'", '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"], "delta": "0:00:00.258824", "end": "2018-06-05 09:41:03.366452", "item": [{"name": "volumes", "pg_num": 32, "rule_name": ""}, {"_ansible_delegated_vars": {"ansible_delegated_host": "192.168.24.19", "ansible_host": "192.168.24.19"}, "_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "get", "volumes", "size"], "delta": "0:00:00.240519", "end": "2018-06-05 09:41:00.754895", "failed": false, "failed_when_result": false, "invocation": 
{"module_args": {"_raw_params": "docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:00.514376", "stderr": "Error ENOENT: unrecognized pool 'volumes'", "stderr_lines": ["Error ENOENT: unrecognized pool 'volumes'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:03.107628", "stderr": "Error ENOENT: specified ruleset replicated_rule doesn't exist", "stderr_lines": ["Error ENOENT: specified ruleset replicated_rule doesn't exist"], "stdout": "", "stdout_lines": []}
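The failed command and its error are buried at the end of that log entry. As a rough sketch (not part of the original report), the snippet below pulls them out of an Ansible "failed:" line. It assumes the JSON result is everything after the last " => " separator, and the sample string is an abridged copy of the entry above that keeps only the cmd, rc and stderr fields; the helper name parse_ansible_failure is hypothetical.

```python
import json


def parse_ansible_failure(log_line: str) -> dict:
    """Return the JSON result that follows the last ' => ' in an Ansible 'failed:' line.

    Assumption: the JSON payload itself does not contain ' => '.
    """
    _, sep, payload = log_line.rpartition(" => ")
    if not sep:
        raise ValueError("no ' => ' separator found in log line")
    return json.loads(payload)


# Abridged copy of the failure above: the item list is elided and only
# cmd, rc and stderr are kept from the result.
sample = (
    'failed: [192.168.24.15 -> 192.168.24.19] (item=[...]) => '
    '{"cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", '
    '"osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"], '
    '"rc": 2, '
    '"stderr": "Error ENOENT: specified ruleset replicated_rule doesn\'t exist"}'
)

result = parse_ansible_failure(sample)
failed_command = " ".join(result["cmd"])
print(failed_command)
print(result["stderr"])
```

Read this way, the entry shows that the "osd pool create" call for the volumes pool was issued with the rule name replicated_rule and that the monitor rejected that name.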

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-compat-7.0.9-8.1.el7ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Install the undercloud.
2. Prepare the THT from the compat package.
3. Deploy RHOS 12 containers with Ceph.

Actual results:
Overcloud deployment fails (stack status CREATE_FAILED).

Expected results:
Successful deployment.

Additional info:

Comment 2 Ronnie Rasouli 2018-06-10 07:20:04 UTC
Although ceph-ansible is installed, the issue persists with log errors similar to those in the attachment.

Comment 3 Ronnie Rasouli 2018-06-10 07:20:50 UTC
rpm -qa | grep ceph
ceph-ansible-3.1.0-0.1.rc6.el7cp.noarch
puppet-ceph-2.5.0-1.el7ost.noarch

Comment 4 Ronnie Rasouli 2018-06-10 11:14:14 UTC
Noticed the following errors in audit.log:

type=AVC msg=audit(1528619736.855:639): avc:  denied  { read } for  pid=13752 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file
type=AVC msg=audit(1528619739.102:640): avc:  denied  { read } for  pid=13861 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file

Comment 12 Giulio Fidente 2018-06-11 16:49:56 UTC
It looks like, to deploy Jewel successfully with the ceph-ansible 3.1 branch, we need to explicitly set the right value for "rule_name" in openstack_pools.

Leaving it empty works for Jewel when using ceph-ansible 3.0 and for Luminous when using ceph-ansible 3.1, but not for Jewel when deploying via ceph-ansible 3.1.

Thanks Guillaume for clarifying the issue!
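
The three combinations described in this comment can be written down as a small lookup. This is only an illustrative sketch of the comment's wording, not code from tripleo-heat-templates or ceph-ansible; the names EMPTY_RULE_NAME_WORKS and needs_explicit_rule_name are hypothetical, and combinations the comment does not mention are deliberately left out rather than guessed.

```python
# Outcomes reported in Comment 12 for an empty "rule_name" in openstack_pools,
# keyed by (Ceph release, ceph-ansible branch). Other combinations are unknown.
EMPTY_RULE_NAME_WORKS = {
    ("jewel", "3.0"): True,
    ("luminous", "3.1"): True,
    ("jewel", "3.1"): False,
}


def needs_explicit_rule_name(release: str, branch: str) -> bool:
    """Return True when the comment says an empty rule_name fails for this combination."""
    key = (release.lower(), branch)
    if key not in EMPTY_RULE_NAME_WORKS:
        raise KeyError(f"combination not covered by the bug report: {key}")
    return not EMPTY_RULE_NAME_WORKS[key]


for release, branch in EMPTY_RULE_NAME_WORKS:
    print(release, branch, needs_explicit_rule_name(release, branch))
```

Only the Jewel with ceph-ansible 3.1 case, which is the one this bug hit, requires rule_name to be set explicitly.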

Comment 25 Ronnie Rasouli 2018-07-19 11:44:31 UTC
Retested with the compat installation job using the latest puddle; all tests have passed.

Comment 33 errata-xmlrpc 2018-08-20 13:01:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331

