Bug 1586155

Summary: [mixed versions] compat installation overcloud deployment failed WorkflowTasks_Step2_Execution
Product: Red Hat OpenStack
Reporter: Ronnie Rasouli <rrasouli>
Component: openstack-tripleo-heat-templates
Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA
QA Contact: Ronnie Rasouli <rrasouli>
Severity: high
Priority: high
Version: 12.0 (Pike)
CC: gfidente, jamsmith, johfulto, markmc, mburns, mschuppe, pgrist, rrasouli, slinaber, yrabl
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-7.0.12-2.el7ost
Doc Type: Bug Fix
Doc Text:
OpenStack Director 13 can now successfully deploy an overcloud together with Ceph, using OpenStack 12 templates. Prior to this update, Ceph deployment would fail during overcloud deployment step 2 because OpenStack Director failed to set the correct version of Ceph. Now OpenStack Director 12 templates always deploy the Ceph Jewel release.
Clones: 1593774
Last Closed: 2018-08-20 13:01:30 UTC
Type: Bug
Bug Blocks: 1593774
Attachments:
ceph workflow log (flags: none)

Description Ronnie Rasouli 2018-06-05 15:22:18 UTC
Created attachment 1447906
ceph workflow log

Description of problem:

The overcloud deployment failed during a compat installation of RHOS12:

CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: f1ac1f2e-33cc-4d1f-b10f-2c1109faf499
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

/var/log/mistral/ceph-install-workflow.log shows the following failure:


2018-06-05 05:41:03,455 p=4017 u=mistral |  failed: [192.168.24.15 -> 192.168.24.19] (item=[{u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'volumes'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-1', u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'volumes', u'size'], u'end': u'2018-06-05 09:41:00.754895', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'192.168.24.19', 'ansible_host': u'192.168.24.19'}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start': u'2018-06-05 09:41:00.514376', u'delta': u'0:00:00.240519', 'item': {u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'volumes'", '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"], "delta": "0:00:00.258824", "end": "2018-06-05 09:41:03.366452", "item": [{"name": "volumes", "pg_num": 32, "rule_name": ""}, {"_ansible_delegated_vars": {"ansible_delegated_host": "192.168.24.19", "ansible_host": "192.168.24.19"}, "_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "get", "volumes", "size"], "delta": "0:00:00.240519", "end": "2018-06-05 09:41:00.754895", "failed": false, "failed_when_result": false, "invocation": 
{"module_args": {"_raw_params": "docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:00.514376", "stderr": "Error ENOENT: unrecognized pool 'volumes'", "stderr_lines": ["Error ENOENT: unrecognized pool 'volumes'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:03.107628", "stderr": "Error ENOENT: specified ruleset replicated_rule doesn't exist", "stderr_lines": ["Error ENOENT: specified ruleset replicated_rule doesn't exist"], "stdout": "", "stdout_lines": []}
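The failure above is easier to read once the trailing JSON result is separated from the Ansible item dump. The following is a minimal sketch, assuming the result object follows the first " => " on a single log line, as in the entry above. The function name `extract_failure` and the abbreviated `sample` line are illustrative only and do not come from the workflow log tooling.

```python
import json


def extract_failure(line):
    """Return (command, stderr) from an Ansible 'failed:' log line, or None.

    Assumption: the JSON result follows the first ' => ' on the line.
    """
    head, sep, tail = line.partition(" => ")
    if not sep or "failed:" not in head:
        return None
    result = json.loads(tail)
    cmd = result.get("cmd", [])
    # Ansible reports cmd as a list for argv-style commands, a string otherwise.
    command = " ".join(cmd) if isinstance(cmd, list) else str(cmd)
    return command, result.get("stderr", "")


# Abbreviated stand-in for the log entry above (item dump elided as "...").
payload = {
    "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph",
            "osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"],
    "rc": 2,
    "stderr": "Error ENOENT: specified ruleset replicated_rule doesn't exist",
}
sample = ("2018-06-05 05:41:03,455 p=4017 u=mistral |  failed: "
          "[192.168.24.15 -> 192.168.24.19] (item=...) => " + json.dumps(payload))

if __name__ == "__main__":
    command, stderr = extract_failure(sample)
    print(command)
    print(stderr)
```

Run against the entry above, this isolates the relevant part: the `ceph osd pool create` call passes `replicated_rule`, and the monitor rejects it because that ruleset does not exist.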

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-compat-7.0.9-8.1.el7ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Install the undercloud
2. Prepare the THT for the compat package
3. Deploy RHOS12 containers with Ceph

Actual results:
Overcloud deployment fails

Expected results:
Successful deployment

Additional info:

Comment 2 Ronnie Rasouli 2018-06-10 07:20:04 UTC
Although ceph-ansible is installed, the issue persists, with log errors similar to those in the attachment.

Comment 3 Ronnie Rasouli 2018-06-10 07:20:50 UTC
rpm -qa | grep ceph
ceph-ansible-3.1.0-0.1.rc6.el7cp.noarch
puppet-ceph-2.5.0-1.el7ost.noarch

Comment 4 Ronnie Rasouli 2018-06-10 11:14:14 UTC
Noticed the following errors in audit.log:

type=AVC msg=audit(1528619736.855:639): avc:  denied  { read } for  pid=13752 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file
type=AVC msg=audit(1528619739.102:640): avc:  denied  { read } for  pid=13861 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file

Comment 12 Giulio Fidente 2018-06-11 16:49:56 UTC
It looks like deploying Jewel successfully with the ceph-ansible 3.1 branch requires explicitly setting the right value for "rule_name" in openstack_pools.

Leaving it empty works for Jewel with ceph-ansible 3.0 and for Luminous with ceph-ansible 3.1, but not for Jewel deployed via ceph-ansible 3.1.

Thanks Guillaume for clarifying the issue!
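
To illustrate the point about "rule_name", here is a minimal sketch of filling in an explicit rule name for each pool entry before it reaches ceph-ansible. It is not the actual change made in the templates. The helper name and the per-release mapping are assumptions: the mapping reflects the default CRUSH rule names as I understand them ("replicated_ruleset" on Jewel, "replicated_rule" on Luminous) and should be checked against the target cluster.

```python
# Assumed default CRUSH rule name per Ceph release; verify on the cluster.
DEFAULT_RULE_BY_RELEASE = {
    "jewel": "replicated_ruleset",
    "luminous": "replicated_rule",
}


def with_explicit_rule_name(pools, release):
    """Return copies of the pool dicts with an empty rule_name filled in.

    Hypothetical helper: pools mirrors the openstack_pools entries seen in
    the log, e.g. {"name": "volumes", "pg_num": 32, "rule_name": ""}.
    """
    default_rule = DEFAULT_RULE_BY_RELEASE[release]
    return [
        dict(pool, rule_name=pool.get("rule_name") or default_rule)
        for pool in pools
    ]


if __name__ == "__main__":
    pools = [{"name": "volumes", "pg_num": 32, "rule_name": ""}]
    print(with_explicit_rule_name(pools, "jewel"))
```

A rule name that is already set is left untouched, so only the empty value that triggered the failure is replaced.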

Comment 25 Ronnie Rasouli 2018-07-19 11:44:31 UTC
Retested with the compat installation job using the latest puddle; all tests have passed.

Comment 33 errata-xmlrpc 2018-08-20 13:01:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331