Created attachment 1447906: ceph workflow log

Description of problem:
The overcloud deployment failed on a compat installation of RHOS12.

CREATE_FAILED Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: f1ac1f2e-33cc-4d1f-b10f-2c1109faf499
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Looking at /var/log/mistral/ceph-install-workflow.log:

2018-06-05 05:41:03,455 p=4017 u=mistral | failed: [192.168.24.15 -> 192.168.24.19] (item=[{u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'volumes'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-1', u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'volumes', u'size'], u'end': u'2018-06-05 09:41:00.754895', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'192.168.24.19', 'ansible_host': u'192.168.24.19'}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start': u'2018-06-05 09:41:00.514376', u'delta': u'0:00:00.240519', 'item': {u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}, u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'volumes'", '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", "replicated_rule", "1"], "delta": "0:00:00.258824", "end": "2018-06-05 09:41:03.366452", "item": [{"name": "volumes", "pg_num": 32, "rule_name": ""}, {"_ansible_delegated_vars": {"ansible_delegated_host": "192.168.24.19", "ansible_host": "192.168.24.19"}, "_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-1", "ceph", "--cluster", "ceph", "osd", "pool", "get", "volumes", "size"], "delta": "0:00:00.240519", "end": "2018-06-05 09:41:00.754895", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller-1 ceph --cluster ceph osd pool get volumes size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:00.514376", "stderr": "Error ENOENT: unrecognized pool 'volumes'", "stderr_lines": ["Error ENOENT: unrecognized pool 'volumes'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero return code", "rc": 2, "start": "2018-06-05 09:41:03.107628", "stderr": "Error ENOENT: specified ruleset replicated_rule doesn't exist", "stderr_lines": ["Error ENOENT: specified ruleset replicated_rule doesn't exist"], "stdout": "", "stdout_lines": []}

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-compat-7.0.9-8.1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install the undercloud
2. Prepare the THT for the compat package
3. Deploy RHOS12 containers with Ceph

Actual results:
The overcloud deployment hangs.

Expected results:
Successful deployment

Additional info:
Although ceph-ansible is installed, the issue persists with log errors similar to those in the attachment.
rpm -qa | grep ceph
ceph-ansible-3.1.0-0.1.rc6.el7cp.noarch
puppet-ceph-2.5.0-1.el7ost.noarch
Noticed the following errors in audit.log:

type=AVC msg=audit(1528619736.855:639): avc: denied { read } for pid=13752 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file
type=AVC msg=audit(1528619739.102:640): avc: denied { read } for pid=13861 comm="inet_gethost" name="unix" dev="proc" ino=4026532003 scontext=system_u:system_r:rabbitmq_t:s0 tcontext=system_u:object_r:proc_net_t:s0 tclass=file
It looks like deploying Jewel successfully with the ceph-ansible 3.1 branch requires explicitly setting the right value for "rule_name" in openstack_pools. Leaving it empty works for Jewel with ceph-ansible 3.0 and for Luminous with ceph-ansible 3.1, but not for Jewel when deploying via ceph-ansible 3.1.

Thanks Guillaume for clarifying the issue!
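To make the combinations above easier to check, here is a minimal sketch in Python. It only encodes the three release/branch combinations named in this comment; the table name and the helper `pools_needing_rule_name` are hypothetical and are not part of ceph-ansible or TripleO. The pool dictionary mirrors the `item` shown in the workflow log.

```python
# Hypothetical lookup table: whether an empty "rule_name" in openstack_pools
# is reported to work, keyed by (Ceph release, ceph-ansible branch).
# Only the combinations mentioned in this bug report are listed.
EMPTY_RULE_NAME_WORKS = {
    ("jewel", "3.0"): True,
    ("luminous", "3.1"): True,
    ("jewel", "3.1"): False,
}


def pools_needing_rule_name(pools, ceph_release, ceph_ansible_branch):
    """Return the names of pools whose empty rule_name is expected to fail."""
    key = (ceph_release.lower(), ceph_ansible_branch)
    if key not in EMPTY_RULE_NAME_WORKS:
        # This report says nothing about other combinations, so refuse to guess.
        raise ValueError("combination not covered by this report: %r" % (key,))
    if EMPTY_RULE_NAME_WORKS[key]:
        return []
    return [pool["name"] for pool in pools if not pool.get("rule_name")]


# Pool definition as it appears in the failing task's "item".
pools = [{"name": "volumes", "pg_num": 32, "rule_name": ""}]
print(pools_needing_rule_name(pools, "jewel", "3.1"))
```

Running such a check against the pool list before deployment would flag the "volumes" pool for the Jewel plus ceph-ansible 3.1 case, which is the combination that failed here.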
Retested with the compat installation job using the latest puddle; all tests have passed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2331