This bug was initially created as a copy of Bug #1420861.

I am copying this bug because we have several customers who want to use this feature in OSP13 (Long Life). Since this feature depends mainly on ceph-ansible, and the version shipped in OSP13 already supports it, this RFE should be test-only.

This RFE Bugzilla was created to track decisions and development relating to a request for the ability to deploy second-tier Ceph storage for OpenStack Platform through OSP director.
The necessary code changes for puppet-tripleo and tripleo-heat-templates have already landed in OSP13, as per BZ #1309550.

Tiering of the Ceph pools can also be configured after the overcloud deployment using device classes. For example, assuming operators use the TripleO parameter "CinderRbdExtraPools", as per [1], to create an additional "tier2" pool, it can later be assigned to a specific (ssd) device class with:

  # ceph osd crush rule create-replicated faster default host ssd
  # ceph osd pool set tier2 crush_rule faster

I think the only piece missing for OSP13 is to backport the docs; what was added in OSP14 via BZ #1654792 should be added to the OSP13 docs as well. The docs change is tracked by BZ #1671061.

1. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#proc_ceph-configuring-block-storage-use-new-pool-assembly_ceph-second-tier-storage
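As a sketch, the full post-deployment flow might look like the following. The pool name "tier2" and device class "ssd" are taken from the example above; the verification commands at the end are my addition, not from the original report, and all of these require a running Ceph cluster:

```
# Create a replicated CRUSH rule named "faster" rooted at "default" that
# selects only OSDs of the "ssd" device class, spreading replicas by host:
ceph osd crush rule create-replicated faster default host ssd

# Bind the existing "tier2" pool to the new rule:
ceph osd pool set tier2 crush_rule faster

# Verify: list the rules and confirm which rule the pool now uses.
ceph osd crush rule ls
ceph osd pool get tier2 crush_rule
```

With this in place, data written to "tier2" lands only on the SSD-backed OSDs, while the other pools keep using their original rule.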
According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-18.el7ost. This build is available now.
According to our records, this should be resolved by puppet-tripleo-8.4.1-5.el7ost. This build is available now.
The deployment failed with the error:

  2019-06-20 15:27:16,798 p=25230 u=mistral | failed: [192.168.24.6 -> 192.168.24.23] (item=[{u'application': u'rbd', u'pg_num': 64, u'name': u'vms', u'rule_name': u'standard'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'vms'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-0', u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'vms', u'size'], u'end': u'2019-06-20 19:27:12.114155', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'192.168.24.23', 'ansible_host': u'192.168.24.23'}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool get vms size', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin': None}}, u'stdout': u'', 'item': {u'application': u'rbd', u'pg_num': 64, u'name': u'vms', u'rule_name': u'standard'}, u'delta': u'0:00:00.442574', '_ansible_item_label': {u'application': u'rbd', u'pg_num': 64, u'name': u'vms', u'rule_name': u'standard'}, u'stderr': u"Error ENOENT: unrecognized pool 'vms'", u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'start': u'2019-06-20 19:27:11.671581', '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-0", "ceph", "--cluster", "ceph", "osd", "pool", "create", "vms", "64", "64", "standard", "1"], "delta": "0:00:00.423014", "end": "2019-06-20 19:27:16.752713", "item": [{"application": "rbd", "name": "vms", "pg_num": 64, "rule_name": "standard"}, {"_ansible_delegated_vars": {"ansible_delegated_host": "192.168.24.23", "ansible_host": "192.168.24.23"}, "_ansible_ignore_errors": null, "_ansible_item_label": {"application": "rbd", "name": "vms", "pg_num": 64, "rule_name": "standard"}, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-0", "ceph", "--cluster", "ceph", "osd", "pool", "get", "vms", "size"], "delta": "0:00:00.442574", "end": "2019-06-20 19:27:12.114155", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool get vms size", "_uses_shell": false, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"application": "rbd", "name": "vms", "pg_num": 64, "rule_name": "standard"}, "msg": "non-zero return code", "rc": 2, "start": "2019-06-20 19:27:11.671581", "stderr": "Error ENOENT: unrecognized pool 'vms'", "stderr_lines": ["Error ENOENT: unrecognized pool 'vms'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero return code", "rc": 2, "start": "2019-06-20 19:27:16.329699", "stderr": "Error ENOENT: specified rule standard doesn't exist", "stderr_lines": ["Error ENOENT: specified rule standard doesn't exist"], "stdout": "", "stdout_lines": []}
The crush map configuration is:

  CephAnsibleExtraConfig:
    create_crush_tree: true
    crush_rules:
      - name: standard
        root: standard_root
        type: rack
        default: true
      - name: fast
        root: fast_root
        type: rack
        default: false
  CephPools:
    - name: tier2
      pg_num: 64
      rule_name: fast
      application: rbd
    - name: volumes
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: vms
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: backups
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: images
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: metrics
      pg_num: 64
      rule_name: standard
      application: openstack_gnocchi
Can you run "ceph osd dump" — were the pools created? Can you paste your NodeDataLookup param?
1. Heat environment input: http://ix.io/1Mqz
2. TripleO generated inventory: http://ix.io/1Mr0

The NodeDataLookup param was quoted instead of being passed as JSON, as per #1. It looks like it was translated into the inventory as per #2. Also, you can see the crush_rules in the inventory in #1.

No pools were created. The deployment failed because it tried to create the pool with a rule that didn't yet exist. E.g., here I'm re-running the tasks from the ansible log which failed:

  [root@controller-0 ~]# docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool create tier2 64 64 fast 1
  Error ENOENT: specified rule fast doesn't exist
  [root@controller-0 ~]# docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool get tier2 size
  Error ENOENT: unrecognized pool 'tier2'
  [root@controller-0 ~]#

No fast rule:

  [root@controller-0 ~]# ceph osd crush rule ls
  replicated_rule
  [root@controller-0 ~]#

So why wasn't the rule created? The deployment was run using ceph-ansible 3.2.15 (with the 3-18 ceph container, not the latest), and here are the tasks that would create the crush rule:

https://github.com/ceph/ceph-ansible/blob/v3.2.15/roles/ceph-mon/tasks/crush_rules.yml

None of these tasks ran, according to the ceph-ansible logs:

  [root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "configure crush hierarchy"
  [root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "create configured crush rules"
  [root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "get id for new default crush rule"
  [root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "set_fact info_ceph_default_crush_rule_yaml"
  [root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "insert new default crush rule into daemon to prevent restart"
  [root@undercloud-0 mistral]#

We do see that crush_rules.yml was included, but its tasks were skipped: http://ix.io/1Mrh

So we need to see which condition failed, and ask whether it's because we didn't pass something we should have (user error) or whether it's a bug in ceph-ansible.
(In reply to Gregory Charot from comment #11)
> Can you ceph osd dump, the pools are created ?

  [root@controller-0 ~]# ceph osd dump | curl -F 'f:1=<-' ix.io
  http://ix.io/1MrN
  [root@controller-0 ~]#

No pools were created:

  [root@controller-0 ~]# ceph df | curl -F 'f:1=<-' ix.io
  http://ix.io/1MrP
  [root@controller-0 ~]#

> Can you paste your NodeDataLookup param ?

Heat environment input: http://ix.io/1Mqz

I think this input should be pure JSON, not single-quoted JSON.
(In reply to John Fulton from comment #12)
> So why wasn't the rule created?
> ...
> We do see the crush_rules.yml was included but its tasks were skipped:
> http://ix.io/1Mrh
>
> So we need to see what condition failed and ask if it's because we didn't
> pass something we should have (user error) or if it's a bug in ceph-ansible.

https://github.com/ceph/ceph-ansible/blob/v3.2.15/roles/ceph-mon/tasks/main.yml#L35

crush_rules.yml is only included when that flag is set, so the THT from comment #10 should also have had:

  crush_rule_config: true
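To illustrate, the CephAnsibleExtraConfig from comment #10 would then look like this (a sketch only; the rule definitions are copied from the earlier comment, and the added flag is what enables the crush_rules.yml tasks in ceph-ansible 3.2):

```
CephAnsibleExtraConfig:
  crush_rule_config: true    # missing before; gates inclusion of crush_rules.yml
  create_crush_tree: true
  crush_rules:
    - name: standard
      root: standard_root
      type: rack
      default: true
    - name: fast
      root: fast_root
      type: rack
      default: false
```

Without the flag, ceph-ansible skips the rule-creation tasks entirely, which is why the later pool creation referenced rules that did not exist.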
Very true, thanks for spotting that!
Yogev, I restored your original environment files but added the following under CephAnsibleExtraConfig:

  crush_rule_config: true

I then deleted your overcloud and redeployed, and it finished deploying Ceph [1]. I'm setting this bug back to ON_QA so that you may test with the additional 'crush_rule_config: true' parameter. Note that I used your original string for NodeDataLookup. Thanks to Giulio for spotting the missing crush_rule_config.

[1]
  [root@controller-0 ~]# ceph df
  GLOBAL:
      SIZE       AVAIL      RAW USED     %RAW USED
      330GiB     300GiB     30.3GiB      9.20
  POOLS:
      NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
      tier2       1      0B       0         62.9GiB       0
      metrics     2      0B       0         31.5GiB       0
      volumes     3      0B       0         31.5GiB       0
      images      4      0B       0         31.5GiB       0
      backups     5      0B       0         31.5GiB       0
      vms         6      0B       0         31.5GiB       0
  [root@controller-0 ~]# ceph osd crush rule ls
  replicated_rule
  standard
  fast
  [root@controller-0 ~]#
Verified that it's working with the following configuration:

  CephAnsibleExtraConfig:
    create_crush_tree: true
    crush_rule_config: true
    crush_rules:
      - name: standard
        root: standard_root
        type: rack
        default: true
      - name: fast
        root: fast_root
        type: rack
        default: false
  CephPools:
    - name: tier2
      pg_num: 64
      rule_name: fast
      application: rbd
    - name: volumes
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: vms
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: backups
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: images
      pg_num: 64
      rule_name: standard
      application: rbd
    - name: metrics
      pg_num: 64
      rule_name: standard
      application: openstack_gnocchi
  CephAnsibleDisksConfig:
    devices:
      - '/dev/vdb'
      - '/dev/vdc'
      - '/dev/vdd'
      - '/dev/vde'
      - '/dev/vdf'
    osd_scenario: lvm
    osd_objectstore: bluestore
    journal_size: 512
  NodeDataLookup: '{"d336f6d2-60b7-4a50-82d0-2e43c30e47e8": {"osd_crush_location": {"root": "standard_root", "rack": "rack1_std", "host": "ceph-0"}},"6b17e3d9-f3d1-4888-8687-ad98d77cb44f": {"osd_crush_location": {"root": "standard_root", "rack": "rack2_std", "host": "ceph-1"}},"c9c3dd3e-0980-4994-95fa-6478e87f5752": {"osd_crush_location": {"root": "fast_root", "rack": "rack3_std", "host": "ceph-2"}},"58f926d8-5d97-4051-9f31-e76c6b435255": {"osd_crush_location": {"root": "fast_root", "rack": "rack1_fast", "host": "ceph-3"}},"fa18be32-e9e5-4bb2-ac83-4c83c497b9b2": {"osd_crush_location": {"root": "fast_root", "rack": "rack2_fast", "host": "ceph-4"}},"0dee4a82-64cb-41cd-939e-39b784317cab": {"osd_crush_location": {"root": "fast_root", "rack": "rack3_fast", "host": "ceph-5"}}}'
  CinderRbdExtraPools:
    - tier2
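As a follow-up sanity check (my addition, not part of the original verification; these commands assume a running cluster), one could confirm that the OSDs landed under the intended roots and that each pool is bound to the expected rule:

```
# Show the CRUSH hierarchy: standard_root and fast_root should each
# contain the racks and hosts given in NodeDataLookup.
ceph osd crush tree

# Confirm pool-to-rule bindings; per the config above, tier2 should
# report the "fast" rule and volumes/vms/etc. the "standard" rule.
ceph osd pool get tier2 crush_rule
ceph osd pool get volumes crush_rule
```

If a pool reports the wrong rule, it can be corrected in place with "ceph osd pool set <pool> crush_rule <rule>" without redeploying.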
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1738