Bug 1714227
| Summary: | [RFE][Backport] Add Support for a Second Ceph Storage Tier deployment capability through director | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Gregory Charot <gcharot> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Giulio Fidente <gfidente> |
| Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 13.0 (Queens) | CC: | dcadzow, johfulto, mburns, mgeary, sputhenp, yrabl |
| Target Milestone: | z7 | Keywords: | FeatureBackport, TestOnly, Triaged, ZStream |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-8.0.0-0.20180103192340.el7ost puppet-tripleo-8.1.1-0.20180102165828.el7ost | Doc Type: | Enhancement |
| Doc Text: | This update adds support for a second Ceph storage tier deployment capability through director. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-10 13:05:31 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1420861 | | |
| Bug Blocks: | 1671061 | | |
Description
Gregory Charot
2019-05-27 12:57:34 UTC
The necessary code changes for puppet-tripleo and tripleo-heat-templates already landed in OSP13, as per BZ #1309550.

Tiering of the Ceph pools can also be configured after the overcloud deployment using device classes. For example, assuming operators use the tripleo parameter "CinderRbdExtraPools", as per [1], to create an additional "tier2" pool, it can later be assigned to a specific (ssd) device class with:

# ceph osd crush rule create-replicated faster default host ssd
# ceph osd pool set tier2 crush_rule faster

I think the only piece missing for OSP13 is to backport the docs; what was added in OSP14 via BZ #1654792 should be added to the OSP13 docs as well. The docs change is tracked by BZ #1671061.

1. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/deploying_an_overcloud_with_containerized_red_hat_ceph/index#proc_ceph-configuring-block-storage-use-new-pool-assembly_ceph-second-tier-storage

According to our records, this should be resolved by openstack-tripleo-heat-templates-8.3.1-18.el7ost. This build is available now.

According to our records, this should be resolved by puppet-tripleo-8.4.1-5.el7ost. This build is available now.

The deployment failed with the error:
2019-06-20 15:27:16,798 p=25230 u=mistral | failed: [192.168.24.6 -> 192.168.24.23] (item=[{u'application': u'rbd', u'pg_num': 64, u'name': u'vms', u'rule_na
me': u'standard'}, {'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'vms'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller-0'
, u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'vms', u'size'], u'end': u'2019-06-20 19:27:12.114155', '_ansible_no_log': False, '_ansible_delega
ted_vars': {'ansible_delegated_host': u'192.168.24.23', 'ansible_host': u'192.168.24.23'}, '_ansible_item_result': True, u'changed': True, u'invocation': {u'm
odule_args': {u'creates': None, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool g
et vms size', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin': None}}, u'stdout': u'', 'item': {u'application': u'rbd', u'pg_num': 64
, u'name': u'vms', u'rule_name': u'standard'}, u'delta': u'0:00:00.442574', '_ansible_item_label': {u'application': u'rbd', u'pg_num': 64, u'name': u'vms', u'
rule_name': u'standard'}, u'stderr': u"Error ENOENT: unrecognized pool 'vms'", u'rc': 2, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_res
ult': False, u'start': u'2019-06-20 19:27:11.671581', '_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "cep
h-mon-controller-0", "ceph", "--cluster", "ceph", "osd", "pool", "create", "vms", "64", "64", "standard", "1"], "delta": "0:00:00.423014", "end": "2019-06-20
19:27:16.752713", "item": [{"application": "rbd", "name": "vms", "pg_num": 64, "rule_name": "standard"}, {"_ansible_delegated_vars": {"ansible_delegated_host"
: "192.168.24.23", "ansible_host": "192.168.24.23"}, "_ansible_ignore_errors": null, "_ansible_item_label": {"application": "rbd", "name": "vms", "pg_num": 64
, "rule_name": "standard"}, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-
mon-controller-0", "ceph", "--cluster", "ceph", "osd", "pool", "get", "vms", "size"], "delta": "0:00:00.442574", "end": "2019-06-20 19:27:12.114155", "failed"
: false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool get vms si
ze", "_uses_shell": false, "argv": null, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": {"applica
tion": "rbd", "name": "vms", "pg_num": 64, "rule_name": "standard"}, "msg": "non-zero return code", "rc": 2, "start": "2019-06-20 19:27:11.671581", "stderr":
"Error ENOENT: unrecognized pool 'vms'", "stderr_lines": ["Error ENOENT: unrecognized pool 'vms'"], "stdout": "", "stdout_lines": []}], "msg": "non-zero retur
n code", "rc": 2, "start": "2019-06-20 19:27:16.329699", "stderr": "Error ENOENT: specified rule standard doesn't exist", "stderr_lines": ["Error ENOENT: spec
ified rule standard doesn't exist"], "stdout": "", "stdout_lines": []}
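For readability, here is the failing command and its stderr extracted from the wrapped JSON log above (no new information, just the relevant fields): the task was creating the "vms" pool against a crush rule named "standard" that did not exist on the cluster at that point, and the nested item shows the preceding `osd pool get vms size` check had already returned "unrecognized pool 'vms'".

    cmd:    docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool create vms 64 64 standard 1
    stderr: Error ENOENT: specified rule standard doesn't exist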
The crush map configuration is:
CephAnsibleExtraConfig:
  create_crush_tree: true
  crush_rules:
    - name: standard
      root: standard_root
      type: rack
      default: true
    - name: fast
      root: fast_root
      type: rack
      default: false
CephPools:
  - name: tier2
    pg_num: 64
    rule_name: fast
    application: rbd
  - name: volumes
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: vms
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: backups
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: images
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: metrics
    pg_num: 64
    rule_name: standard
    application: openstack_gnocchi
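The rule_name values above must match crush rules that ceph-ansible actually creates. As a quick sanity check (an editorial sketch, not commands captured in this report; the pool and rule names are taken from the configuration above), the existing rules and the pool-to-rule mappings can be listed from a controller:

[root@controller-0 ~]# ceph osd crush rule ls                 # rules that actually exist; 'standard' and 'fast' should be listed
[root@controller-0 ~]# ceph osd crush rule dump fast          # inspect the 'fast' rule, including its root and failure domain
[root@controller-0 ~]# ceph osd pool get tier2 crush_rule     # confirm which rule an existing pool is mapped to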
Can you run `ceph osd dump`? Were the pools created? Can you paste your NodeDataLookup param?

1. Heat environment input: http://ix.io/1Mqz
2. TripleO generated inventory: http://ix.io/1Mr0

The NodeDataLookup param was quoted instead of being passed as JSON, as per #1. It looks like it was translated into the inventory as per #2. You can also see the crush_rules in the inventory in #1.

No pools were created. The deployment failed because it tried to create the pool with a rule that didn't yet exist. For example, here I'm re-running the tasks from the ansible log which failed:

[root@controller-0 ~]# docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool create tier2 64 64 fast 1
Error ENOENT: specified rule fast doesn't exist
[root@controller-0 ~]# docker exec ceph-mon-controller-0 ceph --cluster ceph osd pool get tier2 size
Error ENOENT: unrecognized pool 'tier2'
[root@controller-0 ~]#

No fast rule:

[root@controller-0 ~]# ceph osd crush rule ls
replicated_rule
[root@controller-0 ~]#

So why wasn't the rule created? The deployment was run using ceph-ansible 3.2.15 (with the 3-18 ceph container, not the latest), and these are the tasks that would create the crush rule: https://github.com/ceph/ceph-ansible/blob/v3.2.15/roles/ceph-mon/tasks/crush_rules.yml

None of these tasks ran according to the ceph-ansible logs:

[root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "configure crush hierarchy"
[root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "create configured crush rules"
[root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "get id for new default crush rule"
[root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "set_fact info_ceph_default_crush_rule_yaml"
[root@undercloud-0 mistral]# cat ceph-install-workflow.log | grep "insert new default crush rule into daemon to prevent restart"
[root@undercloud-0 mistral]#

We do see that crush_rules.yml was included, but its tasks were skipped: http://ix.io/1Mrh

So we need to see which condition failed and ask whether it's because we didn't pass something we should have (user error) or because it's a bug in ceph-ansible.

(In reply to Gregory Charot from comment #11)
> Can you run `ceph osd dump`? Were the pools created?

[root@controller-0 ~]# ceph osd dump | curl -F 'f:1=<-' ix.io
http://ix.io/1MrN
[root@controller-0 ~]#

No pools were created:

[root@controller-0 ~]# ceph df | curl -F 'f:1=<-' ix.io
http://ix.io/1MrP
[root@controller-0 ~]#

> Can you paste your NodeDataLookup param?

Heat environment input: http://ix.io/1Mqz

I think this input should be pure JSON, not single-quoted JSON.

(In reply to John Fulton from comment #12)
> So why wasn't the rule created?
> ...
> We do see that crush_rules.yml was included, but its tasks were skipped:
> http://ix.io/1Mrh
>
> So we need to see which condition failed and ask whether it's because we didn't
> pass something we should have (user error) or because it's a bug in ceph-ansible.

https://github.com/ceph/ceph-ansible/blob/v3.2.15/roles/ceph-mon/tasks/main.yml#L35

So the THT from comment #10 should have had:

  crush_rule_config: true

Very true, thanks for spotting that!

Yogev,
I restored your original environment files but added the following under CephAnsibleExtraConfig:
  crush_rule_config: true
I then deleted your overcloud and redeployed and it finished deploying Ceph [1].
I'm setting this bug back to ON_QA so that you may test with the additional 'crush_rule_config: true' parameter.
Note that I used your original string for NodeDataLookup. Thanks to Giulio for spotting the missing crush_rule_config.
[1]
[root@controller-0 ~]# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    330GiB     300GiB     30.3GiB      9.20
POOLS:
    NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
    tier2       1      0B       0         62.9GiB       0
    metrics     2      0B       0         31.5GiB       0
    volumes     3      0B       0         31.5GiB       0
    images      4      0B       0         31.5GiB       0
    backups     5      0B       0         31.5GiB       0
    vms         6      0B       0         31.5GiB       0
[root@controller-0 ~]# ceph osd crush rule ls
replicated_rule
standard
fast
[root@controller-0 ~]#
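Since the crush hierarchy itself is built from the osd_crush_location entries in NodeDataLookup, a complementary check (an editorial sketch; this output was not captured in the report) is to confirm that both roots and their racks were created and that tier2 landed on the fast rule:

[root@controller-0 ~]# ceph osd crush tree                    # should show standard_root and fast_root with their racks
[root@controller-0 ~]# ceph osd pool get tier2 crush_rule     # expected: crush_rule: fast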
Verified that it's working with the following configuration:
CephAnsibleExtraConfig:
  create_crush_tree: true
  crush_rule_config: true
  crush_rules:
    - name: standard
      root: standard_root
      type: rack
      default: true
    - name: fast
      root: fast_root
      type: rack
      default: false
CephPools:
  - name: tier2
    pg_num: 64
    rule_name: fast
    application: rbd
  - name: volumes
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: vms
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: backups
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: images
    pg_num: 64
    rule_name: standard
    application: rbd
  - name: metrics
    pg_num: 64
    rule_name: standard
    application: openstack_gnocchi
CephAnsibleDisksConfig:
  devices:
    - '/dev/vdb'
    - '/dev/vdc'
    - '/dev/vdd'
    - '/dev/vde'
    - '/dev/vdf'
  osd_scenario: lvm
  osd_objectstore: bluestore
  journal_size: 512
NodeDataLookup: '{"d336f6d2-60b7-4a50-82d0-2e43c30e47e8": {"osd_crush_location": {"root": "standard_root", "rack": "rack1_std", "host": "ceph-0"}},"6b17e3d9-f3d1-4888-8687-ad98d77cb44f": {"osd_crush_location": {"root": "standard_root", "rack": "rack2_std", "host": "ceph-1"}},"c9c3dd3e-0980-4994-95fa-6478e87f5752": {"osd_crush_location": {"root": "fast_root", "rack": "rack3_std", "host": "ceph-2"}},"58f926d8-5d97-4051-9f31-e76c6b435255": {"osd_crush_location": {"root": "fast_root", "rack": "rack1_fast", "host": "ceph-3"}},"fa18be32-e9e5-4bb2-ac83-4c83c497b9b2": {"osd_crush_location": {"root": "fast_root", "rack": "rack2_fast", "host": "ceph-4"}},"0dee4a82-64cb-41cd-939e-39b784317cab": {"osd_crush_location": {"root": "fast_root", "rack": "rack3_fast", "host": "ceph-5"}}}'
CinderRbdExtraPools:
  - tier2
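As noted in the description, a pool created through CinderRbdExtraPools can also be re-pinned to a device class after deployment. A minimal sketch reusing the commands from the description (the 'faster' rule name and the presence of ssd-class OSDs are assumptions, not part of this verified setup):

# ceph osd crush rule create-replicated faster default host ssd   # replicated rule restricted to the ssd device class
# ceph osd pool set tier2 crush_rule faster                       # move the tier2 pool onto that rule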
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1738