Description of problem:
When the CRUSH root bucket carries both the ssd and hdd device classes, querying the mgr restful API fails with a Python traceback.

The root bucket in question:

root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        id -21 class ssd        # do not change unnecessarily
        # weight 0.363
        alg straw2
        hash 0  # rjenkins1
        item osds-0 weight 0.058
        item osds-1 weight 0.058
        item osds-5 weight 0.058
        item osds-4 weight 0.057
        item osds-2 weight 0.072
        item osds-3 weight 0.058
}

# curl -k --noproxy '*' -u admin:a57347d0-f56f-4d1f-8b29-62c296a751ac https://mons-2:8003/osd
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

Corresponding traceback in the mgr log:

2018-06-25 05:59:43.894543 7f2766c5a700  0 mgr[restful] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 678, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 569, in invoke_controller
    result = controller(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/decorators.py", line 33, in decorated
    return f(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/api/osd.py", line 130, in get
    return module.instance.get_osds(pool_id)
  File "/usr/lib64/ceph/mgr/restful/module.py", line 543, in get_osds
    pools_map = self.get_osd_pools()
  File "/usr/lib64/ceph/mgr/restful/module.py", line 516, in get_osd_pools
    pool_osds = common.crush_rule_osds(self.get('osd_map_tree')['nodes'], rule)
  File "/usr/lib64/ceph/mgr/restful/common.py", line 149, in crush_rule_osds
    osds |= _gather_osds(nodes_by_id[step['item']], rule['steps'][i + 1:])
KeyError: -2L

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Deploy Ceph with both HDD- and SSD-based OSDs.
2. Use a crush map whose root bucket combines the ssd and hdd device classes.
3. Enable the mgr restful plugin.
4. curl https://<hostname>:8003/osd

Actual results:
HTTP 500 Internal Server Error; the mgr logs the KeyError traceback above.

Expected results:
The /osd endpoint returns the OSD list.

Additional info:
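Judging from the traceback, crush_rule_osds() builds its id lookup from the nodes in osd_map_tree, which contains only the class-less bucket ids, while the compiled "take" step of a device-class rule references the shadow bucket id (-2, i.e. the "id -2 class hdd" line of root default). A minimal sketch of the failure, using made-up node and step dicts shaped like the structures named in the traceback (illustration only, not code from the module):

    # Made-up data for illustration; shaped like the structures named in the
    # traceback, not taken from the actual mgr module.
    nodes = [
        {'id': -1, 'name': 'default', 'type': 'root', 'children': [-3]},
        {'id': -3, 'name': 'osds-0', 'type': 'host', 'children': [0, 3, 13]},
    ]
    nodes_by_id = {node['id']: node for node in nodes}

    # "step take default class hdd" is compiled against the shadow bucket id
    # -2, which never appears in the class-less tree above:
    step = {'op': 'take', 'item': -2, 'item_name': 'default~hdd'}
    nodes_by_id[step['item']]   # KeyError: -2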
There is nothing special in the mgr.log even with debug_mgr=20, debug_mgrc=20, and debug_ms=1, but I can upload it if needed.

The rules were created with:

# ceph osd crush rule create-replicated cold default host hdd
# ceph osd crush rule create-replicated hot default host ssd

Crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osds-0 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        id -15 class ssd        # do not change unnecessarily
        # weight 0.058
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.019
        item osd.3 weight 0.019
        item osd.13 weight 0.019
}
host osds-1 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        id -16 class ssd        # do not change unnecessarily
        # weight 0.058
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.019
        item osd.4 weight 0.019
        item osd.12 weight 0.019
}
host osds-5 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        id -17 class ssd        # do not change unnecessarily
        # weight 0.058
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 0.019
        item osd.5 weight 0.019
        item osd.14 weight 0.019
}
host osds-4 {
        id -9           # do not change unnecessarily
        id -10 class hdd        # do not change unnecessarily
        id -18 class ssd        # do not change unnecessarily
        # weight 0.057
        alg straw2
        hash 0  # rjenkins1
        item osd.7 weight 0.018
        item osd.16 weight 0.019
        item osd.17 weight 0.019
}
host osds-2 {
        id -11          # do not change unnecessarily
        id -12 class hdd        # do not change unnecessarily
        id -19 class ssd        # do not change unnecessarily
        # weight 0.072
        alg straw2
        hash 0  # rjenkins1
        item osd.6 weight 0.018
        item osd.8 weight 0.018
        item osd.15 weight 0.019
        item osd.18 weight 0.018
}
host osds-3 {
        id -13          # do not change unnecessarily
        id -14 class hdd        # do not change unnecessarily
        id -20 class ssd        # do not change unnecessarily
        # weight 0.058
        alg straw2
        hash 0  # rjenkins1
        item osd.9 weight 0.019
        item osd.10 weight 0.019
        item osd.11 weight 0.019
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        id -21 class ssd        # do not change unnecessarily
        # weight 0.363
        alg straw2
        hash 0  # rjenkins1
        item osds-0 weight 0.058
        item osds-1 weight 0.058
        item osds-5 weight 0.058
        item osds-4 weight 0.057
        item osds-2 weight 0.072
        item osds-3 weight 0.058
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule cold {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}
rule hot {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
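For context: each "id ... class ..." line in the decompiled map above is a per-device-class shadow bucket that CRUSH maintains alongside the real one, and a rule step such as "step take default class hdd" is compiled to that shadow id (-2 here). The shadow tree can be listed with "ceph osd crush tree --show-shadow". A trivial sketch of the naming convention (my illustration, not code from the module):

    # Shadow buckets are named "<bucket>~<class>"; this helper only
    # illustrates the convention.
    def shadow_name(bucket, device_class):
        return '{0}~{1}'.format(bucket, device_class)

    print(shadow_name('default', 'hdd'))   # default~hdd -> id -2 in this map
    print(shadow_name('default', 'ssd'))   # default~ssd -> id -21 in this map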
[root@mons-1 ~]# ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}
This should be fixed by https://github.com/ceph/ceph/pull/21138. We can backport it downstream for the next z-stream release.
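I have not inlined the patch here; as a rough, hypothetical sketch of the defensive side of such a fix (resolving the "take" item with dict.get() instead of indexing, so an id missing from the class-less tree no longer crashes the endpoint), assuming node/rule structures shaped like those in the traceback:

    # Rough, self-contained illustration of the defensive approach; NOT the
    # actual upstream patch.
    def crush_rule_osds(nodes, rule):
        nodes_by_id = {node['id']: node for node in nodes}

        def _gather_osds(node, steps):
            if node['id'] >= 0:              # non-negative id == OSD leaf
                return {node['id']}
            osds = set()
            for i, step in enumerate(steps):
                if step['op'].startswith('choose'):
                    for child_id in node.get('children', []):
                        child = nodes_by_id.get(child_id)
                        if child is not None:
                            osds |= _gather_osds(child, steps[i + 1:])
            return osds

        osds = set()
        for i, step in enumerate(rule['steps']):
            if step['op'] == 'take':
                # was: nodes_by_id[step['item']] -> KeyError on shadow ids
                node = nodes_by_id.get(step['item'])
                if node is not None:
                    osds |= _gather_osds(node, rule['steps'][i + 1:])
        return osds

    # Smoke test with made-up data: the "take -2" step no longer crashes.
    nodes = [{'id': -1, 'children': [0, 1]}, {'id': 0}, {'id': 1}]
    rule = {'steps': [{'op': 'take', 'item': -2},
                      {'op': 'chooseleaf_firstn', 'num': 0, 'type': 'host'}]}
    print(crush_rule_osds(nodes, rule))      # set() -- no KeyError

Note that merely skipping the unknown id only avoids the 500; mapping device-class rules to the correct OSD sets also has to be handled, so refer to the linked PR for the real change.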
FYI: Upstream luminous/mimic backports:

https://github.com/ceph/ceph/pull/26199
https://github.com/ceph/ceph/pull/26200
Created attachment 1529157 [details] Mgr log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475