Bug 1594746

Summary: [SEE/SD][restful-api] curl https://<hostname>:8003/osd fails with python traceback when crushmap uses root with combined device-class
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tomas Petr <tpetr>
Component: Ceph-Mgr Plugins Assignee: Boris Ranto <branto>
Status: CLOSED ERRATA QA Contact: Parikshith <pbyregow>
Severity: medium Docs Contact: Erin Donnelly <edonnell>
Priority: medium    
Version: 3.0 CC: assingh, branto, ceph-eng-bugs, ceph-qe-bugs, edonnell, kdreyer, mkasturi, tchandra, tserlin, vimishra, ykaul
Target Milestone: z1   
Target Release: 3.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.8-77.el7cp Ubuntu: ceph_12.2.8-62redhat1 Doc Type: Bug Fix
Doc Text:
.HDD and SSD devices can now be mixed when accessing the `/osd` endpoint
Previously, the {product} RESTful API did not handle the case where HDD and SSD devices were mixed when accessing the `/osd` endpoint and returned an error. With this update, the OSD traversal algorithm has been improved to handle this scenario as expected.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-07 15:50:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656    
Attachments:
Description Flags
Mgr log none

Description Tomas Petr 2018-06-25 10:21:24 UTC
Description of problem:
Encountered a bug where, when using a root bucket with both ssd and hdd device classes, the RESTful API request fails with a Python traceback:

root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        id -21 class ssd                # do not change unnecessarily
        # weight 0.363
        alg straw2
        hash 0  # rjenkins1
        item osds-0 weight 0.058
        item osds-1 weight 0.058
        item osds-5 weight 0.058
        item osds-4 weight 0.057
        item osds-2 weight 0.072
        item osds-3 weight 0.058
}

# curl -k --noproxy '*' -u admin:a57347d0-f56f-4d1f-8b29-62c296a751ac https://mons-2:8003/osd
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>


2018-06-25 05:59:43.894543 7f2766c5a700  0 mgr[restful] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 678, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 569, in invoke_controller
    result = controller(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/decorators.py", line 33, in decorated
    return f(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/api/osd.py", line 130, in get
    return module.instance.get_osds(pool_id)
  File "/usr/lib64/ceph/mgr/restful/module.py", line 543, in get_osds
    pools_map = self.get_osd_pools()
  File "/usr/lib64/ceph/mgr/restful/module.py", line 516, in get_osd_pools
    pool_osds = common.crush_rule_osds(self.get('osd_map_tree')['nodes'], rule)
  File "/usr/lib64/ceph/mgr/restful/common.py", line 149, in crush_rule_osds
    osds |= _gather_osds(nodes_by_id[step['item']], rule['steps'][i + 1:])
KeyError: -2L
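
Based on the traceback and the crush map, the lookup that fails is in crush_rule_osds(): the class-qualified step "take default class hdd" is encoded in the osd map as the per-class shadow bucket id -2 (default~hdd), but the node list returned by get('osd_map_tree')['nodes'] in this build appears to contain only the base buckets, so nodes_by_id[step['item']] raises KeyError. A minimal Python sketch of that lookup, using abbreviated stand-in data rather than real mgr output:

# Stand-in for self.get('osd_map_tree')['nodes']; only base buckets are
# listed, so there is no entry for the shadow ids -2 (default~hdd) or
# -21 (default~ssd).
nodes = [
    {'id': -1, 'name': 'default', 'type': 'root',
     'children': [-3, -5, -7, -9, -11, -13]},
    {'id': -3, 'name': 'osds-0', 'type': 'host', 'children': [0, 3, 13]},
    {'id': 0, 'name': 'osd.0', 'type': 'osd'},
    # ... remaining hosts and OSDs omitted for brevity
]
nodes_by_id = {node['id']: node for node in nodes}

# First step of the "cold" rule, "take default class hdd", which the osd map
# encodes as the shadow bucket id -2.
step = {'op': 'take', 'item': -2}

bucket = nodes_by_id[step['item']]   # KeyError: -2, matching the traceback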


Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Deploy Ceph with both HDD- and SSD-based OSDs
2. Have a crush map with a root bucket that combines the ssd and hdd device classes
3. Set up the mgr restful plugin
4. curl https://<hostname>:8003/osd (an equivalent Python request is sketched below)
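
For step 4, the same request can also be made from Python; this is just an illustrative sketch using the requests library, with the hostname and admin key taken from the curl example above (substitute your own values):

import requests

# Placeholders from the reproduction environment; the key is the one
# created for the "admin" user of the restful plugin.
HOST = "mons-2"
API_KEY = "a57347d0-f56f-4d1f-8b29-62c296a751ac"

# verify=False mirrors curl's -k flag (self-signed certificate).
resp = requests.get(
    "https://{}:8003/osd".format(HOST),
    auth=("admin", API_KEY),
    verify=False,
)
print(resp.status_code)   # 500 on the affected build
print(resp.text)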

Actual results:
curl returns a 500 Internal Server Error and the mgr restful plugin logs a Python traceback ending in KeyError: -2L.

Expected results:
The /osd endpoint returns the OSD listing without error.

Additional info:

Comment 3 Tomas Petr 2018-06-25 10:27:55 UTC
There is nothing special in the mgr.log even with debug_mgr=20 debug_mgrc=20 debug_ms=1, but I can upload it if needed.

# ceph osd crush rule create-replicated cold default host hdd
# ceph osd crush rule create-replicated hot default host ssd

Crush map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osds-0 {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	id -15 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 0.019
	item osd.3 weight 0.019
	item osd.13 weight 0.019
}
host osds-1 {
	id -5		# do not change unnecessarily
	id -6 class hdd		# do not change unnecessarily
	id -16 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.1 weight 0.019
	item osd.4 weight 0.019
	item osd.12 weight 0.019
}
host osds-5 {
	id -7		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	id -17 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.2 weight 0.019
	item osd.5 weight 0.019
	item osd.14 weight 0.019
}
host osds-4 {
	id -9		# do not change unnecessarily
	id -10 class hdd		# do not change unnecessarily
	id -18 class ssd		# do not change unnecessarily
	# weight 0.057
	alg straw2
	hash 0	# rjenkins1
	item osd.7 weight 0.018
	item osd.16 weight 0.019
	item osd.17 weight 0.019
}
host osds-2 {
	id -11		# do not change unnecessarily
	id -12 class hdd		# do not change unnecessarily
	id -19 class ssd		# do not change unnecessarily
	# weight 0.072
	alg straw2
	hash 0	# rjenkins1
	item osd.6 weight 0.018
	item osd.8 weight 0.018
	item osd.15 weight 0.019
	item osd.18 weight 0.018
}
host osds-3 {
	id -13		# do not change unnecessarily
	id -14 class hdd		# do not change unnecessarily
	id -20 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.9 weight 0.019
	item osd.10 weight 0.019
	item osd.11 weight 0.019
}
root default {
	id -1		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	id -21 class ssd		# do not change unnecessarily
	# weight 0.363
	alg straw2
	hash 0	# rjenkins1
	item osds-0 weight 0.058
	item osds-1 weight 0.058
	item osds-5 weight 0.058
	item osds-4 weight 0.057
	item osds-2 weight 0.072
	item osds-3 weight 0.058
}

# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
rule cold {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default class hdd
	step chooseleaf firstn 0 type host
	step emit
}
rule hot {
	id 2
	type replicated
	min_size 1
	max_size 10
	step take default class ssd
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map



[root@mons-1 ~]# ceph osd crush show-tunables 
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

Comment 6 Boris Ranto 2019-01-10 11:46:36 UTC
This should be fixed by

https://github.com/ceph/ceph/pull/21138

We can back-port it downstream for the next z-stream release.
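
For reference, the doc text describes the change as an improvement to the OSD traversal algorithm. A hedged sketch of the general idea, not the code from the pull request above: rather than looking up the (missing) shadow bucket id directly, the traversal can start from the base bucket and filter leaf OSDs by device class:

def gather_osds_by_class(bucket, nodes_by_id, device_class=None):
    """Recursively collect OSD ids under `bucket`, optionally keeping only
    OSDs whose device_class matches (e.g. 'hdd' for "take default class hdd").

    Illustrative only; the upstream fix may differ in detail.
    """
    if bucket['type'] == 'osd':
        if device_class is None or bucket.get('device_class') == device_class:
            return {bucket['id']}
        return set()
    osds = set()
    for child_id in bucket.get('children', []):
        child = nodes_by_id.get(child_id)
        if child is not None:
            osds |= gather_osds_by_class(child, nodes_by_id, device_class)
    return osds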

Comment 7 Boris Ranto 2019-01-30 11:09:08 UTC
FYI: Upstream luminous/mimic back-ports:

https://github.com/ceph/ceph/pull/26199
https://github.com/ceph/ceph/pull/26200

Comment 13 Parikshith 2019-02-11 16:42:51 UTC
Created attachment 1529157 [details]
Mgr log

Comment 21 errata-xmlrpc 2019-03-07 15:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475