Bug 1594746 - [SEE/SD][restful-api] curl https://<hostname>:8003/osd fails with python traceback when crushmap uses root with combined device-class
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Mgr Plugins
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 3.2
Assignee: Boris Ranto
QA Contact: Parikshith
Docs Contact: Erin Donnelly
URL:
Whiteboard:
Depends On:
Blocks: 1629656
 
Reported: 2018-06-25 10:21 UTC by Tomas Petr
Modified: 2019-12-02 13:35 UTC
CC List: 11 users

Fixed In Version: RHEL: ceph-12.2.8-77.el7cp Ubuntu: ceph_12.2.8-62redhat1
Doc Type: Bug Fix
Doc Text:
.HDD and SSD devices can now be mixed when accessing the `/osd` endpoint
Previously, the {product} RESTful API did not handle the case where HDD and SSD devices were mixed under the same CRUSH root, and accessing the `/osd` endpoint returned an error. With this update, the OSD traversal algorithm has been improved to handle this scenario as expected.
Clone Of:
Environment:
Last Closed: 2019-03-07 15:50:55 UTC
Embargoed:


Attachments
Mgr log (4.84 MB, text/plain)
2019-02-11 16:42 UTC, Parikshith


Links
Red Hat Product Errata RHBA-2019:0475 (Last Updated: 2019-03-07 15:51:06 UTC)

Description Tomas Petr 2018-06-25 10:21:24 UTC
Description of problem:
Encountered a bug where, when a CRUSH root combines both ssd and hdd device classes, the restful API call fails with a Python traceback:

root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        id -21 class ssd                # do not change unnecessarily
        # weight 0.363
        alg straw2
        hash 0  # rjenkins1
        item osds-0 weight 0.058
        item osds-1 weight 0.058
        item osds-5 weight 0.058
        item osds-4 weight 0.057
        item osds-2 weight 0.072
        item osds-3 weight 0.058
}

# curl -k --noproxy '*' -u admin:a57347d0-f56f-4d1f-8b29-62c296a751ac https://mons-2:8003/osd
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>


2018-06-25 05:59:43.894543 7f2766c5a700  0 mgr[restful] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 678, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 569, in invoke_controller
    result = controller(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/decorators.py", line 33, in decorated
    return f(*args, **kwargs)
  File "/usr/lib64/ceph/mgr/restful/api/osd.py", line 130, in get
    return module.instance.get_osds(pool_id)
  File "/usr/lib64/ceph/mgr/restful/module.py", line 543, in get_osds
    pools_map = self.get_osd_pools()
  File "/usr/lib64/ceph/mgr/restful/module.py", line 516, in get_osd_pools
    pool_osds = common.crush_rule_osds(self.get('osd_map_tree')['nodes'], rule)
  File "/usr/lib64/ceph/mgr/restful/common.py", line 149, in crush_rule_osds
    osds |= _gather_osds(nodes_by_id[step['item']], rule['steps'][i + 1:])
KeyError: -2L
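
The KeyError pinpoints the failure: rules created with a device class ("step take default class hdd") reference the CRUSH shadow bucket id (-2 here), but the node map that crush_rule_osds() builds from the OSD map tree indexes only the base buckets. A minimal sketch of that lookup failure (the node layout below is illustrative, not the plugin's actual data):

# Illustrative sketch of the failing lookup in crush_rule_osds(); the node
# list mimics a tree that lacks shadow entries, and the ids are made up.
nodes = [
    {'id': -1, 'name': 'default', 'type': 'root', 'children': [-3]},
    {'id': -3, 'name': 'osds-0', 'type': 'host', 'children': [0]},
    {'id': 0, 'name': 'osd.0', 'type': 'osd'},
]
nodes_by_id = {node['id']: node for node in nodes}

step = {'op': 'take', 'item': -2}  # "take default class hdd" -> shadow id -2
try:
    nodes_by_id[step['item']]
except KeyError as exc:
    print('KeyError: %s' % exc)  # prints "KeyError: -2", as in the traceback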


Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp

How reproducible:
Always

Steps to Reproduce:
1. Deploy Ceph with both HDD- and SSD-based OSDs
2. Use a crush map whose root bucket combines the ssd and hdd device classes
3. Enable the mgr restful plugin
4. curl https://<hostname>:8003/osd (a Python equivalent is sketched below)
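
For scripted reproduction, step 4 can also be driven from Python. A hypothetical equivalent of the curl call; the hostname and API key are the placeholders from this report, not working credentials:

# Hypothetical Python equivalent of the curl in step 4.
import requests

resp = requests.get(
    'https://mons-2:8003/osd',
    auth=('admin', 'a57347d0-f56f-4d1f-8b29-62c296a751ac'),
    verify=False,  # same effect as curl -k for the self-signed mgr cert
)
print(resp.status_code)  # 500 on affected builds, 200 once fixed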

Actual results:
The curl request returns a 500 Internal Server Error page, and the mgr log shows the KeyError traceback above.

Expected results:
The /osd endpoint returns the OSD listing without error.

Additional info:

Comment 3 Tomas Petr 2018-06-25 10:27:55 UTC
There is nothing special in the mgr.log even with debug_mgr=20 debug_mgrc=20 debug_ms=1, but I can upload it if needed.

# ceph osd crush rule create-replicated cold default host hdd
# ceph osd crush rule create-replicated hot default host ssd

Crush map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osds-0 {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	id -15 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 0.019
	item osd.3 weight 0.019
	item osd.13 weight 0.019
}
host osds-1 {
	id -5		# do not change unnecessarily
	id -6 class hdd		# do not change unnecessarily
	id -16 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.1 weight 0.019
	item osd.4 weight 0.019
	item osd.12 weight 0.019
}
host osds-5 {
	id -7		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	id -17 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.2 weight 0.019
	item osd.5 weight 0.019
	item osd.14 weight 0.019
}
host osds-4 {
	id -9		# do not change unnecessarily
	id -10 class hdd		# do not change unnecessarily
	id -18 class ssd		# do not change unnecessarily
	# weight 0.057
	alg straw2
	hash 0	# rjenkins1
	item osd.7 weight 0.018
	item osd.16 weight 0.019
	item osd.17 weight 0.019
}
host osds-2 {
	id -11		# do not change unnecessarily
	id -12 class hdd		# do not change unnecessarily
	id -19 class ssd		# do not change unnecessarily
	# weight 0.072
	alg straw2
	hash 0	# rjenkins1
	item osd.6 weight 0.018
	item osd.8 weight 0.018
	item osd.15 weight 0.019
	item osd.18 weight 0.018
}
host osds-3 {
	id -13		# do not change unnecessarily
	id -14 class hdd		# do not change unnecessarily
	id -20 class ssd		# do not change unnecessarily
	# weight 0.058
	alg straw2
	hash 0	# rjenkins1
	item osd.9 weight 0.019
	item osd.10 weight 0.019
	item osd.11 weight 0.019
}
root default {
	id -1		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	id -21 class ssd		# do not change unnecessarily
	# weight 0.363
	alg straw2
	hash 0	# rjenkins1
	item osds-0 weight 0.058
	item osds-1 weight 0.058
	item osds-5 weight 0.058
	item osds-4 weight 0.057
	item osds-2 weight 0.072
	item osds-3 weight 0.058
}

# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
rule cold {
	id 1
	type replicated
	min_size 1
	max_size 10
	step take default class hdd
	step chooseleaf firstn 0 type host
	step emit
}
rule hot {
	id 2
	type replicated
	min_size 1
	max_size 10
	step take default class ssd
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map
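
Note the duplicate "id ... class ..." lines in each bucket above: those are the per-class shadow bucket ids, so "step take default class hdd" resolves to -2 and "step take default class ssd" to -21. A sketch of how indexing those shadow ids alongside the base ids (assuming the usual "name~class" shadow naming) would let the lookup succeed:

# Sketch only: index base and per-class shadow ids so a class-qualified
# "take" step finds its bucket. The mapping is hand-copied from the crush
# map above; real code would parse the decompiled map instead.
buckets = {
    'default': {'id': -1, 'class_ids': {'hdd': -2, 'ssd': -21}},
    'osds-0':  {'id': -3, 'class_ids': {'hdd': -4, 'ssd': -15}},
}
nodes_by_id = {}
for name, bucket in buckets.items():
    nodes_by_id[bucket['id']] = name
    for dev_class, shadow_id in bucket['class_ids'].items():
        nodes_by_id[shadow_id] = '%s~%s' % (name, dev_class)

print(nodes_by_id[-2])   # default~hdd, the target of "take default class hdd"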



[root@mons-1 ~]# ceph osd crush show-tunables 
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

Comment 6 Boris Ranto 2019-01-10 11:46:36 UTC
This should be fixed by

https://github.com/ceph/ceph/pull/21138

We can back-port it downstream for the next z-stream release.
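
Loosely speaking, the fix has to stop assuming that every "take" target exists in the plain node index. One possible shape of such a traversal, sketched here as an illustration only (this is not the code from the pull request):

# Illustrative class-tolerant gather: skip ids missing from the indexed
# tree instead of raising KeyError on shadow bucket ids.
def gather_osds(node, nodes_by_id):
    # Recursively collect OSD ids under a bucket node.
    if node['type'] == 'osd':
        return {node['id']}
    osds = set()
    for child_id in node.get('children', []):
        child = nodes_by_id.get(child_id)
        if child is not None:
            osds |= gather_osds(child, nodes_by_id)
    return osds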

Comment 7 Boris Ranto 2019-01-30 11:09:08 UTC
FYI: Upstream luminous/mimic back-ports:

https://github.com/ceph/ceph/pull/26199
https://github.com/ceph/ceph/pull/26200

Comment 13 Parikshith 2019-02-11 16:42:51 UTC
Created attachment 1529157 [details]
Mgr log

Comment 21 errata-xmlrpc 2019-03-07 15:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0475

