Description of problem:

When the "rbd_stats_pools" list is empty

```
sh-4.4# ceph config get mgr mgr/prometheus/rbd_stats_pools
```

and I curl the ceph-mgr metrics endpoint, I get the following output

```
$ curl -XGET 10.107.123.127:9283/metrics
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
# HELP ceph_mon_quorum_status Monitors in quorum
# TYPE ceph_mon_quorum_status gauge
ceph_mon_quorum_status{ceph_daemon="mon.a"} 1.0
ceph_mon_quorum_status{ceph_daemon="mon.b"} 1.0
ceph_mon_quorum_status{ceph_daemon="mon.c"} 1.0
# HELP ceph_fs_metadata FS Metadata
# TYPE ceph_fs_metadata untyped
# HELP ceph_mds_metadata MDS Metadata
# TYPE ceph_mds_metadata untyped
# HELP ceph_mon_metadata MON Metadata
# TYPE ceph_mon_metadata untyped
ceph_mon_metadata{ceph_daemon="mon.a",hostname="minikube",public_addr="10.104.211.218",rank="0",ceph_version="ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)"} 1.0
ceph_mon_metadata{ceph_daemon="mon.b",hostname="minikube",public_addr="10.96.251.169",rank="1",ceph_version="ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)"} 1.0
ceph_mon_metadata{ceph_daemon="mon.c",hostname="minikube",public_addr="10.109.161.142",rank="2",ceph_version="ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)"} 1.0
```

But when I set "rbd_stats_pools" to some value

```
sh-4.4# ceph config set mgr mgr/prometheus/rbd_stats_pools replicapool
sh-4.4# ceph config get mgr mgr/prometheus/rbd_stats_pools
replicapool
```

and curl the ceph-mgr metrics endpoint, I hit the following error

```
$ curl -XGET 10.107.123.127:9283/metrics
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
    <title>500 Internal Server Error</title>
    <style type="text/css">
    #powered_by {
        margin-top: 20px;
        border-top: 2px solid black;
        font-style: italic;
    }

    #traceback {
        color: red;
    }
    </style>
</head>
    <body>
        <h2>500 Internal Server Error</h2>
        <p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>
        <pre id="traceback">Traceback (most recent call last):
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 638, in respond
    self._do_respond(path_info)
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 697, in _do_respond
    response.body = self.handler()
  File "/lib/python3.6/site-packages/cherrypy/lib/encoding.py", line 219, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1047, in metrics
    return self._metrics(instance)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1062, in _metrics
    instance.collect_cache = instance.collect()
  File "/usr/share/ceph/mgr/prometheus/module.py", line 965, in collect
    self.get_rbd_stats()
  File "/usr/share/ceph/mgr/prometheus/module.py", line 726, in get_rbd_stats
    'rbd_stats_pools_refresh_interval', 300)
TypeError: unsupported operand type(s) for +: 'int' and 'str'
</pre>
    <div id="powered_by">
      <span>
        Powered by <a href="http://www.cherrypy.org">CherryPy 18.4.0</a>
      </span>
    </div>
    </body>
</html>
```

This error disappears as soon as I reset "rbd_stats_pools" to an empty list.

Version-Release number of selected component (if applicable):
ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)

How reproducible:
100%

Steps to Reproduce:
1. Refer to description

Actual results:
I get a 500 Internal Server Error with the traceback shown above.

Expected results:
I should get the exported metrics list, as in any other case.

Additional info:
Tried on Rook-Ceph
Looked a little more into it. Turns out we hit this error only when we do `ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval <ANY_INTERVAL>`. If we do `ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval ""`, the error disappears and stats collection starts again.
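For anyone following along, here is a minimal sketch of how a value set through `ceph config set` could produce the `TypeError` from the traceback. This is not the code from `module.py`: the function names (`refresh_due_buggy`, `coerce_interval`, `refresh_due`) are hypothetical, and it assumes the option value reaches the module as a string when no type is declared for it, and that an empty value means "use the default of 300".

```python
def refresh_due_buggy(last_refresh, interval, now):
    # Mirrors the failing pattern: an int timestamp added to a value read
    # from config. If interval is the string "600", this raises TypeError.
    return last_refresh + interval < now


def coerce_interval(raw, default=300):
    """Return raw as an int number of seconds (hypothetical helper).

    Assumption: None or "" means the option is unset, so fall back to
    the default of 300 seconds mentioned in the traceback.
    """
    if raw is None or raw == "":
        return default
    return int(raw)


def refresh_due(last_refresh, raw_interval, now):
    # Same comparison as above, but the interval is coerced to int first.
    return last_refresh + coerce_interval(raw_interval) < now


if __name__ == "__main__":
    try:
        refresh_due_buggy(0, "600", 1000)
    except TypeError as exc:
        print("buggy path failed:", exc)
    print("coerced path:", refresh_due(0, "600", 1000))
```

The defensive cast is only one way to avoid the error; declaring the option with an integer type so the value arrives as an int would address the same failure mode.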
@Umanga: Thanks for digging deeper into this. It is actually an easy fix (https://github.com/ceph/ceph/pull/36102). Please review so we can get this fixed and backported quickly.
Boris, as commented in that PR, I think the missing backport here is https://github.com/ceph/ceph/pull/33991/commits/6d5f88450e61122016fcf7b6cf9431dc67128d3d (not in Nautilus/4.*)
@Umanga: You mentioned you were able to hit this in octopus, did it have the python fix Ernesto mentioned above in it? You should be able to check that by looking inside the '/usr/share/ceph/mgr/prometheus/module.py' in the ceph-mgr container.
(In reply to Boris Ranto from comment #8)
> @Umanga: You mentioned you were able to hit this in octopus, did it have the
> python fix Ernesto mentioned above in it? You should be able to check that
> by looking inside the '/usr/share/ceph/mgr/prometheus/module.py' in the
> ceph-mgr container.

I no longer have that cluster to verify this, but I don't think it had the fix: when I checked the default value, it was empty rather than 300 as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0081