Description of problem:

- Post RHCS upgrade (from Ceph 4.3 -> 5.3), getting a 404 error in the dashboard while connecting to Prometheus.

Version-Release number of selected component (if applicable):

- RHCS 5

How reproducible:

Post upgrade, log in to the Ceph dashboard and check.

Steps to Reproduce:
1. Deploy an RHCS cluster and make sure that the mgr and Prometheus are running on different machines.
2. Follow the upgrade guide and upgrade RHCS 4 -> RHCS 5.
3. Post upgrade, log in to the Ceph dashboard.
4. Check the connectivity between the active mgr and Prometheus.

Actual results:
- Getting a 404 error in the Ceph dashboard.

Expected results:
- Post upgrade, the dashboard should work without any issues.
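For step 4, a minimal connectivity check can be run from the active mgr host against the Prometheus API endpoint. This is a sketch, assuming the `node4:9095` endpoint seen later in this report; adjust the host and port for your cluster:

```
# Run from the active mgr host; /api/v1/rules is the path the
# dashboard proxies. A "No route to host" or timeout here points
# at a network/firewall problem rather than the dashboard itself.
curl -sS http://node4:9095/api/v1/rules | head -c 200
```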
### Additional Info

- Workaround: add a firewall rule on the Prometheus server for port 9095/tcp.

### Test results from my lab

1. Upgraded the cluster from RHCS 4 to RHCS 5 and checked the `prometheus` daemon and `ceph config dump`:

```
[ceph: root@node1 /]# ceph orch ps --daemon_type prometheus
NAME              HOST   PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
prometheus.node4  node4         running (9m)  9m ago     6w   217M     -        2.22.2   ec2d358ca73c  db67f744cb3c
[ceph: root@node1 /]#
[ceph: root@node1 /]# ceph config dump
WHO    MASK        LEVEL     OPTION                                     VALUE                                                                                                               RO
global             advanced  cluster_network                            192.168.122.0/24                                                                                                    *
global             basic     container_image                            registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:04c39425bc9e05e667ebe23513847b905b5998994cc95572c6a4549b8826bd81   *
global             advanced  osd_pool_default_crush_rule                -1
global             advanced  public_network                             192.168.122.0/24                                                                                                    *
mgr                advanced  mgr/balancer/active                        true
mgr                advanced  mgr/cephadm/autotune_memory_target_ratio   0.700000                                                                                                            *
mgr                advanced  mgr/cephadm/container_image_alertmanager   registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.6                                                     *
mgr                advanced  mgr/cephadm/container_image_base           registry.redhat.io/rhceph/rhceph-5-rhel8
mgr                advanced  mgr/cephadm/container_image_grafana        registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5                                                                *
mgr                advanced  mgr/cephadm/container_image_node_exporter  registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6                                                    *
mgr                advanced  mgr/cephadm/container_image_prometheus     registry.redhat.io/openshift4/ose-prometheus:v4.6                                                                  *
mgr                advanced  mgr/cephadm/migration_current              5                                                                                                                   *
mgr                advanced  mgr/dashboard/node1/server_addr            192.168.122.139                                                                                                     *
mgr                advanced  mgr/dashboard/node2/server_addr            192.168.122.60                                                                                                      *
mgr                advanced  mgr/dashboard/node3/server_addr            192.168.122.218                                                                                                     *
mgr                advanced  mgr/dashboard/ALERTMANAGER_API_HOST        http://node4:9093                                                                                                   *
mgr                advanced  mgr/dashboard/GRAFANA_API_PASSWORD         redhat123                                                                                                           *
mgr                advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY       false                                                                                                               *
mgr                advanced  mgr/dashboard/GRAFANA_API_URL              https://node4:3000                                                                                                  *
mgr                advanced  mgr/dashboard/GRAFANA_API_USERNAME         admin                                                                                                               *
mgr                advanced  mgr/dashboard/PROMETHEUS_API_HOST          http://node4:9095                                                                                                   *
mgr                advanced  mgr/dashboard/server_port                  8443                                                                                                                *
mgr                advanced  mgr/dashboard/ssl                          true                                                                                                                *
mgr                advanced  mgr/dashboard/ssl_server_port              8443                                                                                                                *
mgr                advanced  mgr/orchestrator/orchestrator              cephadm
osd    host:node1  basic     osd_memory_target                          4294967296
osd    host:node2  basic     osd_memory_target                          4294967296
osd    host:node3  basic     osd_memory_target                          4294967296
[ceph: root@node1 /]#
```

2. Check the dashboard and confirm whether the 404 error is present or not.
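For reference, the `PROMETHEUS_API_HOST` value seen in the config dump can also be read (and, if it were wrong, corrected) through the dashboard module itself. A minimal sketch, assuming the same `node4:9095` endpoint from this lab:

```
# Show the Prometheus API host the dashboard is configured to use
ceph dashboard get-prometheus-api-host

# Point the dashboard at the correct endpoint if needed
# (node4:9095 is this lab's Prometheus; adjust as appropriate)
ceph dashboard set-prometheus-api-host http://node4:9095
```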
3. As part of troubleshooting, I can see the below error on the active mgr:

```
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: debug 2023-04-12T19:47:46.568+0000 7f9e2e72a700  0 [dashboard ERROR exception] Dashboard Exception
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: Traceback (most recent call last):
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connection.py", line 162, in _new_conn
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     (self._dns_host, self.port), self.timeout, **extra_kw)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     raise err
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     sock.connect(sa)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: OSError: [Errno 113] No route to host
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: During handling of the above exception, another exception occurred:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: Traceback (most recent call last):
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     chunked=chunked)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     conn.request(method, url, **httplib_request_kw)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib64/python3.6/http/client.py", line 1273, in request
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self._send_request(method, url, body, headers, encode_chunked)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib64/python3.6/http/client.py", line 1319, in _send_request
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self.endheaders(body, encode_chunked=encode_chunked)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib64/python3.6/http/client.py", line 1268, in endheaders
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self._send_output(message_body, encode_chunked=encode_chunked)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib64/python3.6/http/client.py", line 1044, in _send_output
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self.send(msg)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib64/python3.6/http/client.py", line 982, in send
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self.connect()
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connection.py", line 184, in connect
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     conn = self._new_conn()
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     self, "Failed to establish a new connection: %s" % e)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f9e2b6e9588>: Failed to establish a new connection: [Errno 113] >
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: During handling of the above exception, another exception occurred:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: Traceback (most recent call last):
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     timeout=timeout
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     _stacktrace=sys.exc_info()[2])
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     raise MaxRetryError(_pool, url, error or ResponseError(cause))
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='node4', port=9095): Max retries exceeded with url: /api/v1/rules (Caused by NewConn>
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: During handling of the above exception, another exception occurred:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: Traceback (most recent call last):
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 49, in _proxy
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     json=payload, verify=verify)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/requests/api.py", line 60, in request
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     return session.request(method=method, url=url, **kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     resp = self.send(prep, **send_kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     r = adapter.send(request, **kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     raise ConnectionError(e, request=request)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: requests.exceptions.ConnectionError: HTTPConnectionPool(host='node4', port=9095): Max retries exceeded with url: /api/v1/rules (Caused by NewC>
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: During handling of the above exception, another exception occurred:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: Traceback (most recent call last):
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     return handler(*args, **kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     return self.callable(*self.args, **self.kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 258, in inner
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     ret = func(*args, **kwargs)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 191, in wrapper
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     return func(*vpath, **params)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 71, in rules
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     return self.prometheus_proxy('GET', '/rules', params)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 34, in prometheus_proxy
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     verify=Settings.PROMETHEUS_API_SSL_VERIFY)
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:   File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 54, in _proxy
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]:     component='prometheus')
Apr 13 01:17:46 node1 ceph-2c4dde66-4207-463b-bcea-b994c7f4e7ba-mgr-node1[1859]: dashboard.exceptions.DashboardException: Could not reach Prometheus's API on http://node4:9095/api/v1
```
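The endpoint named in the final `DashboardException` can be probed directly from the mgr host to separate a connectivity failure from a genuine dashboard-side 404. A minimal check, using the `node4:9095` endpoint from this environment:

```
# Query the same Prometheus API path the dashboard proxy uses
# ("No route to host" here confirms a network/firewall problem,
# not a fault in the dashboard code).
curl -v http://node4:9095/api/v1/rules
```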
4. From the active mgr, port `9095` was not accessible:

```
[root@node1 ~]# cephadm shell -- ceph -s
Inferring fsid 2c4dde66-4207-463b-bcea-b994c7f4e7ba
Using recent ceph image registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:957294824e1cbf89ca24a1a2aa2a8e8acd567cfb5a25535e2624989ad1046a60
  cluster:
    id:     2c4dde66-4207-463b-bcea-b994c7f4e7ba
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum node2,node1,node3 (age 19m)
    mgr: node1(active, since 19m), standbys: node2, node3
    osd: 3 osds: 3 up (since 19m), 3 in (since 6w)

  data:
    pools:   1 pools, 1 pgs
    objects: 1 objects, 0 B
    usage:   109 MiB used, 60 GiB / 60 GiB avail
    pgs:     1 active+clean

[root@node1 ~]# telnet 192.168.122.133 9095
Trying 192.168.122.133...
telnet: connect to address 192.168.122.133: No route to host
[root@node1 ~]# ping -c2 192.168.122.133
PING 192.168.122.133 (192.168.122.133) 56(84) bytes of data.
64 bytes from 192.168.122.133: icmp_seq=1 ttl=64 time=0.311 ms
64 bytes from 192.168.122.133: icmp_seq=2 ttl=64 time=0.204 ms

--- 192.168.122.133 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.204/0.257/0.311/0.055 ms
[root@node1 ~]#
```

5. But on the `prometheus` node, I can see that the port is listening and there is no firewall rule for it. So I added the firewall rule on the `prometheus` node as below:

```
[root@node4 ~]# ss -plunt | grep 9095
tcp   LISTEN  0  128  *:9095  *:*  users:(("prometheus",pid=1842,fd=8))
[root@node4 ~]#
[root@node4 ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp1s0
  sources:
  services: cockpit dhcpv6-client ssh
  ports: 9100/tcp 3000/tcp 9092/tcp 9093/tcp 9094/tcp 9094/udp
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
[root@node4 ~]#
[root@node4 ~]# firewall-cmd --permanent --add-port=9095/tcp
success
[root@node4 ~]# firewall-cmd --reload
success
[root@node4 ~]# firewall-cmd --list-ports
3000/tcp 9092/tcp 9093/tcp 9094/tcp 9095/tcp 9100/tcp 9094/udp
[root@node4 ~]#
```

6. After adding the firewall rule, check whether port `9095` is accessible from the active mgr (a broader sweep of the monitoring ports is sketched after this list):

```
[root@node1 ~]# telnet 192.168.122.133 9095
Trying 192.168.122.133...
Connected to 192.168.122.133.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
[root@node1 ~]#
```

The above results show it connects without any issues.

7. So check from the dashboard and confirm whether the error is still present or not.

Let me know if any further data from the environment is needed to proceed.

Regards,
Geo Jose
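As a follow-up to step 6, a small loop can verify all of the monitoring-stack ports from the active mgr at once. This is a sketch using bash's `/dev/tcp` pseudo-device; the host `node4` and the port list (Grafana 3000, Alertmanager 9093, Prometheus 9095, node-exporter 9100) are assumed from this lab's config dump:

```
#!/bin/bash
# Check TCP reachability of the monitoring-stack ports on node4.
for port in 3000 9093 9095 9100; do
    if timeout 3 bash -c "</dev/tcp/node4/${port}" 2>/dev/null; then
        echo "node4:${port} reachable"
    else
        echo "node4:${port} NOT reachable"
    fi
done
```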
Missed the 5.3 z4 deadline. Moving from z4 to z5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:4213