Bug 2369786 - [8.1] prometheus fails with 404 error when mgr daemons are upgraded due to root-CA not in sync post upgrade
Summary: [8.1] prometheus fails with 404 error when mgr daemons are upgraded due to ro...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.1
Assignee: Kushal Deb
QA Contact: Vinayak Papnoi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-06-02 13:32 UTC by Vinayak Papnoi
Modified: 2025-06-26 12:32 UTC (History)

Fixed In Version: ceph-19.2.1-215.el9cp
Doc Type: Bug Fix
Doc Text:
Cause: The Common Name (CN) used to generate the certificates did not match the actual CN of the cephadm Root CA loaded in certmgr.
Consequence: The Dashboard fails to reach the Prometheus API and raises related errors due to SSL connectivity issues.
Fix: New certificates are now issued using the currently active cephadm Root CA.
Result: Certificates signed by cephadm are correctly aligned with the cephadm Root CA, which resolves the API connectivity issues and improves consistency.
Clone Of:
Environment:
Last Closed: 2025-06-26 12:32:38 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-11513 0 None None None 2025-06-02 13:34:45 UTC
Red Hat Product Errata RHSA-2025:9775 0 None None None 2025-06-26 12:32:42 UTC

Description Vinayak Papnoi 2025-06-02 13:32:56 UTC
Description of problem:

With the mgmt-gateway service running, when the "mgr" daemons are upgraded to a newer image that uses the new root CA (cephadm-root-<fsid>) instead of the old one (cephadm-root), the Prometheus service fails with the error below:

404 - Not Found
Could not reach Prometheus's API on https://10.0.66.11:29443/internal/prometheus/api/v1 error HTTPSConnectionPool(host='10.0.66.11', port=29443): Max retries exceeded with url: /internal/prometheus/api/v1/rules (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1147)')))
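
The "unable to get local issuer certificate" failure above is consistent with a certificate whose issuer is not the root CA the client trusts. As a rough diagnostic sketch (not the cephadm implementation), the helper below compares the issuer CN of a certificate with the subject CN of a root CA. It assumes both are supplied as dictionaries in the format returned by Python's `ssl.SSLSocket.getpeercert()`; the function names are hypothetical, and a CN comparison is only a hint, since real verification checks signatures rather than names.

```python
from typing import Optional

# CA names taken from this report: the old image uses "cephadm-root",
# the new image uses "cephadm-root-<fsid>".
LEGACY_ROOT_CN = "cephadm-root"
FSID_ROOT_PREFIX = "cephadm-root-"


def common_name(name) -> Optional[str]:
    """Return the commonName from a name in ssl.getpeercert() format.

    The format is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    """
    for rdn in name:
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def issuer_matches_root(cert: dict, root_ca: dict) -> bool:
    """True if the certificate's issuer CN equals the root CA's subject CN."""
    issuer_cn = common_name(cert.get("issuer", ()))
    root_cn = common_name(root_ca.get("subject", ()))
    return issuer_cn is not None and issuer_cn == root_cn


# Hypothetical example: a certificate issued by the old CA, checked against
# the new per-cluster CA (the fsid is left as a placeholder).
leaf = {"issuer": ((("commonName", LEGACY_ROOT_CN),),)}
root = {"subject": ((("commonName", FSID_ROOT_PREFIX + "<fsid>"),),)}
print(issuer_matches_root(leaf, root))  # False
```

A `False` result here only indicates a name mismatch of the kind described in this bug; it does not prove or disprove that the chain verifies.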



Version-Release number of selected component (if applicable):

ceph 8.1

How reproducible:

1/1

Steps to Reproduce:
1. Install a cluster with an 8.0 z-stream release (which uses the cephadm-root CA)
2. Deploy the mgmt-gateway service
3. Upgrade the mgr daemons to the latest 8.1 image using: ceph orch upgrade start <image> --daemon_types mgr
4. Check the logs and the dashboard
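
For step 4, a small script can help spot the failure in collected log text. The sketch below is only an illustration: it assumes the logs have already been gathered into a string, and the function name is hypothetical. It flags lines containing both markers from the error quoted in the description.

```python
from typing import List

# Markers copied from the error message in this report.
SIGNATURES = (
    "CERTIFICATE_VERIFY_FAILED",
    "unable to get local issuer certificate",
)


def find_ca_trust_errors(log_text: str) -> List[str]:
    """Return the log lines that contain every marker in SIGNATURES."""
    return [
        line
        for line in log_text.splitlines()
        if all(marker in line for marker in SIGNATURES)
    ]
```

Any returned lines point to the CA trust problem described here rather than to an unrelated Prometheus outage.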


Actual results:

Prometheus fails with the error shown in the description.


Expected results:

Prometheus should not fail after the mgr daemons are upgraded.


Additional info:

Comment 9 errata-xmlrpc 2025-06-26 12:32:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775

