Bug 2369786

Summary: [8.1] prometheus fails with 404 error when mgr daemons are upgraded due to root-CA not in sync post upgrade
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vinayak Papnoi <vpapnoi>
Component: Cephadm Assignee: Kushal Deb <kdeb>
Status: CLOSED ERRATA QA Contact: Vinayak Papnoi <vpapnoi>
Severity: high Docs Contact:
Priority: unspecified    
Version: 8.1 CC: cephqe-warriors, kdeb, rkachach, sabose, tserlin
Target Milestone: --- Keywords: UpgradeBlocker
Target Release: 8.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-19.2.1-215.el9cp Doc Type: Bug Fix
Doc Text:
Cause: Mismatch between the Common Name (CN) used to generate the certificates and the actual CN of the cephadm Root CA loaded in certmgr.
Consequence: The Dashboard fails to reach the Prometheus API and raises related errors due to SSL connectivity issues.
Fix: Ensure that new certificates are issued using the currently active cephadm Root CA.
Result: Certificates signed by cephadm are now correctly aligned with the cephadm Root CA, resolving API connectivity issues and improving consistency.
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-06-26 12:32:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vinayak Papnoi 2025-06-02 13:32:56 UTC
Description of problem:

With the mgmt-gateway service running, upgrading the "mgr" daemons to a newer image that uses the new root CA (cephadm-root-<fsid>) instead of the old one (cephadm-root) causes the prometheus service to fail with the error below:

404 - Not Found
Could not reach Prometheus's API on https://10.0.66.11:29443/internal/prometheus/api/v1 error HTTPSConnectionPool(host='10.0.66.11', port=29443): Max retries exceeded with url: /internal/prometheus/api/v1/rules (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1147)')))
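
The "unable to get local issuer certificate" reason points at a served certificate whose issuer is not the root CA the client trusts. As a rough triage aid, the sketch below compares the issuer CN of a served certificate with the subject CN of a root CA. It assumes both are supplied as text lines in the shape that `openssl x509 -noout -issuer` and `openssl x509 -noout -subject` typically print; the helper names and the fsid value `a1b2c3` are hypothetical. A matching CN does not prove the signature is valid, it only flags the obvious mismatch described here.

```python
import re
from typing import Optional


def common_name(dn_line: str) -> Optional[str]:
    """Return the CN found in a distinguished-name line, or None if absent.

    Accepts shapes such as 'issuer=CN = cephadm-root' or
    'subject= /CN=cephadm-root-a1b2c3' (assumed openssl output styles).
    """
    match = re.search(r"\bCN\s*=\s*([^,/\n]+)", dn_line)
    return match.group(1).strip() if match else None


def issued_by(leaf_issuer_line: str, ca_subject_line: str) -> bool:
    """True when the leaf's issuer CN equals the root CA's subject CN."""
    issuer_cn = common_name(leaf_issuer_line)
    ca_cn = common_name(ca_subject_line)
    return issuer_cn is not None and issuer_cn == ca_cn


if __name__ == "__main__":
    # Leaf signed by the old root, client trusting the new per-cluster root.
    # 'a1b2c3' stands in for a real fsid and is purely illustrative.
    print(issued_by("issuer=CN = cephadm-root",
                    "subject=CN = cephadm-root-a1b2c3"))
```

If this returns False for the certificate served on the gateway port and the root CA currently loaded by the mgr, the situation matches the mismatch reported here.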



Version-Release number of selected component (if applicable):

ceph 8.1

How reproducible:

1/1

Steps to Reproduce:
1. Install a cluster with an 8.0 z-stream release (which uses the cephadm-root CA)
2. Deploy the mgmt-gateway service
3. Upgrade the mgr daemons to the latest 8.1 image using: ceph orch upgrade start <image> --daemon_types mgr
4. Check the logs and the dashboard
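
For step 4, the certificate chain on the gateway port can also be checked directly, without going through the dashboard. The following is a minimal sketch using only the Python standard library. It assumes the expected root CA is available as a PEM file (the path `cephadm-root-ca.pem` is hypothetical), and it turns off hostname checking as a deliberate simplification so that only chain verification is exercised. The host and port are the ones from the error message in this report.

```python
import socket
import ssl
from typing import Optional


def probe(host: str, port: int, cafile: Optional[str] = None,
          timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and report whether the chain verifies.

    When cafile is given, only that CA file is trusted; otherwise the
    system default trust store is used.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # Assumption: skip hostname matching to isolate the issuer check.
    # Chain verification (CERT_REQUIRED) stays enabled.
    ctx.check_hostname = False
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the OpenSSL reason, for example
        # "unable to get local issuer certificate".
        return f"verify failed: {exc.verify_message}"
    except OSError as exc:
        return f"connection failed: {exc}"


if __name__ == "__main__":
    # Hypothetical CA path; host and port taken from the reported error.
    print(probe("10.0.66.11", 29443, cafile="cephadm-root-ca.pem"))
```

A "verify failed" result against the current root CA, alongside "ok" against the old one, would be consistent with certificates still being signed by the old root.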


Actual results:

Prometheus fails with the error shown in the description.


Expected results:

Prometheus should not fail.


Additional info:

Comment 9 errata-xmlrpc 2025-06-26 12:32:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775