Bug 1890971

Summary: [External] RGW metrics are not available if anything else except 9283 is provided as the monitoring-endpoint-port
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Rachael <rgeorge>
Component: rook
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: assingh, bkunal, edonnell, jthottan, madam, muagarwa, nberry, ocs-bugs, shan
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: OCS 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.6.0-154.ci
Doc Type: Known Issue
Doc Text:
.Prometheus listens only on port 9283
The Prometheus service on `ceph-mgr` in an external cluster is expected to listen on port 9283. Other ports are not supported. Red Hat Ceph Storage administrators must use only port 9283 for the Prometheus exporter.
Story Points: ---
Clone Of:
: 1894412 (view as bug list)
Environment:
Last Closed: 2020-12-17 06:25:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1882359

Comment 3 Sébastien Han 2020-10-28 09:34:45 UTC
Today, Rook's metrics port is not configurable and we don't have time to try any fix for 4.6.
What we can do though is force the RHCS administrator to use 9283 for the prometheus exporter.

Neha, would that work for you?

Comment 4 Neha Berry 2020-11-02 13:34:37 UTC
(In reply to leseb from comment #3)
> Today, Rook's metrics port is not configurable and we don't have time to try
> any fix for 4.6.
> What we can do though is force the RHCS administrator to use 9283 for the
> prometheus exporter.
> 
> Neha, would that work for you?

@Seb do you think this is a possibility? The RHCS cluster might already be in use for other purposes, and I am not sure all RHCS admins would accept changing the port.

BTW, I would also like to get Bipin's views on this, as they would be fielding queries from users who hit the issue.

IMHO, we should allow the port to be changeable from the OCS side, even if it means fixing it in the next release?

Comment 5 Sébastien Han 2020-11-02 15:20:36 UTC
(In reply to Neha Berry from comment #4)
> (In reply to leseb from comment #3)
> > Today, Rook's metrics port is not configurable and we don't have time to try
> > any fix for 4.6.
> > What we can do though is force the RHCS administrator to use 9283 for the
> > prometheus exporter.
> > 
> > Neha, would that work for you?
> 
> @Seb do you think this is a possibility ? The RHCS cluster might already be
> in use for different purposes and not sure if all RHCS admins would accept
> changing the port.

That would mean Prometheus is already enabled and metrics are already being exported to another k8s cluster, which seems very unlikely.
Port 9283 is used only by the Prometheus exporter.

> 
> BTW, I would also like to take views of Bipin on this, as they would be
> fielding queries from the users in case they hit the issue.
> 
> IMHO, we should allow the port to be change-able from OCS side, even if it
> means fixing it in next release ?

OCS is supposed to be opinionated, so things like making the port number configurable are a bit out of scope IMO.

Comment 6 Michael Adam 2020-11-03 09:55:46 UTC
(In reply to leseb from comment #5)
> (In reply to Neha Berry from comment #4)
> > (In reply to leseb from comment #3)
> > > Today, Rook's metrics port is not configurable and we don't have time to try
> > > any fix for 4.6.
> > > What we can do though is force the RHCS administrator to use 9283 for the
> > > prometheus exporter.
> > > 
> > > Neha, would that work for you?
> > 
> > @Seb do you think this is a possibility ? The RHCS cluster might already be
> > in use for different purposes and not sure if all RHCS admins would accept
> > changing the port.
> 
> In this case, this would mean, the prometheus is already enabled and metrics
> are exported onto another k8s cluster, which seems very unlikely.
> The port 9283 is the prometheus exporter only.
> 
> > 
> > BTW, I would also like to take views of Bipin on this, as they would be
> > fielding queries from the users in case they hit the issue.
> > 
> > IMHO, we should allow the port to be change-able from OCS side, even if it
> > means fixing it in next release ?
> 
> OCS is supposed to be opinionated so things like making port number
> configurable are a bit out of scope IMO.


I agree that this is not a blocker for 4.6.
This is rather an RFE, imho.

@Neha, also note that (as far as I know) we currently do not support attaching to arbitrary pre-existing ceph clusters, but require the ceph clusters to be explicitly set up and configured for OCS. So I think it's OK to have a requirement here for the port.

We can still take an RFE for making the port configurable for 4.7.

Do we want to create a doc BZ for this?

Comment 8 Sébastien Han 2020-11-03 15:07:17 UTC
What about my proposal then? Do we want to force the script to only accept 9283 for now?

Comment 12 Mudit Agarwal 2020-11-04 08:26:46 UTC
Quick summary of an offline discussion:

Break this BZ into two parts:
a) force the script to accept only 9283 for now
b) make the port configurable

This BZ is acked for a) and will remain in 4.6. For b), an RFE targeted for 4.7 has been opened: https://bugzilla.redhat.com/show_bug.cgi?id=1894412

Sébastien, do we require doc text for this as a known issue?
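
For illustration only, part a) could be sketched roughly as follows. This is a minimal sketch and not the actual change that shipped: the function names, the constant, and the error message are hypothetical, and only the option name monitoring-endpoint-port and the port 9283 come from this bug.

```python
import argparse

# Port of the ceph-mgr Prometheus exporter that this bug settles on for 4.6.
SUPPORTED_MONITORING_PORT = "9283"


def validate_monitoring_port(port: str) -> str:
    """Return the port if it is the supported one, otherwise raise ValueError.

    Hypothetical helper: it only illustrates the "accept only 9283" rule.
    """
    port = port.strip()
    if port != SUPPORTED_MONITORING_PORT:
        raise ValueError(
            f"unsupported monitoring endpoint port {port!r}; "
            f"only {SUPPORTED_MONITORING_PORT} is accepted"
        )
    return port


def parse_args(argv):
    """Parse the command line and reject any port other than 9283."""
    parser = argparse.ArgumentParser(
        description="Sketch of an external-cluster helper script"
    )
    parser.add_argument(
        "--monitoring-endpoint-port",
        default=SUPPORTED_MONITORING_PORT,
        help="ceph-mgr Prometheus exporter port (only 9283 is accepted)",
    )
    args = parser.parse_args(argv)
    # Validate after parsing so the caller sees a plain ValueError.
    args.monitoring_endpoint_port = validate_monitoring_port(
        args.monitoring_endpoint_port
    )
    return args
```

With this shape, omitting the option falls back to 9283, and passing any other value fails early instead of silently producing a cluster without RGW metrics.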

Comment 18 errata-xmlrpc 2020-12-17 06:25:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

Comment 19 Red Hat Bugzilla 2023-09-15 00:50:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days