Bug 1999952

Summary: Automate the creation of cephobjectstoreuser for obc metrics collector
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jiffin <jthottan>
Component: ocs-operator
Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA
QA Contact: akarsha <akrai>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.9
CC: bkunal, ebenahar, jthottan, kbg, madam, mbukatov, muagarwa, ocs-bugs, odf-bz-bot, shilpsha, sostapov, tdesala
Target Milestone: ---
Target Release: ODF 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Automated the creation of cephobjectstoreuser for object bucket claim metrics collector
With this update, the `prometheus-user` cephobjectstoreuser, which the object bucket claim metrics collector uses to collect data from the RGW server, is created automatically.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-13 18:49:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2011326    

Description Jiffin 2021-09-01 06:25:01 UTC
Description of problem
======================

Currently, the obc-metrics-collector requires, as a prerequisite, a cephobjectstoreuser named "prometheus-user" with certain permissions. It would be better to automate the creation of that user than to do it manually.

Version of all relevant components
===================================
4.9

Does this issue impact your ability to continue to work with the product
========================================================================


Is there any workaround available to the best of your knowledge?
================================================================


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
========================================


Is this issue reproducible?
===========================

Yes

Can this issue be reproduced from the UI?
=========================================

Yes

If this is a regression
=======================

Additional info
===============

Adding permissions to the user requires the Rook PR https://github.com/rook/rook/pull/8211 to be merged.
After that, a small fix in OCS-Op will create the user. The fix does not affect any existing workflow.
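For reference, the object that has to exist (and that the OCS-Op fix is meant to create) can be sketched as a shell function that prints a CephObjectStoreUser manifest. This is a minimal sketch, not the ocs-operator implementation: the helper name `prometheus_user_manifest`, the default store name `ocs-storagecluster-cephobjectstore`, the display name, and the read-only capability set are all assumptions, and `spec.capabilities` is assumed to be the per-user permission support that rook/rook#8211 adds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CephObjectStoreUser manifest for the OBC
# metrics collector user. ASSUMPTIONS (not taken from ocs-operator):
# the default namespace and store name, the display name, and the
# read-only capability set.
prometheus_user_manifest() {
  local namespace="${1:-openshift-storage}"
  local store="${2:-ocs-storagecluster-cephobjectstore}"
  # spec.capabilities is assumed to be the per-user permission support
  # added by rook/rook#8211 (merged in Rook v1.7.3).
  cat <<EOF
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: prometheus-user
  namespace: ${namespace}
spec:
  store: ${store}
  displayName: prometheus-user
  capabilities:
    bucket: read
    metadata: read
    usage: read
    user: read
EOF
}
```

On a release where the user still had to be created by hand, the output could be applied with `prometheus_user_manifest | oc apply -f -`. Generating the manifest from a function keeps the namespace and store name as parameters instead of hard-coding them in a YAML file.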

Comment 2 Jiffin 2021-09-14 07:28:45 UTC
The dependent Rook PR was merged in v1.7.3, and the OCS-Op PR has been posted: https://github.com/red-hat-storage/ocs-operator/pull/1336

Comment 11 Mudit Agarwal 2021-10-19 13:07:46 UTC
This is not ready for 4.9, moving it to 4.10
For 4.9, we have opened a doc BZ (#2015382) to document the procedure.

Comment 14 Martin Bukatovic 2021-10-26 08:37:30 UTC
(In reply to Mudit Agarwal from comment #11)
> This is not ready for 4.9, moving it to 4.10
> For 4.9, we have opened a doc BZ (#2015382) to document the procedure.

Per comment 3, this is not acceptable as long as RHSTOR-1879 is not pushed out as well.

Moreover, we are going to receive this via upstream rebase anyway, so I would like to see exactly how we would handle that, whichever decision is taken in the end.

Please consult the next steps with the QE owner of RHSTOR-1879; until that happens, I'm moving it back. A one-sided override like that is simply not acceptable. Please do not do it again.

Comment 18 Martin Bukatovic 2021-10-26 08:39:10 UTC
The status of this BZ is disputed, and the decision needs to be agreed on by both the QE and DEV owners of this bug and of RHSTOR-1879.

Comment 21 Mudit Agarwal 2021-10-26 10:45:35 UTC
(In reply to Martin Bukatovic from comment #14)
> (In reply to Mudit Agarwal from comment #11)
> > This is not ready for 4.9, moving it to 4.10
> > For 4.9, we have opened a doc BZ (#2015382) to document the procedure.
> 
> Per comment 3, this is not acceptable as long as RHSTOR-1879 is not pushed
> out as well.
> Moreover, we are going to receive this via upstream rebase anyway, so I would
> like to see exactly how we would handle that, whichever decision is taken in
> the end.

We are not going to receive the complete fix via upstream; only the Rook fix is in upstream.
If we had been able to fix this before dev freeze, it would not have been moved.

> Please consult the next steps with the QE owner of RHSTOR-1879; until that
> happens, I'm moving it back. A one-sided override like that is simply not
> acceptable. Please do not do it again.

This is a dev preview feature, which means regression only. 
Moreover, this is not a blocker and thus doesn't qualify after we enter dev freeze.
If you want to see this in 4.9, please mark it a blocker with proper justification saying why we should not release 4.9 without this fix.

Comment 26 Martin Bukatovic 2021-10-27 15:42:29 UTC
(In reply to Mudit Agarwal from comment #21)
> We are not going to receive the complete fix via upstream; only the Rook fix
> is in upstream.
> If we had been able to fix this before dev freeze, it would not have been
> moved.

I believe that the solution you proposed makes sense from a technical
perspective, but I would still like to have it discussed and aligned with the
people assigned to RHSTOR-1879, including QE, since during the bug triage
meeting (when this bug was acked) we agreed that these tasks are closely
related and that work on them would be coordinated.

> > Please consult the next steps with the QE owner of RHSTOR-1879; until that
> > happens, I'm moving it back. A one-sided override like that is simply not
> > acceptable. Please do not do it again.
> 
> This is a dev preview feature, which means regression only. 
> Moreover, this is not a blocker and thus doesn't qualify after we enter dev
> freeze.

Dev preview feature level doesn't, AFAIK, imply regression testing only. If that
has been changed, could you provide a reference to the program-approved definition?

> If you want to see this in 4.9, please mark it a blocker with proper
> justification saying why we should not release 4.9 without this fix.

I'm not against pushing it out. Actually, I would not have provided the QA ack
if I hadn't been told it was related to a new feature. But based on what we
agreed on before, the expected course of action here would be:

- consider impact of dropping this BZ on RHSTOR-1879
- sync with dev and qe owners of RHSTOR-1879, and note in this bug that it
  has happened

Maybe there is some existing agreement I'm not aware of, but if that is
the case, let's reference it here.

Looking into known state of this BZ and RHSTOR-1879, I would assume that
neither should be part of the 4.9 release.

Comment 29 Mudit Agarwal 2021-10-27 16:04:22 UTC
(In reply to Martin Bukatovic from comment #26)
> (In reply to Mudit Agarwal from comment #21)
> > We are not going to receive the complete fix via upstream; only the Rook
> > fix is in upstream.
> > If we had been able to fix this before dev freeze, it would not have been
> > moved.
> 
> I believe that the solution you proposed makes sense from a technical
> perspective, but I would still like to have it discussed and aligned with the
> people assigned to RHSTOR-1879, including QE, since during the bug triage
> meeting (when this bug was acked) we agreed that these tasks are closely
> related and that work on them would be coordinated.

I am not proposing any solution; I am just saying that the work is incomplete and we don't have time to finish it in the current release.
Hence we want to move this out, given that it is not a blocker for the release.
Regarding the acks during the triage meeting: when we acked it, we were not in a blocker-only phase, but we are now.

> > > Please consult the next steps with the QE owner of RHSTOR-1879; until that
> > > happens, I'm moving it back. A one-sided override like that is simply not
> > > acceptable. Please do not do it again.
> > 
> > This is a dev preview feature, which means regression only. 
> > Moreover, this is not a blocker and thus doesn't qualify after we enter dev
> > freeze.
> 
> Dev preview feature level doesn't, AFAIK, imply regression testing only. If
> that has been changed, could you provide a reference to the program-approved
> definition?

Yeah, sorry, my bad: it is not regression only, but functionality that MAY NOT be fully tested.

> > If you want to see this in 4.9, please mark it a blocker with proper
> > justification saying why we should not release 4.9 without this fix.
> 
> I'm not against pushing it out. Actually, I would not have provided the QA
> ack if I hadn't been told it was related to a new feature. But based on what
> we agreed on before, the expected course of action here would be:
> 
> - consider impact of dropping this BZ on RHSTOR-1879
> - sync with dev and qe owners of RHSTOR-1879, and note in this bug that it
>   has happened

The impact is mentioned in the comments above: the user will need to perform some manual steps, which will be well documented via the doc BZ.

Comment 45 Mudit Agarwal 2021-11-08 13:40:06 UTC
After an offline discussion with Eran and Elad, moving this out of 4.9.
I have added it as a known issue and will try to fix it in 4.9.z.

Comment 50 Mudit Agarwal 2022-02-01 13:22:37 UTC
Please test with any of the latest 4.10 builds.

Comment 56 akarsha 2022-03-17 06:27:39 UTC
Version:
OCP: 4.10.0-0.nightly-2022-03-16-000645
ODF: 4.10.0-194
CEPH: 16.2.7-76.el8cp (f4d6ada772570ae8b05c62ad79e222fbd3f04188) pacific (stable)

prometheus-user was created in openshift-storage and is listed as shown in the sample output. The OBC-related metrics are exported, as shown in the attached screenshots [1] and [2] in comment 54 and comment 55.
Based on these observations, moving the bug to the VERIFIED state.

Sample output:

$ oc get cephobjectstoreuser -n openshift-storage
NAME                                     AGE
noobaa-ceph-objectstore-user             23h
ocs-storagecluster-cephobjectstoreuser   23h
prometheus-user                          23h
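The listing check above can also be scripted. The following is a minimal sketch using a hypothetical helper, `has_objectstore_user` (not part of `oc` or ODF), that reads an `oc get cephobjectstoreuser` listing like the sample output on stdin and exits 0 only if some row's NAME column equals the given user name exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or ODF): succeed if the
# `oc get cephobjectstoreuser` listing on stdin has a row whose NAME
# column equals $1 exactly.
has_objectstore_user() {
  local user="$1"
  # Skip the header row (NR > 1); compare the first column exactly so a
  # prefix such as "prometheus" does not match "prometheus-user".
  awk -v u="$user" 'NR > 1 && $1 == u { found = 1 } END { exit !found }'
}
```

Usage: `oc get cephobjectstoreuser -n openshift-storage | has_objectstore_user prometheus-user && echo "prometheus-user exists"`. Asking for the object by name with `oc get cephobjectstoreuser prometheus-user -n openshift-storage` also works, since `oc get` exits non-zero when the named object is not found, and it avoids parsing the table at all.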

Comment 58 Jiffin 2022-04-12 05:23:12 UTC
The doc text looks good to me

Comment 60 errata-xmlrpc 2022-04-13 18:49:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

Comment 61 Mudit Agarwal 2022-04-15 07:24:37 UTC
Doc text was added, thanks Bipin.