Bug 2066514 - OCS operator to install Ceph prometheus alerts instead of Rook
Summary: OCS operator to install Ceph prometheus alerts instead of Rook
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Travis Nielsen
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-21 23:11 UTC by Travis Nielsen
Modified: 2023-08-09 17:00 UTC
CC List: 9 users

Fixed In Version: 4.11.0-89
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:49:54 UTC
Embargoed:


Links
- GitHub red-hat-storage/ocs-operator pull 1615 (status: open; last updated 2022-04-04 23:08:39 UTC): Ceph prometheus rules created by OCS operator instead of Rook
- Red Hat Product Errata RHSA-2022:6156 (last updated 2022-08-24 13:50:41 UTC)

Description Travis Nielsen 2022-03-21 23:11:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

As discussed recently in the ODF operators weekly meeting, the Rook community has determined that the Prometheus alerts should no longer be installed by the Rook operator when the CephCluster CR option "monitoring.enabled: true" is set.
- The alerts now need to be created by the OCS operator. The code in Rook that creates the rules should be straightforward to move over to the OCS operator.
- The alerts will be customizable in the future by 
- "monitoring.enabled: true" will only create the Prometheus-related resources, not the alerts.
- This change allows the upstream alerts to be updated more aggressively to match the Ceph repo, while the downstream alerts can be updated once QE is ready to sign off.
- Upstream, the Prometheus rules will be installed with the Helm chart.
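A minimal sketch, in Python for illustration only (ocs-operator itself is written in Go), of the kind of PrometheusRule resource the OCS operator now owns instead of Rook. The resource name `prometheus-ceph-rules` comes from Comment 3; the `openshift-storage` namespace, the group name, and the single alert shown are assumptions, and the alert is a simplified stand-in rather than the rule set ODF actually ships.

```python
import json

# Name of the PrometheusRule CR created by the OCS operator (per Comment 3).
CEPH_RULES_NAME = "prometheus-ceph-rules"


def ocs_prometheus_rule(namespace="openshift-storage"):
    """Return a PrometheusRule manifest (as a dict) owned by the OCS operator.

    Assumption: the ODF namespace is "openshift-storage". The rule group below
    is a simplified, hypothetical example, not the shipped ODF rule set.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": CEPH_RULES_NAME, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "ceph-health",  # hypothetical group name
                    "rules": [
                        {
                            # Hypothetical alert: the Ceph mgr exports
                            # ceph_health_status == 2 for HEALTH_ERR.
                            "alert": "CephClusterErrorState",
                            "expr": "ceph_health_status == 2",
                            "for": "10m",
                            "labels": {"severity": "critical"},
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print the manifest; a real operator would create or update it via the Kubernetes API.
    print(json.dumps(ocs_prometheus_rule(), indent=2))
```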

The upstream Rook change is found here: https://github.com/rook/rook/pull/9837


Version of all relevant components (if applicable):

The change affects 4.11


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

After the next Rook sync, alerts will not be available until this fix is made.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Yes

Can this issue be reproduced from the UI?

Yes

Comment 1 Travis Nielsen 2022-03-21 23:19:05 UTC
To finish the second bullet above...
- The alerts will be customizable in the future by an OCP feature described in this enhancement: https://github.com/openshift/enhancements/pull/958

Upstream, customization will be based on a Helm chart post-processor.

Comment 2 Martin Bukatovic 2022-05-09 17:20:06 UTC
Do I understand correctly that unless QE needs to disable or tweak monitoring, the default behavior doesn't change, so this is mostly regression testing (monitoring should still work as it did before)?

Comment 3 Travis Nielsen 2022-05-09 18:21:03 UTC
Correct, a regression test will be sufficient since the alerts are expected to remain the same.
Please also test an upgrade from 4.10 --> 4.11 to confirm that the alerts are preserved. The PrometheusRule CR now created by the OCS operator has a slightly different resource name (prometheus-ceph-rules) than the CR previously created by Rook. The rules CR previously had the major Ceph version in its name (v14 or v16), and that CR survived ODF upgrades; I just want to confirm the upgrade is OK in this case as well. Thanks.
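One way to script the upgrade check requested above, as a hedged Python sketch: given the PrometheusRule manifests captured before and after the 4.10 --> 4.11 upgrade (for example the `items` from `oc get prometheusrules -n openshift-storage -o json`; collecting them is not shown), it checks that `prometheus-ceph-rules` exists, flags leftover version-named CRs, and lists any alerts that disappeared. The legacy-name regex, the `rule` helper, and the sample CR and alert names are assumptions for illustration.

```python
import re

NEW_RULES_NAME = "prometheus-ceph-rules"
# Assumed pattern for the older Rook-created CR names, which embedded the
# major Ceph version (e.g. "v14" or "v16"); the exact old names are not confirmed here.
LEGACY_NAME = re.compile(r"ceph.*-v\d+")


def alert_names(rule):
    """Collect alert names from one PrometheusRule manifest (dict form)."""
    return {
        r["alert"]
        for group in rule.get("spec", {}).get("groups", [])
        for r in group.get("rules", [])
        if "alert" in r  # recording rules use "record" instead of "alert"
    }


def check_upgrade(before, after):
    """Compare PrometheusRule manifests captured before and after an upgrade."""
    names_after = [r["metadata"]["name"] for r in after]
    alerts_before = set().union(*(alert_names(r) for r in before))
    alerts_after = set().union(*(alert_names(r) for r in after))
    return {
        "new_cr_present": NEW_RULES_NAME in names_after,
        "legacy_crs": [n for n in names_after if LEGACY_NAME.search(n)],
        "missing_alerts": sorted(alerts_before - alerts_after),
    }


def rule(name, *alerts):
    """Hypothetical helper: build a minimal PrometheusRule dict for the example."""
    rules = [{"alert": a, "expr": "vector(1)"} for a in alerts]
    return {"metadata": {"name": name}, "spec": {"groups": [{"name": "g", "rules": rules}]}}


# Example data (hypothetical names): the version-named CR before the upgrade,
# the operator-owned CR after it.
before = [rule("prometheus-ceph-v16-rules", "CephClusterErrorState", "CephMonQuorumAtRisk")]
after = [rule("prometheus-ceph-rules", "CephClusterErrorState", "CephMonQuorumAtRisk")]
print(check_upgrade(before, after))
# {'new_cr_present': True, 'legacy_crs': [], 'missing_alerts': []}
```

Comparing alert names rather than whole CRs matches the expectation in this comment that the alerts stay the same while only the resource name changes.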

Comment 4 Martin Bukatovic 2022-05-10 12:25:16 UTC
Thanks, so besides standard regression testing, we will also run alerting tests after an upgrade.

Comment 10 Martin Bukatovic 2022-08-22 14:40:35 UTC
Moving to verified based on test results of applicable alerting tests for ODF 4.11.0 RC3 build (run ID 1660738848).

Comment 12 errata-xmlrpc 2022-08-24 13:49:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

