Bug 2265492 - Add Runbooks for ODF alerts - some text correction is required in Runbooks for few alerts
Summary: Add Runbooks for ODF alerts - some text correction is required in Runbooks fo...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph-monitoring
Version: 4.15
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: arun kumar mohan
QA Contact: Nagendra Reddy
URL:
Whiteboard:
Depends On:
Blocks: 2260844
Reported: 2024-02-22 11:02 UTC by Aaruni Aggarwal
Modified: 2024-11-15 04:25 UTC
CC List: 7 users

Fixed In Version: 4.16.0-86
Doc Type: Bug Fix
Doc Text:
.Wrong help text shown in runbooks for some alerts
Previously, wrong help text was shown in the runbooks for some alerts because the runbook markdown files for those alerts contained incorrect text. With this fix, the text in the runbook markdown files is corrected so that the alerts show the correct help text.
Clone Of:
Environment:
Last Closed: 2024-07-17 13:14:10 UTC
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift runbooks pull 173 | 0 | None | open | Some corrections to the alerts' runbook files | 2024-03-27 04:09:44 UTC
Red Hat Product Errata | RHSA-2024:4591 | 0 | None | None | None | 2024-07-17 13:14:17 UTC

Description Aaruni Aggarwal 2024-02-22 11:02:53 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Add Runbooks for ODF alerts - some text correction is required in Runbooks for a few alerts

Version of all relevant components (if applicable):

OCP: 4.15.0
ODF: v4.15.0-143.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
No

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.15 and ODF 4.15.
2. Create a storage system.
3. In the OCP web console, navigate to Observe -> Alerting -> Alerting Rules.
4. Check the alerts and click the link in each alert's description, which leads to the corresponding runbook.


Actual results:
For a few alerts, the description in the runbook does not match the description in the OCP console, and one of the links does not redirect to the correct page.

Expected results:

The alert descriptions in the OCP console should match the runbook descriptions, and the link should work.

Additional info:

Comment 2 Aaruni Aggarwal 2024-02-22 11:10:31 UTC
1. CephClusterWarningState: The runbook at https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterWarningState.md states that the CephCluster has been in a warning state for more than 10 minutes,
while the OCP console description for this alert states that the CephCluster has been in a warning state for more than 15 minutes.

2. CephPoolQuotaBytesCriticallyExhausted and CephPoolQuotaBytesNearExhaustion:

The mitigation statement seems incorrect. It currently reads: "Pool quotas can be adjusted Ceph CLI up or down (or removed) with"

The statement should be something like: "Pool quotas can be adjusted up or down (or removed) with Ceph CLI."

3. CephMgrIsAbsent:

In the Diagnosis section, the statement

"Therefore, if it is a network issue, escalate to the ODF team by following the steps here."

links to https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/helpers/sre-to-engineering-escalation.md#references.

On that page, the link in the following statement seems incorrect:

Escalation path when the incident urgently requires help from ODF Engineering
Follow the steps in link, starting with submitting the Google Form, to get help from the ODF Engineering Team.

Comment 3 Nishanth Thomas 2024-02-22 12:10:46 UTC
Low severity issue, moving to 4.16

Comment 4 arun kumar mohan 2024-03-27 04:09:45 UTC
PR submitted: https://github.com/openshift/runbooks/pull/173

@aaruni, can you please check the 3rd point (about the 'CephMgrIsAbsent' alert)?
The 'here' link (under the ## Diagnosis heading) leads to `helpers/sre-to-engineering-escalation.md#procedure` (not to #references). Maybe this has already been corrected by some other PR.
PS: Please also check/review PR #173 above.

Thanks,
Arun

Comment 5 Aaruni Aggarwal 2024-03-27 12:36:42 UTC
Hi Arun
Thanks for taking a look at the BZ. 
PR https://github.com/openshift/runbooks/pull/173 looks good to me.

Also, 'CephMgrIsAbsent' alert is working as expected. 

Thanks.

Comment 9 arun kumar mohan 2024-04-25 08:07:55 UTC
Providing the RDT; please take a look.

Comment 17 Nagendra Reddy 2024-05-09 10:47:06 UTC
Verified the runbooks for all the alerts listed above. All the required changes were implemented correctly. Please see the attached snapshots for reference.

ODF build used for verification: 4.16.0-94.stable

Comment 18 arun kumar mohan 2024-05-28 08:02:15 UTC
Removing all the 'needinfo'-s as they are no longer needed

Comment 20 errata-xmlrpc 2024-07-17 13:14:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

Comment 21 Red Hat Bugzilla 2024-11-15 04:25:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

