Bug 2265492

Summary: Add Runbooks for ODF alerts - some text correction is required in Runbooks for few alerts
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Aaruni Aggarwal <aaaggarw>
Component: ceph-monitoring
Assignee: arun kumar mohan <amohan>
Status: CLOSED ERRATA
QA Contact: Nagendra Reddy <nagreddy>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.15
CC: amohan, asriram, kbg, muagarwa, nthomas, odf-bz-bot, tdesala
Target Milestone: ---
Target Release: ODF 4.16.0
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version: 4.16.0-86
Doc Type: Bug Fix
Doc Text:
.Wrong help text shown in runbooks for some alerts
Previously, wrong help text was shown in the runbooks for some alerts because the runbook markdown files for those alerts contained incorrect text. With this fix, the text in the runbook markdown files is corrected so that the alerts show the correct help text.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-07-17 13:14:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2260844    

Description Aaruni Aggarwal 2024-02-22 11:02:53 UTC
Description of problem (please be detailed as possible and provide log
snippets):
Add Runbooks for ODF alerts - some text correction is required in Runbooks for a few alerts

Version of all relevant components (if applicable):

OCP: 4.15.0
ODF: v4.15.0-143.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.15 and ODF 4.15.
2. Create a storage system.
3. In the OCP web console, navigate to Observe -> Alerting -> Alerting Rules.
4. Check the alerts and click the link in each alert's description, which leads to the corresponding runbook.


Actual results:
For a few alerts, the description in the runbook does not match the description in the OCP console, and one of the links does not redirect to the correct page.

Expected results:

The alert descriptions in the OCP console should match the runbook descriptions, and the link should work.

Additional info:

Comment 2 Aaruni Aggarwal 2024-02-22 11:10:31 UTC
1. CephClusterWarningState: The runbook at https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterWarningState.md says the Ceph cluster has been in a warning state for more than 10 minutes,
while the OCP console description for this alert says it has been in a warning state for more than 15 minutes.

2. For CephPoolQuotaBytesCriticallyExhausted and CephPoolQuotaBytesNearExhaustion:

The mitigation statement seems incorrect, i.e.: Pool quotas can be adjusted Ceph CLI up or down (or removed) with

The statement should be something like: Pool quotas can be adjusted up or down (or removed) with Ceph CLI.

3. CephMgrIsAbsent:

In the Diagnosis section, the statement

Therefore, if it is a network issue, escalate to the ODF team by following the steps here.

leads to https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/helpers/sre-to-engineering-escalation.md#references

On that page, the link seems incorrect in the following statement:

Escalation path when the incident urgently requires help from ODF Engineering
Follow the steps in link, starting with submitting the Google Form, to get help from the ODF Engineering Team.

Comment 3 Nishanth Thomas 2024-02-22 12:10:46 UTC
Low severity issue, moving to 4.16

Comment 4 arun kumar mohan 2024-03-27 04:09:45 UTC
PR submitted: https://github.com/openshift/runbooks/pull/173

@aaruni, can you please check the 3rd point (about the 'CephMgrIsAbsent' alert)?
The 'here' link (under the ## Diagnosis heading) leads to `helpers/sre-to-engineering-escalation.md#procedure` (not to `#references`). This may have already been corrected by another PR.
PS: Please also review PR #173 above.

Thanks,
Arun

Comment 5 Aaruni Aggarwal 2024-03-27 12:36:42 UTC
Hi Arun
Thanks for taking a look at the BZ. 
PR https://github.com/openshift/runbooks/pull/173 looks good to me.

Also, the 'CephMgrIsAbsent' alert is working as expected.

Thanks.

Comment 9 arun kumar mohan 2024-04-25 08:07:55 UTC
Providing the RDT; please take a look.

Comment 17 Nagendra Reddy 2024-05-09 10:47:06 UTC
Verified the runbooks of all the alerts listed above. All the required changes were implemented correctly. Please find the attached snapshots for reference.

ODF build used for verification: 4.16.0-94.stable

Comment 18 arun kumar mohan 2024-05-28 08:02:15 UTC
Removing all the 'needinfo' requests, as they are no longer needed.

Comment 20 errata-xmlrpc 2024-07-17 13:14:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

Comment 21 Red Hat Bugzilla 2024-11-15 04:25:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days