Description of problem (please be as detailed as possible and provide log snippets):
Add Runbooks for ODF alerts - some text corrections are required in the Runbooks for a few alerts.

Version of all relevant components (if applicable):
OCP: 4.15.0
ODF: v4.15.0-143.stable

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP 4.15 and ODF 4.15.
2. Create a storage system.
3. In the OCP web console, navigate to Observe -> Alerting -> Alerting Rules.
4. Check the alerts and click the link in each alert's description, which leads to the corresponding Runbook.

Actual results:
For a few alerts, the description in the Runbook does not match the description in the OCP console, and one of the links does not redirect to the correct page.

Expected results:
The description of each alert in the OCP console should match the Runbook description, and the link should work.

Additional info:
1. CephClusterWarningState: The Runbook at https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterWarningState.md says the cephcluster is in a warning state for more than 10 minutes, while the OCP console says, for the same alert, that the cephcluster is in a warning state for more than 15 minutes.
2. CephPoolQuotaBytesCriticallyExhausted and CephPoolQuotaBytesNearExhaustion: The mitigation statement seems incorrect. It currently reads: "Pool quotas can be adjusted Ceph CLI up or down (or removed) with". It should be something like: "Pool quotas can be adjusted up or down (or removed) with Ceph CLI."
3. CephMgrIsAbsent: In the Diagnosis section, the statement "Therefore, if it is a network issue, escalate to the ODF team by following the steps here." leads to https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/helpers/sre-to-engineering-escalation.md#references. On that page, the link in the following statement seems incorrect: "Escalation path when the incident urgently requires help from ODF Engineering Follow the steps in link, starting with submitting the Google Form, to get help from the ODF Engineering Team."
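For reference on point 2, a minimal sketch of what "adjusted up or down (or removed) with Ceph CLI" refers to. It assumes the upstream `ceph osd pool set-quota <pool> max_bytes|max_objects <value>` command, where a value of 0 removes the quota. The helper name `pool_quota_command` and the pool name `example-pool` are hypothetical and only illustrate the command shape; the snippet builds the argument list and does not run anything against a cluster.

```python
# Hypothetical helper: builds the argv for `ceph osd pool set-quota`.
# It does not execute the command; pass the result to subprocess.run()
# from a host or toolbox pod where the ceph CLI is available.

VALID_FIELDS = ("max_bytes", "max_objects")


def pool_quota_command(pool, field, value):
    """Return the argv that sets one quota field on a pool.

    A value of 0 removes the quota for that field.
    """
    if field not in VALID_FIELDS:
        raise ValueError(f"field must be one of {VALID_FIELDS}, got {field!r}")
    if value < 0:
        raise ValueError("quota value must be >= 0")
    return ["ceph", "osd", "pool", "set-quota", pool, field, str(value)]


# Raise the byte quota of the (hypothetical) pool to 10 GiB.
raise_quota = pool_quota_command("example-pool", "max_bytes", 10 * 1024**3)

# Remove the byte quota entirely by setting it to 0.
remove_quota = pool_quota_command("example-pool", "max_bytes", 0)

print(" ".join(raise_quota))
print(" ".join(remove_quota))
```

The current quotas on a pool can be inspected with `ceph osd pool get-quota <pool>` before and after the change.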
Low severity issue, moving to 4.16
PR submitted: https://github.com/openshift/runbooks/pull/173
@aaruni, can you please check the 3rd point (about the 'CephMgrIsAbsent' alert)? The 'here' link (under the ## Diagnosis heading) leads to `helpers/sre-to-engineering-escalation.md#procedure` (not to #references). Maybe this has already been corrected (by some other PR).
PS: please also review the above PR #173.
Thanks, Arun
Hi Arun, thanks for taking a look at the BZ. PR https://github.com/openshift/runbooks/pull/173 looks good to me. Also, the link in the 'CephMgrIsAbsent' alert is working as expected. Thanks.
Providing the RDT, please take a look
Verified the runbooks for all the alerts listed above. All the required changes were implemented correctly. Please find the attached snapshots for reference. ODF build used for verification: 4.16.0-94.stable
Removing all the 'needinfo'-s as they are no longer needed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days