1834440 – Update and improve description of OCS storage utilization alerts

Summary: Update and improve description of OCS storage utilization alerts

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	documentation
Sub Component:
Version:	4.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	OCS 4.3.z
Assignee:	Anjana Suparna Sriram
QA Contact:	Elad
Docs Contact:
URL:
Whiteboard:
Depends On:	1809248
Blocks:

Reported:	2020-05-11 17:39 UTC by Martin Bukatovic
Modified:	2023-11-23 15:45 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-11-23 15:45:17 UTC
Embargoed:
Dependent Products:

Description Martin Bukatovic 2020-05-11 17:39:12 UTC

Document URL
============

Red Hat OpenShift Container Storage 4.3
Troubleshooting OpenShift Container Storage

Section Number and Name
=======================

Chapter 5. Troubleshooting alerts and errors in OpenShift Container Storage
5.1. Resolving alerts and errors

Describe the issue
==================

In the list of OCS alerts, I see entries for CephClusterCriticallyFull and
CephClusterNearFull, but their descriptions are insufficient and lack a clear,
precise meaning. They also do not discuss what happens when no action is taken.

Suggestions for improvement
===========================

For all storage utilization alerts (such as CephClusterNearFull and
CephClusterCriticallyFull), we should provide the following details in a clear
way:

- The exact definition of each alert and how to interpret it with respect to
  cluster state. What is it based on? How does it relate to raw cluster storage
  versus usable storage? When the alert states that utilization crossed 75%,
  does it mean I will be able to write another 25% of data before hitting an
  out-of-space condition?
- What happens when the alert is not acted upon (including the worst-case
  scenario).
- The impact on OCP Prometheus monitoring when its storage is backed by OCS.
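To illustrate why "75% used" does not simply mean "25% left to write", here is a minimal sketch of the arithmetic. It assumes the 75% figure refers to raw utilization (the exact alert definition is what BZ 1809248 is expected to settle), 3-way replicated pools, and Ceph's upstream default full ratio of 0.95; the function name and parameters are hypothetical and only for illustration.

```python
def remaining_usable_capacity(raw_total, raw_used, replicas=3, full_ratio=0.95):
    """Estimate how much client data can still be written before raw
    utilization reaches full_ratio.

    All capacities use the same unit (e.g. TiB). replicas=3 and
    full_ratio=0.95 are assumptions (3-way replicated pools, Ceph's
    upstream default mon_osd_full_ratio); check the values on a real cluster.
    """
    # Raw space left before the full ratio is reached; never negative.
    raw_headroom = max(0.0, full_ratio * raw_total - raw_used)
    # Every byte of client data consumes `replicas` bytes of raw space.
    return raw_headroom / replicas


# Hypothetical example: 3 TiB raw, alert fired at 75% raw utilization.
raw_total = 3.0
raw_used = 0.75 * raw_total
naive = raw_total - raw_used  # the "25% left" reading, in raw TiB
estimate = remaining_usable_capacity(raw_total, raw_used)
print(f"naive raw headroom: {naive:.2f} TiB, usable estimate: {estimate:.2f} TiB")
```

Under these assumptions the naive reading suggests 0.75 TiB of headroom, while only about 0.20 TiB of client data fits before the full ratio is reached. Because Ceph enforces the full ratio per OSD, an unevenly balanced cluster can reach it even sooner.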

We should also make sure that all storage utilization alerts are listed.

Additional information
======================

The exact content depends on the engineering resolution of BZ 1809248. Please
reach out to the dev team once BZ 1809248 is resolved so that the doc changes
can be drafted.

The action items for the admin listed in the Procedure section also need to be
revisited if changes in the engineering BZs make it necessary.

Other related engineering bugs include BZ 1818736 and BZ 1775432.

Comment 2 Martin Bukatovic 2020-05-11 17:42:52 UTC

Marking BZ 1818736 as a blocker for this doc bug, as discussed in the Additional information section above.

Comment 3 Martin Bukatovic 2020-05-11 17:44:20 UTC

Fixing copy-paste typo in a blocker bug.

Note You need to log in before you can comment on or make changes to this bug.