Description of problem (please be as detailed as possible and provide log snippets):
When usage reaches 100% of the object quota, ObcQuotaObjectsAlert (>80%) and ObcQuotaObjectsExhausedAlert (>100%) are raised at the same time. I would suggest not overlapping the two, changing the first one to cover 80-100% instead.

Additionally, the metrics behind the alerts are not scraped continuously (screenshot attached). This makes the alert flap (see the additional screenshot of the Slack channel that receives the alert).

Version of all relevant components (if applicable):
OCP 4.14.7, ODF 4.14.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
It creates a bad experience and can lead users to silence the alerts.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an OBC with a quota
2. Upload objects until 100% of the quota is reached, in objects and/or size
3. Check the metrics and the alerts

Actual results:
Flapping and overlapping alerts

Expected results:
Non-flapping alerts

Additional info:
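To illustrate why gaps in scraping would show up as flapping, here is a minimal Python sketch. It is a simplified model, not how Prometheus evaluates rules: it assumes one sample per evaluation, treats a missing sample (None) as "no data, alert not firing", and ignores the lookback window and any `for:` duration. The function name `count_flaps` and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def count_flaps(samples: Sequence[Optional[float]], threshold: float = 0.8) -> int:
    """Count firing -> resolved transitions over a series of evaluations.

    Each entry is the quota usage ratio seen at one evaluation, or None
    when the metric was not scraped (assumption: no data means not firing).
    """
    flaps = 0
    firing = False
    for value in samples:
        now_firing = value is not None and value > threshold
        if firing and not now_firing:
            flaps += 1  # the alert resolved, although usage may not have dropped
        firing = now_firing
    return flaps


# Usage stays at 100% of quota the whole time; only the scraping differs.
continuous = [1.0, 1.0, 1.0, 1.0, 1.0]
with_gaps = [1.0, None, 1.0, None, 1.0]
```

Under these assumptions the continuous series never resolves, while the series with gaps resolves and re-fires at every missing sample, even though the quota usage itself never changed.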
If you look at the other Ceph alerting rules, it is as you say (CephOSDNearFull and CephOSDCriticallyFull). However, other standard Kubernetes rules do not overlap; see KubeQuotaAlmostFull and KubeQuotaFullyUsed. Additionally, those latter alerts have severity "info", since this is not a problem in the cluster but something to be aware of in case a user needs more space.
Could the flapping issue be due to https://bugzilla.redhat.com/show_bug.cgi?id=2258479 ?
Added PR https://github.com/red-hat-storage/ocs-operator/pull/2472 to stop the overlapping. As for the flapping issue, we may need a cluster setup to debug it. I will check with Divyansh (as per comment#7, regarding BZ#2258479) to confirm the theory.
Hi, @amohan. The PR only covers the quota alert for objects, but there is another one for bytes. BTW, for readability I recommend writing the condition as > 0.8 < 1 instead of < 1 > 0.8.
Hi Ramon, I made both changes: rewrote the condition for better readability and applied the same change to ObcQuotaBytesAlert.
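For anyone reading along, the effect of the non-overlapping bands can be sketched in a few lines of Python. This is an illustration only, not the rule from the PR: the function name `active_alerts` is hypothetical, and the exact boundaries (warning strictly between 0.8 and 1, exhausted at 1 or above) are assumptions based on the thresholds quoted in this bug.

```python
from typing import List


def active_alerts(used: float, quota: float, overlap: bool = False) -> List[str]:
    """Return the names of the quota alerts that would be active.

    overlap=True models the old behavior (warning on any ratio > 0.8);
    overlap=False models the proposed band (0.8 < ratio < 1).
    Boundary handling is an assumption for illustration.
    """
    ratio = used / quota
    alerts = []
    if overlap:
        warning = ratio > 0.8
    else:
        warning = 0.8 < ratio < 1  # lower bound first, as suggested for readability
    if warning:
        alerts.append("ObcQuotaObjectsAlert")
    if ratio >= 1:
        alerts.append("ObcQuotaObjectsExhausedAlert")
    return alerts


old_at_full = active_alerts(100, 100, overlap=True)
new_at_full = active_alerts(100, 100)
new_at_ninety = active_alerts(90, 100)
```

With the overlapping condition both alerts are active at 100% of the quota; with the band, only the exhausted alert remains, and the warning applies only between 80% and 100%. The same shape would apply to the bytes alert.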
Adding the RDT details, please take a look.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days