Description of problem (please be as detailed as possible and provide log snippets):
As documented extensively in Bug 1976936, when the limit of RBD snapshots is reached for a volume, a flattening task runs in the background to mitigate the issue. During this time no more snapshots can be created, which causes a lengthy delay when provisioning new PVCs that clone an impacted volume. The reason for this delay is not apparent to the user or cluster admin and will cause frustration and support tickets. I would like to request that an alert fire while a volume is being flattened so we can let the user know what to expect in this situation.

Version of all relevant components (if applicable):
4.8+

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
It creates a bad customer experience in high-scale workloads.

Is there any workaround available to the best of your knowledge?
The situation resolves itself eventually (possibly hours later), but it causes confusion because there is no way to understand what is happening unless you analyze logs and have a deep understanding of Ceph internals.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
In theory yes, but it is more likely to happen when creating many volume clones from the CLI.

If this is a regression, please provide more details to justify this:
Not a regression.

Steps to Reproduce:
See Bug 1976936 for full details.

Actual results:
Flattening happens in the background with no user-visible alert or explanation of why further snapshots and clones cannot proceed.

Expected results:
An alert fires to indicate this condition, and appropriate documentation (a runbook) explains the situation to the user.

Additional info:
CSI team should review this, but ultimately it goes to the monitoring component
(In reply to Travis Nielsen from comment #3)
> CSI team should review this, but ultimately it goes to the monitoring
> component

Indeed, Ceph-CSI cannot create alerts in OCP (or Kubernetes). Because CSI drivers are independent of the container platform, they should not use container platform APIs directly. Monitoring components can check the events for a PV(C), and should be able to create an alert when flattening is in progress (and remove the alert when the PV(C) is created).
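To illustrate the idea of deriving the alert from PV(C) events, here is a minimal sketch. It is not taken from Ceph-CSI or any monitoring component. The `PvcEvent` type, the `pvcs_needing_alert` function, the `FLATTEN_MARKER` substring, and the `ProvisioningSucceeded` reason string are all assumptions made for illustration; the actual event text emitted while flattening is in progress would need to be confirmed against real cluster events.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Assumption: events emitted while flattening is in progress mention "flatten".
FLATTEN_MARKER = "flatten"
# Assumption: this reason marks successful creation of the PVC.
SUCCESS_REASON = "ProvisioningSucceeded"


@dataclass
class PvcEvent:
    pvc: str      # "namespace/name" of the PVC the event refers to
    reason: str   # short machine-readable reason for the event
    message: str  # human-readable event message


def pvcs_needing_alert(events: Iterable[PvcEvent]) -> Set[str]:
    """Return the PVCs for which a flattening alert should be active.

    Events are expected oldest first. A PVC is added when an event
    mentions flattening and removed again once it is provisioned.
    """
    active: Set[str] = set()
    for ev in events:
        if ev.reason == SUCCESS_REASON:
            active.discard(ev.pvc)  # PVC created: clear the alert
        elif FLATTEN_MARKER in ev.message.lower():
            active.add(ev.pvc)      # flattening in progress: raise the alert
    return active
```

A monitoring component would feed this function the event stream it already watches and raise or clear alerts from the returned set.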
Created a Jira for this RFE
Reopening because the follow up Jira is not linked anywhere. Where can I check to find this Jira?
(In reply to Adam Litke from comment #6)
> Reopening because the follow up Jira is not linked anywhere. Where can I
> check to find this Jira?

@alitke, it's linked in the 'Links' section of this BZ (https://issues.redhat.com/browse/OCSBZM-3408). Is that what you are looking for?
My bad, I should have added the epic link. Here you go https://issues.redhat.com/browse/RHSTOR-3276
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days