Description of problem: There should be alerts for Ceph pool utilization. There are currently alerts for cluster utilization (CephClusterNearFull and CephClusterCriticallyFull), but in the extreme case where a user utilizes only one pool, the cluster-level alerts may not inform the user in time.
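For illustration, a pool-level alert could be built as a Prometheus rule on the per-pool metrics exported by the Ceph mgr Prometheus module. This is a minimal sketch, not a rule OCS actually ships; the metric names (`ceph_pool_stored`, `ceph_pool_max_avail`, `ceph_pool_metadata`) and the 75% threshold are assumptions:

```yaml
# Hypothetical pool-utilization alert; metric names and threshold are
# assumptions, not the rules shipped with OCS.
groups:
  - name: ceph-pool-utilization
    rules:
      - alert: CephPoolNearFull
        expr: |
          (ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail))
            * on (pool_id) group_left(name) ceph_pool_metadata > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          message: "Ceph pool {{ $labels.name }} is over 75% utilized."
```

The join against `ceph_pool_metadata` (whose sample value is 1) only attaches the human-readable pool name to the ratio.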
Not a blocker for 4.6; this is an RFE. Pool management is also coming as part of 4.7, hence moving this out to 4.7.
> Also pool management coming in as part of 4.7

Could you provide a reference to a BZ or JIRA concerned with pool management for OCS 4.7?
Note: Even if we have Pool Management, pool-level alerts can only be created once OCS supports setting quotas on Ceph pools; otherwise the size of a pool remains dynamic according to storage consumption.
@Eran Is there a requirement from the customer side for pool-level alerts? Also, IMO this should be supplemented with the UI visualization we want for the alerts.
I'm not sure why a pool-level alert is more precise, as the pool's free space is the shared free space of the entire cluster. @Elad, pool management has moved to OCS 4.8.
@etamir, it's not clear to me what fix you are looking for here. Can you elaborate?
All usable storage space information from the user's perspective (how much data the Ceph cluster is still able to receive and store from Ceph clients) needs to take the Ceph pool configuration into account.
Closing this BZ on the basis of https://bugzilla.redhat.com/show_bug.cgi?id=1870083#c16 and https://bugzilla.redhat.com/show_bug.cgi?id=1870083#c19
@Martin - If I understand you correctly, you are saying that only on a pool level, we will be able to show free namespace without the overhead. Is that the current motivation? If so, this value can be calculated and used. @Anat, please keep me honest here.
(In reply to Eran Tamir from comment #29)
> @Martin - If I understand you correctly, you are saying that only on a pool
> level, we will be able to show free namespace without the overhead. Is that
> the current motivation?

If the pools share the OSDs, then per-pool usable free space is confusing and misleading to the user. Let's say we have two pools, replica 2 and replica 3, and 300G of available raw capacity. Pool 1 has 150G free space and Pool 2 has 100G; I think this will be very confusing for the user. It will be even more confusing if you add compression.

As for Ceph's near-full and full alerts, those are calculated on the raw capacity and don't take into account the pool replication factor. They are for the cluster admin, to inform them when to expand the cluster or free up space. The values that were chosen allow them enough time to handle the situation. This is especially true with small clusters, as we don't have much spare capacity and the cluster can fill up quickly.

The reason for moving to read-only in case of a full cluster is that deletion of data requires additional space for the metadata, and we don't want to get into a situation where it is impossible to delete data. This can happen at an OSD level as well; for that we have the OSD full alert. This is more likely in a small cluster and/or with small-capacity OSDs. There is work in Ceph Pacific to calculate this threshold automatically to support smaller clusters better.

> If so, this value can be calculated and used. @Anat, please keep me honest
> here.
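The replica-2/replica-3 arithmetic above can be sketched as follows; the pool names and sizes are just the illustrative values from this comment:

```python
# Why per-pool "usable free space" misleads when pools share OSDs.
# Numbers are the illustrative values from the comment above.
raw_free_gb = 300  # raw capacity shared by both pools

pools = {"pool1": 2, "pool2": 3}  # pool name -> replication factor

for name, size in pools.items():
    usable_gb = raw_free_gb / size
    print(f"{name} (replica {size}): {usable_gb:.0f}G usable free")
# -> pool1 (replica 2): 150G usable free
# -> pool2 (replica 3): 100G usable free

# Both figures are derived from the SAME 300G of raw space, so they cannot
# be consumed independently: filling pool1 with 150G leaves pool2 with 0G.
```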
(In reply to Eran Tamir from comment #29)
> @Martin - If I understand you correctly, you are saying that only on a pool
> level, we will be able to show free namespace without the overhead. Is that
> the current motivation?
>
> If so, this value can be calculated and used. @Anat, please keep me honest
> here.

The question has been addressed by Orit in comment #30. Clearing the needinfo request.
clearing stale needinfo.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days