Description of problem:

For filesystems with dynamic inode allocation (such as CephFS), the reported total inode capacity is not a meaningful basis for capacity calculations or alerts. An example of the inode metrics for a PV backed by CephFS:

kubelet_volume_stats_inodes_free{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 0
kubelet_volume_stats_inodes{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7418
kubelet_volume_stats_inodes_used{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7419

kubelet_volume_stats_inodes_free does not reflect reality, because inodes are added dynamically as they are needed, and for the same reason the other two metrics cannot be used to calculate capacity or free inodes. With the free count reported as 0, the free-to-total ratio that KubePersistentVolumeInodesFillingUp evaluates stays at 0, so the alert fires constantly (see the query sketch after this report).

The discussion of the solution is available as part of "Bug 2128263 - Alert KubePersistentVolumeInodesFillingUp MON-2802". Summarizing, it has been decided to introduce a new Ceph-CSI configuration option to not report inode information at all. As an improvement, it would be nice to report only the inode information that is real, in our case only 'kubelet_volume_stats_inodes_used'.

Version of all relevant components (if applicable):
OCP 4.11.0
ODF 4.11.0

Does this issue impact your ability to continue to work with the product?
False alarms are raised.

Is there any workaround available to the best of your knowledge?
Silence these alerts.

Rate from 1-5 the complexity of the scenario you performed that caused this bug?
1 - very simple

Can this issue be reproduced?
Yes

Steps to Reproduce:
1. Install OCP
2. Reconfigure the OpenShift Container Platform registry to use an RWX CephFS volume provided by ODF
3. Use the cluster for a while
4. Check firing alerts

Actual results:
Alert KubePersistentVolumeInodesFillingUp is firing

Expected results:
Alert KubePersistentVolumeInodesFillingUp does not fire when an RWX CephFS volume is used to provide persistent storage for an OCP component.

Additional info:
See bug 2128263
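For reference, a way to look at these metrics and the ratio behind the alert from a Prometheus that scrapes the kubelet. This is only a sketch: PROM_URL is a placeholder for however the Prometheus API is reached in your environment (for example via 'oc port-forward' to the monitoring stack), and the PVC name is the example from above.

# Inode metrics for the example PVC; on a CephFS-backed PV the _free value is 0.
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_inodes_free{persistentvolumeclaim="registry-cephfs-rwx-pvc"}'

# Simplified form of the ratio the KubePersistentVolumeInodesFillingUp alert is
# built on; with _free stuck at 0 this is always 0, hence the constant firing.
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_inodes_free / kubelet_volume_stats_inodes'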
CephFS does not have a concept of "free inodes"; inodes get allocated on demand in the filesystem. This confuses alerting systems that expect a (high) number of free inodes, and warnings get produced when the number of free inodes is not high enough. As a result, these alerts always fire for CephFS. To prevent the false-positive alerts, the NodeGetVolumeStats procedure for CephFS (and CephNFS) will no longer include inodes in the reply.
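A quick way to see the effect of that change on a node, sketched under the assumption of cluster-admin access; <node-name> is a placeholder for the node where the CephFS-backed pod runs. The kubelet builds the kubelet_volume_stats_* series from the NodeGetVolumeStats replies, so once inode usage is dropped from the reply, the inode series disappear from the kubelet's metrics while the bytes-based series remain.

# Kubelet metrics via the API server node proxy; with the fix in place the
# kubelet_volume_stats_inodes* series are no longer present for CephFS PVCs.
oc get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_volume_stats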
*** Bug 2089225 has been marked as a duplicate of this bug. ***
https://github.com/red-hat-storage/ceph-csi/pull/138 is the backport to the ODF devel branch
The backport for ODF-4.12 is ready at https://github.com/red-hat-storage/ceph-csi/pull/139. Once this bug is approved, leave "/bugzilla refresh" as a comment in the PR to get it merged.
The verification steps:
1) Create a CephFS PVC
2) Create a pod and attach it to the PVC
3) Check on which node the pod is running: oc get pods -o wide
4) Go to the logs of the csi-cephfsplugin pod running on the same node (IP) as the application pod: oc logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin and make sure that no "unit":2 appears in the log content (a condensed grep for this is sketched below).

This was indeed verified on ODF 4.12.0-114. No "unit":2 messages appear, only "unit":1. Therefore moving this BZ to "Verified".

@Niels de Vos please confirm these verification steps. In case anything else should be checked, please move this BZ back to me.
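A condensed form of step 4, assuming the example plugin pod name from the steps above (pick the csi-cephfsplugin pod on the same node as the application pod):

# Count unit values in the NodeGetVolumeStats responses logged by the plugin.
# "unit":2 is the inodes usage entry and should no longer appear; "unit":1 (bytes) should.
oc logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin | grep -c '"unit":2'    # expect 0
oc logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin | grep -c '"unit":1'    # expect > 0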
Those steps look correct, but do verify that NodeGetVolumeStats appears in the same logs. Kubelet only calls that procedure at intervals, and if it hasn't been called yet, the test isn't valid yet either. In addition to that, I suggest verifying that some of the metrics for the PVC are still available, but that "kubelet_volume_stats_inodes" and the others from comment #0 are missing (see the query sketch below).
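A sketch of that metrics check against the Prometheus API; PROM_URL and the PVC name are placeholders, and any Prometheus that scrapes the kubelet will do:

# Bytes-based metrics should still be reported for the CephFS PVC...
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="registry-cephfs-rwx-pvc"}'

# ...while kubelet_volume_stats_inodes (and the other inode series from comment #0)
# should return no result.
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_inodes{persistentvolumeclaim="registry-cephfs-rwx-pvc"}'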
I've checked the logs for NodeGetVolumeStats:

oc logs csi-cephfsplugin-rljzb -c csi-cephfsplugin 2>&1 | tee test_plugin_logs.txt

The following message does appear frequently:

I1124 11:07:42.302874 1 utils.go:195] ID: 3268 GRPC call: /csi.v1.Node/NodeGetVolumeStats
Also no kubelet_volume_stats_inodes records appear in the logs.
*** Bug 2164633 has been marked as a duplicate of this bug. ***