Bug 2132270

Summary: CephFS should not report incomplete/incorrect inode info
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Juan Miguel Olmo <jolmomar>
Component: csi-driver
Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE
QA Contact: Yuli Persky <ypersky>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.11
CC: kramdoss, muagarwa, musoni, ndevos, ocs-bugs, odf-bz-bot, spasquie, ssonigra, tdesala, vcojot, ypersky
Target Milestone: ---   
Target Release: ODF 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.12.0-79
Doc Type: Removed functionality
Doc Text:
PersistentVolumes that use CephFS did not provide useful statistics about consumed/free inodes. Because the number of free inodes on a CephFS volume is not relevant (new inodes are created as needed), metrics that suggest running out of inodes do not provide meaningful information. To prevent erroneous alerting about running low on, or out of, inodes, Ceph-CSI no longer returns any inode metrics for CephFS.
Story Points: ---
Clone Of:
Clones: 2149677
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 2128263, 2149676, 2149677    

Description Juan Miguel Olmo 2022-10-05 10:15:08 UTC
Description of problem:

Total inode capacity values for filesystems with dynamic inode allocation
(such as CephFS) are not valid inputs for capacity calculations or alerts about storage capacity.

An example of the inode metrics for a PV backed by CephFS:
kubelet_volume_stats_inodes_free{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 0
kubelet_volume_stats_inodes{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7418
kubelet_volume_stats_inodes_used{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7419

kubelet_volume_stats_inodes_free does not reflect reality, because inodes are added dynamically when needed, and for the same reason the other two metrics cannot be used to calculate capacity or free inodes. With free inodes reported as 0, any alert that watches the free-inode ratio will always fire.
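
As a hedged illustration of how this can be checked on a live cluster, the free/total ratio that drives the inode alert can be queried through the cluster monitoring stack. The route name, token handling and access method below assume a default OpenShift 4.x monitoring setup; the PVC name is the one from the example above:

TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_inodes_free{persistentvolumeclaim="registry-cephfs-rwx-pvc"} / kubelet_volume_stats_inodes{persistentvolumeclaim="registry-cephfs-rwx-pvc"}'

With the sample values above this ratio evaluates to 0 / 7418 = 0, so an alert on the free-inode ratio stays permanently in the firing state.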

The discussion of the solution is available as part of "Bug 2128263 - Alert KubePersistentVolumeInodesFillingUp MON-2802"; in summary, it was decided to introduce a new Ceph-CSI configuration option to not report inode information at all.

As an improvement, it would be nice to provide only the inode information that is real; in our case, only 'kubelet_volume_stats_inodes_used'.



Version of all relevant components (if applicable):
OCP 4.11.0
ODF 4.11.0

Does this issue impact your ability to continue to work with the product?

False alarms are raised.


Is there any workaround available to the best of your knowledge?
Silence these alerts.
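
A hedged sketch of such a silence, assuming the default alertmanager-main route in openshift-monitoring and the Alertmanager v2 API (the dates, author and comment are placeholders); the same can be done from the OpenShift console Alerting UI:

TOKEN=$(oc whoami -t)
AM_HOST=$(oc -n openshift-monitoring get route alertmanager-main -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  "https://$AM_HOST/api/v2/silences" -d '{
    "matchers": [{"name": "alertname", "value": "KubePersistentVolumeInodesFillingUp", "isRegex": false}],
    "startsAt": "2022-10-05T00:00:00Z", "endsAt": "2022-11-05T00:00:00Z",
    "createdBy": "storage-admin",
    "comment": "CephFS allocates inodes on demand; inode metrics are not meaningful"
  }'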

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug?
1 - very simple

Is this issue reproducible?
Yes

Steps to Reproduce:
1. Install OCP
2. Reconfigure OpenShift Container Platform registry to use RWX
   CephFS volume provided by ODF
3. Use the cluster for a while
4. Check firing alerts


Actual results:
Alert KubePersistentVolumeInodesFillingUp is firing

Expected results:
Alert KubePersistentVolumeInodesFillingUp is not firing when RWX CephFS volume
is used to provide persistent storage for some OCP component.

Additional info:
See bug 2128263

Comment 2 Niels de Vos 2022-10-05 12:47:42 UTC
CephFS does not have a concept of "free inodes", inodes get allocated on-demand in the filesystem.

This confuses alerting managers that expect a (high) number of free inodes, and warnings get produced if the number of free inodes is not high enough. This causes alerts to always get reported for CephFS.

To prevent the false-positive alerts from happening, the NodeGetVolumeStats procedure for CephFS (and CephNFS) will not contain inodes in the reply anymore.
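
For illustration only (the exact formatting may differ between Ceph-CSI versions), the effect is visible in the NodeGetVolumeStats GRPC responses logged by the csi-cephfsplugin container. In the CSI VolumeUsage enum, "unit":1 is BYTES and "unit":2 is INODES; before the change a CephFS reply carried both entries, afterwards only the bytes entry remains, roughly:

before: GRPC response: {"usage":[{...,"unit":1},{...,"unit":2}]}
after:  GRPC response: {"usage":[{...,"unit":1}]}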

Comment 3 Niels de Vos 2022-10-12 12:25:55 UTC
*** Bug 2089225 has been marked as a duplicate of this bug. ***

Comment 4 Niels de Vos 2022-10-14 08:02:00 UTC
https://github.com/red-hat-storage/ceph-csi/pull/138 is the backport to the ODF devel branch

Comment 5 Niels de Vos 2022-10-14 08:20:49 UTC
The backport for ODF-4.12 is ready at https://github.com/red-hat-storage/ceph-csi/pull/139

Once this bug is approved, leave "/bugzilla refresh" as a comment in the PR to get it merged.

Comment 8 Yuli Persky 2022-11-22 11:46:48 UTC
The verification steps: 

1) Create a CephFS PVC
2) Create a pod and attach the PVC to it
3) Check which node the pod is running on:

oc get pods -o wide

4) Go to the csi-cephfsplugin logs of the plugin pod that runs on the same node (IP) as the application pod:

oc logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin

and make sure that no "unit":2 entries appear in the log content (see the consolidated sketch below).
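
A hedged consolidation of the steps above, assuming the default openshift-storage namespace and the usual app=csi-cephfsplugin pod label (the pod name is the one from this cluster; substitute your own):

oc -n openshift-storage get pods -l app=csi-cephfsplugin -o wide
oc -n openshift-storage logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin | grep -o '"unit":[0-9]' | sort | uniq -c

After the fix, only "unit":1 (bytes) occurrences are expected; "unit":2 (inodes) should not appear at all.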


This was indeed verified on ODF 4.12.0-114.
No "unit":2 messages appear, only "unit":1.


Therefore moving this BZ to "Verified". 

@Niels de Vos please confirm these verification steps. 
In case anything else should be checked - please get this BZ back to me.

Comment 9 Niels de Vos 2022-11-22 16:54:34 UTC
Those steps look correct, but do verify that NodeGetVolumeStats actually appears in the same logs. Kubelet only calls that procedure at intervals, and if it has not been called yet, the test is not valid yet either.

In addition to that, I suggest verifying that some of the metrics for the PVC are available, but "kubelet_volume_stats_inodes" and the others from comment #0 should be 'missing'.
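
A hedged way to check that, reusing the thanos-querier route query sketched in the bug description (the PVC name here is hypothetical):

TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_used_bytes{persistentvolumeclaim="cephfs-test-pvc"}'
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
  --data-urlencode 'query=kubelet_volume_stats_inodes{persistentvolumeclaim="cephfs-test-pvc"}'

The first query should still return a sample for the CephFS-backed PVC, while the second (and the other inode metrics from comment #0) should return an empty result.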

Comment 10 Yuli Persky 2022-11-24 11:14:59 UTC
I've checked the logs for NodeGetVolumeStats

oc logs csi-cephfsplugin-rljzb -c csi-cephfsplugin 2>&1 | tee test_plugin_logs.txt

And the following message does appear frequently. 


I1124 11:07:42.302874       1 utils.go:195] ID: 3268 GRPC call: /csi.v1.Node/NodeGetVolumeStats

Comment 11 Yuli Persky 2022-11-24 11:16:51 UTC
Also no kubelet_volume_stats_inodes records appear in the logs.

Comment 19 Juan Miguel Olmo 2023-04-03 09:07:41 UTC
*** Bug 2164633 has been marked as a duplicate of this bug. ***