Bug 2132270 - CephFS should not report incomplete/incorrect inode info
Summary: CephFS should not report incomplete/incorrect inode info
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Niels de Vos
QA Contact: Yuli Persky
URL:
Whiteboard:
Duplicates: 2089225 2164633
Depends On:
Blocks: 2128263 2149676 2149677
 
Reported: 2022-10-05 10:15 UTC by Juan Miguel Olmo
Modified: 2023-08-09 16:37 UTC (History)
CC: 11 users

Fixed In Version: 4.12.0-79
Doc Type: Removed functionality
Doc Text:
PersistentVolumes that use CephFS did not provide useful statistics about consumed/free inodes. Because the number of free inodes on a CephFS volume is not relevant (new inodes get created when needed), metrics that suggest running out of inodes do not provide important information. In order to prevent erroneous alerting about running low on, or out of, inodes, Ceph-CSI no longer returns metrics about inodes on CephFS at all.
Clone Of:
Clones: 2149677
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-csi pull 3407 0 None Merged util: make inode metrics optional in FilesystemNodeGetVolumeStats() 2022-10-14 08:02:17 UTC
Github red-hat-storage ceph-csi pull 138 0 None open Sync the upstream changes from `ceph/ceph-csi:devel` into the `devel` branch 2022-10-14 08:01:59 UTC
Github red-hat-storage ceph-csi pull 139 0 None open Bug 2132270: util: make inode metrics optional in FilesystemNodeGetVolumeStats() 2022-10-14 10:04:00 UTC
Github red-hat-storage ocs-ci pull 6882 0 None Merged Validation for BZ-2132270 2023-02-10 07:48:04 UTC

Internal Links: 2128263

Description Juan Miguel Olmo 2022-10-05 10:15:08 UTC
Description of problem:

Total inode capacity values for filesystems with dynamic inode
allocation (such as CephFS) are not valid inputs for storage-capacity calculations or alerts.

An example of the inode metrics for a PV backed by CephFS could be:
kubelet_volume_stats_inodes_free{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 0
kubelet_volume_stats_inodes{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7418
kubelet_volume_stats_inodes_used{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 7419

The kubelet_volume_stats_inodes_free metric does not reflect reality, because inodes are added dynamically as needed; for the same reason, the other two metrics cannot be used to calculate capacity or free inodes.

The discussion of the solution is available as part of "Bug 2128263 - Alert KubePersistentVolumeInodesFillingUp MON-2802"; in summary, it has been decided to introduce a new Ceph-CSI configuration option to not report inode information at all.

As an improvement, it would be nice to provide only the inode information that is real; in our case, only 'kubelet_volume_stats_inodes_used'.



Version of all relevant components (if applicable):
OCP 4.11.0
ODF 4.11.0

Does this issue impact your ability to continue to work with the product?

False alarms raised.


Is there any workaround available to the best of your knowledge?
Silence these alarms.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug ?
1 - very simple

Is this issue reproducible?
yes 

Steps to Reproduce:
1. Install OCP
2. Reconfigure OpenShift Container Platform registry to use RWX
   CephFS volume provided by ODF
3. Use the cluster for a while
4. Check firing alerts


Actual results:
Alert KubePersistentVolumeInodesFillingUp is firing

Expected results:
Alert KubePersistentVolumeInodesFillingUp is not firing when RWX CephFS volume
is used to provide persistent storage for some OCP component.

Additional info:
See bug 2128263

Comment 2 Niels de Vos 2022-10-05 12:47:42 UTC
CephFS does not have a concept of "free inodes", inodes get allocated on-demand in the filesystem.

This confuses alerting managers that expect a (high) number of free inodes, and warnings get produced if the number of free inodes is not high enough. This causes alerts to always get reported for CephFS.

To prevent the false-positive alerts from happening, the NodeGetVolumeStats procedure for CephFS (and CephNFS) will not contain inodes in the reply anymore.
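
The change can be sketched roughly as follows. The types below are simplified stand-ins for the CSI spec's VolumeUsage (where unit 1 is BYTES and unit 2 is INODES, matching the "unit":1 / "unit":2 seen in the gRPC logs); they are not the actual ceph-csi code, which lives in FilesystemNodeGetVolumeStats():

```go
package main

import "fmt"

// Simplified stand-ins for the CSI spec's VolumeUsage; the real
// implementation uses the container-storage-interface/spec types.
type Unit int32

const (
	UnitBytes  Unit = 1 // serialized as "unit":1 in the gRPC logs
	UnitInodes Unit = 2 // serialized as "unit":2
)

type VolumeUsage struct {
	Available, Total, Used int64
	Unit                   Unit
}

// volumeStats builds the usage slice for a NodeGetVolumeStats reply.
// When includeInodes is false (the new behavior for CephFS/CephNFS),
// the inode entry is omitted entirely, so kubelet never exports
// kubelet_volume_stats_inodes* for these volumes.
func volumeStats(availBytes, totalBytes, usedBytes int64,
	availInodes, totalInodes, usedInodes int64, includeInodes bool) []VolumeUsage {

	usage := []VolumeUsage{
		{Available: availBytes, Total: totalBytes, Used: usedBytes, Unit: UnitBytes},
	}
	if includeInodes {
		usage = append(usage, VolumeUsage{
			Available: availInodes, Total: totalInodes, Used: usedInodes, Unit: UnitInodes,
		})
	}
	return usage
}

func main() {
	// CephFS: inodes are allocated on demand, so skip the inode entry.
	fmt.Println(len(volumeStats(1<<30, 2<<30, 1<<30, 0, 7418, 7419, false)))
	// Block volumes (e.g. RBD) keep reporting both entries.
	fmt.Println(len(volumeStats(1<<30, 2<<30, 1<<30, 100, 200, 100, true)))
}
```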

Comment 3 Niels de Vos 2022-10-12 12:25:55 UTC
*** Bug 2089225 has been marked as a duplicate of this bug. ***

Comment 4 Niels de Vos 2022-10-14 08:02:00 UTC
https://github.com/red-hat-storage/ceph-csi/pull/138 is the backport to the ODF devel branch

Comment 5 Niels de Vos 2022-10-14 08:20:49 UTC
The backport for ODF-4.12 is ready at https://github.com/red-hat-storage/ceph-csi/pull/139

Once this bug is approved, leave "/bugzilla refresh" as a comment in the PR to get it merged.

Comment 8 Yuli Persky 2022-11-22 11:46:48 UTC
The verification steps: 

1) Create CephFS pvc
2) Create a pod and attach the pod to a pvc 
3) Check on which nodes the pods are running

oc get pods -o wide

4) Go to the csi-cephfsplugin logs of the plugin pod running on the same node as the application pod:

oc logs csi-cephfsplugin-2zcdd -c csi-cephfsplugin

and make sure that no "unit":2 entries appear in the log content.


This was indeed verified on ODF 4.12.0-114.
No "unit":2 messages appear, only "unit":1.


Therefore moving this BZ to "Verified". 

@Niels de Vos please confirm these verification steps. 
In case anything else should be checked - please get this BZ back to me.

Comment 9 Niels de Vos 2022-11-22 16:54:34 UTC
Those steps look correct, but do verify that NodeGetVolumeStats does appear in the same logs. Kubelet only calls that procedure at intervals, and if it hasn't been called yet, the test isn't valid yet either.

In addition to that, I suggest verifying that some of the metrics for the PVC are available, while "kubelet_volume_stats_inodes" and the others from comment #0 should be missing.
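
One way to automate that check is to scrape the kubelet metrics endpoint and assert that byte metrics for the PVC exist while inode metrics are absent. A minimal sketch; the sample payload below is illustrative, not captured from a real cluster:

```go
package main

import (
	"fmt"
	"strings"
)

// hasInodeMetrics reports whether a kubelet metrics payload contains any
// kubelet_volume_stats_inodes* series for the given PVC.
func hasInodeMetrics(metrics, pvc string) bool {
	for _, line := range strings.Split(metrics, "\n") {
		if strings.HasPrefix(line, "kubelet_volume_stats_inodes") &&
			strings.Contains(line, `persistentvolumeclaim="`+pvc+`"`) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative payload: byte metrics present, inode metrics absent,
	// which is the expected state for CephFS-backed PVCs after the fix.
	sample := `kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 2147483648
kubelet_volume_stats_used_bytes{persistentvolumeclaim="registry-cephfs-rwx-pvc"} 1073741824`
	fmt.Println(hasInodeMetrics(sample, "registry-cephfs-rwx-pvc"))
}
```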

Comment 10 Yuli Persky 2022-11-24 11:14:59 UTC
I've checked the logs for NodeGetVolumeStats

oc logs csi-cephfsplugin-rljzb -c csi-cephfsplugin 2>&1 | tee test_plugin_logs.txt

And the following message does appear frequently. 


I1124 11:07:42.302874       1 utils.go:195] ID: 3268 GRPC call: /csi.v1.Node/NodeGetVolumeStats

Comment 11 Yuli Persky 2022-11-24 11:16:51 UTC
Also no kubelet_volume_stats_inodes records appear in the logs.

Comment 19 Juan Miguel Olmo 2023-04-03 09:07:41 UTC
*** Bug 2164633 has been marked as a duplicate of this bug. ***

