Created attachment 1882222 [details]
registry-cephfs-rwx-pvc details page

Description of problem:
--------------------------
After a successful fresh deployment of an OCP + ODF cluster with the versions mentioned below, the management console (Home --> Overview) fires the following alert:

"May 19, 2022, 4:15 PM
The PersistentVolume claimed by registry-cephfs-rwx-pvc in Namespace openshift-image-registry only has 0% free inodes."

Version-Release number of selected component:
-----------------------------------------------
ODF : 4.11.0-75
OCP : 4.11.0-0.nightly-2022-05-18-171831

How reproducible:
------------------
3/3

Steps to Reproduce:
---------------------
1. Deploy an OCP + ODF cluster.
2. Go to the management console, Home --> Overview.

Actual results:
------------------
(a) OCP fires a wrong alert.
(b) There is no actual inode issue, as files can still be created on the volume.

Expected results:
-------------------
The alert should fire only when the volume is actually low on free inodes.

Additional info:
------------------
I do not see any inode issue; I was able to create files.

[root@localhost 11-ocp]# oc get pods -n openshift-image-registry
NAME                                               READY   STATUS      RESTARTS        AGE
cluster-image-registry-operator-78d977c67c-9bgtd   1/1     Running     1 (3d21h ago)   3d21h
image-pruner-27551520-tscxc                        0/1     Completed   0               2d7h
image-pruner-27552960-j922z                        0/1     Completed   0               31h
image-pruner-27554400-76fhf                        0/1     Completed   0               7h33m
image-registry-58f4cfb7f5-ldf4l                    1/1     Running     0               3d20h
node-ca-5gfzd                                      1/1     Running     0               3d21h
node-ca-6xg5k                                      1/1     Running     0               3d21h
node-ca-9sc2c                                      1/1     Running     0               3d21h
node-ca-dljr7                                      1/1     Running     0               3d21h
node-ca-jdxfd                                      1/1     Running     0               3d21h
node-ca-p9xwl                                      1/1     Running     0               3d21h

=================================================

[root@localhost 11-ocp]# oc rsh -n openshift-image-registry image-registry-58f4cfb7f5-ldf4l
sh-4.4$ df -i
Filesystem        Inodes    IUsed     IFree  IUse%  Mounted on
overlay         62651840   115707  62536133     1%  /
tmpfs            8243906       17   8243889     1%  /dev
tmpfs            8243906       17   8243889     1%  /sys/fs/cgroup
shm              8243906        1   8243905     1%  /dev/shm
tmpfs            8243906     4864   8239042     1%  /etc/passwd
172.30.21.130:6789,172.30.130.222:6789,172.30.208.208:6789:/volumes/csi/csi-vol-a5050c60-d760-11ec-b65a-0a580a800213/77564c2a-db11-4ad3-86eb-681acc0f30c1
                       1        -         -      -  /registry
tmpfs            8243906        7   8243899     1%  /etc/secrets
/dev/sda4       62651840   115707  62536133     1%  /etc/hosts
tmpfs            8243906        5   8243901     1%  /var/lib/kubelet
tmpfs            8243906        5   8243901     1%  /run/secrets/openshift/serviceaccount
tmpfs            8243906       11   8243895     1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs            8243906        1   8243905     1%  /proc/acpi
tmpfs            8243906        1   8243905     1%  /proc/scsi
tmpfs            8243906        1   8243905     1%  /sys/firmware
Hmm, I am not sure why I am the assignee here. Trying to reset.
Reassigning to ODF team for further input/triage since the OCP console is only responsible for rendering the alert.
Hi Bipul, I have attached screenshots of the alerts in comment #5 and comment #6. Please let me know if anything else is required.

Thanks,
Mugdha Soni
The mounted cephfs volume does not show any size; df output from the above:

Filesystem  Inodes  IUsed  IFree  IUse%  Mounted on
172.30.21.130:6789,172.30.130.222:6789,172.30.208.208:6789:/volumes/csi/csi-vol-a5050c60-d760-11ec-b65a-0a580a800213/77564c2a-db11-4ad3-86eb-681acc0f30c1
                 1      -      -      -  /registry

Do you know why? It is then reported as zero to Prometheus, which then thinks the volume is full. It looks like a cephfs issue to me. Kubelet should not treat `-` as zero and we should fix that, but the root cause is IMO somewhere else (cephfs? kernel?).
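For reference, the alert text matches the KubePersistentVolumeInodesFillingUp rule from kubernetes-mixin, whose critical expression is roughly

  kubelet_volume_stats_inodes_free{job="kubelet"} / kubelet_volume_stats_inodes{job="kubelet"} < 0.03

so a free-inode count exported as 0 makes the ratio evaluate to 0% free and the alert fires immediately.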
Unfortunately, gRPC / protobuf does not allow kubelet to distinguish between "available is not set" and "available is set to 0", as 0 is the default value of int64 fields:

> Note that for scalar message fields, once a message is parsed there's no way of telling whether
> a field was explicitly set to the default value (for example whether a boolean was set to false)
> or just not set at all: you should bear this in mind when defining your message types.

https://developers.google.com/protocol-buffers/docs/proto3#default

The CSI driver must be fixed to report some value for available inodes. MAXINT64 would probably work.
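To illustrate, here is a minimal sketch using the generated Go types from the CSI spec (github.com/container-storage-interface/spec); the scenario is hypothetical, but the field and enum names are the real ones:

package main

import (
	"fmt"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

func main() {
	// A driver that leaves Available unset on the INODES usage entry...
	unset := &csi.VolumeUsage{Unit: csi.VolumeUsage_INODES}

	// ...produces exactly the same message on the wire as one that
	// explicitly reports 0 free inodes, because 0 is the proto3 default
	// for int64 fields and default values are not serialized.
	zero := &csi.VolumeUsage{Unit: csi.VolumeUsage_INODES, Available: 0}

	fmt.Println(unset.GetAvailable(), zero.GetAvailable()) // prints: 0 0
}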
Alternatively, you can report just the free space and not report any inode counts at all; it seems that cephfs does not really care about them.
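A minimal sketch of that alternative, assuming the usual NodeGetVolumeStats response shape from the CSI spec (the helper and its parameters are hypothetical; total/available/used would come from statfs on the mount):

package driver

import (
	csi "github.com/container-storage-interface/spec/lib/go/csi"
)

// volumeStatsWithoutInodes is a hypothetical helper that reports only
// byte usage for the volume.
func volumeStatsWithoutInodes(total, available, used int64) *csi.NodeGetVolumeStatsResponse {
	return &csi.NodeGetVolumeStatsResponse{
		Usage: []*csi.VolumeUsage{
			{
				Unit:      csi.VolumeUsage_BYTES,
				Total:     total,
				Available: available,
				Used:      used,
			},
			// Deliberately no VolumeUsage_INODES entry: kubelet then has
			// no inode numbers to export, rather than a misleading zero.
		},
	}
}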
Not a 4.11 blocker
Bug 2132270 has been reported for this issue as well. CephFS will not report inode information anymore once that BZ is closed.

*** This bug has been marked as a duplicate of bug 2132270 ***