This bug was initially created as a copy of Bug #2274015

I am copying this bug because:

Description of problem (please be detailed as possible and provide log snippets):

It has been observed on at least 2 ODF 4.16 clusters that Ceph health enters WARNING state with the following message:

  cluster:
    id:     45c47df0-e2fa-4931-9e45-b6c109ce5b69
    health: HEALTH_WARN
            1 MDSs report slow requests

  services:
    mon: 3 daemons, quorum a,b,c (age 6h)
    mgr: b(active, since 5h), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 26h), 3 in (since 30h)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 145 pgs
    objects: 226 objects, 577 MiB
    usage:   3.9 GiB used, 296 GiB / 300 GiB avail
    pgs:     145 active+clean

  io:
    client:   853 B/s rd, 3.3 KiB/s wr, 1 op/s rd, 0 op/s wr

This appears to impact CephFS functionality, with CephFS-backed PVCs failing to reach Bound state:

E    ocs_ci.ocs.exceptions.ResourceWrongStatusException: Resource pvc-test-87a4c63e82584fdabf50638748121fe describe output:
E    Name:          pvc-test-87a4c63e82584fdabf50638748121fe
E    Namespace:     namespace-test-f41280f9e48b49a98deb0bc0f
E    StorageClass:  ocs-storagecluster-cephfs
E    Status:        Pending
E    Volume:
E    Labels:        <none>
E    Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
E                   volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
E    Finalizers:    [kubernetes.io/pvc-protection]
E    Capacity:
E    Access Modes:
E    VolumeMode:    Filesystem
E    Used By:       <none>
E    Events:
E      Type    Reason                Age                From                         Message
E      ----    ------                ----               ----                         -------
E      Normal  Provisioning          76s                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-675dbc77b8-xfzs4_9aae553a-5912-479b-95f2-1dcbe530dbdf  External provisioner is provisioning volume for claim "namespace-test-f41280f9e48b49a98deb0bc0f/pvc-test-87a4c63e82584fdabf50638748121fe"
E      Normal  ExternalProvisioning  11s (x6 over 76s)  persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'openshift-storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

Seeing these entries in the MDS pod logs:

debug 2024-04-08T13:49:01.154+0000 7f08f1877640  0 log_channel(cluster) log [WRN] : 4 slow requests, 0 included below; oldest blocked for > 15522.036438 secs
debug 2024-04-08T13:49:06.154+0000 7f08f1877640  0 log_channel(cluster) log [WRN] : 4 slow requests, 0 included below; oldest blocked for > 15527.036638 secs
debug 2024-04-08T13:49:11.154+0000 7f08f1877640  0 log_channel(cluster) log [WRN] : 4 slow requests, 0 included below; oldest blocked for > 15532.036821 secs
debug 2024-04-08T13:49:16.154+0000 7f08f1877640  0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for > 15537.036953 secs
debug 2024-04-08T13:49:16.154+0000 7f08f1877640  0 log_channel(cluster) log [WRN] : slow request 15363.773573 seconds old, received at 2024-04-08T09:33:12.382280+0000: client_request(client.271142:3 lookup #0x10000000000/csi 2024-04-08T09:33:12.380870+0000 caller_uid=0, caller_gid=0{}) currently cleaned up request
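For reference, the slow requests behind this warning can be inspected from the Ceph side. The following is only a sketch of the commands one might use, assuming the rook-ceph toolbox deployment is enabled in openshift-storage and that the filesystem/MDS follows the default ODF naming (ocs-storagecluster-cephfilesystem); adjust the names to the actual cluster:

  # Open a shell in the toolbox pod
  oc rsh -n openshift-storage deploy/rook-ceph-tools

  # Show which MDS is reporting slow requests and the filesystem state
  ceph health detail
  ceph fs status

  # Dump the in-flight / blocked operations on the active MDS daemon
  # (daemon name "ocs-storagecluster-cephfilesystem-a" is an example)
  ceph tell mds.ocs-storagecluster-cephfilesystem-a dump_ops_in_flight
  ceph tell mds.ocs-storagecluster-cephfilesystem-a dump_blocked_ops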
Version of all relevant components (if applicable):
ODF 4.16.0-69
Ceph 18.2.1-76.el9cp (2517f8a5ef5f5a6a22013b2fb11a591afd474668) reef (stable)
OCP 4.16.0-0.nightly-2024-04-06-020637

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
CephFS functionality appears to be impacted as described above.

Is there any workaround available to the best of your knowledge?
Restarting one of the MDS pods brings Ceph health back to OK, but the issue recurs after 1-2 hours (see the command sketch under Additional info below).

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:
This is new in 4.16.

Steps to Reproduce:
1. Deploy ODF 4.16.
2. Wait for 1-2 hours and check Ceph health.

Actual results:
Ceph health shows the aforementioned WARNING.

Expected results:
Ceph health should not degrade.

Additional info:
Must-gather attached, with MDS at log level 20.
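The workaround mentioned above is roughly the following. This is a sketch assuming the default ODF/Rook MDS pod naming and labels; the exact pod name will differ per cluster:

  # List the MDS pods (Rook labels them app=rook-ceph-mds)
  oc get pods -n openshift-storage -l app=rook-ceph-mds

  # Delete one of them; the hot-standby MDS takes over and the pod is recreated
  oc delete pod -n openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-<hash>

  # Ceph health returns to HEALTH_OK shortly afterwards (until the issue recurs)
  oc rsh -n openshift-storage deploy/rook-ceph-tools ceph health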
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
*** This bug has been marked as a duplicate of bug 2277944 ***