Created attachment 2007833 [details]
alert

Description of problem (please be detailed as possible and provide log snippets):

The MDSCPUUsageHigh alert is fired when a Ceph metadata server pod (rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-8494565f5zs9m) has high CPU usage, but the alert does not provide clear instructions or steps to take in response. The alert should include a call to action, providing either steps to increase the number of active metadata servers or a link to documentation on what to do when the MDSCPUUsageHigh alert is received.

Alert: Ceph metadata server pod (rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-8494565f5zs9m) has high cpu usage. Please consider increasing the number of active metadata servers, it can be done by increasing the number of activeMetadataServers parameter in the StorageCluster CR.

Name: MDSCPUUsageHigh
Severity: Warning
Message: Ceph metadata server pod (rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-8494565f5zs9m) has high cpu usage

Version of all relevant components (if applicable):
ODF: 4.15.0-104.stable
OCP: 4.15.0-0.nightly-2024-01-06-062415

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible? Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create a 3-master, 3-worker OCP cluster [BM NVMe platform] and install ODF on it.
2. Create two or more CephFS PVCs with RWX access mode.
3. Run file-creator pods with a higher number of threads [e.g. 80]; the CPU load will go high and the alert will fire.
4. Go to the dashboard; the alert will be received, but it will not contain any instructions to perform the required action.

Actual results:
The MDSCPUUsageHigh alert is lacking a call to action.

Expected results:
The alert should include a call to action, providing either steps to increase the number of active metadata servers or a link to documentation on what to do when the MDSCPUUsageHigh alert is received.

Additional info:
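For triage, a minimal diagnostic sketch of how MDS pod CPU usage can be compared against the pod's CPU limit while this alert is firing. The namespace and the app=rook-ceph-mds label are assumptions based on a default ODF deployment, not something stated in the alert itself.

```bash
# Hedged diagnostic sketch (not part of the alert): compare MDS pod CPU usage
# against the configured CPU limit. Assumes the default openshift-storage
# namespace and the Rook label app=rook-ceph-mds; adjust if your deployment differs.
oc adm top pods -n openshift-storage -l app=rook-ceph-mds

# Show each MDS pod's configured CPU limit for comparison with the usage above.
oc get pods -n openshift-storage -l app=rook-ceph-mds \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.limits.cpu}{"\n"}{end}'
```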
*** This bug has been marked as a duplicate of bug 2256725 ***
Reopening this, as it is different from the bug mentioned above: this one is about CPU usage, while that one is about cache usage.
Hi Manish, any plans for this BZ in 4.15? Otherwise we can move it out to 4.16.
It should be fixed in a similar fashion to https://bugzilla.redhat.com/show_bug.cgi?id=2256725. Providing devel ack.
Verified with the below versions; the issue still persists. I didn't find any actionable link provided in the alert, please check once. Please find the attached screenshot for the same.

OCP: 4.15.0-0.nightly-2024-02-16-235514
ODF: 4.15.0-144
Increasing the severity to High as it is failing one of our test cases and blocking us from verifying the actionable link.
The rule was modified to tune the time for the test requirement. Used the latest Prometheus YAML file and retested; I am able to see the actionable link. Thanks!

One query regarding the feature: according to this feature, we are suggesting that the customer increase the number of MDS pods, right? Do we need to suggest a CPU increment as well?

RHSTOR-3865: Alert when CephFS MDS scaling is needed - More MDS pods are required --> Goal: Improve customer experience and alert if MDS scaling is needed

The below information was found in the article linked in the alert: we need to either increase the allocated CPU or run multiple active MDS. The below command describes how to set the amount of allocated CPU for the MDS server.

oc patch -n openshift-storage storagecluster ocs-storagecluster \
  --type merge \
  --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "8"}, "requests": {"cpu": "8"}}}}}'

In order to run multiple active MDS servers, use the below command. Make sure we have enough CPU provisioned for MDS depending on the load.

```bash
oc patch -n openshift-storage cephfilesystem ocs-storagecluster-cephfilesystem \
  --type merge \
  --patch '{"spec": {"metadataServer": {"activeCount": 2}}}'
```
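For completeness, a hedged way to confirm that the two patches above actually landed. The field paths are taken from the patch payloads above; the label used to list MDS pods is an assumption based on Rook defaults.

```bash
# Confirm the MDS CPU requests/limits set on the StorageCluster by the first patch.
oc get storagecluster ocs-storagecluster -n openshift-storage \
  -o jsonpath='{.spec.resources.mds}{"\n"}'

# Confirm the MDS pods after the second patch (expect more pods once activeCount
# is raised; label per Rook defaults, adjust if your deployment differs).
oc get pods -n openshift-storage -l app=rook-ceph-mds
```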
I have given steps to do both; it's their choice to choose one of the two remedies.
(In reply to Manish Yathnalli from comment #17)
> I have given steps to do both; it's their choice to choose one of the two
> remedies.

Eran and Bipin,

When the MDSCPUUsageHigh alert is received, the actionable link provides steps to either increase the allocated CPU by editing the ocs-storagecluster, or to run multiple active MDS by patching the cephfilesystem, increasing the metadataServer activeCount to 2.

I followed the steps in https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHigh.md to run multiple active metadata servers and observed that two active and two standby-replay MDS daemons were present after the scale up. Of the two active MDS, one was stopped very soon, and there was no load sharing between the two active MDS pods; the load continued only on the single active MDS's CPU. Hence, the alert still remains in the firing state.

Before scale up of MDS pods:

sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 3 clients
=================================
RANK  STATE           MDS                                   ACTIVITY       DNS    INOS   DIRS  CAPS
 0    active          ocs-storagecluster-cephfilesystem-a   Reqs:  807 /s  2530k  2531k  1299  2065k
 0-s  standby-replay  ocs-storagecluster-cephfilesystem-b   Evts: 4529 /s  2862k  2862k  1288      0
                  POOL                        TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata    metadata  10.1G  742G
ocs-storagecluster-cephfilesystem-data0       data      62.2G  742G
MDS version: ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)

After scale up of MDS pods using the patch command given in the document provided above:

sh-5.1$ date
Wed Feb 21 06:22:11 UTC 2024
sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 4 clients
=================================
RANK  STATE           MDS                                   ACTIVITY       DNS    INOS   DIRS  CAPS
 0    active          ocs-storagecluster-cephfilesystem-a   Reqs: 1668 /s  2536k  2536k  1299  2051k
 1    active          ocs-storagecluster-cephfilesystem-d   Reqs:    0 /s     12     15    13      3
 0-s  standby-replay  ocs-storagecluster-cephfilesystem-b   Evts: 9454 /s  2862k  2862k  1288      0
 1-s  standby-replay  ocs-storagecluster-cephfilesystem-c   Evts:    0 /s      0      3     1      0
                  POOL                        TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata    metadata  10.2G  741G
ocs-storagecluster-cephfilesystem-data0       data      63.8G  741G
MDS version: ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)

Waited for a few minutes; no load sharing between the active MDS. Found that one active MDS stopped and went back to standby.
sh-5.1$ date
Wed Feb 21 06:26:24 UTC 2024
sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 3 clients
=================================
RANK  STATE           MDS                                   ACTIVITY       DNS    INOS   DIRS  CAPS
 0    active          ocs-storagecluster-cephfilesystem-a   Reqs: 1256 /s  2536k  2536k  1299  2064k
 0-s  standby-replay  ocs-storagecluster-cephfilesystem-b   Evts: 2888 /s  2864k  2864k  1288      0
                  POOL                        TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata    metadata  10.7G  740G
ocs-storagecluster-cephfilesystem-data0       data      67.4G  740G
STANDBY MDS
ocs-storagecluster-cephfilesystem-d
ocs-storagecluster-cephfilesystem-c
MDS version: ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)

------------------------------------------------

Below are the Ceph status logs monitored throughout the MDS scale-up procedure:

sh-5.1$ ceph -s -w
  cluster:
    id:     23416e5d-5223-492f-89e0-eefdebcb0193
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 12h)
    mgr: a(active, since 45h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 12h), 3 in (since 45h)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 5.43M objects, 17 GiB
    usage:   144 GiB used, 2.6 TiB / 2.7 TiB avail
    pgs:     169 active+clean

  io:
    client: 25 MiB/s rd, 17 MiB/s wr, 129 op/s rd, 6.74k op/s wr

2024-02-21T06:20:49.191293+0000 mon.a [WRN] Health check failed: 1 filesystem is online with fewer MDS than max_mds (MDS_UP_LESS_THAN_MAX)
2024-02-21T06:20:49.203774+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-d assigned to filesystem ocs-storagecluster-cephfilesystem as rank 1 (now has 2 ranks)
2024-02-21T06:20:49.203829+0000 mon.a [INF] Health check cleared: MDS_UP_LESS_THAN_MAX (was: 1 filesystem is online with fewer MDS than max_mds)
2024-02-21T06:20:49.203838+0000 mon.a [INF] Cluster is now healthy
2024-02-21T06:20:49.295776+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-d is now active in filesystem ocs-storagecluster-cephfilesystem as rank 1
2024-02-21T06:24:12.136660+0000 mon.a [INF] stopping daemon mds.ocs-storagecluster-cephfilesystem-d
2024-02-21T06:24:29.174132+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-d finished stopping rank 1 in filesystem ocs-storagecluster-cephfilesystem (now has 1 ranks)
2024-02-21T06:30:00.000141+0000 mon.a [INF] overall HEALTH_OK

Currently, I'm unsure if we have been recommending this procedure to customers to increase the active MDS count. Do you think we should include this procedure in the actionable link? Please share your thoughts.
Manish,

Of the two solutions provided by you in https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHigh.md, only the CPU increment is working fine: load sharing happens once the CPU increment is done, and the alert disappears.

The second solution, MDS pod scale-up, is not working as expected. Load sharing does not happen across the active MDS pods when the scale up is done. Observed that only one active MDS is available for a few minutes after the scale up; some time later, say 20 to 30 minutes, there is no active MDS available at all and the MDS daemon stays in the rejoin state forever. This observation is already captured in Comment 18.

@Manish, the MDS scale-up procedure does not appear to work and looks like a blocker for 4.15. Based on our test results, QE can agree with only the CPU increment.
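To make the observation above easier to reproduce, a rough diagnostic sketch run from the rook-ceph-tools pod, checking whether max_mds still reflects the scale up and what state each MDS rank is in. The namespace and the app=rook-ceph-tools label are assumptions based on a default ODF deployment.

```bash
# Hedged diagnostic sketch for the scale-up problem described above.
# Assumes the default openshift-storage namespace and the Rook tools pod label
# app=rook-ceph-tools; adjust if your deployment differs.
TOOLS=$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name | head -n 1)

# max_mds should stay at 2 after the scale up; if it drops back to 1, the
# CephFilesystem CR was likely reconciled back by the operator (see the later
# comments about patching via the StorageCluster instead).
oc rsh -n openshift-storage "$TOOLS" ceph fs get ocs-storagecluster-cephfilesystem | grep max_mds

# Watch the per-rank MDS states (active / standby-replay / rejoin) over time.
oc rsh -n openshift-storage "$TOOLS" ceph fs status ocs-storagecluster-cephfilesystem
```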
Yes, it's exposed here: https://github.com/red-hat-storage/ocs-operator/blob/f8a0c2c9fc43de45527a5ef892d682fa5e98f5c2/api/v1/storagecluster_types.go#L234

It can be done with oc patch like this:

oc patch storagecluster ocs-storagecluster -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/managedResources/cephFilesystems/activeMetadataServers", "value": <> }]'
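As a small follow-up, assuming the patch above has been applied with an actual value, a hedged way to read the setting back and confirm the operator propagates it down to the CephFilesystem:

```bash
# Read the value back from the StorageCluster (path matches the patch above).
oc get storagecluster ocs-storagecluster -n openshift-storage \
  -o jsonpath='{.spec.managedResources.cephFilesystems.activeMetadataServers}{"\n"}'

# The operator should reconcile this into the CephFilesystem's metadataServer.activeCount.
oc get cephfilesystem ocs-storagecluster-cephfilesystem -n openshift-storage \
  -o jsonpath='{.spec.metadataServer.activeCount}{"\n"}'
```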
Yes Subham, directly modifying the Filesystem CR won't work as it will just be reconciled back. You have to patch via the storagecluster CR only, so the command should be changed in the runbook example.
Based on Comment 18, moving it back to Assigned state.
https://github.com/openshift/runbooks/pull/167
> This command is adding 1 active and 1 standby-replay MDS daemon, so there will be a total of 2 active and 2 standby-replay MDS after this patch command. But no load sharing is happening on the newly added MDS pods. Attached a snapshot of the metrics to this BZ.

In 4.15 clusters we have default CSI subvolume group pinning enabled with default settings, so we should see the load being shared. @Patrick, would you like to share some thoughts?

Nagendra, can you share the output of `oc get CephFilesystemSubVolumeGroup -o yaml`?
Created a separate BZ https://bugzilla.redhat.com/show_bug.cgi?id=2265987 for Comment 29.

@Manish, you can edit the document and remove the instructions for MDS scale up. We don't recommend suggesting MDS scale up officially until it is tested internally in an ODF environment.
(In reply to Parth Arora from comment #31)
> > This command is adding 1 active and 1 standby-replay MDS daemon, so there
> > will be a total of 2 active and 2 standby-replay MDS after this patch
> > command. But no load sharing is happening on the newly added MDS pods.
> > Attached a snapshot of the metrics to this BZ.
>
> In 4.15 clusters we have default CSI subvolume group pinning enabled with
> default settings, so we should see the load being shared. @Patrick, would
> you like to share some thoughts?
>
> Nagendra, can you share the output of `oc get CephFilesystemSubVolumeGroup -o yaml`?

Load sharing happened 7 hours after the scale up; created a separate BZ [comment 32] for that. We can discuss the load sharing issue on the new BZ. Let's close this BZ with the document modification.

sh-5.1$ ceph fs status
ocs-storagecluster-cephfilesystem - 2 clients
=================================
RANK  STATE           MDS                                   ACTIVITY       DNS    INOS  DIRS  CAPS
 0    active          ocs-storagecluster-cephfilesystem-c   Reqs: 1229 /s  7098   1459    54  1446
 1    active          ocs-storagecluster-cephfilesystem-d   Reqs:    0 /s    15     18    15     1
 0-s  standby-replay  ocs-storagecluster-cephfilesystem-b   Evts: 1370 /s  89.7k  1568    54     0
 1-s  standby-replay  ocs-storagecluster-cephfilesystem-a   Evts:    0 /s     5      8     5     0
                  POOL                        TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata    metadata  22.0G  707G
ocs-storagecluster-cephfilesystem-data0       data      60.3G  707G
MDS version: ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)
sh-5.1$ exit
exit

oc get CephFilesystemSubVolumeGroup -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephFilesystemSubVolumeGroup
  metadata:
    creationTimestamp: "2024-02-22T06:08:51Z"
    finalizers:
    - cephfilesystemsubvolumegroup.ceph.rook.io
    generation: 1
    name: ocs-storagecluster-cephfilesystem-csi
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ocs.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: StorageCluster
      name: ocs-storagecluster
      uid: 8a7362b4-3284-4ea8-84da-2ead62d72179
    resourceVersion: "375147"
    uid: aabedaf8-8811-4dcc-9977-faa2f16f4b30
  spec:
    filesystemName: ocs-storagecluster-cephfilesystem
    name: csi
    pinning:
      distributed: 1
  status:
    info:
      clusterID: 5bb69c306a7d011c3e91c3cec112fb7a
    observedGeneration: 1
    phase: Ready
kind: List
metadata:
  resourceVersion: ""
We are removing the instructions to increase MDS for 4.15.

Harish, as discussed, please create a BZ for 4.16 to fix the MDS scale up/down part.
(In reply to Mudit Agarwal from comment #34) > We are removing the instructions to increase MDS for 4.15 > > Harish, as discussed please create a BZ for 4.16 to fix the MDS scale > up/down part. Mudit, Please find the BZ for MDS scale up https://bugzilla.redhat.com/show_bug.cgi?id=2265987
Verified with 4.15.0-150. Yes, the document is modified and only the CPU increment is suggested. But some important changes need to be made, as described below.

1. Please modify the alert description in the Prometheus rules to suggest the CPU increment, not an increase in the number of metadata servers (a quick way to check the deployed wording is sketched after this list):

- alert: MDSCPUUsageHigh
  annotations:
    description: |-
      Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage. Please consider increasing the number of active metadata servers, it can be done by increasing the number of activeMetadataServers parameter in the StorageCluster CR.
    message: Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHigh.md
    severity_level: warning
  expr: |
    pod:container_cpu_usage:sum{pod=~"rook-ceph-mds.*"}/ on(pod) kube_pod_resource_limit{resource='cpu',pod=~"rook-ceph-mds.*"} > 0.67
  for: 6h
  labels:
    severity: warning

2. Please remove "or run multiple active metadata servers" from the Impact section in https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHigh.md
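As referenced in item 1 above, a hedged way to pull the currently deployed wording out of the cluster and confirm whether the description was changed. The exact PrometheusRule object name varies between ODF versions, so this greps across all rules in the namespace; the namespace and the grep window are assumptions.

```bash
# Hedged check: dump the deployed PrometheusRule objects in the ODF namespace and
# show the MDSCPUUsageHigh block so the shipped description can be compared with
# the requested wording.
oc get prometheusrules -n openshift-storage -o yaml | grep -A 8 'alert: MDSCPUUsageHigh'
```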
See: https://bugzilla.redhat.com/show_bug.cgi?id=2265987#c3
Verified with fix, changes reflected in alert and runbook. Please find the snapshots for the same.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383