2272031 – [CephFS - Consistency Group] - High CPU utilisation 59.8% by MDS in Baremetal Ceph with CG quiesce testing

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	CephFS
Sub Component:
Version:	7.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	7.1
Assignee:	Leonid Usov
QA Contact:	sumr
Docs Contact:	Akash Raj
URL:
Whiteboard:
Depends On:
Blocks:	2267614 2298578 2298579

Reported:	2024-03-28 10:02 UTC by sumr
Modified:	2024-07-18 07:59 UTC
CC List:	10 users
Fixed In Version:	ceph-18.2.1-107.el8cp
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-06-13 14:30:42 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHCEPH-8681	0	None	None	None	2024-03-28 10:10:01 UTC
Red Hat Product Errata	RHSA-2024:3925	0	None	None	None	2024-06-13 14:30:48 UTC

Description sumr 2024-03-28 10:02:42 UTC

Description of problem:

CPU utilisation by the ceph-mds process on the baremetal Ceph node hosting the MDS is very high, at 59.8%.
This is seen after running stress and scalability tests on the Consistency Group feature.

top command output snippet from MDS node magna24:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1759 167       20   0 9658228   5.8g   8892 S  59.8  18.7   5864:09 ceph-mds
    599 root      20   0  228720 130484 127536 S   1.0   0.4 569:12.47 systemd-journal
    802 root      20   0  598176  46556  34760 S   1.0   0.1 141:57.00 rsyslogd
   2378 167       20   0 5008724   3.2g  10332 S   0.3  10.2 103:43.44 ceph-osd
   2381 167       20   0 4896376   3.1g  10500 S   0.3   9.9 125:46.09 ceph-osd




Version-Release number of selected component (if applicable): ceph version 18.2.1-76.el9cp


How reproducible:


Steps to Reproduce:
1. Run the Consistency Group quiesce stress and scalability tests on a baremetal Ceph setup.
2. Capture the CPU usage on the MDS node.
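
For step 2, a minimal sketch of pulling the ceph-mds CPU figure out of top output is shown here. It is not part of the test suite; the function name mds_cpu_from_top and the SAMPLE string are hypothetical, and it assumes the default 12-column top layout seen in the snippet above.

```python
def mds_cpu_from_top(top_output, command="ceph-mds"):
    """Return (pid, cpu_percent) pairs for process rows whose COMMAND matches.

    Assumes the default top column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if fields[11] == command:
            rows.append((int(fields[0]), float(fields[8])))
    return rows


# Sample rows copied from the snippet in this report.
SAMPLE = """\
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1759 167       20   0 9658228   5.8g   8892 S  59.8  18.7   5864:09 ceph-mds
    599 root      20   0  228720 130484 127536 S   1.0   0.4 569:12.47 systemd-journal
   2378 167       20   0 5008724   3.2g  10332 S   0.3  10.2 103:43.44 ceph-osd
"""

if __name__ == "__main__":
    for pid, cpu in mds_cpu_from_top(SAMPLE):
        print(f"ceph-mds pid={pid} cpu={cpu}%")
```

On a live node the same function could be fed the text produced by `top -b -n 1`, sampled repeatedly during the test run to see how the figure changes over time.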

Actual results: CPU utilisation by ceph-mds is 59.8%.

The system reports socket timeouts for any further tests, and ceph status has shown a "1 MDSs report slow requests" warning for more than a day.
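
The slow-request warning could be checked programmatically along these lines. This is a sketch under assumptions: it expects the JSON form of the cluster status (as from `ceph status --format json`) to carry health checks under health.checks, each with a summary.message field. The SAMPLE_STATUS document is hand-written for illustration, not captured from this cluster, and the function name is hypothetical.

```python
import json


def slow_request_warnings(status):
    """Return (check_code, message) pairs for health checks mentioning slow requests.

    `status` is the parsed JSON status document. The health.checks layout
    used here is an assumption about that document's shape.
    """
    checks = status.get("health", {}).get("checks", {})
    hits = []
    for code, check in checks.items():
        message = check.get("summary", {}).get("message", "")
        if "slow requests" in message:
            hits.append((code, message))
    return hits


# Hand-written illustrative document, not real output from this cluster.
SAMPLE_STATUS = json.loads("""
{
  "health": {
    "status": "HEALTH_WARN",
    "checks": {
      "MDS_SLOW_REQUEST": {
        "severity": "HEALTH_WARN",
        "summary": {"message": "1 MDSs report slow requests", "count": 1}
      }
    }
  }
}
""")

if __name__ == "__main__":
    for code, message in slow_request_warnings(SAMPLE_STATUS):
        print(f"{code}: {message}")
```

Matching on the message text rather than on a specific check code keeps the sketch tied to the warning string quoted in this report.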

Test logs for socket timeout : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-P4WY12

Expected results: Even when the I/O load on the MDS is high because of the stress and scalability tests, the MDS daemon should limit its CPU utilisation.




Additional info: System logs will be copied.

Comment 10 Leonid Usov 2024-05-09 09:41:13 UTC

No documentation or release notes update is required for this issue.

Comment 11 errata-xmlrpc 2024-06-13 14:30:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925
