|Summary:||MTSH: multithreaded self heal hogs cpu consistently over 150%|
|Product:||Red Hat Gluster Storage||Reporter:||nchilaka <nchilaka>|
|Component:||replicate||Assignee:||Mohit Agrawal <moagrawa>|
|Status:||CLOSED ERRATA||QA Contact:||Vijay Avuthu <vavuthu>|
|Version:||rhgs-3.2||CC:||amukherj, moagrawa, ravishankar, rhinduja, rhs-bugs, sheggodu, srmukher, storage-qa-internal|
|Target Release:||RHGS 3.4.0||Flags:||srmukher:
|Fixed In Version:||glusterfs-3.12.2-2||Doc Type:||Bug Fix|
Some Gluster daemons, such as glustershd, show higher CPU or memory consumption when there is a large amount of data or entries to be healed, which puts a heavy load on system resources. This can be addressed by running the control-cpu-load.sh script, which uses control groups (cgroups) to regulate the CPU and memory consumption of any Gluster daemon.
|Last Closed:||2018-09-04 06:29:44 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:||1484446|
Description nchilaka 2016-11-23 11:25:22 UTC
Description of problem:
=========================
When we set shd threads to, say, 4 and wait for the heal, we can see that the self-heal daemon consistently consumes about 150% CPU on average for as long as the heal goes on. This can really put resources under strain. I suspect this is also why, in my systemic testing where heals are pending and mtsh is set to 4, the machines are so busy (flooded with kernel hung-task messages) that I cannot even log in; I can only ping them. I know mtsh comes with a resource trade-off, but there must be a cap on it.

Version-Release number of selected component (if applicable):
===========
3.8.4-5
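For reference, a minimal sketch of the commands involved (not part of the original report), assuming a hypothetical volume named testvol; cluster.shd-max-threads is the volume option referred to as "shd threads" above:

# Raise the number of self-heal daemon threads (default is 1)
gluster volume set testvol cluster.shd-max-threads 4

# Confirm the setting, then watch the self-heal daemon's CPU while a heal is in progress
gluster volume get testvol cluster.shd-max-threads
top -p "$(pgrep -f glustershd | head -n 1)"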
Comment 2 nchilaka 2016-11-23 11:31:30 UTC
For example, I have attached some CPU log files. You can see that PID 1102 is the self-heal daemon process of the source brick with the shd option left at the default of 1 ===> average CPU consumption is <100%. However, with the threads set to 4, the average CPU consumption is ~150% (refer to PID 3354).
leg_newlog.log ==> top output for the legacy gluster processes
newlog.log ==> with mtsh set to 4
Comment 4 Atin Mukherjee 2016-11-25 08:01:50 UTC
Dev comment: We can't fix this bug at the moment; as per the design, raising the number of threads will definitely consume more CPU. We need to loop in the Perf team to assess the hardware recommendation for MT-self-heal usage, which then needs to be documented. Pranith will follow up with the perf team. A decision was reached in today's triage meeting between Dev, QE and PM to take this bug out of 3.2.0. More details at https://docs.google.com/spreadsheets/d/1ew4cafcvIVEWuJ4tLDuZ4ao7ZTYpsRz5NwCtQ4JVZaQ/edit#gid=0
Comment 6 Ambarish 2017-03-16 08:55:27 UTC
Seeing similar behaviour on physical perf machines. I see CPU usage shooting up to 150% (and it stays there), though I could still log in to my machines, since the ones I am using are fairly high-end with 24 cores and 48 GB of RAM.
Comment 8 Ravishankar N 2017-09-27 09:44:54 UTC
Mohit's patch upstream: https://review.gluster.org/#/c/18404/
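For context (not taken from the patch itself): the fix adds a control-cpu-load.sh helper that uses control groups to cap a daemon's resource usage, as described in the doc text. A minimal sketch of the underlying cgroup v1 mechanism on RHEL 7, assuming the cpu controller is mounted at /sys/fs/cgroup/cpu; the group name gluster_shd and the 150% cap are illustrative values, not the script's own defaults:

# Create a cgroup for the self-heal daemon under the cpu controller
mkdir /sys/fs/cgroup/cpu/gluster_shd

# Cap CPU at roughly 1.5 cores: quota/period = 150000/100000 = 150%
echo 100000 > /sys/fs/cgroup/cpu/gluster_shd/cpu.cfs_period_us
echo 150000 > /sys/fs/cgroup/cpu/gluster_shd/cpu.cfs_quota_us

# Move the glustershd process into the cgroup so the cap applies to it
echo "$(pgrep -f glustershd | head -n 1)" > /sys/fs/cgroup/cpu/gluster_shd/tasks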
Comment 12 Vijay Avuthu 2018-05-17 04:58:59 UTC
This bug has been verified as part of bug 1478395. Changing status to Verified.
Comment 13 Srijita Mukherjee 2018-09-03 15:33:04 UTC
I have updated the doc text. Kindly review and confirm.
Comment 15 errata-xmlrpc 2018-09-04 06:29:44 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607