Bug 1520767 - 500%-600% CPU utilisation when one brick is down in EC volume
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: 3.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Ashish Pandey
QA Contact: nchilaka
Depends On:
Blocks: 1503137
Reported: 2017-12-05 00:58 EST by Karan Sandha
Modified: 2018-09-17 07:32 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 02:39:49 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker                 Tracker ID       Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2018:2607   None      None    None     2018-09-04 02:41 EDT

Description Karan Sandha 2017-12-05 00:58:52 EST
Description of problem:
High CPU utilisation when one of the bricks is killed in an EC volume

Version-Release number of selected component (if applicable):
3.8.4-52

How reproducible:
Tested only once

Steps to Reproduce:
1. Create an EC volume, 24 x (4+2) (see the CLI sketch after this list).
2. Start the volume and run a CCTV workload from 4 Windows clients, e.g. Milestone XProtect.
3. Kill one brick from the volume.
4. Monitor the CPU utilisation with the top command.
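
A minimal CLI sketch of these steps, assuming a hypothetical volume name (ecvol), six servers and illustrative brick paths; the actual layout was 24 x (4+2) disperse subvolumes:

# build a brick list so each (4+2) subvolume spans all six servers
bricks=""
for i in $(seq 1 24); do
    for s in $(seq 1 6); do
        bricks="$bricks server$s:/bricks/brick$i/ecvol"
    done
done

gluster volume create ecvol disperse-data 4 redundancy 2 $bricks
gluster volume start ecvol

# after starting the CCTV workload from the Windows clients,
# kill one brick process to simulate a brick going down
gluster volume status ecvol          # note the PID of one brick
kill -15 <BRICK_PID>                 # placeholder for the chosen brick PID

# watch CPU usage of the gluster processes on the servers
top -b -n 1 | egrep 'glusterfsd|glusterfs|%CPU'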

Actual results:
1) 500%-600% CPU utilisation was seen.

2) Second observation: when all the bricks are up, glusterfsd takes 150% CPU utilisation.

Expected results:
This amount of CPU utilisation shouldn't be observed. 

Additional info:

The software populates the volume with 16 MB medium-sized files from 4 Windows clients.

performance.parallel-readdir on
performance.readdir-ahead on
performance.quick-read off
performance.io-cache off
nfs.disable on
transport.address-family inet
features.cache-invalidation on
features.cache-invalidation-timeout 600
performance.stat-prefetch on
performance.cache-invalidation on
performance.md-cache-timeout 600
network.inode-lru-limit 200000
performance.nl-cache on
performance.nl-cache-timeout 600
cluster.lookup-optimize on
server.event-threads 4
client.event-threads 6
performance.cache-samba-metadata on
performance.client-io-threads on
cluster.readdir-optimize on
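
The options above map onto ordinary gluster volume set commands; a minimal sketch, reusing the hypothetical volume name ecvol from the steps above:

gluster volume set ecvol performance.parallel-readdir on
gluster volume set ecvol performance.quick-read off
gluster volume set ecvol server.event-threads 4
gluster volume set ecvol client.event-threads 6
# ...and so on for the remaining options; verify the effective values with:
gluster volume get ecvol all | egrep 'event-threads|parallel-readdir|nl-cache'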
Comment 9 nchilaka 2018-05-30 06:46:34 EDT
Using the CPU control script, I was able to control the CPU consumption of shd.
(However, note that this is a workaround and not the actual fix, as already detailed above.)
Moving the BZ to verified.
Test version: 3.12.2-11

[root@dhcp35-97 scripts]# top -n 1 -b|egrep "glusterfs$|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14882 root      20   0 3089444 157524   3712 S 288.2  2.0   7:12.13 glusterfs
14872 root      20   0  538516   9612   3592 S   0.0  0.1   0:00.17 glusterfs
[root@dhcp35-97 scripts]# top -n 1 -b|egrep "glusterfs$|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14882 root      20   0 3089436 153312   3712 S 244.4  1.9   7:25.34 glusterfs
14872 root      20   0  538516   9612   3592 S   0.0  0.1   0:00.17 glusterfs
[root@dhcp35-97 scripts]# ./control-cpu-load.sh 
Enter gluster daemon pid for which you want to control CPU.
^C
[root@dhcp35-97 scripts]# ./control-cpu-load.sh 
Enter gluster daemon pid for which you want to control CPU.

Entered daemon_pid is not numeric so Rerun the script.
[root@dhcp35-97 scripts]# ./control-cpu-load.sh 
Enter gluster daemon pid for which you want to control CPU.
14882
If you want to continue the script to attach 14882 with new cgroup_gluster_14882 cgroup Press (y/n)?
invalid
[root@dhcp35-97 scripts]# ./control-cpu-load.sh 
Enter gluster daemon pid for which you want to control CPU.

Entered daemon_pid is not numeric so Rerun the script.
[root@dhcp35-97 scripts]# ./control-cpu-load.sh 
Enter gluster daemon pid for which you want to control CPU.
14882
If you want to continue the script to attach 14882 with new cgroup_gluster_14882 cgroup Press (y/n)?y
yes
Creating child cgroup directory 'cgroup_gluster_14882 cgroup' for glusterd.service.
Enter quota value in range [10,100]:  
50
Entered quota value is 50
Setting 50000 to cpu.cfs_quota_us for gluster_cgroup.
Tasks are attached successfully specific to 14882 to cgroup_gluster_14882.
[root@dhcp35-97 scripts]# top -n 1 -b|egrep "glusterfs$|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14882 root      20   0 3089488 157564   3712 S  58.8  2.0   8:27.61 glusterfs
14872 root      20   0  538516   9612   3592 S   0.0  0.1   0:00.17 glusterfs
[root@dhcp35-97 scripts]# top -n 1 -b|egrep "glusterfs$|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14882 root      20   0 3089456 159556   3712 S  50.0  2.0   9:01.75 glusterfs
14872 root      20   0  538516   9612   3592 S   0.0  0.1   0:00.18 glusterfs
[root@dhcp35-97 scripts]# top -n 1 -b|egrep "glusterfs$|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14882 root      20   0 3089480 153672   3712 S  33.3  1.9   9:04.91 glusterfs
14872 root      20   0  538516   9612   3592 S   0.0  0.1   0:00.18 glusterfs
[root@dhcp35-97 scripts]# pwd
/usr/share/glusterfs/scripts
[root@dhcp35-97 scripts]# ^C
[root@dhcp35-97 scripts]#
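
For reference, the workaround above boils down to a cgroup v1 CPU quota. A rough sketch of what /usr/share/glusterfs/scripts/control-cpu-load.sh does, with the PID (14882) and the 50% quota taken from the transcript; the cgroup path is illustrative and may differ per distribution:

pid=14882
cg=/sys/fs/cgroup/cpu/system.slice/glusterd.service/cgroup_gluster_${pid}

mkdir -p "$cg"
echo 100000 > "$cg/cpu.cfs_period_us"    # 100 ms scheduling period
echo  50000 > "$cg/cpu.cfs_quota_us"     # quota 50 => 50% of one CPU per period
for tid in /proc/${pid}/task/*; do       # attach every thread of the daemon
    basename "$tid" > "$cg/tasks"
done
top -b -n 1 | egrep 'glusterfs$|%CPU'    # CPU should now be capped near 50%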
Comment 11 errata-xmlrpc 2018-09-04 02:39:49 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
