Bug 1464336 - selfheal daemon CPU consumption not reducing when I/Os are going on and all redundant bricks are brought down one after another
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assigned To: Ashish Pandey
QA Contact: nchilaka
3.3.0-devel-freeze-exception
Depends On:
Blocks: 1417151 1464359 1468457
Reported: 2017-06-23 02:56 EDT by nchilaka
Modified: 2017-09-21 00:59 EDT
CC List: 7 users

See Also:
Fixed In Version: glusterfs-3.8.4-33
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1464359
Environment:
Last Closed: 2017-09-21 00:59:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2017-06-23 02:56:59 EDT
Description of problem:
=========================
Hit this while verifying BZ#1396010 - [Disperse] healing should not start if only data bricks are UP.
The fix in BZ#1396010 takes care of reducing CPU usage when the heal daemon notices at startup that all the redundant bricks are already down. However, if the redundant bricks are brought down one after another while I/Os are happening in parallel, the CPU consumption does not reduce.
Hence raising this BZ.

Version-Release number of selected component (if applicable):
===
3.8.4-28

How reproducible:
========
always

Steps to Reproduce:
1. Create a 1x(4+2) EC volume (take all other volumes on this cluster offline).
2. Trigger I/Os, e.g. a Linux kernel untar.
3. Keep capturing the CPU usage of the shd process on all nodes.
4. Kill b1.
5. Wait about 2 minutes and kill b2.
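A rough shell sketch of these steps, to be run on the cluster. The volume name, server names, brick paths, and the `brick_pid` helper are all hypothetical; `brick_pid` assumes the brick's PID is the last whitespace-separated column of the matching `gluster volume status` row:

```shell
# Hypothetical helper: pull a brick's PID out of `gluster volume status`
# (assumes the PID is the last column of the brick's row).
brick_pid() {
    gluster volume status ecvol | awk -v b="$1" '$0 ~ b { print $NF }'
}

# Sketch of the reproduction (defined as a function; run it on the cluster).
reproduce() {
    # 1. create and start a 1x(4+2) disperse volume; keep all other
    #    volumes on the cluster offline
    gluster volume create ecvol disperse 6 redundancy 2 \
        server{1..3}:/bricks/b{1,2}/ecvol force
    gluster volume start ecvol
    mount -t glusterfs server1:/ecvol /mnt/ecvol

    # 2. trigger I/O, e.g. a Linux kernel untar, in the background
    tar -xf linux-4.12.tar.xz -C /mnt/ecvol &

    # 4./5. kill the first redundant brick, wait ~2 minutes, kill the
    #       second, while shd CPU usage is being sampled on all nodes
    kill "$(brick_pid server1:/bricks/b1/ecvol)"
    sleep 120
    kill "$(brick_pid server1:/bricks/b2/ecvol)"
}
```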

Actual results:
=====
It can be seen that the shd CPU usage stays above 100% for as long as I/Os go on, even though only the data bricks are up.

Expected results:
============
CPU usage of shd should reduce, as there is nothing to heal.
Comment 9 nchilaka 2017-07-26 08:28:47 EDT
on_qa validation on 3.8.4-35

Moving to verified, as I don't see the issue anymore.

Noticed that when running the above case, the CPU utilization by shd is mostly nil, or at most 0-6%, which brings the utilization down significantly.

Problems/observations:
1) However, I also issued an `ls -lRt` from another client, and the command hung when both bricks were down (both bricks hosted on the same node; 2 bricks per node in a 3-node cluster) ---> raised BZ#1475310

Checked for about 10 minutes; below is a snippet (refer to the glusterfs line for the shd process):
################## LOOP 198 ###############
Mon Jul 24 19:15:41 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:22.61 glusterfsd
28143 root      20   0 1483380  75680   4680 S  18.8  0.9   3:10.42 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.32 glusterfs
################### LOOP 199 ###############
Mon Jul 24 19:15:44 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:23.42 glusterfsd
28143 root      20   0 1483380  75680   4680 S  25.0  0.9   3:11.19 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.33 glusterfs
################### LOOP 200 ###############
Mon Jul 24 19:15:47 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  31.2  0.9   3:24.24 glusterfsd
28143 root      20   0 1483380  75680   4680 S  18.8  0.9   3:11.93 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.34 glusterfs
################### LOOP 201 ###############
Mon Jul 24 19:15:50 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:25.01 glusterfsd
28143 root      20   0 1483380  75680   4680 S  25.0  0.9   3:12.65 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.35 glusterfs
################### LOOP 202 ###############
Mon Jul 24 19:15:54 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28143 root      20   0 1483380  75680   4680 S  31.2  0.9   3:13.44 glusterfsd
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:25.87 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.36 glusterfs



################### LOOP 203 ###############
Mon Jul 24 19:15:57 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:26.71 glusterfsd
28143 root      20   0 1483380  75680   4680 S  25.0  0.9   3:14.19 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.36 glusterfs
################### LOOP 204 ###############
Mon Jul 24 19:16:00 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  25.0  0.9   3:27.52 glusterfsd
28143 root      20   0 1483380  75680   4680 S  25.0  0.9   3:14.93 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.39 glusterfs
################### LOOP 205 ###############
Mon Jul 24 19:16:03 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  29.4  0.9   3:28.36 glusterfsd
28143 root      20   0 1483380  75680   4680 S  17.6  0.9   3:15.68 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.39 glusterfs
################### LOOP 206 ###############
Mon Jul 24 19:16:06 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  29.4  0.9   3:29.22 glusterfsd
28143 root      20   0 1483380  75680   4680 S  23.5  0.9   3:16.48 glusterfsd
28163 root      20   0 1465612  63520   3248 S   5.9  0.8   1:52.41 glusterfs
################### LOOP 207 ###############
Mon Jul 24 19:16:10 IST 2017
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28124 root      20   0 1549176  75608   4700 S  31.2  0.9   3:30.07 glusterfsd
28143 root      20   0 1483380  75680   4680 S  25.0  0.9   3:17.26 glusterfsd
28163 root      20   0 1465612  63520   3248 S   0.0  0.8   1:52.41 glusterfs
################### LOOP 208 ###############
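For reference, output in the shape of the LOOP snippets above can be produced by a sampling loop like the hypothetical sketch below (batch-mode `top`, filtered to gluster processes, one snapshot per interval); the function name and the 3-second default interval are assumptions:

```shell
# Print numbered snapshots of the gluster processes' CPU usage.
# usage: sample_gluster_cpu <loops> [interval-seconds]
sample_gluster_cpu() {
    local loops=$1 interval=${2:-3} i
    for i in $(seq 1 "$loops"); do
        echo "################### LOOP $i ###############"
        date
        # one batch-mode top snapshot; keep only the gluster daemons
        # (prints nothing on a host where gluster is not running)
        top -b -n 1 | grep 'gluster' || true
        sleep "$interval"
    done
}

# ~10 minutes of samples at the default 3-second interval:
# sample_gluster_cpu 200
```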
Comment 11 errata-xmlrpc 2017-09-21 00:59:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
