Description of problem:
=========================
Hit this while verifying BZ#1396010 - [Disperse] healing should not start if only data bricks are UP.
The fix in BZ#1396010 takes care of reducing CPU usage when the heal daemon notices, right at the start, that all the redundant bricks are down. However, if the redundant bricks are brought down one after another while IOs are happening in parallel, the CPU consumption does not reduce. Hence raising this bug.

Version-Release number of selected component (if applicable):
===
3.8.4-28

How reproducible:
========
Always

Steps to Reproduce:
1. Create a 1x(4+2) EC volume (take all other volumes on this cluster offline).
2. Trigger IOs, e.g. a Linux kernel untar.
3. Keep capturing the CPU usage of the shd process on all nodes.
4. Kill brick b1.
5. Wait for about 2 minutes and kill brick b2.
(A rough shell sketch of these steps is given below, after the expected results.)

Actual results:
=====
The CPU usage of shd stays above 100% for as long as the IOs go on, even though only the data bricks are up.

Expected results:
============
The CPU usage of shd should drop, as there is nothing it can heal.
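For reference, a minimal shell sketch of the reproduction above. The hostnames, brick paths, mount point and tarball location are assumptions for illustration, not taken from this report; adjust them to the actual setup (3 nodes, 2 bricks per node).

# Step 1: create and start a 1x(4+2) disperse volume across server1-3, 2 bricks each
gluster volume create ecvol disperse 6 redundancy 2 \
    server{1..3}:/bricks/brick{1,2}/ecvol force
gluster volume start ecvol

# Step 2: mount the volume on a client and trigger IO (Linux kernel untar)
mkdir -p /mnt/ecvol
mount -t glusterfs server1:/ecvol /mnt/ecvol
tar -xf /root/linux-4.12.tar.xz -C /mnt/ecvol &    # hypothetical tarball path

# Steps 4-5 (run on the node hosting the bricks, here server1):
# kill brick b1, wait ~2 minutes, then kill brick b2.
# Brick PIDs can also be read from the Pid column of `gluster volume status ecvol`.
pkill -f '/bricks/brick1/ecvol'
sleep 120
pkill -f '/bricks/brick2/ecvol'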
on_qa validation on 3.8.4-35
Moving to verified, as I don't see the issue anymore. On running the above case, the CPU utilization of shd is mostly nil, with a maximum of 0-6%, which brings the utilization down significantly.

Problems/observations:
1) However, I also issued an ls -lRt from another client and the command hung when both bricks were down (both bricks hosted on the same node; 2 bricks per node in a 3-node cluster) ---> raised BZ#1475310

Checked for about 10 minutes; below is a snippet (refer to the glusterfs line for the shd process):

################## LOOP 198 ###############
Mon Jul 24 19:15:41 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:22.61 glusterfsd
28143 root  20  0 1483380 75680 4680 S 18.8  0.9 3:10.42 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.32 glusterfs
################### LOOP 199 ###############
Mon Jul 24 19:15:44 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:23.42 glusterfsd
28143 root  20  0 1483380 75680 4680 S 25.0  0.9 3:11.19 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.33 glusterfs
################### LOOP 200 ###############
Mon Jul 24 19:15:47 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 31.2  0.9 3:24.24 glusterfsd
28143 root  20  0 1483380 75680 4680 S 18.8  0.9 3:11.93 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.34 glusterfs
################### LOOP 201 ###############
Mon Jul 24 19:15:50 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:25.01 glusterfsd
28143 root  20  0 1483380 75680 4680 S 25.0  0.9 3:12.65 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.35 glusterfs
################### LOOP 202 ###############
Mon Jul 24 19:15:54 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28143 root  20  0 1483380 75680 4680 S 31.2  0.9 3:13.44 glusterfsd
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:25.87 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.36 glusterfs
################### LOOP 203 ###############
Mon Jul 24 19:15:57 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:26.71 glusterfsd
28143 root  20  0 1483380 75680 4680 S 25.0  0.9 3:14.19 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.36 glusterfs
################### LOOP 204 ###############
Mon Jul 24 19:16:00 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 25.0  0.9 3:27.52 glusterfsd
28143 root  20  0 1483380 75680 4680 S 25.0  0.9 3:14.93 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.39 glusterfs
################### LOOP 205 ###############
Mon Jul 24 19:16:03 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 29.4  0.9 3:28.36 glusterfsd
28143 root  20  0 1483380 75680 4680 S 17.6  0.9 3:15.68 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.39 glusterfs
################### LOOP 206 ###############
Mon Jul 24 19:16:06 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 29.4  0.9 3:29.22 glusterfsd
28143 root  20  0 1483380 75680 4680 S 23.5  0.9 3:16.48 glusterfsd
28163 root  20  0 1465612 63520 3248 S  5.9  0.8 1:52.41 glusterfs
################### LOOP 207 ###############
Mon Jul 24 19:16:10 IST 2017
  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28124 root  20  0 1549176 75608 4700 S 31.2  0.9 3:30.07 glusterfsd
28143 root  20  0 1483380 75680 4680 S 25.0  0.9 3:17.26 glusterfsd
28163 root  20  0 1465612 63520 3248 S  0.0  0.8 1:52.41 glusterfs
################### LOOP 208 ###############
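The per-loop snippets above come from periodically sampling top on each node; a minimal sketch of such a loop is below. The 3-second interval, log path and grep filter are assumptions for illustration, not taken from this report.

i=0
while true; do
    i=$((i + 1))
    echo "################### LOOP ${i} ###############"
    date
    # top in batch mode, single iteration; keep the header line and the gluster process rows
    top -b -n 1 | grep -E 'PID USER|gluster'
    sleep 3
done >> /var/tmp/shd_cpu_usage.log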
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774