REVIEW: https://review.gluster.org/17724 (cluster/ec : Don't try to heal when no sink is UP) posted (#1) for review on release-3.11 by Ashish Pandey (aspandey)
Description of problem:
=======================
Hit this while verifying BZ#1396010 - [Disperse] healing should not start if only data bricks are UP.

The fix in bz#1396010 takes care of reducing CPU usage when the heal daemon notices at the very beginning that all the redundant bricks are down. However, if the redundant bricks are brought down one after another while I/O is in progress, CPU consumption does not reduce. Hence raising this bz.

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-28

How reproducible:
=================
Always

Steps to Reproduce:
1. Create a 1x(4+2) EC volume (take all other volumes on this cluster offline).
2. Trigger I/O, e.g. a Linux kernel untar.
3. Keep capturing the CPU usage of the shd process on all nodes.
4. Kill brick b1.
5. Wait about 2 minutes, then kill brick b2.

Actual results:
===============
CPU usage of shd stays above 100% for as long as I/O continues, even though only the data bricks are up.

Expected results:
=================
CPU usage of shd should drop, since there is nothing that can be healed.
COMMIT: https://review.gluster.org/17724 committed in release-3.11 by Shyamsundar Ranganathan (srangana)
------
commit af569e4a418a65b452cd8842d6999734677ad5f3
Author: Ashish Pandey <aspandey>
Date: Tue Jul 4 16:18:20 2017 +0530

cluster/ec : Don't try to heal when no sink is UP

Problem:
In a 4+2 EC volume configuration, if a Linux kernel untar is in progress and we kill a brick, indices are created for the files/dirs that need to be healed. ec_shd_index_sweep spawns threads to scan these entries and start the heal. If in the middle of this we kill one more brick, we end up in a situation where an entry cannot be healed, as only "ec->fragment" number of bricks are UP. However, the scan continues and keeps triggering heals for those entries.

Solution:
When a heal is triggered for an entry, check whether it *CAN* be healed or not. If not, come out with ENOTCONN.

>Change-Id: I305be7701c289f36bd7bde22491b71074771424f
>BUG: 1464359
>Signed-off-by: Ashish Pandey <aspandey>
>Reviewed-on: https://review.gluster.org/17692
>Smoke: Gluster Build System <jenkins.org>
>CentOS-regression: Gluster Build System <jenkins.org>
>NetBSD-regression: NetBSD Build System <jenkins.org>
>Reviewed-by: Pranith Kumar Karampuri <pkarampu>
>Reviewed-by: Sunil Kumar Acharya <sheggodu>
>Reviewed-by: Xavier Hernandez <xhernandez>
>Signed-off-by: Ashish Pandey <aspandey>

Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1468457
Signed-off-by: Ashish Pandey <aspandey>
Reviewed-on: https://review.gluster.org/17724
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Xavier Hernandez <xhernandez>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
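The idea in the fix can be sketched as a small check before heal work begins: a heal needs "ec->fragments" UP bricks to act as sources plus at least one additional UP brick to act as a sink; if no sink is UP, bail out with ENOTCONN instead of scanning and retrying. The sketch below is a simplified, hypothetical illustration of that check, not the actual glusterfs source (the ec_t fields and function names here are assumptions).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified EC descriptor -- not the real gluster struct. */
typedef struct {
    int fragments; /* data bricks needed to reconstruct, e.g. 4 */
    int nodes;     /* total bricks in the subvolume, e.g. 6 */
} ec_t;

/* An entry can be healed only if, beyond the `fragments` bricks needed
 * as sources, at least one more brick is UP to serve as a heal sink. */
static int ec_can_heal(const ec_t *ec, uint32_t up_mask)
{
    int up = 0;
    for (int i = 0; i < ec->nodes; i++)
        if (up_mask & (1u << i))
            up++;
    return up > ec->fragments;
}

static int ec_heal_entry(const ec_t *ec, uint32_t up_mask)
{
    if (!ec_can_heal(ec, up_mask))
        return -ENOTCONN; /* no sink is UP; skip instead of burning CPU */
    /* ... actual heal work would go here ... */
    return 0;
}
```

With a 4+2 layout, killing two bricks leaves exactly four UP (only sources, no sink), so the check returns -ENOTCONN and the self-heal daemon stops churning on entries it cannot heal.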
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.2, please open a new bug report.

glusterfs-3.11.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-July/031908.html
[2] https://www.gluster.org/pipermail/gluster-users/