Bug 1468457 - self-heal daemon CPU consumption not reducing when IOs are going on and all redundant bricks are brought down one after another
Product: GlusterFS
Classification: Community
Component: disperse
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Ashish Pandey
Depends On: 1464336 1464359
Reported: 2017-07-07 03:31 EDT by Ashish Pandey
Modified: 2017-08-12 09:07 EDT (History)
6 users

See Also:
Fixed In Version: glusterfs-3.11.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1464359
Last Closed: 2017-08-12 09:07:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Comment 1 Worker Ant 2017-07-07 03:33:57 EDT
REVIEW: https://review.gluster.org/17724 (cluster/ec : Don't try to heal when no sink is UP) posted (#1) for review on release-3.11 by Ashish Pandey (aspandey@redhat.com)
Comment 2 Ashish Pandey 2017-07-07 03:36:51 EDT
Description of problem:
Hit this while verifying BZ#1396010 - [Disperse] healing should not start if only data bricks are UP.
The fix in BZ#1396010 takes care of reducing CPU usage when the heal daemon notices at the outset that all the redundant bricks are down. However, if the redundant bricks are brought down one after another while IOs are happening in parallel, CPU consumption does not reduce.
Hence raising this BZ.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 1x(4+2) EC volume (take all other volumes on this cluster offline).
2. Trigger IOs, e.g. a Linux kernel untar.
3. Keep capturing the CPU usage of the shd process on all nodes.
4. Kill brick b1.
5. Wait about 2 minutes, then kill brick b2.
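The steps above could be scripted roughly as follows. This is a sketch only, not a verified reproducer: it needs a running Gluster cluster, and the node names (server1..server6), brick paths, volume name "ecvol", and kernel tarball are all assumptions; the brick PIDs come from `gluster volume status ecvol`.

```
# 1. Create and start a 1x(4+2) disperse volume (all other volumes offline)
gluster volume create ecvol disperse 6 redundancy 2 \
    server{1..6}:/bricks/ecvol/brick force    # hypothetical brick paths
gluster volume start ecvol
mount -t glusterfs server1:/ecvol /mnt/ecvol

# 2. Trigger IOs, e.g. a Linux kernel untar
tar -xf linux-4.12.tar.xz -C /mnt/ecvol &

# 3. On each node, keep sampling shd CPU usage
top -b -d 5 -p "$(pgrep -f glustershd)" >> /var/log/shd-cpu.log &

# 4. Kill brick b1 (PID from `gluster volume status ecvol`)
kill <pid-of-b1>

# 5. Wait ~2 minutes, then kill brick b2
sleep 120
kill <pid-of-b2>
```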

Actual results:
The CPU usage of shd stays above 100% for as long as IOs go on, even though only the data bricks are up.

Expected results:
CPU usage of shd should drop, as there is nothing it can heal.
Comment 3 Worker Ant 2017-07-10 09:58:36 EDT
COMMIT: https://review.gluster.org/17724 committed in release-3.11 by Shyamsundar Ranganathan (srangana@redhat.com) 
commit af569e4a418a65b452cd8842d6999734677ad5f3
Author: Ashish Pandey <aspandey@redhat.com>
Date:   Tue Jul 4 16:18:20 2017 +0530

    cluster/ec : Don't try to heal when no sink is UP
    In a 4 + 2 EC volume configuration, if an untar of the
    Linux kernel is going on and we kill a brick, indices
    will be created for the files/dirs which need to be
    healed. ec_shd_index_sweep spawns threads to scan
    these entries and start heal. If in the middle of this
    we kill one more brick, we end up in a situation where
    an entry cannot be healed because only "ec->fragments"
    bricks are UP. However, the scan continues and triggers
    the heal for those entries.
    When a heal is triggered for an entry, check whether it
    *CAN* be healed. If not, come out with ENOTCONN.
    >Change-Id: I305be7701c289f36bd7bde22491b71074771424f
    >BUG: 1464359
    >Signed-off-by: Ashish Pandey <aspandey@redhat.com>
    >Reviewed-on: https://review.gluster.org/17692
    >Smoke: Gluster Build System <jenkins@build.gluster.org>
    >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    >Reviewed-by: Sunil Kumar Acharya <sheggodu@redhat.com>
    >Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    >Signed-off-by: Ashish Pandey <aspandey@redhat.com>
    Change-Id: I305be7701c289f36bd7bde22491b71074771424f
    BUG: 1468457
    Signed-off-by: Ashish Pandey <aspandey@redhat.com>
    Reviewed-on: https://review.gluster.org/17724
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
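The guard described in the commit message above can be sketched as follows. This is a hypothetical simplification, not the actual patch: the function name `ec_heal_possible` and its parameters are illustrative, and the real code works on GlusterFS's internal brick masks rather than plain counts. The idea is that an entry is healable only if at least one brick beyond the `ec->fragments` data bricks needed to read it is UP, i.e. there is a sink to write the healed data to; otherwise the heal bails out with ENOTCONN instead of spinning.

```c
#include <errno.h>

/* Hypothetical simplification of the "no sink is UP" guard.
 * up_bricks: number of bricks currently UP.
 * fragments: data bricks needed to read an entry (4 in a 4+2 volume).
 * Returns 0 if a heal can proceed, -ENOTCONN if there is no sink. */
static int ec_heal_possible(int up_bricks, int fragments)
{
    if (up_bricks <= fragments)
        return -ENOTCONN; /* entry is readable, but no sink brick is UP */
    return 0;
}
```

With a 4+2 volume and both redundant bricks killed, `ec_heal_possible(4, 4)` returns -ENOTCONN, so the index-sweep threads stop attempting (and retrying) heals they cannot complete, which is what brings the shd CPU usage back down.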
Comment 4 Shyamsundar 2017-08-12 09:07:33 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.2, please open a new bug report.

glusterfs-3.11.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-July/031908.html
[2] https://www.gluster.org/pipermail/gluster-users/
