Bug 1464091 - Regression: Heal info takes longer time when a brick is down
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Ashish Pandey
Depends On: 1463108
Blocks: 1465854
Reported: 2017-06-22 08:15 EDT by Ashish Pandey
Modified: 2017-09-05 13:34 EDT (History)
7 users

See Also:
Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1463108
Clones: 1465854
Environment:
Last Closed: 2017-08-16 02:44:55 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 1 Ashish Pandey 2017-06-22 08:16:09 EDT
Description of problem:
=========================
(A possible regression)
When a brick is down, heal info on a disperse volume takes significantly longer.
For example: with no pending heals (or on a fresh setup) and all bricks up, the heal info response time for a 1x(4+2) volume is hardly 1 sec, whereas with one brick down the same setup takes 11 sec.

In one case, where pending heal entries were present (there were hardly 10 entries on each brick), heal info hung.
If I hit the hang again, I will raise a separate bug.

Version-Release number of selected component (if applicable):
===
3.8.4-28

How reproducible:
==
always

Steps to Reproduce:
1. Create a 4+2 EC (disperse) volume and check the heal info time (it will be hardly 1 sec).
2. Kill one brick.
3. Check the heal info time again (it takes about 11 sec).
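The reproduction steps above can be sketched as a shell session. The volume name, server name, and brick paths are illustrative, and `<brick-pid>` is a placeholder you would take from `gluster volume status`; the timings quoted are the ones from this report and will vary by setup.

```shell
# Create a 1x(4+2) dispersed volume (hypothetical server/brick paths).
gluster volume create testvol disperse 6 redundancy 2 \
    server1:/bricks/b{1..6} force
gluster volume start testvol

# Baseline: with all bricks up, heal info responds in about a second.
time gluster volume heal testvol info

# Kill one brick process (find its PID via volume status).
gluster volume status testvol
kill -9 <brick-pid>

# With one brick down, the same command took ~11 seconds before the fix.
time gluster volume heal testvol info
```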
Comment 2 Worker Ant 2017-06-22 08:26:42 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#1) for review on master by Ashish Pandey (aspandey@redhat.com)
Comment 3 Worker Ant 2017-06-23 08:58:07 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#2) for review on master by Ashish Pandey (aspandey@redhat.com)
Comment 4 Worker Ant 2017-06-23 09:02:42 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#3) for review on master by Ashish Pandey (aspandey@redhat.com)
Comment 5 Worker Ant 2017-06-24 00:48:56 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#4) for review on master by Ashish Pandey (aspandey@redhat.com)
Comment 6 Worker Ant 2017-06-26 10:07:10 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#5) for review on master by Ashish Pandey (aspandey@redhat.com)
Comment 7 Worker Ant 2017-06-27 13:28:30 EDT
REVIEW: https://review.gluster.org/17606 (ec: Increase notification in all the cases) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)
Comment 8 Worker Ant 2017-06-28 05:52:27 EDT
COMMIT: https://review.gluster.org/17606 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 630d3d8c8466228e1764c1c0962b9db40548fffb
Author: Ashish Pandey <aspandey@redhat.com>
Date:   Thu Jun 22 17:06:40 2017 +0530

    ec: Increase notification in all the cases
    
    Problem:
    "gluster v heal <volname> info" is taking
    long time to respond when a brick is down.
    
    RCA:
    The heal info command does a virtual mount.
    EC waits for up to 10 seconds before sending an UP call to the
    upper xlator, in order to get a notification (DOWN or UP) from
    all the bricks.
    
    Currently, we increase ec->xl_notify_count based on the current
    status of the brick. So, if a DOWN event notification comes in
    and the brick is already down, we do not increase
    ec->xl_notify_count in ec_handle_down.
    
    Solution:
    Handle the DOWN event as a notification irrespective of the
    current status of the brick.
    
    Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
    BUG: 1464091
    Signed-off-by: Ashish Pandey <aspandey@redhat.com>
    Reviewed-on: https://review.gluster.org/17606
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
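The RCA in the commit above can be illustrated with a small, self-contained C model. The struct and function names below are simplified stand-ins, not the actual GlusterFS source: EC can stop waiting for the 10-second timeout as soon as every brick has delivered at least one UP/DOWN notification, i.e. once the notify count reaches the brick count.

```c
/* Hypothetical model of the xl_notify bookkeeping described in the RCA. */
typedef struct {
    unsigned int xl_notify;       /* bitmask: bricks that have notified */
    unsigned int xl_notify_count; /* number of bits set in xl_notify    */
    unsigned int xl_up;           /* bitmask: bricks currently up       */
    int          nodes;           /* total bricks, e.g. 6 for 4+2       */
} ec_model_t;

/* UP notifications are counted whenever a brick notifies for the
 * first time. */
static void handle_up(ec_model_t *ec, int idx)
{
    unsigned int bit = 1u << idx;
    ec->xl_up |= bit;
    if (!(ec->xl_notify & bit)) {
        ec->xl_notify |= bit;
        ec->xl_notify_count++;
    }
}

/* Buggy behaviour: a DOWN event is counted only when it changes the
 * brick's state, so a DOWN for an already-down brick never bumps
 * xl_notify_count and the virtual mount waits out the full timeout. */
static void handle_down_buggy(ec_model_t *ec, int idx)
{
    unsigned int bit = 1u << idx;
    if (ec->xl_up & bit) {
        ec->xl_up &= ~bit;
        ec->xl_notify |= bit;
        ec->xl_notify_count++;
    }
}

/* Fixed behaviour ("increase notification in all the cases"): the
 * notification is counted regardless of the brick's current state. */
static void handle_down_fixed(ec_model_t *ec, int idx)
{
    unsigned int bit = 1u << idx;
    ec->xl_up &= ~bit;
    if (!(ec->xl_notify & bit)) {
        ec->xl_notify |= bit;
        ec->xl_notify_count++;
    }
}
```

With six bricks and one already down, the buggy variant stalls at a count of 5 (so heal info waits the full timeout), while the fixed variant reaches 6 and can respond immediately.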
Comment 9 Shyamsundar 2017-09-05 13:34:44 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/
