Bug 1145726 - when both the bricks are offline just after self-heal is online, it leads to crashes
Summary: when both the bricks are offline just after self-heal is online, it leads to cr...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-23 14:52 UTC by Pranith Kumar K
Modified: 2014-11-11 08:40 UTC

Fixed In Version: glusterfs-3.6.0beta2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-11 08:40:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2014-09-23 14:52:05 UTC
Description of problem:
    Self-heal cannot proceed unless lookup succeeds on at least 2 children.
    Currently, if all the lookups fail, the pending changelog read from the
    xattrs is all zeros, so every child is treated as a source. This leads to
    crashes when later code paths assume that some data structures are
    populated properly.
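
    The failure mode described above can be sketched with a simplified model.
    This is not the AFR code: find_sources and the pending matrix layout are
    hypothetical names, and the rule "a child is a source when no child blames
    it" is an assumption made only for illustration.

```python
from typing import List


def find_sources(pending: List[List[int]]) -> List[bool]:
    """Hypothetical, simplified source selection.

    pending[i][j] > 0 means child i recorded pending operations against
    child j. A child is treated as a source when no child blames it.
    """
    n = len(pending)
    return [all(pending[i][j] == 0 for i in range(n)) for j in range(n)]


# Both lookups failed: no xattrs were read, so the matrix stays zero-filled
# and every child looks like a valid source.
all_failed = [[0, 0], [0, 0]]
sources_when_all_failed = find_sources(all_failed)

# Contrast case: child 0 holds pending operations against child 1,
# so only child 0 is a source.
one_stale = [[0, 3], [0, 0]]
sources_when_one_stale = find_sources(one_stale)
```

    In the all-failed case every child is reported as a source even though no
    reply was actually received, which matches the condition the report
    describes as leading to crashes further down the code path.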



Comment 1 Anand Avati 2014-09-23 14:53:25 UTC
REVIEW: http://review.gluster.org/8824 (cluster/afr: Don't start heal when lookup succeeds on < 2 children) posted (#1) for review on release-3.6 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2014-09-23 16:49:14 UTC
COMMIT: http://review.gluster.org/8824 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit 1b27b8231e2d69c3bfd4710ab3f631cd3604e362
Author: Pranith Kumar K <pkarampu>
Date:   Tue Sep 23 12:43:02 2014 +0530

    cluster/afr: Don't start heal when lookup succeeds on < 2 children
    
            Backport of http://review.gluster.org/8698
    
    Problem:
    When self-heal code doesn't see at least 2 successes on looking up children,
    then self-heal can't be done. What is happening now is if all the lookups fail
    then the pending changelog is all zeros in xattrs so all the children are
    becoming sources and leading to crashes when the code paths further assume that
    some data structures are populated properly
    
    Fix:
    Don't proceed with self-heals when < 2 children succeed lookups.
    
    BUG: 1145726
    Change-Id: I65465843f0e554c8ccdd8fa930ab42ac123ec023
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8824
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
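
The guard described under "Fix:" can be sketched as follows. This is a minimal illustration in Python, not the patch itself; can_heal, lookup_ok and MIN_SUCCESSFUL_LOOKUPS are hypothetical names, and only the threshold of 2 comes from the commit message.

```python
from typing import List

# Threshold taken from the commit message: heal needs >= 2 successful lookups.
MIN_SUCCESSFUL_LOOKUPS = 2


def can_heal(lookup_ok: List[bool]) -> bool:
    """Return True only when enough children answered the lookup.

    lookup_ok[i] is True when the lookup on child i succeeded
    (hypothetical representation of the per-child replies).
    """
    successes = sum(1 for ok in lookup_ok if ok)
    return successes >= MIN_SUCCESSFUL_LOOKUPS


both_down = can_heal([False, False])  # no replies: skip the heal
one_up = can_heal([True, False])      # a single reply has no peer to heal from or to
both_up = can_heal([True, True])      # heal may proceed
```

Checking the reply count before computing sources means the all-zero changelog from failed lookups is never interpreted as "every child is a source".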

Comment 3 Niels de Vos 2014-09-25 08:27:42 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta2 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018883.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:40:06 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

