1091597 – Self-heal daemon crashes with assert failure on master

Summary: Self-heal daemon crashes with assert failure on master

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-04-26 06:39 UTC by Pranith Kumar K
Modified:	2014-11-11 08:30 UTC
CC List:	2 users
Fixed In Version:	glusterfs-3.6.0beta1
Clone Of:
Environment:
Last Closed:	2014-11-11 08:30:55 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Pranith Kumar K 2014-04-26 06:39:39 UTC

Description of problem:

When iozone is running and the self-heal daemon is enabled, the self-heal daemon crashes with the following trace:

(gdb) bt
#0  0x00000034ee635a19 in raise () from /lib64/libc.so.6
#1  0x00000034ee637128 in abort () from /lib64/libc.so.6
#2  0x00000034ee62e986 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000034ee62ea32 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f9ef9d7dd0d in __inode_forget (inode=0x7f9eef3052bc, nlookup=1) at inode.c:591
#5  0x00007f9ef9d7e80e in inode_forget (inode=0x7f9eef3052bc, nlookup=1) at inode.c:941
#6  0x00007f9eef5b2c70 in afr_selfheal (this=0xc4c7d0, gfid=0x7f9eee466dd0 "\362\351Z\210R\214@g\204\001\311p(VV\270\060", <incomplete sequence \314>)
    at afr-self-heal-common.c:1001
#7  0x00007f9eef5ba270 in afr_shd_selfheal (healer=0xc6ee00, child=0, 
    gfid=0x7f9eee466dd0 "\362\351Z\210R\214@g\204\001\311p(VV\270\060", <incomplete sequence \314>) at afr-self-heald.c:301
#8  0x00007f9eef5ba6de in afr_shd_index_sweep (healer=0xc6ee00) at afr-self-heald.c:427
#9  0x00007f9eef5bab36 in afr_shd_index_healer (data=0xc6ee00) at afr-self-heald.c:539
#10 0x00000034eea07c53 in start_thread () from /lib64/libpthread.so.0
#11 0x00000034ee6f513d in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2014-04-26 07:11:07 UTC

REVIEW: http://review.gluster.org/7567 (cluster/afr: Fix inode_forget assert failure) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2014-04-26 07:14:18 UTC

REVIEW: http://review.gluster.org/7567 (cluster/afr: Fix inode_forget assert failure) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2014-04-26 08:30:00 UTC

REVIEW: http://review.gluster.org/7567 (cluster/afr: Fix inode_forget assert failure) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2014-04-29 01:19:06 UTC

COMMIT: http://review.gluster.org/7567 committed in master by Anand Avati (avati) 
------
commit 49e2d5162013ccf5f3f99c68c2521ca1cc6c3f20
Author: Pranith Kumar K <pkarampu>
Date:   Fri Apr 25 20:36:11 2014 +0530

    cluster/afr: Fix inode_forget assert failure
    
    Problem:
    If two self-heals are triggered on same inode in
    parallel then one inode will be linked and the other
    inode will not be linked as an inode with that gfid
    is already linked in inode table. Calling inode-forget
    on that inode leads to assert failure.
    
    Fix:
    Always use linked inode for performing self-heal.
    
    Added inode-forgets in other places as well even though
    it's not really a memory leak.
    
    Change-Id: Ib84bf080c8cb6a4243f66541ece587db28f9a052
    BUG: 1091597
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/7567
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
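
The race described in the commit message can be illustrated with a toy model. This is a minimal sketch, not GlusterFS code: `toy_inode_t`, `toy_table_t`, `toy_link`, `toy_lookup`, `toy_forget` and `parallel_heal_demo` are all hypothetical names, and the sketch assumes (based on the backtrace and the commit message) that the failing assert checks that an inode's lookup count is at least the count being forgotten. Instead of aborting, `toy_forget` returns -1 where the real code would hit the assert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-in for an inode: a gfid plus a lookup count. */
typedef struct toy_inode {
    unsigned char gfid[16];
    unsigned nlookup;
} toy_inode_t;

/* Hypothetical stand-in for an inode table keyed by gfid. */
typedef struct toy_table {
    toy_inode_t *slots[TOY_TABLE_SIZE];
    size_t count;
} toy_table_t;

/* Link `inode` into the table. If an inode with the same gfid is already
 * linked, that existing inode is returned and `inode` stays unlinked. */
toy_inode_t *toy_link(toy_table_t *t, toy_inode_t *inode)
{
    for (size_t i = 0; i < t->count; i++) {
        if (memcmp(t->slots[i]->gfid, inode->gfid, sizeof inode->gfid) == 0)
            return t->slots[i];
    }
    if (t->count == TOY_TABLE_SIZE)
        return NULL;
    t->slots[t->count++] = inode;
    return inode;
}

/* Take one lookup reference on an inode. */
void toy_lookup(toy_inode_t *inode)
{
    inode->nlookup++;
}

/* Drop `n` lookup references. Returns -1 where the assumed assert
 * (lookup count >= n) would fail, 0 otherwise. */
int toy_forget(toy_inode_t *inode, unsigned n)
{
    if (inode->nlookup < n)
        return -1;
    inode->nlookup -= n;
    return 0;
}

/* Two healers work on the same gfid, interleaved as if running in parallel.
 * use_linked == 0: each healer forgets the inode it created (the bug).
 * use_linked != 0: each healer forgets the inode toy_link returned (the fix).
 * Returns 0 if both forgets succeed, -1 otherwise. */
int parallel_heal_demo(int use_linked)
{
    toy_table_t table = {0};
    toy_inode_t a = {0}, b = {0};
    memset(a.gfid, 0xf2, sizeof a.gfid);
    memset(b.gfid, 0xf2, sizeof b.gfid);

    toy_inode_t *linked_a = toy_link(&table, &a); /* a gets linked */
    toy_lookup(linked_a);
    toy_inode_t *linked_b = toy_link(&table, &b); /* returns a; b stays unlinked */
    toy_lookup(linked_b);

    int ra = toy_forget(use_linked ? linked_a : &a, 1);
    int rb = toy_forget(use_linked ? linked_b : &b, 1);
    return (ra == 0 && rb == 0) ? 0 : -1;
}
```

In this model, `parallel_heal_demo(0)` fails because the second healer forgets its own unlinked inode, whose lookup count was never incremented, while `parallel_heal_demo(1)` succeeds because both healers operate on the inode that is actually linked in the table.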

Comment 5 Niels de Vos 2014-09-22 12:38:56 UTC

A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether the release resolves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 6 Niels de Vos 2014-11-11 08:30:55 UTC

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
