Bug 1327864

Summary: assert failure happens when parallel rm -rf is issued on nfs mounts
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.7.11
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.12
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1321554
Environment:
Last Closed: 2016-06-28 12:14:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1321554
Bug Blocks:

Description Pranith Kumar K 2016-04-17 02:06:16 UTC
+++ This bug was initially created as a clone of Bug #1321554 +++

Description of problem:
(gdb) bt
#0  0x00007f652b091a98 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f652b09369a in __GI_abort () at abort.c:89
#2  0x00007f652b08a227 in __assert_fail_base (fmt=<optimized out>,
    assertion=assertion@entry=0x7f652c5a529c "inode->nlookup >= nlookup",
    file=file@entry=0x7f652c5a512a "inode.c", line=line@entry=711,
    function=function@entry=0x7f652c5a5848 <__PRETTY_FUNCTION__.10534> "__inode_forget")
    at assert.c:92
#3  0x00007f652b08a2d2 in __GI___assert_fail (
    assertion=0x7f652c5a529c "inode->nlookup >= nlookup",
    file=0x7f652c5a512a "inode.c", line=711,
    function=0x7f652c5a5848 <__PRETTY_FUNCTION__.10534> "__inode_forget")
    at assert.c:101
#4  0x00007f652c5203e8 in __inode_forget (inode=0x7f6504038aec,
    nlookup=1) at inode.c:711
#5  0x00007f652c5210f8 in inode_forget (inode=0x7f6504038aec,
    nlookup=1) at inode.c:1123
#6  0x00007f651f75258c in afr_lookup_sh_metadata_wrap (
    opaque=0x7f65180a9b3c) at afr-common.c:1928
#7  0x00007f652c54d925 in synctask_wrap (old_task=0x7f65040467b0)
    at syncop.c:375
#8  0x00007f652b0a5f10 in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) f 4
#4  0x00007f652c5203e8 in __inode_forget (inode=0x7f6504038aec,
    nlookup=1) at inode.c:711
711            GF_ASSERT (inode->nlookup >= nlookup);
(gdb) p inode->nlookup
$1 = 0
(gdb) p nlookup
$2 = 1
(gdb) 

--- Additional comment from Vijay Bellur on 2016-03-28 08:01:00 EDT ---

REVIEW: http://review.gluster.org/13834 (cluster/afr: Don't lookup/forget inodes) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-28 11:27:48 EDT ---

REVIEW: http://review.gluster.org/13834 (cluster/afr: Don't lookup/forget inodes) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-31 08:46:37 EDT ---

COMMIT: http://review.gluster.org/13834 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b2a5eed9b17a82ec4b6366b0107fe2271328c16a
Author: Pranith Kumar K <pkarampu>
Date:   Mon Mar 28 16:31:12 2016 +0530

    cluster/afr: Don't lookup/forget inodes
    
    Problem:
    afr always forgets every inode it looks up, which removes the
    benefit of keeping those inodes in the lru. The same code can
    crash if the top xlator does inode_forget(0) between afr's
    inode_lookup and inode_forget.
    
    Fix:
    Don't use lookup/forget in afr. There is no benefit at the moment
    in keeping this code, and it is impossible to prevent top xlators
    from doing inode_forget(0). Found similar instances in ec
    and removed them even though those code paths are not going to
    be executed anywhere other than the heal-daemon.
    
    BUG: 1321554
    Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13834
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Anuradha Talur <atalur>
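
The interleaving described in the commit message can be replayed with a small stand-alone model. This is only a sketch, not the GlusterFS source: `model_inode`, `model_lookup`, `model_forget` and `replay_interleaving` are hypothetical names, and the model assumes that a forget count of 0 means "forget everything" and resets the counter. Under those assumptions the final forget sees the same values as the gdb session above (inode->nlookup = 0, nlookup = 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the nlookup counter kept on an inode. */
struct model_inode {
    uint64_t nlookup;
};

/* Models inode_lookup(): one more outstanding lookup on the inode. */
void model_lookup(struct model_inode *inode)
{
    inode->nlookup++;
}

/*
 * Models inode_forget(). Returns false in the situation where the real
 * code would fail the check "inode->nlookup >= nlookup".
 * Assumption: a count of 0 resets the counter to 0.
 */
bool model_forget(struct model_inode *inode, uint64_t nlookup)
{
    if (inode->nlookup < nlookup)
        return false;               /* the assertion would fire here */
    if (nlookup == 0)
        inode->nlookup = 0;         /* forget everything */
    else
        inode->nlookup -= nlookup;
    return true;
}

/*
 * Replays the sequence from the commit message.
 * Returns true if the last forget would trip the assertion.
 */
bool replay_interleaving(void)
{
    struct model_inode inode = { 0 };

    model_lookup(&inode);           /* afr: inode_lookup */
    assert(inode.nlookup == 1);

    model_forget(&inode, 0);        /* top xlator: inode_forget(0) */
    assert(inode.nlookup == 0);

    return !model_forget(&inode, 1); /* afr: inode_forget(1) on a count of 0 */
}
```

In this model a lookup followed directly by a forget of 1 is balanced and succeeds; only the forget(0) in between leaves the counter below what afr later tries to release, which is why dropping the lookup/forget pair from afr removes the failure.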

Comment 1 Vijay Bellur 2016-04-17 02:53:00 UTC
REVIEW: http://review.gluster.org/14009 (cluster/afr: Don't lookup/forget inodes) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Vijay Bellur 2016-04-17 14:11:12 UTC
COMMIT: http://review.gluster.org/14009 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 20300fa96802ec6a0cd17edba38baf9639561d55
Author: Pranith Kumar K <pkarampu>
Date:   Mon Mar 28 16:31:12 2016 +0530

    cluster/afr: Don't lookup/forget inodes
    
    Problem:
    afr always forgets every inode it looks up, which removes the
    benefit of keeping those inodes in the lru. The same code can
    crash if the top xlator does inode_forget(0) between afr's
    inode_lookup and inode_forget.
    
    Fix:
    Don't use lookup/forget in afr. There is no benefit at the moment
    in keeping this code, and it is impossible to prevent top xlators
    from doing inode_forget(0). Found similar instances in ec
    and removed them even though those code paths are not going to
    be executed anywhere other than the heal-daemon.
    
     >BUG: 1321554
     >Change-Id: Ia4cb236178f7f129cc898d53f0bbd26f494a2a8d
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: http://review.gluster.org/13834
     >Smoke: Gluster Build System <jenkins.com>
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.com>
     >Reviewed-by: Anuradha Talur <atalur>
    
    BUG: 1327864
    Change-Id: I3507ed88cd75e069ed302525bfa259cf407871fb
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14009
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>

Comment 3 Kaushal 2016-06-28 12:14:18 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user