Bug 968289

Summary: nfs: "rm -rf" throws "E [client3_1-fops.c:5214:client3_1_inodelk]" Assertion failed
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Saurabh <saujain>
Component: glusterd Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Saurabh <saujain>
Severity: high Docs Contact:
Priority: high    
Version: 2.0 CC: amarts, mzywusko, pkarampu, rhs-bugs, spradhan, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 971805 Environment:
Last Closed: 2013-09-23 22:39:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 971805    

Description Saurabh 2013-05-29 11:38:24 UTC
Description of problem:
volume type 6x2

Just running "rm -rf *" on an NFS mount.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0.8rhs-1.el6rhs.x86_64

How reproducible:
happened on this release

Steps to Reproduce:
1. create volume of type 6x3, start the volume
2. add some data to it
3. execute "rm -rf *" on the mount point

Actual results:
rm -rf finished, but errors were logged in nfs.log

nfs.log says,
[2013-05-29 09:25:32.172816] I [glusterfsd-mgmt.c:1568:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-05-29 09:25:32.173503] I [glusterfsd-mgmt.c:1568:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2013-05-29 09:27:33.502273] E [client3_1-fops.c:5214:client3_1_inodelk] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_lock_rec+0x80) [0x7f150ff028e0] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_nonblocking_inodelk+0x608) [0x7f150ff1e5b8] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/protocol/client.so(client_inodelk+0x9e) [0x7f151015279e]))) 0-: Assertion failed: 0
[2013-05-29 09:27:33.502847] E [client3_1-fops.c:5214:client3_1_inodelk] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_lock_rec+0x80) [0x7f150ff028e0] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_nonblocking_inodelk+0x608) [0x7f150ff1e5b8] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/protocol/client.so(client_inodelk+0x9e) [0x7f151015279e]))) 0-: Assertion failed: 0
[2013-05-29 09:27:33.502963] E [client3_1-fops.c:5214:client3_1_inodelk] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_blocking_lock+0x84) [0x7f150ff1f354] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_lock_blocking+0xad7) [0x7f150ff1f207] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/protocol/client.so(client_inodelk+0x9e) [0x7f151015279e]))) 0-: Assertion failed: 0
[2013-05-29 09:27:33.503050] E [client3_1-fops.c:5214:client3_1_inodelk] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(+0x4545e) [0x7f150ff1f45e] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_lock_blocking+0xad7) [0x7f150ff1f207] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/protocol/client.so(client_inodelk+0x9e) [0x7f151015279e]))) 0-: Assertion failed: 0
[2013-05-29 09:27:33.503089] I [afr-lk-common.c:996:afr_lock_blocking] 0-dist-rep-replicate-1: unable to lock on even one child
[2013-05-29 09:27:33.503119] I [afr-transaction.c:1031:afr_post_blocking_inodelk_cbk] 0-dist-rep-replicate-1: Blocking inodelks failed.
[2013-05-29 09:27:33.503161] E [dht-linkfile.c:213:dht_linkfile_setattr_cbk] 0-dist-rep-dht: setattr of uid/gid on <gfid:64c844f0-1ddf-48cf-b195-330a781a2a07>/c4 :<gfid:00000000-0000-0000-0000-000000000000> failed (Invalid argument)


Expected results:
rm -rf should finish to completion

Additional info:

Comment 1 santosh pradhan 2013-05-29 12:55:09 UTC
I'll have a look, as the summary says this is NFS related.

Comment 3 santosh pradhan 2013-05-30 12:10:07 UTC

The log snippet which shows assertion failure:
==============================================

[2013-05-29 09:27:33.502273] E [client3_1-fops.c:5214:client3_1_inodelk] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_lock_rec+0x80) [0x7f150ff028e0] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/cluster/replicate.so(afr_nonblocking_inodelk+0x608) [0x7f150ff1e5b8] (-->/usr/lib64/glusterfs/3.3.0.8rhs/xlator/protocol/client.so(client_inodelk+0x9e) [0x7f151015279e]))) 0-: Assertion failed: 0


The assertion failure happens in client3_1_inodelk(). There are two code paths involved: either [afr_lock_rec() / afr_nonblocking_inodelk() / client_inodelk()] or [afr_blocking_lock() / afr_lock_blocking() / client_inodelk() ].

This looks more like an AFR issue than an NFS one.

It also looks similar to BZ 800291, which is already in CLOSED state, though I am not sure what the fix was.

Comment 4 santosh pradhan 2013-05-30 13:11:21 UTC
I would like to hear from the AFR team whether this is a known issue that has already been addressed. Reassigning to Amar.

Comment 5 Amar Tumballi 2013-06-05 13:10:32 UTC
Pranith, can you check this out? I am not sure if we already have the fix for this backported/rebased in rhs-2.1 branch.

Comment 6 Pranith Kumar K 2013-06-07 10:10:05 UTC
Amar,
    The following log suggests that, for the linkfile setattr, the gfid is not present in either loc->gfid or loc->inode->gfid.

[2013-05-29 09:27:33.503161] E [dht-linkfile.c:213:dht_linkfile_setattr_cbk] 0-dist-rep-dht: setattr of uid/gid on <gfid:64c844f0-1ddf-48cf-b195-330a781a2a07>/c4 :<gfid:00000000-0000-0000-0000-000000000000> failed (Invalid argument)

I could re-create the issue on master by running bug-884597.t

Here are the details:
(gdb) fr 7
#7  0x00007f0b4093561a in dht_lookup_linkfile_create_cbk (frame=0x7f0b43c523a8, 

(gdb) p local->loc
$1 = {path = 0x7f0b34002220 "/2", name = 0x7f0b34002221 "2", inode = 0x7f0b3b0ab0e8, 
  parent = 0x7f0b3b0ab04c, gfid = '\000' <repeats 15 times>, 
  pargfid = '\000' <repeats 15 times>, "\001"}
(gdb) p local->loc->inode->gfid
$2 = '\000' <repeats 15 times>

The following patch fixes the problem. Will send the patch.

pranithk@pranithk-laptop - ~/workspace/gerrit-repo/tests/bugs (master)
15:28:59 :) ⚡ git diff
diff --git a/xlators/cluster/dht/src/dht-linkfile.c b/xlators/cluster/dht/src/dht-linkfile.c
index 39d72ae..ae5bd49 100644
--- a/xlators/cluster/dht/src/dht-linkfile.c
+++ b/xlators/cluster/dht/src/dht-linkfile.c
@@ -302,6 +302,8 @@ dht_linkfile_attr_heal (call_frame_t *frame, xlator_t *this)
              is_equal (frame->root->gid, local->stbuf.ia_gid)))
                 return 0;
 
+        uuid_copy (local->loc.gfid, local->stbuf.ia_gfid);
+
         copy = copy_frame (frame);
 
         if (!copy)
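
To illustrate why the one-line copy matters, here is a minimal, self-contained sketch. It is not the GlusterFS source: the `demo_*` types and functions are hypothetical stand-ins, and the gfid selection rule (prefer the inode's gfid, fall back to the one in loc, reject an all-zero result with EINVAL) is an assumption inferred from the "Invalid argument" error and the gdb output above. Plain `memcpy` stands in for `uuid_copy` so the example needs no extra libraries.

```c
#include <errno.h>
#include <string.h>

#define GFID_LEN 16

/* Hypothetical, simplified stand-ins for the GlusterFS structures. */
typedef struct { unsigned char gfid[GFID_LEN]; } demo_inode_t;
typedef struct { unsigned char ia_gfid[GFID_LEN]; } demo_iatt_t;
typedef struct {
    demo_inode_t *inode;
    unsigned char gfid[GFID_LEN];
} demo_loc_t;

/* Returns 1 when every byte of the gfid is zero. */
static int demo_gfid_is_null(const unsigned char *gfid)
{
    static const unsigned char zero[GFID_LEN];
    return memcmp(gfid, zero, GFID_LEN) == 0;
}

/* Assumed selection rule: prefer the inode's gfid, fall back to loc's.
 * Returns 0 on success, -EINVAL when neither one is set. */
static int demo_pick_gfid(const demo_loc_t *loc, unsigned char *out)
{
    if (loc->inode && !demo_gfid_is_null(loc->inode->gfid))
        memcpy(out, loc->inode->gfid, GFID_LEN);
    else
        memcpy(out, loc->gfid, GFID_LEN);

    return demo_gfid_is_null(out) ? -EINVAL : 0;
}

/* Runs the selection before and after the copy that the patch adds. */
void demo_before_after(int *before, int *after)
{
    demo_inode_t inode;
    demo_iatt_t stbuf;
    demo_loc_t loc;
    unsigned char req_gfid[GFID_LEN];

    /* Same state as in the gdb session: both gfids are all zeroes. */
    memset(&inode, 0, sizeof(inode));
    memset(&loc, 0, sizeof(loc));
    loc.inode = &inode;

    /* The stat buffer from the lookup does carry a real gfid. */
    memset(&stbuf, 0, sizeof(stbuf));
    stbuf.ia_gfid[0] = 0x64;

    *before = demo_pick_gfid(&loc, req_gfid);

    /* Plays the role of uuid_copy (local->loc.gfid, local->stbuf.ia_gfid). */
    memcpy(loc.gfid, stbuf.ia_gfid, GFID_LEN);

    *after = demo_pick_gfid(&loc, req_gfid);
}
```

Calling `demo_before_after()` from a small `main()` shows the first attempt failing with `-EINVAL` and the second succeeding once loc carries the gfid taken from the stat buffer.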

Comment 10 Scott Haines 2013-09-23 22:39:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Comment 11 Scott Haines 2013-09-23 22:43:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html