Bug 764555 (GLUSTER-2823)

Summary: SEGV in marker_quota_removexattr_cbk
Product: [Community] GlusterFS
Reporter: Jeff Darcy <jdarcy>
Component: unclassified
Assignee: Vijay Bellur <vbellur>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix

Description Jeff Darcy 2011-04-20 17:38:50 UTC
Verified that commit 450a7be2cede5a44c74f5f74224292af0c81a45f for #2801 seems to make symptoms disappear (except for the bogus gfid).  Marking as duplicate.

*** This bug has been marked as a duplicate of bug 2801 ***

Comment 1 Jeff Darcy 2011-04-20 19:53:35 UTC
I was trying to reproduce the problem with group permissions (#2818) on my systems when I found something that might be even worse.  Steps:

(1) Create a three-way distributed volume on newly made ext4 filesystems, start it, and mount it on a separate client.

(2) As root, create a file.  Change its group to 500 and its mode to 0660.  Ditto for the directory (mode 0770).

(3) Log in as UID 500 and use "vi" to edit the file.

The first thing that happened was that vi complained about a swap file already existing.  When I looked, sure enough, there were fubar~ and .fubar.{swo,swp,swpx}, all owned by *root*.  This bears investigation, but isn't the cause of the SEGV.  That happens when I modify the file and try to save it, and looks like this:


#0  0x00007fd35c6459e0 in marker_quota_removexattr_cbk (frame=0x7fd35ef0d6d8, 
    cookie=0x7fd35ef0d75c, this=0x11c00c0, op_ret=-1, op_errno=34) at marker.c:954
#1  0x00007fd35c8672e0 in iot_removexattr_cbk (frame=0x7fd35ef0d75c, 
    cookie=0x7fd35ef0d7e0, this=0x11bf000, op_ret=-1, op_errno=34)
    at io-threads.c:1703
#2  0x000000313dc234f3 in default_removexattr_cbk (frame=0x7fd35ef0d7e0, 
    cookie=0x7fd35ef0d864, this=0x11bdf70, op_ret=-1, op_errno=34) at defaults.c:326
#3  0x000000313dc234f3 in default_removexattr_cbk (frame=0x7fd35ef0d864, 
    cookie=0x7fd35ef0d8e8, this=0x11bced0, op_ret=-1, op_errno=34) at defaults.c:326
#4  0x00007fd35ceac4d0 in posix_removexattr (frame=0x7fd35ef0d8e8, this=0x11bbc60, 
    loc=0x7fd35ec4205c, name=0x7fd350000a40 "") at posix.c:3478
#5  0x000000313dc2c81a in default_removexattr (frame=0x7fd35ef0d864, 
    this=0x11bced0, loc=0x7fd35ec4205c, name=0x7fd350000a40 "") at defaults.c:1040
#6  0x000000313dc2c81a in default_removexattr (frame=0x7fd35ef0d7e0, 
    this=0x11bdf70, loc=0x7fd35ec4205c, name=0x7fd350000a40 "") at defaults.c:1040
#7  0x00007fd35c8674da in iot_removexattr_wrapper (frame=0x7fd35ef0d75c, 
    this=0x11bf000, loc=0x7fd35ec4205c, name=0x7fd350000a40 "") at io-threads.c:1712
#8  0x000000313dc405c6 in call_resume_wind (stub=0x7fd35ec42024) at call-stub.c:2297
#9  0x000000313dc4699a in call_resume (stub=0x7fd35ec42024) at call-stub.c:3861
#10 0x00007fd35c85d073 in iot_worker (data=0x11c5a40) at io-threads.c:129
#11 0x0000003dd0a077e1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003dd06e153d in clone () from /lib64/libc.so.6


On the node that had the SEGV (my server #2) the file is actually a linkfile to another node (my server #3) and the gfid for the file is bogus:

trusted.gfid=0xc711fbdfafbe400cb330eebfd2d26342

Also, if I try to get trusted.glusterfs.pathinfo on the client after this, it fails (but succeeds after a remount).  The offending code looks like this:

 951 unwind:
 952         STACK_UNWIND_STRICT (rename, frame, -1, ENOMEM, NULL,
 953                              NULL, NULL, NULL, NULL);
 954         local->oplocal = NULL;

Is it valid to be dereferencing frame->local after STACK_UNWIND?
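
To make the concern concrete, here is a minimal sketch of the ordering that would avoid touching the local after the unwind. It assumes that the frame, together with whatever is attached as frame->local, may be released by the time the unwind returns, which is consistent with the crash at marker.c:954 but is not confirmed here. Everything prefixed demo_ is a hypothetical stand-in, not a GlusterFS type or macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the translator's local and the call frame. */
typedef struct demo_local {
        void *oplocal;
} demo_local_t;

typedef struct demo_frame {
        demo_local_t *local;
} demo_frame_t;

/* Counts locals released by the unwind itself (for illustration only). */
static int demo_locals_freed = 0;

/* Models the assumed behaviour: unwinding releases the frame and
 * anything still attached to it as frame->local. */
static void
demo_unwind (demo_frame_t *frame)
{
        assert (frame != NULL);
        if (frame->local != NULL) {
                free (frame->local);
                demo_locals_freed++;
        }
        free (frame);
}

/* Allocates a frame with a local whose oplocal is non-NULL. */
static demo_frame_t *
demo_frame_new (void)
{
        demo_frame_t *frame = calloc (1, sizeof (*frame));
        if (frame == NULL)
                return NULL;

        frame->local = calloc (1, sizeof (*frame->local));
        if (frame->local == NULL) {
                free (frame);
                return NULL;
        }
        frame->local->oplocal = frame->local; /* any non-NULL value */
        return frame;
}

/* Safe ordering: detach the local from the frame before unwinding, so
 * the callback still owns it afterwards and cleans it up itself. */
static int
demo_cbk_safe (demo_frame_t *frame)
{
        demo_local_t *local = frame->local;
        int           ok    = 0;

        frame->local = NULL;   /* unwind can no longer free it */
        demo_unwind (frame);   /* frame must not be used after this */

        local->oplocal = NULL; /* valid: local is still owned here */
        ok = (local->oplocal == NULL);
        free (local);
        return ok;
}
```

In the snippet from marker.c, the equivalent change would be to clear local->oplocal (or detach frame->local) before the STACK_UNWIND_STRICT call rather than after it.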