Bug 762784 (GLUSTER-1052)

Summary: Crash in server_lookup_cbk
Product: [Community] GlusterFS
Reporter: Anush Shetty <anush>
Component: protocol
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: low
Version: mainline
CC: gluster-bugs, vraman
Hardware: All
OS: Linux
Doc Type: Bug Fix

Description Anush Shetty 2010-07-06 09:45:52 UTC
Caught this crash while running dbench on a dht+afr setup with debug/error-gen loaded directly below protocol/server.

Server vol file:
volume posix4
  type storage/posix 
  option directory /mnt/exportnew1
end-volume

volume locks
  type features/posix-locks
  option mandatory on
  subvolumes posix4
end-volume


volume iot
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume brick4
  type debug/error-gen
  option failure 5
  subvolumes iot
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option listen-port 8181
  subvolumes brick4
  option auth.addr.brick4.allow *              
end-volume

(gdb) bt
#0  0x00007fcab3086106 in server_lookup_cbk (frame=0x1ed6700, cookie=0x1ed8da8, this=0x1eb6158, op_ret=0, op_errno=22, inode=0x1ed67a8, 
    stbuf=0x7fcab1efde60, dict=0x0, postparent=0x7fcab1efddf0) at server3_1-fops.c:163
#1  0x00007fcab32a48c0 in error_gen_lookup_cbk (frame=0x1ed8da8, cookie=0x1ed1418, this=0x1eb4e58, op_ret=0, op_errno=22, inode=0x1ed67a8, 
    buf=0x7fcab1efde60, dict=0x0, postparent=0x7fcab1efddf0) at error-gen.c:381
#2  0x00007fcab34ba693 in iot_lookup_cbk (frame=0x1ed1418, cookie=0x1ed0e48, this=0x1eb3ba8, op_ret=0, op_errno=22, inode=0x1ed67a8, buf=0x7fcab1efde60, 
    xattr=0x0, postparent=0x7fcab1efddf0) at io-threads.c:168
#3  0x00007fcab36d60c8 in pl_lookup_cbk (frame=0x1ed0e48, cookie=0x1ec8908, this=0x1eb2868, op_ret=0, op_errno=22, inode=0x1ed67a8, buf=0x7fcab1efde60, 
    dict=0x0, postparent=0x7fcab1efddf0) at posix.c:1127
#4  0x00007fcab38e64f6 in posix_lookup (frame=0x1ec8908, this=0x1eb1558, loc=0x1ed14e8, xattr_req=0x0) at posix.c:532
#5  0x00007fcab36d650a in pl_lookup (frame=0x1ed0e48, this=0x1eb2868, loc=0x1ed14e8, xattr_req=0x0) at posix.c:1167
#6  0x00007fcab34ba897 in iot_lookup_wrapper (frame=0x1ed1418, this=0x1eb3ba8, loc=0x1ed14e8, xattr_req=0x0) at io-threads.c:178
#7  0x00007fcab52c484b in call_resume_wind (stub=0x1ed14b8) at call-stub.c:2471
#8  0x00007fcab52cb0b5 in call_resume (stub=0x1ed14b8) at call-stub.c:3954
#9  0x00007fcab34ba472 in iot_worker (data=0x1ebb758) at io-threads.c:118
#10 0x00007fcab4e77a04 in start_thread (arg=<value optimized out>) at pthread_create.c:300
#11 0x00007fcab4be0d4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#12 0x0000000000000000 in ?? ()

Comment 1 Amar Tumballi 2010-07-06 11:13:05 UTC
Patch http://patches.gluster.com/patch/3542/ fixes the issue.

The crash happened because server_lookup_cbk set 'frame->local' to NULL on entry, while one of its failure paths issued a STACK_WIND () with the _CBKFN argument set to 'server_lookup_cbk' itself.

When that failure path was hit, server_lookup_cbk was entered a second time. On that second entry 'req' was invariably NULL, and the server crashed.

With the above patch, the bug gets fixed.
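
For anyone reading this later, here is a minimal standalone sketch of the failure mode described above. It is not the GlusterFS code and not the contents of the patch: frame_t, rewind_lookup, buggy_lookup_cbk, safe_lookup_cbk and run_case are hypothetical names made up for illustration, and the "safe" variant is only one plausible way to avoid the problem, assuming the per-call state may stay attached until the final reply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the GlusterFS types or macros. */
typedef struct frame frame_t;
typedef int (*cbk_fn)(frame_t *frame, int op_ret);

struct frame {
    void *local;   /* per-call state, playing the role of 'frame->local' */
    int   entries; /* how many times the callback has been entered */
};

/* Toy re-wind: invokes the given callback again on the same frame. */
static int rewind_lookup(frame_t *frame, cbk_fn cbk)
{
    return cbk(frame, 0);
}

/* Buggy shape: detaches the state on entry, then re-winds to itself. */
static int buggy_lookup_cbk(frame_t *frame, int op_ret)
{
    void *state = frame->local;
    frame->local = NULL;              /* cleared at entry */
    frame->entries++;

    if (state == NULL)
        return -1;                    /* the sketch reports it instead of crashing */

    if (op_ret < 0 && frame->entries == 1)
        return rewind_lookup(frame, buggy_lookup_cbk); /* second entry sees NULL */

    return 0;
}

/* Assumed safer shape: detach the state only when sending the final reply. */
static int safe_lookup_cbk(frame_t *frame, int op_ret)
{
    void *state = frame->local;       /* read it, but leave it attached */
    frame->entries++;

    if (op_ret < 0 && frame->entries == 1)
        return rewind_lookup(frame, safe_lookup_cbk);

    assert(state != NULL);            /* still available on the second entry */
    frame->local = NULL;
    return 0;
}

/* Drives one callback through a failed first reply followed by a re-wind. */
static int run_case(cbk_fn cbk)
{
    int request = 42;                 /* dummy per-call state */
    frame_t frame = { &request, 0 };
    return cbk(&frame, -1);
}
```

Calling run_case(buggy_lookup_cbk) returns -1 because the second entry finds the state already cleared, while run_case(safe_lookup_cbk) returns 0.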