Bug 823244

Summary: [1d939fe7adef651b90bb5c4cd5843768417f0138]: destination brick crashed while doing replace brick
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: protocol Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact: Raghavendra Bhat <rabhat>
Severity: low Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:54:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: release-3.3 branch: git head: 638a4740cc553c96bc01d1dfe4a2b7acf0b406e6 Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description Raghavendra Bhat 2012-05-20 09:41:55 UTC
Description of problem:
On a 2-replica volume, replace-brick was issued while the glusterfs source was being untarred and built on the mount point. The destination brick crashed with the backtrace below.


GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/sbin/glusterfs...done.
[New Thread 556]
[New Thread 557]
[New Thread 559]
[New Thread 560]
[New Thread 558]
Reading symbols from /usr/local/lib/libglusterfs.so.0...done.
Loaded symbols for /usr/local/lib/libglusterfs.so.0
Reading symbols from /usr/local/lib/libgfrpc.so.0...done.
Loaded symbols for /usr/local/lib/libgfrpc.so.0
Reading symbols from /usr/local/lib/libgfxdr.so.0...done.
Loaded symbols for /usr/local/lib/libgfxdr.so.0
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libpthread-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.0.9.8
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.13.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1
Reading symbols from /usr/local/lib/glusterfs/3.3git/xlator/storage/posix.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/xlator/storage/posix.so
Reading symbols from /usr/local/lib/glusterfs/3.3git/xlator/features/locks.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/xlator/features/locks.so
Reading symbols from /usr/local/lib/glusterfs/3.3git/xlator/protocol/server.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/xlator/protocol/server.so
Reading symbols from /usr/local/lib/glusterfs/3.3git/auth/login.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/auth/login.so
Reading symbols from /usr/local/lib/glusterfs/3.3git/auth/addr.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/auth/addr.so
Reading symbols from /usr/local/lib/glusterfs/3.3git/rpc-transport/socket.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3git/rpc-transport/socket.so
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1
Core was generated by `/usr/local/sbin/glusterfs -f/etc/glusterd/vols/mirror/rb_dst_brick.vol -p/etc/g'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4f717d80dd in server_setxattr_cbk (frame=0x7f4f739b77bc, cookie=0x0, this=0x23b0730, op_ret=-1, op_errno=2, xdata=0x0)
    at ../../../../../xlators/protocol/server/src/server3_1-fops.c:899
899	                gf_log (this->name, ((op_errno == ENOTSUP) ?
(gdb) bt
#0  0x00007f4f717d80dd in server_setxattr_cbk (frame=0x7f4f739b77bc, cookie=0x0, this=0x23b0730, op_ret=-1, op_errno=2, xdata=0x0)
    at ../../../../../xlators/protocol/server/src/server3_1-fops.c:899
#1  0x00007f4f717e1a4f in server_setxattr_resume (frame=0x7f4f739b77bc, bound_xl=0x23af550)
    at ../../../../../xlators/protocol/server/src/server3_1-fops.c:2594
#2  0x00007f4f717cd78d in server_resolve_done (frame=0x7f4f739b77bc)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:535
#3  0x00007f4f717cd88e in server_resolve_all (frame=0x7f4f739b77bc)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:570
#4  0x00007f4f717cd721 in server_resolve (frame=0x7f4f739b77bc) at ../../../../../xlators/protocol/server/src/server-resolve.c:517
#5  0x00007f4f717cd865 in server_resolve_all (frame=0x7f4f739b77bc)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:566
#6  0x00007f4f717cceb7 in resolve_continue (frame=0x7f4f739b77bc) at ../../../../../xlators/protocol/server/src/server-resolve.c:224
#7  0x00007f4f717cca1e in resolve_gfid_cbk (frame=0x7f4f739b77bc, cookie=0x7f4f73bc104c, this=0x23b0730, op_ret=-1, op_errno=2, 
    inode=0x7f4f70440e80, buf=0x7fffef2bb440, xdata=0x0, postparent=0x7fffef2bb3d0)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:163
#8  0x00007f4f71a0ba6a in pl_lookup_cbk (frame=0x7f4f73bc104c, cookie=0x7f4f73bc10f8, this=0x23af550, op_ret=-1, op_errno=2, 
    inode=0x7f4f70440e80, buf=0x7fffef2bb440, xdata=0x0, postparent=0x7fffef2bb3d0)
    at ../../../../../xlators/features/locks/src/posix.c:1619
#9  0x00007f4f71c219d3 in posix_lookup (frame=0x7f4f73bc10f8, this=0x23ae1f0, loc=0x7f4f6c003768, xdata=0x0)
    at ../../../../../xlators/storage/posix/src/posix.c:187
#10 0x00007f4f71a0bf68 in pl_lookup (frame=0x7f4f73bc104c, this=0x23af550, loc=0x7f4f6c003768, xdata=0x0)
    at ../../../../../xlators/features/locks/src/posix.c:1661
#11 0x00007f4f717ccd56 in resolve_gfid (frame=0x7f4f739b77bc) at ../../../../../xlators/protocol/server/src/server-resolve.c:190
#12 0x00007f4f717cd323 in server_resolve_inode (frame=0x7f4f739b77bc)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:385
#13 0x00007f4f717cd666 in server_resolve (frame=0x7f4f739b77bc) at ../../../../../xlators/protocol/server/src/server-resolve.c:506
#14 0x00007f4f717cd810 in server_resolve_all (frame=0x7f4f739b77bc)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:559
#15 0x00007f4f717cd926 in resolve_and_resume (frame=0x7f4f739b77bc, fn=0x7f4f717e17a0 <server_setxattr_resume>)
    at ../../../../../xlators/protocol/server/src/server-resolve.c:589
#16 0x00007f4f717e7e0b in server_setxattr (req=0x7f4f7109c04c) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:3890
#17 0x00007f4f756ee148 in rpcsvc_handle_rpc_call (svc=0x23b35e0, trans=0x23c06c0, msg=0x7f4f6c0008d0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:513
#18 0x00007f4f756ee4db in rpcsvc_notify (trans=0x23c06c0, mydata=0x23b35e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4f6c0008d0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:612
#19 0x00007f4f756f4021 in rpc_transport_notify (this=0x23c06c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4f6c0008d0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:489
#20 0x00007f4f70e903a5 in socket_event_poll_in (this=0x23c06c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1677
#21 0x00007f4f70e90919 in socket_event_handler (fd=6, idx=1, data=0x23c06c0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1792
#22 0x00007f4f7594eca5 in event_dispatch_epoll_handler (event_pool=0x23a4cd0, events=0x23bfa80, i=0)
    at ../../../libglusterfs/src/event.c:785
#23 0x00007f4f7594eebf in event_dispatch_epoll (event_pool=0x23a4cd0) at ../../../libglusterfs/src/event.c:847
#24 0x00007f4f7594f231 in event_dispatch (event_pool=0x23a4cd0) at ../../../libglusterfs/src/event.c:947
#25 0x0000000000408858 in main (argc=5, argv=0x7fffef2bbc38) at ../../../glusterfsd/src/glusterfsd.c:1674
(gdb) f 0
#0  0x00007f4f717d80dd in server_setxattr_cbk (frame=0x7f4f739b77bc, cookie=0x0, this=0x23b0730, op_ret=-1, op_errno=2, xdata=0x0)
    at ../../../../../xlators/protocol/server/src/server3_1-fops.c:899
899	                gf_log (this->name, ((op_errno == ENOTSUP) ?
(gdb) l
894	        state = CALL_STATE(frame);
895	        rsp.op_ret    = op_ret;
896	        rsp.op_errno  = gf_errno_to_error (op_errno);
897	
898	        if (op_ret == -1) {
899	                gf_log (this->name, ((op_errno == ENOTSUP) ?
900	                                     GF_LOG_DEBUG : GF_LOG_INFO),
901	                        "%"PRId64": SETXATTR %s (%s) ==> %s (%s)",
902	                        frame->root->unique, state->loc.path,
903	                        state->loc.inode ? uuid_utoa (state->loc.inode->gfid) :
(gdb) 
904	                        "--", state->dict->members_list->key,
905	                        strerror (op_errno));
906	        }
907	
908	        GF_PROTOCOL_DICT_SERIALIZE (this, xdata, (&rsp.xdata.xdata_val),
909	                                    rsp.xdata.xdata_len, op_errno, out);
910	
911	out:
912	        rsp.op_ret    = op_ret;
913	        rsp.op_errno  = gf_errno_to_error (op_errno);
(gdb) p this->name
$1 = 0x23b0600 "src-server"
(gdb) p state->loc
$2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) p *state
$3 = {conn = 0x23c16b0, xprt = 0x23c06c0, itable = 0x23c2380, resume_fn = 0x7f4f717e17a0 <server_setxattr_resume>, loc = {
    path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
  loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
  resolve = {type = RESOLVE_MUST, fd_no = 18446744073709551615, gfid = "\t\230`L\214[F^\265*5_ێ\275n", 
    pargfid = '\000' <repeats 15 times>, path = 0x0, bname = 0x0, op_ret = -1, op_errno = 2, resolve_loc = {path = 0x0, name = 0x0, 
      inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}}, resolve2 = {type = 0, 
    fd_no = 18446744073709551615, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, path = 0x0, bname = 0x0, 
    op_ret = -1, op_errno = 22, resolve_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, 
      pargfid = '\000' <repeats 15 times>}}, loc_now = 0x7f4f6c0036e0, resolve_now = 0x7f4f6c0037a8, stbuf = {ia_ino = 0, 
    ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', 
      sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', 
        exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, 
    ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, 
    ia_ctime = 0, ia_ctime_nsec = 0}, valid = 0, fd = 0x0, params = 0x0, flags = 0, wbflags = 0, payload_vector = {{iov_base = 0x0, 
      iov_len = 0} <repeats 16 times>}, payload_count = 0, iobuf = 0x0, iobref = 0x0, size = 0, offset = 0, mode = 0, dev = 0, 
  nr_count = 0, cmd = 0, type = 0, name = 0x0, name_len = 0, mask = 0, is_revalidate = 0 '\000', dict = 0x7f4f7382e244, flock = {
    l_type = 0, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}, 
  volume = 0x0, entry = 0x0, xdata = 0x0, umask = 0}
(gdb) p state->dict
$4 = (dict_t *) 0x7f4f7382e244
(gdb) p state->dict->members_list 
$5 = (data_pair_t *) 0x0
(gdb) p state->dict->members_list->key
(gdb) quit

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. On a replicate volume, untar the glusterfs source and build it on the mount point.
2. While the untar and build are running, issue replace-brick.
  
Actual results:

destination brick crashed
Expected results:

replace brick should complete successfully

Additional info:


gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: dae52985-0c47-4613-8ea1-f9220c2704ac
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: hyperspace:/tmp/mirror1
Brick2: hyperspace:/tmp/mirror2
Options Reconfigured:
features.limit-usage: /:22GB
features.quota: on


[2012-05-20 14:59:28.873903] I [server3_1-fops.c:1533:server_open_cbk] 0-src-server: 9685: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.874375] I [server3_1-fops.c:1533:server_open_cbk] 0-src-server: 9686: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.874469] I [server3_1-fops.c:1533:server_open_cbk] 0-src-server: 9687: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.874771] I [server3_1-fops.c:1533:server_open_cbk] 0-src-server: 9690: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.876304] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9694: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.876606] I [server3_1-fops.c:1533:server_open_cbk] 0-src-server: 9696: OPEN (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.986519] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9718: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.987377] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9720: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.988320] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9724: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.990834] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9731: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.991541] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9733: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.993769] I [server3_1-fops.c:252:server_inodelk_cbk] 0-src-server: 9738: INODELK (null) (--) ==> -1 (No such file or directory)
[2012-05-20 14:59:28.995226] I [server3_1-fops.c:346:server_entrylk_cbk] 0-src-server: 9742: ENTRYLK (null) (--) ==> -1 (No such file or directory)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-05-20 14:59:28
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
:

Comment 1 Amar Tumballi 2012-05-21 03:26:52 UTC
The dictionary is empty in this case; the logging code should have checked for a NULL dereference.

Will send a patch. In the meantime, I am still wondering why anyone would call 'setxattr()' with a NULL dict.

Comment 2 Raghavendra Bhat 2012-05-21 05:22:06 UTC
Dictionary is not NULL. The members_list is NULL.
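
For illustration, the distinction above (a non-NULL dict whose members_list is NULL) can be guarded against in the logging path roughly as follows. This is a minimal sketch, not the patch that was committed: `pair_t`, `dict_sketch_t`, `first_key_or_placeholder` and `format_setxattr_failure` are hypothetical, simplified stand-ins for the GlusterFS structures and the `gf_log` call at server3_1-fops.c:899, and the log format is shortened.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for data_pair_t and dict_t;
 * the real GlusterFS structures carry more fields. */
typedef struct pair {
        const char  *key;
        struct pair *next;
} pair_t;

typedef struct {
        pair_t *members_list;   /* NULL when the dict has no entries */
} dict_sketch_t;

/* Return the first key for logging, or a placeholder when either the
 * dict itself or its members_list is NULL. */
static const char *
first_key_or_placeholder (const dict_sketch_t *dict)
{
        if (dict == NULL || dict->members_list == NULL)
                return "--";
        return dict->members_list->key;
}

/* Format a shortened SETXATTR failure line without dereferencing a
 * NULL path or a NULL members_list. */
static int
format_setxattr_failure (char *buf, size_t len, const char *path,
                         const dict_sketch_t *dict, const char *errstr)
{
        return snprintf (buf, len, "SETXATTR %s ==> %s (%s)",
                         path ? path : "(null)",
                         first_key_or_placeholder (dict), errstr);
}
```

With the state shown in the gdb session (path = 0x0, dict non-NULL, members_list = 0x0), the sketch falls back to the placeholder key instead of faulting on `members_list->key`.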

Comment 3 Amar Tumballi 2012-05-21 08:31:06 UTC
http://review.gluster.com/3385 is committed now (for upstream and release-3.3 branch). Moving bug to ON_QA

Comment 4 Raghavendra Bhat 2012-05-23 09:35:58 UTC
Checked with the release-3.3 branch at git head (638a4740cc553c96bc01d1dfe4a2b7acf0b406e6). It's working fine, and the destination brick does not crash when replace-brick is done.