Bug 804905

Summary: Glusterfsd process crashes during replace-brick commit.
Product: [Community] GlusterFS Reporter: Vijaykumar Koppad <vkoppad>
Component: glusterd Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED CURRENTRELEASE QA Contact: Vijaykumar Koppad <vkoppad>
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: amarts, bbandari, gluster-bugs, nsathyan
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 13:55:37 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: 3.3.0qa42 Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 817967    

Description Vijaykumar Koppad 2012-03-20 02:11:13 EDT
Description of problem:
The volume type was distribute-replicate. The source brick crashed during replace-brick commit.

Version-Release number of selected component (if applicable):
[2ffefd720a54fb815b1efa11e9de766fe1518831]

Steps to Reproduce:
1. Create and start a distributed-replicate volume.
2. Start a replace-brick.
3. Before data migration, commit.
  

Additional info:
This is the backtrace from the core.
#############################################################################

#0  0x00007f48923a33a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f48923a6b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f489239bd4d in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f4892daf6f9 in __gf_free (free_ptr=0x1570d20) at mem-pool.c:278
#4  0x00007f4892d740f5 in data_destroy (data=0x7f48910f6a24) at dict.c:144
#5  0x00007f4892d74f1f in data_unref (this=0x7f48910f6a24) at dict.c:489
#6  0x00007f4892d7873c in dict_get_str (this=0x1541ae4, key=0x7f488e2327a2 "remote-subvolume", str=0x7fffe12a4218) at dict.c:2123
#7  0x00007f488e22a779 in client_setvolume_cbk (req=0x1592738, iov=0x1592778, count=1, myframe=0x7f48913b4028) at client-handshake.c:1388
#8  0x00007f4892b5816d in rpc_clnt_handle_reply (clnt=0x15708e0, pollin=0x7f488803ae20) at rpc-clnt.c:797
#9  0x00007f4892b584e4 in rpc_clnt_notify (trans=0x16b8ec0, mydata=0x1570910, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f488803ae20)
    at rpc-clnt.c:916
#10 0x00007f4892b542b0 in rpc_transport_notify (this=0x16b8ec0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f488803ae20) at rpc-transport.c:498
#11 0x00007f488f6e6317 in socket_event_poll_in (this=0x16b8ec0) at socket.c:1686
#12 0x00007f488f6e6880 in socket_event_handler (fd=19, idx=10, data=0x16b8ec0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#13 0x00007f4892dae708 in event_dispatch_epoll_handler (event_pool=0x15413a0, events=0x156f830, i=0) at event.c:794
#14 0x00007f4892dae91b in event_dispatch_epoll (event_pool=0x15413a0) at event.c:856
#15 0x00007f4892daec8e in event_dispatch (event_pool=0x15413a0) at event.c:956
#16 0x0000000000408340 in main (argc=19, argv=0x7fffe12a4738) at glusterfsd.c:1650
(gdb) f 0 
#0  0x00007f48923a33a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 1
#1  0x00007f48923a6b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 2
#2  0x00007f489239bd4d in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 3
#3  0x00007f4892daf6f9 in __gf_free (free_ptr=0x1570d20) at mem-pool.c:278
278	                GF_ASSERT (0);
(gdb) f 4
#4  0x00007f4892d740f5 in data_destroy (data=0x7f48910f6a24) at dict.c:144
144	                                        GF_FREE (data->data);
(gdb) f 5
#5  0x00007f4892d74f1f in data_unref (this=0x7f48910f6a24) at dict.c:489
489	                data_destroy (this);

############################################################################
This is the backtrace from the log.
###########################################################################
[2012-03-20 11:23:35.198530] I [client-handshake.c:1633:select_server_supported_programs] 0-doa-replace-brick: Using Program GlusterFS 3git, Num (1298437), Version (330)
[2012-03-20 11:23:35.198663] W [posix-helpers.c:641:posix_handle_pair] 0-doa-posix: Extended attributes not supported
[2012-03-20 11:23:35.198736] C [mem-pool.c:541:mem_put] (-->/usr/local/lib/libglusterfs.so.0(+0x47ea8) [0x7f4892da8ea8] (-->/usr/local/lib/libglusterfs.so.0(dict_unref+0xb3) [0x7f4892d74dc9] (-->/usr/local/lib/libglusterfs.so.0(dict_destroy+0xea) [0x7f4892d74cba]))) 0-mem-pool: mem_put called on freed ptr 0x7f4891170190 of mem pool 0x1556900
[2012-03-20 11:23:35.198816] C [mem-pool.c:541:mem_put] (-->/usr/local/lib/libglusterfs.so.0(+0x47ea8) [0x7f4892da8ea8] (-->/usr/local/lib/libglusterfs.so.0(dict_unref+0xb3) [0x7f4892d74dc9] (-->/usr/local/lib/libglusterfs.so.0(dict_destroy+0x13e) [0x7f4892d74d0e]))) 0-mem-pool: mem_put called on freed ptr 0x1541d30 of mem pool 0x15417d0
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-03-20 11:23:35
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7f48923a3420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f48923a33a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f48923a6b0b]
/lib/x86_64-linux-gnu/libc.so.6(__assert_fail+0xdd)[0x7f489239bd4d]
/usr/local/lib/libglusterfs.so.0(__gf_free+0xa3)[0x7f4892daf6f9]
/usr/local/lib/libglusterfs.so.0(data_destroy+0x72)[0x7f4892d740f5]
/usr/local/lib/libglusterfs.so.0(data_unref+0xb3)[0x7f4892d74f1f]
/usr/local/lib/libglusterfs.so.0(dict_get_str+0x95)[0x7f4892d7873c]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client_setvolume_cbk+0x52a)[0x7f488e22a779]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x20e)[0x7f4892b5816d]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2b4)[0x7f4892b584e4]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x115)[0x7f4892b542b0]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f488f6e6317]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7f488f6e6880]
/usr/local/lib/libglusterfs.so.0(+0x4d708)[0x7f4892dae708]
/usr/local/lib/libglusterfs.so.0(+0x4d91b)[0x7f4892dae91b]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7f4892daec8e]
/usr/local/sbin/glusterfsd(main+0x238)[0x408340]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f489238e30d]
/usr/local/sbin/glusterfsd[0x4040c9]
Comment 1 Anand Avati 2012-04-25 07:06:55 EDT
CHANGE: http://review.gluster.com/3223 (pump: Removed extra dict_unref in pump_command_reply) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 2 Vijaykumar Koppad 2012-05-22 04:05:20 EDT
This crash no longer occurs in version 3.3.0qa42.