Bug 801692

Summary: glusterd crashed during a geo-replication operation.
Product: [Community] GlusterFS
Reporter: Vijaykumar Koppad <vkoppad>
Component: glusterd
Assignee: Venky Shankar <vshankar>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 3.2.5
CC: bbandari, gluster-bugs, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-04-04 07:54:50 UTC
Type: ---

Description Vijaykumar Koppad 2012-03-09 07:26:05 UTC
Description of problem:
I got this crash while working on the US Courts issue, when running the geo-replication status command.
For some reason, the core file was truncated.
This is the backtrace from the log file.

[2012-03-09 07:04:01.458923] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.83:887)
[2012-03-09 07:04:01.470540] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.83:871)
[2012-03-09 07:04:37.690074] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:700)
[2012-03-09 07:04:42.35858] I [glusterd-handler.c:1729:glusterd_handle_gsync_set] 0-: master not found, while handlinggeo-replication options
[2012-03-09 07:04:42.35907] I [glusterd-handler.c:1736:glusterd_handle_gsync_set] 0-: slave not not found, whilehandling geo-replication options
[2012-03-09 07:04:42.35997] E [glusterd-utils.c:235:glusterd_lock] 0-glusterd: Unable to get lock for uuid: 91050a3c-753e-4b6d-9dfe-60e2843d2080, lock held by: 16820bfb-a604-435b-abea-1ecf6852a8e3
[2012-03-09 07:04:42.36050] E [glusterd-handler.c:415:glusterd_op_txn_begin] 0-glusterd: Unable to acquire local lock, ret: -1
[2012-03-09 07:04:42.38131] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1019)
[2012-03-09 07:04:45.383203] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:686)
[2012-03-09 07:04:48.694597] I [glusterd-handler.c:3226:glusterd_handle_getwd] 0-glusterd: Received getwd req
[2012-03-09 07:04:48.696085] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1017)
[2012-03-09 07:04:53.240815] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.83:857)
[2012-03-09 07:04:56.355725] I [glusterd-handler.c:3226:glusterd_handle_getwd] 0-glusterd: Received getwd req
[2012-03-09 07:04:56.357430] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:853)
[2012-03-09 07:05:00.792486] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.83:941)
[2012-03-09 07:05:14.914134] I [glusterd-handler.c:488:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 16820bfb-a604-435b-abea-1ecf6852a8e3
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 8
time of crash: 2012-03-09 07:05:16
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2git
/lib64/libc.so.6[0x360d8302d0]
/opt/glusterfs/3.2git/lib64/libglusterfs.so.0[0x2b9e58c939ea]
/opt/glusterfs/3.2git/lib64/libglusterfs.so.0[0x2b9e58c93b36]
/opt/glusterfs/3.2git/lib64/libglusterfs.so.0(dict_get_int32+0x33)[0x2b9e58c95dc3]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_read_status_file+0x36b)[0x2aaaaaaea91b]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_get_gsync_status_mst_slv+0x1ac)[0x2aaaaaaeacac]
/opt/glusterfs/3.2git/lib64/libglusterfs.so.0(dict_foreach+0x36)[0x2b9e58c92eb6]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so[0x2aaaaaae0859]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_op_gsync_set+0x9ea)[0x2aaaaaaef4ca]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x487)[0x2aaaaaaf24a7]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so[0x2aaaaaaf35a3]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_op_sm+0x15f)[0x2aaaaaae04cf]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/xlator/mgmt/glusterd.so(glusterd_handle_commit_op+0xc7)[0x2aaaaaac78c7]
/opt/glusterfs/3.2git/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x291)[0x2b9e58eec1e1]
/opt/glusterfs/3.2git/lib64/libgfrpc.so.0(rpcsvc_notify+0x16c)[0x2b9e58eec3ec]
/opt/glusterfs/3.2git/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x2b9e58eed317]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaaaadec5ef]
/opt/glusterfs/3.2git/lib64/glusterfs/3.2git/rpc-transport/socket.so(socket_event_handler+0x188)[0x2aaaaadec798]
/opt/glusterfs/3.2git/lib64/libglusterfs.so.0[0x2b9e58cc1631]
/opt/glusterfs/3.2git/sbin/glusterd(main+0x45e)[0x40566e]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x360d81d994]
/opt/glusterfs/3.2git/sbin/glusterd[0x403739]
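
For readers triaging the dump above, here is a minimal sketch of how the frames could be split into object, symbol, offset and address, and how the signal number could be turned into a name. It assumes the frames follow the usual glibc backtrace_symbols() layout, `object(symbol+0xOFFSET)[0xADDRESS]`, with the parenthesised part missing when no symbol was resolved. The helper names `parse_frame` and `signal_name` are made up for this illustration and are not part of GlusterFS.

```python
import re
import signal

# Assumed frame layout (glibc backtrace_symbols() style):
#   /path/to/object(symbol+0xOFFSET)[0xADDRESS]
# The "(symbol+0xOFFSET)" part is optional.
FRAME_RE = re.compile(
    r"^(?P<obj>[^\s(\[]+)"
    r"(?:\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if the
    line does not look like a frame (hypothetical helper)."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "object": m.group("obj"),
        "symbol": m.group("sym") or None,
        "offset": int(m.group("off"), 16) if m.group("off") else None,
        "address": int(m.group("addr"), 16),
    }


def signal_name(number):
    """Map a signal number from the 'signal received:' line to its name."""
    return signal.Signals(number).name


if __name__ == "__main__":
    frame = parse_frame(
        "/opt/glusterfs/3.2git/lib64/libglusterfs.so.0"
        "(dict_get_int32+0x33)[0x2b9e58c95dc3]"
    )
    print(frame["symbol"], hex(frame["offset"]))
    print(signal_name(8))
```

On Linux, signal 8 is SIGFPE (an arithmetic fault such as an integer division by zero). The innermost frame with a resolved symbol is dict_get_int32, called from glusterd_read_status_file. The two libglusterfs.so.0 frames above it have no symbol, so this output alone does not say which internal function faulted.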


Version-Release number of selected component (if applicable): 3.2.0qa6


How reproducible: Was able to reproduce it only once.

Right now we don't have a definite set of steps to reproduce.

Comment 1 Amar Tumballi 2012-03-12 09:46:34 UTC
Please update these bugs with respect to 3.3.0qa27; we need to work on them as per the target milestone set.

Comment 2 Venky Shankar 2012-03-12 10:12:05 UTC
Amar, this bug is in release 3.2 (as mentioned in comment #0).

[Changing the relevant release number in the bug]

Comment 3 Venky Shankar 2012-03-15 16:47:21 UTC
Vijaykumar,

Can you start the load test again to see if you can somehow corner this bug? I would need the core file. By the looks of it, the crash happened somewhere in dict_get_*(). The core file will help in debugging this. In the meantime, I'll see if I can figure it out from the code.
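
Since the earlier core file was truncated, it may be worth checking the core-size limit before re-running the load test. The sketch below is only an illustration and rests on one assumption: that a finite RLIMIT_CORE was the cause. That is a common reason for truncated cores, but it has not been confirmed for this crash. `core_limit_status` is a hypothetical helper name.

```python
import resource


def core_limit_status():
    """Report the core-file size limits of the current process.

    Hypothetical helper: a finite soft limit means a core dump larger
    than that limit may be cut short.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def describe(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return {
        "soft": describe(soft),
        "hard": describe(hard),
        "may_truncate": soft != resource.RLIM_INFINITY,
    }


if __name__ == "__main__":
    print(core_limit_status())
```

This reports the limits of the process running the script, not those of an already running glusterd. On Linux, the limits of a running daemon can be read from /proc/<pid>/limits.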

Comment 4 Vijay Bellur 2012-04-04 07:54:50 UTC

*** This bug has been marked as a duplicate of bug 799716 ***