Description of problem:
2x2 distributed replicate volume with 1 fuse and 1 nfs client, both running the sanity script. Volume set operations, volume status and statedump operations were running in parallel. One of the bricks of the volume crashed trying to access a null connection object. This is the backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.145.export-'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000390f52560c in __strncmp_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000390f52560c in __strncmp_sse42 () from /lib64/libc.so.6
#1  0x00007fb952201cea in server_connection_get (this=0x1fb3280, id=0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0")
    at ../../../../../xlators/protocol/server/src/server-helpers.c:692
#2  0x00007fb952218ec6 in server_setvolume (req=0x7fb951ad43dc)
    at ../../../../../xlators/protocol/server/src/server-handshake.c:427
#3  0x00007fb9571f10a9 in rpcsvc_handle_rpc_call (svc=0x1fb67d0, trans=0x6db17b0, msg=0x6dac370)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#4  0x00007fb9571f144c in rpcsvc_notify (trans=0x6db17b0, mydata=0x1fb67d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6dac370)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#5  0x00007fb9571f6da8 in rpc_transport_notify (this=0x6db17b0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6dac370)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#6  0x00007fb953f08270 in socket_event_poll_in (this=0x6db17b0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#7  0x00007fb953f087f4 in socket_event_handler (fd=64, idx=51, data=0x6db17b0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#8  0x00007fb95745107c in event_dispatch_epoll_handler (event_pool=0x1f8ac20, events=0x1fa45d0, i=0)
    at ../../../libglusterfs/src/event.c:794
#9  0x00007fb95745129f in event_dispatch_epoll (event_pool=0x1f8ac20)
    at ../../../libglusterfs/src/event.c:856
#10 0x00007fb95745162a in event_dispatch (event_pool=0x1f8ac20)
    at ../../../libglusterfs/src/event.c:956
#11 0x0000000000407dbd in main (argc=19, argv=0x7fff7dd4ab68)
    at ../../../glusterfsd/src/glusterfsd.c:1611
(gdb) f 1
#1  0x00007fb952201cea in server_connection_get (this=0x1fb3280, id=0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0")
    at ../../../../../xlators/protocol/server/src/server-helpers.c:692
692                             if (!strncmp (trav->id, id, strlen (id))) {
(gdb) p trav
$1 = (server_connection_t *) 0x6db2020
(gdb) p *trav
$2 = {list = {next = 0x0, prev = 0x0}, id = 0x15 <Address 0x15 out of bounds>, ref = 0, lock = {__data = {__lock = 115023960, __count = 0, __owner = 0, __nusers = 0, __kind = 1, __spins = 0, __list = {__prev = 0x735f63636762696c, __next = 0x312e6f732e}}, __size = "X \333\006", '\000' <repeats 12 times>, "\001\000\000\000\000\000\000\000libgcc_s.so.1\000\000", __align = 115023960}, fdtable = 0x141, ltable = 0x0, timer = 0x0, bound_xl = 0x0, this = 0x0, lk_version = 0}
(gdb) p trav->id
$3 = 0x15 <Address 0x15 out of bounds>
(gdb) l
687             conf = this->private;
688
689             pthread_mutex_lock (&conf->mutex);
690             {
691                     list_for_each_entry (trav, &conf->conns, list) {
692                             if (!strncmp (trav->id, id, strlen (id))) {
693                                     conn = trav;
694                                     goto unlock;
695                             }
696                     }
(gdb) p id
$4 = 0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0"
(gdb)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
glusterfs server crashed trying to access the NULL connection object

Expected results:
glusterfs server should not crash

Additional info:
[2012-03-08 20:27:22.687100] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.131:1012 (version: 3.3.0qa26)
[2012-03-08 20:27:22.716243] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.145:1017 (version: 3.3.0qa26)
[2012-03-08 20:27:22.728959] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.130:1005 (version: 3.3.0qa26)
[2012-03-08 20:27:27.483091] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:27.483174] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node131-18349-2012/03/08-20:21:05:465938-mirror-client-3-0
[2012-03-08 20:27:28.483308] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:28.483397] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client3/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483442] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client7/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483461] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client8/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483475] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client4/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483489] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client0/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483600] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:28.483643] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node130-31468-2012/03/08-20:21:26:334046-mirror-client-3-0
[2012-03-08 20:27:28.483646] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-17712-2012/03/08-20:21:43:393795-mirror-client-3-0
[2012-03-08 20:34:32.214522] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.131:1012
[2012-03-08 20:34:32.214597] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:33.237302] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-08 20:34:33.257203] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.145:1017
[2012-03-08 20:34:33.257269] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:33.260576] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.130:1005
[2012-03-08 20:34:33.260631] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:34.271952] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-08 20:34:34.272037] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2012-03-08 20:34:34.272195] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-08 20:34:38
configuration details:
argp 1
backtrace 1
dlfcn 1
Please update these bugs with respect to 3.3.0qa27; they need to be worked on as per the target milestone set.
CHANGE: http://review.gluster.com/2911 (protocol/server: Remove connection from conf->conns w.o. race) merged in master by Vijay Bellur (vijay)
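For readers following the thread, here is a minimal sketch of the pattern the change title describes: a connection is unlinked from the list under the same mutex that lookups take, and freed only after it has been unlinked. This is not the code from review 2911. conn_t, conf_t, conf_init, conn_get and conn_put are hypothetical, simplified stand-ins for server_connection_t and the server conf, and the sketch uses a plain singly linked list and an exact strcmp match instead of list_for_each_entry with strncmp. In the gdb session above, trav is a non-NULL pointer (0x6db2020) whose fields hold garbage (id = 0x15), which is what a traversal would see if an entry were freed while still reachable from conf->conns.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for server_connection_t. */
typedef struct conn {
    struct conn *next;
    char        *id;
    int          ref;   /* protected by conf->mutex */
} conn_t;

/* Hypothetical, simplified stand-in for the server conf. */
typedef struct {
    pthread_mutex_t mutex;
    conn_t         *conns;   /* head of the connection list */
} conf_t;

void conf_init(conf_t *conf)
{
    pthread_mutex_init(&conf->mutex, NULL);
    conf->conns = NULL;
}

/* Find the connection with this id, or create and link a new one.
 * The lookup, the insert and the ref increment all happen under the mutex. */
conn_t *conn_get(conf_t *conf, const char *id)
{
    conn_t *trav, *conn = NULL;

    pthread_mutex_lock(&conf->mutex);
    for (trav = conf->conns; trav; trav = trav->next) {
        if (strcmp(trav->id, id) == 0) {
            conn = trav;
            break;
        }
    }
    if (!conn) {
        size_t len = strlen(id) + 1;
        conn = calloc(1, sizeof(*conn));
        if (conn && (conn->id = malloc(len)) != NULL) {
            memcpy(conn->id, id, len);
            conn->next = conf->conns;
            conf->conns = conn;
        } else {
            free(conn);   /* free(NULL) is a no-op */
            conn = NULL;
        }
    }
    if (conn)
        conn->ref++;
    pthread_mutex_unlock(&conf->mutex);
    return conn;
}

/* Drop one reference. On the last reference the entry is unlinked while
 * the mutex is still held, so no traversal can reach it once it is freed.
 * Returns 1 if the connection was destroyed, 0 otherwise. */
int conn_put(conf_t *conf, conn_t *conn)
{
    conn_t **pp;
    int destroyed = 0;

    pthread_mutex_lock(&conf->mutex);
    assert(conn->ref > 0);
    if (--conn->ref == 0) {
        for (pp = &conf->conns; *pp; pp = &(*pp)->next) {
            if (*pp == conn) {
                *pp = conn->next;
                break;
            }
        }
        destroyed = 1;
    }
    pthread_mutex_unlock(&conf->mutex);

    if (destroyed) {
        /* Safe outside the lock: the entry is no longer on the list. */
        free(conn->id);
        free(conn);
    }
    return destroyed;
}
```

In this sketch the decision to destroy and the unlink are one critical section, so a concurrent conn_get either finds the entry and takes a reference before the count reaches zero, or does not find it at all.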
Checked with glusterfs-3.3.0qa33. The server no longer crashes because of this race condition.