Bug 801675 - [glusterfs-3.3.0qa26]: glusterfs server crashed trying to access NULL connection object
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard: BETA1
Depends On:
Blocks: 817967
Reported: 2012-03-09 06:27 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 18:01:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Bhat 2012-03-09 06:27:01 UTC
Description of problem:
Setup: a 2x2 distributed-replicate volume with one FUSE client and one NFS client, both running the sanity script. Volume set operations, volume status, and statedump operations were running in parallel. One of the bricks of the volume crashed trying to access a null connection object. The backtrace follows.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.145.export-'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000390f52560c in __strncmp_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x000000390f52560c in __strncmp_sse42 () from /lib64/libc.so.6
#1  0x00007fb952201cea in server_connection_get (this=0x1fb3280, id=0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0")
    at ../../../../../xlators/protocol/server/src/server-helpers.c:692
#2  0x00007fb952218ec6 in server_setvolume (req=0x7fb951ad43dc) at ../../../../../xlators/protocol/server/src/server-handshake.c:427
#3  0x00007fb9571f10a9 in rpcsvc_handle_rpc_call (svc=0x1fb67d0, trans=0x6db17b0, msg=0x6dac370) at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#4  0x00007fb9571f144c in rpcsvc_notify (trans=0x6db17b0, mydata=0x1fb67d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6dac370)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#5  0x00007fb9571f6da8 in rpc_transport_notify (this=0x6db17b0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x6dac370)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#6  0x00007fb953f08270 in socket_event_poll_in (this=0x6db17b0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#7  0x00007fb953f087f4 in socket_event_handler (fd=64, idx=51, data=0x6db17b0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#8  0x00007fb95745107c in event_dispatch_epoll_handler (event_pool=0x1f8ac20, events=0x1fa45d0, i=0) at ../../../libglusterfs/src/event.c:794
#9  0x00007fb95745129f in event_dispatch_epoll (event_pool=0x1f8ac20) at ../../../libglusterfs/src/event.c:856
#10 0x00007fb95745162a in event_dispatch (event_pool=0x1f8ac20) at ../../../libglusterfs/src/event.c:956
#11 0x0000000000407dbd in main (argc=19, argv=0x7fff7dd4ab68) at ../../../glusterfsd/src/glusterfsd.c:1611
(gdb) f 1
#1  0x00007fb952201cea in server_connection_get (this=0x1fb3280, id=0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0")
    at ../../../../../xlators/protocol/server/src/server-helpers.c:692
692                             if (!strncmp (trav->id, id, strlen (id))) {
(gdb) p trav
$1 = (server_connection_t *) 0x6db2020
(gdb) p *trav
$2 = {list = {next = 0x0, prev = 0x0}, id = 0x15 <Address 0x15 out of bounds>, ref = 0, lock = {__data = {__lock = 115023960, __count = 0, 
      __owner = 0, __nusers = 0, __kind = 1, __spins = 0, __list = {__prev = 0x735f63636762696c, __next = 0x312e6f732e}}, 
    __size = "X \333\006", '\000' <repeats 12 times>, "\001\000\000\000\000\000\000\000libgcc_s.so.1\000\000", __align = 115023960}, 
  fdtable = 0x141, ltable = 0x0, timer = 0x0, bound_xl = 0x0, this = 0x0, lk_version = 0}
(gdb) p trav->id
$3 = 0x15 <Address 0x15 out of bounds>
(gdb) l
687             conf = this->private;
688
689             pthread_mutex_lock (&conf->mutex);
690             {
691                     list_for_each_entry (trav, &conf->conns, list) {
692                             if (!strncmp (trav->id, id, strlen (id))) {
693                                     conn = trav;
694                                     goto unlock;
695                             }
696                     }
(gdb) p id
$4 = 0x6da6c80 "node131-18382-2012/03/08-20:33:56:347013-mirror-client-3-0"
(gdb) 



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

The glusterfs server crashed trying to access the NULL connection object.

Expected results:

The glusterfs server should not crash.

Additional info:

[2012-03-08 20:27:22.687100] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.131:1012 (version: 3.3.0qa26)
[2012-03-08 20:27:22.716243] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.145:1017 (version: 3.3.0qa26)
[2012-03-08 20:27:22.728959] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.130:1005 (version: 3.3.0qa26)
[2012-03-08 20:27:27.483091] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:27.483174] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node131-18349-2012/03/08-20:21:05:465938-mirror-client-3-0
[2012-03-08 20:27:28.483308] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:28.483397] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client3/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483442] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client7/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483461] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client8/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483475] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client4/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483489] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /run11794/clients/client0/~dmtmp/WORDPRO/RESULTS.XLS
[2012-03-08 20:27:28.483600] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 20:27:28.483643] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node130-31468-2012/03/08-20:21:26:334046-mirror-client-3-0
[2012-03-08 20:27:28.483646] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-17712-2012/03/08-20:21:43:393795-mirror-client-3-0
[2012-03-08 20:34:32.214522] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.131:1012
[2012-03-08 20:34:32.214597] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:33.237302] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-08 20:34:33.257203] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.145:1017
[2012-03-08 20:34:33.257269] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:33.260576] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.130:1005
[2012-03-08 20:34:33.260631] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 20:34:34.271952] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-08 20:34:34.272037] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2012-03-08 20:34:34.272195] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-08 20:34:38
configuration details:
argp 1
backtrace 1
dlfcn 1
:

Comment 1 Amar Tumballi 2012-03-12 09:46:45 UTC
Please update these bugs with respect to 3.3.0qa27; we need to work on them as per the target milestone set.

Comment 2 Anand Avati 2012-03-13 11:58:56 UTC
CHANGE: http://review.gluster.com/2911 (protocol/server: Remove connection from conf->conns w.o. race) merged in master by Vijay Bellur (vijay)
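
For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern the change title refers to: a connection is unlinked from the shared list while holding the same mutex that lookups take, and is freed only after it is unlinked. This is not the merged patch. The types and functions (struct conn, struct conn_table, conn_add, conn_get, conn_destroy) are hypothetical stand-ins, not the GlusterFS API, and the sketch assumes a plain singly linked list rather than the list macros seen in the backtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the server's connection table. */
struct conn {
    struct conn *next;
    char        *id;
};

struct conn_table {
    pthread_mutex_t mutex;
    struct conn    *head;
};

/* Add a connection; the list is only modified with the mutex held. */
static struct conn *conn_add(struct conn_table *t, const char *id)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->id = strdup(id);
    if (!c->id) {
        free(c);
        return NULL;
    }
    pthread_mutex_lock(&t->mutex);
    c->next = t->head;
    t->head = c;
    pthread_mutex_unlock(&t->mutex);
    return c;
}

/* Look up by id while holding the mutex. A real server would also take
 * a reference on the result before unlocking; omitted to keep this short. */
static struct conn *conn_get(struct conn_table *t, const char *id)
{
    struct conn *c;
    pthread_mutex_lock(&t->mutex);
    for (c = t->head; c; c = c->next)
        if (strcmp(c->id, id) == 0)
            break;
    pthread_mutex_unlock(&t->mutex);
    return c;
}

/* Unlink under the same mutex, then free. Because the entry leaves the
 * list before its memory is released, a concurrent traversal never walks
 * into freed memory. Returns 0 if removed, -1 if not found. */
static int conn_destroy(struct conn_table *t, const char *id)
{
    struct conn **pp, *victim = NULL;
    pthread_mutex_lock(&t->mutex);
    for (pp = &t->head; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->id, id) == 0) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&t->mutex);
    if (!victim)
        return -1;
    free(victim->id);
    free(victim);
    return 0;
}

#ifdef CONN_DEMO_MAIN
/* Small demo; build with -DCONN_DEMO_MAIN -pthread to run it. */
int main(void)
{
    struct conn_table t = { PTHREAD_MUTEX_INITIALIZER, NULL };
    conn_add(&t, "client-a");
    conn_add(&t, "client-b");
    assert(conn_destroy(&t, "client-a") == 0);
    printf("client-a %s\n", conn_get(&t, "client-a") ? "found" : "gone");
    printf("client-b %s\n", conn_get(&t, "client-b") ? "found" : "gone");
    conn_destroy(&t, "client-b");
    return 0;
}
#endif
```

The point of the sketch is the ordering in conn_destroy: the entry is removed from the list inside the critical section, and the free happens only afterwards. If the free happened first, or the unlink happened without the mutex, a lookup like the one at server-helpers.c:692 could dereference an entry whose memory had already been reused.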

Comment 3 Raghavendra Bhat 2012-04-05 10:47:33 UTC
Checked with glusterfs-3.3.0qa33. The server no longer crashes from this race condition.

