Bug 801411 - [glusterfs-3.3.0qa25]: glusterfs server crashed in server_release due to NULL conn
Summary: [glusterfs-3.3.0qa25]: glusterfs server crashed in server_release due to NULL...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard: BETA1
Depends On:
Blocks: 817967
TreeView+ depends on / blocked
 
Reported: 2012-03-08 13:31 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)
2 users (show)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:43:41 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa40
Embargoed:


Attachments (Terms of Use)

Description Raghavendra Bhat 2012-03-08 13:31:52 UTC
Description of problem:
2x2 distributed replicate volume. 1 fuse client. It was running sanity script. One of the bricks was brought down. volume set operations, volume status operations were running parallely. One of the up bricks crashed with the below backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.130.export-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fb66a34ee52 in server_release (req=0x7fb669c38b04) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:3064
3064            gf_fd_put (conn->fdtable, args.fd);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) 
(gdb) bt
#0  0x00007fb66a34ee52 in server_release (req=0x7fb669c38b04) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:3064
#1  0x00007fb66f32f0a9 in rpcsvc_handle_rpc_call (svc=0x1f74330, trans=0x30f5a20, msg=0x2513850) at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#2  0x00007fb66f32f44c in rpcsvc_notify (trans=0x30f5a20, mydata=0x1f74330, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2513850)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#3  0x00007fb66f334da8 in rpc_transport_notify (this=0x30f5a20, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2513850)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#4  0x00007fb66c045270 in socket_event_poll_in (this=0x30f5a20) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#5  0x00007fb66c0457f4 in socket_event_handler (fd=17, idx=12, data=0x30f5a20, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#6  0x00007fb66f58f07c in event_dispatch_epoll_handler (event_pool=0x1f48c20, events=0x1f62650, i=0) at ../../../libglusterfs/src/event.c:794
#7  0x00007fb66f58f29f in event_dispatch_epoll (event_pool=0x1f48c20) at ../../../libglusterfs/src/event.c:856
#8  0x00007fb66f58f62a in event_dispatch (event_pool=0x1f48c20) at ../../../libglusterfs/src/event.c:956
#9  0x0000000000407dbd in main (argc=19, argv=0x7fffd6150f18) at ../../../glusterfsd/src/glusterfsd.c:1611
(gdb) f 0
#0  0x00007fb66a34ee52 in server_release (req=0x7fb669c38b04) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:3064
3064            gf_fd_put (conn->fdtable, args.fd);
(gdb) p conn
$1 = (server_connection_t *) 0x0
(gdb) l
3059                    req->rpc_err = GARBAGE_ARGS;
3060                    goto out;
3061            }
3062
3063            conn = req->trans->xl_private;
3064            gf_fd_put (conn->fdtable, args.fd);
3065
3066            server_submit_reply (NULL, req, &rsp, NULL, 0, NULL,
3067                                 (xdrproc_t)xdr_gf_common_rsp);
3068            ret = 0;
(gdb)  p conn
$1 = (server_connection_t *) 0x0
(gdb) l
3059                    req->rpc_err = GARBAGE_ARGS;
3060                    goto out;
3061            }
3062
3063            conn = req->trans->xl_private;
3064            gf_fd_put (conn->fdtable, args.fd);
3065
3066            server_submit_reply (NULL, req, &rsp, NULL, 0, NULL,
3067                                 (xdrproc_t)xdr_gf_common_rsp);
3068            ret = 0;
(gdb) p *conn
Cannot access memory at address 0x0
(gdb) info thr
  8 Thread 0x7fb669a07700 (LWP 21675)  0x00000034c2a0b74b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7fb668324700 (LWP 22172)  0x00000034c2a0b74b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7fb669c09700 (LWP 21673)  0x00000034c2a0b3cc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7fb66be22700 (LWP 21660)  0x00000034c2a0ecbd in nanosleep () from /lib64/libpthread.so.0
  4 Thread 0x7fb66cc50700 (LWP 21657)  0x00000034c2a0b3cc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7fb66d651700 (LWP 21656)  0x00000034c2a0b3cc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7fb66e052700 (LWP 21655)  0x00000034c2a0f235 in sigwait () from /lib64/libpthread.so.0
* 1 Thread 0x7fb66f0fe700 (LWP 21654)  0x00007fb66a34ee52 in server_release (req=0x7fb669c38b04)
    at ../../../../../xlators/protocol/server/src/server3_1-fops.c:3064
(gdb) 



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

glusterfs server crashed trying to access NULL connection
Expected results:
glusterfs server should not crash

Additional info:

[2012-03-08 06:51:40.760762] I [server3_1-fops.c:1123:server_flush_cbk] 0-mirror-server: 1234: FLUSH -2 (--) ==> -1 (No such file or directory
)
[2012-03-08 06:51:43.080374] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 06:51:43.080454] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node131-16264-2012/03
/08-06:45:38:230123-mirror-client-0-0
[2012-03-08 06:51:44.080583] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 06:51:44.080645] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-3243-2012/03/
08-06:46:16:621329-mirror-client-0-0
[2012-03-08 06:51:44.080657] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 06:51:44.080679] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of raghu.144machine.com-
26065-2012/03/08-06:46:16:623499-mirror-client-0-0
[2012-03-08 06:51:44.080700] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 06:51:44.080725] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node130-22232-2012/03
/08-06:45:59:079177-mirror-client-0-0
[2012-03-08 06:51:49.081508] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-08 06:51:49.081596] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-2824-2012/03/
08-05:58:26:202567-mirror-client-0-6
[2012-03-08 06:57:39.534830] W [socket.c:204:__socket_rwv] 0-tcp.mirror-server: writev failed (Broken pipe)
[2012-03-08 06:57:39.549029] E [iobuf.c:526:iobuf_get2] 0-iobuf: page_size (3204632) of iobufs in arena being requested is greater than max av
ailable
[2012-03-08 06:57:39.549088] E [glusterfsd-mgmt.c:84:glusterfs_serialize_reply] 0-glusterfs: Failed to get iobuf
[2012-03-08 06:57:39.549163] E [iobuf.c:659:iobuf_unref] (-->/usr/local/sbin/glusterfsd(glusterfs_handle_brick_status+0x644) [0x40ac16] (-->/u
sr/local/sbin/glusterfsd(glusterfs_submit_reply+0x111) [0x4084a2] (-->/usr/local/sbin/glusterfsd(glusterfs_serialize_reply+0x193) [0x40837e]))
) 0-iobuf: invalid argument: iobuf
[2012-03-08 06:57:39.549179] E [glusterfsd-mgmt.c:139:glusterfs_submit_reply] 0-glusterfs: Failed to serialize reply
[2012-03-08 06:57:39.565168] W [rpcsvc.c:1111:rpcsvc_error_reply] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130) [0x7fb66f334da8
] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x181) [0x7fb66f32f44c] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x37b) [0x7fb
66f32f0c4]))) 0-: sending a RPC error reply
[2012-03-08 06:57:39.577059] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-08 06:57:39.577216] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.104:928
[2012-03-08 06:57:39.577232] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-08 06:57:39.577296] I [socket.c:2377:socket_submit_reply] 0-tcp.mirror-server: not connected (priv->connected = 255)
[2012-03-08 06:57:39.577321] E [rpcsvc.c:1078:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x189997x, Program: Gluster
FS 3.3.0qa26, ProgVers: 330, Proc: 34) to rpc-transport (tcp.mirror-server)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-08 06:57:39
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
:

Comment 1 Amar Tumballi 2012-03-12 09:46:57 UTC
please update these bugs w.r.to 3.3.0qa27, need to work on it as per target milestone set.

Comment 2 Amar Tumballi 2012-03-13 12:03:42 UTC
http://review.gluster.com/2911 should fix it.

Comment 3 Raghavendra Bhat 2012-04-05 09:54:59 UTC
This crash still exists in glusterfs-3.3.0qa33 where the server process crashes because of trying to access NULL connection object. Reopening it. I have the core file for the bug.

Comment 4 Anand Avati 2012-04-11 17:07:59 UTC
CHANGE: http://review.gluster.com/3125 (protocol/server: validate connection object before dereferencing) merged in master by Anand Avati (avati)

Comment 5 Raghavendra Bhat 2012-05-08 13:27:10 UTC
Checked with glusterfs-3.3.0qa40. Now this crash is not seen.


Note You need to log in before you can comment on or make changes to this bug.