Bug 817468 - [8b6534031ab9b60da293e9c2ffb95141d714f973]: glusterfs server crashed due to memory corruption
Summary: [8b6534031ab9b60da293e9c2ffb95141d714f973]: glusterfs server crashed due to memory corruption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 852820 858487
 
Reported: 2012-04-30 07:14 UTC by Raghavendra Bhat
Modified: 2013-07-24 17:54 UTC
CC List: 3 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned To: 852820
Environment:
Last Closed: 2013-07-24 17:54:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Bhat 2012-04-30 07:14:25 UTC
Description of problem:
The glusterfs server crashed due to memory corruption with the backtrace below.


Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda7-ex'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223	                                if (listener->trans == trans) {
(gdb) bt
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
#1  0x00007f7455b3f573 in rpcsvc_notify (trans=0x1a986f0, mydata=0x1a83610, event=RPC_TRANSPORT_CLEANUP, data=0x0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:636
#2  0x00007f7455b44ef8 in rpc_transport_unref (this=0x1a986f0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:479
#3  0x00007f745208d9b1 in socket_event_handler (fd=8, idx=3, data=0x1a986f0, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1809
#4  0x00007f7455d9fac1 in event_dispatch_epoll_handler (event_pool=0x1a7e3a0, events=0x1a97830, i=0) at ../../../libglusterfs/src/event.c:794
#5  0x00007f7455d9fcdb in event_dispatch_epoll (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:856
#6  0x00007f7455da004d in event_dispatch (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:956
#7  0x000000000040857f in main (argc=19, argv=0x7fff83788c18) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) info thr
  7 Thread 29231  0x00007f74550b3cf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
  6 Thread 29235  0x00007f745570d4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  5 Thread 29262  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  4 Thread 29232  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 29263  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  2 Thread 29233  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1 Thread 29230  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
(gdb) f 0
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223	                                if (listener->trans == trans) {
(gdb) p listener->trans
Cannot access memory at address 0x676e696461657190
(gdb) p *listener
Cannot access memory at address 0x676e696461657190
(gdb) l
1218	
1219	        pthread_mutex_lock (&svc->rpclock);
1220	        {
1221	                list_for_each_entry (listener, &svc->listeners, list) {
1222	                        if (trans != NULL) {
1223	                                if (listener->trans == trans) {
1224	                                        found = 1;
1225	                                        break;
1226	                                }
1227	
(gdb) p svc->listeners
$1 = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = 825242203, __count = 875572530, __owner = 540488237, __nusers = 859452721, __kind = 808663607, 
      __spins = 875967022, __list = {__prev = 0x5b2057205d313037, __next = 0x632e74656b636f73}}, 
    __size = "[2012-04-27 15:37:30.264701] W [socket.c", __align = 3760555382451221083}, memfactor = 842346810, authschemes = {
    next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726, 
  register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
    next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365}, 
  notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb) 
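
For reference, the "pointer" values in the dumped rpcsvc_t above are not addresses at all: read as little-endian bytes, listeners.next (0x676e696461657220) spells " reading" and listeners.prev (0x6f73206d6f726620) spells " from so", fragments of the 15:37:30.264701 log line "reading from socket failed. Error (Connection reset by peer)". This suggests the memory holding the rpcsvc_t was overwritten, or reused as a log buffer, before the listener-list walk. A minimal, stand-alone C snippet (not part of GlusterFS) showing the decoding:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the raw bytes of a 64-bit value as laid out in little-endian
 * memory, replacing non-printable bytes with '.'. */
static void dump_as_ascii(uint64_t v)
{
        unsigned char b[sizeof v];

        memcpy(b, &v, sizeof v);      /* assumes a little-endian host */
        for (size_t i = 0; i < sizeof b; i++)
                putchar((b[i] >= 0x20 && b[i] < 0x7f) ? b[i] : '.');
        putchar('\n');
}

int main(void)
{
        dump_as_ascii(0x676e696461657220ULL); /* svc->listeners.next -> " reading" */
        dump_as_ascii(0x6f73206d6f726620ULL); /* svc->listeners.prev -> " from so" */
        return 0;
}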


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
+------------------------------------------------------------------------------+
[2012-04-27 14:40:33.792647] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2012-04-27 14:40:33.792815] W [socket.c:1885:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2012-04-27 14:40:33.884884] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0 (version: 3git)
[2012-04-27 14:47:01.563460] I [server.c:685:server_rpc_notify] 0-vol-server: disconnecting connectionfrom hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563526] I [server-helpers.c:741:server_connection_put] 0-vol-server: Shutting down connection hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563615] I [server-helpers.c:629:server_connection_destroy] 0-vol-server: destroyed connection of hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:07.584147] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-11685-2012/04/27-14:47:02:558264-vol-client-0-0 (version: 3git)
[2012-04-27 15:37:30.224134] W [glusterfsd.c:794:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f74550c104d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f7455704d8c] (-->/usr/local/sbin/glusterfsd(glusterfs_sigwaiter+0xea) [0x407c46]))) 0-: received signum (15), shutting down
[2012-04-27 15:37:30.224502] E [rpcsvc.c:1161:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2012-04-27 15:37:30.264661] W [socket.c:204:__socket_rwv] 0-glusterfs: readv failed (Connection reset by peer)
[2012-04-27 15:37:30.264701] W [socket.c:1521:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection reset by peer), peer (127.0.0.1:24007)
[2012-04-27 15:37:30.264810] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f7455b48e7c] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7455b483ee] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7455b47e73]))) 0-glusterfs: forced unwinding frame type(Gluster Portmap) op(SIGNOUT(5)) called at 2012-04-27 15:37:30.224215 (xid=0x3x)
[2012-04-27 15:37:30.275653] A [mem-pool.c:153:__gf_calloc] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x115) [0x7f7455b45025] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x131) [0x7f7455b3f4a5] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_disconnect+0x7b) [0x7f7455b3f25b]))) : no memory available for size (62489783496)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-04-27 15:37:30
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
:

Comment 1 Amar Tumballi 2012-05-02 19:07:15 UTC
Noticed now that this happened when the process got SIGTERM (cleanup_and_exit() with signal 15). At that stage, if there are still pending frames, this crash can happen. We need to bring in a check that waits for saved_frames_unwind() to finish on all pending frames before doing cleanup_and_exit().

Considering the process was about to be stopped anyway, lowering the priority of the bug. Feel free to take this off my name if someone wants to fix it :-)
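
A minimal sketch of the waiting idea above, assuming a hypothetical in-flight frame counter guarded by a mutex and condition variable (pending_frames, frames_lock, frames_drained and the three functions are illustrative names, not GlusterFS APIs, and this is not the fix that was eventually merged):

#include <pthread.h>

static pthread_mutex_t frames_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  frames_drained = PTHREAD_COND_INITIALIZER;
static int             pending_frames = 0;

void frame_saved(void)                  /* a frame was queued on the transport */
{
        pthread_mutex_lock(&frames_lock);
        pending_frames++;
        pthread_mutex_unlock(&frames_lock);
}

void frame_unwound(void)                /* one saved frame finished unwinding */
{
        pthread_mutex_lock(&frames_lock);
        if (--pending_frames == 0)
                pthread_cond_broadcast(&frames_drained);
        pthread_mutex_unlock(&frames_lock);
}

void wait_then_cleanup_and_exit(void)   /* SIGTERM path */
{
        pthread_mutex_lock(&frames_lock);
        while (pending_frames > 0)
                pthread_cond_wait(&frames_drained, &frames_lock);
        pthread_mutex_unlock(&frames_lock);
        /* only now is it safe to tear down rpcsvc/transport state and exit */
}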

Comment 2 Raghavendra Bhat 2012-05-03 09:36:00 UTC
I got the crash again, with the backtrace below.


Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda10-e'.
Program terminated with signal 6, Aborted.
#0  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f3ff16bcab6 in abort () at abort.c:92
#2  0x00007f3ff16b17c5 in __assert_fail (assertion=0x7f3ff1db9f05 "(-(e)) != 3 || !robust", file=<value optimized out>, line=312, 
    function=<value optimized out>) at assert.c:81
#3  0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
#4  0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
#5  0x00007f3ff21e94ba in rpcsvc_notify (trans=0xcdf5c0, mydata=0xcca620, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:616
#6  0x00007f3ff21ef039 in rpc_transport_notify (this=0xcdf5c0, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#7  0x00007f3fee73327a in socket_event_poll_err (this=0xcdf5c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694
#8  0x00007f3fee7379a5 in socket_event_handler (fd=8, idx=3, data=0xcdf5c0, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1808
#9  0x00007f3ff2449ac1 in event_dispatch_epoll_handler (event_pool=0xcc53a0, events=0xcde840, i=0) at ../../../libglusterfs/src/event.c:794
#10 0x00007f3ff2449cdb in event_dispatch_epoll (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:856
#11 0x00007f3ff244a04d in event_dispatch (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:956
#12 0x000000000040857f in main (argc=19, argv=0x7fff36e21028) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) f 3
#3  0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
312	pthread_mutex_lock.c: No such file or directory.
	in pthread_mutex_lock.c
(gdb) f 4
#4  0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
554	        pthread_mutex_lock (&svc->rpclock);
(gdb) p svc
$1 = (rpcsvc_t *) 0xcca620
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = -1322241445, __count = 892349746, __owner = 540225581, __nusers = 825898033, __kind = 859060786, 
      __spins = 875902254, __list = {__prev = 0x5b2057205d383036, __next = 0x632e74656b636f73}}, 
    __size = "[20\261\062-05-03 14:12:43.554608] W [socket.c", __align = 3832612978636632667}, memfactor = 842346810, authschemes = {
    next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726, 
  register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
    next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365}, 
  notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb) info thr
  9 Thread 15323  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
  8 Thread 15406  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  7 Thread 15405  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6 Thread 15408  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  5 Thread 15325  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 15407  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  3 Thread 15324  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 15327  0x00007f3ff1db74bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 15322  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) t 9
[Switching to thread 9 (Thread 15323)]#0  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
82	../sysdeps/unix/syscall-template.S: No such file or directory.
	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000407847 in glusterfs_pidfile_cleanup (ctx=0xcad010) at ../../../glusterfsd/src/glusterfsd.c:1296
#2  0x0000000000406225 in cleanup_and_exit (signum=15) at ../../../glusterfsd/src/glusterfsd.c:814
#3  0x0000000000407c46 in glusterfs_sigwaiter (arg=0x7fff36e20e30) at ../../../glusterfsd/src/glusterfsd.c:1384
#4  0x00007f3ff1daed8c in start_thread (arg=0x7f3ff0145700) at pthread_create.c:304
#5  0x00007f3ff176b04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()
(gdb)

Comment 3 Raghavendra Bhat 2012-12-04 09:41:59 UTC
http://review.gluster.org/4230 root-causes the issue as a race between freeing the RPC object and handling the socket's POLL_ERR event.

The patch may take a couple of review iterations before it is accepted, but marking this as POST since it is in the review queue.
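
As a rough illustration of how this kind of race is usually closed (this is not the code in review 4230), the poll handler can take its own reference on the transport before processing POLL_ERR, so a concurrent disconnect/cleanup path can only drop the object after the handler is done with it. All names below are hypothetical:

#include <stdlib.h>
#include <stdatomic.h>

typedef struct transport {
        atomic_int refcount;
        /* ... socket fd, callbacks, private data ... */
} transport_t;

static transport_t *transport_ref(transport_t *t)
{
        atomic_fetch_add(&t->refcount, 1);
        return t;
}

static void transport_unref(transport_t *t)
{
        if (atomic_fetch_sub(&t->refcount, 1) == 1)
                free(t);                /* last reference: safe to free now */
}

static void handle_poll_err(transport_t *t)
{
        transport_ref(t);               /* pin the object for this handler */
        /* ... notify upper layers of the disconnect, flush pending state ... */
        transport_unref(t);             /* may be the final unref */
}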

Comment 4 Sachidananda Urs 2012-12-18 08:52:02 UTC
The typical way to reproduce this is:

kill all glusterfs processes and restart glusterd a couple of times. This generates the crash.

With the new release (qa5), this crash is no longer reproducible.

