Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 817468

Summary: [8b6534031ab9b60da293e9c2ffb95141d714f973]: glusterfs server crashed due to memory corruption
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: protocol
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: mainline
CC: amarts, gluster-bugs, sac
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 852820 (view as bug list)
Environment:
Last Closed: 2013-07-24 17:54:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 852820, 858487

Description Raghavendra Bhat 2012-04-30 07:14:25 UTC
Description of problem:
glusterfs server crashed due to memory corruption with the below backtrace.


Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda7-ex'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223	                                if (listener->trans == trans) {
(gdb) bt
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
#1  0x00007f7455b3f573 in rpcsvc_notify (trans=0x1a986f0, mydata=0x1a83610, event=RPC_TRANSPORT_CLEANUP, data=0x0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:636
#2  0x00007f7455b44ef8 in rpc_transport_unref (this=0x1a986f0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:479
#3  0x00007f745208d9b1 in socket_event_handler (fd=8, idx=3, data=0x1a986f0, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1809
#4  0x00007f7455d9fac1 in event_dispatch_epoll_handler (event_pool=0x1a7e3a0, events=0x1a97830, i=0) at ../../../libglusterfs/src/event.c:794
#5  0x00007f7455d9fcdb in event_dispatch_epoll (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:856
#6  0x00007f7455da004d in event_dispatch (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:956
#7  0x000000000040857f in main (argc=19, argv=0x7fff83788c18) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) info thr
  7 Thread 29231  0x00007f74550b3cf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
  6 Thread 29235  0x00007f745570d4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  5 Thread 29262  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  4 Thread 29232  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 29263  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  2 Thread 29233  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1 Thread 29230  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
(gdb) f 0
#0  0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223	                                if (listener->trans == trans) {
(gdb) p listener->trans
Cannot access memory at address 0x676e696461657190
(gdb) p *listener
Cannot access memory at address 0x676e696461657190
(gdb) l
1218	
1219	        pthread_mutex_lock (&svc->rpclock);
1220	        {
1221	                list_for_each_entry (listener, &svc->listeners, list) {
1222	                        if (trans != NULL) {
1223	                                if (listener->trans == trans) {
1224	                                        found = 1;
1225	                                        break;
1226	                                }
1227	
(gdb) p svc->listeners
$1 = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = 825242203, __count = 875572530, __owner = 540488237, __nusers = 859452721, __kind = 808663607, 
      __spins = 875967022, __list = {__prev = 0x5b2057205d313037, __next = 0x632e74656b636f73}}, 
    __size = "[2012-04-27 15:37:30.264701] W [socket.c", __align = 3760555382451221083}, memfactor = 842346810, authschemes = {
    next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726, 
  register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
    next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365}, 
  notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb) 
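
The values above point at the nature of the corruption: svc->listeners.next/prev and the rpclock image are not pointers but ASCII text, and rpclock's __size field contains the beginning of the log line "[2012-04-27 15:37:30.264701] W [socket.c", which matches the "reading from socket failed" warning in the log excerpt below. That suggests the rpcsvc_t memory had already been freed (or overrun) and reused for a log-message string by the time rpcsvc_get_listener() walked svc->listeners. A minimal standalone sketch (not GlusterFS code) that decodes the two "pointer" values as little-endian ASCII:

/* Decode the corrupted list pointers seen in the core dump above.
 * Standalone, illustrative only. */
#include <stdio.h>
#include <stdint.h>

static void dump_as_ascii(const char *name, uint64_t val)
{
        printf("%s", name);
        /* x86_64 is little-endian: the low byte of the value is the
         * first character that was written into that memory. */
        for (int i = 0; i < 8; i++)
                putchar((char)((val >> (8 * i)) & 0xff));
        putchar('\n');
}

int main(void)
{
        dump_as_ascii("svc->listeners.next = ", 0x676e696461657220ULL);
        dump_as_ascii("svc->listeners.prev = ", 0x6f73206d6f726620ULL);
        return 0;
}

Run, this prints " reading" and " from so" -- fragments of the socket warning -- which is consistent with the structure's memory having been reused as a log buffer rather than with a stray single-word overwrite.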


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
+------------------------------------------------------------------------------+
[2012-04-27 14:40:33.792647] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2012-04-27 14:40:33.792815] W [socket.c:1885:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2012-04-27 14:40:33.884884] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0 (version: 3git)
[2012-04-27 14:47:01.563460] I [server.c:685:server_rpc_notify] 0-vol-server: disconnecting connectionfrom hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563526] I [server-helpers.c:741:server_connection_put] 0-vol-server: Shutting down connection hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563615] I [server-helpers.c:629:server_connection_destroy] 0-vol-server: destroyed connection of hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:07.584147] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-11685-2012/04/27-14:47:02:558264-vol-client-0-0 (version: 3git)
[2012-04-27 15:37:30.224134] W [glusterfsd.c:794:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f74550c104d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f7455704d8c] (-->/usr/local/sbin/glusterfsd(glusterfs_sigwaiter+0xea) [0x407c46]))) 0-: received signum (15), shutting down
[2012-04-27 15:37:30.224502] E [rpcsvc.c:1161:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2012-04-27 15:37:30.264661] W [socket.c:204:__socket_rwv] 0-glusterfs: readv failed (Connection reset by peer)
[2012-04-27 15:37:30.264701] W [socket.c:1521:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection reset by peer), peer (127.0.0.1:24007)
[2012-04-27 15:37:30.264810] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f7455b48e7c] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7455b483ee] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7455b47e73]))) 0-glusterfs: forced unwinding frame type(Gluster Portmap) op(SIGNOUT(5)) called at 2012-04-27 15:37:30.224215 (xid=0x3x)
[2012-04-27 15:37:30.275653] A [mem-pool.c:153:__gf_calloc] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x115) [0x7f7455b45025] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x131) [0x7f7455b3f4a5] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_disconnect+0x7b) [0x7f7455b3f25b]))) : no memory available for size (62489783496)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-04-27 15:37:30
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
:

Comment 1 Amar Tumballi 2012-05-02 19:07:15 UTC
Noticed now that this happened when the process got SIGTERM (cleanup_and_exit() with signal 15). At that stage, if there are pending frames, this crash can occur. We need to bring in a check that waits for all saved_frames_unwind() calls to finish before doing cleanup_and_exit().

Considering that the process was about to be stopped anyway, lowering the priority of the bug. Feel free to grab it from my name if someone wants to fix it :-)
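
A hypothetical sketch of the kind of check described above -- wait for outstanding saved_frames_unwind() work to drain before cleanup_and_exit() tears down shared RPC state. The names (frame_tracker, pending, etc.) are illustrative, not the actual GlusterFS fix:

#include <pthread.h>

struct frame_tracker {
        pthread_mutex_t lock;
        pthread_cond_t  drained;
        int             pending;        /* unwinds still in flight */
};

/* RPC threads bracket each saved_frames_unwind() with begin/end. */
static void frame_unwind_begin(struct frame_tracker *t)
{
        pthread_mutex_lock(&t->lock);
        t->pending++;
        pthread_mutex_unlock(&t->lock);
}

static void frame_unwind_end(struct frame_tracker *t)
{
        pthread_mutex_lock(&t->lock);
        if (--t->pending == 0)
                pthread_cond_broadcast(&t->drained);
        pthread_mutex_unlock(&t->lock);
}

/* cleanup_and_exit() would call this before freeing RPC state. */
static void wait_for_unwinds(struct frame_tracker *t)
{
        pthread_mutex_lock(&t->lock);
        while (t->pending > 0)
                pthread_cond_wait(&t->drained, &t->lock);
        pthread_mutex_unlock(&t->lock);
}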

Comment 2 Raghavendra Bhat 2012-05-03 09:36:00 UTC
I got the crash again with the below backtrace.


Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda10-e'.
Program terminated with signal 6, Aborted.
#0  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f3ff16bcab6 in abort () at abort.c:92
#2  0x00007f3ff16b17c5 in __assert_fail (assertion=0x7f3ff1db9f05 "(-(e)) != 3 || !robust", file=<value optimized out>, line=312, 
    function=<value optimized out>) at assert.c:81
#3  0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
#4  0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
#5  0x00007f3ff21e94ba in rpcsvc_notify (trans=0xcdf5c0, mydata=0xcca620, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:616
#6  0x00007f3ff21ef039 in rpc_transport_notify (this=0xcdf5c0, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#7  0x00007f3fee73327a in socket_event_poll_err (this=0xcdf5c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694
#8  0x00007f3fee7379a5 in socket_event_handler (fd=8, idx=3, data=0xcdf5c0, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1808
#9  0x00007f3ff2449ac1 in event_dispatch_epoll_handler (event_pool=0xcc53a0, events=0xcde840, i=0) at ../../../libglusterfs/src/event.c:794
#10 0x00007f3ff2449cdb in event_dispatch_epoll (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:856
#11 0x00007f3ff244a04d in event_dispatch (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:956
#12 0x000000000040857f in main (argc=19, argv=0x7fff36e21028) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) f 3
#3  0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
312	pthread_mutex_lock.c: No such file or directory.
	in pthread_mutex_lock.c
(gdb) f 4
#4  0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
554	        pthread_mutex_lock (&svc->rpclock);
(gdb) p svc
$1 = (rpcsvc_t *) 0xcca620
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = -1322241445, __count = 892349746, __owner = 540225581, __nusers = 825898033, __kind = 859060786, 
      __spins = 875902254, __list = {__prev = 0x5b2057205d383036, __next = 0x632e74656b636f73}}, 
    __size = "[20\261\062-05-03 14:12:43.554608] W [socket.c", __align = 3832612978636632667}, memfactor = 842346810, authschemes = {
    next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726, 
  register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
    next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365}, 
  notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb) info thr
  9 Thread 15323  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
  8 Thread 15406  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  7 Thread 15405  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6 Thread 15408  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  5 Thread 15325  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 15407  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  3 Thread 15324  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 15327  0x00007f3ff1db74bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 15322  0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) t 9
[Switching to thread 9 (Thread 15323)]#0  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
82	../sysdeps/unix/syscall-template.S: No such file or directory.
	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000407847 in glusterfs_pidfile_cleanup (ctx=0xcad010) at ../../../glusterfsd/src/glusterfsd.c:1296
#2  0x0000000000406225 in cleanup_and_exit (signum=15) at ../../../glusterfsd/src/glusterfsd.c:814
#3  0x0000000000407c46 in glusterfs_sigwaiter (arg=0x7fff36e20e30) at ../../../glusterfsd/src/glusterfsd.c:1384
#4  0x00007f3ff1daed8c in start_thread (arg=0x7f3ff0145700) at pthread_create.c:304
#5  0x00007f3ff176b04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()
(gdb)
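
The picture is the same as in the first core: the rpclock image again contains a log string ("[20..12-05-03 14:12:43.554608] W [socket.c"), and thread 9 is already inside cleanup_and_exit() (via glusterfs_sigwaiter()) while thread 1 is handling RPC_TRANSPORT_DISCONNECT, so the pthread_mutex_lock() in rpcsvc_handle_disconnect() runs on memory that has been torn down behind it and glibc's consistency check aborts. An illustrative-only sketch of that two-thread pattern (it deliberately reproduces the unsafe ordering; the names are assumptions, not GlusterFS code):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct svc {
        pthread_mutex_t rpclock;
};

static struct svc *svc;

/* Stands in for the epoll thread handling a transport disconnect. */
static void *disconnect_handler(void *arg)
{
        (void)arg;
        usleep(1000);                       /* lose the race on purpose */
        pthread_mutex_lock(&svc->rpclock);  /* locks freed/overwritten memory */
        pthread_mutex_unlock(&svc->rpclock);
        return NULL;
}

int main(void)
{
        svc = calloc(1, sizeof(*svc));
        pthread_mutex_init(&svc->rpclock, NULL);

        pthread_t t;
        pthread_create(&t, NULL, disconnect_handler, NULL);

        /* Stands in for the sigwaiter thread's cleanup path: destroys
         * the state and reuses the memory (here: scribbles over it,
         * like the log buffer in the core dump). */
        pthread_mutex_destroy(&svc->rpclock);
        memset(svc, 'A', sizeof(*svc));
        free(svc);

        pthread_join(t, NULL);
        return 0;
}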

Comment 3 Raghavendra Bhat 2012-12-04 09:41:59 UTC
http://review.gluster.org/4230 root-caused the issue as a race between freeing the RPC object and handling the socket's POLL_ERR event.

It may take a couple of review iterations before the patch is accepted, but marking this as POST since it is in the review queue.
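
One common way to close this kind of race -- shown here only as an illustrative sketch, not as the code in the review above -- is to pin the transport with a reference for the duration of the event handler, so a concurrent teardown path cannot free the object while POLL_ERR is still being processed:

#include <pthread.h>

struct transport {
        pthread_mutex_t lock;
        int             refcount;
        void          (*destroy)(struct transport *t);
};

static struct transport *transport_ref(struct transport *t)
{
        pthread_mutex_lock(&t->lock);
        t->refcount++;
        pthread_mutex_unlock(&t->lock);
        return t;
}

static void transport_unref(struct transport *t)
{
        int last;

        pthread_mutex_lock(&t->lock);
        last = (--t->refcount == 0);
        pthread_mutex_unlock(&t->lock);

        if (last)
                t->destroy(t);  /* freed only after the last user lets go */
}

/* Event loop: hold a reference across POLL_IN/POLL_ERR handling. */
static void handle_socket_event(struct transport *t, int poll_err)
{
        transport_ref(t);
        if (poll_err) {
                /* notify upper layers of the disconnect here */
        }
        transport_unref(t);
}

With this pattern the disconnect/cleanup path drops its own reference instead of freeing directly, and the last unref -- whichever thread performs it -- does the actual destroy.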

Comment 4 Sachidananda Urs 2012-12-18 08:52:02 UTC
The typical way to reproduce this is:

Kill all glusterfs processes and restart glusterd a couple of times. This would generate the crash.

With the new release (qa5), this behavior is no longer reproducible.