Bug 817468
| Summary: | [8b6534031ab9b60da293e9c2ffb95141d714f973]: glusterfs server crashed due to memory corruption | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
| Component: | protocol | Assignee: | Raghavendra Bhat <rabhat> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | amarts, gluster-bugs, sac |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 852820 (view as bug list) | Environment: | |
| Last Closed: | 2013-07-24 17:54:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 852820, 858487 | | |
Noticed now that this happened when the process got the SIGTERM (cleanup_and_exit() with signal 15). At this stage, if there are pending frames, this could happen. We need to add a check that waits for all the saved_frames_unwind() calls to finish before doing cleanup_and_exit(). Considering the process was about to be stopped anyway, lowering the priority of the bug. Feel free to grab it from my name if someone wants to fix it :-)

I got the crash again with the below backtrace.
Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda10-e'.
Program terminated with signal 6, Aborted.
#0 0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f3ff16bcab6 in abort () at abort.c:92
#2 0x00007f3ff16b17c5 in __assert_fail (assertion=0x7f3ff1db9f05 "(-(e)) != 3 || !robust", file=<value optimized out>, line=312,
function=<value optimized out>) at assert.c:81
#3 0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
#4 0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
#5 0x00007f3ff21e94ba in rpcsvc_notify (trans=0xcdf5c0, mydata=0xcca620, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
at ../../../../rpc/rpc-lib/src/rpcsvc.c:616
#6 0x00007f3ff21ef039 in rpc_transport_notify (this=0xcdf5c0, event=RPC_TRANSPORT_DISCONNECT, data=0xcdf5c0)
at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#7 0x00007f3fee73327a in socket_event_poll_err (this=0xcdf5c0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694
#8 0x00007f3fee7379a5 in socket_event_handler (fd=8, idx=3, data=0xcdf5c0, poll_in=1, poll_out=0, poll_err=16)
at ../../../../../rpc/rpc-transport/socket/src/socket.c:1808
#9 0x00007f3ff2449ac1 in event_dispatch_epoll_handler (event_pool=0xcc53a0, events=0xcde840, i=0) at ../../../libglusterfs/src/event.c:794
#10 0x00007f3ff2449cdb in event_dispatch_epoll (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:856
#11 0x00007f3ff244a04d in event_dispatch (event_pool=0xcc53a0) at ../../../libglusterfs/src/event.c:956
#12 0x000000000040857f in main (argc=19, argv=0x7fff36e21028) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) f 3
#3 0x00007f3ff1db12aa in __pthread_mutex_lock_full (mutex=0xcca620) at pthread_mutex_lock.c:312
312 pthread_mutex_lock.c: No such file or directory.
in pthread_mutex_lock.c
(gdb) f 4
#4 0x00007f3ff21e923d in rpcsvc_handle_disconnect (svc=0xcca620, trans=0xcdf5c0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:554
554 pthread_mutex_lock (&svc->rpclock);
(gdb) p svc
$1 = (rpcsvc_t *) 0xcca620
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = -1322241445, __count = 892349746, __owner = 540225581, __nusers = 825898033, __kind = 859060786,
__spins = 875902254, __list = {__prev = 0x5b2057205d383036, __next = 0x632e74656b636f73}},
__size = "[20\261\062-05-03 14:12:43.554608] W [socket.c", __align = 3832612978636632667}, memfactor = 842346810, authschemes = {
next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726,
register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365},
notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb) info thr
9 Thread 15323 0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
8 Thread 15406 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
7 Thread 15405 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
6 Thread 15408 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
5 Thread 15325 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
4 Thread 15407 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
3 Thread 15324 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
2 Thread 15327 0x00007f3ff1db74bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 15322 0x00007f3ff16b8d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) t 9
[Switching to thread 9 (Thread 15323)]#0 0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0 0x00007f3ff175dcf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000407847 in glusterfs_pidfile_cleanup (ctx=0xcad010) at ../../../glusterfsd/src/glusterfsd.c:1296
#2 0x0000000000406225 in cleanup_and_exit (signum=15) at ../../../glusterfsd/src/glusterfsd.c:814
#3 0x0000000000407c46 in glusterfs_sigwaiter (arg=0x7fff36e20e30) at ../../../glusterfsd/src/glusterfsd.c:1384
#4 0x00007f3ff1daed8c in start_thread (arg=0x7f3ff0145700) at pthread_create.c:304
#5 0x00007f3ff176b04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
(gdb)
http://review.gluster.org/4230 root-caused the issue as a race between freeing the RPC object and handling the socket's POLL_ERR. It may take a couple of iterations of review before the patch gets accepted, but marking the bug as POST since it is in the review queue.

The typical use case to reproduce this is to kill all glusterfs processes and restart glusterd a couple of times; this generates the crash.

With the new release (qa5), this behavior is no longer reproducible.
Description of problem:
glusterfs server crashed due to memory corruption with the below backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.hyperspace.mnt-sda7-ex'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223 if (listener->trans == trans) {
(gdb) bt
#0 0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
#1 0x00007f7455b3f573 in rpcsvc_notify (trans=0x1a986f0, mydata=0x1a83610, event=RPC_TRANSPORT_CLEANUP, data=0x0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:636
#2 0x00007f7455b44ef8 in rpc_transport_unref (this=0x1a986f0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:479
#3 0x00007f745208d9b1 in socket_event_handler (fd=8, idx=3, data=0x1a986f0, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1809
#4 0x00007f7455d9fac1 in event_dispatch_epoll_handler (event_pool=0x1a7e3a0, events=0x1a97830, i=0) at ../../../libglusterfs/src/event.c:794
#5 0x00007f7455d9fcdb in event_dispatch_epoll (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:856
#6 0x00007f7455da004d in event_dispatch (event_pool=0x1a7e3a0) at ../../../libglusterfs/src/event.c:956
#7 0x000000000040857f in main (argc=19, argv=0x7fff83788c18) at ../../../glusterfsd/src/glusterfsd.c:1652
(gdb) info thr
  7 Thread 29231 0x00007f74550b3cf7 in unlink () at ../sysdeps/unix/syscall-template.S:82
  6 Thread 29235 0x00007f745570d4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  5 Thread 29262 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  4 Thread 29232 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 29263 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  2 Thread 29233 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1 Thread 29230 0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
(gdb) f 0
#0 0x00007f7455b40a56 in rpcsvc_get_listener (svc=0x1a83610, port=65535, trans=0x1a90440) at ../../../../rpc/rpc-lib/src/rpcsvc.c:1223
1223 if (listener->trans == trans) {
(gdb) p listener->trans
Cannot access memory at address 0x676e696461657190
(gdb) p *listener
Cannot access memory at address 0x676e696461657190
(gdb) l
1218
1219         pthread_mutex_lock (&svc->rpclock);
1220         {
1221                 list_for_each_entry (listener, &svc->listeners, list) {
1222                         if (trans != NULL) {
1223                                 if (listener->trans == trans) {
1224                                         found = 1;
1225                                         break;
1226                                 }
1227
(gdb) p svc->listeners
$1 = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}
(gdb) p *svc
$2 = {rpclock = {__data = {__lock = 825242203, __count = 875572530, __owner = 540488237, __nusers = 859452721, __kind = 808663607,
      __spins = 875967022, __list = {__prev = 0x5b2057205d313037, __next = 0x632e74656b636f73}},
    __size = "[2012-04-27 15:37:30.264701] W [socket.c", __align = 3760555382451221083}, memfactor = 842346810, authschemes = {
    next = 0x705f74656b636f73, prev = 0x6174735f6f746f72}, options = 0x696863616d5f6574, allow_insecure = 542991726,
  register_portmap = 1818701104, ctx = 0x3a73667265747375, listeners = {next = 0x676e696461657220, prev = 0x6f73206d6f726620}, programs = {
    next = 0x6961662074656b63, prev = 0x727245202e64656c}, notify = {next = 0x6e6e6f432820726f, prev = 0x72206e6f69746365},
  notify_count = 1952805733, mydata = 0x70202c2972656570, notifyfn = 0x3732312820726565, rxpool = 0x323a312e302e302e}
(gdb)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:

+------------------------------------------------------------------------------+
[2012-04-27 14:40:33.792647] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2012-04-27 14:40:33.792815] W [socket.c:1885:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2012-04-27 14:40:33.884884] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0 (version: 3git)
[2012-04-27 14:47:01.563460] I [server.c:685:server_rpc_notify] 0-vol-server: disconnecting connectionfrom hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563526] I [server-helpers.c:741:server_connection_put] 0-vol-server: Shutting down connection hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:01.563615] I [server-helpers.c:629:server_connection_destroy] 0-vol-server: destroyed connection of hyperspace-29240-2012/04/27-14:40:29:862111-vol-client-0-0
[2012-04-27 14:47:07.584147] I [server-handshake.c:571:server_setvolume] 0-vol-server: accepted client from hyperspace-11685-2012/04/27-14:47:02:558264-vol-client-0-0 (version: 3git)
[2012-04-27 15:37:30.224134] W [glusterfsd.c:794:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f74550c104d] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f7455704d8c] (-->/usr/local/sbin/glusterfsd(glusterfs_sigwaiter+0xea) [0x407c46]))) 0-: received signum (15), shutting down
[2012-04-27 15:37:30.224502] E [rpcsvc.c:1161:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
[2012-04-27 15:37:30.264661] W [socket.c:204:__socket_rwv] 0-glusterfs: readv failed (Connection reset by peer)
[2012-04-27 15:37:30.264701] W [socket.c:1521:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection reset by peer), peer (127.0.0.1:24007)
[2012-04-27 15:37:30.264810] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x124) [0x7f7455b48e7c] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7455b483ee] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7455b47e73]))) 0-glusterfs: forced unwinding frame type(Gluster Portmap) op(SIGNOUT(5)) called at 2012-04-27 15:37:30.224215 (xid=0x3x)
[2012-04-27 15:37:30.275653] A [mem-pool.c:153:__gf_calloc] (-->/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x115) [0x7f7455b45025] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_notify+0x131) [0x7f7455b3f4a5] (-->/usr/local/lib/libgfrpc.so.0(rpcsvc_handle_disconnect+0x7b) [0x7f7455b3f25b]))) : no memory available for size (62489783496)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-04-27 15:37:30
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
: