Description of problem:
=======================
glfsheal crashed and heal is pending on some of the files.

Here is the bt:
===============
(gdb) t a a bt

Thread 7 (Thread 0x7ff98bd82700 (LWP 11331)):
#0  0x00007ff9960c234a in mmap64 () from /lib64/libc.so.6
#1  0x00007ff9960432ec in _IO_file_doallocate_internal () from /lib64/libc.so.6
#2  0x00007ff9960507ac in _IO_doallocbuf_internal () from /lib64/libc.so.6
#3  0x00007ff99604f0dc in _IO_new_file_seekoff () from /lib64/libc.so.6
#4  0x00007ff99604de62 in _IO_new_file_attach () from /lib64/libc.so.6
#5  0x00007ff9960437a5 in fdopen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x00007ff99826a10c in gf_backtrace_fillframes (buf=0x7ff9998cf4f8 "(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7ff998253520] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7ff9979a1167] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0"...) at common-utils.c:3611
#7  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>) at common-utils.c:3665
#8  0x00007ff998253520 in _gf_log_callingfn (domain=0x7ff97c0120f0 "vol2-client-7", file=<value optimized out>, function=0x7ff9979a7570 "saved_frames_unwind", line=362, level=GF_LOG_ERROR, fmt=0x7ff9979a7110 "forced unwinding frame type(%s) op(%s(%d)) called at %s (xid=0x%x)") at logging.c:837
#9  0x00007ff9979a1167 in saved_frames_unwind (saved_frames=0x7ff97c1f54c0) at rpc-clnt.c:353
#10 0x00007ff9979a127e in saved_frames_destroy (frames=0x7ff97c1f54c0) at rpc-clnt.c:383
#11 0x00007ff9979a134b in rpc_clnt_connection_cleanup (conn=0x7ff97c0fc090) at rpc-clnt.c:536
#12 0x00007ff9979a190f in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7ff97c0fc090, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:843
#13 0x00007ff99799cad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#14 0x00007ff98b177df1 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x7ff97c10bcd0, poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:1205
#15 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7ff97c10bcd0, poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:2410
#16 0x00007ff9982b3970 in event_dispatch_epoll_handler (data=0x7ff984000920) at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7ff984000920) at event-epoll.c:678
#18 0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#19 0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ff98c783700 (LWP 11330)):
#0  0x00007ff99675c2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007ff9982b341d in event_dispatch_epoll (event_pool=0x7ff9998edee0) at event-epoll.c:762
#2  0x00007ff997778ab4 in glfs_poller (data=<value optimized out>) at glfs.c:579
#3  0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ff98d388700 (LWP 11329)):
#0  0x00007ff996762fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007ff9982715ca in gf_timer_proc (ctx=0x7ff9998cf170) at timer.c:205
#2  0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ff9944b1700 (LWP 11328)):
#0  0x00007ff9960c234a in mmap64 () from /lib64/libc.so.6
#1  0x00007ff9960432ec in _IO_file_doallocate_internal () from /lib64/libc.so.6
#2  0x00007ff9960507ac in _IO_doallocbuf_internal () from /lib64/libc.so.6
#3  0x00007ff99604f0dc in _IO_new_file_seekoff () from /lib64/libc.so.6
#4  0x00007ff99604de62 in _IO_new_file_attach () from /lib64/libc.so.6
#5  0x00007ff9960437a5 in fdopen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x00007ff99826a10c in gf_backtrace_fillframes (buf=0x7ff978008610 "(--> /usr/lib64/libglusterfs.so.0(synctask_yield+0x2c)[0x7ff998296d9c] (--> /usr/lib64/libglusterfs.so.0(syncbarrier_wait+0x76)[0x7ff998296ea6] (--> /usr/lib64/libglusterfs.so.0(cluster_inodelk+0x384)"...) at common-utils.c:3611
#7  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>) at common-utils.c:3665
#8  0x00007ff998296d9c in synctask_yield (task=0x7ff978008180) at syncop.c:341
#9  0x00007ff998296ea6 in __syncbarrier_wait (barrier=0x7ff980225d68, waitfor=11) at syncop.c:1114
#10 syncbarrier_wait (barrier=0x7ff980225d68, waitfor=11) at syncop.c:1135
#11 0x00007ff9982c0fb4 in cluster_inodelk (subvols=0x7ff97c0714d0, on=0x7ff980225ea0 '\001' <repeats 11 times>"\211, \371\177", numsubvols=11, replies=0x7ff980225f00, locked_on=0x7ff980225ee0 "", frame=0x7ff99383eb3c, this=0x7ff97c0149d0, dom=0x7ff97c013300 "vol2-disperse-0", inode=0x7ff980fe706c, off=0, size=0) at cluster-syncop.c:1092
#12 0x00007ff9896f168c in ec_heal_metadata (frame=0x7ff99383eb3c, ec=0x7ff97c070fd0, inode=0x7ff980fe706c, sources=0x7ff98022bf10 "", healed_sinks=0x7ff98022bef0 "") at ec-heal.c:2110
#13 0x00007ff9896f18c0 in ec_heal_do (this=<value optimized out>, data=0x7ff98322506c, loc=0x7ff9832252b4, partial=1) at ec-heal.c:3638
#14 0x00007ff9896f1c9d in ec_synctask_heal_wrap (opaque=<value optimized out>) at ec-heal.c:3683
#15 0x00007ff9982971f2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:381
#16 0x00007ff9960208f0 in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7ff983deb700 (LWP 11332)):
#0  0x00007ff9960b96c7 in unlink () from /lib64/libc.so.6
#1  0x00007ff99826a1a4 in gf_backtrace_fillframes (buf=<value optimized out>) at common-utils.c:3638
#2  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>) at common-utils.c:3665
#3  0x00007ff998253520 in _gf_log_callingfn (domain=0x7ff97c010e80 "vol2-client-6", file=<value optimized out>, function=0x7ff9979a7570 "saved_frames_unwind", line=362, level=GF_LOG_ERROR, fmt=0x7ff9979a7110 "forced unwinding frame type(%s) op(%s(%d)) called at %s (xid=0x%x)") at logging.c:837
#4  0x00007ff9979a1167 in saved_frames_unwind (saved_frames=0x7ff978001940) at rpc-clnt.c:353
#5  0x00007ff9979a127e in saved_frames_destroy (frames=0x7ff978001940) at rpc-clnt.c:383
#6  0x00007ff9979a134b in rpc_clnt_connection_cleanup (conn=0x7ff97c122c90) at rpc-clnt.c:536
#7  0x00007ff9979a190f in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7ff97c122c90, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:843
#8  0x00007ff99799cad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#9  0x00007ff98b177df1 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x7ff97c1328d0, poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:1205
#10 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7ff97c1328d0, poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:2410
#11 0x00007ff9982b3970 in event_dispatch_epoll_handler (data=0x7ff97c0306d0) at event-epoll.c:575
#12 event_dispatch_epoll_worker (data=0x7ff97c0306d0) at event-epoll.c:678
#13 0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ff9986fa740 (LWP 11326)):
#0  0x00007ff9960dcddc in __fprintf_chk () from /lib64/libc.so.6
#1  0x00007ff9982523c4 in fprintf (ctx=0x7ff9998cf170, domain=<value optimized out>, file=0x7ff9982cda8f "mem-pool.c", function=0x7ff9982cdcd0 "mem_pool_destroy", line=616, level=GF_LOG_INFO, errnum=0, msgid=101053, appmsgstr=0x7ffc012652e8, callstr=0x0, tv=..., graph_id=0, fmt=gf_logformat_withmsgid) at /usr/include/bits/stdio2.h:98
#2  gf_log_glusterlog (ctx=0x7ff9998cf170, domain=<value optimized out>, file=0x7ff9982cda8f "mem-pool.c", function=0x7ff9982cdcd0 "mem_pool_destroy", line=616, level=GF_LOG_INFO, errnum=0, msgid=101053, appmsgstr=0x7ffc012652e8, callstr=0x0, tv=..., graph_id=0, fmt=gf_logformat_withmsgid) at logging.c:1432
#3  0x00007ff998251227 in _gf_msg_internal (domain=<value optimized out>, file=<value optimized out>, function=<value optimized out>, line=<value optimized out>, level=<value optimized out>, errnum=<value optimized out>, trace=0, msgid=101053, fmt=0x7ff9982cdab8 "size=%lu max=%d total=%lu") at logging.c:1994
#4  _gf_msg (domain=<value optimized out>, file=<value optimized out>, function=<value optimized out>, line=<value optimized out>, level=<value optimized out>, errnum=<value optimized out>, trace=0, msgid=101053, fmt=0x7ff9982cdab8 "size=%lu max=%d total=%lu") at logging.c:2077
#5  0x00007ff998285ea0 in mem_pool_destroy (pool=0x7ff970000f50) at mem-pool.c:614
#6  0x00007ff9982742b2 in inode_table_destroy (inode_table=0x7ff970000e30) at inode.c:1754
#7  0x00007ff998274341 in inode_table_destroy_all (ctx=0x7ff9998cf170) at inode.c:1699
#8  0x00007ff9977787e9 in pub_glfs_fini (fs=0x7ff9998cf010) at glfs.c:1148
#9  0x00007ff9987151e8 in main (argc=<value optimized out>, argv=<value optimized out>) at glfs-heal.c:829

Thread 1 (Thread 0x7ff994eb2700 (LWP 11327)):
#0  __inode_retire (inode=0x7ff970000e40) at inode.c:445
#1  0x00007ff9982740f4 in inode_table_prune (table=0x7ff970000e30) at inode.c:1487
#2  0x00007ff99827475c in inode_unref (inode=0x7ff980fe706c) at inode.c:529
#3  0x00007ff99824cf72 in loc_wipe (loc=0x7ff97c182b2c) at xlator.c:690
#4  0x00007ff98991c59e in client_local_wipe (local=0x7ff97c182b2c) at client-helpers.c:129
#5  0x00007ff9899300db in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7ff993a04724) at client-rpc-fops.c:2978
#6  0x00007ff9979a05e6 in rpc_clnt_submit (rpc=0x7ff97c197060, prog=<value optimized out>, procnum=<value optimized out>, cbkfn=0x7ff98992fa70 <client3_3_lookup_cbk>, proghdr=<value optimized out>, proghdrcount=<value optimized out>, progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, frame=0x7ff993a04724, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1607
#7  0x00007ff989919d66 in client_submit_request (this=0x7ff97c00b650, req=0x7ff980414a30, frame=0x7ff993a04724, prog=0x7ff989b4ebc0, procnum=<value optimized out>, cbkfn=0x7ff98992fa70 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0, xdrproc=0x7ff997bb5120 <xdr_gfs3_lookup_req>) at client.c:315
#8  0x00007ff98992ebb0 in client3_3_lookup (frame=0x7ff993a04724, this=0x7ff97c00b650, data=0x7ff980414ae0) at client-rpc-fops.c:3329
#9  0x00007ff98991965d in client_lookup (frame=0x7ff993a04724, this=<value optimized out>, loc=<value optimized out>, xdata=0x0) at client.c:430
#10 0x00007ff9982bbdc0 in cluster_lookup (subvols=<value optimized out>, on=0x7ff980426ee0 '\001' <repeats 11 times>, numsubvols=11, replies=0x7ff980420dd0, output=0x7ff980414d10 "", frame=0x7ff99383c034, this=0x7ff97c0149d0, loc=0x7ff98041acc0, xdata=0x0) at cluster-syncop.c:968
#11 0x00007ff9896f0e09 in __ec_heal_metadata_prepare (frame=0x7ff99383c034, ec=0x7ff97c070fd0, inode=<value optimized out>, locked_on=0x7ff980426ee0 '\001' <repeats 11 times>, replies=0x7ff980420dd0, versions=0x7ff98041adf0, dirty=0x7ff98041ad80, sources=0x7ff98042cf10 "", healed_sinks=0x7ff98042cef0 "") at ec-heal.c:1922
#12 0x00007ff9896f11ba in __ec_heal_metadata (frame=0x7ff99383c034, ec=0x7ff97c070fd0, inode=0x7ff980fe706c, locked_on=0x7ff980426ee0 '\001' <repeats 11 times>, sources=0x7ff98042cf10 "", healed_sinks=0x7ff98042cef0 "") at ec-heal.c:2032
#13 0x00007ff9896f16b3 in ec_heal_metadata (frame=0x7ff99383c034, ec=0x7ff97c070fd0, inode=0x7ff980fe706c, sources=0x7ff98042cf10 "", healed_sinks=0x7ff98042cef0 "") at ec-heal.c:2121
#14 0x00007ff9896f18c0 in ec_heal_do (this=<value optimized out>, data=0x7ff983225780, loc=0x7ff9832259c8, partial=1) at ec-heal.c:3638
#15 0x00007ff9896f1c9d in ec_synctask_heal_wrap (opaque=<value optimized out>) at ec-heal.c:3683
#16 0x00007ff9982971f2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:381
#17 0x00007ff9960208f0 in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()
A plain "bt" on the crashing thread repeats the Thread 1 trace above (crash in __inode_retire (inode=0x7ff970000e40) at inode.c:445).

Version-Release number of selected component (if applicable):
=============================================================
[root@transformers core]# gluster --version
glusterfs 3.7.1 built on Jun 28 2015 11:01:17
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers core]#

How reproducible:
=================
Seen once.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
================
Attaching the core file.
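From the backtrace this looks like a teardown race rather than a bug in the heal path itself: Thread 2 is inside pub_glfs_fini -> inode_table_destroy_all -> inode_table_destroy on table 0x7ff970000e30, while Thread 1 is still pruning that same table (inode_table_prune (table=0x7ff970000e30)) via an inode_unref from a lookup callback of an in-flight EC heal, and Thread 4 is parked in another heal synctask. In other words, glfsheal's main() appears to call glfs_fini() while heal synctasks are still running. A minimal standalone sketch of that pattern, using hypothetical names and plain pthreads (this is not GlusterFS code; the worker stands in for the heal synctask and main() for the glfs_fini path); build with "gcc -pthread race.c" and run under AddressSanitizer or valgrind to see the heap-use-after-free:

/* race.c - intentionally racy sketch of the suspected bug pattern. */
#include <pthread.h>
#include <stdlib.h>

struct table {
    pthread_mutex_t lock;
    int *entries;              /* stands in for the inode table's lists */
};

static struct table *shared_table;

/* Stands in for the heal synctask: still dropping references into the
 * shared table while the main thread is tearing it down. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&shared_table->lock);   /* may run after free */
        shared_table->entries[0]--;                /* use-after-free here */
        pthread_mutex_unlock(&shared_table->lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    shared_table = calloc(1, sizeof(*shared_table));
    shared_table->entries = calloc(16, sizeof(int));
    pthread_mutex_init(&shared_table->lock, NULL);

    pthread_create(&tid, NULL, worker, NULL);

    /* "fini" path: destroys the table without waiting for the worker,
     * like glfs_fini destroying inode tables with heals in flight. */
    free(shared_table->entries);
    free(shared_table);

    pthread_join(tid, NULL);   /* the join happens too late */
    return 0;
}

The resulting crash lands wherever the worker happens to touch the freed structure, which is analogous to Thread 1 faulting inside __inode_retire while Thread 2 sits in inode_table_destroy.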
Created attachment 1044580 [details]
core file
Patch sent for review: https://code.engineering.redhat.com/gerrit/#/c/51934/1
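The gerrit link above is internal, so the actual change isn't visible from here. The usual fix direction for this class of crash is to make the teardown path refuse new heals and wait for in-flight ones to drain before the inode tables are destroyed. A rough sketch of such a gate, with hypothetical names (not the actual patch):

#include <pthread.h>
#include <stdbool.h>

struct teardown_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             inflight;   /* heals still running */
    bool            draining;   /* fini has started, refuse new work */
};

#define TEARDOWN_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Called before starting a heal; returns false once fini has begun. */
static bool gate_enter(struct teardown_gate *g)
{
    bool ok;
    pthread_mutex_lock(&g->lock);
    ok = !g->draining;
    if (ok)
        g->inflight++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Called when a heal finishes; wakes the draining fini thread. */
static void gate_exit(struct teardown_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->inflight == 0)
        pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Fini path: block new work, then wait until in-flight work drains;
 * only after this is it safe to destroy shared state. */
static void gate_drain(struct teardown_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->draining = true;
    while (g->inflight > 0)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

Each heal would wrap its work in gate_enter()/gate_exit(), and the fini caller would run gate_drain() before tearing down the inode tables.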
Verified this on the 3.7.1-12 build and didn't see the crash. Marking this as fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html