Bug 1236990 - glfsheal crashed
Summary: glfsheal crashed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Anuradha
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Depends On:
Blocks: 1223636 1251815
 
Reported: 2015-06-30 06:39 UTC by Bhaskarakiran
Modified: 2016-11-23 23:12 UTC (History)
12 users

Fixed In Version: glusterfs-3.7.1-12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-05 07:15:57 UTC
Embargoed:


Attachments
core file (1.57 MB, application/zip)
2015-06-30 06:40 UTC, Bhaskarakiran


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1845 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.1 update 2015-10-05 11:06:22 UTC

Description Bhaskarakiran 2015-06-30 06:39:34 UTC
Description of problem:
=======================

glfsheal crashed and heal is pending on some of the files.
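
For context, glfsheal is the helper binary that the gluster CLI runs for heal-info. On this setup the pending entries would normally be listed with something like the command below (volume name taken from the backtrace; the actual heal-info output was not captured here):

[root@transformers ~]# gluster volume heal vol2 info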

Here is the bt:
===============
(gdb)  t a a bt

Thread 7 (Thread 0x7ff98bd82700 (LWP 11331)):
#0  0x00007ff9960c234a in mmap64 () from /lib64/libc.so.6
#1  0x00007ff9960432ec in _IO_file_doallocate_internal ()
   from /lib64/libc.so.6
#2  0x00007ff9960507ac in _IO_doallocbuf_internal () from /lib64/libc.so.6
#3  0x00007ff99604f0dc in _IO_new_file_seekoff () from /lib64/libc.so.6
#4  0x00007ff99604de62 in _IO_new_file_attach () from /lib64/libc.so.6
#5  0x00007ff9960437a5 in fdopen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x00007ff99826a10c in gf_backtrace_fillframes (
    buf=0x7ff9998cf4f8 "(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7ff998253520] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7ff9979a1167] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0"...)
    at common-utils.c:3611
#7  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>)
    at common-utils.c:3665
#8  0x00007ff998253520 in _gf_log_callingfn (
    domain=0x7ff97c0120f0 "vol2-client-7", file=<value optimized out>, 
    function=0x7ff9979a7570 "saved_frames_unwind", line=362, 
    level=GF_LOG_ERROR, 
    fmt=0x7ff9979a7110 "forced unwinding frame type(%s) op(%s(%d)) called at %s (xid=0x%x)") at logging.c:837
#9  0x00007ff9979a1167 in saved_frames_unwind (saved_frames=0x7ff97c1f54c0)
    at rpc-clnt.c:353
#10 0x00007ff9979a127e in saved_frames_destroy (frames=0x7ff97c1f54c0)
    at rpc-clnt.c:383
#11 0x00007ff9979a134b in rpc_clnt_connection_cleanup (conn=0x7ff97c0fc090)
    at rpc-clnt.c:536
#12 0x00007ff9979a190f in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7ff97c0fc090, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:843
#13 0x00007ff99799cad8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#14 0x00007ff98b177df1 in socket_event_poll_err (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7ff97c10bcd0, 
    poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:1205
#15 socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7ff97c10bcd0, 
    poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:2410
#16 0x00007ff9982b3970 in event_dispatch_epoll_handler (data=0x7ff984000920)
    at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7ff984000920) at event-epoll.c:678
#18 0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#19 0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ff98c783700 (LWP 11330)):
#0  0x00007ff99675c2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007ff9982b341d in event_dispatch_epoll (event_pool=0x7ff9998edee0)
    at event-epoll.c:762
#2  0x00007ff997778ab4 in glfs_poller (data=<value optimized out>)
    at glfs.c:579
#3  0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ff98d388700 (LWP 11329)):
#0  0x00007ff996762fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007ff9982715ca in gf_timer_proc (ctx=0x7ff9998cf170) at timer.c:205
#2  0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ff9944b1700 (LWP 11328)):
#0  0x00007ff9960c234a in mmap64 () from /lib64/libc.so.6
#1  0x00007ff9960432ec in _IO_file_doallocate_internal ()
   from /lib64/libc.so.6
#2  0x00007ff9960507ac in _IO_doallocbuf_internal () from /lib64/libc.so.6
#3  0x00007ff99604f0dc in _IO_new_file_seekoff () from /lib64/libc.so.6
#4  0x00007ff99604de62 in _IO_new_file_attach () from /lib64/libc.so.6
#5  0x00007ff9960437a5 in fdopen@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x00007ff99826a10c in gf_backtrace_fillframes (
    buf=0x7ff978008610 "(--> /usr/lib64/libglusterfs.so.0(synctask_yield+0x2c)[0x7ff998296d9c] (--> /usr/lib64/libglusterfs.so.0(syncbarrier_wait+0x76)[0x7ff998296ea6] (--> /usr/lib64/libglusterfs.so.0(cluster_inodelk+0x384)"...)
    at common-utils.c:3611
#7  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>)
    at common-utils.c:3665
#8  0x00007ff998296d9c in synctask_yield (task=0x7ff978008180)
    at syncop.c:341
#9  0x00007ff998296ea6 in __syncbarrier_wait (barrier=0x7ff980225d68, 
    waitfor=11) at syncop.c:1114
#10 syncbarrier_wait (barrier=0x7ff980225d68, waitfor=11) at syncop.c:1135
#11 0x00007ff9982c0fb4 in cluster_inodelk (subvols=0x7ff97c0714d0, 
    on=0x7ff980225ea0 '\001' <repeats 11 times>"\211, \371\177", 
    numsubvols=11, replies=0x7ff980225f00, locked_on=0x7ff980225ee0 "", 
    frame=0x7ff99383eb3c, this=0x7ff97c0149d0, 
    dom=0x7ff97c013300 "vol2-disperse-0", inode=0x7ff980fe706c, off=0, 
    size=0) at cluster-syncop.c:1092
#12 0x00007ff9896f168c in ec_heal_metadata (frame=0x7ff99383eb3c, 
    ec=0x7ff97c070fd0, inode=0x7ff980fe706c, sources=0x7ff98022bf10 "", 
    healed_sinks=0x7ff98022bef0 "") at ec-heal.c:2110
#13 0x00007ff9896f18c0 in ec_heal_do (this=<value optimized out>, 
    data=0x7ff98322506c, loc=0x7ff9832252b4, partial=1) at ec-heal.c:3638
#14 0x00007ff9896f1c9d in ec_synctask_heal_wrap (
    opaque=<value optimized out>) at ec-heal.c:3683
#15 0x00007ff9982971f2 in synctask_wrap (old_task=<value optimized out>)
    at syncop.c:381
#16 0x00007ff9960208f0 in ?? () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ff983deb700 (LWP 11332)):
#0  0x00007ff9960b96c7 in unlink () from /lib64/libc.so.6
#1  0x00007ff99826a1a4 in gf_backtrace_fillframes (buf=<value optimized out>)
    at common-utils.c:3638
#2  0x00007ff99826a245 in gf_backtrace_save (buf=<value optimized out>)
    at common-utils.c:3665
#3  0x00007ff998253520 in _gf_log_callingfn (
    domain=0x7ff97c010e80 "vol2-client-6", file=<value optimized out>, 
    function=0x7ff9979a7570 "saved_frames_unwind", line=362, 
    level=GF_LOG_ERROR, 
    fmt=0x7ff9979a7110 "forced unwinding frame type(%s) op(%s(%d)) called at %s (xid=0x%x)") at logging.c:837
#4  0x00007ff9979a1167 in saved_frames_unwind (saved_frames=0x7ff978001940)
    at rpc-clnt.c:353
#5  0x00007ff9979a127e in saved_frames_destroy (frames=0x7ff978001940)
    at rpc-clnt.c:383
#6  0x00007ff9979a134b in rpc_clnt_connection_cleanup (conn=0x7ff97c122c90)
    at rpc-clnt.c:536
#7  0x00007ff9979a190f in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7ff97c122c90, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:843
#8  0x00007ff99799cad8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#9  0x00007ff98b177df1 in socket_event_poll_err (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7ff97c1328d0, 
    poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:1205
#10 socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7ff97c1328d0, 
    poll_in=<value optimized out>, poll_out=0, poll_err=16) at socket.c:2410
#11 0x00007ff9982b3970 in event_dispatch_epoll_handler (data=0x7ff97c0306d0)
    at event-epoll.c:575
#12 event_dispatch_epoll_worker (data=0x7ff97c0306d0) at event-epoll.c:678
#13 0x00007ff99675ba51 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ff9960c596d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ff9986fa740 (LWP 11326)):
#0  0x00007ff9960dcddc in __fprintf_chk () from /lib64/libc.so.6
#1  0x00007ff9982523c4 in fprintf (ctx=0x7ff9998cf170, 
    domain=<value optimized out>, file=0x7ff9982cda8f "mem-pool.c", 
    function=0x7ff9982cdcd0 "mem_pool_destroy", line=616, level=GF_LOG_INFO, 
    errnum=0, msgid=101053, appmsgstr=0x7ffc012652e8, callstr=0x0, tv=..., 
    graph_id=0, fmt=gf_logformat_withmsgid) at /usr/include/bits/stdio2.h:98
#2  gf_log_glusterlog (ctx=0x7ff9998cf170, domain=<value optimized out>, 
    file=0x7ff9982cda8f "mem-pool.c", 
    function=0x7ff9982cdcd0 "mem_pool_destroy", line=616, level=GF_LOG_INFO, 
    errnum=0, msgid=101053, appmsgstr=0x7ffc012652e8, callstr=0x0, tv=..., 
    graph_id=0, fmt=gf_logformat_withmsgid) at logging.c:1432
#3  0x00007ff998251227 in _gf_msg_internal (domain=<value optimized out>, 
    file=<value optimized out>, function=<value optimized out>, 
    line=<value optimized out>, level=<value optimized out>, 
    errnum=<value optimized out>, trace=0, msgid=101053, 
    fmt=0x7ff9982cdab8 "size=%lu max=%d total=%lu") at logging.c:1994
#4  _gf_msg (domain=<value optimized out>, file=<value optimized out>, 
    function=<value optimized out>, line=<value optimized out>, 
    level=<value optimized out>, errnum=<value optimized out>, trace=0, 
    msgid=101053, fmt=0x7ff9982cdab8 "size=%lu max=%d total=%lu")
    at logging.c:2077
#5  0x00007ff998285ea0 in mem_pool_destroy (pool=0x7ff970000f50)
    at mem-pool.c:614
#6  0x00007ff9982742b2 in inode_table_destroy (inode_table=0x7ff970000e30)
    at inode.c:1754
#7  0x00007ff998274341 in inode_table_destroy_all (ctx=0x7ff9998cf170)
    at inode.c:1699
#8  0x00007ff9977787e9 in pub_glfs_fini (fs=0x7ff9998cf010) at glfs.c:1148
#9  0x00007ff9987151e8 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at glfs-heal.c:829

Thread 1 (Thread 0x7ff994eb2700 (LWP 11327)):
#0  __inode_retire (inode=0x7ff970000e40) at inode.c:445
#1  0x00007ff9982740f4 in inode_table_prune (table=0x7ff970000e30)
    at inode.c:1487
#2  0x00007ff99827475c in inode_unref (inode=0x7ff980fe706c) at inode.c:529
#3  0x00007ff99824cf72 in loc_wipe (loc=0x7ff97c182b2c) at xlator.c:690
#4  0x00007ff98991c59e in client_local_wipe (local=0x7ff97c182b2c)
    at client-helpers.c:129
#5  0x00007ff9899300db in client3_3_lookup_cbk (req=<value optimized out>, 
    iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7ff993a04724) at client-rpc-fops.c:2978
#6  0x00007ff9979a05e6 in rpc_clnt_submit (rpc=0x7ff97c197060, 
    prog=<value optimized out>, procnum=<value optimized out>, 
    cbkfn=0x7ff98992fa70 <client3_3_lookup_cbk>, 
    proghdr=<value optimized out>, proghdrcount=<value optimized out>, 
    progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, 
    frame=0x7ff993a04724, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, 
    rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1607
#7  0x00007ff989919d66 in client_submit_request (this=0x7ff97c00b650, 
    req=0x7ff980414a30, frame=0x7ff993a04724, prog=0x7ff989b4ebc0, 
    procnum=<value optimized out>, 
    cbkfn=0x7ff98992fa70 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x0, 
    rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0, 
    xdrproc=0x7ff997bb5120 <xdr_gfs3_lookup_req>) at client.c:315
#8  0x00007ff98992ebb0 in client3_3_lookup (frame=0x7ff993a04724, 
    this=0x7ff97c00b650, data=0x7ff980414ae0) at client-rpc-fops.c:3329
#9  0x00007ff98991965d in client_lookup (frame=0x7ff993a04724, 
    this=<value optimized out>, loc=<value optimized out>, xdata=0x0)
    at client.c:430
#10 0x00007ff9982bbdc0 in cluster_lookup (subvols=<value optimized out>, 
    on=0x7ff980426ee0 '\001' <repeats 11 times>, numsubvols=11, 
    replies=0x7ff980420dd0, output=0x7ff980414d10 "", frame=0x7ff99383c034, 
    this=0x7ff97c0149d0, loc=0x7ff98041acc0, xdata=0x0)
    at cluster-syncop.c:968
#11 0x00007ff9896f0e09 in __ec_heal_metadata_prepare (frame=0x7ff99383c034, 
    ec=0x7ff97c070fd0, inode=<value optimized out>, 
    locked_on=0x7ff980426ee0 '\001' <repeats 11 times>, 
    replies=0x7ff980420dd0, versions=0x7ff98041adf0, dirty=0x7ff98041ad80, 
    sources=0x7ff98042cf10 "", healed_sinks=0x7ff98042cef0 "")
    at ec-heal.c:1922
#12 0x00007ff9896f11ba in __ec_heal_metadata (frame=0x7ff99383c034, 
    ec=0x7ff97c070fd0, inode=0x7ff980fe706c, 
    locked_on=0x7ff980426ee0 '\001' <repeats 11 times>, 
    sources=0x7ff98042cf10 "", healed_sinks=0x7ff98042cef0 "")
    at ec-heal.c:2032
#13 0x00007ff9896f16b3 in ec_heal_metadata (frame=0x7ff99383c034, 
    ec=0x7ff97c070fd0, inode=0x7ff980fe706c, sources=0x7ff98042cf10 "", 
    healed_sinks=0x7ff98042cef0 "") at ec-heal.c:2121
#14 0x00007ff9896f18c0 in ec_heal_do (this=<value optimized out>, 
    data=0x7ff983225780, loc=0x7ff9832259c8, partial=1) at ec-heal.c:3638
#15 0x00007ff9896f1c9d in ec_synctask_heal_wrap (
    opaque=<value optimized out>) at ec-heal.c:3683
#16 0x00007ff9982971f2 in synctask_wrap (old_task=<value optimized out>)
    at syncop.c:381
#17 0x00007ff9960208f0 in ?? () from /lib64/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb) 
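
From the backtrace, Thread 2 is already inside pub_glfs_fini -> inode_table_destroy_all -> inode_table_destroy on inode table 0x7ff970000e30, while Thread 1 (a client lookup callback completing an ec heal) is still pruning the same table (inode_table_prune(table=0x7ff970000e30)) and crashes in __inode_retire. So the crash looks like volume teardown in glfs_fini racing with heal/RPC callbacks that are still in flight. As a rough, standalone illustration only (this is not GlusterFS code and not the actual fix, which is in the patch linked in comment 3), the hazard and the safe ordering look like this:

/*
 * Minimal sketch of the teardown hazard suggested by the backtrace:
 * one thread frees a shared table while another thread is still
 * dropping references into it. The safe ordering is to wait for
 * in-flight workers before destroying shared state.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct table {
    int entries;
};

static void *worker(void *arg)
{
    struct table *t = arg;
    t->entries--;   /* analogous to inode_unref()/inode_table_prune() */
    return NULL;
}

int main(void)
{
    struct table *t = calloc(1, sizeof(*t));
    pthread_t tid;

    t->entries = 1;
    pthread_create(&tid, NULL, worker, t);

    /*
     * Wait for the worker before freeing the table. Freeing first
     * (teardown running while callbacks are still in flight, as the
     * backtrace shows) would leave the worker touching freed memory.
     */
    pthread_join(tid, NULL);
    free(t);        /* analogous to inode_table_destroy_all() */
    printf("teardown after worker completed\n");
    return 0;
}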


Version-Release number of selected component (if applicable):
=============================================================
[root@transformers core]# gluster --version
glusterfs 3.7.1 built on Jun 28 2015 11:01:17
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers core]# 

How reproducible:
=================
seen once

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
================
Attaching the core file.
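
(For reference, the backtrace above was taken by loading the attached core into gdb along the lines of the commands below; the path to the glfsheal binary is an assumption and may differ on this build.)

[root@transformers core]# gdb /usr/libexec/glusterfs/glfsheal <corefile>
(gdb) thread apply all bt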

Comment 2 Bhaskarakiran 2015-06-30 06:40:52 UTC
Created attachment 1044580 [details]
core file

Comment 3 Anuradha 2015-06-30 07:32:12 UTC
Patch sent for review:
https://code.engineering.redhat.com/gerrit/#/c/51934/1

Comment 6 Bhaskarakiran 2015-08-25 09:02:16 UTC
Verified this on the 3.7.1-12 build and didn't see the crash. Marking this as fixed.
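
(For illustration, verification of this kind typically means confirming the installed build and re-running heal against the affected volume, for example:

# rpm -q glusterfs
# gluster volume heal vol2 info

The exact steps used for this verification were not recorded in the comment.)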

Comment 8 errata-xmlrpc 2015-10-05 07:15:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html

