Bug 1510268 - Self-Heal not complete and pending frames, signal received: 11
Summary: Self-Heal not complete and pending frames, signal received: 11
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: quick-read
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-07 05:07 UTC by jhkim
Modified: 2018-04-02 11:36 UTC
CC: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-04-02 11:36:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description jhkim 2017-11-07 05:07:05 UTC
Description of problem:

Self-heal has not completed for 30 days, and the heal-failure log below is from the client side.
I ran the command "find "" -exec file " to find broken files at the mount point,
but the client then crashed with pending frames and signal received: 11.

The message "W [MSGID: 122002] [ec-common.c:122:ec_heal_report] : Heal failed [Input/output error]" repeated 91 times between [2017-11-07 02:24:55.378087] and [2017-11-07 02:25:29.401347]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-11-07 02:25:31
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd482105e92]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd4821224ed]
/lib64/libc.so.6(+0x35670)[0x7fd4807f4670]
/lib64/libc.so.6(+0x147dc9)[0x7fd480906dc9]
/usr/lib64/glusterfs/3.7.1/xlator/performance/quick-read.so(qr_readv_cached+0x119)[0x7fd46f7cd329]
/usr/lib64/glusterfs/3.7.1/xlator/performance/quick-read.so(qr_readv+0x4a)[0x7fd46f7cd57a]
/lib64/libglusterfs.so.0(default_readv_resume+0x13c)[0x7fd482116bec]
/lib64/libglusterfs.so.0(call_resume_wind+0x242)[0x7fd482135b52]
/lib64/libglusterfs.so.0(call_resume+0x7d)[0x7fd48213614d]
/usr/lib64/glusterfs/3.7.1/xlator/performance/open-behind.so(open_and_resume+0xb8)[0x7fd46f5c3678]
/usr/lib64/glusterfs/3.7.1/xlator/performance/open-behind.so(ob_readv+0x7f)[0x7fd46f5c588f]
/usr/lib64/glusterfs/3.7.1/xlator/performance/md-cache.so(mdc_readv+0x157)[0x7fd46f3b63e7]
/usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_readv+0x171)[0x7fd46f19a8d1]
/lib64/libglusterfs.so.0(default_readv+0x80)[0x7fd48210a510]
/usr/lib64/glusterfs/3.7.1/xlator/meta.so(meta_readv+0x4e)[0x7fd46ef84ffe]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_readv_resume+0x224)[0x7fd478ce7664]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8a65)[0x7fd478cdfa65]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x87a8)[0x7fd478cdf7a8]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8aae)[0x7fd478cdfaae]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_resolve_continue+0x23)[0x7fd478cdf023]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8748)[0x7fd478cdf748]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8a8e)[0x7fd478cdfa8e]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_resolve_and_resume+0x20)[0x7fd478cdfad0]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x1b6ce)[0x7fd478cf26ce]
/lib64/libpthread.so.0(+0x7dc5)[0x7fd480f6edc5]
/lib64/libc.so.6(clone+0x6d)[0x7fd4808b528d]


Version-Release number of selected component (if applicable):
CentOS Linux release 7.2.1511 (Core) 
glusterfs 3.7.1
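
[Editor's note] For context on the stack trace above: quick-read is a client-side performance translator that caches the contents of small files in memory and answers readv requests straight from that cache; the fault is a memcpy inside qr_readv_cached. Below is a minimal C sketch of the suspected failure pattern, under stated assumptions: cached_file_t and serve_read_from_cache are illustrative names, not the actual GlusterFS structures. If the copy length is derived from stale cache state, or the clamp against the current buffer size is missing, memcpy can read past the end of the cached region and fault.

#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified model of a quick-read style cache entry.
 * These names are illustrative, not the real GlusterFS types. */
typedef struct {
        char   *data;   /* cached file contents */
        size_t  size;   /* number of valid bytes in data */
} cached_file_t;

/* Serve `size` bytes at `offset` from the cache. The crash pattern:
 * if `tocopy` is computed from a stale size, or the clamp below is
 * skipped, memcpy reads past the end of cache->data and can touch an
 * unmapped page -> SIGSEGV (signal 11) inside memcpy, matching the
 * qr_readv_cached frame in the trace above. */
static ssize_t
serve_read_from_cache (cached_file_t *cache, char *buf,
                       size_t size, off_t offset)
{
        if (offset >= (off_t) cache->size)
                return 0;                       /* read starts past EOF */

        size_t avail  = cache->size - (size_t) offset;
        size_t tocopy = size < avail ? size : avail;

        memcpy (buf, cache->data + offset, tocopy);
        return (ssize_t) tocopy;
}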

Comment 1 Sanoj Unnikrishnan 2017-11-07 09:59:17 UTC
Is this scenario reproducible?

This looks like a segfault in the qr_readv_cached function.
Did you get a core dump? If so, could you share the core file for further analysis?

Comment 2 jhkim 2017-11-07 10:16:39 UTC
(In reply to Sanoj Unnikrishnan from comment #1)
> Is this scenario reproducible?
> 
> This looks like a segfault in the qr_readv_cached function.
> Did you get a core dump? If so, could you share the core file for further analysis?

(gdb) bt
#0  __memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:1614
#1  0x00007f924b9dc329 in memcpy (__len=1576, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/s
#2  qr_readv_cached (frame=frame@entry=0x7f925be7fb7c, qr_inode=0x7f92300c8110, size=size@entry=4096, offset=offset
#3  0x00007f924b9dc57a in qr_readv (frame=0x7f925be7fb7c, this=0x7f924c0eb300, fd=0x7f923001cfa0, size=4096, offset
#4  0x00007f925e36bbec in default_readv_resume (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=0x7f923001cfa0, size=
#5  0x00007f925e38ab52 in call_resume_wind (stub=<optimized out>) at call-stub.c:2118
#6  0x00007f925e38b14d in call_resume (stub=0x7f925b90b5a0) at call-stub.c:2576
#7  0x00007f924b7d2678 in open_and_resume (this=this@entry=0x7f924c0ec780, fd=fd@entry=0x7f923001cfa0, stub=stub@en
#8  0x00007f924b7d488f in ob_readv (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=<optimized out>, size=<optimized
#9  0x00007f924b5c53e7 in mdc_readv (frame=0x7f925be8c1b0, this=0x7f924c0edb40, fd=0x7f923001d00c, size=4096, offse
#10 0x00007f924b3a98d1 in io_stats_readv (frame=0x7f925be931e4, this=0x7f924c0eef60, fd=0x7f923001d00c, size=4096,
#11 0x00007f925e35f510 in default_readv (frame=0x7f925be931e4, this=0x7f924c0f04c0, fd=0x7f923001d00c, size=4096, o
#12 0x00007f924b193ffe in meta_readv (frame=0x7f925be931e4, this=0x7f924c0f04c0, fd=0x7f923001d00c, size=4096, offs
#13 0x00007f9254f3c664 in fuse_readv_resume (state=0x7f9220135ce0) at fuse-bridge.c:2210
#14 0x00007f9254f34a65 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:644
#15 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:671
#16 0x00007f9254f347a8 in fuse_resolve (state=0x7f9220135ce0) at fuse-resolve.c:635
#17 0x00007f9254f34aae in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:667
#18 0x00007f9254f34023 in fuse_resolve_continue (state=state@entry=0x7f9220135ce0) at fuse-resolve.c:687
#19 0x00007f9254f34748 in fuse_resolve_fd (state=0x7f9220135ce0) at fuse-resolve.c:547
#20 fuse_resolve (state=0x7f9220135ce0) at fuse-resolve.c:624
#21 0x00007f9254f34a8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:660
#22 0x00007f9254f34ad0 in fuse_resolve_and_resume (state=0x7f9220135ce0, fn=0x7f9254f3c440 <fuse_readv_resume>) at
#23 0x00007f9254f476ce in fuse_thread_proc (data=0x7f925f003d50) at fuse-bridge.c:4903
#24 0x00007f925d1c3dc5 in start_thread (arg=0x7f922638a700) at pthread_create.c:308
#25 0x00007f925cb0a28d in getxattr () at ../sysdeps/unix/syscall-template.S:81
#26 0x0000000000000000 in ?? ()
(gdb) f 4
#4  0x00007f925e36bbec in default_readv_resume (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=0x7f923001cfa0, size=4096, offset=0, flags=32768, xdata=0x0) at defaults.c:1405
1405            STACK_WIND (frame, default_readv_cbk, FIRST_CHILD(this),
(gdb) list
1400
1401    int32_t
1402    default_readv_resume (call_frame_t *frame, xlator_t *this, fd_t *fd,
1403                          size_t size, off_t offset, uint32_t flags, dict_t *xdata)
1404    {
1405            STACK_WIND (frame, default_readv_cbk, FIRST_CHILD(this),
1406                        FIRST_CHILD(this)->fops->readv, fd, size, offset, flags, xdata);
1407            return 0;
1408    }
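
[Editor's note] For readers unfamiliar with the translator stack shown in frame #4: default_readv_resume simply winds the read down to the next translator (here quick-read, whose qr_readv tries to answer from cache via qr_readv_cached). A minimal sketch of that wind pattern follows, with simplified types; the real STACK_WIND is a macro that also sets up a new call frame and records the callback, and the struct layout below is illustrative, not the real xlator_t.

#include <sys/types.h>

/* Simplified model of winding a readv call down the GlusterFS
 * translator graph; illustrative only. */
typedef struct xlator xlator_t;

struct xlator_fops {
        int (*readv) (xlator_t *this, size_t size, off_t offset);
};

struct xlator {
        xlator_t           *first_child;   /* next translator below */
        struct xlator_fops *fops;
};

/* Analogue of default_readv_resume: pass the call to the first child.
 * In this bug, the child is quick-read, so control goes to qr_readv
 * and then qr_readv_cached, where the segfault occurs. */
static int
default_readv_sketch (xlator_t *this, size_t size, off_t offset)
{
        return this->first_child->fops->readv (this->first_child,
                                               size, offset);
}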

Comment 4 Milind Changire 2017-11-09 06:05:59 UTC
jhkim,
I'd suggest you upgrade to the latest bits: 3.12.2.
You seem to be using an old gluster release: 3.7.1.

Let me know whether the upgrade to 3.12.2 helps, and then we can close the BZ appropriately.

Comment 5 Milind Changire 2017-11-09 06:15:22 UTC
Patch https://review.gluster.org/18146, which addresses this issue, is available upstream in version 3.12.2.

