Description of problem:
Self-heal has not completed for 30 days, and the client-side log shows heal failures. Running the command `find "" -exec file` at the mount point to find broken files triggers pending frames and "signal received: 11" (segfault).

The message "W [MSGID: 122002] [ec-common.c:122:ec_heal_report] : Heal failed [Input/output error]" repeated 91 times between [2017-11-07 02:24:55.378087] and [2017-11-07 02:25:29.401347]

pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2017-11-07 02:25:31
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd482105e92]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd4821224ed]
/lib64/libc.so.6(+0x35670)[0x7fd4807f4670]
/lib64/libc.so.6(+0x147dc9)[0x7fd480906dc9]
/usr/lib64/glusterfs/3.7.1/xlator/performance/quick-read.so(qr_readv_cached+0x119)[0x7fd46f7cd329]
/usr/lib64/glusterfs/3.7.1/xlator/performance/quick-read.so(qr_readv+0x4a)[0x7fd46f7cd57a]
/lib64/libglusterfs.so.0(default_readv_resume+0x13c)[0x7fd482116bec]
/lib64/libglusterfs.so.0(call_resume_wind+0x242)[0x7fd482135b52]
/lib64/libglusterfs.so.0(call_resume+0x7d)[0x7fd48213614d]
/usr/lib64/glusterfs/3.7.1/xlator/performance/open-behind.so(open_and_resume+0xb8)[0x7fd46f5c3678]
/usr/lib64/glusterfs/3.7.1/xlator/performance/open-behind.so(ob_readv+0x7f)[0x7fd46f5c588f]
/usr/lib64/glusterfs/3.7.1/xlator/performance/md-cache.so(mdc_readv+0x157)[0x7fd46f3b63e7]
/usr/lib64/glusterfs/3.7.1/xlator/debug/io-stats.so(io_stats_readv+0x171)[0x7fd46f19a8d1]
/lib64/libglusterfs.so.0(default_readv+0x80)[0x7fd48210a510]
/usr/lib64/glusterfs/3.7.1/xlator/meta.so(meta_readv+0x4e)[0x7fd46ef84ffe]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_readv_resume+0x224)[0x7fd478ce7664]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8a65)[0x7fd478cdfa65]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x87a8)[0x7fd478cdf7a8]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8aae)[0x7fd478cdfaae]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_resolve_continue+0x23)[0x7fd478cdf023]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8748)[0x7fd478cdf748]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x8a8e)[0x7fd478cdfa8e]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(fuse_resolve_and_resume+0x20)[0x7fd478cdfad0]
/usr/lib64/glusterfs/3.7.1/xlator/mount/fuse.so(+0x1b6ce)[0x7fd478cf26ce]
/lib64/libpthread.so.0(+0x7dc5)[0x7fd480f6edc5]
/lib64/libc.so.6(clone+0x6d)[0x7fd4808b528d]

Version-Release number of selected component (if applicable):
CentOS Linux release 7.2.1511 (Core)
glusterfs 3.7.1
Is this scenario reproducible?

This looks like a segfault in the qr_readv_cached function. Did you get a core dump, and can you share it? Could you generate a core and share the core file for further analysis?
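For reference, a minimal sketch of how the core could be captured from the fuse client. The `core_pattern` value, the core file path, and the glusterfs binary path are assumptions chosen for illustration, not taken from this report; adjust them for your system.

```shell
# Allow the crashing process to write a core file of any size.
ulimit -c unlimited

# Write cores to a known location; %e = executable name, %p = PID.
# (Requires root; this is an assumed path, pick any writable directory.)
sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p

# After the next segfault, open the core with the matching binary and
# the glusterfs debuginfo packages installed, then collect a backtrace:
gdb /usr/sbin/glusterfs /var/tmp/core.glusterfs.<pid>
# (gdb) bt full
```

Note that `<pid>` is a placeholder for the actual process ID embedded in the core file name; the `bt full` output is what is most useful to attach to the bug.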
(In reply to Sanoj Unnikrishnan from comment #1)
> Is this scenario reproducing?
>
> Looks like a segfault in qr_readv_cached function.
> Did u get a core dump/ can u share it?
> Could you generate a core and share the core file for further analysis?

(gdb) bt
#0  __memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:1614
#1  0x00007f924b9dc329 in memcpy (__len=1576, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/s
#2  qr_readv_cached (frame=frame@entry=0x7f925be7fb7c, qr_inode=0x7f92300c8110, size=size@entry=4096, offset=offset
#3  0x00007f924b9dc57a in qr_readv (frame=0x7f925be7fb7c, this=0x7f924c0eb300, fd=0x7f923001cfa0, size=4096, offset
#4  0x00007f925e36bbec in default_readv_resume (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=0x7f923001cfa0, size=
#5  0x00007f925e38ab52 in call_resume_wind (stub=<optimized out>) at call-stub.c:2118
#6  0x00007f925e38b14d in call_resume (stub=0x7f925b90b5a0) at call-stub.c:2576
#7  0x00007f924b7d2678 in open_and_resume (this=this@entry=0x7f924c0ec780, fd=fd@entry=0x7f923001cfa0, stub=stub@en
#8  0x00007f924b7d488f in ob_readv (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=<optimized out>, size=<optimized
#9  0x00007f924b5c53e7 in mdc_readv (frame=0x7f925be8c1b0, this=0x7f924c0edb40, fd=0x7f923001d00c, size=4096, offse
#10 0x00007f924b3a98d1 in io_stats_readv (frame=0x7f925be931e4, this=0x7f924c0eef60, fd=0x7f923001d00c, size=4096,
#11 0x00007f925e35f510 in default_readv (frame=0x7f925be931e4, this=0x7f924c0f04c0, fd=0x7f923001d00c, size=4096, o
#12 0x00007f924b193ffe in meta_readv (frame=0x7f925be931e4, this=0x7f924c0f04c0, fd=0x7f923001d00c, size=4096, offs
#13 0x00007f9254f3c664 in fuse_readv_resume (state=0x7f9220135ce0) at fuse-bridge.c:2210
#14 0x00007f9254f34a65 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:644
#15 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:671
#16 0x00007f9254f347a8 in fuse_resolve (state=0x7f9220135ce0) at fuse-resolve.c:635
#17 0x00007f9254f34aae in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:667
#18 0x00007f9254f34023 in fuse_resolve_continue (state=state@entry=0x7f9220135ce0) at fuse-resolve.c:687
#19 0x00007f9254f34748 in fuse_resolve_fd (state=0x7f9220135ce0) at fuse-resolve.c:547
#20 fuse_resolve (state=0x7f9220135ce0) at fuse-resolve.c:624
#21 0x00007f9254f34a8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:660
#22 0x00007f9254f34ad0 in fuse_resolve_and_resume (state=0x7f9220135ce0, fn=0x7f9254f3c440 <fuse_readv_resume>) at
#23 0x00007f9254f476ce in fuse_thread_proc (data=0x7f925f003d50) at fuse-bridge.c:4903
#24 0x00007f925d1c3dc5 in start_thread (arg=0x7f922638a700) at pthread_create.c:308
#25 0x00007f925cb0a28d in getxattr () at ../sysdeps/unix/syscall-template.S:81
#26 0x0000000000000000 in ?? ()

(gdb) f 4
#4  0x00007f925e36bbec in default_readv_resume (frame=0x7f925be685ec, this=0x7f924c0ec780, fd=0x7f923001cfa0, size=4096, offset=0, flags=32768, xdata=0x0) at defaults.c:1405
1405            STACK_WIND (frame, default_readv_cbk, FIRST_CHILD(this),

(gdb) list 1400
1401    int32_t
1402    default_readv_resume (call_frame_t *frame, xlator_t *this, fd_t *fd,
1403                          size_t size, off_t offset, uint32_t flags, dict_t *xdata)
1404    {
1405            STACK_WIND (frame, default_readv_cbk, FIRST_CHILD(this),
1406                        FIRST_CHILD(this)->fops->readv, fd, size, offset, flags, xdata);
1407            return 0;
1408    }
jhkim, I'd suggest you upgrade to the latest bits: 3.12.2. You appear to be running an old Gluster release (3.7.1). Let me know whether the upgrade to 3.12.2 helps, and then close the BZ appropriately.
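On CentOS 7 the upgrade would look roughly like the sketch below. This assumes the CentOS Storage SIG repository and its `centos-release-gluster312` release package; the exact repository package name may differ for your setup, so verify it before running.

```shell
# Sketch (assumed repo/package names): enable the 3.12 repo from the
# CentOS Storage SIG, then upgrade the client-side gluster packages.
yum install -y centos-release-gluster312
yum update -y glusterfs glusterfs-fuse

# Confirm the new version before remounting the volume.
glusterfs --version
```

After the upgrade, remount the volume so the client picks up the new translator stack, then check whether the qr_readv_cached crash still occurs.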
Patch https://review.gluster.org/18146, which addresses the issue, is available upstream in version 3.12.2.