+++ This bug was initially created as a clone of Bug #1386097 +++

+++ This bug was initially created as a clone of Bug #1385606 +++

Description of problem:
=========================
Four of the eight bricks in a distributed-replicate volume (distrepvol) crashed simultaneously. The four bricks make up two complete DHT subvolumes (subvols 1 and 2).

(For details of the exact IO, refer to the "IOs" worksheet in the shared Google doc.)

On the server side, the following were running (in screen sessions):
1) heal info --xml, started 4 days earlier and still waiting for output (refer to BZ#1382686)
2) healing, in progress for about 3 hours since the bricks were brought back online after about a week
3) a snapshot scheduled every hour

Client side IO:
=============
From 4 clients: lookups using ls -lRt
From the same 4 clients: symlinks were being created for the same target directories
From another 4 clients: 2 clients were creating the same directory structure, while the other 2 were renaming directories
From another 2 clients: appends to the same file

Backtrace using gdb:
(gdb) bt
#0  0x00007fe1e15cbab4 in vfprintf () from /lib64/libc.so.6
#1  0x00007fe1e168ee25 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007fe1e2ef9598 in vsnprintf (__ap=0x7fe152cf8a70, __fmt=<optimized out>, __n=0, __s=0x0) at /usr/include/bits/stdio2.h:77
#3  gf_vasprintf (string_ptr=string_ptr@entry=0x7fe152cf8b78, format=format@entry=0x7fe1d52c01b0 "op=%s;path=%s;error=%s;brick=%s:%s", arg=arg@entry=0x7fe152cf8b90) at mem-pool.c:219
#4  0x00007fe1e2f482da in gf_event (event=event@entry=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=fmt@entry=0x7fe1d52c01b0 "op=%s;path=%s;error=%s;brick=%s:%s") at events.c:84
#5  0x00007fe1d52b7660 in posix_fs_health_check (this=this@entry=0x7fe1d0006dd0) at posix-helpers.c:1795
#6  0x00007fe1d52b77e4 in posix_health_check_thread_proc (data=0x7fe1d0006dd0) at posix-helpers.c:1833
#7  0x00007fe1e1d34dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fe1e1679ced in clone () from /lib64/libc.so.6

Probable root cause (based on initial findings by Raghavendra Gowdappa)
===================================================================
op_errno is an integer, but it is passed to gf_event() as the argument for a %s conversion, which expects a string:

1794                          "%s() on %s returned", op, file_path);
1795                  gf_event (EVENT_POSIX_HEALTH_CHECK_FAILED,
1796                            "op=%s;path=%s;error=%s;brick=%s:%s", op, file_path,
1797                            op_errno, priv->hostname, priv->base_path);
1798          }

Brick logs:
==============
tat on parent /rhs/brick1/distrepvol/rootdir1/symlink failed [Input/output error]
[2016-10-17 11:26:48.124026] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink [Input/output error]
[2016-10-17 11:26:48.124037] E [MSGID: 113018] [posix.c:237:posix_lookup] 0-distrepvol-posix: post-operation lstat on parent /rhs/brick1/distrepvol/rootdir1/symlink failed [Input/output error]
[2016-10-17 11:26:48.124051] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-distrepvol-server: 2701980: LOOKUP /rootdir1/symlink/file.559585 (603542a5-8221-4bde-8869-09f0167ecb80/file.559585) ==> (Input/output error) [Input/output error]
[2016-10-17 11:26:48.124075] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-distrepvol-server: 2653096: LOOKUP /rootdir1/symlink/file.561231 (603542a5-8221-4bde-8869-09f0167ecb80/file.561231) ==> (Input/output error) [Input/output error]
[2016-10-17 11:26:48.124326] W [MSGID: 113075] [posix-helpers.c:1794:posix_fs_health_check] 0-distrepvol-posix: open() on /rhs/brick1/distrepvol/.glusterfs/health_check returned [Input/output error]
[2016-10-17 11:26:48.124523] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink/file.520670 [Input/output error]
[2016-10-17 11:26:48.124549] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink/file.520670 [Input/output error]
[2016-10-17 11:26:48.124550] W [MSGID: 113018] [posix.c:199:posix_lookup] 0-distrepvol-posix: lstat on /rhs/brick1/distrepvol/rootdir1/symlink/file.520670 failed [Input/output error]
[2016-10-17 11:26:48.124568] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink [Input/output error]
[2016-10-17 11:26:48.124593] E [MSGID: 113018] [posix.c:237:posix_lookup] 0-distrepvol-posix: post-operation lstat on parent /rhs/brick1/distrepvol/rootdir1/symlink failed [Input/output error]
pending frames:
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
patchset: git://git.gluster.com/glusterfs.git
[2016-10-17 11:26:48.124625] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-distrepvol-server: 2757018: LOOKUP /rootdir1/symlink/file.520670 (603542a5-8221-4bde-8869-09f0167ecb80/file.520670) ==> (Input/output error) [Input/output error]
[2016-10-17 11:26:48.124978] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink/file.561236 [Input/output error]
[2016-10-17 11:26:48.125012] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink/file.561236 [Input/output error]
[2016-10-17 11:26:48.125025] W [MSGID: 113018] [posix.c:199:posix_lookup] 0-distrepvol-posix: lstat on /rhs/brick1/distrepvol/rootdir1/symlink/file.561236 failed [Input/output error]
[2016-10-17 11:26:48.125035] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink [Input/output error]
[2016-10-17 11:26:48.125042] E [MSGID: 113018] [posix.c:237:posix_lookup] 0-distrepvol-posix: post-operation lstat on parent /rhs/brick1/distrepvol/rootdir1/symlink failed [Input/output error]
[2016-10-17 11:26:48.125057] W [MSGID: 113018] [posix-helpers.c:667:posix_pstat] 0-distrepvol-posix: lstat failed on /rhs/brick1/distrepvol/rootdir1/symlink/file.559585 [Input/output error]
signal received: 11
time of crash:
2016-10-17 11:26:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f9e28ae3832]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f9e28aed2c4]
/lib64/libc.so.6(+0x35670)[0x7f9e271c8670]
/lib64/libc.so.6(_IO_vfprintf+0x1564)[0x7f9e271dbab4]
/lib64/libc.so.6(__vsnprintf_chk+0x95)[0x7f9e2729ee25]
/lib64/libglusterfs.so.0(gf_vasprintf+0x68)[0x7f9e28b09598]
/lib64/libglusterfs.so.0(gf_event+0x1aa)[0x7f9e28b582da]
/usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so(+0x29660)[0x7f9e1aec7660]
/usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so(+0x297e4)[0x7f9e1aec77e4]
/lib64/libpthread.so.0(+0x7dc5)[0x7f9e27944dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f9e27289ced]
---------

--- Additional comment from Worker Ant on 2016-10-18 06:55:48 EDT ---

REVIEW: http://review.gluster.org/15671 (events: Add FMT_WARN for gf_event) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-10-18 14:26:53 EDT ---

REVIEW: http://review.gluster.org/15671 (events: Add FMT_WARN for gf_event) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-11-09 11:11:41 EST ---

REVIEW: http://review.gluster.org/15671 (events: Add FMT_WARN for gf_event) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-11-18 04:56:43 EST ---

COMMIT: http://review.gluster.org/15671 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 5310be8838f8db748a698bd3a98f8d00a4114e65
Author: Pranith Kumar K <pkarampu>
Date:   Tue Oct 18 15:16:17 2016 +0530

    events: Add FMT_WARN for gf_event

    Raghavendra G found that posix is trying to print %s but passing an int when HEALTH_CHECK fails in posix. These are the kind of bugs that should be caught at compilation itself. Also fixed the problematic gf_event() callers.

    BUG: 1386097
    Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15671
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    CentOS-regression: Gluster Build System <jenkins.org>
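For readers unfamiliar with how a mismatch like this can be caught at compile time, here is a minimal sketch, not the actual GlusterFS patch. It assumes GCC or Clang, and format_event() and report_health_check_failure() are hypothetical stand-ins for gf_event() and its posix caller. FMT_WARN in the commit title presumably wraps a similar attribute, but that is an assumption. Converting op_errno with strerror() is just one plausible way to make the argument match %s; the committed fix may do it differently.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gf_event(); NOT the GlusterFS implementation.
 * The attribute tells GCC/Clang that parameter 3 is a printf-style format
 * string and that the variadic arguments begin at parameter 4, so -Wformat
 * can check every conversion against the argument actually passed. */
int format_event(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int format_event(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Hypothetical caller modelled on the snippet at posix-helpers.c:1795. */
int report_health_check_failure(char *buf, size_t len,
                                const char *op, const char *path,
                                int op_errno,
                                const char *hostname, const char *base_path)
{
    /* The crashing form passed the int directly for "error=%s":
     *
     *   format_event(buf, len, "op=%s;path=%s;error=%s;brick=%s:%s",
     *                op, path, op_errno, hostname, base_path);
     *
     * With the attribute above, -Wformat (part of -Wall) flags that call.
     * Here the errno value is converted to text first, so the argument
     * type matches the %s conversion. */
    return format_event(buf, len, "op=%s;path=%s;error=%s;brick=%s:%s",
                        op, path, strerror(op_errno), hostname, base_path);
}
```

Without the attribute, the compiler treats the variadic arguments as opaque, which is why the int-for-%s mismatch only surfaced at run time as a SIGSEGV inside vfprintf().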
REVIEW: http://review.gluster.org/15884 (events: Add FMT_WARN for gf_event) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/15884 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit 66f5c8a6f06c389cfb8b845254d3033f2b22801a
Author: Pranith Kumar K <pkarampu>
Date:   Tue Oct 18 15:16:17 2016 +0530

    events: Add FMT_WARN for gf_event

    Raghavendra G found that posix is trying to print %s but passing an int when HEALTH_CHECK fails in posix. These are the kind of bugs that should be caught at compilation itself. Also fixed the problematic gf_event() callers.

    >BUG: 1386097
    >Change-Id: Id7bd6d9a9690237cec3ca1aefa2aac085e8a1270
    >Signed-off-by: Pranith Kumar K <pkarampu>
    >Reviewed-on: http://review.gluster.org/15671
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Reviewed-by: Atin Mukherjee <amukherj>
    >CentOS-regression: Gluster Build System <jenkins.org>

    BUG: 1396778
    Change-Id: Idf8e1f427578d02dccd2a8165884a5cf086eb07e
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15884
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/