Bug 1603576

Summary: glusterfs dying with SIGSEGV
Product: [Community] GlusterFS
Reporter: João Carlos Mendes Luís <redhat>
Component: core
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 4.1
CC: bugs, glenn.brekke, moagrawa, rhb1
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-20 04:53:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description João Carlos Mendes Luís 2018-07-19 16:21:39 UTC
Description of problem:

Shortly after the volume is mounted through the GlusterFS FUSE client, the client process crashes with SIGSEGV.


Version-Release number of selected component (if applicable):

glusterfs 4.1.1-1.el7.x86_64 from CentOS repo
CentOS 7.5

How reproducible:

# mount -t glusterfs -o noatime,nodev,nosuid 127.0.0.1:/vol0 /mnt/gluster/vol0
# df
# df

The first df succeeds; the second reports "Transport endpoint is not connected". At that point there is a new core dump in the system root directory.
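
The two df calls above can be approximated without df itself. The following is only a sketch, assuming that df ends up issuing a statfs()-family call on the mount point (which matches the fuse_statfs_cbk frame in the gdb trace further down). The helper name probe_mount is hypothetical and not part of GlusterFS.

```python
import errno
import os


def probe_mount(path, attempts=2):
    """Call statvfs() on path several times and record each outcome."""
    outcomes = []
    for _ in range(attempts):
        try:
            os.statvfs(path)
            outcomes.append("ok")
        except OSError as exc:
            # ENOTCONN is the errno behind "Transport endpoint is not connected".
            outcomes.append(errno.errorcode.get(exc.errno, str(exc.errno)))
    return outcomes


if __name__ == "__main__":
    # Mount point taken from the reproduction steps in this report.
    print(probe_mount("/mnt/gluster/vol0"))
```

On an affected client one would expect the first entry to be "ok" and the second to be "ENOTCONN", but that expectation is inferred from the df behaviour described here and has not been verified.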



Additional info:

This started happening after an upgrade from 4.0. I'm trying to determine whether it is a bug before reinstalling from scratch.


Core analysis from gdb:

Program terminated with signal 11, Segmentation fault.
#0  mem_put (ptr=0x7f658800c060) at mem-pool.c:870
870             GF_ATOMIC_DEC (hdr->pool->active);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-11.el7.x86_64 libgcc-4.8.5-28.el7.x86_64 libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) where
#0  mem_put (ptr=0x7f658800c060) at mem-pool.c:870
#1  0x00007f65a40d038a in FRAME_DESTROY (frame=0x7f6588001798) at ../../../../libglusterfs/src/stack.h:178
#2  STACK_DESTROY (stack=0x7f6588000f88) at ../../../../libglusterfs/src/stack.h:198
#3  fuse_statfs_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=0, 
    buf=<optimized out>, xdata=0x0) at fuse-bridge.c:3253
#4  0x00007f659dece270 in io_stats_statfs_cbk (frame=0x7f6588000de8, cookie=<optimized out>, this=<optimized out>, op_ret=0, 
    op_errno=0, buf=0x7f659802fa30, xdata=0x0) at io-stats.c:2413
#5  0x00007f659e0fa76d in mdc_statfs (frame=frame@entry=0x7f6588001798, this=<optimized out>, loc=loc@entry=0x7f65880099e0, 
    xdata=xdata@entry=0x0) at md-cache.c:1084
#6  0x00007f659debc093 in io_stats_statfs (frame=frame@entry=0x7f6588000de8, this=this@entry=0x7f65980207c0, 
    loc=loc@entry=0x7f65880099e0, xdata=xdata@entry=0x0) at io-stats.c:3030
#7  0x00007f65acd7f1c7 in default_statfs (frame=frame@entry=0x7f6588000de8, this=this@entry=0x7f65980226b0, 
    loc=loc@entry=0x7f65880099e0, xdata=0x0) at defaults.c:3087
#8  0x00007f65a40d0044 in fuse_statfs_resume (state=0x7f65880099c0) at fuse-bridge.c:3275
#9  0x00007f65a40c2b45 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:663
#10 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:690
#11 0x00007f65a40c2858 in fuse_resolve (state=0x7f65880099c0) at fuse-resolve.c:654
#12 0x00007f65a40c2b8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:686
#13 0x00007f65a40c1e23 in fuse_resolve_continue (state=0x7f65880099c0) at fuse-resolve.c:706
#14 0x00007f65a40c2545 in fuse_resolve_inode (state=0x7f65880099c0) at fuse-resolve.c:364
#15 0x00007f65a40c2a9d in fuse_resolve (state=0x7f65880099c0) at fuse-resolve.c:651
#16 0x00007f65a40c2b6e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#17 0x00007f65a40c2bb0 in fuse_resolve_and_resume (state=0x7f65880099c0, fn=0x7f65a40cfd80 <fuse_statfs_resume>) at fuse-resolve.c:718
#18 0x00007f65a40da6da in fuse_thread_proc (data=0x5628ef2537b0) at fuse-bridge.c:5178
#19 0x00007f65abb4edd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f65ab417b3d in clone () from /lib64/libc.so.6
(gdb) 



From mnt-gluster-vol0.log:


[2018-07-19 16:00:09.164128] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.164334] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-vol0-client-2: changing port to 49155 (from 0)
[2018-07-19 16:00:09.164646] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.164865] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.165061] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/gfs/brick'.
[2018-07-19 16:00:09.165092] I [MSGID: 108005] [afr-common.c:5227:__afr_handle_child_up_event] 0-vol0-replicate-0: Subvolume 'vol0-client-0' came back up; going online.
[2018-07-19 16:00:09.166629] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-vol0-client-2: Connected to vol0-client-2, attached to remote volume '/gfs/brick'.
[2018-07-19 16:00:09.166650] I [MSGID: 108002] [afr-common.c:5502:afr_notify] 0-vol0-replicate-0: Client-quorum is met
[2018-07-19 16:00:09.168549] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-07-19 16:00:09.168578] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-07-19 16:00:09.170042] I [MSGID: 108031] [afr-common.c:2580:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0
[2018-07-19 16:00:09.170430] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-vol0-dht: Directory selfheal failed: Unable to form layout for directory /


------> The crash happens here


pending frames:
frame : type(1) op(OPENDIR)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-07-19 16:00:13

configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7f82291f9920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f8229203874]
/lib64/libc.so.6(+0x36280)[0x7f822785e280]
/lib64/libglusterfs.so.0(mem_put+0x4c)[0x7f822922514c]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1538a)[0x7f82205df38a]
/usr/lib64/glusterfs/4.1.1/xlator/debug/io-stats.so(+0x1a270)[0x7f821a3dd270]
/usr/lib64/glusterfs/4.1.1/xlator/performance/md-cache.so(+0x1576d)[0x7f821a60976d]
/usr/lib64/glusterfs/4.1.1/xlator/debug/io-stats.so(+0x8093)[0x7f821a3cb093]
/lib64/libglusterfs.so.0(default_statfs+0xd7)[0x7f822928e1c7]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x15044)[0x7f82205df044]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b45)[0x7f82205d1b45]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7858)[0x7f82205d1858]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b8e)[0x7f82205d1b8e]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x6e23)[0x7f82205d0e23]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7545)[0x7f82205d1545]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7a9d)[0x7f82205d1a9d]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b6e)[0x7f82205d1b6e]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7bb0)[0x7f82205d1bb0]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1f6da)[0x7f82205e96da]
/lib64/libpthread.so.0(+0x7dd5)[0x7f822805ddd5]
/lib64/libc.so.6(clone+0x6d)[0x7f8227926b3d]
---------
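
Most frames in the log backtrace above carry only an object path and an offset, such as fuse.so(+0x1538a). A possible way to turn those into function names is addr2line from binutils. The sketch below only builds the command lines. It assumes addr2line and the matching glusterfs debuginfo packages are installed, and the helper names parse_frame and addr2line_command are hypothetical. Frames given relative to a symbol (for example mem_put+0x4c) are skipped, because that offset is relative to the symbol rather than to the start of the object.

```python
import re
import shlex

# Matches lines such as "/lib64/libc.so.6(clone+0x6d)[0x7f8227926b3d]".
FRAME_RE = re.compile(
    r"^(?P<obj>[^()\s]+)"              # path of the shared object
    r"\((?P<sym>[^+)]*)"               # optional symbol name
    r"\+(?P<off>0x[0-9a-fA-F]+)\)"     # offset (from symbol, or from object start)
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"   # absolute runtime address
)


def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "object": m.group("obj"),
        "symbol": m.group("sym") or None,
        "offset": int(m.group("off"), 16),
        "address": int(m.group("addr"), 16),
    }


def addr2line_command(frame):
    """Build an addr2line call for object-relative frames; None for symbol-relative ones."""
    if frame["symbol"] is not None:
        return None
    return "addr2line -f -e {} {:#x}".format(
        shlex.quote(frame["object"]), frame["offset"]
    )


if __name__ == "__main__":
    line = "/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1538a)[0x7f82205df38a]"
    print(addr2line_command(parse_frame(line)))
```

The printed command still has to be run on a host with the same package versions as the one that crashed; otherwise the offsets will not line up with the binaries.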

Comment 1 Glenn Brekke 2018-10-01 14:41:19 UTC
I experienced a similar error on a RHEL 7.5 host:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-09-27 11:33:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7faf0b1ef920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7faf0b1f9874]
/lib64/libc.so.6(+0x36280)[0x7faf09854280]
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7faefd1f586a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7faefd4c1a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7faf0afbcc20]
/lib64/libgfrpc.so.0(+0xefb3)[0x7faf0afbcfb3]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7faf0afb8e93]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0x7626)[0x7faeffba3626]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0xa0f7)[0x7faeffba60f7]
/lib64/libglusterfs.so.0(+0x89094)[0x7faf0b253094]
/lib64/libpthread.so.0(+0x7dd5)[0x7faf0a053dd5]
/lib64/libc.so.6(clone+0x6d)[0x7faf0991cb3d]
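
The "pending frames" section of such a crash dump can be summarised by counting the type/op pairs, which makes dumps from different hosts easier to compare. This is a minimal sketch. The function name count_pending is hypothetical, and the meaning of the type values is not interpreted here.

```python
import re
from collections import Counter

# Matches lines such as "frame : type(1) op(OPEN)".
PENDING_RE = re.compile(r"^frame : type\((\d+)\) op\((\w+)\)$")


def count_pending(log_text):
    """Count how often each (type, op) pair appears among the pending frames."""
    counts = Counter()
    for line in log_text.splitlines():
        m = PENDING_RE.match(line.strip())
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Pending frames copied from the crash dump in this comment.
    sample = """pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
"""
    for key, n in sorted(count_pending(sample).items()):
        print(key, n)
```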

Installed GlusterFS packages:

glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64

I have not been able to reproduce the error.

As a result of the crash, statfs() calls failed with "Transport endpoint is not connected".

Comment 2 Glenn Brekke 2018-10-25 11:20:26 UTC
I experienced the same kind of issue today, this time on a different host, but involving the same replicated Gluster volume.

From «gluster-volume».log:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-10-25 10:37:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7f85ec890920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f85ec89a874]
/lib64/libc.so.6(+0x36280)[0x7f85eaef5280]
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7f85de89686a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7f85deb62a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7f85ec65dc20]
/lib64/libgfrpc.so.0(+0xefb3)[0x7f85ec65dfb3]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f85ec659e93]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0x7626)[0x7f85e1244626]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0xa0f7)[0x7f85e12470f7]
/lib64/libglusterfs.so.0(+0x89094)[0x7f85ec8f4094]
/lib64/libpthread.so.0(+0x7dd5)[0x7f85eb6f4dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f85eafbdb3d]

Installed GlusterFS packages:

glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64

I have not been able to reproduce this error yet.
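
One way to check that the crashes in comments 1 and 2 are the same is to compare the backtraces with the absolute addresses stripped, since those differ between processes. The sketch below keeps only the object path and the text in parentheses. The function name signature is hypothetical, and only the first three frames after the signal handler are copied from each comment.

```python
import re

# Captures the object path and the "(symbol+offset)" part; ignores "[0x...]".
FRAME_RE = re.compile(r"^(\S+?)\(([^)]*)\)\[0x[0-9a-fA-F]+\]$")


def signature(trace_text):
    """Return (object, symbol+offset) pairs, ignoring absolute addresses."""
    sig = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            sig.append((m.group(1), m.group(2)))
    return sig


# Frames copied from comment 1.
comment1 = """
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7faefd1f586a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7faefd4c1a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7faf0afbcc20]
"""

# Frames copied from comment 2.
comment2 = """
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7f85de89686a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7f85deb62a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7f85ec65dc20]
"""

if __name__ == "__main__":
    print(signature(comment1) == signature(comment2))
```

Matching offsets only indicate the same code location when both hosts run identical package builds, which is the case here (4.1.1-1.el7 on both).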

Comment 3 Amar Tumballi 2019-06-18 08:43:21 UTC
Glenn, Carlos, apologies for the delay in getting to this. Can you upgrade to glusterfs-6.2 or later and check whether the issue still occurs?

Comment 4 Mohit Agrawal 2020-02-20 04:53:44 UTC
Glenn, Carlos

There has been no update on this bug for the last 6 months.
Please let us know if you still face the issue after upgrading to the latest release-6.
For now I am closing the bug; please reopen it if you face the issue again.

Thanks,
Mohit Agrawal