Bug 1603576 - glusterfs dying with SIGSEGV
Summary: glusterfs dying with SIGSEGV
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-19 16:21 UTC by João Carlos Mendes Luís
Modified: 2020-02-20 04:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-20 04:53:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description João Carlos Mendes Luís 2018-07-19 16:21:39 UTC
Description of problem:

After mounting the volume through the GlusterFS FUSE client, the client process crashes.


Version-Release number of selected component (if applicable):

glusterfs 4.1.1-1.el7.x86_64 from CentOS repo
CentOS 7.5

How reproducible:

# mount -t glusterfs -o noatime,nodev,nosuid 127.0.0.1:/vol0 /mnt/gluster/vol0
# df
# df

The first df succeeds; the second reports "Transport endpoint is not connected". At this point there is a new core dump in the system root directory.
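
The reproduction steps above can be scripted so that the failing call is easy to spot. This is a minimal sketch, assuming Python 3 on Linux. `probe_statfs` is a hypothetical helper name, and `os.statvfs` is used as a stand-in for the filesystem statistics request that df makes; it is an assumption that it exercises the same code path in the client.

```python
import errno
import os
import time


def probe_statfs(mount_point, attempts=2, delay=0.0):
    """Call statvfs() on mount_point several times and record each outcome.

    Returns one entry per attempt: "ok" on success, or the symbolic
    errno name (for example "ENOTCONN") on failure.
    """
    outcomes = []
    for _ in range(attempts):
        try:
            os.statvfs(mount_point)
            outcomes.append("ok")
        except OSError as exc:
            # ENOTCONN is the errno behind the message
            # "Transport endpoint is not connected".
            outcomes.append(errno.errorcode.get(exc.errno, str(exc.errno)))
        time.sleep(delay)
    return outcomes


if __name__ == "__main__":
    # Mount point taken from the mount command in this report.
    print(probe_statfs("/mnt/gluster/vol0", attempts=2, delay=1.0))
```

On a healthy mount every attempt should come back as "ok". The failure described in this report would be expected to appear as "ENOTCONN" on the second attempt, once the client process has died.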



Additional info:

This started happening after an upgrade from 4.0.  I'm trying to identify if it is a bug before reinstalling from scratch.


Core analysis from gdb:

Program terminated with signal 11, Segmentation fault.
#0  mem_put (ptr=0x7f658800c060) at mem-pool.c:870
870             GF_ATOMIC_DEC (hdr->pool->active);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-11.el7.x86_64 libgcc-4.8.5-28.el7.x86_64 libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) where
#0  mem_put (ptr=0x7f658800c060) at mem-pool.c:870
#1  0x00007f65a40d038a in FRAME_DESTROY (frame=0x7f6588001798) at ../../../../libglusterfs/src/stack.h:178
#2  STACK_DESTROY (stack=0x7f6588000f88) at ../../../../libglusterfs/src/stack.h:198
#3  fuse_statfs_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=0, 
    buf=<optimized out>, xdata=0x0) at fuse-bridge.c:3253
#4  0x00007f659dece270 in io_stats_statfs_cbk (frame=0x7f6588000de8, cookie=<optimized out>, this=<optimized out>, op_ret=0, 
    op_errno=0, buf=0x7f659802fa30, xdata=0x0) at io-stats.c:2413
#5  0x00007f659e0fa76d in mdc_statfs (frame=frame@entry=0x7f6588001798, this=<optimized out>, loc=loc@entry=0x7f65880099e0, 
    xdata=xdata@entry=0x0) at md-cache.c:1084
#6  0x00007f659debc093 in io_stats_statfs (frame=frame@entry=0x7f6588000de8, this=this@entry=0x7f65980207c0, 
    loc=loc@entry=0x7f65880099e0, xdata=xdata@entry=0x0) at io-stats.c:3030
#7  0x00007f65acd7f1c7 in default_statfs (frame=frame@entry=0x7f6588000de8, this=this@entry=0x7f65980226b0, 
    loc=loc@entry=0x7f65880099e0, xdata=0x0) at defaults.c:3087
#8  0x00007f65a40d0044 in fuse_statfs_resume (state=0x7f65880099c0) at fuse-bridge.c:3275
#9  0x00007f65a40c2b45 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:663
#10 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:690
#11 0x00007f65a40c2858 in fuse_resolve (state=0x7f65880099c0) at fuse-resolve.c:654
#12 0x00007f65a40c2b8e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:686
#13 0x00007f65a40c1e23 in fuse_resolve_continue (state=0x7f65880099c0) at fuse-resolve.c:706
#14 0x00007f65a40c2545 in fuse_resolve_inode (state=0x7f65880099c0) at fuse-resolve.c:364
#15 0x00007f65a40c2a9d in fuse_resolve (state=0x7f65880099c0) at fuse-resolve.c:651
#16 0x00007f65a40c2b6e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:679
#17 0x00007f65a40c2bb0 in fuse_resolve_and_resume (state=0x7f65880099c0, fn=0x7f65a40cfd80 <fuse_statfs_resume>) at fuse-resolve.c:718
#18 0x00007f65a40da6da in fuse_thread_proc (data=0x5628ef2537b0) at fuse-bridge.c:5178
#19 0x00007f65abb4edd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f65ab417b3d in clone () from /lib64/libc.so.6
(gdb) 



From mnt-gluster-vol0.log:


[2018-07-19 16:00:09.164128] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.164334] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-vol0-client-2: changing port to 49155 (from 0)
[2018-07-19 16:00:09.164646] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.164865] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol0-client-2: error returned while attempting to connect to host:(null), port:0
[2018-07-19 16:00:09.165061] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/gfs/brick'.
[2018-07-19 16:00:09.165092] I [MSGID: 108005] [afr-common.c:5227:__afr_handle_child_up_event] 0-vol0-replicate-0: Subvolume 'vol0-client-0' came back up; going online.
[2018-07-19 16:00:09.166629] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-vol0-client-2: Connected to vol0-client-2, attached to remote volume '/gfs/brick'.
[2018-07-19 16:00:09.166650] I [MSGID: 108002] [afr-common.c:5502:afr_notify] 0-vol0-replicate-0: Client-quorum is met
[2018-07-19 16:00:09.168549] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-07-19 16:00:09.168578] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-07-19 16:00:09.170042] I [MSGID: 108031] [afr-common.c:2580:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0
[2018-07-19 16:00:09.170430] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-vol0-dht: Directory selfheal failed: Unable to form layout for directory /


------> The crash happens here


pending frames:
frame : type(1) op(OPENDIR)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-07-19 16:00:13

configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7f82291f9920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f8229203874]
/lib64/libc.so.6(+0x36280)[0x7f822785e280]
/lib64/libglusterfs.so.0(mem_put+0x4c)[0x7f822922514c]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1538a)[0x7f82205df38a]
/usr/lib64/glusterfs/4.1.1/xlator/debug/io-stats.so(+0x1a270)[0x7f821a3dd270]
/usr/lib64/glusterfs/4.1.1/xlator/performance/md-cache.so(+0x1576d)[0x7f821a60976d]
/usr/lib64/glusterfs/4.1.1/xlator/debug/io-stats.so(+0x8093)[0x7f821a3cb093]
/lib64/libglusterfs.so.0(default_statfs+0xd7)[0x7f822928e1c7]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x15044)[0x7f82205df044]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b45)[0x7f82205d1b45]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7858)[0x7f82205d1858]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b8e)[0x7f82205d1b8e]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x6e23)[0x7f82205d0e23]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7545)[0x7f82205d1545]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7a9d)[0x7f82205d1a9d]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7b6e)[0x7f82205d1b6e]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x7bb0)[0x7f82205d1bb0]
/usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1f6da)[0x7f82205e96da]
/lib64/libpthread.so.0(+0x7dd5)[0x7f822805ddd5]
/lib64/libc.so.6(clone+0x6d)[0x7f8227926b3d]
---------
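
Most frames in the log backtrace above carry only an object path and an offset, for example `fuse.so(+0x1538a)`. A minimal sketch, assuming Python 3, of turning those frames into `addr2line` invocations follows. `parse_frames` and `addr2line_commands` are hypothetical helper names. Resolving the offsets to source lines also assumes that matching debuginfo for the exact package build is installed. Frames that already name a symbol (such as `mem_put+0x4c`) are skipped, because their offset is relative to the symbol rather than to the start of the object.

```python
import re

# Matches backtrace lines such as
#   /lib64/libglusterfs.so.0(mem_put+0x4c)[0x7f822922514c]
#   /usr/lib64/glusterfs/4.1.1/xlator/mount/fuse.so(+0x1538a)[0x7f82205df38a]
FRAME_RE = re.compile(
    r"^(?P<obj>/\S+?)\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\)"
    r"\[0x[0-9a-fA-F]+\]$"
)


def parse_frames(log_text):
    """Return (object_path, symbol_or_None, offset) for each backtrace line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a backtrace frame (log message, config line, ...)
        symbol = match.group("sym") or None
        frames.append((match.group("obj"), symbol, match.group("off")))
    return frames


def addr2line_commands(log_text):
    """Build addr2line commands for frames that have no symbol name."""
    commands = []
    for obj, symbol, offset in parse_frames(log_text):
        if symbol is None:
            # Offset is relative to the object's load base, which is the
            # form addr2line accepts for a shared object.
            commands.append(f"addr2line -f -e {obj} {offset}")
    return commands
```

Passing the crash section of the mount log to `addr2line_commands` yields one command per unnamed frame, which can then be run on a host with the same package versions.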

Comment 1 Glenn Brekke 2018-10-01 14:41:19 UTC
Experienced a similar error on a RHEL 7.5 host:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-09-27 11:33:06
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7faf0b1ef920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7faf0b1f9874]
/lib64/libc.so.6(+0x36280)[0x7faf09854280]
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7faefd1f586a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7faefd4c1a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7faf0afbcc20]
/lib64/libgfrpc.so.0(+0xefb3)[0x7faf0afbcfb3]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7faf0afb8e93]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0x7626)[0x7faeffba3626]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0xa0f7)[0x7faeffba60f7]
/lib64/libglusterfs.so.0(+0x89094)[0x7faf0b253094]
/lib64/libpthread.so.0(+0x7dd5)[0x7faf0a053dd5]
/lib64/libc.so.6(clone+0x6d)[0x7faf0991cb3d]

Installed GlusterFS packages:

glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64

I have not been able to reproduce the error.

statfs() calls produced "Transport endpoint is not connected" error messages as a result.

Comment 2 Glenn Brekke 2018-10-25 11:20:26 UTC
Experienced the same kind of issue today, this time on a different host, but related to the same replicated Gluster volume.

From «gluster-volume».log:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2018-10-25 10:37:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.1
/lib64/libglusterfs.so.0(+0x25920)[0x7f85ec890920]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f85ec89a874]
/lib64/libc.so.6(+0x36280)[0x7f85eaef5280]
/usr/lib64/glusterfs/4.1.1/xlator/cluster/replicate.so(+0x3d86a)[0x7f85de89686a]
/usr/lib64/glusterfs/4.1.1/xlator/protocol/client.so(+0x74a9f)[0x7f85deb62a9f]
/lib64/libgfrpc.so.0(+0xec20)[0x7f85ec65dc20]
/lib64/libgfrpc.so.0(+0xefb3)[0x7f85ec65dfb3]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f85ec659e93]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0x7626)[0x7f85e1244626]
/usr/lib64/glusterfs/4.1.1/rpc-transport/socket.so(+0xa0f7)[0x7f85e12470f7]
/lib64/libglusterfs.so.0(+0x89094)[0x7f85ec8f4094]
/lib64/libpthread.so.0(+0x7dd5)[0x7f85eb6f4dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f85eafbdb3d]

Installed GlusterFS packages:

glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-4.1.1-1.el7.x86_64

I have not been able to reproduce this error yet.
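
The "pending frames" sections differ between the crashes reported here (OPENDIR in the original report, OPEN and FLUSH in the later ones). A minimal sketch, assuming Python 3, for tallying them so that crash dumps can be compared quickly follows. `tally_pending_frames` is a hypothetical helper name, and the line format is taken from the log excerpts in this report.

```python
import re
from collections import Counter

# Matches lines such as "frame : type(1) op(OPEN)".
PENDING_RE = re.compile(r"^frame : type\((\d+)\) op\(([^)]+)\)$")


def tally_pending_frames(log_text):
    """Count pending frames per (type, op) pair in a crash log excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PENDING_RE.match(line.strip())
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Calling `tally_pending_frames` on each crash section gives a per-operation count that is easier to compare across hosts than the raw list.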

Comment 3 Amar Tumballi 2019-06-18 08:43:21 UTC
Glenn, Carlos, apologies for the delay in getting to this. Can you upgrade to glusterfs-6.2 or later and see whether the issue still happens?

Comment 4 Mohit Agrawal 2020-02-20 04:53:44 UTC
Glenn, Carlos

There has been no update on this bug for the last 6 months.
Please let us know if you still face the issue after upgrading to the latest release-6.
For now I am closing the bug; please reopen it if you face the issue again.

Thanks,
Mohit Agrawal

