Bug 1632503

Summary: FUSE client segfaults when performance.md-cache-statfs is enabled for a volume
Product: [Community] GlusterFS
Component: fuse
Version: 4.1
Reporter: Stephen Muth <smuth4>
Assignee: Vijay Bellur <vbellur>
CC: bugs, pasik, vbellur
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-6.0
Last Closed: 2019-03-25 16:30:59 UTC
Type: Bug

Description Stephen Muth 2018-09-24 22:38:00 UTC
Description of problem:

I recently tried to enable performance.md-cache-statfs for some testing, but every time I subject the FUSE mount to directory scans, the client ends up segfaulting.


Version-Release number of selected component (if applicable):
Client: 4.1.5-ubuntu1~xenial1 (from the PPA)
Server: 4.1.5-ubuntu1~bionic1 (from the PPA)
I was also able to reproduce this on a manual build of the client from the git master branch.


How reproducible:
I can reproduce it consistently with the steps below, although the time it takes to trigger varies (it might happen in the middle of the 1st scan or the 8th). I have not encountered any segfaults with performance.md-cache-statfs disabled.


Steps to Reproduce:
1. Enable performance.md-cache-statfs on a volume
`gluster volume set tank performance.md-cache-statfs on`
2. On the client, run the following command to put a little stress on the cache (there are about 8k files in various directories in /mnt/tank)
`for i in $(seq 1 10); do find /mnt/tank >/dev/null; done`


Actual results:
The client segfaults with the following info logged:
```
pending frames:
frame : type(1) op(STAT)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-09-24 21:02:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.1.5
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x2038a)[0x7fc54fb7538a]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x2e7)[0x7fc54fb7f0d7]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7fc54ef694b0]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_put+0x3e)[0x7fc54fb9e8ee]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/mount/fuse.so(+0x146aa)[0x7fc54d6106aa]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/debug/io-stats.so(+0x19071)[0x7fc548348071]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_statfs_cbk+0x13c)[0x7fc54fbf8c2c]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/performance/md-cache.so(+0x1471e)[0x7fc54878371e]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_statfs_resume+0x1e5)[0x7fc54fc160e5]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x75)[0x7fc54fb9a635]
/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/performance/io-threads.so(+0x5588)[0x7fc548565588]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fc54f3056ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc54f03b41d]
```
After the segfault, a burst of `Transport endpoint is not connected` errors appears while the find commands continue running.


Expected results:
The find loop completes without errors and without the client crashing.


Additional info:
GDB stack trace, if that helps:
```
Thread 8 "glusteriotwr0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e3d700 (LWP 13880)]
0x00007ffff7b1d8ee in mem_put (ptr=0x7fffe43c2130) at mem-pool.c:870
870     mem-pool.c: No such file or directory.
(gdb) backtrace
#0  0x00007ffff7b1d8ee in mem_put (ptr=0x7fffe43c2130) at mem-pool.c:870
#1  0x00007ffff558f6aa in FRAME_DESTROY (frame=0x7fffe4415438) at ../../../../libglusterfs/src/stack.h:178
#2  STACK_DESTROY (stack=0x7fffe00079b8) at ../../../../libglusterfs/src/stack.h:198
#3  fuse_statfs_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=0, buf=<optimized out>, xdata=0x0)
    at fuse-bridge.c:3253
#4  0x00007ffff02c7071 in ?? () from /usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/debug/io-stats.so
#5  0x00007ffff7b77c2c in default_statfs_cbk (frame=0x7fffe0008518, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7fffec030d40, xdata=0x0)
    at defaults.c:1607
#6  0x00007ffff070271e in mdc_statfs (frame=frame@entry=0x7fffe4415438, this=<optimized out>, loc=loc@entry=0x7fffe0009488, xdata=xdata@entry=0x0) at md-cache.c:1084
#7  0x00007ffff7b950e5 in default_statfs_resume (frame=0x7fffe0008518, this=0x7fffec017920, loc=0x7fffe0009488, xdata=0x0) at defaults.c:2273
#8  0x00007ffff7b19635 in call_resume (stub=0x7fffe0009438) at call-stub.c:2689
#9  0x00007ffff04e4588 in iot_worker (data=0x7fffec02d5c0) at io-threads.c:231
#10 0x00007ffff72846ba in start_thread (arg=0x7ffff7e3d700) at pthread_create.c:333
#11 0x00007ffff6fba41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
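
Reading the trace: the fault is in mem_put() while FRAME_DESTROY tears down the frame that md-cache unwound on the statfs path, and the pointer handed to mem_put() is not the frame itself, which suggests it is the frame's local. One plausible reading (my assumption, not something stated in this report) is that the local attached to that frame was not obtained from the mem pool, so mem_put() ends up interpreting whatever bytes precede the pointer as pool metadata. The toy program below only models that layout mismatch; toy_mem_get/toy_mem_put and the magic check are invented for illustration and are not the libglusterfs mem-pool API.
```
/* Toy model of a pooled allocation vs. a plain calloc()'d pointer.
 * This is NOT the libglusterfs mem-pool code; it only mimics the
 * "metadata header sits just before the user pointer" layout to show
 * why handing a non-pooled pointer to a pool "put" goes wrong. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POOL_MAGIC 0xCAFEBABEu

struct toy_header {          /* stored immediately before the payload */
    unsigned int magic;      /* marks the block as pool-managed */
};

static void *toy_mem_get(size_t size)
{
    struct toy_header *hdr = calloc(1, sizeof(*hdr) + size);
    if (!hdr)
        return NULL;
    hdr->magic = TOY_POOL_MAGIC;
    return hdr + 1;          /* caller only ever sees the payload */
}

static void toy_mem_put(void *ptr)
{
    struct toy_header *hdr = (struct toy_header *)ptr - 1;

    if (hdr->magic != TOY_POOL_MAGIC) {
        /* The bytes in front of a non-pooled pointer are whatever the
         * real allocator put there; treating them as pool metadata is
         * exactly the kind of misstep that can end in a SIGSEGV. */
        fprintf(stderr, "toy_mem_put: %p is not a pooled block\n", ptr);
        return;
    }
    free(hdr);
}

int main(void)
{
    void *pooled = toy_mem_get(64);
    if (pooled)
        toy_mem_put(pooled);        /* fine: a valid header precedes it */

    /* A payload whose preceding bytes are not valid pool metadata,
     * standing in for a frame->local that was calloc'd, not mem_get()'d. */
    struct {
        struct toy_header not_a_pool_header;  /* zeroed, wrong magic */
        unsigned char payload[64];
    } plain;
    memset(&plain, 0, sizeof(plain));
    toy_mem_put(plain.payload);     /* rejected by the toy guard */
    return 0;
}
```
The real mem_put() does far more than this, and how it fails depends on what it finds in front of the pointer; per the trace above, on this setup it dereferenced something invalid and the process took SIGSEGV.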

Volume info
```
Volume Name: tank
Type: Distribute
Volume ID: f801b0c4-c1c4-4d28-9ff0-3a2ba2eb1919
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: g1:/data/gluster/tank/brick-89f393fe/brick
Options Reconfigured:
performance.md-cache-statfs: on
nfs.disable: on
transport.address-family: inet
```
where /data/gluster/tank/brick-89f393fe is a ZFS mount.

Comment 1 Worker Ant 2019-01-10 19:17:17 UTC
REVIEW: https://review.gluster.org/22009 (performance/md-cache: Fix a crash when statfs caching is enabled) posted (#1) for review on master by Vijay Bellur
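
The review above targets md-cache's statfs path. Without quoting the actual change, the usual shape of a fix for this failure mode is to detach and free the translator-private local before the generic unwind runs, so FRAME_DESTROY never passes it to mem_put(). Below is a minimal, self-contained sketch of that pattern; toy_frame, toy_frame_destroy and toy_safe_unwind are invented names, not the md-cache or libglusterfs API.
```
/* Sketch of the "detach frame->local before unwinding" pattern with toy
 * types. Not the actual patch; it only shows the ownership handoff that
 * keeps a non-pooled local away from the pool allocator's teardown. */
#include <stdlib.h>

struct toy_frame {
    void *local;                 /* translator-private state */
};

/* Generic teardown, analogous in spirit to FRAME_DESTROY: it assumes any
 * remaining local is pool-managed and must not be handed anything else.
 * abort() stands in for that misuse here. */
static void toy_frame_destroy(struct toy_frame *frame)
{
    if (frame->local)
        abort();                 /* would be mem_put(frame->local) */
    free(frame);
}

/* Safe unwind: take ownership of local, clear it on the frame, destroy the
 * frame, then free the local with the allocator that created it. */
static void toy_safe_unwind(struct toy_frame *frame)
{
    void *local = frame->local;
    frame->local = NULL;         /* teardown now has nothing to mem_put() */
    toy_frame_destroy(frame);
    free(local);
}

int main(void)
{
    struct toy_frame *frame = calloc(1, sizeof(*frame));
    if (!frame)
        return 1;
    frame->local = calloc(1, 64);   /* analogous to a GF_CALLOC'd local */
    toy_safe_unwind(frame);         /* no crash: local never reaches the pool */
    return 0;
}
```
If md-cache already has a local-aware unwind wrapper, the fix may be as small as using it on this path, but the posted review is the authoritative description of the change.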

Comment 2 Worker Ant 2019-01-11 03:24:11 UTC
REVIEW: https://review.gluster.org/22009 (performance/md-cache: Fix a crash when statfs caching is enabled) posted (#3) for review on master by Raghavendra G

Comment 3 Shyamsundar 2019-03-25 16:30:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/