Bug 1384088

Summary: Ganesha crashes while doing IO when md-cache is enabled on the volume.
Product: Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED NOTABUG
QA Contact: Shashank Raj <sraj>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: jthottan, kkeithle, ndevos, rhs-bugs, sashinde, skoduri, sraj, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-14 07:14:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Shashank Raj 2016-10-12 13:46:18 UTC
Description of problem:

Ganesha crashes while doing IO when md-cache is enabled on the volume.

Version-Release number of selected component (if applicable):

Using 3.2.0 ganesha bits with a private build of md-cache related fixes (upgraded on top of RHGS 3.1.3).

[root@dhcp43-92 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.3-0.39.git97d1dde.el7.x86_64
nfs-ganesha-debuginfo-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-2.el7rhgs.x86_64

[root@dhcp43-92 ~]# rpm -qa|grep glusterfs
glusterfs-libs-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-ganesha-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-cli-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-debuginfo-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-client-xlators-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-server-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-api-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-geo-replication-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-fuse-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-3.8.3-0.39.git97d1dde.el7.x86_64
glusterfs-rdma-3.8.3-0.39.git97d1dde.el7.x86_64

[root@dhcp43-92 ~]# rpm -qa|grep libntirpc
libntirpc-1.4.1-1.el7rhgs.x86_64

How reproducible:

3/3

Steps to Reproduce:
1. Create a ganesha cluster and create a volume.
2. Enable ganesha on the volume and enable md-cache related parameters:

  # gluster volume set <volname> features.cache-invalidation on
  # gluster volume set <volname> features.cache-invalidation-timeout 600
  # gluster volume set <volname> performance.stat-prefetch on
  # gluster volume set <volname> performance.cache-invalidation on
  # gluster volume set <volname> performance.md-cache-timeout 600

3. Mount the volume on one client and create some files. Observe that ganesha crashes on all nodes with a segfault error and the following bt:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f28e82c8280 (LWP 11371)]
0x00007f290d9095f7 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f290d9095f7 in raise () from /lib64/libc.so.6
#1  0x00007f290d90ace8 in abort () from /lib64/libc.so.6
#2  0x00007f290d949327 in __libc_message () from /lib64/libc.so.6
#3  0x00007f290d951053 in _int_free () from /lib64/libc.so.6
#4  0x00007f290af76fe1 in GLUSTERFSAL_UP_Thread (Arg=0x7f29106dba50)
    at /usr/src/debug/nfs-ganesha-2.4.0/src/FSAL/FSAL_GLUSTER/fsal_up.c:225
#5  0x00007f290e2fddc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f290d9caced in clone () from /lib64/libc.so.6
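
For convenience when re-running the reproducer, the md-cache settings from step 2 can be applied in one pass. This is only a minimal sketch, assuming a POSIX shell: the function name set_md_cache and the GLUSTER variable are hypothetical helpers and not part of gluster. By default it only prints the commands; set GLUSTER=gluster to actually run them on a node with the gluster CLI.

```shell
#!/bin/sh
# Sketch: apply the md-cache related options from step 2 to one volume.
# set_md_cache and GLUSTER are hypothetical names used for illustration.
# With GLUSTER unset, the commands are echoed (dry run) instead of executed.

set_md_cache() {
    vol="$1"
    for opt in \
        "features.cache-invalidation on" \
        "features.cache-invalidation-timeout 600" \
        "performance.stat-prefetch on" \
        "performance.cache-invalidation on" \
        "performance.md-cache-timeout 600"
    do
        # $opt is intentionally unquoted so it splits into option name and value.
        # Stop at the first failing command.
        ${GLUSTER:-echo gluster} volume set "$vol" $opt || return 1
    done
}

# Usage (dry run):   set_md_cache <volname>
# Usage (real run):  GLUSTER=gluster set_md_cache <volname>
```

The dry-run default is a deliberate choice so the script can be reviewed before it touches a volume.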


Actual results:

Ganesha crashes while doing IO when md-cache is enabled on the volume.

Expected results:

There should not be any crashes.

Additional info:

core files will be attached.

Comment 2 Shashank Raj 2016-10-12 13:51:27 UTC
core files can be accessed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1384088

Comment 4 Shashank Raj 2016-10-14 07:14:52 UTC
With the latest provided private build [1] on top of 3.2.0, I don't see any crashes with basic I/O. Hence closing this bug.

[1]: http://rhsqe-repo.lab.eng.blr.redhat.com/scratchbuilds/md-cache-october13/