Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1362621

Summary: Ganesha crashes during multithreaded reads on v3 mounts
Product: [Community] GlusterFS
Reporter: Ambarish <asoman>
Component: ganesha-nfs
Assignee: Soumya Koduri <skoduri>
Status: CLOSED DUPLICATE
QA Contact: Ambarish <asoman>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.8.1
CC: asoman, bugs, jthottan, kkeithle, ndevos, skoduri, sraj
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-04 08:15:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ambarish 2016-08-02 16:38:06 UTC
Description of problem:
-----------------------

Ganesha crashed on 2 of the 4 nodes during multithreaded iozone reads from 4 clients with 16 threads.

Exact workload: iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

The same issue is reproducible by creating files on the mount point with the smallfile tool and then reading them back in a multithreaded, distributed way, as sketched below.
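
A rough smallfile reproduction sketch, assuming the upstream smallfile_cli.py tool, passwordless ssh between the clients, and an NFSv3 mount at /mnt/testvol on every client; hostnames, paths, file counts and sizes below are illustrative, not the exact values used in this run:

# create files from all 4 clients in parallel (hosts and mount path are examples)
python smallfile_cli.py --operation create --top /mnt/testvol/smf \
    --host-set client1,client2,client3,client4 --threads 4 --files 10000 --file-size 64
# read the same files back, multithreaded and distributed across the clients
python smallfile_cli.py --operation read --top /mnt/testvol/smf \
    --host-set client1,client2,client3,client4 --threads 4 --files 10000 --file-size 64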

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

[root@gqas015 ~]# rpm -qa|grep ganesha

glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-debuginfo-2.4.0-0.14dev26.el7.centos.x86_64


How reproducible:
-----------------

2/4

Steps to Reproduce:
-------------------

1.  The setup consisted of 4 clients and 4 servers. Mount the gluster volume via NFSv3 on each client, one client per server (a sample mount command is sketched after step 3).

2.  Run multithreaded iozone sequential writes in a distributed way.

iozone -+m <config file> -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16

3.  Run sequential reads the same way:

iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16
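
For step 1, a sample NFSv3 mount from one client; the server hostname, export path and mount point are assumptions and depend on how the Ganesha export is configured (Path/Pseudo):

# mount the Ganesha-exported volume over NFSv3 (hostname, export path and mount point are examples)
mount -t nfs -o vers=3 gqas001.sbu.lab.eng.bos.redhat.com:/testvol /mnt/testvol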

Actual results:
---------------

Ganesha crashed on 2 of the 4 nodes.

Expected results:
----------------

Ganesha should not crash.

Additional info:
----------------

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 9e8d9c1a-33da-4645-a6ad-630df25cb654
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas015 ~]#
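
For triaging the crash itself, a rough way to pull a backtrace out of the core dump; the debuginfo package is already installed per the rpm list above, but the binary path and core location below are assumptions (on RHEL 7 the core typically lands under abrt):

# dump backtraces from every thread of the crashed ganesha.nfsd
# (replace /path/to/core with the actual core file, e.g. the one collected by abrt)
gdb -batch -ex 'thread apply all bt' /usr/bin/ganesha.nfsd /path/to/core > ganesha-bt.txt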

Comment 2 Soumya Koduri 2016-08-03 07:02:05 UTC
I see nfs-ganesha re-exporting the volume -

02/08/2016 06:28:15 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Operation not supported
02/08/2016 06:28:24 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'

Aug  2 06:28:55 gqas014 kernel: ganesha.nfsd[19062]: segfault at 7f2484946084 ip 00007f24bfb92210 sp 00007f24b4f94428 error 6 in libpthread-2.17.so[7f24bfb86000+16000]
Aug  2 06:28:57 gqas014 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Aug  2 06:28:57 gqas014 systemd: Unit nfs-ganesha.service entered failed state.
Aug  2 06:28:57 gqas014 systemd: nfs-ganesha.service failed.

The crash happened around the same time. So the volume is somehow being re-exported, and that re-export leads to the crash, which is being addressed as part of bug 1361520. Jiffin is building RPMs with the fix applied; please re-test once those are available. The re-export can be confirmed from the Ganesha log, as sketched below.
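
A quick way to confirm the repeated export: count the export-creation events in the Ganesha log (the log path is the usual default for this version and may differ on your setup):

# each hit is one "Volume testvol exported at" event; more than one means the volume was re-exported
grep -c "glusterfs_create_export" /var/log/ganesha.log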

Comment 4 Jiffin 2016-08-04 05:48:45 UTC
If the issue is not present with the new RPMs, can you please close this bug as a duplicate of BZ 1361520?