Description of problem:
-----------------------
Ganesha crashed on 2/4 nodes during multithreaded iozone reads from 4 clients with 16 threads.

Exact workload:
iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

The same issue is reproducible once you create files on the mount point using the smallfile tool and then read them in a multithreaded, distributed way.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@gqas015 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-debuginfo-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:
-----------------
2/4 nodes hit the crash.

Steps to Reproduce:
-------------------
1. Set up 4 clients and 4 servers. Mount the gluster volume via NFSv3, with each client mounting from one server.
2. Run multithreaded iozone sequential writes in a distributed way (a sketch of the -+m client list file appears at the end of this report):
   iozone -+m <config file> -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
3. Run sequential reads the same way:
   iozone -+m <config file> -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

Actual results:
---------------
Ganesha crashed on 2/4 nodes.

Expected results:
-----------------
Ganesha should not crash.

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 9e8d9c1a-33da-4645-a6ad-630df25cb654
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas015 ~]#
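For reference, the actual -+m client list file was not attached; below is a minimal sketch of how the distributed run is driven. Hostnames, mount points, and file paths are placeholders/assumptions, not the lab values used in the test.

# Hypothetical reproduction sketch (client names, mount point, and iozone
# binary path are assumptions):
cat > /root/iozone_clients.cfg <<'EOF'
client1.example.com /mnt/testvol /usr/bin/iozone
client2.example.com /mnt/testvol /usr/bin/iozone
client3.example.com /mnt/testvol /usr/bin/iozone
client4.example.com /mnt/testvol /usr/bin/iozone
EOF

# v3 mount on each client (one server per client), then the write pass
# followed by the read pass that triggers the crash:
mount -t nfs -o vers=3 gqas001.sbu.lab.eng.bos.redhat.com:/testvol /mnt/testvol
iozone -+m /root/iozone_clients.cfg -+h <hostname> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
iozone -+m /root/iozone_clients.cfg -+h <hostname> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16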
I see nfs-ganesha re-exporting the volume:

02/08/2016 06:28:15 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Operation not supported
02/08/2016 06:28:24 : epoch 57a071e6 : gqas014.sbu.lab.eng.bos.redhat.com : ganesha.nfsd-19047[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvol exported at : '/'

Aug 2 06:28:55 gqas014 kernel: ganesha.nfsd[19062]: segfault at 7f2484946084 ip 00007f24bfb92210 sp 00007f24b4f94428 error 6 in libpthread-2.17.so[7f24bfb86000+16000]
Aug 2 06:28:57 gqas014 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Aug 2 06:28:57 gqas014 systemd: Unit nfs-ganesha.service entered failed state.
Aug 2 06:28:57 gqas014 systemd: nfs-ganesha.service failed.

The crash happened around the same time. So the volume is somehow being re-exported, which results in the crash; this is being addressed as part of bug 1361520. Jiffin is building RPMs with the fix applied. Please re-test once those are available.
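If the crash still reproduces with the fixed RPMs, a full backtrace from the core would help confirm whether it is the same re-export path. A minimal sketch, assuming the default /usr/bin/ganesha.nfsd install path and an abrt-managed core dump (the exact dump directory below is a placeholder; nfs-ganesha-debuginfo is already installed per the rpm list above):

# Hypothetical core inspection on the crashed node; the core path is a
# placeholder (on RHEL 7 the dump usually lands under /var/spool/abrt/):
gdb /usr/bin/ganesha.nfsd /var/spool/abrt/<ccpp-dump-dir>/coredump
(gdb) set pagination off
(gdb) thread apply all bt full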
If the issue is not present with the new RPMs, can you please close this bug as a duplicate of BZ1361520?