Bug 1476563 - [Stress] : Ganesha v4 mounts timed out during MTSH
Summary: [Stress] : Ganesha v4 mounts timed out during MTSH
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503134
 
Reported: 2017-07-30 11:05 UTC by Ambarish
Modified: 2018-09-24 12:36 UTC
CC List: 13 users

Fixed In Version: nfs-ganesha-2.5.4-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:53:35 UTC
Embargoed:


Links
Red Hat Product Errata RHEA-2018:2610 (last updated 2018-09-04 06:55:06 UTC)

Description Ambarish 2017-07-30 11:05:26 UTC
Description of problem:
-----------------------

4-node cluster, with MTSH in progress and continuous IO from 3 mounts.


A v4 mount timed out on one of my clients:

[root@gqac007 /]# mount -t nfs -o vers=4.0 192.168.97.161:/testvol /gluster-mount/ -v
mount.nfs: timeout set for Sun Jul 30 06:36:57 2017
mount.nfs: trying text-based options 'vers=4.0,addr=192.168.97.161,clientaddr=192.168.97.147'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
[root@gqac007 /]# 

A v3 mount succeeded, though:

[root@gqac007 /]# mount -t nfs -o vers=3 192.168.97.161:/testvol /gluster-mount/ -v
mount.nfs: timeout set for Sun Jul 30 06:59:38 2017
mount.nfs: trying text-based options 'vers=3,addr=192.168.97.161'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.97.161 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.97.161 prog 100005 vers 3 prot UDP port 20048
mount.nfs: portmap query retrying: RPC: Timed out
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 192.168.97.161 prog 100005 vers 3 prot TCP port 20048
[root@gqac007 /]# 


Mounts from other servers succeeded as well:

[root@gqac007 /]# mount -t nfs -o vers=4.0 192.168.97.162:/testvol /gluster-mount/ -v
mount.nfs: timeout set for Sun Jul 30 07:02:06 2017
mount.nfs: trying text-based options 'vers=4.0,addr=192.168.97.162,clientaddr=192.168.97.147'
[root@gqac007 /]# 
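
For reference, a hedged way to check from the client whether the NFS service on the affected server still answers RPC calls at all (to separate a hung ganesha.nfsd from a network problem). The rpcinfo probes below are an illustration only and were not part of the original report; they assume the server registers NFS with rpcbind, which nfs-ganesha does by default:

# List the RPC services registered on the affected server,
# then make a NULL call to the NFS program (100003) over TCP for v4 and v3.
rpcinfo -p 192.168.97.161
rpcinfo -t 192.168.97.161 nfs 4
rpcinfo -t 192.168.97.161 nfs 3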


Version-Release number of selected component (if applicable):
---------------------------------------------------------------

nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-36.el7rhgs.x86_64



How reproducible:
----------------

1/1


Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 41c5aa32-ec60-4591-ae6d-f93a0b13b47c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
cluster.shd-wait-qlength: 655536
cluster.shd-max-threads: 64
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
ganesha.enable: on
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
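
For context, the non-default options listed above would normally be applied with the gluster CLI. The commands below are an illustrative sketch only; the exact commands used on this cluster are not recorded in the report:

# Illustration: how a few of the options shown above are typically set.
gluster volume set testvol cluster.shd-max-threads 64
gluster volume set testvol client.event-threads 4
gluster volume set testvol server.event-threads 4
# Export the volume through nfs-ganesha (corresponds to ganesha.enable: on above).
gluster volume set testvol ganesha.enable on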

Comment 4 Ambarish 2017-07-30 11:17:28 UTC
Each time I try to mount, I see this in /var/log/messages:

Jul 30 07:16:53 gqas013 lrmd[19750]:  notice: gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock_monitor_10000:28728:stderr [ 0+1 records in ]
Jul 30 07:16:53 gqas013 lrmd[19750]:  notice: gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock_monitor_10000:28728:stderr [ 0+1 records out ]
Jul 30 07:16:53 gqas013 lrmd[19750]:  notice: gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock_monitor_10000:28728:stderr [ 390 bytes (390 B) copied, 0.00407114 s, 95.8 kB/s ]

Comment 5 Ambarish 2017-07-30 12:50:29 UTC
I have gluster v heal info running periodically on that server too, to see when the heal completes.
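
A minimal sketch of that periodic check, for reference (assumes it runs on a cluster node with the gluster CLI; the exact loop used is not recorded in the report):

# Poll heal info until no entries remain pending on any brick.
while true; do
    pending=$(gluster volume heal testvol info | awk '/Number of entries:/ {sum += $NF} END {print sum + 0}')
    echo "$(date): ${pending} entries pending heal"
    [ "${pending}" -eq 0 ] && break
    sleep 60
done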

Comment 9 Frank Filz 2017-07-31 20:52:00 UTC
What are the other threads doing?

I assume this is a deadlock; it may already be fixed by patches in 2.5.
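
A hedged sketch of how the thread state could be captured from the running ganesha.nfsd process to answer that question (assumes gdb and the matching debuginfo packages are installed on the affected server):

# Dump backtraces of all threads in the running ganesha.nfsd process.
pid=$(pidof ganesha.nfsd)
gdb -batch -p "$pid" -ex "thread apply all bt" > /tmp/ganesha-threads.$(date +%s).txt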

Comment 10 Frank Filz 2017-07-31 20:55:08 UTC
Are both this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1476559 occurring at the same time? If so, I think it's one deadlock bug...

Comment 11 Jiffin 2017-08-02 05:17:10 UTC
(In reply to Frank Filz from comment #10)
> Are both this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1476559
> occurring at the same time? If so, I think it's one deadlock bug...

Yeah, both bugs are occurring on the same server, for two different clients.

Comment 16 Kaleb KEITHLEY 2017-10-05 11:25:57 UTC
POST with rebase to nfs-ganesha-2.5.x

Comment 22 errata-xmlrpc 2018-09-04 06:53:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610

