Bug 1381940

Summary:	Ganesha crashes on one node during volume restart when performance.client-io-threads is off.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Shashank Raj <sraj>
Component:	nfs-ganesha	Assignee:	Jiffin <jthottan>
Status:	CLOSED ERRATA	QA Contact:	surabhi <sbhaloth>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.2	CC:	jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhinduja, rhs-bugs, sbhaloth, skoduri, storage-qa-internal
Target Milestone:	---	Keywords:	Triaged
Target Release:	RHGS 3.2.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.8.4-3	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-03-23 06:24:05 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1351528

Description Shashank Raj 2016-10-05 11:52:40 UTC

Description of problem:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off.

Version-Release number of selected component (if applicable):

[root@dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64


How reproducible:

Consistent

Steps to Reproduce:
1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Observe that, most of the time, ganesha crashes on one of the nodes with the following messages in ganesha.log:

05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] unregister_fsal :FSAL :CRIT :Unregister FSAL GLUSTER with non-zero refcount=1
05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] glusterfs_unload :FSAL :CRIT :FSAL Gluster unable to unload.  Dying ...

5. No backtrace (bt) is seen in gdb:

[Inferior 1 (process 6944) exited with code 02]
(gdb) bt
No stack.
(gdb) 

6. Apart from this, there is another observation, seen only on RHEL 6:

Every time ganesha crashes on a node, an unwanted entry is created under the exports folder.

[root@dhcp42-59 exports]# pwd
/var/run/gluster/shared_storage/nfs-ganesha/exports
[root@dhcp42-59 exports]# ls -ltr
total 1
----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
----------. 1 root root   0 Oct  5 19:59 sedCneQYU
----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
-rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
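
The log check in step 4 can be sketched as a small shell helper. This is only an illustration: the function name is hypothetical, and the default log path is an assumption (the report only names the file "ganesha.log"), so pass the real path for your installation.

```shell
# Hypothetical helper: return 0 if the log contains either of the two
# FSAL unload failure messages quoted in step 4.
# The default path below is an assumption; the report does not give one.
ganesha_unload_crash_seen() {
    log="${1:-/var/log/ganesha.log}"
    grep -E -q 'Unregister FSAL GLUSTER with non-zero refcount|FSAL Gluster unable to unload' "$log"
}
```

For example, `ganesha_unload_crash_seen /path/to/ganesha.log && echo "crash signature found"` could be run on each node after the volume restart.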

Actual results:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off. On RHEL 6 only, unwanted entries also appear under the exports folder whenever this crash is seen.

Expected results:

There should not be any crashes.

Additional info:

Comment 2 Jiffin 2016-10-13 09:28:09 UTC

(In reply to Shashank Raj from comment #0)
> Description of problem:
> 
> Ganesha crashes on one node during volume restart when
> performance.client-io-threads is off.
> 
> 6. Also apart from this there is another observation which is seen only on
> RHEL 6:
> 


I hit a similar issue upstream on one of the nodes. AFAIK it was a package issue (conflicting or mismatched packages); once I cleaned everything up, it worked fine. I suspect this behavior has a similar cause.

> everytime it crashes on one node, i see an unwanted entry getting created
> under exports folder.
> 
> [root@dhcp42-59 exports]# pwd
> /var/run/gluster/shared_storage/nfs-ganesha/exports
> [root@dhcp42-59 exports]# ls -ltr
> total 1
> ----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
> ----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
> ----------. 1 root root   0 Oct  5 19:59 sedCneQYU
> ----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
> ----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
> ----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
> ----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
> ----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
> ----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
> -rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
> ----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
> 

These are temporary files created by the "sed -i" operation in the script. They are left behind if the operation does not complete successfully (here, possibly because the ganesha process aborted).
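
A minimal sketch for spotting such leftovers, assuming they all follow the naming seen in the listing ("sed" plus six characters). The function name is hypothetical, and the helper only lists files; it does not delete anything.

```shell
# Hypothetical helper: list regular files directly under DIR whose names
# look like sed's in-place temporary files ("sed" + six characters).
# The default directory is the exports path shown in the listing.
list_sed_leftovers() {
    dir="${1:-/var/run/gluster/shared_storage/nfs-ganesha/exports}"
    find "$dir" -maxdepth 1 -type f -name 'sed??????' | sort
}
```

Review the output before removing anything, since the pattern would also match an unrelated file that happens to have a nine-character name starting with "sed".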
 

Comment 5 Jiffin 2016-11-08 06:32:27 UTC

The patch has been merged downstream (https://code.engineering.redhat.com/gerrit/#/c/86103/) and is available in the latest gluster bits.

Comment 8 surabhi 2016-11-22 11:23:28 UTC

With the following steps, the issue is no longer seen:

1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Observe the logs for any crashes or error messages.

No crash is seen across multiple stop/start cycles of the gluster volume with performance.client-io-threads set to off.

With performance.client-io-threads on, crashes are still seen on volume stop; that is tracked in another BZ.

Moving this BZ to verified.

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
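
The repeated stop/start verification could be scripted roughly as follows. This is a sketch, not the procedure QA actually ran: the function name, the cycle count, and the delay are assumptions, and the pgrep check only covers the node the script runs on.

```shell
# Hypothetical helper: set performance.client-io-threads to off, then
# stop and start VOLUME COUNT times, failing as soon as ganesha.nfsd
# is no longer running on the local node.
restart_cycles() {
    vol="$1"; count="${2:-5}"; delay="${3:-10}"
    gluster volume set "$vol" performance.client-io-threads off || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # --mode=script suppresses the interactive confirmation on stop
        gluster --mode=script volume stop "$vol" || return 1
        gluster --mode=script volume start "$vol" || return 1
        sleep "$delay"   # settle time before checking; the value is a guess
        if ! pgrep -x ganesha.nfsd >/dev/null; then
            echo "ganesha.nfsd not running after cycle $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

A call might look like `restart_cycles ozone 5`, where the volume name "ozone" is inferred from export.ozone.conf in the original description. Since the crash was reported on only one node, the process check would need to be repeated on the other nodes as well.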

Comment 10 errata-xmlrpc 2017-03-23 06:24:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html