1381940 – Ganesha crashes on one node during volume restart when performance.client-io-threads is off.


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	nfs-ganesha
Sub Component:
Version:	rhgs-3.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Jiffin
QA Contact:	surabhi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528

Reported:	2016-10-05 11:52 UTC by Shashank Raj
Modified:	2017-03-23 06:24 UTC
CC List:	10 users
Fixed In Version:	glusterfs-3.8.4-3
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-03-23 06:24:05 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2017:0493	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.2.0 nfs-ganesha bug fix and enhancement update	2017-03-23 09:19:13 UTC

Description Shashank Raj 2016-10-05 11:52:40 UTC

Description of problem:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off.

Version-Release number of selected component (if applicable):

[root@dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64


How reproducible:

Consistent

Steps to Reproduce:
1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Observe that most of the time, ganesha crashes on one of the nodes with the messages below in ganesha.log:

05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] unregister_fsal :FSAL :CRIT :Unregister FSAL GLUSTER with non-zero refcount=1
05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] glusterfs_unload :FSAL :CRIT :FSAL Gluster unable to unload.  Dying ...

5. No backtrace is seen:

[Inferior 1 (process 6944) exited with code 02]
(gdb) bt
No stack.
(gdb) 

6. Apart from this, there is another observation, seen only on RHEL 6:

Every time it crashes on one node, I see an unwanted entry created under the exports folder.

[root@dhcp42-59 exports]# pwd
/var/run/gluster/shared_storage/nfs-ganesha/exports
[root@dhcp42-59 exports]# ls -ltr
total 1
----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
----------. 1 root root   0 Oct  5 19:59 sedCneQYU
----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
-rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
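
For reference, steps 2 and 3 above can be sketched as gluster CLI calls. This is a minimal sketch, not the exact commands used in this report: the helper name `repro_cmds` is hypothetical, the volume name `ozone` is only inferred from `export.ozone.conf` in the listing above, and `--mode=script` is used here to suppress the confirmation prompt on `volume stop`. The function only prints the commands so they can be reviewed before being run on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper: print the gluster CLI calls for steps 2 and 3.
# Nothing is executed here; the caller decides whether to run the output.
repro_cmds() {
    vol="$1"   # volume name (assumption: e.g. "ozone")
    printf '%s\n' \
        "gluster volume set $vol performance.client-io-threads off" \
        "gluster --mode=script volume stop $vol" \
        "gluster --mode=script volume start $vol"
}
```

Running `repro_cmds ozone` prints the three commands; piping the output to `sh` on a cluster node would execute them in order.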

Actual results:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off. On RHEL 6 only, unwanted entries also appear under the exports folder whenever this crash is seen.

Expected results:

There should not be any crashes.

Additional info:

Comment 2 Jiffin 2016-10-13 09:28:09 UTC

(In reply to Shashank Raj from comment #0)
> 4.Observe that most of the times, ganesha crashes on one of the nodes with
> below messages in ganesha.log
> 
> 05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com :
> ganesha.nfsd-23973[dbus_heartbeat] unregister_fsal :FSAL :CRIT :Unregister
> FSAL GLUSTER with non-zero refcount=1
> 05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com :
> ganesha.nfsd-23973[dbus_heartbeat] glusterfs_unload :FSAL :CRIT :FSAL
> Gluster unable to unload.  Dying ...
> 
> 5.No bt is seen:
> 
> [Inferior 1 (process 6944) exited with code 02]
> (gdb) bt
> No stack.
> (gdb) 
> 
> 6. Also apart from this there is another observation which is seen only on
> RHEL 6:
> 


I also hit a similar issue upstream on one of the nodes. AFAIK it was a package issue (conflicting or mismatched packages); after I cleaned everything up, it worked fine. I suspect this behavior is similar.

> everytime it crashes on one node, i see an unwanted entry getting created
> under exports folder.
> 
> [root@dhcp42-59 exports]# pwd
> /var/run/gluster/shared_storage/nfs-ganesha/exports
> [root@dhcp42-59 exports]# ls -ltr
> total 1
> ----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
> ----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
> ----------. 1 root root   0 Oct  5 19:59 sedCneQYU
> ----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
> ----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
> ----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
> ----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
> ----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
> ----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
> -rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
> ----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
> 

These are temporary files created by the "sed -i" operation in the script. If the operation fails for some reason (here, possibly because the ganesha process aborted), the temporary file is left behind.
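
Leftovers of this kind can be listed with a short sketch like the one below. It assumes GNU sed's temporary-file naming ("sed" followed by six random characters, which matches the names in the quoted listing); the function name `list_sed_leftovers` is hypothetical. It only lists candidates and does not delete anything.

```shell
#!/bin/sh
# Hypothetical helper: list files that look like stale "sed -i" temp files.
# Assumes GNU sed naming: "sed" followed by exactly six random characters.
list_sed_leftovers() {
    dir="$1"
    # -maxdepth 1: look only in the directory itself; -type f: regular files.
    find "$dir" -maxdepth 1 -type f -name 'sed??????' | sort
}
```

For example, `list_sed_leftovers /var/run/gluster/shared_storage/nfs-ganesha/exports` would print the stale entries while leaving export.ozone.conf out of the result.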
 

Comment 5 Jiffin 2016-11-08 06:32:27 UTC

The patch has been merged downstream (https://code.engineering.redhat.com/gerrit/#/c/86103/) and is available in the latest gluster bits.

Comment 8 surabhi 2016-11-22 11:23:28 UTC

With the following steps, the issue is no longer seen:

1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Check for crashes and error messages in the logs.

No crash is seen across multiple stops and starts of the gluster volume with client-io-threads set to off.
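
One way to check the log for the two CRIT messages from the original report after each restart is sketched below. The function name `count_fsal_crit` is hypothetical, and the log path in the usage note is an assumption that may differ between installations.

```shell
#!/bin/sh
# Hypothetical helper: count the CRIT messages from the original report
# (logged by unregister_fsal and glusterfs_unload) in a ganesha log file.
count_fsal_crit() {
    log="$1"
    # grep -c prints the number of matching lines; its exit status is
    # non-zero when there are no matches, even though it still prints 0.
    grep -c -E '(unregister_fsal|glusterfs_unload) :FSAL :CRIT :' "$log"
}
```

For example, `count_fsal_crit /var/log/ganesha.log` (path assumed) should print 0 on a node where the crash did not occur.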

With client-io-threads on, crashes are seen on volume stop; these are tracked in another BZ.

Moving this BZ to verified.

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

Comment 10 errata-xmlrpc 2017-03-23 06:24:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html
