Bug 1381940

Summary: Ganesha crashes on one node during volume restart when performance.client-io-threads is off.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA
QA Contact: surabhi <sbhaloth>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhinduja, rhs-bugs, sbhaloth, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: RHGS 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 06:24:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528

Description Shashank Raj 2016-10-05 11:52:40 UTC
Description of problem:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off.

Version-Release number of selected component (if applicable):

[root@dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64


How reproducible:

Consistent

Steps to Reproduce:
1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Observe that, most of the time, ganesha crashes on one of the nodes with the following messages in ganesha.log:

05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] unregister_fsal :FSAL :CRIT :Unregister FSAL GLUSTER with non-zero refcount=1
05/10/2016 20:13:12 : epoch dcf30000 : dhcp42-96.lab.eng.blr.redhat.com : ganesha.nfsd-23973[dbus_heartbeat] glusterfs_unload :FSAL :CRIT :FSAL Gluster unable to unload.  Dying ...
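
To spot these entries when scanning a long ganesha.log, something like the following could be used. It is only a sketch: the regular expression is inferred from the two lines quoted above, and the function names (`parse_line`, `fsal_crit_entries`) are made up for illustration, so other log layouts may not match.

```python
import re

# Field layout inferred from the two log lines quoted in this report;
# this is an assumption, not a documented ganesha.log grammar.
LOG_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) : "
    r"epoch (?P<epoch>\w+) : (?P<host>\S+) : "
    r"(?P<proc>[\w.]+)-(?P<pid>\d+)\[(?P<thread>[^\]]+)\] "
    r"(?P<func>\w+) :(?P<component>\w+) :(?P<level>\w+) :(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def fsal_crit_entries(lines):
    """Return the parsed entries that the FSAL component logged at CRIT level."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["component"] == "FSAL" and entry["level"] == "CRIT":
            hits.append(entry)
    return hits
```

For example, `fsal_crit_entries(open("ganesha.log"))` would return one dictionary per matching line, with the reporting function available under the `func` key.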

5. No backtrace is seen:

[Inferior 1 (process 6944) exited with code 02]
(gdb) bt
No stack.
(gdb) 

6. Apart from this, there is another observation, seen only on RHEL 6:

Every time it crashes on one node, I see an unwanted entry created under the exports folder.

[root@dhcp42-59 exports]# pwd
/var/run/gluster/shared_storage/nfs-ganesha/exports
[root@dhcp42-59 exports]# ls -ltr
total 1
----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
----------. 1 root root   0 Oct  5 19:59 sedCneQYU
----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
-rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
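
A small helper along these lines could list such leftovers without touching them. This is a sketch under assumptions: the name pattern ("sed" followed by six alphanumeric characters) is taken only from the listing above, and `stale_sed_temp_files` is a hypothetical name.

```python
import os
import re
import stat

# Assumed pattern: "sed" plus six alphanumeric characters, as in the listing above.
SED_TEMP_RE = re.compile(r"^sed[A-Za-z0-9]{6}$")


def stale_sed_temp_files(directory):
    """Return (name, permission bits, size) for files that look like sed leftovers."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if SED_TEMP_RE.match(name) and os.path.isfile(path):
            info = os.stat(path)
            found.append((name, stat.S_IMODE(info.st_mode), info.st_size))
    return found
```

The helper only reports. One of the entries above (sed0uVXuV) is 509 bytes, the same size as export.ozone.conf, so leftovers are better inspected before anything is deleted.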

Actual results:

Ganesha crashes on one node during volume restart when performance.client-io-threads is off. On RHEL 6 only, unwanted entries also appear under the exports folder whenever this crash is seen.

Expected results:

There should not be any crashes.

Additional info:

Comment 2 Jiffin 2016-10-13 09:28:09 UTC
(In reply to Shashank Raj from comment #0)
> 6. Also apart from this there is another observation which is seen only on
> RHEL 6:
> 


I hit a similar issue upstream on one of the nodes. AFAIK it was a package issue (conflicting or mismatched packages); once I cleaned everything up, it worked fine. I suspect this behavior has a similar cause.

> everytime it crashes on one node, i see an unwanted entry getting created
> under exports folder.
> 
> [root@dhcp42-59 exports]# pwd
> /var/run/gluster/shared_storage/nfs-ganesha/exports
> [root@dhcp42-59 exports]# ls -ltr
> total 1
> ----------. 1 root root   0 Oct  5 19:45 sedbj8MSj
> ----------. 1 root root   0 Oct  5 19:53 sedT7DQPC
> ----------. 1 root root   0 Oct  5 19:59 sedCneQYU
> ----------. 1 root root   0 Oct  5 20:01 sed1lcMM7
> ----------. 1 root root   0 Oct  5 20:02 sedSpRC1X
> ----------. 1 root root   0 Oct  5 20:02 sedAFz6LR
> ----------. 1 root root   0 Oct  5 20:13 sedVI3QsZ
> ----------. 1 root root 509 Oct  5 20:26 sed0uVXuV
> ----------. 1 root root   0 Oct  5 20:30 sedcl5uS8
> -rw-r--r--. 1 root root 509 Oct  5 20:33 export.ozone.conf
> ----------. 1 root root   0 Oct  5 20:33 sedSFcZDV
> 

These are temporary files created by the "sed -i" operation in the script. If that operation does not complete (here, possibly because the ganesha process aborted), the temporary file is left behind.
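
The mechanism can be modelled roughly as follows. This is a simplified illustration of an in-place edit done through a temporary file in the same directory, not the code of the script or of sed itself; `edit_in_place` and its `abort_before_rename` flag are hypothetical names used only to simulate an interrupted edit.

```python
import os
import tempfile


def edit_in_place(path, transform, abort_before_rename=False):
    """Rewrite `path` through a temp file in the same directory, then rename over it.

    Returns the temp file path if the edit was abandoned, otherwise None.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The "sed" prefix only mimics the names in the listing above.
    fd, tmp_path = tempfile.mkstemp(prefix="sed", dir=directory)
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for line in src:
            tmp.write(transform(line))
    if abort_before_rename:
        # Simulates the editing process dying here: the original file is
        # untouched and the temp file stays in the directory.
        return tmp_path
    os.replace(tmp_path, path)  # final step: swap the new content into place
    return None
```

Python's `mkstemp` chooses its own suffix and permissions, so the leftover names and modes will not match the listing exactly; the point is only that an edit interrupted before the final rename leaves an extra file next to the original.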
 

Comment 5 Jiffin 2016-11-08 06:32:27 UTC
The patch has been merged downstream (https://code.engineering.redhat.com/gerrit/#/c/86103/) and is available in the latest gluster bits.

Comment 8 surabhi 2016-11-22 11:23:28 UTC
With the following steps, the issue is no longer seen:

1. Create a ganesha cluster, create a volume, and enable ganesha on it.
2. Set performance.client-io-threads to off.
3. Stop the volume and then start it again.
4. Observe for any crashes or error messages in the logs:

No crash is seen across multiple stop/start cycles of the gluster volume with performance.client-io-threads set to off.

With performance.client-io-threads on, crashes are seen on volume stop; that is tracked in another BZ.
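
The stop/start cycle from steps 2 and 3 could be scripted roughly as below. This is a sketch, not the procedure actually used for verification: the helper names are hypothetical, the volume name is supplied by the caller, and `--mode=script` is used on the assumption that the stop confirmation prompt should be skipped.

```python
import subprocess


def restart_commands(volume):
    """Build the gluster CLI invocations for steps 2 and 3 as argument lists."""
    # --mode=script avoids the interactive confirmation on "volume stop".
    gluster = ["gluster", "--mode=script"]
    return [
        gluster + ["volume", "set", volume, "performance.client-io-threads", "off"],
        gluster + ["volume", "stop", volume],
        gluster + ["volume", "start", volume],
    ]


def run_restart_cycles(volume, cycles=1, runner=subprocess.run):
    """Set the option once, then stop and start the volume `cycles` times."""
    set_cmd, stop_cmd, start_cmd = restart_commands(volume)
    runner(set_cmd, check=True)
    for _ in range(cycles):
        runner(stop_cmd, check=True)
        runner(start_cmd, check=True)
```

Calling `run_restart_cycles("ozone", cycles=5)` on a cluster node would repeat the restart five times; "ozone" is only a guess at the volume name based on export.ozone.conf. Checking ganesha.log and the ganesha process on each node after every cycle is left to the caller.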

Moving this BZ to verified.

nfs-ganesha-2.4.1-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-1.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64

Comment 10 errata-xmlrpc 2017-03-23 06:24:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html