Bug 1667407

Summary: [Ganesha] Observed ganesha crash after setting 'ganesha.enable' to 'on' on volume which is not started
Product: [Community] GlusterFS
Reporter: Jiffin <jthottan>
Component: core
Assignee: Jiffin <jthottan>
Severity: medium
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, pasik
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1658050
Environment:
Last Closed: 2020-03-12 14:23:12 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1658050
Bug Blocks:

Description Jiffin 2019-01-18 11:48:51 UTC
+++ This bug was initially created as a clone of Bug #1658050 +++

Description of problem:
Ganesha crashed after setting 'ganesha.enable' to 'on' on a volume which is not started. The crash was observed on all nodes in the cluster.

Version-Release number of selected component (if applicable):
# rpm -qa | grep ganesha

How reproducible:

Steps to Reproduce:
1. Create a 6 node ganesha cluster.
2. Create a volume 'testvol'. Do not start the volume.
3. Set volume option 'ganesha.enable' to 'on' in 'testvol'.
4. Observe the ganesha crash after some time.
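The steps above map to gluster CLI commands roughly like the following. This is a sketch of the reproduction only: the peer probe and nfs-ganesha HA setup for the 6-node cluster are elided, and the hostnames and brick paths are placeholders.

```shell
# On an existing 6-node ganesha cluster (peer probe and ganesha HA setup
# already done; hostnames and brick paths are placeholders).
gluster volume create testvol replica 3 \
    server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1 \
    server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1

# Deliberately skip 'gluster volume start testvol'.

# Export the (stopped) volume via nfs-ganesha; the crash follows shortly.
gluster volume set testvol ganesha.enable on
```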

Actual results:
nfs-ganesha crashed on all nodes.

Expected results:
nfs-ganesha should not crash.

Additional info:

The glusterfs client is initialized twice for nfs-ganesha:
once via mgmt_rpc_notify() (the normal path for gfapi) and again via mgmt_cbk_spec() (a callback sent from glusterd at the end of the volume set command).

As a result, two io threads are created.
If the volume is not started, glfs_fini() destroys only one of the threads, leaving the context of the other thread invalid, which leads to the crash.
If the volume is in the started state, then after init the path init_export_root() -> mdcache_lookup_path() -> lookup -> ... -> priv_glfs_active_subvol() detects that there is an old subvolume
and sends a notify with the PARENT_DOWN event on it, so that the io thread created first is destroyed.

If the volume is not started, init fails, so this lookup path is never executed after init and the first io thread is never torn down.

Comment 1 Worker Ant 2019-01-18 11:59:36 UTC
REVIEW: https://review.gluster.org/22062 (graph: deactivate existing graph in glusterfs_graph_activate()) posted (#2) for review on master by jiffin tony Thottan

Comment 2 Worker Ant 2020-03-12 14:23:12 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1034 and will be tracked there from now on. Visit the GitHub issue URL for further details.