Description of problem:
I updated the nfs-ganesha export file of a volume to enable ACLs and executed refresh-config. The nfs-ganesha process then crashed on two nodes of the cluster.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el6rhs.x86_64
nfs-ganesha-2.2.0-5.el6rhs.x86_64

How reproducible:
Happened only once.

Steps to Reproduce:
1. Create a 6x2 type volume and start it.
2. Configure ganesha.
3. Export the volume.
4. Enable ACLs by setting Disable_ACL = false in /etc/ganesha/exports/export.<volname>.conf.
5. Propagate the change from step 4 to all the nodes using the command:
   /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ volname
6. pcs status

Actual results:
Before executing refresh-config:

# cat /etc/ganesha/exports/export.vol4.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file, copy it over to all the NFS-Ganesha nodes
# and run ganesha-ha.sh --refresh-config.

EXPORT{
      Export_Id = 4;
      Path = "/vol4";
      FSAL {
           name = "GLUSTER";
           hostname = "localhost";
           volume = "vol4";
      }
      Access_type = RW;
      Disable_ACL = true;
      Squash = "No_root_squash";
      Pseudo = "/vol4";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

# pcs status
Cluster name: nozomer
Last updated: Wed Jul 15 20:11:57 2015
Last change: Wed Jul 15 00:08:36 2015
Stack: cman
Current DC: nfs11 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured

Online: [ nfs11 nfs12 nfs13 nfs14 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 nfs11-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs11
 nfs11-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs12
 nfs12-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs13
 nfs13-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs14
 nfs14-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs14

Updated the export file and executed refresh-config from the same node:

# time bash /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ vol4
export.vol4.conf                              100%  550     0.5KB/s   00:00
Error org.freedesktop.DBus.Error.InvalidArgs: lookup_export failed with Export id not found
export.vol4.conf                              100%  550     0.5KB/s   00:00
Error org.freedesktop.DBus.Error.InvalidArgs: lookup_export failed with Export id not found
export.vol4.conf                              100%  550     0.5KB/s   00:00
method return sender=:1.131 -> dest=:1.136 reply_serial=2
Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
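For reference, the DBus errors above correspond to the remove/add cycle that refresh-config performs on each node: the script copies the edited export file over and then asks the running ganesha daemon to re-read it through its export manager DBus interface. A rough per-node sketch (the export id and paths are taken from this report; the exact arguments the script constructs may differ between versions):

    # copy the edited export file to the peer node (the script does this via scp)
    scp /etc/ganesha/exports/export.vol4.conf nfs12:/etc/ganesha/exports/

    # remove the old export; the "Export id not found" errors above mean this
    # call targeted an export id the daemon did not have loaded
    dbus-send --print-reply --system --dest=org.ganesha.nfsd \
        /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport \
        uint16:4

    # re-add the export from the copied config; the NoReply error above is
    # consistent with the daemon aborting while servicing this call (see the
    # backtrace below)
    dbus-send --print-reply --system --dest=org.ganesha.nfsd \
        /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport \
        string:/etc/ganesha/exports/export.vol4.conf \
        string:"EXPORT(Export_Id=4)"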
[root@nfs11 ~]# pcs status
Cluster name: nozomer
Last updated: Wed Jul 15 20:13:39 2015
Last change: Wed Jul 15 20:13:30 2015
Stack: cman
Current DC: nfs11 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
18 Resources configured

Online: [ nfs11 nfs12 nfs13 nfs14 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs12 nfs13 nfs14 ]
     Stopped: [ nfs11 ]
 nfs11-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs11-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs12-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs13-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs13-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs14-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs14
 nfs12-dead_ip-1        (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-dead_ip-1        (ocf::heartbeat:Dummy):     Started nfs12

Here is the backtrace:

(gdb) bt
#0  0x0000003d0c232625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003d0c233e05 in abort () at abort.c:92
#2  0x0000003d0c22b74e in __assert_fail_base (fmt=<value optimized out>, assertion=0x5af07f "export->refcnt == 0",
    file=0x5aef68 "/builddir/build/BUILD/nfs-ganesha-2.2.0/src/support/export_mgr.c", line=<value optimized out>,
    function=<value optimized out>) at assert.c:96
#3  0x0000003d0c22b810 in __assert_fail (assertion=0x5af07f "export->refcnt == 0",
    file=0x5aef68 "/builddir/build/BUILD/nfs-ganesha-2.2.0/src/support/export_mgr.c", line=249,
    function=0x5af9a2 "free_export") at assert.c:105
#4  0x000000000051a36d in free_export ()
#5  0x0000000000507165 in export_init ()
#6  0x0000000000534643 in proc_block ()
#7  0x0000000000535319 in load_config_from_node ()
#8  0x000000000051c43f in gsh_export_addexport ()
#9  0x000000000052edfc in dbus_message_entrypoint ()
#10 0x0000003d0f21cefe in _dbus_object_tree_dispatch_and_unlock (tree=0x218db20, message=0x218e0e0) at dbus-object-tree.c:856
#11 0x0000003d0f210b4c in dbus_connection_dispatch (connection=0x218de20) at dbus-connection.c:4492
#12 0x0000003d0f210dd9 in _dbus_connection_read_write_dispatch (connection=0x218de20, timeout_milliseconds=100,
    dispatch=<value optimized out>) at dbus-connection.c:3476
#13 0x000000000052f9bf in gsh_dbus_thread ()
#14 0x0000003d0c607a51 in start_thread (arg=0x7f0d1ac30700) at pthread_create.c:301
#15 0x0000003d0c2e896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

ganesha.log:

15/07/2015 14:49:44 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] dbus_message_entrypoint :DBUS :MAJ :Method (RemoveExport) on (org.ganesha.nfsd.exportmgr) failed: name = (org.freedesktop.DBus.Error.InvalidArgs), message = (lookup_export failed with Export id not found)
15/07/2015 14:49:46 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol4 exported at : '/'
15/07/2015 14:49:47 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Pseudo path (/vol4) is a duplicate
15/07/2015 14:49:47 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Duplicate export id = 4

Expected results:
refresh-config should complete properly on all nodes, and the nfs-ganesha process should not crash.

Additional info:
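A quick way to correlate the "Export id not found" and "Duplicate export id = 4" messages with the daemon's actual state is to list the exports each surviving ganesha instance has loaded, using the export manager's ShowExports DBus method. A sketch, to be run on each node:

    # list the export ids currently loaded by the local ganesha daemon
    dbus-send --type=method_call --print-reply --system \
        --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
        org.ganesha.nfsd.exportmgr.ShowExports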
Created attachment 1052321 [details] nfs12 nfs-ganesha coredump
Verified this bug with the latest glusterfs-3.7.9-1 build; it works as expected. Details below:

Export file and pcs status before changing the ACL parameter:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = true;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

pcs status:

Stack: corosync
Current DC: dhcp37-127.lab.eng.blr.redhat.com (3) - partition with quorum
Version: 1.1.12-a14efad
4 Nodes configured
16 Resources configured

Online: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 dhcp37-180.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-180.lab.eng.blr.redhat.com
 dhcp37-158.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-158.lab.eng.blr.redhat.com
 dhcp37-127.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-127.lab.eng.blr.redhat.com
 dhcp37-174.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-174.lab.eng.blr.redhat.com

Export file after enabling the ACL:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = false;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

Running refresh-config completed without any issues:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Success: refresh-config completed.

The other nodes were updated with the changes in the export file:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = false;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

Based on the above observations, marking this bug as verified.
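As an extra client-side check that Disable_ACL = false actually takes effect (not part of the verification above; the VIP, mount point, and principal below are placeholders), one could mount the export over NFSv4 and exercise an ACL with the nfs4-acl-tools utilities:

    # mount the export over NFSv4 via one of the cluster VIPs (placeholder address)
    mount -t nfs -o vers=4 <cluster-VIP>:/newvolume /mnt/newvolume
    touch /mnt/newvolume/aclfile

    # add an NFSv4 ACE granting read/write to a user (placeholder principal),
    # then read the ACL back; the setfacl call would be rejected if the server
    # still had ACL support disabled
    nfs4_setfacl -a "A::someuser@somedomain:rw" /mnt/newvolume/aclfile
    nfs4_getfacl /mnt/newvolume/aclfile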
No crash was observed on any of the nodes, and the nfs-ganesha process keeps running properly after the changes and the refresh-config command.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1288