Description of problem:
I updated the nfs-ganesha export file of a volume to enable ACLs and executed refresh-config. The nfs-ganesha process then crashed on two nodes of the cluster.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-9.el6rhs.x86_64
nfs-ganesha-2.2.0-5.el6rhs.x86_64

How reproducible:
Happened only once.

Steps to Reproduce:
1. Create a 6x2 type volume and start it.
2. Configure ganesha.
3. Export the volume.
4. Enable ACLs by setting Disable_ACL = false in /etc/ganesha/exports/export.<volname>.conf.
5. Propagate the change from step 4 to all the nodes using the command:
   /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ volname
6. pcs status

Actual results:
Before executing refresh-config:

# cat /etc/ganesha/exports/export.vol4.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file, copy it over to all the NFS-Ganesha nodes
# and run ganesha-ha.sh --refresh-config.

EXPORT{
      Export_Id = 4;
      Path = "/vol4";
      FSAL {
           name = "GLUSTER";
           hostname = "localhost";
           volume = "vol4";
      }
      Access_type = RW;
      Disable_ACL = true;
      Squash = "No_root_squash";
      Pseudo = "/vol4";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

# pcs status
Cluster name: nozomer
Last updated: Wed Jul 15 20:11:57 2015
Last change: Wed Jul 15 00:08:36 2015
Stack: cman
Current DC: nfs11 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured

Online: [ nfs11 nfs12 nfs13 nfs14 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 nfs11-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs11
 nfs11-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs12
 nfs12-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs13
 nfs13-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started nfs14
 nfs14-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs14

Updated the export file and executed refresh-config from the same node:

# time bash /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ vol4
export.vol4.conf                              100%  550     0.5KB/s   00:00
Error org.freedesktop.DBus.Error.InvalidArgs: lookup_export failed with Export id not found
export.vol4.conf                              100%  550     0.5KB/s   00:00
Error org.freedesktop.DBus.Error.InvalidArgs: lookup_export failed with Export id not found
export.vol4.conf                              100%  550     0.5KB/s   00:00
method return sender=:1.131 -> dest=:1.136 reply_serial=2
Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
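For reference, the DBus errors above correspond to the remove/add cycle that refresh-config performs on each node: the script copies the edited export file over and then asks the running ganesha daemon to re-read it through its export manager DBus interface. A rough per-node sketch (the export id and paths are taken from this report; the exact arguments the script constructs may differ between versions):

    # copy the edited export file to the peer node (the script does this via scp)
    scp /etc/ganesha/exports/export.vol4.conf nfs12:/etc/ganesha/exports/

    # remove the old export; the "Export id not found" errors above mean this
    # call targeted an export id the daemon did not have loaded
    dbus-send --print-reply --system --dest=org.ganesha.nfsd \
        /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport \
        uint16:4

    # re-add the export from the copied config; the NoReply error above is
    # consistent with the daemon aborting while servicing this call (see the
    # backtrace below)
    dbus-send --print-reply --system --dest=org.ganesha.nfsd \
        /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport \
        string:/etc/ganesha/exports/export.vol4.conf \
        string:"EXPORT(Export_Id=4)"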
[root@nfs11 ~]# pcs status
Cluster name: nozomer
Last updated: Wed Jul 15 20:13:39 2015
Last change: Wed Jul 15 20:13:30 2015
Stack: cman
Current DC: nfs11 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
18 Resources configured

Online: [ nfs11 nfs12 nfs13 nfs14 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs12 nfs13 nfs14 ]
     Stopped: [ nfs11 ]
 nfs11-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs11-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs12-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs11
 nfs13-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs13-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1     (ocf::heartbeat:IPaddr):    Stopped
 nfs14-trigger_ip-1     (ocf::heartbeat:Dummy):     Started nfs14
 nfs12-dead_ip-1        (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-dead_ip-1        (ocf::heartbeat:Dummy):     Started nfs12

Here is the backtrace:

(gdb) bt
#0  0x0000003d0c232625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003d0c233e05 in abort () at abort.c:92
#2  0x0000003d0c22b74e in __assert_fail_base (fmt=<value optimized out>, assertion=0x5af07f "export->refcnt == 0",
    file=0x5aef68 "/builddir/build/BUILD/nfs-ganesha-2.2.0/src/support/export_mgr.c", line=<value optimized out>,
    function=<value optimized out>) at assert.c:96
#3  0x0000003d0c22b810 in __assert_fail (assertion=0x5af07f "export->refcnt == 0",
    file=0x5aef68 "/builddir/build/BUILD/nfs-ganesha-2.2.0/src/support/export_mgr.c", line=249,
    function=0x5af9a2 "free_export") at assert.c:105
#4  0x000000000051a36d in free_export ()
#5  0x0000000000507165 in export_init ()
#6  0x0000000000534643 in proc_block ()
#7  0x0000000000535319 in load_config_from_node ()
#8  0x000000000051c43f in gsh_export_addexport ()
#9  0x000000000052edfc in dbus_message_entrypoint ()
#10 0x0000003d0f21cefe in _dbus_object_tree_dispatch_and_unlock (tree=0x218db20, message=0x218e0e0) at dbus-object-tree.c:856
#11 0x0000003d0f210b4c in dbus_connection_dispatch (connection=0x218de20) at dbus-connection.c:4492
#12 0x0000003d0f210dd9 in _dbus_connection_read_write_dispatch (connection=0x218de20, timeout_milliseconds=100,
    dispatch=<value optimized out>) at dbus-connection.c:3476
#13 0x000000000052f9bf in gsh_dbus_thread ()
#14 0x0000003d0c607a51 in start_thread (arg=0x7f0d1ac30700) at pthread_create.c:301
#15 0x0000003d0c2e896d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

ganesha.log:

15/07/2015 14:49:44 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] dbus_message_entrypoint :DBUS :MAJ :Method (RemoveExport) on (org.ganesha.nfsd.exportmgr) failed: name = (org.freedesktop.DBus.Error.InvalidArgs), message = (lookup_export failed with Export id not found)
15/07/2015 14:49:46 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol4 exported at : '/'
15/07/2015 14:49:47 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Pseudo path (/vol4) is a duplicate
15/07/2015 14:49:47 : epoch 55a3b560 : nfs12 : ganesha.nfsd-24506[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Duplicate export id = 4

Expected results:
refresh-config should complete properly on all nodes, and the nfs-ganesha process should not crash.

Additional info:
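A quick way to correlate the "Export id not found" and "Duplicate export id = 4" messages with the daemon's actual state is to list the exports each surviving ganesha instance has loaded, using the export manager's ShowExports DBus method. A sketch, to be run on each node:

    # list the export ids currently loaded by the local ganesha daemon
    dbus-send --type=method_call --print-reply --system \
        --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
        org.ganesha.nfsd.exportmgr.ShowExports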
Created attachment 1052321 [details] nfs12 nfs-ganesha coredump
Verified this bug with the latest glusterfs-3.7.9-1 build; it works as expected. Details below:

Export file and pcs status before changing the ACL parameter:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = true;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

pcs status:

Stack: corosync
Current DC: dhcp37-127.lab.eng.blr.redhat.com (3) - partition with quorum
Version: 1.1.12-a14efad
4 Nodes configured
16 Resources configured

Online: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp37-127.lab.eng.blr.redhat.com dhcp37-158.lab.eng.blr.redhat.com dhcp37-174.lab.eng.blr.redhat.com dhcp37-180.lab.eng.blr.redhat.com ]
 dhcp37-180.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-180.lab.eng.blr.redhat.com
 dhcp37-158.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-158.lab.eng.blr.redhat.com
 dhcp37-127.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-127.lab.eng.blr.redhat.com
 dhcp37-174.lab.eng.blr.redhat.com-cluster_ip-1     (ocf::heartbeat:IPaddr):    Started dhcp37-174.lab.eng.blr.redhat.com

Export file after enabling the ACL:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = false;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

Running refresh-config completed without any issues:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Success: refresh-config completed.

The other nodes were updated with the changes in the export file:

EXPORT{
      Export_Id = 2;
      Path = "/newvolume";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "newvolume";
      }
      Access_type = RW;
      Disable_ACL = false;
      Squash = "No_root_squash";
      Pseudo = "/newvolume";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}

Based on the above observations, marking this bug as verified.
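As an extra client-side check that Disable_ACL = false actually takes effect (not part of the verification above; the VIP, mount point, and principal below are placeholders), one could mount the export over NFSv4 and exercise an ACL with the nfs4-acl-tools utilities:

    # mount the export over NFSv4 via one of the cluster VIPs (placeholder address)
    mount -t nfs -o vers=4 <cluster-VIP>:/newvolume /mnt/newvolume
    touch /mnt/newvolume/aclfile

    # add an NFSv4 ACE granting read/write to a user (placeholder principal),
    # then read the ACL back; the setfacl call would be rejected if the server
    # still had ACL support disabled
    nfs4_setfacl -a "A::someuser@somedomain:rw" /mnt/newvolume/aclfile
    nfs4_getfacl /mnt/newvolume/aclfile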
No crash was observed on any of the nodes, and the nfs-ganesha process keeps running properly after the changes and the refresh-config command.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1288