Description of problem:
I originally had a four-node ganesha cluster. One of the four nodes was deleted, after which I disabled nfs-ganesha and brought up glusterfs-nfs. This was done intentionally so that ganesha could then be brought up on only three nodes. Now, when enabling nfs-ganesha on the three nodes, I get this error:

[root@nfs1 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Commit failed on bed0a6f9-a6fa-49d2-995a-2a9e271040c5. Please check log file for details.

Version-Release number of selected component (if applicable):
glusterfs-3.7.0-2.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
Fails on the first attempt.

Steps to Reproduce:
1. Create a 6x2 volume.
2. Bring up nfs-ganesha after completing the pre-requisites.
3. Delete a ganesha node.
4. Disable nfs-ganesha.
5. Bring up glusterfs-nfs.
6. Try to bring up nfs-ganesha on only three nodes; this is done after making the corresponding changes in ganesha-ha.conf.

Actual results:
Step 3 already hits a known issue, BZ 1224619.

Step 6:
[root@nfs1 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Commit failed on bed0a6f9-a6fa-49d2-995a-2a9e271040c5. Please check log file for details.

pcs status:
=================
nfs1
Cluster name: ganesha-ha-360
Last updated: Tue May 26 16:50:44 2015
Last change: Tue May 26 16:14:31 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
12 Resources configured

Online: [ nfs1 nfs2 nfs3 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 ]
 nfs1-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs1
 nfs1-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs1
 nfs2-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs2
 nfs2-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs2
 nfs3-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs3
 nfs3-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs3
---
nfs2
Cluster name: ganesha-ha-360
Last updated: Tue May 26 16:50:44 2015
Last change: Tue May 26 16:14:31 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
12 Resources configured

Online: [ nfs1 nfs2 nfs3 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 ]
 nfs1-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs1
 nfs1-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs1
 nfs2-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs2
 nfs2-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs2
 nfs3-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs3
 nfs3-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs3
---
nfs3
Cluster name: ganesha-ha-360
Last updated: Tue May 26 16:50:45 2015
Last change: Tue May 26 16:14:31 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
12 Resources configured

Online: [ nfs1 nfs2 nfs3 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs2 nfs3 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs2 nfs3 ]
 nfs1-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs1
 nfs1-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs1
 nfs2-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs2
 nfs2-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs2
 nfs3-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs3
 nfs3-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs3

ganesha status:
=================
nfs1
root      9737     1  0 16:13 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs2
root      7594     1  0 16:13 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---
nfs3
root     16607     1  0 16:13 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid

Expected results:
The command should complete without error.

Additional info:
[root@nfs1 ~]# for i in 1 2 3 4 ; do ssh nfs$i "hostname"; ssh nfs$i "cat /etc/ganesha/ganesha-ha.conf"; echo "---"; done
nfs1
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
---
nfs2
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
---
nfs3
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
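For context on step 6 above, shrinking the HA node list amounts to removing the deleted node from HA_CLUSTER_NODES and dropping its VIP_* line on every remaining node. A minimal sketch only, assuming the removed node was named nfs4 (hypothetical; the actual node name is not stated in this report):

# Sketch: edit /etc/ganesha/ganesha-ha.conf on each remaining node.
# Assumes the deleted node was "nfs4" -- substitute the real hostname,
# and adjust the sed pattern if the node is not listed last.
for i in 1 2 3; do
    ssh nfs$i 'sed -i -e "s/,nfs4//" -e "/^VIP_nfs4=/d" /etc/ganesha/ganesha-ha.conf'
done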
Created attachment 1029971 [details] sosreport of node1
Created attachment 1029972 [details] sosreport of node2
Created attachment 1029973 [details] sosreport of node3
Created attachment 1029975 [details] sosreport of node4
In fact, after this, ganesha.enable also fails:

[root@nfs1 ~]# gluster volume set vol2 ganesha.enable on
volume set: failed: Commit failed on bed0a6f9-a6fa-49d2-995a-2a9e271040c5. Error: The option nfs-ganesha should be enabled before setting ganesha.enable.
I did a similar test and hit the issue again. I had a four-node RHGS cluster and peer probed one more RHGS node. After configuring the nfs-ganesha pre-requisites on the first four nodes, I executed "gluster nfs-ganesha enable" and saw this error:

[root@nfs11 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed: Commit failed on 10.70.46.39. Please check log file for details.

After this I tried to add the node with IP 10.70.46.39 to the nfs-ganesha cluster, and that worked successfully as per pcs status. However, "ganesha.enable on" for any volume, whether existing or newly created, throws an error.

pcs status:
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs15 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs15 ]
 nfs11-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs11
 nfs11-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs12
 nfs12-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs13
 nfs13-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs14
 nfs14-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs14
 nfs15-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs15
 nfs15-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs15

ganesha.enable on a newly created volume:
[root@nfs11 ~]# gluster volume set vol3 ganesha.enable on
volume set: failed: Commit failed on 10.70.46.39. Error: The option nfs-ganesha should be enabled before setting ganesha.enable.

showmount on the 4 RHGS nodes that do not have IP 10.70.46.39:
[root@nfs11 ~]# showmount -e localhost
Export list for localhost:
/vol2 (everyone)
/vol3 (everyone)

showmount on the 5th node, the one with IP 10.70.46.39:
[root@nfs15 ~]# showmount -e localhost
Export list for localhost:
[root@nfs15 ~]#

whereas ganesha on the node having IP 10.70.46.39:
[root@nfs15 ~]# ps -eaf | grep ganesha
root      4059     1  0 22:23 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      4575  4474  0 22:24 pts/0    00:00:00 grep ganesha

pcs status on the node having IP 10.70.46.39:
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs15 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs15 ]
 nfs11-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs11
 nfs11-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs12
 nfs12-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs13
 nfs13-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs14
 nfs14-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs14
 nfs15-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs15
 nfs15-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs15
Presently, unexport of the volume is not working:

[root@nfs11 ~]# gluster volume set vol3 ganesha.enable off
volume set: failed: Commit failed on 10.70.46.39. Error: Dynamic export addition/deletion failed. Please see log file for details
Saurabh, please mount the shared volume on all the nodes and try again. Let me know what you find.
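For reference, a minimal sketch of what that mount step might look like on this setup, assuming the shared volume is gluster_shared_storage (as seen in the volume status later in this bug) and a mount point of /var/run/gluster/shared_storage; the actual mount point should match whatever the environment and documentation use:

# Sketch: mount the shared volume on every node in the trusted pool,
# including nodes that do not participate in the ganesha HA cluster.
for i in `seq 11 15`; do
    ssh nfs$i 'mkdir -p /var/run/gluster/shared_storage && mount -t glusterfs localhost:/gluster_shared_storage /var/run/gluster/shared_storage'
done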
(In reply to Meghana from comment #13)
> Saurabh, please mount the shared volume on all the nodes and try again.
> Let me know what you find.

Yes, the error message is not seen if the shared volume is also mounted on the non-participating RHGS node. So we may have to take a call on how we want to move forward with this kind of scenario.

[root@nfs11 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success

[root@nfs11 ~]# pcs status
Cluster name: nozomer
Last updated: Wed Jul  8 18:08:23 2015
Last change: Wed Jul  8 18:06:15 2015
Stack: cman
Current DC: nfs11 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
16 Resources configured

Online: [ nfs11 nfs12 nfs13 nfs14 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 ]
 nfs11-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs11
 nfs11-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs11
 nfs12-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs12
 nfs12-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs12
 nfs13-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs13
 nfs13-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs13
 nfs14-cluster_ip-1    (ocf::heartbeat:IPaddr):    Started nfs14
 nfs14-trigger_ip-1    (ocf::heartbeat:Dummy):     Started nfs14

[root@nfs11 ~]# gluster volume create vol3 replica 2 10.70.46.8:/rhs/brick1/d1r13 10.70.46.27:/rhs/brick1/d1r23 10.70.46.25:/rhs/brick1/d2r13 10.70.46.29:/rhs/brick1/d2r23 10.70.46.8:/rhs/brick1/d3r13 10.70.46.27:/rhs/brick1/d3r23 10.70.46.25:/rhs/brick1/d4r13 10.70.46.29:/rhs/brick1/d4r23 10.70.46.8:/rhs/brick1/d5r13 10.70.46.27:/rhs/brick1/d5r23 10.70.46.25:/rhs/brick1/d6r13 10.70.46.29:/rhs/brick1/d6r23
volume create: vol3: success: please start the volume to access data
[root@nfs11 ~]# gluster volume start vol3
volume start: vol3: success

[root@nfs11 ~]# gluster volume status
Status of volume: gluster_shared_storage
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.25:/var/lib/glusterd/ss_brick 49152     0          Y       26501
Brick 10.70.46.29:/var/lib/glusterd/ss_brick 49152     0          Y       30649
Brick 10.70.46.8:/var/lib/glusterd/ss_brick  49152     0          Y       5420
Self-heal Daemon on localhost                N/A       N/A        Y       9567
Self-heal Daemon on 10.70.46.25              N/A       N/A        Y       29814
Self-heal Daemon on 10.70.46.29              N/A       N/A        Y       1485
Self-heal Daemon on 10.70.46.39              N/A       N/A        Y       17104
Self-heal Daemon on 10.70.46.27              N/A       N/A        Y       32631

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r13           49153     0          Y       9451
Brick 10.70.46.27:/rhs/brick1/d1r23          49152     0          Y       32553
Brick 10.70.46.25:/rhs/brick1/d2r13          49153     0          Y       29739
Brick 10.70.46.29:/rhs/brick1/d2r23          49153     0          Y       1408
Brick 10.70.46.8:/rhs/brick1/d3r13           49154     0          Y       9469
Brick 10.70.46.27:/rhs/brick1/d3r23          49153     0          Y       32576
Brick 10.70.46.25:/rhs/brick1/d4r13          49154     0          Y       29757
Brick 10.70.46.29:/rhs/brick1/d4r23          49154     0          Y       1426
Brick 10.70.46.8:/rhs/brick1/d5r13           49155     0          Y       9487
Brick 10.70.46.27:/rhs/brick1/d5r23          49154     0          Y       32599
Brick 10.70.46.25:/rhs/brick1/d6r13          49156     0          Y       29785
Brick 10.70.46.29:/rhs/brick1/d6r23          49155     0          Y       1454
NFS Server on localhost                      N/A       N/A        N       N/A
Self-heal Daemon on localhost                N/A       N/A        Y       9567
NFS Server on 10.70.46.39                    N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.39              N/A       N/A        Y       17104
NFS Server on 10.70.46.29                    N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.29              N/A       N/A        Y       1485
NFS Server on 10.70.46.25                    N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.25              N/A       N/A        Y       29814
NFS Server on 10.70.46.27                    N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.27              N/A       N/A        Y       32631

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs11 ~]# gluster volume set vol3 ganesha.enable on
volume set: success
[root@nfs11 ~]# showmount -e localhost
Export list for localhost:
/vol3 (everyone)

[root@nfs11 ~]# gluster volume set vol3 ganesha.enable off
volume set: success
[root@nfs11 ~]# showmount -e localhost
Export list for localhost:
[root@nfs11 ~]#
However, one more issue: ganesha is also up on the non-participating node.

[root@nfs15 ~]# ps -eaf | grep ganesha
root     16994     1  0 18:04 ?        00:00:04 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     18483  4474  0 19:36 pts/0    00:00:00 grep ganesha
[root@nfs15 ~]# pcs status
Error: cluster is not currently running on this node

So this is definitely an issue, as ganesha is not supposed to be running on this node. In fact, showmount also works on this non-participating node:

[root@nfs15 ~]# showmount -e localhost
Export list for localhost:
/vol3 (everyone)
Proposing this as a blocker because of https://bugzilla.redhat.com/show_bug.cgi?id=1225010#c15
Hi Saurabh, You did have the non-participating node in the config file. Please make the required changes and update the bug.
[root@rhs1 ganesha]# ./ganesha-ha.sh --status
grep: /ganesha-ha.conf: No such file or directory
grep: /ganesha-ha.conf: No such file or directory
grep: /ganesha-ha.conf: No such file or directory
Error: cluster is not currently running on this node

--status should not expect ganesha-ha.conf's path. That's why this bug is opened.
(In reply to Meghana from comment #18)
> [root@rhs1 ganesha]# ./ganesha-ha.sh --status
> grep: /ganesha-ha.conf: No such file or directory
> grep: /ganesha-ha.conf: No such file or directory
> grep: /ganesha-ha.conf: No such file or directory
> Error: cluster is not currently running on this node
>
> status should not expect ganesha-ha.conf's path. That's why this bug is
> opened.

I think you have updated the wrong BZ.
(In reply to Meghana from comment #17)
> Hi Saurabh,
>
> You did have the non-participating node in the config file. Please make the
> required changes and update the bug.

Only the non-participating node had the config file updated with its VIP and node name; the other, participating nodes did not. After discussing with dev, I learned that the CLI "gluster nfs-ganesha enable" runs on all nodes, whether or not they participate in the ganesha cluster. I did not have this information earlier.

After deleting the ganesha-ha.conf file from the non-participating node, the result is as expected:

[root@nfs11 ~]# for i in `seq 11 15`; do echo nfs$i; ssh -i /var/lib/glusterd/nfs/secret.pem nfs$i "ps -eaf | grep ganesha" ; echo "---------------"; done
nfs11
root     28534     1  0 20:11 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     30609  2311  5 20:14 pts/0    00:00:00 ssh -i /var/lib/glusterd/nfs/secret.pem nfs11 ps -eaf | grep ganesha
root     30613 30610  2 20:14 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     30619 30613  0 20:14 ?        00:00:00 grep ganesha
---------------
nfs12
root     18304     1  0 20:11 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     19779 19776  3 20:14 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     19785 19779  0 20:14 ?        00:00:00 grep ganesha
---------------
nfs13
root     15456     1  0 20:11 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     16926 16920  3 20:14 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     16932 16926  0 20:14 ?        00:00:00 grep ganesha
---------------
nfs14
root     19632     1  0 20:11 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     21099 21096  3 20:14 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     21105 21099  0 20:14 ?        00:00:00 grep ganesha
---------------
nfs15
root     19497 19494  0 20:14 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     19503 19497  0 20:14 ?        00:00:00 grep ganesha
---------------

Two questions remain open: whether the shared volume must be mounted on all RHGS nodes even when some of them do not participate in the ganesha cluster, and how the presence (or absence) of ganesha-ha.conf on non-participating nodes should be documented.
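For anyone hitting the same situation, a minimal sketch of cleaning up a non-participating node, assuming (as in this setup) that nfs15 is the node that should not run ganesha:

# Sketch: on the non-participating node, stop the stray ganesha process and
# remove the HA config so "gluster nfs-ganesha enable" does not act on it.
# The init script name is an assumption for this build; pkill is the fallback.
ssh nfs15 'service nfs-ganesha stop 2>/dev/null || pkill -f ganesha.nfsd; rm -f /etc/ganesha/ganesha-ha.conf'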
The shared storage volume needs to be mounted on all the nodes before configuring the nfs-ganesha cluster. Raised bug 1241773 to document this in the admin guide. Closing this bug, as the issue will not be seen when that step is followed.
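For what it's worth, on glusterfs builds that support it (3.7 onwards, which is an assumption about this environment), the shared storage volume can be created and mounted on every node of the trusted pool in one step, instead of the manual mount loop sketched earlier:

# Sketch: creates the gluster_shared_storage volume and mounts it on all
# nodes in the trusted pool; verify the option exists in the installed build.
gluster volume set all cluster.enable-shared-storage enable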